High performance: close to roofline fp16 TensorCore (NVIDIA GPU) / MatrixCore (AMD GPU) performance on major models, including ResNet, Mask R-CNN, BERT, Vision Transformer, Stable Diffusion, etc. Unified ...
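"Close to roofline" refers to the roofline model, in which attainable throughput is the lower of the compute peak and memory bandwidth × arithmetic intensity (FLOPs per byte moved). Below is a minimal sketch of that bound for an fp16 GEMM. It assumes each operand is read once and the output written once (2 bytes per fp16 element). The helper names and the peak/bandwidth figures in the usage note are hypothetical illustrations, not measurements from this project.

```python
def gemm_intensity_fp16(m: int, n: int, k: int) -> float:
    """Arithmetic intensity (FLOP/byte) of C[m,n] = A[m,k] @ B[k,n] in fp16.

    Assumes A and B are each read once and C is written once,
    at 2 bytes per fp16 element (an idealized lower bound on traffic).
    """
    flops = 2 * m * n * k  # one multiply + one add per MAC
    bytes_moved = 2 * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline_tflops(peak_tflops: float, bandwidth_tb_s: float,
                    intensity_flop_per_byte: float) -> float:
    """Attainable TFLOP/s = min(compute roof, memory roof).

    TB/s * FLOP/byte = TFLOP/s, so the memory roof is bandwidth * intensity.
    """
    return min(peak_tflops, bandwidth_tb_s * intensity_flop_per_byte)


# Hypothetical device: 312 TFLOP/s fp16 peak, 2.0 TB/s memory bandwidth.
for size in (64, 4096):
    ai = gemm_intensity_fp16(size, size, size)
    print(size, round(ai, 2), round(roofline_tflops(312.0, 2.0, ai), 2))
```

For square GEMMs the intensity reduces to n/3, so small problems are memory-bound and large ones hit the compute roof. "Close to roofline" then means the measured TFLOP/s approaches whichever roof applies.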
Replay (rehearsal) dataset generation for mitigating catastrophic forgetting during supervised fine-tuning (SFT). Instead of mixing in public SFT datasets (which are distributionally mismatched), this pipeline reconstructs the ...
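Whatever source the replay samples come from, rehearsal ultimately interleaves them with the new task data at some ratio. The sketch below shows only that generic mixing step. It is not this pipeline's reconstruction method, which is elided above. The function name `mix_with_replay` and the `replay_ratio` parameter are hypothetical, chosen for illustration.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def mix_with_replay(task_data: Sequence[T], replay_pool: Sequence[T],
                    replay_ratio: float, seed: int = 0) -> list[T]:
    """Hypothetical helper: add replay samples so they form `replay_ratio`
    of the final mixed dataset, then shuffle.

    The ratio is defined over the final mix: n_replay / (n_task + n_replay).
    """
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)  # seeded for reproducible mixes
    # Solve n_replay / (n_task + n_replay) = replay_ratio for n_replay.
    n_replay = round(len(task_data) * replay_ratio / (1.0 - replay_ratio))
    n_replay = min(n_replay, len(replay_pool))  # cannot exceed the pool
    mixed = list(task_data) + rng.sample(list(replay_pool), n_replay)
    rng.shuffle(mixed)
    return mixed


task = [f"task-{i}" for i in range(80)]
pool = [f"replay-{i}" for i in range(100)]
mixed = mix_with_replay(task, pool, replay_ratio=0.2)
print(len(mixed), sum(x.startswith("replay-") for x in mixed))
```

Defining the ratio over the final mix, rather than relative to the task set, keeps the proportion of old-distribution examples per batch predictable as the task set grows.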