Training instability: Periodic loss fluctuations and sinusoidal grad_norm patterns in Llama 3.2 (1B/500M) pre-training
April 4, 2026 · #1679
Python
Difficulty: Medium
Labels: bug, community-request
Parent Repository: NVIDIA-NeMo/Automodel
Python repository