Training instability: Periodic loss fluctuations and sinusoidal grad_norm patterns in Llama 3.2 (1B/500M) pre-training

April 4, 2026 · #1679
Python Difficulty: Medium

Labels

bug community-request

