NaN loss with SDPA attention when local_batch_size > 1 on Gemma 4 4B
April 16, 2026 · #1883
Python
Difficulty: Medium
Parent Repository
NVIDIA-NeMo/Automodel
Python repository
447 133