NaN loss with SDPA attention when local_batch_size > 1 on Gemma 4 4B

April 16, 2026 · #1883
Python · Difficulty: Medium