Correctness issue when training Gemma4 E2B/E4B models (KV-sharing models) with activation_checkpointing enabled
April 7, 2026 · #1705
Python
Difficulty: Medium
Labels: bug
Parent Repository: NVIDIA-NeMo/Automodel
Python repository