Mamba2Mixer: use_cache with seq_len > 1 silently produces incorrect results (both CPU and GPU paths)
May 18, 2026 · #46032
Language: Python
Difficulty: Medium
Labels: bug
Parent Repository: huggingface/transformers (Python repository)
160,724 33,246
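The title implies an invariant: with caching enabled, feeding several tokens at once on top of a cached state should give the same outputs as one uncached pass over the full sequence. The sketch below checks that invariant on a toy scalar recurrence. It is not Mamba2Mixer code; `DECAY`, `scan`, `broken_cached_scan`, and `cache_is_consistent` are hypothetical names, and the broken variant is an invented fault used only to show how a silent mismatch would be caught. It does not describe the cause of the reported bug.

```python
from typing import Callable, List, Tuple

DECAY = 0.5  # hypothetical decay factor for the toy recurrence

Step = Callable[[List[float], float], Tuple[List[float], float]]


def scan(xs: List[float], h: float = 0.0) -> Tuple[List[float], float]:
    """Run h_t = DECAY * h_{t-1} + x_t over xs, starting from state h."""
    ys = []
    for x in xs:
        h = DECAY * h + x
        ys.append(h)
    return ys, h


def broken_cached_scan(xs: List[float], h: float = 0.0) -> Tuple[List[float], float]:
    """Invented faulty cached path: consumes only the first token of the chunk."""
    h = DECAY * h + xs[0]
    return [h] * len(xs), h


def cache_is_consistent(step: Step, xs: List[float], split: int) -> bool:
    """Compare a full uncached pass with prefill on xs[:split] plus a cached chunk."""
    full, _ = scan(xs)                  # reference: no cache, whole sequence
    _, state = scan(xs[:split])         # prefill to build the cached state
    tail, _ = step(xs[split:], state)   # cached pass with more than one token
    return all(abs(a - b) < 1e-9 for a, b in zip(full[split:], tail))


xs = [1.0, 2.0, 3.0, 4.0]
print(cache_is_consistent(scan, xs, split=2))                # correct cached path
print(cache_is_consistent(broken_cached_scan, xs, split=2))  # faulty cached path
```

A check of this shape, applied to a real model on both CPU and GPU, would turn a silent numerical error into a failing comparison.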