Mamba2Mixer: use_cache with seq_len > 1 silently produces incorrect results (both CPU and GPU paths)

May 18, 2026 · #46032
Python · Difficulty: Medium

Labels: bug
