[Bug] FlashInfer + MTP speculative decoding crashes on SM121 (DGX Spark) with GQA=16 model
March 21, 2026 · #37754
Python
Difficulty: Easy
Parent Repository
vllm-project/vllm
Python repository
75,721 15,332