[Bug] GLM-5.1-FP8 on H200x8 with TP=8 + EAGLE speculative decoding fails CUDA graph capture: Unsupported h_q: 8 in flash_mla_with_kvcache (NSA backend)
April 8, 2026 · #22359
Language: Python
Difficulty: Medium
Parent repository: sgl-project/sglang (Python, 25,689 stars, 5,301 forks)