[Bug] GLM-5.1-FP8 on H200x8 with TP=8 + EAGLE speculative decoding fails CUDA graph capture: Unsupported h_q: 8 in flash_mla_with_kvcache (NSA backend)

April 8, 2026 · #22359