[Bug] MiMo-V2.5-Pro fails to load with TP=16 on 2 nodes with 8 NVIDIA H200 GPUs each: fused qkv_proj shard shape mismatch when tp_size > num_key_value_heads
May 7, 2026 · #24607
Python
Difficulty: Medium
Parent Repository
sgl-project/sglang
Python repository
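The title attributes the failure to a shard shape mismatch in the fused `qkv_proj` when `tp_size` exceeds `num_key_value_heads`. The sketch below illustrates one plausible way such a mismatch can arise, under stated assumptions. It uses the convention common to vLLM-style `QKVParallelLinear` layers, where each rank keeps one KV head and that head is replicated across `tp_size // num_kv_heads` ranks once `tp_size >= num_kv_heads`. It then compares that expected shard with a naive even split of the checkpoint rows. Whether SGLang's loader for this model follows exactly this rule is an assumption. The function names and the head counts (64 query heads, 8 KV heads, head_dim 128) are hypothetical and are not MiMo-V2.5-Pro's real configuration.

```python
# Hypothetical sketch of per-rank fused-QKV shard sizes under tensor parallelism.
# Assumes KV heads are replicated when tp_size >= num_kv_heads (vLLM-style rule).
# Model dimensions used below are illustrative only.


def qkv_shard_rows(num_heads, num_kv_heads, head_dim, tp_size):
    """Rows of the fused QKV weight one TP rank expects: (q_rows, k_rows, v_rows)."""
    if num_heads % tp_size:
        raise ValueError("num_heads must be divisible by tp_size")
    q_rows = (num_heads // tp_size) * head_dim
    if tp_size >= num_kv_heads:
        # Replication case: one KV head per rank, shared by tp_size // num_kv_heads ranks.
        if tp_size % num_kv_heads:
            raise ValueError("tp_size must be a multiple of num_kv_heads")
        kv_rows = head_dim
    else:
        # Partition case: KV heads are split evenly across ranks.
        if num_kv_heads % tp_size:
            raise ValueError("num_kv_heads must be divisible by tp_size")
        kv_rows = (num_kv_heads // tp_size) * head_dim
    return q_rows, kv_rows, kv_rows


def kv_slice_for_rank(rank, num_kv_heads, head_dim, tp_size):
    """Row range [start, end) of the checkpoint's K (or V) weight that `rank` loads."""
    replicas = max(1, tp_size // num_kv_heads)        # ranks sharing one KV head
    heads_per_rank = max(1, num_kv_heads // tp_size)  # KV heads held by each rank
    start_head = (rank // replicas) * heads_per_rank
    return start_head * head_dim, (start_head + heads_per_rank) * head_dim


def naive_kv_rows(num_kv_heads, head_dim, tp_size):
    """Rows a loader would produce by evenly slicing K rows without replication."""
    return (num_kv_heads * head_dim) // tp_size


if __name__ == "__main__":
    heads, kv_heads, dim, tp = 64, 8, 128, 16  # hypothetical dimensions
    print(qkv_shard_rows(heads, kv_heads, dim, tp))  # (512, 128, 128)
    print(naive_kv_rows(kv_heads, dim, tp))          # 64, not the 128 the layer expects
    print(kv_slice_for_rank(15, kv_heads, dim, tp))  # (896, 1024)
```

With these illustrative numbers the layer allocates 128 K rows per rank, while a loader that slices the checkpoint evenly hands each rank only 64. That produces exactly the kind of shard shape mismatch the title describes. Ranks 0 and 1 both load rows 0 to 128 of K and V, which is the replication the naive split is missing.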