[Bug]: CUDA assert in triton attention for MolmoWeb models (Molmo2 architecture with different max_position_embeddings)
March 31, 2026 · #38660
Python
Difficulty: Easy
Parent Repository
vllm-project/vllm
Python repository
75,721 15,332