[Bug]: MLA attention casts activations to int32 when using Marlin FP8 on GPUs without native FP8 support (sm < 89)
March 31, 2026 · #38658
Difficulty: Easy
Labels: bug
Parent repository: vllm-project/vllm (Python, 75,721 stars, 15,332 forks)
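For context on the dispatch the title describes: FP8 tensor-core support begins at compute capability sm_89 (Ada), so on older GPUs (e.g. Ampere sm_80/sm_86) vLLM routes FP8-quantized weights through the Marlin kernels instead. The sketch below is a hypothetical illustration of that capability check, not vLLM's actual code; the function names `supports_native_fp8` and `choose_fp8_backend` are invented for this example.

```python
# Hypothetical sketch of FP8 backend selection by CUDA compute capability.
# Not vLLM's real implementation; names here are illustrative only.

def supports_native_fp8(capability: tuple[int, int]) -> bool:
    """Native FP8 (e4m3/e5m2) tensor-core support starts at sm_89 (Ada)."""
    return capability >= (8, 9)

def choose_fp8_backend(capability: tuple[int, int]) -> str:
    # On sm < 89, FP8 weights fall back to the Marlin kernels, which
    # dequantize on the fly rather than running FP8 matmuls natively.
    return "native-fp8" if supports_native_fp8(capability) else "marlin-fp8"

print(choose_fp8_backend((8, 0)))  # A100 (Ampere) → "marlin-fp8"
print(choose_fp8_backend((9, 0)))  # H100 (Hopper) → "native-fp8"
```

In a real deployment the capability tuple would come from `torch.cuda.get_device_capability()`; it is hard-coded here to keep the sketch self-contained.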