[RFC]: O(1) KV Cache for vLLM: 4.8x Speedup & 22x More Accurate than TurboQuant on Qwen2.5-7B
April 1, 2026 · #38694
Difficulty: Medium
Labels: feature request
Parent repository: vllm-project/vllm