[Bug]: Gemma 4 31B INT4 on 2×24GB GPUs (TP=2): GPU KV cache size is 25,200 tokens at max_model_len=131072, gpu_memory_utilization=0.96, BF16 KV
April 7, 2026 · #39133
Repository: vllm-project/vllm
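For context, the reported 25,200-token KV cache falls out of a simple memory split: total GPU memory times `gpu_memory_utilization`, minus weights and runtime overhead, divided by the per-token KV footprint. The sketch below shows that arithmetic; all model-shape numbers (layers, KV heads, head dim) and the weight/overhead sizes are illustrative assumptions, not the actual Gemma config or vLLM's exact accounting.

```python
# Back-of-envelope KV-cache capacity estimate.
# ALL model-shape and memory numbers here are ASSUMPTIONS for illustration;
# they are not the real Gemma config, and vLLM's own profiling-based
# accounting will differ in detail.

def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes one token's KV cache occupies across all layers (K and V),
    at BF16 (2 bytes per element)."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def estimate_kv_tokens(gpu_mem_gib, num_gpus, gpu_mem_util,
                       weights_gib, overhead_gib,
                       num_layers, num_kv_heads, head_dim):
    """Tokens that fit in the memory left over for the KV cache.

    With tensor parallelism the KV heads are sharded across GPUs, so a
    token's cache is split between them; total capacity is what matters.
    """
    budget_gib = gpu_mem_gib * num_gpus * gpu_mem_util
    free_for_kv = (budget_gib - weights_gib - overhead_gib) * 1024**3
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim)
    return int(free_for_kv // per_token)

# Hypothetical numbers: 2x24 GiB GPUs at 0.96 utilization, INT4 weights,
# plus activation/CUDA-graph overhead eating into the KV budget.
tokens = estimate_kv_tokens(gpu_mem_gib=24, num_gpus=2, gpu_mem_util=0.96,
                            weights_gib=17, overhead_gib=8,
                            num_layers=48, num_kv_heads=8, head_dim=128)
print(tokens)
```

The point of the sketch is that a long `max_model_len` (131072 here) does not by itself reserve cache; if weights and overhead leave only a small slice of the 0.96 budget, the token capacity can land far below the model's context length.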