[Bug]: Gemma 4 31B INT4 on 2×24GB GPUs (TP=2): GPU KV cache size is 25,200 tokens at max_model_len=131072, gpu_memory_utilization=0.96, BF16 KV

April 7, 2026 · #39133
