vllm
Python · Easy · vllm-project/vllm
75,721 stars
15,332 forks
182 open issues
Active Apr 2026
Beginner-Friendly Issues 182
Issues tagged for new contributors
[RFC]: Selective KV Cache offload
#39305 · Apr 8, 2026
RFC
[Feature]: Can you achieve this
#39304 · Apr 8, 2026
feature request
[RFC]: Handle GDN prefill kernel JIT compilation failures - seeking community input
#39287 · Apr 8, 2026
RFC
[Feature]: Is Multi-Token Prediction (MTP) support planned for Qwen3.5 training?
#39282 · Apr 8, 2026
feature request
[Bug]: Qwen3.5 crashes when using suffix-decoding
#39271 · Apr 8, 2026
bug
[Bug]: Gemma 4 offline inference outputs gibberish
#39265 · Apr 8, 2026
bug
[Bug]: vLLM 0.19.0 fails to load Cohere ASR model
#39252 · Apr 8, 2026
bug
[Feature]: Add LoRA support for Gemma4ForConditionalGeneration / Gemma 4 models
#39246 · Apr 8, 2026
good first issue feature request
[Bug]: VLLM_USE_FLASHINFER_MOE_INT4 broken
#39245 · Apr 7, 2026
bug
KV cache compression via E8 lattice VQ — 10-33x with PagedAttention integration
#39241 · Apr 7, 2026
[Bug]: Qwen3.5 Text Only Model (Qwen3_5ForCausalLM)
#39231 · Apr 7, 2026
bug
[Bug]: Nemotron 3 super has corrupted output on 0.19.0, no issues on 0.18.1
#39223 · Apr 7, 2026
bug
[Feature]: Use torch.compile Dynamo to see full trace for model forward pass
#39215 · Apr 7, 2026
performance feature request torch.compile
FP8 MoE ep_scatter Triton illegal-address on H200 in GLM-5-FP8 prefill path
#39211 · Apr 7, 2026
bug
[Bug]: Tokenization/Generation fails for Ministral3 with documented configuration
#39207 · Apr 7, 2026
bug
[Installation]: New 0.19.0 docker build to run gemma4: transformers outdated.
#39204 · Apr 7, 2026
installation
[Security] Unpinned Third-Party GitHub Action in macOS Workflow
#39199 · Apr 7, 2026
bug
[Bug]: HFValidationError when trying to run a GGUF model with quants
#39198 · Apr 7, 2026
bug
[Bug]: NCCL Error: unhandled cuda error
#39196 · Apr 7, 2026
bug
[Feature]: Tri-attention merge?
#39193 · Apr 7, 2026
feature request
[Bug]: GLM5 on B300 generates garbage output
#39179 · Apr 7, 2026
bug
[Bug]: First request after startup is unexpectedly slow with Qwen3.5-27B-FP8
#39163 · Apr 7, 2026
bug
[Bug]: There is "rocprofiler_configure" in libtorch_cpu.so
#39162 · Apr 7, 2026
bug
[RFC][Test]: Unified Platform-Aware Test Skip Mechanism
#39158 · Apr 7, 2026
RFC
[Usage]: Qwen3-VL inference on video complains of lack of metadata
#38811 · Apr 2, 2026
usage
[Feature]: How to disable chat template when using vllm serve
#38809 · Apr 2, 2026
feature request
[vLLM IR] Op test & benchmark infra
#38782 · Apr 2, 2026
vllm-ir
[Feature]: General LL GEMMs with PDL Support
#38772 · Apr 2, 2026
feature request
[vLLM IR] OOT migration guide
#38765 · Apr 2, 2026
[RFC]: Per-iteration forward pass metrics with accurate engine-level timing
#38760 · Apr 1, 2026
RFC
[vLLM IR] Port RoPE ops to IR
#38756 · Apr 1, 2026
vllm-ir
[Bug]: GPT OSS Router GEMM Causing NaNs
#38754 · Apr 1, 2026
bug
[vLLM IR] Port QuantFP8 to IR op
#38745 · Apr 1, 2026
vllm-ir
[RFC][vLLM IR]: Automatically compile native impl for IR ops
#38744 · Apr 1, 2026
RFC vllm-ir
[Transformers v5] NemotronParseForConditionalGeneration
#38740 · Apr 1, 2026
help wanted good first issue
[Transformers v5] ColBERTJinaRobertaModel
#38737 · Apr 1, 2026
help wanted good first issue
[Transformers v5] Tarsier2ForConditionalGeneration
#38736 · Apr 1, 2026
help wanted good first issue
[Transformers v5] Ernie4_5_VLMoeForConditionalGeneration
#38735 · Apr 1, 2026
help wanted good first issue
[Transformers v5] SarvamMLAForCausalLM
#38734 · Apr 1, 2026
help wanted good first issue
[Bug] All models hang on GB300 (SM103) with FlashInfer 0.6.7
#38729 · Apr 1, 2026
[Proposal] Topology-Aware KV Cache Compression for Memory-Efficient Inference
#38725 · Apr 1, 2026
performance
[Bug]: Bench Serve encounters UTF-8 UnicodeDecodeError
#38717 · Apr 1, 2026
bug
[Bug]: RuntimeError: failed to map GGUF parameters (18288)
#38716 · Apr 1, 2026
bug
[Bug]: Error when trying to serve MiniMax 2.5 on 4 H100 nodes with 4 GPUs
#38713 · Apr 1, 2026
bug
[Usage]: How to launch the Qwen3.5 service using vLLM on a V100 GPU
#38706 · Apr 1, 2026
usage
[Bug]: vLLM fails to start with LMCache + Qwen3-Coder-Next-FP8 (nightly image)
#38700 · Apr 1, 2026
bug
[Performance]: llmcompressor W8A8 Inference: decoding stage speed is lower than FP16
#38697 · Apr 1, 2026
performance
[Bug]: qwen3.5 outputs garbled spaces when response_format json_schema is enabled
#38696 · Apr 1, 2026
bug
[Feature]: Parity with CUDA: vLLM router should have ROCm CI
#38693 · Apr 1, 2026
feature request rocm
[Bug]: qwen 3.5 model launch gets stuck for quite a long time
#38656 · Mar 31, 2026
bug
[Bug]: kimi-k2 tool parser regex is slightly off
#38441 · Mar 28, 2026
bug
[Bug]: NVFP4 + MLA error during processing
#38439 · Mar 28, 2026
bug
[Installation]: torch 2.11 is not supported
#38431 · Mar 28, 2026
installation
[Transformers v5] InternVL2
#38425 · Mar 28, 2026
help wanted good first issue
[Bug]: vllm 0.18 kimi k2.5 way worse than h200 single node
#38406 · Mar 27, 2026
bug rocm
[Transformers v5] IsaacForConditionalGeneration
#38389 · Mar 27, 2026
help wanted good first issue
[Transformers v5] HCXVisionForCausalLM
#38387 · Mar 27, 2026
help wanted good first issue
[Transformers v5] Base model and LoRA used in test has incorrect `tokenizer_config.json`
#38386 · Mar 27, 2026
help wanted good first issue
[Transformers v5] MiniCPMV cannot apply processor
#38385 · Mar 27, 2026
help wanted good first issue
[Transformers v5] Distributed shutdown test timeout
#38384 · Mar 27, 2026
help wanted good first issue
[Transformers v5] Mistral multimodal models
#38382 · Mar 27, 2026
help wanted good first issue
[Bug]: IndexError when `--renderer-num-workers` + `--mm-processor-cache-type shm`
#38375 · Mar 27, 2026
bug
[Bug]: Inference results are garbled when launching vllm via swift rollout
#38349 · Mar 27, 2026
bug
[Bug]: microsoft/Phi-4-reasoning-vision-15B Fails to startup
#38309 · Mar 27, 2026
bug
[Feature]: support affinity settings in helm chart
#38308 · Mar 27, 2026
feature request
[Bug]: minimax nvfp4 model crash
#38303 · Mar 27, 2026
bug
[Feature]: Add Rotorquant support
#38291 · Mar 26, 2026
feature request
[RFC]: Multi-tier KV offloading via the vLLM offloading connector
#38260 · Mar 26, 2026
RFC
[Usage]: How to do offline inference on one rank in a distributed environment?
#38258 · Mar 26, 2026
usage
[Bug]: Qwen3-VL-235B OOM with multi-image long multiturn inputs
#38257 · Mar 26, 2026
bug
[RFC]: Incremental MoE Expert Offloading — GPU Cache + Async Pipeline
#38256 · Mar 26, 2026
[Bug]: VLLM_CPU_OMP_THREADS_BIND=nobind cannot be used with tp>1 on CPU backends
#38250 · Mar 26, 2026
bug
[Bug]: After upgrading to v0.18.0, the logs no longer display token output speed
#37876 · Mar 23, 2026
bug
[Bug]: bge-m3 /pooling endpoint breaks in the latest main branch
#37868 · Mar 23, 2026
bug
[Bug]: sleep mode not releasing GPU memory
#37860 · Mar 23, 2026
bug
[Bug]: does not have the attribute 'FakeTensorMode'
#37858 · Mar 23, 2026
bug
[Bug]: Qwen3-VL-Embedding-8B Image embedding failed
#37855 · Mar 23, 2026
bug
[Bug]: Phi qk_layernorm appears to be unsupported in vLLM
#37852 · Mar 23, 2026
bug
[RFC]: Unify the function of getting device count
#37849 · Mar 23, 2026
RFC
[Installation]: Documented v0.18.0 cu128 release wheel URL returns 404
#37847 · Mar 23, 2026
installation
[Bug]: 0.17.0rc1 deploying GLM-4.7 on A2: tool calling misbehaves after enabling MTP
#37846 · Mar 23, 2026
bug
_update_request_as_session does not update max_tokens from StreamingUpdate
#37842 · Mar 23, 2026
[Feature Request] Support chat_template in tokenizer_config.json for DeepSeekV32
#37839 · Mar 23, 2026
Why is an assertion used here?
#37837 · Mar 23, 2026
[Bug] Potential incorrect tokenizer source path in RunAI object storage pull
#37836 · Mar 23, 2026
bug
[Performance]: Deepseek performance regressing with norm fusion enabled
#37832 · Mar 23, 2026
performance
[Bug]: Intel ARC 140v not supported as XE2 cutlass kernel
#37828 · Mar 22, 2026
bug intel-gpu
[Bug]: CUDA 13 LMCache KV connector install path still resolves CUDA 12 artifacts
#37801 · Mar 22, 2026
[Bug]: [OOM] DeepSeek-R1 Out of Memory
#37777 · Mar 21, 2026
bug
[Bug]: FLASHINFER_CUTLASS and FLASHINFER_TRTLLM do not work for Qwen3.5 Bf16 DP/EP
#37758 · Mar 21, 2026
bug
[Bug]: Qwen 3.5 stops working after upgrade to v0.18.0
#37749 · Mar 21, 2026
bug
[Feature]: Is there docker image support vllm and rocm 7.1+
#37748 · Mar 21, 2026
feature request rocm
[Bug] prompt_logprobs causes livelock with IsHybrid models (Qwen3.5) in DP mode
#37746 · Mar 21, 2026
bug
[Bug]: Missing logprobs for `<tool_call>` in streaming chat completions
#37737 · Mar 21, 2026
bug
[CI Failure]: Gemma3 OOMs with transformers backend
#37736 · Mar 21, 2026
rocm ci-failure
[Bug] Garbage output for long prompts after #35216
#37732 · Mar 21, 2026
bug
[Bug]: V1 engine core deadlocks under concurrent load (fp8 + prefix caching + Qwen3.5)
#37729 · Mar 21, 2026
bug
[CI Failure]: mi355_1: Quantization
#37724 · Mar 20, 2026
rocm ci-failure
[CI Failure]: mi355_1: Entrypoints Integration (API Server 1)
#37710 · Mar 20, 2026
rocm ci-failure
[CI Failure]: mi325_2: Distributed Tests (2 GPUs) (H100-MI325)
#37709 · Mar 20, 2026
rocm ci-failure
[CI Failure]: mi325_1: Quantized MoE Test (B200-MI325)
#37708 · Mar 20, 2026
rocm ci-failure
[CI Failure]: mi325_1: PyTorch Compilation Passes Unit Tests
#37707 · Mar 20, 2026
ci-failure
[Bug]: Structured output crashes on CPU with pin_memory=True in apply_grammar_bitmask()
#37705 · Mar 20, 2026
[CI Failure]: mi250_1: Kernels Core Operation Test
#37704 · Mar 20, 2026
rocm ci-failure
[Feature]: add ParoQuant quantization
#37687 · Mar 20, 2026
feature request
[Feature]: IndexCache support for DSA models
#37684 · Mar 20, 2026
feature request
[Bug]: glm4.7 deployed with vllm+lmcache: service crashes under high concurrency with long contexts
#37680 · Mar 20, 2026
bug
[Bug]: deepgemm compile error
#37675 · Mar 20, 2026
bug
[Usage]: After deploying a model with serve, the image_pad token in the text passed to chat.completions is moved to the beginning of the prompt
#37674 · Mar 20, 2026
usage
[Usage]: How to use Env in yaml
#36465 · Mar 9, 2026
usage
[RFC]: vLLM IR Out-of-Tree (OOT) Kernel Registration
#36459 · Mar 9, 2026
RFC
[Bug]: Unable to run Qwen3.5 on RTX5090
#36455 · Mar 9, 2026
bug
[Performance]: vLLM v0.15.0 throughput regression compared to ROCm vLLM v0.14.0
#36454 · Mar 9, 2026
performance rocm
[Bug]: Qwen3.5-35B-A3B-FP8 AssertionError
#36452 · Mar 9, 2026
bug
[Bug]: prefix cache bug when using w4a16 for GLM5
#36441 · Mar 9, 2026
bug
[Feature] Add energy consumption metrics to benchmark suite
#36440 · Mar 9, 2026
bug
[Bug]: ValueError: No user query found in messages (Qwen 3.5 27B, vLLM 0.16.0 nightly)
#36432 · Mar 9, 2026
bug