vllm-omni
vllm-project/vllm-omni · Python · Medium
4,180 stars · 718 forks · 121 open issues
Active Apr 2026

Beginner-Friendly Issues (121)
Issues tagged for new contributors
[Bug]: qwen3-tts
#2597 · Apr 8, 2026
bug
[Bug]: Qwen3 TTS max_model_len is greater than the derived max_model_len
#2595 · Apr 8, 2026
bug
[New Model]: VoxCPM2
#2594 · Apr 8, 2026
help wanted good first issue new model
[RFC]: support DiT MM online FP8 quantization on NPU
#2592 · Apr 8, 2026
[New Model]: happyhorse-1
#2591 · Apr 8, 2026
new model
[Bug]: ignore_eos not taking effect for benchmarks / OpenAI requests
#2578 · Apr 8, 2026
bug
[RFC]: vLLM-Omni XPU 2026 Q2 Roadmap
#2570 · Apr 7, 2026
help wanted Hardware Plugin roadmap
[Bug]: Fish Speech S2 Pro: How to achieve voice consistency, without cloning.
#2552 · Apr 7, 2026
bug
[Tracking] Follow-ups for Omni Sleep Mode (#2022)
#2545 · Apr 7, 2026
bug help wanted
[Feature]: Support LoRA-based inference acceleration for CosyVoice3 in vLLM-Omni
#2543 · Apr 7, 2026
[Bug]: --distributed-executor-backend option is ignored for diffusion models
#2539 · Apr 7, 2026
bug
[RFC]: CacheDiT Refactor
#2535 · Apr 7, 2026
[Bug]: SD3 Doesn't Handle Dtypes Correctly
#2525 · Apr 6, 2026
bug
[RFC]: Refactoring audio_in_video implementation
#2469 · Apr 3, 2026
[RFC]: Optimize the HY-Video1.5 performance
#2468 · Apr 3, 2026
[Feature]: TP MistralEncoder for FLUX.2
#2464 · Apr 2, 2026
[New Model]: LongCat-AudioDiT (Meituan) — Waveform Latent Space Diffusion TTS
#2462 · Apr 2, 2026
help wanted good first issue new model tts
[New Model]: Gemma 4 from Google Deepmind
#2460 · Apr 2, 2026
new model
[Bug]: shm_broadcast.py raises a 'cancelled' error when serving BAGEL-7B-MoT
#2440 · Apr 2, 2026
bug
[Bug]: Qwen3-TTS 0.6B-Custom generates audio with audible noise
#2439 · Apr 2, 2026
bug
[RFC]: Continuous Quantization Support for NPU
#2438 · Apr 2, 2026
help wanted good first issue NPU high priority diffusion
[RFC]: Unify Rotary Position Embedding Implementations Across Models
#2436 · Apr 2, 2026
help wanted
[Doc]: commit id `d781902ce9` of vllm-ascend does not exist
#2434 · Apr 2, 2026
documentation
[Feature]: Establish baseline and profile fish-speech's performance
#2432 · Apr 1, 2026
help wanted good first issue tts
[Bug]: Starting from v0.16.0, Qwen3-Omni can no longer run with TP=8
#2421 · Apr 1, 2026
bug
[CI failure]: nightly Omni model test with H100 fails due to missing keywords
#2415 · Apr 1, 2026
bug ci-failure
[RFC]: vLLM-Omni ROCm 2026 Q2 Roadmap
#2413 · Apr 1, 2026
ROCm Hardware Plugin
[Feature]: Trigger Model-Specific Performance Tests via Tags in vLLM-Omni
#2410 · Apr 1, 2026
help wanted
[Bug]: vLLM model crashes when using runai_streamer (qwen-tts)
#2408 · Apr 1, 2026
bug
[RFC]: Diffusers Backend Integration for Extended Model Coverage
#2403 · Apr 1, 2026
bug
[RFC]: Unified failure semantics and request isolation for async generation
#2392 · Apr 1, 2026
bug ci-failure
[RFC]: Single-Node D2D Transfer - CUDA IPC Connector
#2379 · Mar 31, 2026
[Performance]: Enable torch.compile for Qwen3-TTS code_predictor on Intel XPU
#2374 · Mar 31, 2026
[RFC]: Support Multi-branch CFG in TeaCache Hook
#2371 · Mar 31, 2026
[RFC]: L5 Reliability Test
#2366 · Mar 31, 2026
[RFC]: Add support for Pipeline Parallel
#2363 · Mar 31, 2026
[Feature]: Generate the video and then save it to S3 object storage
#2361 · Mar 31, 2026
[Feature]: A benchmark of Qwen3-TTS-12Hz-0.6B-Base is requested
#2348 · Mar 31, 2026
[Roadmap][Feature] Support Moore Threads (MUSA) GPUs
#2347 · Mar 31, 2026
Hardware Plugin
[RFC][Draft]: Large-Scale Multi-Stage Serving Architecture for vLLM-Omni
#2336 · Mar 30, 2026
[RFC]: diffusion engine clean up
#2335 · Mar 30, 2026
help wanted high priority
[RFC]: Improving Qwen3-TTS Performance on NPU
#2328 · Mar 30, 2026
NPU
[New Model] Add support for VibeVoice TTS family (Realtime-0.5B and TTS-1.5B)
#2319 · Mar 30, 2026
help wanted good first issue new model tts
[Bug]: 22GB VRAM usage for 0.6B Qwen3-TTS model (2-stage architecture overhead)
#2318 · Mar 30, 2026
bug
[Bug]: Worker processes persist after docker rm -f, holding GPU memory indefinitely
#2317 · Mar 30, 2026
bug
[RFC]: PagedAttention and KV Cache for Autoregressive Diffusion
#2305 · Mar 28, 2026
bug
[RFC]: Restructure vLLM-Omni Test Layout, Fixture Scope, and Support Modules
#2299 · Mar 28, 2026
help wanted high priority roadmap
[Bug]: Qwen3-TTS online service crashes under heavy request load
#2295 · Mar 28, 2026
bug
[RFC]: Omni-modality model accuracy benchmark
#2284 · Mar 28, 2026
bug
[RFC]: Pipeline Parallelism & Stream Batch for Real-Time Video Generation
#2280 · Mar 27, 2026
good first issue
[Feature]: Auto-syncing example/*/*/README.md to docs/user_guide/examples/*/*.md
#2269 · Mar 27, 2026
[Bug]: Qwen3-Omni benchmark fails
#2253 · Mar 27, 2026
bug
[Bug]: Startup fails with the vllm-align branch
#2238 · Mar 26, 2026
bug
[RFC]: vLLM-Omni support online mxFP8 quantization for FA
#2236 · Mar 26, 2026
bug NPU
[RFC]: Plugin-based Sparse Attention Interface for DiT Modules
#2233 · Mar 26, 2026
bug NPU
[RFC]: vLLM-Omni Diffusion Module — Q2 2026 Roadmap
#2226 · Mar 26, 2026
diffusion
[RFC]: vLLM-Omni NPU 2026 Q2 Roadmap
#2223 · Mar 26, 2026
NPU
[Bug]: GLM-Image tensor size error for 512x512 and 1280x1280 input/output images
#2222 · Mar 26, 2026
bug
[Bug]: Noise in output when both USP and layerwise offloading are enabled
#2218 · Mar 26, 2026
bug
[RFC]: TurboQuant — Sub-4-bit KV Cache Quantization for Long-Context Omni Models
#2215 · Mar 26, 2026
enhancement
[RFC] Streaming Video Input for Omni-Modal Real-Time Interaction
#2201 · Mar 25, 2026
[Performance]: Redundant LoRA file I/O in multi-GPU diffusion inference
#2198 · Mar 25, 2026
[Bug]: Cache Refresh Requires num_inference_steps
#2194 · Mar 25, 2026
bug
[RFC]: Add Diffusion Pipeline Protocol / Base Class
#2189 · Mar 25, 2026
[Bug]: NPU OOM Error During Offline Wan2.2 Inference in vLLM-Omni Framework
#2186 · Mar 25, 2026
bug
[Feature]: Ability to use vllm-omni with tritonserver
#2177 · Mar 25, 2026
bug
[Feature]: support sleep/wake HTTP api
#2169 · Mar 25, 2026
bug
[Bug]: Failed to inference Qwen3-Omni-30B-A3B-Instruct on NPU
#2157 · Mar 25, 2026
bug
[RFC]: LyCORIS Adapter Support for Diffusion Models (LoKr, LoHa, and beyond)
#2150 · Mar 24, 2026
[RFC]: Multi-LoRA Composition for Diffusion Models
#2149 · Mar 24, 2026
[Bug]: black-forest-labs/FLUX.2-klein-9B image generation n=2 sometimes hangs
#2144 · Mar 24, 2026
bug
[New Model] PrismAudio (Video-to-Audio Generation)
#2140 · Mar 24, 2026
help wanted good first issue new model
[Bug] [HunyuanImage3.0]: Text2Image quality regression after rebasing
#2127 · Mar 24, 2026
bug
[RFC]: TTS Development Roadmap - Q2 2026
#2115 · Mar 24, 2026
help wanted good first issue high priority
[Bug]: GLM-Image starts the wrong stages
#2113 · Mar 24, 2026
bug
[RFC]: Support Wan2.2-I2V-A14B Model in vllm-omni Multimodal Generation Framework
#2093 · Mar 23, 2026
help wanted
[New Model]: GAIR/daVinci-MagiHuman
#2084 · Mar 23, 2026
new model
[New Model]: MOVA from OpenMOSS
#2079 · Mar 23, 2026
new model
[Feature]: Resolve security vulnerability from the dependency `gradio`
#2064 · Mar 21, 2026
bug
[Bug]: [Qwen3-Omni] When using mix-modalities, the image description is incorrect
#1990 · Mar 19, 2026
bug
[Question]: Why does the demo of hunyuan_image3 take so long to run once?
#1989 · Mar 19, 2026
bug
[New Model]: NVIDIA Cosmos Predict, Transfer, Reason
#1747 · Mar 9, 2026
new model
[Bug]: Failed with GLM-Image Online serving
#1745 · Mar 9, 2026
bug
[Bug]: wan2.2-14B benchmark sends requests, but the server does not respond
#1736 · Mar 9, 2026
bug