sglang
Python · Medium · sgl-project/sglang
25,412 stars
5,173 forks
94 open issues
Active Apr 2026
Beginner-Friendly Issues 94
Issues tagged for new contributors
bug high priority
[Tracking][Diffusion] Batching Design Space
#22093 · Apr 4, 2026
[Feature] Parity with CUDA - AMD when will it support DWDP based parallelism too
#22092 · Apr 4, 2026
[Feature] Distributed Weight Data Parallelism (DWDP) for Sparse MoE Models
#22084 · Apr 4, 2026
[Bug] thinking: {"type":"disabled"} is ignored in Anthropic API format
#22050 · Apr 3, 2026
[Bug] install version detect
#22034 · Apr 3, 2026
Fix multimodal cache hash collisions causing embedding corruption and DoS
#22032 · Apr 3, 2026
[Bug] [NPU] Qwen3.5 doesn't work with long-context on NPU
#22023 · Apr 3, 2026
[Bug] bench_speculative missing logprob_start_len for direct benchmark calls
#22013 · Apr 3, 2026
[Feature] Using Prefill node idle cycles for Decoding in PD disaggregation
#21995 · Apr 3, 2026
[Feature] Enable Piecewise CUDA Graph with EP
#21994 · Apr 3, 2026
[Bug] PD xgrammar bug.... Grammar accept_token failed for
#21945 · Apr 2, 2026
high priority
[Bug] [AMD] spec v2 + DP Memory access fault
#21942 · Apr 2, 2026
[Bug] QwenVLImageProcessor design is not elegant and not extensible
#21939 · Apr 2, 2026
[Bug] missing 1 required positional argument: 'attn_cp_size'
#21935 · Apr 2, 2026
[Feature] Add forward count metric for dLLM algorithms
#21926 · Apr 2, 2026
[Bug] Any model hangs at high concurrency on (G)B300 (SM103) with TRTLLM attention
#21904 · Apr 2, 2026
[Feature] Parity with Cuda: ROCm MoRI Disagg Pollara support GLM5 & Qwen3.5 yet
#21886 · Apr 2, 2026
[Bug] Extremely bad HiCache performance in containers
#21880 · Apr 1, 2026
[Investigation] Timestep Caching for Diffusion Schedulers (Low ROI)
#21879 · Apr 1, 2026
[Model] Support MiMo-Audio model
#21857 · Apr 1, 2026
[Feature][MPS] Better memory management for Apple Silicon Macs
#21443 · Mar 26, 2026
[Bug] radix cache not working deepseek v3.2
#21426 · Mar 25, 2026
Rename decode log field #transfer-req to reduce confusion
#21396 · Mar 25, 2026
[HELP] PD Disaggregation with mooncake store deploy
#21395 · Mar 25, 2026
[Bug] Flux2 Klein uses incorrect max_length=77 instead of 512 for prompt tokenization
#21372 · Mar 25, 2026
[Bug] Dead code in SamplingParams.verify(): top_k == -1 check is unreachable
#21353 · Mar 25, 2026
[diffusion][Question] How to launch_server with only a pre-edited config file?
#21352 · Mar 25, 2026
[AMD] 4-GPU tests: lowered accuracy thresholds on ROCm with triton backend
#21340 · Mar 24, 2026
amd
[Bug] IndexError in `embed_mm_inputs` when images and videos coexist with deepstack
#21327 · Mar 24, 2026
[Bug] Qwen3.5-35B has accuracy issue with flashinfer_trtllm FP8 MoE on SM100
#21317 · Mar 24, 2026
[Bug] Diffusers fallback fails with "No model info found" for unregistered models
#21311 · Mar 24, 2026
Why Nemotron Nano 30B A3B is much slower than gpt-oss 20b?
#21307 · Mar 24, 2026
[Bug] PD disaggregation can hang with total_requests load balancing
#21297 · Mar 24, 2026
[Bug] Timeout failure in PD-disaggregated DeepSeek-V3-2 deployment
#21292 · Mar 24, 2026
[Bug] GLM-5 accuracy drop on B200 with DP
#21291 · Mar 24, 2026
bug deepseek
[New Model Support] MiMo Audio
#21289 · Mar 24, 2026
[Feature] Implement IndexCache for GLM-5/DeepSeek V3.2
#21286 · Mar 24, 2026
deepseek Good Pro Issue good second issue
[PCG] Enable piecewise CUDA graph by default for VLM models
#21282 · Mar 24, 2026
TestReturnRoutedExperts flaky in ci
#21266 · Mar 24, 2026
[Bug] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1
#21235 · Mar 23, 2026
Comparison of TTFT latency between cache_aware and round_robin strategies.
#21199 · Mar 23, 2026
[Bug] fix pp for qwen3_5 (KeyError when reading params)
#21184 · Mar 23, 2026
_handle_abort_req crashes with KeyError when request already completed
#21173 · Mar 23, 2026
Grammar accept_token failure leaves request in corrupted state in running batch
#21171 · Mar 23, 2026
handle_embedding_request does not abort over-length requests before enqueuing
#21168 · Mar 23, 2026
[Bug] KV Cache upload metrics are currently returning null/empty
#21167 · Mar 23, 2026
[Bug] I found that the flow control in sgl-model-gateway does not seem to be working.
#21163 · Mar 23, 2026
[RFC][Diffusion] Intel Auto-Round x SGL Diffusion Quantization Support (2026 H1)
#21159 · Mar 23, 2026
feat(moe): CUTLASS and Marlin MoE backends only support gated SiLU activation
#21149 · Mar 22, 2026
[Bug] Does SGLang have good optimization support for the B300 server?
#21105 · Mar 21, 2026
[NPU] Cache-DiT fails to work on Wan2.2 with 16 NPUs
#21095 · Mar 21, 2026
[Bug] flashinfer gdn kernel not working
#21085 · Mar 21, 2026
good first issue
[Bug]: GLM5 FP8: AMD current gen MI355 slower than last gen H200
#21071 · Mar 21, 2026
Benchmark: SGLang vs. vLLM Scaling under High Concurrency
#21061 · Mar 21, 2026
[Bug] Qwen3.5-4B produces garbled output with TP=2
#21039 · Mar 20, 2026
[Bug] resume_memory_occupation crashes on Blackwell with InferenceMode error
#21036 · Mar 20, 2026
[Feature] Ask if there is a CUDA 13.1-based docker image for SGLang
#21033 · Mar 20, 2026
[Bug] [Diffusion] Z-Image-Turbo only works with some resolutions when sharding
#21021 · Mar 20, 2026
[Bug] Model loading time for WAN 2.2
#21000 · Mar 20, 2026