sglang

[HiCache] Input length validation rejects requests that fit in L1+L2 combined capacity

#22105 · Apr 4, 2026

Medium

[Performance Regression] ~2x throughput drop in disaggregated PD mode with Wide-EP (DeepSeek-R1 FP4, GB200) between SGLang v0.5.8 and latest nightly

#22095 · Apr 4, 2026

Medium

bug high priority

[Tracking][Diffusion] Batching Design Space

#22093 · Apr 4, 2026

Medium

[Feature] Parity with CUDA: when will AMD support DWDP-based parallelism too?

#22092 · Apr 4, 2026

Medium

Triton GDN kernel produces garbled text (foreign language token mixing) for dense Qwen 3.5 models

#22087 · Apr 4, 2026

Medium

[Feature] Distributed Weight Data Parallelism (DWDP) for Sparse MoE Models

#22084 · Apr 4, 2026

Medium

[Bug] EP/DP decode server hangs at startup on MI325X with Broadcom Thor 2 (bnxt_re) + MoRI a2a backend

#22072 · Apr 3, 2026

Medium

[Bug] Scheduler subprocess causes a memory leak during `torch.compile` with PyTorch >= 2.9

#22056 · Apr 3, 2026

Medium

[Bug] thinking: {"type":"disabled"} is ignored in Anthropic API format

#22050 · Apr 3, 2026

Medium

[Feature][RFC] SPECTRE: Parallel Speculative Decoding with a Multi-Tenant Remote Drafter

#22044 · Apr 3, 2026

Medium

[Bug] GLM model with n>1 + JSON schema returns content=None (content misrouted to reasoning_content)

#22042 · Apr 3, 2026

Medium

[Bug] install version detect

#22034 · Apr 3, 2026

Medium

Fix multimodal cache hash collisions causing embedding corruption and DoS

#22032 · Apr 3, 2026

Medium

[Bug] [NPU] Qwen3.5 doesn't work with long-context on NPU

#22023 · Apr 3, 2026

Medium

[Bug] Cache-DiT `refresh_context` crashes on second inference when `SGLANG_CACHE_DIT_SCM_PRESET` is not set

#22021 · Apr 3, 2026

Medium

[Bug] bench_speculative missing logprob_start_len for direct benchmark calls

#22013 · Apr 3, 2026

Medium

[Feature] Using Prefill node idle cycles for Decoding in PD disaggregation

#21995 · Apr 3, 2026

Medium

[Feature] Enable Piecewise CUDA Graph with EP

#21994 · Apr 3, 2026

Medium

[Bug] PD xgrammar bug.... Grammar accept_token failed for

#21945 · Apr 2, 2026

Medium

high priority

[Bug] [AMD] spec v2 + DP Memory access fault

#21942 · Apr 2, 2026

Medium

[Bug] QwenVLImageProcessor design is not elegant and not extensible

#21939 · Apr 2, 2026

Medium

[Bug] PR #21436 regression: piecewise CUDA graph crashes for NemotronH hybrid models (Mamba2 Triton kernel + cuBLAS failures)

#21938 · Apr 2, 2026

Medium

[Bug] missing 1 required positional argument: 'attn_cp_size'

#21935 · Apr 2, 2026

Medium

[Bug] Running GLM-5 with the latest dev image reports an error: RuntimeError: Unsupported h_q: 4

#21934 · Apr 2, 2026

Medium

[Feature] Add forward count metric for dLLM algorithms

#21926 · Apr 2, 2026

Medium

[Bug] --dist-timeout is only applied to init_process_group, but not propagated to NCCL subgroups

#21911 · Apr 2, 2026

Medium

[Bug] Any model hangs at high concurrency on (G)B300 (SM103) with TRTLLM attention

#21904 · Apr 2, 2026

Medium

[Feature] Eliminate redundant text encoding and VAE decoding across TP/SP/CFG rank groups

#21894 · Apr 2, 2026

Medium

[Feature] Parity with CUDA: does ROCm MoRI Disagg (Pollara) support GLM5 & Qwen3.5 yet?

#21886 · Apr 2, 2026

Medium

[Bug] Extremely bad HiCache performance in containers

#21880 · Apr 1, 2026

Medium

[Investigation] Timestep Caching for Diffusion Schedulers (Low ROI)

#21879 · Apr 1, 2026

Medium

[Bug] AttributeError: 'Qwen3_5MoeConfig' has no attribute 'num_hidden_layers' when enabling LoRA with Qwen3.5 MoE multimodal model

#21876 · Apr 1, 2026

Medium

[Bug] cudaMemcpyBatchAsync change in CUDA 13 incompatible with transfer_kv_page_first_direct_impl in sglang/sgl-kernel/csrc/kvcacheio/transfer.cu

#21869 · Apr 1, 2026

Medium

[Model] Support MiMo-Audio model

#21857 · Apr 1, 2026

Medium

[Feature] Replace memory-bound custom CUDA kernels with torch.compile generated fusions

#21855 · Apr 1, 2026

Medium

[Bug] `_pickle.PicklingError` when using `--backend diffusers` with registered native models

#21453 · Mar 26, 2026

Easy

[Feature][MPS] Better memory management for Apple Silicon Macs

#21443 · Mar 26, 2026

Easy

[Bug] Radix cache not working with DeepSeek V3.2

#21426 · Mar 25, 2026

Easy

Rename decode log field #transfer-req to reduce confusion

#21396 · Mar 25, 2026

Easy

[HELP] PD Disaggregation with mooncake store deploy

#21395 · Mar 25, 2026

Easy

[Feature Request] Native support for GLM-4 MoE (Glm4MoeForCausalLM) to prevent Transformers fallback OOM

#21389 · Mar 25, 2026

Easy

[Bug] LoRA adapter unload leaks GPU memory pool slot, causing ghost-uid slot exhaustion and eviction-policy crashes

#21380 · Mar 25, 2026

Easy

[Bug] evict_mamba crashes with AssertionError: evict leaf node invalid when leaf's KV is locked by in-flight request

#21379 · Mar 25, 2026

Easy

[Bug] Flux2 Klein uses incorrect max_length=77 instead of 512 for prompt tokenization

#21372 · Mar 25, 2026

Easy

[Bug] Dead code in SamplingParams.verify(): top_k == -1 check is unreachable

#21353 · Mar 25, 2026

Easy

[diffusion][Question] How to launch_server with only a pre-edited config file?

#21352 · Mar 25, 2026

Medium

[AMD] 4-GPU tests: lowered accuracy thresholds on ROCm with triton backend

#21340 · Mar 24, 2026

Easy

amd

[Bug] IndexError in `embed_mm_inputs` when images and videos coexist with deepstack

#21327 · Mar 24, 2026

Easy

[Bug] Qwen3.5-35B has accuracy issue with flashinfer_trtllm FP8 MoE on SM100

#21317 · Mar 24, 2026

Easy

[Bug] Diffusers fallback fails with "No model info found" for unregistered models

#21311 · Mar 24, 2026

Easy

Why Nemotron Nano 30B A3B is much slower than gpt-oss 20b?

#21307 · Mar 24, 2026

Easy

[Bug] PD disaggregation can hang with total_requests load balancing

#21297 · Mar 24, 2026

Easy

[Bug] Timeout failure in PD-disaggregated DeepSeek-V3-2 deployment

#21292 · Mar 24, 2026

Easy

[Bug] GLM-5 accuracy drop on B200 with DP

#21291 · Mar 24, 2026

Easy

bug deepseek

[New Model Support] MiMo Audio

#21289 · Mar 24, 2026

Easy

[Feature] Implement IndexCache for GLM-5/DeepSeek V3.2

#21286 · Mar 24, 2026

Easy

deepseek Good Pro Issue good second issue

[PCG] Enable piecewise CUDA graph by default for VLM models

#21282 · Mar 24, 2026

Easy

TestReturnRoutedExperts flaky in CI

#21266 · Mar 24, 2026

Easy

[Bug] `tree_speculative_sampling_target_only` produces non-deterministic results with `deterministic=True` in tensor parallel mode

#21256 · Mar 24, 2026

Easy

[Bug] SGLang produces degenerate output with Qwen2.5-Math-7B at temperature=1.0 (HuggingFace baseline is correct)

#21238 · Mar 23, 2026

Easy

[Bug] /pause_generation and /continue_generation wrong for --tokenizer-worker-num > 1

#21235 · Mar 23, 2026

Easy

[Bug] Piecewise CUDA graph replay crashes with FlashInfer ≥0.6.6: q.shape[0] does not match qo_indptr[-1] in paged prefill

#21218 · Mar 23, 2026

Easy

[Bug] EAGLE draft_forward leaks padded state across decode steps under DP attention, causing negative-dimension crash

#21210 · Mar 23, 2026

Easy

[Feature] Autotune between trtllm-gen and cute-dsl in trtllm_batch_decode_with_kv_cache_mla

#21208 · Mar 23, 2026

Medium

Comparison of TTFT latency between cache_aware and round_robin strategies.

#21199 · Mar 23, 2026

Medium

[Bug] fix pp for qwen3_5 (KeyError when reading params)

#21184 · Mar 23, 2026

Medium

[Bug] DeepSeek-V3.2 tool-call parsing exception or empty output in Disaggregated (PD) mode with SGLang 0.5.9

#21176 · Mar 23, 2026

Medium

_handle_abort_req crashes with KeyError when request already completed

#21173 · Mar 23, 2026

Medium

Grammar accept_token failure leaves request in corrupted state in running batch

#21171 · Mar 23, 2026

Easy

handle_embedding_request does not abort over-length requests before enqueuing

#21168 · Mar 23, 2026

Medium

[Bug] KV Cache upload metrics are currently returning null/empty

#21167 · Mar 23, 2026

Medium

[Bug] I found that the flow control in sgl-model-gateway does not seem to be working.

#21163 · Mar 23, 2026

Medium

[RFC][Diffusion] Intel Auto-Round x SGL Diffusion Quantization Support (2026 H1)

#21159 · Mar 23, 2026

Medium

[Bug] The cache hit rate of Qwen3.5 MambaRadixTree remains zero when I use the same prompt.

#21158 · Mar 23, 2026

Medium

feat(moe): CUTLASS and Marlin MoE backends only support gated SiLU activation

#21149 · Mar 22, 2026

Medium

[Bug] MTP speculative decoding always rejects draft tokens for NemotronH (accept_rate=0.33)

#21138 · Mar 22, 2026

Medium

[Bug] --allow-auto-truncate results in zero generation tokens when input exceeds context length

#21136 · Mar 22, 2026

Medium

EP=8 on RTX PRO 6000 Blackwell Server Edition (SM120): garbage output with FP8 and INT4 models

#21132 · Mar 22, 2026

Medium

[Bug] HY-MT1.5-7B-FP8 PCG error: mat1 and mat2 shapes cannot be multiplied (128x128 and 4096x4096)

#21127 · Mar 22, 2026

Medium

[Bug] Piecewise CUDA Graph crashes with illegal memory access on H100 (FA3 backend) during warmup_compile

#21112 · Mar 21, 2026

Medium

[Bug] Does SGLang have good optimization support for the B300 server?

#21105 · Mar 21, 2026

Medium

[NPU] Cache-DiT fails to work on Wan2.2 with 16 NPUs

#21095 · Mar 21, 2026

Medium

[Bug] flashinfer gdn kernel not working

#21085 · Mar 21, 2026

Medium

good first issue

[Bug]: GLM5 FP8: AMD current gen MI355 slower than last gen H200

#21071 · Mar 21, 2026

Medium

Benchmark: SGLang vs. vLLM Scaling under High Concurrency

#21061 · Mar 21, 2026

Medium

[Bug] Qwen3.5-4B produces garbled output with TP=2

#21039 · Mar 20, 2026

Medium

[Bug] resume_memory_occupation crashes on Blackwell with InferenceMode error

#21036 · Mar 20, 2026

Medium

[Feature] Is there a CUDA 13.1-based Docker image for SGLang?

#21033 · Mar 20, 2026

Medium

[Bug] [Diffusion] Z-Image-Turbo only works with some resolutions when sharding

#21021 · Mar 20, 2026

Medium

[Bug] Model loading time for WAN 2.2

#21000 · Mar 20, 2026

Medium

[Bug][NPU][AWQ] UnquantizedFusedMoEMethod object has no attribute 'apply_without_routing_weights'

#20980 · Mar 20, 2026

Medium

[Bug] can't load AxionML/Qwen3.5-35B-A3B-NVFP4 on fresh `lmsysorg/sglang:dev-cu13` on Nvidia DGX Spark

#20973 · Mar 20, 2026

Medium

[Bug] MiniMax M2.5 throws a CUDA out of memory error when running with speculative decoding

#20966 · Mar 20, 2026

Medium

[Bug] `get_routed_experts` loses the last input token's routing info in prefill-only scenarios

#20964 · Mar 20, 2026

Medium

Beginner-Friendly Issues 94