candle
Python · Medium · candle-org/candle
22 stars
1 fork
34 open issues
Active Mar 2026
Beginner-Friendly Issues (34)
Issues tagged for new contributors
[Task][Megatron] 4.4: CUDA-target Megatron compatibility
#266 · Mar 25, 2026
enhancement
[Task][Megatron] 4.3: FP8 path and TransformerEngine-class integration
#265 · Mar 25, 2026
enhancement
[Task][Megatron] 4.2: Expert parallel / MoE communication path
#264 · Mar 25, 2026
enhancement
[Task][Megatron] 4.1: Pipeline parallel 1F1B / interleaved schedules
#263 · Mar 25, 2026
enhancement
[Task][Megatron] 3.2: Distributed checkpoint compatibility validation
#262 · Mar 25, 2026
enhancement
[Task][Megatron] 3.4: Performance benchmark suite
#261 · Mar 25, 2026
enhancement
[Task][Megatron] 3.3: Extend DeviceMesh beyond 2D
#260 · Mar 25, 2026
enhancement
[Task][Megatron] 3.1: NPU-first end-to-end Megatron training smoke test
#259 · Mar 25, 2026
enhancement
[Task][Megatron] 2.4: Validate activation checkpointing compatibility
#258 · Mar 25, 2026
enhancement
[Task][Megatron] 2.3: Megatron-compatible gradient buffer path
#257 · Mar 25, 2026
enhancement
[Task][Megatron] 2.2: Sequence / context parallel building blocks
#256 · Mar 25, 2026
enhancement
[Task][Megatron] 1.3: Validate async P2P semantics for pipeline communication
#254 · Mar 25, 2026
enhancement
[Task][Megatron] 1.2: Differentiable collective autograd.Functions
#253 · Mar 25, 2026
enhancement
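
For context on #253: Megatron-style tensor parallelism needs collectives that participate in autograd. Below is a minimal sketch of the usual pattern, not candle's actual implementation: a sum all-reduce whose backward is the identity, wrapped in a `torch.autograd.Function`.

```python
import torch
import torch.distributed as dist


class AllReduceSum(torch.autograd.Function):
    """Sum all-reduce in forward, identity in backward.

    Every rank's contribution enters the reduced value additively, so the
    incoming gradient passes through to each rank unchanged.
    """

    @staticmethod
    def forward(ctx, tensor):
        tensor = tensor.clone()  # all_reduce mutates in place; keep the input intact
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        return tensor

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


def all_reduce_sum(t: torch.Tensor) -> torch.Tensor:
    # Requires an initialized process group (dist.init_process_group).
    return AllReduceSum.apply(t)
```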
[Task][Megatron] 1.1: NCCL/HCCL backend routing strategy for Megatron
#252 · Mar 25, 2026
enhancement
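
For context on #252: backend routing means choosing NCCL on CUDA GPUs, HCCL on Ascend NPUs, and a CPU fallback otherwise. A hedged sketch of the caller-side shape of that routing; the `torch.npu` probe assumes the torch_npu plugin is installed, and candle's real strategy may differ.

```python
import os

import torch
import torch.distributed as dist


def init_distributed() -> str:
    """Route to NCCL on CUDA, HCCL on Ascend NPU, Gloo on CPU."""
    if torch.cuda.is_available():
        backend = "nccl"
    elif getattr(torch, "npu", None) is not None and torch.npu.is_available():
        # torch.npu is provided by the torch_npu plugin (assumption).
        backend = "hccl"
    else:
        backend = "gloo"
    dist.init_process_group(
        backend=backend,
        rank=int(os.environ["RANK"]),
        world_size=int(os.environ["WORLD_SIZE"]),
    )
    return backend
```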
[Task][vLLM] 4.4: Multi-model benchmark suite
#251 · Mar 25, 2026
enhancement
[Task][vLLM] 4.2: Quantized serving (GPTQ/AWQ/FP8)
#249 · Mar 25, 2026
enhancement
[Task][vLLM] 4.1: Tensor parallel > 1 end-to-end
#248 · Mar 25, 2026
enhancement
[Task][vLLM] 3.4: Benchmark serving throughput / latency
#247 · Mar 25, 2026
enhancement
[Task][vLLM] 3.3: Add quantization entry-point stubs
#246 · Mar 25, 2026
enhancement
[Task][vLLM] 3.2: Add end-to-end vLLM integration tests
#245 · Mar 25, 2026
enhancement
[Task][vLLM] 3.1: Validate tensor-parallel execution with HCCL
#244 · Mar 25, 2026
enhancement
[Task][vLLM] 2.4: Validate and harden NPUGraph for decode replay
#243 · Mar 25, 2026
enhancement
[Task][vLLM] 2.3: Add fused RoPE / rotary embedding kernel for NPU
#242 · Mar 25, 2026
enhancement
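
For context on #242: a fused kernel would compute rotary position embeddings in one pass. Here is a plain-PyTorch reference of the interleaved RoPE math such a kernel would have to match, useful as a numerics oracle in tests; the `[seq, heads, head_dim]` layout is an assumption.

```python
import torch


def rope_reference(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Interleaved rotary embedding on x of shape [seq, heads, head_dim].

    Channel pairs (2i, 2i+1) are rotated by position-dependent angles; a
    fused kernel computes the same result without materializing cos/sin.
    """
    seq, _, dim = x.shape
    inv_freq = theta ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * inv_freq
    cos = angles.cos()[:, None, :]  # broadcast over the heads axis
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```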
[Task][vLLM] 2.2: Implement npu_fusion_attention compatibility wrapper
#241 · Mar 25, 2026
enhancement
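
For context on #241: the wrapper's job is to expose one attention entry point that dispatches to the fused Ascend kernel when available. The sketch below shows only that dispatch shape; the `npu_fusion_attention` argument mapping is deliberately left out, since working out its real signature is what the issue is about.

```python
import torch
import torch.nn.functional as F


def attention_compat(q, k, v, attn_mask=None):
    """One entry point; prefer the fused NPU kernel, else portable SDPA.

    The fused branch is a placeholder: a real wrapper would translate
    these arguments into npu_fusion_attention's layout and head-count
    parameters, which are intentionally not reproduced here.
    """
    if getattr(torch, "npu", None) is not None and torch.npu.is_available():
        raise NotImplementedError("fused NPU path: see issue #241")
    # The fallback defines the numerics the fused path must reproduce.
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
```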
[Task][vLLM] 2.1: Implement PagedAttention for NPU
#240 · Mar 25, 2026
enhancement
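
For context on #240: PagedAttention stores each sequence's KV cache as fixed-size blocks addressed through a block table. A toy sketch of just that paged-KV indirection; the real attention kernel reads blocks in place rather than gathering them like this, and all names and layouts here are illustrative.

```python
import torch


def gather_kv(kv_cache: torch.Tensor, block_table: torch.Tensor,
              seq_len: int, block_size: int) -> torch.Tensor:
    """Assemble one sequence's contiguous KV from paged cache blocks.

    kv_cache:    [num_blocks, block_size, heads, head_dim] global pool
    block_table: 1-D long tensor of physical block ids for this sequence
    """
    n_blocks = (seq_len + block_size - 1) // block_size  # ceil-div
    pages = kv_cache[block_table[:n_blocks]]   # [n_blocks, block_size, H, D]
    kv = pages.reshape(-1, *kv_cache.shape[2:])
    return kv[:seq_len]                        # drop padding in the last block
```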
[Task][vLLM] 1.5: Make torch.compile(..., backend="eager") a valid no-op
#239 · Mar 25, 2026
enhancement
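
For context on #239: with `backend="eager"`, Dynamo captures the function but runs the graph without further compilation, so the contract for a correct implementation is simply "same results as uncompiled". A quick check of that contract under stock PyTorch semantics:

```python
import torch


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y


# backend="eager" traces the function but runs it without compilation,
# so results must match plain eager execution exactly.
compiled = torch.compile(add, backend="eager")

x, y = torch.randn(4), torch.randn(4)
assert torch.equal(compiled(x, y), add(x, y))
```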
[Task][vLLM] 1.4: Expose Tensor.record_stream(stream)
#238 · Mar 25, 2026
enhancement
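
For context on #238: `Tensor.record_stream(stream)` tells the caching allocator that a tensor is in use on a side stream, so its memory is not reclaimed and reused while that stream still has pending work. A minimal CUDA-flavored usage sketch; an NPU port would swap in the corresponding stream APIs.

```python
import torch

side = torch.cuda.Stream()
x = torch.randn(1024, device="cuda")

side.wait_stream(torch.cuda.current_stream())  # x is ready before side reads it
with torch.cuda.stream(side):
    y = x * 2                # asynchronous work on the side stream
    x.record_stream(side)    # allocator: x is still in use on `side`

del x  # safe: x's memory is not recycled until side's pending kernels finish
torch.cuda.current_stream().wait_stream(side)
```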
[Task][vLLM] 1.3: Add meta/fake tensor support
#237 · Mar 25, 2026
enhancement
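
For context on #237: meta tensors carry shape, dtype, and strides but no storage, letting tooling propagate shapes without touching device memory. What supporting them enables, in standard PyTorch terms:

```python
import torch

# Meta tensors have shape/dtype/strides but no data; ops on them only
# propagate metadata, so this "runs" a matmul with zero device memory.
a = torch.empty(8, 16, device="meta")
b = torch.empty(16, 32, device="meta")
c = a @ b
assert c.shape == (8, 32) and c.device.type == "meta"
```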
🗺️ Roadmap: Megatron-LM / Megatron-Core Full Compatibility (NPU-first)
#206 · Mar 24, 2026
🗺️ Roadmap: vLLM Full Compatibility (NPU-first)
#205 · Mar 24, 2026
Questions regarding performance benchmarks and vLLM support
#202 · Mar 24, 2026
good first issue · question