LLM-as-judge evaluation using a separate model to score agent output quality
April 10, 2026 · #688
Python
Difficulty: Medium
Labels
help wanted agentic-engineering ai-coding size/m hacktoberfest
Parent Repository
chernistry/bernstein
Python repository
104 15