LLM-as-judge evaluation using a separate model to score agent output quality

April 10, 2026 · #688
Python Difficulty: Medium

Labels

help wanted, agentic-engineering, ai-coding, size/m, hacktoberfest

