LLM-as-judge evaluation using a separate model to score agent output quality

April 10, 2026 · #688

Language: Python · Difficulty: Medium

Labels

help wanted, agentic-engineering, ai-coding, size/m, hacktoberfest

Parent Repository

chernistry/bernstein

Python repository
