Why do old_logprob and new_logprob differ, and why is coef_1 != 1, in GRPO when num_iterations=1?

March 21, 2026 · #4502
Python Difficulty: Medium

