Evals and Espresso: A Perfect Blend

I've started to think about Evals like dialing in an espresso shot.

Just as the meter on an espresso machine checks time and extraction, LLM-as-Judge scores your traces at scale, but a human still needs to taste-test ☕️👨‍💻