1 article
Cline’s Ara Khan does not sell evals as a perfect metric, but as the most useful imperfect instrument for improving AI agents.