Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)
Inria researcher Carina Prunkl discusses why AI evaluation struggles to keep pace with general-purpose systems. Topics include jagged capabilities, real-world behavior that evaluations miss, misuse risks, de-skilling, red teaming, and layered safeguards.