Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)
· Existential Risk

Inria researcher Carina Prunkl discusses why AI evaluation science struggles to keep pace with general-purpose systems, covering jagged capabilities, gaps between pre-deployment tests and real-world behavior, misuse risks, de-skilling, red teaming, and layered safeguards.


Show Notes

Carina Prunkl is a researcher at Inria. She joins the podcast to discuss how to assess the capabilities and risks of general-purpose AI. We examine why systems can solve hard coding and math problems yet still fail at simple tasks, why pre-deployment tests often miss real-world behavior, and how faster capability gains can increase misuse risks. The conversation also covers de-skilling, red teaming, layered safeguards, and warning signs that AIs might undermine oversight.

CHAPTERS:

(00:00) Episode Preview

(01:04) Introducing the report

(02:10) Jagged frontier capabilities

(05:29) Formal reasoning progress

(12:36) Risks and evaluation science

(19:00) Funding evaluation capacity

(24:03) Autonomy and de-skilling

(31:32) Authenticity and AI companions

(41:00) Defense in depth methods

(48:34) Loss of control risks

(53:16) Where to read the report

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Website: https://podcast.futureoflife.org

Twitter (FLI): https://x.com/FLI_org

Twitter (Gus): https://x.com/gusdocker

LinkedIn: https://www.linkedin.com/company/future-of-life-institute/

YouTube: https://www.youtube.com/channel/UC-rCCy3FQ-GItDimSR9lhzw/

Apple: https://geo.itunes.apple.com/us/podcast/id1170991978

Spotify: https://open.spotify.com/show/2Op1WO3gwVwCrYHg4eoGyP
