Listen to Episode Here
Show Notes
The space of AI alignment research is highly dynamic, and it's often difficult to get a bird's-eye view of the landscape. This podcast is the second of two parts attempting to partially remedy this by providing an overview of technical AI alignment efforts. In particular, this episode continues the discussion from Part 1 by going into more depth on specific approaches to AI alignment. In this podcast, Lucas spoke with Rohin Shah. Rohin is a 5th-year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel, and Stuart Russell. Every week he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter.
Topics discussed in this episode include:
- Embedded agency
- The field of "getting AI systems to do what we want"
- Ambitious value learning
- Corrigibility, including iterated amplification, debate, and factored cognition
- AI boxing and impact measures
- Robustness through verification, adversarial ML, and adversarial examples
- Interpretability research
- Comprehensive AI Services
- Rohin's relative optimism about the state of AI alignment
You can take a short (3 minute) survey to share your feedback about the podcast here.
We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.
Recommended/mentioned reading
Iterated Amplification sequence
AI Alignment Newsletter database
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Penalizing side effects using stepwise relative reachability
Techniques for optimizing worst-case performance
Cooperative Inverse Reinforcement Learning
Deep reinforcement learning from human preferences
Supervising strong learners by amplifying weak experts
The Building Blocks of Interpretability
Good and safe uses of AI Oracles
You can learn more about Rohin’s work here and follow his Alignment Newsletter here.