AIAP: An Overview of Technical AI Alignment with Rohin Shah (Part 2)



Show Notes

The space of AI alignment research is highly dynamic, and it's often difficult to get a bird's-eye view of the landscape. This podcast is the second of two parts attempting to partially remedy this by providing an overview of technical AI alignment efforts. In particular, this episode continues the discussion from Part 1 by going into more depth on specific approaches to AI alignment. In this podcast, Lucas spoke with Rohin Shah. Rohin is a 5th-year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel, and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter.

Topics discussed in this episode include:

You can take a short (3 minute) survey to share your feedback about the podcast here.

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, SoundCloud, iTunes, Google Play, Stitcher, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

Recommended/mentioned reading

Value Learning sequence

Embedded Agency sequence

Iterated Amplification sequence

AI Alignment Newsletter database

Reframing Superintelligence: Comprehensive AI Services as General Intelligence

Guidelines for AI Containment

Penalizing side effects using stepwise relative reachability

Towards a New Impact Measure

Techniques for optimizing worst-case performance

Cooperative Inverse Reinforcement Learning

Deep reinforcement learning from human preferences

Inverse Reward Design

Clarifying “AI alignment”

Supervising strong learners by amplifying weak experts

AI safety via debate

Factored Cognition

The Building Blocks of Interpretability

Feature Visualization

Good and safe uses of AI Oracles

You can learn more about Rohin’s work here and follow his Alignment Newsletter here.
