
AIAP: Synthesizing a human's preferences into a utility function with Stuart Armstrong



Show Notes

In his Research Agenda v0.9: Synthesizing a human's preferences into a utility function, Stuart Armstrong develops an approach for generating friendly artificial intelligence. His alignment proposal can broadly be understood as a kind of inverse reinforcement learning where most of the task of inferring human preferences is left to the AI itself. It's up to us to build the correct assumptions, definitions, preference learning methodology, and synthesis process into the AI system such that it will be able to meaningfully learn human preferences and synthesize them into an adequate utility function. In order to get this all right, his agenda looks at how to understand and identify human partial preferences, how to ultimately synthesize these learned preferences into an "adequate" utility function, the practicalities of developing and estimating the human utility function, and how this agenda can assist in other methods of AI alignment.
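As a rough intuition pump for the "partial preferences → utility function" shape of the agenda (and nothing more), the toy Python sketch below aggregates a handful of narrow pairwise preferences into a single utility table. The `PartialPreference` class, the `synthesize_utility` function, and the simple weighted-sum aggregation are illustrative assumptions for this example only; they are not Armstrong's formalism or his actual synthesis process.

```python
# Toy illustration only: a drastically simplified sketch of turning narrow,
# pairwise "partial preferences" into one utility function. All names and the
# weighted-sum aggregation are hypothetical choices for this example.
from dataclasses import dataclass


@dataclass
class PartialPreference:
    better: str    # outcome judged better within a narrow comparison
    worse: str     # outcome judged worse within that same comparison
    weight: float  # how strongly the preference is held


def synthesize_utility(preferences):
    """Aggregate narrow pairwise preferences into a single utility table."""
    utility = {}
    for p in preferences:
        utility[p.better] = utility.get(p.better, 0.0) + p.weight
        utility[p.worse] = utility.get(p.worse, 0.0) - p.weight
    return utility


if __name__ == "__main__":
    prefs = [
        PartialPreference(better="help someone", worse="ignore them", weight=1.0),
        PartialPreference(better="tell the truth", worse="lie", weight=0.5),
        PartialPreference(better="help someone", worse="lie", weight=0.2),
    ]
    print(synthesize_utility(prefs))
```

Running the script prints a small dictionary of outcome scores; the real agenda instead works with partial preferences grounded in a person's internal mental models and a far more careful extension, normalization, and synthesis step, as discussed in the episode.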




Important timestamps:

0:00 Introductions

3:24 A story of evolution (inspiring just-so story)

6:30 How does your “inspiring just-so story” help to inform this research agenda?

8:53 The two core parts to the research agenda

10:00 How this research agenda is contextualized in the AI alignment landscape

12:45 The fundamental ideas behind the research project

15:10 What are partial preferences?

17:50 Why reflexive self-consistency isn’t enough

20:05 How are humans contradictory and how does this affect the difficulty of the agenda?

25:30 Why human values being underdefined presents the greatest challenge

33:55 Expanding on the synthesis process

35:20 How to extract the partial preferences of the person

36:50 Why a utility function?

41:45 Are there alternative goal-ordering or action-producing methods for agents other than utility functions?

44:40 Extending and normalizing partial preferences and covering the rest of section 2

50:00 Moving into section 3, synthesizing the utility function in practice

52:00 Why this research agenda is helpful for other alignment methodologies

55:50 Limits of the agenda and other problems

58:40 Synthesizing a species-wide utility function

1:01:20 Concerns over the alignment methodology containing leaky abstractions

1:06:10 Reflective equilibrium and the agenda not being a philosophical ideal

1:08:10 Can we check the result of the synthesis process?

1:09:55 How did the Mahatma Armstrong idealization process fail?

1:14:40 Any clarifications for the AI alignment community?

Works referenced:

Research Agenda v0.9: Synthesising a human's preferences into a utility function

Some Comments on Stuart Armstrong's "Research Agenda v0.9"

Mahatma Armstrong: CEVed to death

The Bitter Lesson

We hope that you will continue to join in the conversations by following us or subscribing to our podcasts on YouTube, Spotify, SoundCloud, iTunes, Google Play, Stitcher, iHeartRadio, or your preferred podcast site/application. You can find all the AI Alignment Podcasts here.

