Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AXRP Episode 32 - Understanding Agency with Jan Kulveit, published by DanielFilan on May 30, 2024 on The AI Alignment Forum.
YouTube link
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group.
Topics we discuss:
What is active inference?
Preferences in active inference
Action vs perception in active inference
Feedback loops
Active inference vs LLMs
Hierarchical agency
The Alignment of Complex Systems group
Daniel Filan: Hello, everybody. This episode, I'll be speaking with Jan Kulveit. Jan is the co-founder and principal investigator of the Alignment of Complex Systems Research Group, where he works on mathematically understanding complex systems composed of both humans and AIs. Previously, he was a research fellow at the Future of Humanity Institute focused on macrostrategy, AI alignment, and existential risk.
For links to what we're discussing, you can check the description of this episode, and you can read the transcript at axrp.net. Okay. Well, Jan, welcome to the podcast.
Jan Kulveit: Yeah, thanks for the invitation.
What is active inference?
Daniel Filan: I'd like to start off with this paper that you published in December of last year. It was called "Predictive Minds: Large Language Models as Atypical Active Inference Agents." Can you tell me roughly what that paper was about? What's it doing?
Jan Kulveit: The basic idea is: there's active inference as a field originating in neuroscience, started by people like Karl Friston, and it's very ambitious. The active inference folks claim roughly that they have a super general theory of agency in living systems and so on. And there are LLMs, which are not living systems, but they're pretty smart. So we're looking into how close the models actually are.
Also, it was in part motivated by… If you look at, for example, the 'simulators' series or frame by Janus and these people on sites like the Alignment Forum, there's this idea that LLMs are something like simulators - or there is another frame on this, that LLMs are predictive systems.
And I think this terminology… a lot of what's going on there is basically reinventing stuff that was previously described in active inference or predictive processing, which is another term for the idea that minds are broadly trying to predict their sensory inputs.
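[For reference: the standard formulation in the active inference literature - not stated in the episode itself - is that both perception and action work to reduce variational free energy, where $q(s)$ is the agent's beliefs over hidden states and $o$ its observations:

$$
F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
$$

Perception reduces $F$ by updating the beliefs $q(s)$; action reduces (expected) $F$ by changing the observations $o$ themselves. This action-perception coupling is what the discussion below turns on.]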
And it seems like there is a lot of similarity; actually, a lot of what was invented in the alignment community seems to be basically the same concepts, just given different names. So noticing the similarity, the actual question is: to what extent are current LLMs similar, and in what ways are they different? And the main insight of the paper is… the main difference is: current LLMs lack the fast feedback loop between action and perception.
So if I now change the position of my hand, what I see immediately changes. So you can think about [it with] this metaphor, or if you look at how the systems are similar, you could look at base model training of LLMs as some sort of strange edge case of an active inference or predictive processing system, one which is just receiving sensory inputs, where the sensory inputs are tokens, but it's not acting, it's not changing the data.
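[An illustrative sketch, not from the paper: the contrast can be written as two loops, one where a predictor only receives a stream of inputs, and one where its actions feed back into what it observes next. All names here (dummy_model, toy_world) are hypothetical placeholders.

```python
import random

def dummy_model(history):
    """Stand-in for a predictive model: guesses the next observation."""
    return random.choice([0, 1])

def toy_world(action=None):
    """Stand-in for the environment; an action can change what comes next."""
    if action is None:
        return random.choice([0, 1])          # the stream just arrives
    return action ^ random.choice([0, 1])     # observation depends on the action

# 1) Perception only, like base-model training: the model predicts a stream
#    of inputs, but nothing it does changes that stream.
stream = [toy_world() for _ in range(10)]
for t, obs in enumerate(stream):
    prediction = dummy_model(stream[:t])
    # training would update the model on (prediction, obs); there is no action arrow

# 2) Action-perception loop, like active inference: each action feeds back
#    into the next observation, closing the loop that base training lacks.
history = []
for t in range(10):
    action = dummy_model(history)     # act based on current beliefs
    obs = toy_world(action)           # the world responds to the action
    history.append(obs)               # perception immediately reflects the action
```
]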
And then the model is trained, and it maybe changes a bit in instruct fine-tuning, but ultimately, when the model is deployed, we claim that you can think about the interactions of the model with users as actions, because what the model outputs can ultimately change things in the world: people will post it on the internet or take actions based on what the LLM is saying.
So the arrow from the system to the world, changing the world, exists, but th...