Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Spatial attention as a "tell" for empathetic simulation?, published by Steven Byrnes on April 26, 2024 on LessWrong.
(Half-baked work-in-progress. There might be a "version 2" of this post at some point, with fewer mistakes, and more neuroscience details, and nice illustrations and pedagogy etc. But it's fun to chat and see if anyone has thoughts.)
1. Background
There's a neuroscience problem that's had me stumped since almost the very beginning of when I became interested in neuroscience at all (as a lens into AGI safety) back in 2019. But I think I might finally have "a foot in the door" towards a solution!
What is this problem? As described in my post Symbol Grounding and Human Social Instincts, I believe the following:
(1) We can divide the brain into a "Learning Subsystem" (cortex, striatum, amygdala, cerebellum and a few other areas) on the one hand, and a "Steering Subsystem" (mostly hypothalamus and brainstem) on the other hand; and a human's "innate drives" (roughly equivalent to the reward function in reinforcement learning) are calculated by a bunch of specific, genetically-specified "business logic" housed in the latter subsystem;
(2) Some of those "innate drives" are related to human social instincts - a suite of reactions that are upstream of things like envy and compassion;
(3) It might be helpful for AGI safety (for reasons briefly summarized here) if we understood exactly how those particular drives worked. Ideally this would look like legible pseudocode that's simultaneously compatible with behavioral observations (including everyday experience), with evolutionary considerations, and with a neuroscience-based story of how that pseudocode is actually implemented by neurons in the brain. (Different example of what I think it looks like to make progress towards that kind of pseudocode.)
(4) Explaining how those innate drives work is tricky in part because of the "symbol grounding problem", but it probably centrally involves "transient empathetic simulations" (see §13.5 of the post linked at the top);
(5) …and therefore there needs to be some mechanism in the brain by which the "Steering Subsystem" (hypothalamus & brainstem) can tell whether the "Learning Subsystem" (cortex etc.) world-model is being queried for the purpose of a "transient empathetic simulation", or whether that same world-model is instead being queried for some other purpose, like recalling a memory, considering a possible plan, or perceiving what's happening right now.
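To make the architecture in claim (1) concrete, here is a minimal toy sketch of the two-subsystem picture. Everything here is hypothetical illustration, not actual neuroscience or the post's own formalism: the signal names, the class, and the weights are all made up, standing in for genetically-specified "business logic" that maps Learning-Subsystem outputs to something like an RL reward.

```python
from dataclasses import dataclass

# Toy model of claim (1): a Steering Subsystem with hard-coded
# "business logic" computing reward from signals produced by the
# Learning Subsystem. All names and numbers are hypothetical.

@dataclass
class GroundedSignals:
    """Legible-to-the-brainstem outputs of the Learning Subsystem."""
    anger: float = 0.0  # anger-related signal strength (0..1)
    fear: float = 0.0   # fear-related signal strength (0..1)

def steering_subsystem_reward(signals: GroundedSignals) -> float:
    """Genetically-specified mapping from grounded signals to reward,
    standing in for 'innate drives' (cf. an RL reward function).
    The weights below are entirely made up, just to show the shape."""
    return -1.0 * signals.fear - 0.5 * signals.anger

print(steering_subsystem_reward(GroundedSignals(anger=0.8)))
```

The point of the sketch is only the division of labor: the Learning Subsystem produces signals, and a fixed, innate function over those signals lives downstream of it.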
As an example of (5), if Zoe is yelling at me, then when I look at Zoe, a thought might flash across my mind, for a fraction of a second, wherein I mentally simulate Zoe's angry feelings. Alternatively, I might imagine myself potentially feeling angry in the future. Both of those possible thoughts involve my cortex sending a weak but legible-to-the-brainstem ("grounded") anger-related signal to the hypothalamus and brainstem (mainly via the amygdala) (I claim).
But the hypothalamus and brainstem have presumably evolved to trigger different reactions in those two cases, because the former, but not the latter, calls for a specific social reaction to Zoe's anger. For example, in the former case, maybe Zoe's anger would trigger in me a reaction to feel anger back at Zoe in turn, although not necessarily, because there are other inputs to the calculation as well.
So I think there has to be some mechanism by which the hypothalamus and/or brainstem can figure out whether or not a (transient) empathetic simulation was upstream of those anger-related signals. And I don't know what that mechanism is.
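The puzzle can be restated as a tiny toy model. In the sketch below (hypothetical function and values, not anything from the post), two different world-model queries produce the same grounded anger signal, so the downstream Steering Subsystem cannot distinguish them from that signal alone:

```python
# Toy illustration of the puzzle in (5): an empathetic simulation of
# Zoe's anger and an imagining of my own future anger both emit the
# same grounded signal. The query's *purpose* never reaches the
# hypothalamus/brainstem. All names and numbers are hypothetical.

def cortex_output(query: str) -> dict:
    """World-model query -> grounded signals sent (mainly via the
    amygdala) to the hypothalamus and brainstem."""
    if query == "simulate Zoe's anger":            # transient empathetic simulation
        return {"anger": 0.3}
    elif query == "imagine my own future anger":   # self-directed imagination
        return {"anger": 0.3}
    return {"anger": 0.0}                          # e.g. recalling a memory

sig_empathy = cortex_output("simulate Zoe's anger")
sig_self = cortex_output("imagine my own future anger")
assert sig_empathy == sig_self  # indistinguishable downstream -- the problem!
```

The question in the text is then: what extra channel or correlate (the "tell" of the title) lets the Steering Subsystem break this tie?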
I came into those five beliefs above rather quickly - the first time I mentioned that I was confused about how (5) works, it was way back in my second-ever neuroscience blog post, maybe within the first 50 hours of my trying to teach m...