Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Principled Satisficing To Avoid Goodhart, published by JenniferRM on August 17, 2024 on LessWrong.
There's an admirable LW post with (currently) zero net upvotes titled Goodhart's Law and Emotions where a relatively new user re-invents concepts related to super-stimuli. In the comments, noggin-scratcher explains in more detail:
The technical meaning is a stimulus that produces a stronger response than the stimulus for which that response originally evolved.
So for example a candy bar having a carefully engineered combination of sugar, fat, salt, and flavour in proportions that make it more appetising than any naturally occurring food. Or outrage-baiting infotainment "news" capturing attention more effectively than anything that one villager could have said to another about important recent events.
In my opinion, there's a danger that arises when applying the dictum to know thyself, where one can do this so successfully that one begins to perceive the logical structure of the parts of oneself that generate subjectively accessible emotional feedback signals.
In the face of this, you have a sort of choice: (1) optimize these signals to get more hedons AT ALL, as a coherent intrinsic good, or (2) something else which is not that.
In general, for myself, when I was younger and possibly more foolish than I am now, I decided that I was going to be explicitly NOT A HEDONIST.
What I meant by this has changed over time, but I haven't given up on it.
In a single paragraph, I might "shoot from the hip" and say that when you are "not a hedonist (and are satisficing in ways that you hope avoid Goodhart)" it doesn't necessarily mean that you throw away joy, it just means that WHEN you "put on your scientist hat", and try to take baby steps, and incrementally modify your quickly-deployable habits, to make them more robust and give you better outcomes, you treat joy as a measurement, rather than a desideratum.
You treat subjective joy like some third-party scientist (who might have a collaborator who filled their spreadsheet with fake data because they want a Nature paper, that they are still defending the accuracy of, at a cocktail party, in an ego-invested way) saying "the thing the joy is about is good for you and you should get more of it".
When I first played around with this approach I found that it worked to think of myself as abstractly-wanting to explore "conscious optimization of all the things" via methods that try to only pay attention to the plausible semantic understandings (inside of the feeling generating submodules?) that could plausibly have existed back when the hedonic apparatus inside my head was being constructed.
(Evolution is pretty dumb, so these semantic understandings were likely to be quite coarse. Cultural evolution is also pretty dumb, and often actively inimical to virtue and freedom and happiness and lovingkindness and wisdom, so those semantic understandings also might be worth some amount of mistrust.)
Then, given a model of a modeling process that built a feeling in my head, I wanted to try to figure out what things in the world that modeling process might have been pointing to, and think about the relatively universal instrumental utility concerns that arise proximate to the things that the hedonic subsystem reacts to. Then maybe just... optimize those things in instrumentally reasonable ways?
This would predictably "leave hedons on the table"!
But it would predictably stay aligned with my hedonic subsystems (at least for a while, at least for small amounts of optimization pressure) in cases where maybe I was going totally off the rails because "my theory of what I should optimize for" had deep and profound flaws.
Like suppose I reasoned (and to be clear, this is silly, and the error is there on purpose):
1. Making glucose feel good is simply a w...