The Nonlinear Library: LessWrong

Education

LW - [LDSL#0] Some epistemological conundrums by tailcalled

2024-08-08



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [LDSL#0] Some epistemological conundrums, published by tailcalled on August 8, 2024 on LessWrong.
This post is also available on my Substack.
When you deal with statistical science, causal inference, measurement, philosophy, rationalism, discourse, and similar topics, certain questions keep popping up, and I think I've discovered a shared answer behind many of the ones I have been thinking about. In this post, I will briefly present the questions; in a followup post, I will try to give my answer to them.
Why are people so insistent about outliers?
A common statistical method is to assume an outcome is due to a mixture of observed factors and unobserved factors, and then model how much of an effect the observed factors have, and attribute all remaining variation to unobserved factors. And then one makes claims about the effects of the observed factors.
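As a rough illustration of the method just described, here is a minimal sketch using simulated data. The variable names, the single observed factor, and the assumed true effect of 2.0 are all hypothetical choices made for the example, not something taken from the post.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: one observed factor x and one unobserved factor u.
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]   # never seen by the analyst
y = [2.0 * xi + ui for xi, ui in zip(x, u)]  # assumed true effect of x is 2.0

# Ordinary least squares with a single predictor.
mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
var_x = sum((xi - mean_x) ** 2 for xi in x)
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

# Whatever the observed factor does not account for ends up in the residual,
# which is what gets attributed to unobserved factors.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
r_squared = 1 - statistics.pvariance(residuals) / statistics.pvariance(y)

# The unit with the largest residual is the kind of outlier people ask about.
worst = max(range(n), key=lambda i: abs(residuals[i]))

print(f"estimated effect of x: {slope:.2f}")
print(f"share of variance explained: {r_squared:.2f}")
print(f"largest residual: {residuals[worst]:.2f} (unit {worst})")
```

The model makes a claim about the slope and says nothing about why any particular unit has a large residual.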
But some people then pick an outlier and demand an explanation for that outlier, rather than just accepting the general statistical finding.
In fact, aren't outliers almost by definition anti-informative? No model is perfect, so there are always going to be cases we can't model. By insisting on explaining all those rare cases, we're basically throwing away the signal we can model.
A similar point applies to reading the news. Almost by definition, the news is about uncommon stuff like terrorist attacks, rather than common stuff like heart disease. Doesn't reading such things invert your perception, such that you end up focusing on exactly the least relevant things?
Why isn't factor analysis considered the main research tool?
Typically, if you have a ton of variables, you can perform a factor analysis, which identifies a small set of factors that explain a huge chunk of the variation across those variables. If you are used to performing factor analysis, this feels like a great way to get an overview of the subject matter. After all, what could be better than knowing the main dimensions of variation?
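To make "a factor that explains a huge chunk of variation" concrete, here is a small sketch on simulated data. It assumes five hypothetical variables driven by one latent factor, and it extracts the first principal component of the correlation matrix by power iteration. That is a simpler relative of factor analysis, used here only as a stand-in, not a full factor-analytic fit.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data: 5 observed variables that all load on one latent factor.
n = 2000
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]  # made-up loadings for illustration
rows = []
for _ in range(n):
    f = random.gauss(0, 1)  # latent factor score for this observation
    rows.append([l * f + math.sqrt(1 - l * l) * random.gauss(0, 1)
                 for l in loadings])

def correlation_matrix(data):
    """Pearson correlations between the columns of a list of rows."""
    cols = list(zip(*data))
    m, k = len(data), len(cols)
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[sum((a - means[i]) * (b - means[j])
                 for a, b in zip(cols[i], cols[j])) / (m * sds[i] * sds[j])
             for j in range(k)] for i in range(k)]

def leading_component(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(vi * mvi for vi, mvi in zip(v, mv))
    return eigenvalue, v

corr = correlation_matrix(rows)
eigenvalue, weights = leading_component(corr)

# A correlation matrix has trace equal to the number of variables,
# so eigenvalue / k is the share of total variance on this component.
share = eigenvalue / len(corr)
print(f"share of variance on the first component: {share:.2f}")
print("component weights:", [round(w, 2) for w in weights])
```

In practice one would likely reach for a dedicated factor analysis routine from a statistics package. The point of the sketch is only that a single dimension can summarize much of the shared variation.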
Yet a lot of people think of factor analysis as being superficial and uninformative. Often people insist that it only yields aggregates rather than causes, and while that might seem plausible at first, once you dig into it enough, you will see that usually the factors identified are actually causal, so that can't be the real problem.
A related question is why people tend to talk in funky discrete ways when careful quantitative analysis generally finds everything to be continuous. Why do people want clusters more than they want factors? Especially since cluster models tend to be more fiddly and less robust.
Why do people want "the" cause?
There's a big gap between how people intuitively view causal inference (often searching for "the" cause of something), versus how statistics views causal inference. The main frameworks for causal inference in statistics are Rubin's Potential Outcomes framework and Pearl's DAG approach, and both of these view causality as a function from inputs to outputs.
In these frameworks, causality is about functional input/output relationships, and there are many different notions of causal effects, not simply one canonical "cause" of something.
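A toy sketch may help show what "a function from inputs to outputs" with "many different notions of causal effects" looks like. The `Unit` class, its fields, and all the numbers below are hypothetical. Real data would reveal only one of the two potential outcomes per unit, which this sketch ignores by construction.

```python
from dataclasses import dataclass
from statistics import fmean

@dataclass(frozen=True)
class Unit:
    name: str
    baseline: float      # outcome without treatment (made-up numbers)
    sensitivity: float   # how strongly this unit responds to treatment

def outcome(unit: Unit, treated: bool) -> float:
    """Potential outcome: a function from (unit, treatment) to a result."""
    return unit.baseline + (unit.sensitivity if treated else 0.0)

units = [Unit("a", 10.0, 0.0), Unit("b", 12.0, 1.0), Unit("c", 8.0, 5.0)]

# Individual treatment effects: Y(1) - Y(0) for each unit.
individual_effects = {
    u.name: outcome(u, True) - outcome(u, False) for u in units
}

# Average treatment effect over the whole population.
ate = fmean(individual_effects.values())

# Conditional average effect within a subgroup (baseline of at least 10).
subgroup = [u for u in units if u.baseline >= 10.0]
cate = fmean(outcome(u, True) - outcome(u, False) for u in subgroup)

print(individual_effects, ate, cate)
```

The same outcome function yields three different answers to "what is the effect?", depending on which contrast and which population one asks about.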
Why are people dissatisfied with GWAS?
In genome-wide association studies (GWAS), researchers use statistics to identify alleles that are associated with specific outcomes of interest (e.g. health, psychological characteristics, SES outcomes). They've been making consistent progress over time, finding tons of different genetic associations and gradually becoming able to explain more and more variance between people.
Yet GWAS is heavily criticized as "not causal". While there are certain biases that can occur, those biases are usually found to be much smaller than these critiques would suggest. So what gives?
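The association step at the core of this section can be sketched on simulated data as a separate simple regression of the trait on each variant's allele count. Everything here is a hypothetical setup: three variants, an allele frequency of 0.3, and an effect only at the first variant. Real analyses also handle things like population structure and multiple testing, which are left out.

```python
import random
import statistics

random.seed(2)

# Hypothetical setup: 3 variants, of which only variant 0 affects the trait.
n_people = 2000
true_effects = [0.5, 0.0, 0.0]  # assumed effect per allele copy (made up)
allele_freq = 0.3               # assumed frequency, same for every variant

def allele_count():
    """Number of copies (0, 1, or 2) drawn from two independent alleles."""
    return (random.random() < allele_freq) + (random.random() < allele_freq)

genotypes = [[allele_count() for _ in true_effects] for _ in range(n_people)]
trait = [sum(e * g for e, g in zip(true_effects, person)) + random.gauss(0, 1)
         for person in genotypes]

def associate(counts, values):
    """Simple regression of trait on allele count: (slope, r squared)."""
    mean_c, mean_v = statistics.fmean(counts), statistics.fmean(values)
    cov = sum((c - mean_c) * (v - mean_v) for c, v in zip(counts, values))
    var_c = sum((c - mean_c) ** 2 for c in counts)
    var_v = sum((v - mean_v) ** 2 for v in values)
    return cov / var_c, cov * cov / (var_c * var_v)

# Test each variant separately, as in a per-variant association scan.
results = [associate([person[j] for person in genotypes], trait)
           for j in range(len(true_effects))]

for j, (slope, r2) in enumerate(results):
    print(f"variant {j}: effect estimate {slope:+.3f}, variance explained {r2:.4f}")
```

Each variant gets its own effect estimate and its own share of variance explained, which is the sense in which such studies accumulate "more and more variance" as more associations are found.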
What value does qualitative r...