Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [LDSL#0] Some epistemological conundrums, published by tailcalled on August 8, 2024 on LessWrong.
This post is also available on my Substack.
When you deal with statistical science, causal inference, measurement, philosophy, rationalism, discourse, and similar topics, a number of different questions pop up, and I think I've discovered a shared answer behind many of the questions I have been thinking about. In this post, I will briefly present the questions, and then in a followup post I will try to give my answer to them.
Why are people so insistent about outliers?
A common statistical method is to assume an outcome is due to a mixture of observed factors and unobserved factors, and then model how much of an effect the observed factors have, and attribute all remaining variation to unobserved factors. And then one makes claims about the effects of the observed factors.
But some people then pick an outlier and demand an explanation for that outlier, rather than just accepting the general statistical finding.
In fact, aren't outliers almost by definition anti-informative? No model is perfect, so there are always going to be cases we can't model. By insisting on explaining all those rare cases, we're basically throwing away the signal we can model.
A similar point applies to reading the news. Almost by definition, the news is about uncommon stuff like terrorist attacks, rather than common stuff like heart disease. Doesn't reading such things invert your perception, such that you end up focusing on exactly the least relevant things?
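To make the statistical method at the start of this section concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: a single observed factor `x`, a true slope of 2, and standard normal noise standing in for the unobserved factors. It fits a line, attributes the rest of the variation to the residuals, and then picks out the worst-fitting case, which is the kind of outlier people ask about.

```python
import random

random.seed(0)

# Hypothetical data: one observed factor x, plus unobserved noise.
TRUE_SLOPE = 2.0  # assumed effect of the observed factor
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
y = [TRUE_SLOPE * xi + random.gauss(0, 1) for xi in x]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = fit_line(x, y)

# Whatever the observed factor does not explain is attributed to unobserved factors.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Share of variance attributed to the observed factor (R^2).
y_mean = mean(y)
ss_total = sum((yi - y_mean) ** 2 for yi in y)
ss_resid = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_resid / ss_total

# The "outlier" is the case the fitted model explains worst.
outlier_index = max(range(n), key=lambda i: abs(residuals[i]))

print(f"estimated slope: {slope:.2f}, R^2: {r_squared:.2f}")
print(f"largest residual: {residuals[outlier_index]:.2f} (case {outlier_index})")
```

The estimated slope is a statement about the population as a whole. The model says nothing about why the worst-fitting case landed where it did, beyond "unobserved factors".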
Why isn't factor analysis considered the main research tool?
Typically, if you have a ton of variables, you can perform a factor analysis which identifies a small set of factors that explain a huge chunk of the variation across those variables. If you are used to performing factor analysis, this feels like a great way to get an overview of the subject matter. After all, what could be better than knowing the main dimensions of variation?
Yet a lot of people think of factor analysis as being superficial and uninformative. Often people insist that it only yields aggregates rather than causes, and while that might seem plausible at first, once you dig into it enough, you will see that usually the factors identified are actually causal, so that can't be the real problem.
A related question is why people tend to talk in funky discrete ways when careful quantitative analysis generally finds everything to be continuous. Why do people want clusters more than they want factors? Especially since cluster models tend to be more fiddly and less robust.
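As a rough illustration of "main dimensions of variation", the sketch below simulates five variables that all load on one latent factor and then extracts the leading eigenvector of their correlation matrix by power iteration. Note the substitution: this is a first principal component, used here as a simple stand-in for a fitted factor model, not a full factor analysis. The loadings, sample size, and function names are all hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical data: 5 observed variables that all load on one latent factor.
LOADINGS = [0.9, 0.8, 0.7, 0.6, 0.5]  # assumed, for illustration only
n = 2000
rows = []
for _ in range(n):
    g = random.gauss(0, 1)  # latent factor score for this case
    rows.append([l * g + math.sqrt(1 - l * l) * random.gauss(0, 1) for l in LOADINGS])


def correlation_matrix(rows):
    """Pearson correlations between the columns of a list of rows."""
    k = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    z = [[(v - m) / s for v in c] for c, m, s in zip(cols, means, sds)]
    return [
        [sum(a * b for a, b in zip(z[i], z[j])) / len(rows) for j in range(k)]
        for i in range(k)
    ]


def leading_eigenpair(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, v


corr = correlation_matrix(rows)
eigenvalue, direction = leading_eigenpair(corr)

# The trace of a correlation matrix equals the number of variables,
# so eigenvalue / k is the share of total variance on this one dimension.
share_explained = eigenvalue / len(LOADINGS)
component_loadings = [math.sqrt(eigenvalue) * d for d in direction]

print(f"share of variance on the first dimension: {share_explained:.2f}")
print("loadings:", [round(l, 2) for l in component_loadings])
```

One dimension ends up carrying more than half of the variation across five variables, which is the sort of overview the paragraph above has in mind.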
Why do people want "the" cause?
There's a big gap between how people intuitively view causal inference (often searching for "the" cause of something) and how statistics views causal inference. The main frameworks for causal inference in statistics are Rubin's Potential Outcomes framework and Pearl's DAG approach, and both of these view causality as a function from inputs to outputs.
In these frameworks, causality is about functional input/output relationships, and there are many different notions of causal effects, not simply one canonical "cause" of something.
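A small sketch in the potential-outcomes style may help show why these frameworks yield many causal effects rather than one. The structural function `outcome`, its coefficients, and the background trait `u` are all invented for illustration. Each unit has two potential outcomes, one per treatment value, and the effect of treatment is the difference between them.

```python
import random

random.seed(0)


# Hypothetical structural function: the outcome depends on treatment t,
# a background trait u, and their interaction. All numbers are assumptions.
def outcome(t, u):
    return 1.0 + 0.5 * t + 2.0 * u + 1.5 * t * u


units = [random.random() for _ in range(10000)]  # background trait, uniform on [0, 1)

# Each unit has two potential outcomes; its individual effect is their difference.
individual_effects = [outcome(1, u) - outcome(0, u) for u in units]

# Average treatment effect across the whole population.
average_effect = sum(individual_effects) / len(individual_effects)

# Conditional average effects for two subgroups of the same population.
low = [e for e, u in zip(individual_effects, units) if u < 0.5]
high = [e for e, u in zip(individual_effects, units) if u >= 0.5]
effect_low = sum(low) / len(low)
effect_high = sum(high) / len(high)

print(f"average effect:          {average_effect:.2f}")
print(f"effect when u < 0.5:     {effect_low:.2f}")
print(f"effect when u >= 0.5:    {effect_high:.2f}")
```

The same function gives a different answer to "what is the effect of t?" depending on whether you ask about an individual, a subgroup, or the population. None of these numbers is "the" cause of any particular outcome.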
Why are people dissatisfied with GWAS?
In genome-wide association studies (GWAS), researchers use statistics to identify alleles that are associated with specific outcomes of interest (e.g. health, psychological characteristics, SES outcomes). They've been making consistent progress over time, finding tons of different genetic associations and gradually becoming able to explain more and more variance between people.
Yet GWAS is heavily criticized as "not causal". While there are certain biases that can occur, those biases are usually found to be much smaller than seems justified by these critiques. So what gives?
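For readers unfamiliar with the basic procedure, the association step can be sketched as one marginal test per variant. The simulation below is a toy under stated assumptions: 20 independent variants, two of which are given hypothetical effects on the trait, with no population structure, no linkage between variants, and no multiple-testing correction. Real pipelines handle all of these.

```python
import random

random.seed(0)

N_PEOPLE = 2000
N_VARIANTS = 20
# Hypothetical per-allele effects; every other variant has no effect.
TRUE_EFFECTS = {3: 0.5, 11: 0.4}


def allele_count():
    # Two independent copies, each carrying the allele with probability 0.5.
    return int(random.random() < 0.5) + int(random.random() < 0.5)


genotypes = [[allele_count() for _ in range(N_VARIANTS)] for _ in range(N_PEOPLE)]
trait = [
    sum(effect * person[j] for j, effect in TRUE_EFFECTS.items()) + random.gauss(0, 1)
    for person in genotypes
]


def squared_correlation(x, y):
    """Share of variance in y explained by a simple linear fit on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


# One marginal association per variant: trait variance explained by that variant alone.
variance_explained = [
    squared_correlation([person[j] for person in genotypes], trait)
    for j in range(N_VARIANTS)
]

top_hits = sorted(
    range(N_VARIANTS), key=lambda j: variance_explained[j], reverse=True
)[:2]

print("top variants by variance explained:", sorted(top_hits))
```

In this toy setup the associations are causal by construction. The "not causal" critique concerns the ways real data departs from that setup.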
What value does qualitative r...