welcome to the nonlinear library, where we use text-to-speech software to convert the best writing from the rationalist and ea communities into audio.
this is: Reality is often underpowered, published by Gregory_Lewis on the effective altruism forum.
Introduction
When I worked as a doctor, we had a lecture by a paediatric haematologist, on a condition called Acute Lymphoblastic Leukaemia. I remember being impressed that very large proportions of patients were being offered trials randomising them between different treatment regimens, currently in clinical equipoise, to establish which had the edge. At the time, one of the areas of interest was, given the disease tended to have a good prognosis, whether one could reduce treatment intensity to reduce the long term side-effects of the treatment whilst not adversely affecting survival.
On a later rotation I worked in adult medicine, and one of the patients admitted to my team had an extremely rare cancer,[1] with a (recognised) incidence of a handful of cases worldwide per year. It happened the world authority on this condition worked as a professor of medicine in London, and she came down to see them. She explained to me that treatment for this disease was almost entirely based on first principles, informed by a smattering of case reports. The disease unfortunately had a bleak prognosis, although she was uncertain whether this was because it was an aggressive cancer to which current medical science has no answer, or whether there was an effective treatment out there if only it could be found.
I aver that many problems EA concerns itself with are closer to the second story than the first. That in many cases, sufficient data is not only absent in practice but impossible to obtain in principle. Reality is often underpowered for us to wring the answers from it we desire.
Big units of analysis, small samples
The main driver of this problem for ‘EA topics’ is that the outcomes of interest have units of analysis for which the whole population (leave alone any sample from it) is small-n: e.g. outcomes at the level of a whole company, or a whole state, or whole populations. For these big unit of analysis/small sample problems, RCTs face formidable in principle challenges:
Even if by magic you could get (e.g.) all countries on earth to agree to randomly allocate themselves to policy X or Y, this is merely a sample size of ~200. If you’re looking at companies relevant to cage-free campaigns, or administrative regions within a given state, this can easily fall another order of magnitude.
These units of analysis tend highly heterogeneous, almost certainly in ways that affect the outcome of interest. Although the key ‘selling point’ of the RCT is it implicitly controls for all confounders (even ones you don’t know about), this statistical control is a (convex) function of sample size, and isn’t hugely impressive at ~ 100 per arm: it is well within the realms of possibility for the randomisation happen to give arms with unbalanced allocation of any given confounding factor.
‘Roughly’ (in expectation) balanced intervention arms are unlikely to be good enough in cases where the intervention is expected to have much less effect on the outcome than other factors (e.g. wealth, education, size, whatever), thus an effect size that favours one arm or the other can be alternatively attributed to one of these.
Supplementing this raw randomisation by explicitly controlling for confounders you suspect (cf. block randomisation, propensity matching, etc.) has limited value when don’t know all the factors which plausibly ‘swamp’ the likely intervention effect (i.e. you don’t have a good predictive model for the outcome but-for the intervention tested). In any case, they tend to trade-off against the already scarce resource of sample size.
These ‘small sample’ problems aren’t peculiar to RCTs, but endemic to all other empirical approaches. The wealth of econometric and quasi-experimental methods (e.g. IVs, ...
view more