Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI for Bio: State Of The Field, published by sarahconstantin on August 31, 2024 on LessWrong.
AI for biotech, particularly with drug discovery applications, has been used for more than a decade, with ambiguous success. But in the era of foundation models we may have experienced a step change in what's possible.
I worked on AI for drug discovery years ago, at Recursion, where we sought to identify phenotypes of genetic diseases visible in microscopic images of cells, and to screen for drugs that made the cells visually "look healthy", in the hope that those drugs would also turn out to be effective against the symptoms of the disease.
Circa 2016, we were just beginning to transition from the old-fashioned sort of machine learning based heavily on feature engineering, to the new "deep learning" paradigm with much larger neural nets. "Old-school" machine learning was often accused of being nothing more than logistic regression in fancy VC-funded branding, and there was often some truth to that.
When our models worked best, they were picking up human-interpretable phenotypes that a pathologist could probably have described decades ago: something like "this disease causes enlarged nuclei". And, when we first started replacing the old models with deep neural nets, it wasn't clear that the New Hotness was going to work better than the Old Standby.
But things have changed. Bigger, better models (often Transformer-based) are everywhere in biotech. They genuinely seem to be changing the state of the art in drug (and biologic) development. And it's past time to do a serious review of what's become available and what it can and can't do.
AI optimists who aren't familiar with biotech are often wildly miscalibrated about what AI tools can do, even in the best-case scenario.
The average approved drug in the US costs $879.3 million[1] in R&D expenses (counting the costs of failed drugs), and nearly 90% of that is spent on clinical trials. It's legally, scientifically, and ethically necessary to test drugs on humans to see if they're safe and effective. And while the ballooning cost of running clinical trials is a problem worth tackling in itself[2], it's inherently time- and labor-intensive to run valid experiments on human patients.
An AI is never going to "design a drug" that you can give to patients right away. Even if the AI were a perfect all-knowing oracle, pharmaceutical companies would still need to run animal and then human trials.
AI for biotech is attempting to automate and improve particular sub-problems within the roughly 10% of costs spent on drug discovery and preclinical research. This is hardly trivial, especially if it enables the development of new classes of drugs that were completely inaccessible before. But it does place AI hype in context.
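To put a rough number on that share, here is a back-of-the-envelope sketch using the figures quoted above. It treats "nearly 90%" as exactly 90%, which is an assumption: the true clinical share is somewhat lower, so the discovery and preclinical budget is somewhat higher than this estimate. The function and variable names are illustrative only.

```python
# Figures quoted in the text. The 0.90 share rounds "nearly 90%" (an assumption).
TOTAL_RD_COST_MUSD = 879.3    # average R&D cost per approved drug, in $ millions
CLINICAL_TRIAL_SHARE = 0.90   # approximate fraction spent on clinical trials


def split_rd_cost(total_musd, clinical_share):
    """Return (clinical, non_clinical) R&D cost in millions of USD."""
    clinical = total_musd * clinical_share
    return clinical, total_musd - clinical


clinical_musd, preclinical_musd = split_rd_cost(
    TOTAL_RD_COST_MUSD, CLINICAL_TRIAL_SHARE
)
print(f"clinical trials: ~${clinical_musd:.0f}M")
print(f"discovery + preclinical: ~${preclinical_musd:.0f}M")
```

Under that rounding, the slice that AI tools are competing to improve is on the order of $88 million per approved drug, against roughly $791 million spent on trials.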
An AI model's value to the drug discovery process is bounded by:
the labor cost of the time it saves on more manual processes
the cost it saves on any experiments it can fully replace
the cost of any failed experiments it can prevent from being done altogether
the value of any new successful therapies that would not even have been attempted without the model
If the model tells you to do something you would probably have done anyway, it's useless. If the model replaces something you would have needed to do manually, it's somewhat useful. If the model increases your odds of a successful therapy, it's extremely useful, and if it makes possible successful therapies that would not otherwise have been attempted, it's world-changing.
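One way to make this accounting concrete is the minimal sketch below. It assumes the four items simply add up to an upper bound, which is an interpretation rather than something stated above. The class name, field names, and all numbers are hypothetical placeholders in arbitrary units, not real estimates.

```python
from dataclasses import dataclass


@dataclass
class ModelValueBound:
    """Upper bound on a model's value, all fields in the same (arbitrary) unit.

    The four fields are hypothetical labels for the four items in the text.
    """
    labor_saved: float = 0.0
    experiments_replaced: float = 0.0
    failed_experiments_avoided: float = 0.0
    new_therapies_enabled: float = 0.0

    def total(self) -> float:
        # Assumption: the bound is the plain sum of the four components.
        return (
            self.labor_saved
            + self.experiments_replaced
            + self.failed_experiments_avoided
            + self.new_therapies_enabled
        )


# Invented placeholder values, only to contrast the cases described above.
redundant = ModelValueBound()                  # suggests what you'd do anyway
labor_saver = ModelValueBound(labor_saved=2.0)
enabler = ModelValueBound(labor_saved=2.0, new_therapies_enabled=500.0)

for name, bound in [("redundant", redundant),
                    ("labor saver", labor_saver),
                    ("enabler", enabler)]:
    print(f"{name}: bound = {bound.total()}")
```

The point of the sketch is only the ordering: a model that changes no decisions contributes nothing, and the last term can dwarf the others when it is nonzero.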
With that paradigm set up, let's dig into the details.
This won't be an exhaustive list of models, or an in-depth evaluation of their performance, but an overview of the big, influential, and buzzy ones and a summary of what they do.
Structure Prediction Models
One class of AI models with biotech applications tackles one of the most classically fiendish problems in c...