Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A primer on why computational predictive toxicology is hard, published by Abhishaike Mahajan on August 19, 2024 on LessWrong.
Introduction
There are now (claimed) foundation models for protein sequences, DNA sequences, RNA sequences, molecules, scRNA-seq, chromatin accessibility, pathology slides, medical images, electronic health records, and clinical free-text. It's a dizzying rate of progress.
But there are a few problems in biology that, interestingly enough, have evaded a similar level of ML progress, despite all the conditions needed to achieve it seemingly being in place.
Toxicology is one of those problems.
This isn't a new insight; it was called out in one of Derek Lowe's posts, where he said: 'There are no existing AI/ML systems that mitigate clinical failure risks due to target choice or toxicology.'
He also repeats it in a more recent post: '…the most badly needed improvements in drug discovery are in the exact areas that are most resistant to AI and machine learning techniques. By which I mean target selection and predictive toxicology.'
Pat Walters also goes into the subject with much more depth, emphasizing how difficult the whole field is.
As someone who isn't familiar at all with the area of predictive toxicology, I immediately found that strange. Why such little progress? It can't be that hard, right? Unlike drug development, where you're trying to precisely hit some key molecular mechanism, assessing toxicity almost feels…brutish in nature. Something that's as clear as day, easy to spot with the naked eye, easier still with a computer trained to look for it.
Of course, some stragglers will leak through this filtering, but they should be few. Obviously a hard problem in its own right, but why isn't it close to being solved?
What's up with this field?
Some background
One may naturally assume that there is a well-established definition of toxicity: some standard blanket rule to delineate things that are toxic from things that aren't. While there are terms such as LD50, LC50, EC50, and IC50 that quantify the degree to which something is toxic, they are an immense oversimplification.
When we say a substance is "toxic," there are usually a lot of follow-up questions. Is it toxic at any dose? Only above a certain threshold? Is it toxic for everyone, or just for certain susceptible individuals (as we'll discuss later)? The relationship between dose and toxicity is not always linear, and can vary depending on the route of exposure, the duration of exposure, and individual susceptibility factors.
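To make the oversimplification concrete, here is a minimal sketch (not from the original article; the data, doses, and library choice of scipy are all hypothetical) of how a number like an LD50 is typically read off a fitted dose-response curve. The single number hides the slope, the ceiling, and everything else about the curve.

```python
# Minimal sketch, under assumed/hypothetical data: fit a four-parameter
# logistic (Hill) dose-response curve and read off the dose at 50% effect.
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, bottom, top, ld50, slope):
    # Four-parameter logistic (Hill) curve: response rises from `bottom`
    # to `top` as dose increases, crossing the midpoint at `ld50`.
    return bottom + (top - bottom) / (1.0 + (ld50 / dose) ** slope)

doses = np.array([1.0, 3.0, 10.0, 30.0, 100.0, 300.0])      # hypothetical doses, mg/kg
affected = np.array([0.02, 0.05, 0.20, 0.55, 0.90, 0.98])   # hypothetical fraction affected

params, _ = curve_fit(hill, doses, affected, p0=[0.0, 1.0, 30.0, 1.0])
bottom, top, ld50, slope = params
print(f"fitted LD50 ~ {ld50:.0f} mg/kg, Hill slope ~ {slope:.2f}")
```

Two substances can have the same fitted LD50 but very different Hill slopes: a steep curve means a narrow margin between "no effect" and "severe effect," a shallow one does not. That is part of why a single potency number says so little on its own about real-world toxicity.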
A dose that causes no adverse effects when consumed orally might be highly toxic if inhaled or injected. And a dose that is well-tolerated with acute exposure might cause serious harm over longer periods of chronic exposure. The very definition of an "adverse effect" resulting from toxicity is not always clear-cut either. Some drug side effects, like mild nausea or headache, might be considered acceptable trade-offs for therapeutic benefit.
But others, like liver failure or birth defects, would be considered unacceptable at any dose. This is particularly true when it comes to environmental chemicals, where the effects may be subtler and the exposure levels more variable. Is a chemical that causes a small decrease in IQ scores toxic? What about one that slightly increases the risk of cancer over a lifetime (20+ years)?
And this is one of the major problems with applying predictive toxicology at all - defining what is and isn't toxic is hard! One may assume the FDA has clear stances on all of this, but even they approach it from a 'vibe-based' perspective. They simply collate the data from in-vitro studies, animal studies, and human clinical trials, and arrive at an approval/no-approval conclusion that is, very often, at odds with some portion of the medical community.