Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #63: Introducing Alpha Fold 3, published by Zvi on May 10, 2024 on LessWrong.
It was a remarkably quiet announcement. We now have Alpha Fold 3, which does a much improved job of predicting all of life's molecules and their interactions. It feels like everyone including me then shrugged and went back to thinking about other things. No cool new toy for most of us to personally play with, no existential risk impact, no big trades to make, ho hum.
But yes, when we look back at this week, I expect what we remember will be Alpha Fold 3.
Unless it turns out that it is Sophon, a Chinese technique to potentially make it harder to fine tune an open model in ways the developer wants to prevent. I do not expect this to get the job done that needs doing, but it is an intriguing proposal.
We also have 95 theses to evaluate in a distinct post, OpenAI sharing the first draft of their model spec, Apple making a world class anti-AI and anti-iPad ad that they released thinking it was a pro-iPad ad, more fun with the mysterious gpt2, and more.
The model spec from OpenAI seems worth pondering in detail, so I am going to deal with that on its own some time in the coming week.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. Agents, simple and complex.
4. Language Models Don't Offer Mundane Utility. No gadgets, no NPCs.
5. GPT-2 Soon to Tell. Does your current model suck? In some senses.
6. Fun With Image Generation. Why pick the LoRa yourself?
7. Deepfaketown and Botpocalypse Soon. It's not exactly going great.
8. Automation Illustrated. A look inside perhaps the premiere slop mill.
9. They Took Our Jobs. Or are we pretending this to help the stock price?
10. Apple of Technically Not AI. Mistakes were made. All the feels.
11. Get Involved. Dan Hendrycks has a safety textbook and free online course.
12. Introducing. Alpha Fold 3. Seems like a big deal.
13. In Other AI News. IBM, Meta and Microsoft in the model game.
14. Quiet Speculations. Can we all agree that a lot of intelligence matters a lot?
15. The Quest for Sane Regulation. Major labs fail to honor their commitments.
16. The Week in Audio. Jack Clark on Politico Tech.
17. Rhetorical Innovation. The good things in life are good.
18. Open Weights are Unsafe and Nothing Can Fix This. Unless, maybe? Hmm.
19. The Lighter Side. Mmm, garlic bread. It's been too long.
Language Models Offer Mundane Utility
How much utility for how much cost? Kapoor and Narayanan argue that with the rise of agent-based systems, you have to evaluate models on coding tasks by dollar cost versus quality of results. They find that a simple strategy of 'ask GPT-4, and on failure retry while slowly turning the temperature up' matches the agents they tested on HumanEval while costing less. They note that this might be different for harder and more complex tasks.
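To make that baseline concrete, here is a minimal sketch of the retry strategy, assuming the OpenAI Python client. The temperature schedule, the model name, and the run_tests callback are illustrative placeholders, not Kapoor and Narayanan's exact setup.

```python
# Sketch of the baseline: query GPT-4, and if the generated code fails its
# tests, retry with a slightly higher temperature each time.
from openai import OpenAI

client = OpenAI()

def solve_with_retries(prompt: str, run_tests, max_attempts: int = 5) -> str | None:
    for attempt in range(max_attempts):
        # Start near-deterministic, then add more randomness on each retry.
        temperature = min(0.2 * attempt, 1.0)
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        code = response.choices[0].message.content
        if run_tests(code):  # caller supplies the pass/fail check
            return code
    return None  # every attempt failed
```

The point is that this loop has no planning, tool use, or scaffolding at all, yet it lands in the same quality range as the agents they tested while spending fewer tokens per solved task.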
How much does cost matter? If you are using such queries at scale without humans in the loop, or doing them in the background on a constant basis as part of your process, then cost potentially matters quite a bit. That is indeed the point of agents. Or if you are serving lots of customers constantly for lots of queries, those costs can add up fast. Thus all the talk about the most cost-efficient approach.
There are also other purposes for which cost at current margins is effectively zero.
If you are a programmer who must evaluate, use and maintain the code outputted by the AI, what percentage of total costs (including your labor costs) are AI inference? In the most obvious baseline case, something akin to 'a programmer asks for help on tasks,' query speed potentially matters, but being slightly better at producing good code, or even slightly better at producing code that is easier for the human to evaluate, understand and learn from, is going to crush...