Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Devin, published by Zvi on March 18, 2024 on LessWrong.
Introducing Devin
Is the era of AI agents writing complex code systems without humans in the loop upon us?
Cognition is calling Devin 'the first AI software engineer.'
Here is a two minute demo of Devin benchmarking LLM performance.
Devin has its own web browser, which it uses to pull up documentation.
Devin has its own code editor.
Devin has its own command line.
Devin uses debugging print statements and uses the log to fix bugs.
Devin builds and deploys entire stylized websites without even being directly asked.
What could possibly go wrong? Install this on your computer today.
Padme.
The Real Deal
I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred's statement here is that this rule is not new:
Austen Allred: New rule:
If someone only shows their AI model in tightly controlled demo environments we all assume it's fake and doesn't work well yet
But in this case Patrick Collison is a credible source and he says otherwise.
Patrick Collison: These aren't just cherrypicked demos. Devin is, in my experience, very impressive in practice.
Here we have Mckay Wrigley using it for half an hour. This does not feel like a cherry-picked example, although of course some amount of selection is there, if only via the publication effect.
He is very much a maximum acceleration guy, for whom everything is always great and the future is always bright, so calibrate for that, but still yes this seems like evidence Devin is for real.
This article in Bloomberg from Ashlee Vance has further evidence. It is clear that Devin is a quantum leap over known past efforts in terms of its ability to execute complex multi-step tasks, to adapt on the fly, and to fix its mistakes or be adjusted and keep going.
For once, when we wonder 'how did they do that, what was the big breakthrough that made this work' the Cognition AI people are doing not only the safe but also the smart thing and they are not talking.
They do have at least one serious rival, as Magic.ai has raised $100 million from the venture team of Daniel Gross and Nat Friedman to build 'a superhuman software engineer,' including training their own model. The article seems strangely interested in whether AI is 'a bubble' as opposed to this amazing new technology.
This is one of those 'helps until it doesn't' situations in terms of jobs:
vanosh: Seeing this is kinda scary. Like there is no way companies won't go for this instead of humans.
Should I really have studied HR?
Mckay Wrigley: Learn to code! It makes using Devin even more useful.
Devin makes coding more valuable, until we hit so many coders that we are coding everything we need to be coding, or the AI no longer needs a coder in order to code. That is going to be a ways off. And once it happens, if you are not a coder, it is reasonable to ask yourself: What are you even doing? Plumbing while hoping for the best will probably not be a great strategy in that world.
The Metric
Devin can sometimes (13.8% of the time?!) do actual real jobs on Upwork with nothing but a prompt to 'figure it out.'
Aravind Srinivas (CEO Perplexity): This is the first demo of any agent, leave alone coding, that seems to cross the threshold of what is human level and works reliably. It also tells us what is possible by combining LLMs and tree search algorithms: you want systems that can try plans, look at results, replan, and iterate till success. Congrats to Cognition Labs!
Andres Gomez Sarmiento: Their results are even more impressive [if] you read the fine print. All the other models were guided whereas Devin was not. Amazing.
Deedy: I know everyone's talking about it, but Devin's 13% on SWE Bench is actually incredible.
Just take a look at a sample SWE Bench problem: this is a task for a human! Shout out to Car...