The Nonlinear Library: LessWrong

LW - On Devin by Zvi

2024-03-18

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Devin, published by Zvi on March 18, 2024 on LessWrong.
Introducing Devin
Is the era of AI agents writing complex code systems without humans in the loop upon us?
Cognition is calling Devin 'the first AI software engineer.'
Here is a two-minute demo of Devin benchmarking LLM performance.
Devin has its own web browser, which it uses to pull up documentation.
Devin has its own code editor.
Devin has its own command line.
Devin uses debugging print statements and uses the log to fix bugs.
Devin builds and deploys entire stylized websites without even being directly asked.
What could possibly go wrong? Install this on your computer today.
Padme.
The Real Deal
I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred's statement here is that this rule is not new:
Austen Allred: New rule:
If someone only shows their AI model in tightly controlled demo environments we all assume it's fake and doesn't work well yet
But in this case Patrick Collison is a credible source and he says otherwise.
Patrick Collison: These aren't just cherrypicked demos. Devin is, in my experience, very impressive in practice.
Here we have Mckay Wrigley using it for half an hour. This does not feel like a cherry-picked example, although of course some amount of selection is there if only via the publication effect.
He is very much a maximum acceleration guy, for whom everything is always great and the future is always bright, so calibrate for that, but still yes this seems like evidence Devin is for real.
This article in Bloomberg from Ashlee Vance has further evidence. It is clear that Devin is a quantum leap over known past efforts in terms of its ability to execute complex multi-step tasks, to adapt on the fly, and to fix its mistakes or be adjusted and keep going.
For once, when we wonder 'how did they do that, what was the big breakthrough that made this work' the Cognition AI people are doing not only the safe but also the smart thing and they are not talking.
They do have at least one serious rival, as Magic.ai has raised $100 million from the venture team of Daniel Gross and Nat Friedman to build 'a superhuman software engineer,' including training their own model. The article seems strangely interested in whether AI is 'a bubble' as opposed to this amazing new technology.
This is one of those 'helps until it doesn't' situations in terms of jobs:
vanosh: Seeing this is kinda scary. Like there is no way companies won't go for this instead of humans.
Should I really have studied HR?
Mckay Wrigley: Learn to code! It makes using Devin even more useful.
Devin makes coding more valuable, until we hit so many coders that we are coding everything we need to be coding, or the AI no longer needs a coder in order to code. That is going to be a ways off. And once it happens, if you are not a coder, it is reasonable to ask yourself: What are you even doing? Plumbing while hoping for the best will probably not be a great strategy in that world.
The Metric
Devin can sometimes (13.8% of the time?!) do actual real jobs on Upwork with nothing but a prompt to 'figure it out.'
Aravind Srinivas (CEO Perplexity): This is the first demo of any agent, leave alone coding, that seems to cross the threshold of what is human level and works reliably. It also tells us what is possible by combining LLMs and tree search algorithms: you want systems that can try plans, look at results, replan, and iterate till success. Congrats to Cognition Labs!
Andres Gomez Sarmiento: Their results are even more impressive you read the fine print. All the other models were guided whereas devin was not. Amazing.
Deedy: I know everyone's taking about it, but Devin's 13% on SWE Bench is actually incredible.
Just take a look at a sample SWE Bench problem: this is a task for a human! Shout out to Car...