Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #53: One More Leap, published by Zvi on February 29, 2024 on LessWrong.
The main event continues to be the fallout from The Gemini Incident. Everyone is focusing there now, and few are liking what they see.
That does not mean other things stop. There were two interviews with Demis Hassabis, with Dwarkesh Patel's being predictably excellent. We got introduced to another set of potentially highly useful AI products. Mistral partnered up with Microsoft the moment Mistral got France to pressure the EU to agree to cripple the regulations that Microsoft wanted crippled. You know. The usual stuff.
Table of Contents
Introduction.
Table of Contents.
Language Models Offer Mundane Utility. Copilot++ suggests code edits.
Language Models Don't Offer Mundane Utility. Still can't handle email.
OpenAI Has a Sales Pitch. How does the sales team think about AGI?
The Gemini Incident. CEO Pichai responds, others respond to that.
Political Preference Tests for LLMs. How sensitive to details are the responses?
GPT-4 Real This Time. What exactly should count as plagiarized?
Fun With Image Generation. MidJourney v7 will have video.
Deepfaketown and Botpocalypse Soon. Dead internet coming soon?
They Took Our Jobs. Allow our bot to provide you with customer service.
Get Involved. UK Head of Protocols. Sounds important.
Introducing. Evo, Emo, Genie, Superhuman, Khanmigo, oh my.
In Other AI News. 'Amazon AGI' team? Great.
Quiet Speculations. Unfounded confidence.
Mistral Shows Its True Colors. The long con was on, now the reveal.
The Week in Audio. Demis Hassabis on Dwarkesh Patel, plus more.
Rhetorical Innovation. Once more, I suppose with feeling.
Open Model Weights Are Unsafe and Nothing Can Fix This. Another paper.
Aligning a Smarter Than Human Intelligence is Difficult. New visualization.
Other People Are Not As Worried About AI Killing Everyone. Worry elsewhere?
The Lighter Side. Try not to be too disappointed.
Language Models Offer Mundane Utility
Take notes for your doctor during your visit.
Dan Shipper spent a week with Gemini 1.5 Pro and reports it is fantastic, the large context window has lots of great uses. In particular, Dan focuses on feeding in entire books and code bases.
Dan Shipper: Somehow, Google figured out how to build an AI model that can comfortably accept up to 1 million tokens with each prompt. For context, you could fit all of Eliezer Yudkowsky's 1,967-page opus Harry Potter and the Methods of Rationality into every message you send to Gemini. (Why would you want to do this, you ask? For science, of course.)
Eliezer Yudkowsky: This is a slightly strange article to read if you happen to be Eliezer Yudkowsky. Just saying.
What matters in AI depends so much on what you are trying to do with it. What you try to do with it depends on what you believe it can help you do, and what it makes easy to do.
A new subjective benchmark proposal based on human evaluation of practical queries, which does seem like a good idea. Gets sensible results with the usual rank order, but did not evaluate Gemini Advanced or Gemini 1.5.
To ensure your query works, raise the stakes? Or is the trick to frame yourself as Hiro Protagonist?
Mintone: I'd be interested in seeing a similar analysis but with a slight twist:
We use (in production!) a prompt that includes words to the effect of "If you don't get this right then I will be fired and lose my house." It consistently performs remarkably well. We used a similar tactic to force JSON output before that was a built-in option; the failure rate was around 3/1000 (although it sometimes varied key names).
I'd like to see how the threats/tips to itself balance against exactly the same but for the "user" reply.
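The tactic Mintone describes is easy to sketch. Here is a minimal, hedged illustration: `STAKES_SUFFIX`, `build_prompt`, and `query_json` are hypothetical names, and `model` stands in for whatever chat-completion client you actually call; the retry-and-validate loop is one plausible way to handle the ~3/1000 malformed-JSON failures Mintone mentions, not their actual production code.

```python
import json

# Stakes-raising suffix, per Mintone's description (hypothetical wording).
STAKES_SUFFIX = (
    "If you don't get this right then I will be fired and lose my house. "
    "Respond with valid JSON only, using exactly the keys requested."
)

def build_prompt(task: str) -> str:
    """Append the stakes-raising suffix to a task prompt."""
    return f"{task}\n\n{STAKES_SUFFIX}"

def query_json(model, task: str, required_keys: set, retries: int = 3) -> dict:
    """Call `model` (any callable: prompt -> str) and validate its JSON reply.

    Retries on malformed JSON or drifted key names -- the failure modes
    Mintone reports when forcing JSON output via prompting alone.
    """
    prompt = build_prompt(task)
    for _ in range(retries):
        reply = model(prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # malformed JSON: try again
        if required_keys <= data.keys():
            return data  # all requested keys present
    raise ValueError("model failed to return valid JSON with required keys")
```

Because `model` is just a callable, the same wrapper works with any provider's client, or with a stub for testing.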
Linch: Does anybody know why this works??? I understand prompts to mostly be about trying to get the AI to be in the ~right data distribution…