ARC-AGI post
Getting 50% (SoTA) on ARC-AGI with GPT-4oI recently got to 50%[1] accuracy on the public test set for ARC-AGI by having GPT-4o generate a huge number of Python implementations of the transformation rule (around 8,000 per problem) and then selecting among these implementations based on correctness of the Python programs on the examples (if this is confusing, go here)[2]. I use a variety of additional approaches and tweaks which overall substantially improve the performance of my method relative to just sampling 8,000 programs.
[This post is on a pretty different topic than the usual posts on our substack. So regular readers should be warned!]
The additional approaches and tweaks are:
- I use few-shot prompts which perform meticulous step-by-step reasoning.
- I have GPT-4o try to revise some of the implementations after seeing what they actually output on the provided examples.
- I do some feature engineering [...]
The original text contained 15 footnotes which were omitted from this narration.
---
First published: June 17th, 2024
Source: https://www.lesswrong.com/posts/Rdwui3wHxCeKb7feK/getting-50-sota-on-arc-agi-with-gpt-4o
---
Narrated by TYPE III AUDIO.