Understanding the Most Viral Chart in Artificial Intelligence

Tracy Alloway and Joe Weisenthal host Joel Becker (technical staff) and Chris Painter (president) from METR, a San Francisco-based AI safety research nonprofit. METR specializes in measuring AI autonomy and assessing catastrophic risk, and is best known for its viral time horizon charts.

From Odd Lots

Key Takeaways

01
METR's time horizon charts show AI model capabilities doubling every 4 months, faster than the earlier 7-month doubling time
02
Claude Opus 4.6 can complete tasks that take humans 12 hours with a 50% success rate, the latest point on an exponential capability trend
03
Chinese AI models like DeepSeek don't appear on frontier charts despite market excitement, lagging 9-12 months behind US models
04
METR operates with only 30 people working on what it considers 'world-important problems' and is bottlenecked on technical talent
05
Time horizon measures task difficulty by how long the task takes a human, not by how long the AI works on it; reading it as AI working time is a common misunderstanding
06
AI labs' R&D compute spending has risen exponentially at the same rate as capability progress, suggesting continued acceleration

These notes may contain occasional inaccuracies.

The conversation centers on METR's charts of exponential AI capability growth, particularly the latest Claude Opus model reaching a 12-hour time horizon. The discussion explores the methodology behind these measurements, their implications for AI safety, and the disconnect between safety research goals and public investment enthusiasm.

Key topics include the acceleration from 7-month to 4-month capability doubling times, Chinese model performance gaps, the challenges of nonprofit AI research, and the unusual dynamic where AI companies simultaneously promote and warn about their technology's potential dangers.

Decoding METR's Viral Time Horizon Charts

METR's charts measure task difficulty by human completion time, not AI working duration. Claude Opus 4.6's '12 hours' means it can complete tasks that take humans 12 hours, with a 50% success rate

The methodology involves humans completing identical tasks under similar conditions, timing their performance, then testing AI models on the same tasks to establish capability benchmarks

Tasks focus specifically on engineering work that 'a frontier AI lab engineer might be doing' - software engineering, machine learning, and fine-tuning AI models, not general human activities

The 50% threshold is chosen for statistical reliability and represents the point where 'it is more likely that the model will be able to do the task than that it can't' - Joel
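
The measurement described above can be sketched in a few lines. This is a minimal illustration, not the organization's actual pipeline: it assumes a logistic curve of success probability against the log of human completion time, and reads off the task length at which the fitted probability crosses 50%. The task data, function names, and the choice of a logistic fit are all assumptions made for this example; the notes only say that humans are timed and models are tested on the same tasks.

```python
import math

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(success) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon_minutes(tasks, p=0.5):
    """Human task length (minutes) at which the fitted success probability equals p.

    tasks: list of (human_minutes, model_succeeded) pairs (hypothetical data).
    """
    logs = [math.log2(minutes) for minutes, _ in tasks]
    mean = sum(logs) / len(logs)
    xs = [v - mean for v in logs]          # center log2(minutes) for a stable fit
    ys = [1.0 if ok else 0.0 for _, ok in tasks]
    a, b = fit_logistic(xs, ys)
    logit = math.log(p / (1.0 - p))        # 0 when p = 0.5
    return 2 ** (mean + (logit - a) / b)

# Hypothetical results: human completion time in minutes, and whether the model succeeded.
EXAMPLE_TASKS = [
    (16, True), (16, True),
    (32, True), (32, True), (32, False),
    (64, True), (64, False),
    (128, True), (128, False), (128, False),
    (256, False), (256, False),
]

if __name__ == "__main__":
    print(f"50% time horizon: {time_horizon_minutes(EXAMPLE_TASKS):.0f} minutes")
    print(f"80% time horizon: {time_horizon_minutes(EXAMPLE_TASKS, p=0.8):.0f} minutes")
```

Because the made-up data is symmetric around 64-minute tasks, the 50% horizon comes out at about 64 minutes. Asking for a stricter success rate such as 80% gives a shorter horizon, which is why the threshold always has to be stated alongside the number.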

Exponential Progress Acceleration and Compute Investment

Capability doubling time has shortened from 7 months to 4 months, with recent models consistently landing above the slower 7-month trend line

'The R&D spend on compute of these companies has risen exponentially at essentially the same rate as time horizon progress' - Joel, suggesting continued acceleration is likely

Future capability slowdown seems unlikely in the near term because 'a lot of those computes and investments are already baked in' with data centers planned through 2027-2028

METR suspects it is increasingly measuring 'the exact types of tasks that they're trying to get better at' as AI companies focus more on engineering capabilities
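
To make the difference between the two doubling times concrete, here is a small arithmetic sketch. It assumes a constant doubling time and starts from the 12-hour figure quoted in these notes. The function names are hypothetical, and the outputs are extrapolations under that assumption, not forecasts made in the episode.

```python
import math

def project_horizon(current_hours, months_ahead, doubling_months):
    """Extrapolate a time horizon assuming it doubles every `doubling_months` months."""
    return current_hours * 2 ** (months_ahead / doubling_months)

def months_to_reach(current_hours, target_hours, doubling_months):
    """Months needed to grow from current_hours to target_hours at a constant doubling time."""
    return doubling_months * math.log2(target_hours / current_hours)

if __name__ == "__main__":
    start_hours = 12  # the 12-hour figure cited in the notes
    for doubling in (4, 7):
        in_a_year = project_horizon(start_hours, 12, doubling)
        print(f"{doubling}-month doubling: about {in_a_year:.0f} hours after 12 months")
```

With a 4-month doubling time, 12 months is three doublings, so 12 hours becomes 96 hours. With a 7-month doubling time the same year yields fewer than two doublings, which is why the choice between the two trend lines matters so much for near-term projections.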

Chinese Models and Global AI Competition

Chinese models like DeepSeek don't appear on METR's frontier charts despite market excitement, lagging '9 to 12 months behind the US models'

The capability gap by time horizon is 'probably even larger than the gap by benchmark scores', with Chinese models potentially optimized for benchmark scores more than for performance on held-out problems

METR prioritizes testing models 'that we anticipate being on the frontier' because of limited staff resources, focusing on cutting-edge capabilities rather than market impact

The Safety-Investment Paradox in AI

METR's safety-focused charts drive investment enthusiasm, with Reddit users asking 'how do I invest in OpenAI?' after seeing the capability progression

'If I do believe that at some point we're going to get this AI that's improving itself... I think all of humanity being aware of it is sort of a precondition for us all being able to figure out what to do about it' - Chris

The AI industry presents a unique 'Baptist and bootlegger' dynamic where both builders and safety researchers warn about potential catastrophic risks from the same technology

Financial obligations from massive infrastructure investments could create pressure to 'continue development when you otherwise would rather invest more in safety' - Chris

METR's Nonprofit Model and Talent Bottleneck

METR operates with only 30 people and has identified '20-30 world important problems', but can only research 'one, maybe two, maybe three if we do an extraordinary job this quarter' - Joel

'The vibe inside of METR is a state of triage', with the organization 'bottlenecked on technical talent' rather than on model access from AI labs

The team includes former AI lab staff who left after making sufficient money and now want to work on public-facing research without equity compensation

METR tries to 'pay competitively on cash compensation' while offering researchers the ability to 'work on whatever research we think will be most informative to the public'
