Tracy Alloway and Joe Weisenthal host Joel Becker (technical staff) and Chris Painter (president) from METR, a San Francisco-based AI safety research nonprofit. METR specializes in measuring AI autonomy and assessing catastrophic risk, and is best known for its viral time horizon charts.
The conversation centers on METR's exponential capability charts showing AI progress, in particular the latest Claude Opus model reaching a 12-hour time horizon. The discussion explores the methodology behind these measurements, their implications for AI safety, and the disconnect between safety research goals and public investment enthusiasm.
Key topics include the acceleration from 7-month to 4-month capability doubling times, Chinese model performance gaps, the challenges of nonprofit AI research, and the unusual dynamic where AI companies simultaneously promote and warn about their technology's potential dangers.
Decoding METR's Viral Time Horizon Charts
METR's charts measure task difficulty by human completion time, not by how long the AI works: Claude Opus 4.6's '12 hours' means it can complete tasks that take humans 12 hours, with a 50% success rate
The methodology involves humans completing identical tasks under similar conditions, timing their performance, then testing AI models on the same tasks to establish capability benchmarks
Tasks focus specifically on engineering work that 'a frontier AI lab engineer might be doing' - software engineering, machine learning, and fine-tuning AI models, not general human activities
The 50% threshold is chosen for statistical reliability and represents the point where 'it is more likely that the model will be able to do the task than that it can't' - Joel
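The measurement described above can be sketched in a few lines. This is a minimal illustration, not METR's actual pipeline: it assumes success probability follows a logistic curve in the logarithm of human completion time, and the data set `TOY_RESULTS` and the function names are hypothetical, invented for this example.

```python
import math

# Hypothetical toy results: (human completion time in minutes, did the model succeed?).
# These numbers are invented for illustration and are not METR data.
TOY_RESULTS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (30, 0), (30, 1),
    (60, 1), (60, 0), (120, 0), (240, 1), (240, 0), (480, 0), (960, 0),
]


def fit_logistic(results, lr=0.1, steps=5000):
    """Fit P(success) = sigmoid(a + b * (log2(minutes) - mean)) by gradient ascent."""
    xs = [math.log2(t) for t, _ in results]
    ys = [y for _, y in results]
    mean_x = sum(xs) / len(xs)  # centering keeps the gradient steps well behaved
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * (x - mean_x))))
            grad_a += y - p
            grad_b += (y - p) * (x - mean_x)
        a += lr * grad_a / len(xs)
        b += lr * grad_b / len(xs)
    return a, b, mean_x


def fifty_percent_horizon(results):
    """Return the human task length (minutes) at which predicted success is 50%."""
    a, b, mean_x = fit_logistic(results)
    # sigmoid(z) = 0.5 exactly when z = 0, so solve a + b * (x - mean_x) = 0 for x.
    return 2 ** (mean_x - a / b)
```

Calling `fifty_percent_horizon(TOY_RESULTS)` returns a single task length in minutes. A headline figure such as '12 hours' is that kind of number: the human task length at which the fitted success rate crosses 50%.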
Exponential Progress Acceleration and Compute Investment
Capability doubling time has shortened from 7 months to 4 months, with recent models consistently outperforming the slower projection timeline
'The R&D spend on compute of these companies has risen exponentially at essentially the same rate as time horizon progress' - Joel, suggesting continued acceleration is likely
Future capability slowdown seems unlikely in the near term because 'a lot of those computes and investments are already baked in' with data centers planned through 2027-2028
METR suspects it is increasingly measuring 'the exact types of tasks that they're trying to get better at' as AI companies focus more on engineering capabilities
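To make the doubling-time arithmetic concrete, here is a small sketch. The 12-hour starting point and the 4-month and 7-month doubling times come from the discussion above. The pure exponential extrapolation and the function names are assumptions made for illustration, not a forecast from METR.

```python
import math


def project_horizon(current_hours, months_ahead, doubling_months):
    """Extrapolate a time horizon assuming it doubles every `doubling_months`."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_to_reach(target_hours, current_hours, doubling_months):
    """Months until the horizon reaches `target_hours` under the same assumption."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    for doubling in (7, 4):
        one_year = project_horizon(12, 12, doubling)
        print(f"{doubling}-month doubling: ~{one_year:.0f} hours after 12 months")
```

Under these assumptions, a 12-hour horizon grows to 96 hours after a year with a 4-month doubling time, but only to roughly 39 hours with a 7-month doubling time. That gap is why the change in doubling time matters so much.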
Chinese Models and Global AI Competition
Chinese models like DeepSeek don't appear on METR's frontier charts despite market excitement, lagging '9 to 12 months behind the US models'
The capability gap by time horizon is 'probably even larger than the gap by benchmark scores', possibly because Chinese models are tuned to benchmarks more than they perform on held-out problems
METR prioritizes testing models 'that we anticipate being on the frontier' due to limited staff resources, focusing on cutting-edge capabilities rather than market impact
The Safety-Investment Paradox in AI
METR's safety-focused charts drive investment enthusiasm, with Reddit users asking 'how do I invest in OpenAI?' after seeing capability progression
'If I do believe that at some point we're going to get this AI that's improving itself... I think all of humanity being aware of it is sort of a precondition for us all being able to figure out what to do about it' - Chris
The AI industry presents a unique 'Baptist and bootlegger' dynamic where both builders and safety researchers warn about potential catastrophic risks from the same technology
Financial obligations from massive infrastructure investments could create pressure to 'continue development when you otherwise would rather invest more in safety' - Chris
METR's Nonprofit Model and Talent Bottleneck
METR operates with only 30 people; it has identified '20-30 world important problems' but can research only 'one, maybe two, maybe three if we do an extraordinary job this quarter' - Joel
'The vibe inside of METR is a state of triage' with the organization 'bottlenecked on technical talent' rather than model access from AI labs
The team includes former AI lab staff who left after making sufficient money and now want to work on public-facing research without equity compensation
METR tries to 'pay competitively on cash compensation' while offering researchers the ability to 'work on whatever research we think will be most informative to the public'
From Odd Lots.