This episode features Dwarkesh Patel delivering a solo monologue — drawn from his written blog post — exploring one of the most underappreciated constraints in modern AI development: sample efficiency. Dwarkesh is a podcast host and writer known for long-form technical interviews with AI researchers and economists.
The talk covers the staggering data requirements of frontier language models compared to human learning, the mechanics of reinforcement learning as synthetic data generation, and why scaling model size alone cannot close the efficiency gap. Dwarkesh walks through the Chinchilla scaling law, the economics of the AI data labeling industry, and what the sample efficiency problem means for the two core ambitions of AI labs: automating white-collar work and automating AI research itself. The episode closes with a tease of a future post on intelligence explosions built atop LLMs.
Sample Efficiency: The Hidden Bottleneck in AI Progress
One definition of intelligence is sample efficiency (how much data is needed to operate fluently in a domain), and it is not clear that meaningful progress has been made on it in recent years.
"The main way that AIs have been getting better is from adding more and better data and scaling the compute required to develop that data in the first place." - Dwarkesh
Reinforcement learning functions as synthetic data generation: compute is spent sampling rollouts and scoring them with a verifier or an LLM-as-judge, and the high-quality rollouts become training targets, much as internet text serves as the target for next-token prediction.
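The "RL as synthetic data generation" framing can be sketched with rejection sampling: sample many rollouts, keep only those a checker accepts, and treat the survivors as training examples. This is a simplified stand-in, not how any particular lab's pipeline works (GRPO-style RL weights rollouts rather than filtering them), and `toy_policy` and `verifier` are hypothetical placeholders for a model and a grader.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    completion: str


def toy_policy(a: int, b: int, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM: answers a + b correctly about 30% of the time."""
    if rng.random() < 0.3:
        return str(a + b)
    return str(a + b + rng.randint(1, 9))  # always a wrong answer


def verifier(a: int, b: int, completion: str) -> bool:
    """Programmatic check standing in for a unit test, verifier, or LLM-as-judge."""
    return completion.strip() == str(a + b)


def generate_training_data(tasks, rollouts_per_task: int, seed: int = 0) -> list[Example]:
    """Spend compute on many rollouts per task; keep only the verified ones as targets."""
    rng = random.Random(seed)
    kept: list[Example] = []
    for a, b in tasks:
        prompt = f"What is {a} + {b}?"
        for _ in range(rollouts_per_task):
            completion = toy_policy(a, b, rng)
            if verifier(a, b, completion):
                kept.append(Example(prompt, completion))
    return kept


tasks = [(2, 3), (10, 7), (41, 1)]
data = generate_training_data(tasks, rollouts_per_task=16)
print(len(data), "verified examples from", 16 * len(tasks), "rollouts")
```

Most of the sampled rollouts are discarded, which is where the compute cost of this kind of data generation goes.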
With GRPO, models generate hundreds to thousands of rollouts per task to solve the credit assignment problem, versus a human student who might practice a problem once or twice.
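GRPO, as described in the DeepSeekMath paper, samples a group of rollouts for each prompt and scores each rollout by how its reward compares with the group mean, normalized by the group's standard deviation. A minimal sketch of that advantage computation, assuming outcome rewards (1 = solved, 0 = not) and that every token in a rollout shares its rollout's advantage; the function names are hypothetical, the policy-gradient update and KL term are omitted, and population standard deviation is used here although implementations differ on that detail.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of each rollout relative to the other rollouts for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def per_token_advantages(rollout_lengths: list[int], advantages: list[float]) -> list[list[float]]:
    """With a single outcome reward, every token of rollout i gets the same advantage."""
    return [[adv] * length for length, adv in zip(rollout_lengths, advantages)]


rewards = [1.0, 0.0, 0.0, 1.0]  # two of four rollouts solved the task
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])
print(per_token_advantages([3, 2, 4, 1], adv))
```

If every rollout in a group fails (or every one succeeds), all advantages are zero and the group contributes no learning signal, which is one reason to sample many rollouts per task.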
The Data Industry Powering Frontier Models
Companies like Mercor and Scale AI post listings for Word document specialists, legal M&A diligence writers, and management consultants, illustrating how domain-specific and bespoke expert training data must be.
Each skill requires at least hundreds of human experts generating example completions, writing rubrics, and explaining chain-of-thought reasoning.
The data labeling and RL environment industry is already earning billions per year in revenue, with Dwarkesh projecting it will reach 'decabillions' soon.
"The correct way to think about these models is not like a human who has learned all these different skills... It's more like a Frankenstein's monster, which has been built out of a billion grafts of carefully constructed examples all sewn together." - Dwarkesh
The Million-Fold Data Gap: Humans vs. AI Models
A human absorbing ~2,000 words per hour from birth to adulthood accumulates roughly 200 million tokens; frontier models train on tens to hundreds of trillions of tokens — a gap of close to one million times.
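The back-of-the-envelope arithmetic behind the human figure, as a sketch. The 16 waking hours per day, 18 years, and one token per word are assumptions (real tokenizers usually produce somewhat more than one token per English word), and the model-side token counts are illustrative points within the stated tens-to-hundreds-of-trillions range.

```python
WORDS_PER_HOUR = 2_000       # figure from the text
WAKING_HOURS_PER_DAY = 16    # assumption
YEARS = 18                   # assumption: "birth to adulthood"
TOKENS_PER_WORD = 1.0        # simplification

human_tokens = WORDS_PER_HOUR * WAKING_HOURS_PER_DAY * 365 * YEARS * TOKENS_PER_WORD
print(f"human: {human_tokens:.2e} tokens")  # 2.10e+08, i.e. roughly 200 million

# Illustrative frontier-model training set sizes (tens to hundreds of trillions of tokens)
for model_tokens in (10e12, 100e12, 200e12):
    ratio = model_tokens / human_tokens
    print(f"{model_tokens:.0e} training tokens -> {ratio:,.0f}x the human total")
```

At a few hundred trillion training tokens, the ratio approaches the million-fold gap described above.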
A teenager learns to drive in about 20 hours of practice; even including 16 years of world-building experience, that is still 3–4 orders of magnitude less data than Waymo and Tesla use for self-driving training.
Humans can learn to teleoperate any robot arm within hours; current AI systems cannot perform complex open-ended robotic tasks even with millions of hours of demonstrations collected.
Deaf individuals who consume far fewer than 200 million language tokens still develop general intelligence, suggesting sensory data volume is not the source of human cognitive efficiency.
Why Three Common Objections to This Gap Fall Short
Objection 1: Evolution pre-trained us. The human genome is only about 3 gigabytes (roughly 3 billion base pairs), of which just 1–2% is protein-coding; that is not enough storage to encode the weights of a pre-trained neural network. Evolution more likely found the right hyperparameters and loss functions, not the weights themselves.
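A rough storage comparison, as a sketch. A 3 GB figure corresponds to about one byte per base; since each base is one of four nucleotides, the information-theoretic content is 2 bits per base, closer to 775 MB. The 3.1 billion base-pair length, the 1.5% midpoint of the protein-coding range, and one byte per parameter (an 8-bit quantized model) are assumptions; the 5-trillion-parameter figure is the frontier-model estimate cited elsewhere in this summary.

```python
BASE_PAIRS = 3.1e9               # assumption: approximate human genome length
BITS_PER_BASE = 2                # four possible bases: A, C, G, T
PROTEIN_CODING_FRACTION = 0.015  # assumption: midpoint of the 1-2% range

genome_bytes = BASE_PAIRS * BITS_PER_BASE / 8
coding_bytes = genome_bytes * PROTEIN_CODING_FRACTION

MODEL_PARAMS = 5e12              # frontier-scale estimate
BYTES_PER_PARAM = 1              # assumption: aggressive 8-bit quantization
model_bytes = MODEL_PARAMS * BYTES_PER_PARAM

print(f"whole genome: {genome_bytes / 1e6:.0f} MB, protein-coding: {coding_bytes / 1e6:.1f} MB")
print(f"model weights are {model_bytes / genome_bytes:,.0f}x the whole genome")
```

Even with generous assumptions, the genome is thousands of times smaller than frontier-model weights, which is the gap the objection has to explain away.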
Even granting the evolution argument, it does not explain why every new marginal capability still requires enormous amounts of new data. A human, once educated, does not need 100 professors to learn a new programming language.
Objection 2: Multimodal sensory data. Counting all sensory input from birth yields tens to hundreds of billions of tokens, yet blind and deaf people still develop general intelligence, which undermines the claim that sensory volume explains human efficiency.
Objection 3: Just scale bigger. The Chinchilla scaling law shows that parameter count and data enter the loss function as separate, independent terms. Even with infinitely many parameters, the data required to reach a given loss would fall by only a factor of ~10, while humans are thousands to millions of times more sample-efficient.
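One way to reproduce the "~10x" figure, assuming the comparison is between a compute-optimal model and an infinitely large one (Dwarkesh's exact calculation may differ). It uses the parametric fit L(N, D) = E + A/N^α + B/D^β with the constants Hoffmann et al. (2022) report for their third fitting approach. At the compute-optimal split under C = 6ND, the parameter term equals β/α times the data term, so sending N to infinity lets D shrink by (1 + β/α)^(1/β), roughly 8.5, regardless of model scale. The helper names are illustrative.

```python
# Fitted constants reported by Hoffmann et al. (2022), Approach 3:
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def optimal_tokens_for(n_params: float) -> float:
    """Compute-optimal D for a given N, from the Lagrange condition
    ALPHA * A / N**ALPHA == BETA * B / D**BETA under C = 6 * N * D."""
    return (BETA * B * n_params**ALPHA / (ALPHA * A)) ** (1 / BETA)


def tokens_needed_with_infinite_params(target_loss: float) -> float:
    """As N -> infinity the parameter term vanishes; solve B / D**BETA = target_loss - E."""
    return (B / (target_loss - E)) ** (1 / BETA)


for n in (1e9, 70e9, 5e12):
    d_opt = optimal_tokens_for(n)
    d_inf = tokens_needed_with_infinite_params(loss(n, d_opt))
    print(f"N={n:.0e}: data reduction from infinite params = {d_opt / d_inf:.2f}x")

closed_form = (1 + BETA / ALPHA) ** (1 / BETA)
print(f"closed form (1 + beta/alpha)^(1/beta) = {closed_form:.2f}")
```

Because the two terms are additive, extra parameters can only remove the parameter term; the data term still has to be paid for.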
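Frontier models are currently estimated at around 5 trillion parameters, while the human brain has ~100 trillion synapses, 1–2 orders of magnitude more. Even so, closing that size gap would not close the efficiency gap, since added parameters alone cut data requirements only about tenfold.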
Why Sample Inefficiency Doesn't Block White-Collar Automation
The labs' bet on white-collar automation rests on the premise that tasks such as software engineering, analysis, and accounting are common enough to be brought into the training distribution at scale.
"AIs can learn these skills by firehosing gigawatts of training at a time, and what they learn can be amortized across billions of sessions at once. So we can be ludicrously inefficient in training them up and still be wildly in the green." - Dwarkesh
Human lifespan limits the quantity and breadth of training any individual can receive; AI has no such constraint, making even wildly inefficient training economically rational.
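The amortization argument can be made concrete with a toy cost model: training cost divided by the number of tasks the model serves, plus the marginal inference cost per task. All dollar figures and task counts below are hypothetical placeholders, not estimates for any real lab; the point is only that at billions of tasks, even a tenfold less efficient training run adds little per task.

```python
def cost_per_task(training_cost: float, tasks_served: float, marginal_cost: float) -> float:
    """Training cost spread evenly over every task served, plus per-task inference cost."""
    return training_cost / tasks_served + marginal_cost


# Hypothetical placeholder figures, not estimates for any real model.
MARGINAL_COST = 0.50  # dollars of inference per task
for training_cost in (1e9, 1e10):  # the second run is 10x less sample-efficient
    for tasks in (1e6, 1e8, 1e10):
        per_task = cost_per_task(training_cost, tasks, MARGINAL_COST)
        print(f"training ${training_cost:.0e}, {tasks:.0e} tasks -> ${per_task:,.2f} per task")
```

A human, by contrast, cannot amortize an expensive education across billions of parallel sessions, which is the asymmetry the quote above points to.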
Dwarkesh predicts overall demand for human software engineers will be higher in 2027 than today, driven by AI as a complementary input rather than a direct substitute.
The Path to AGI: Automating AI Research Itself
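For jobs requiring frequent out-of-distribution thinking, the labs' plan is to first automate AI research, then have automated AI researchers solve the sample efficiency problem. That raises what Dwarkesh calls a very complicated question: can AIs without human-level sample efficiency nonetheless solve the remaining research problems that stand in the way of human-like intelligence and learning?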
Epoch AI reports that open models lag state-of-the-art frontier models by approximately 4 months. Dwarkesh attributes this relatively small gap to data being the primary driver of progress, and to that data being easily distilled from public APIs.
"The way that people currently think about an intelligence explosion is very clumsy — either people dismiss the possibility of AI speeding up AI progress altogether, or they assume that some kind of god pops out the other end." - Dwarkesh
Dwarkesh teases a future blog post reasoning carefully about what accelerated AI progress looks like when built atop the specific architecture and limitations of LLMs, rather than assuming a discontinuous leap.
Resources Mentioned
Joe Abercrombie First Law Series 3 Books Collection Set (The Blade Itself, Before They Are Hanged, Last Argument Of Kings)