Lex Fridman · the podbrain notes · Feb 1, 2026

6 min read

State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

Key Takeaways

01
DeepSeek R1's January 2025 release surprised everyone with near state-of-the-art performance for allegedly $5 million in training costs, sparking intense AI competition
02
RLVR (Reinforcement Learning with Verifiable Rewards) enables dramatic scaling where models can train for weeks and continuously improve on math and coding tasks
03
Pre-training scaling laws still hold but low-hanging fruit has been picked - the real action is now in post-training with RLVR and inference-time compute
04
Chinese open-weight models like DeepSeek, Qwen, and Kimi are dominating while US companies increasingly keep models closed, creating strategic concerns
05
Claude Opus 4.5 has generated massive hype for coding tasks, with many developers now shipping 50%+ AI-generated code according to recent surveys
06
Continual learning remains expensive and limited - most 'learning' happens through expanded context windows rather than weight updates
07
Tool use and computer automation are still primitive despite demos, representing a major bottleneck for true AI agent capabilities
08
The 996 work culture (9am-9pm, 6 days/week) is becoming standard at frontier AI labs, leading to significant burnout among researchers

These notes may contain occasional inaccuracies.

This conversation features Sebastian Raschka, author of Build a Large Language Model (From Scratch) and the upcoming Build a Reasoning Model (From Scratch), alongside Nathan Lambert, post-training lead at the Allen Institute for AI and author of The RLHF Book on reinforcement learning from human feedback. Both are respected machine learning researchers, engineers, and educators who combine technical depth with accessibility.

The discussion centers on the DeepSeek moment of January 2025, when the Chinese company released DeepSeek R1 with near state-of-the-art performance at allegedly much lower costs, intensifying global AI competition. They explore the current landscape of open versus closed models, the dominance of Chinese open-weight systems, and the technical breakthroughs in post-training methods.

Key topics include the evolution from pre-training to post-training focus, the rise of RLVR (Reinforcement Learning with Verifiable Rewards), the ongoing relevance of scaling laws, and practical considerations for AI education and career paths. They also examine the cultural dynamics of AI development, from Silicon Valley's intense work culture to the broader implications for human civilization.

The DeepSeek Moment and International AI Competition

DeepSeek R1's January 2025 release achieved near state-of-the-art performance with allegedly $5 million in training costs, compared to hundreds of millions for comparable Western models, shocking the AI community and accelerating competition.

Chinese companies like DeepSeek, Qwen, Kimi, and MiniMax are releasing powerful open-weight models while US companies increasingly keep their best models closed, creating a strategic imbalance in global AI development.

"I don't think nowadays, 2026, that there will be any company who [wins everything] because researchers are frequently changing jobs, changing labs" - Sebastian, noting that ideas flow freely but resources and hardware create differentiation.

The business model difference is crucial: Chinese companies use open models to gain international influence since US companies won't pay for Chinese API subscriptions due to security concerns, while US companies monetize through subscriptions.

Architecture Evolution: From GPT-2 to Modern Transformers

Despite rapid advancement, fundamental architectures remain remarkably similar to GPT-2, with most changes being incremental tweaks like mixture of experts, different attention mechanisms, and normalization layers.
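As one concrete example of a normalization tweak: GPT-2 uses LayerNorm, while many newer open-weight models (Llama-family models, for instance) use RMSNorm, which drops the mean subtraction. The minimal pure-Python sketch below compares the two on a single vector; the learnable scale and bias are omitted and the epsilon value is an illustrative assumption.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm (as in GPT-2): subtract the mean, divide by the std.
    Learnable scale and bias are omitted (equivalent to scale=1, bias=0)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """RMSNorm (common in newer models): divide by the root mean square only,
    with no mean subtraction. Learnable scale omitted (scale=1)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

x = [1.0, 2.0, 3.0, 4.0]
print([round(v, 3) for v in layer_norm(x)])  # centered around zero
print([round(v, 3) for v in rms_norm(x)])    # rescaled, but still all positive
```

RMSNorm skips one reduction (the mean), which makes it slightly cheaper while behaving similarly in practice; this is the kind of small swap the paragraph above means by incremental tweaks.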

The transformer architecture from "Attention Is All You Need" introduced an encoder-decoder structure; GPT kept only the decoder for autoregressive text generation, and that decoder-only design underlies most current large language models.
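The decoder-only, autoregressive setup can be made concrete with a causal attention mask: position i may attend only to positions 0 through i, so each token is predicted from its left context alone. The sketch below builds such a mask and applies a masked softmax to one row of attention scores; the score values are made up for illustration.

```python
import math

def causal_mask(n):
    """Return an n x n boolean mask where mask[i][j] is True if
    query position i may attend to key position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, allowed):
    """Softmax over one row of attention scores, ignoring masked positions
    (equivalent to setting their scores to -inf before the softmax)."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    m = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(4)
# Toy scores for query position 1: it may only see positions 0 and 1,
# so the large scores at positions 2 and 3 ("future" tokens) are ignored.
weights = masked_softmax([2.0, 1.0, 5.0, 3.0], mask[1])
print([round(w, 3) for w in weights])
```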

Mixture of experts (MoE) gives a model multiple specialized feed-forward networks ("experts") plus a router that selects which experts process each token, so the model can hold more parameters, and therefore more knowledge, while spending less compute per token during inference.
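A minimal sketch of top-k expert routing, assuming a softmax over the selected experts' router logits and toy "experts" that are plain Python functions. Real MoE layers use learned linear routers and feed-forward experts, and routing and load-balancing details vary by model; the names `route_top_k` and `moe_forward` are hypothetical.

```python
import math

def route_top_k(router_logits, k):
    """Pick the k experts with the highest router logits and return
    (expert_index, gate_weight) pairs, with weights renormalized via softmax."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]

def moe_forward(x, experts, router_logits, k=2):
    """Combine only the selected experts' outputs; the other experts are
    never called, which is where the per-token compute savings come from."""
    out = 0.0
    for i, w in route_top_k(router_logits, k):
        out += w * experts[i](x)
    return out

calls = []  # records which experts actually ran

def make_expert(scale, idx):
    def expert(x):
        calls.append(idx)
        return scale * x
    return expert

# Four toy experts; a real model would have many feed-forward networks.
experts = [make_expert(s, i) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
y = moe_forward(10.0, experts, router_logits=[0.1, 2.0, -1.0, 2.0], k=2)
print(y, sorted(calls))
```

All four experts' parameters exist, but only two run for this input, which is the "more knowledge, less compute per token" trade-off.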

"You can convert one from one, you can go from one into the other by just adding these changes, basically. This fundamentally is still the same architecture" - Sebastian on the continuity from GPT-2 to modern models.

Scaling Laws: Pre-training vs Post-training Dynamics

Pre-training scaling laws still hold but low-hanging fruit has been picked - the real excitement and gains are now in post-training with RLVR and inference-time compute scaling.

"Pre-training has gotten extremely expensive. I think to scale up pre-training, it's also implying that you're going to serve a very large model to the users" - Nathan on the economic constraints of scaling.

RLVR (Reinforcement Learning with Verifiable Rewards) shows log-linear scaling, where each 10x increase in compute yields a roughly constant, additive gain in performance, unlike RLHF, which plateaus quickly.
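The scaling claim can be written as a log-linear fit. Assuming a hypothetical benchmark score $S$ that depends on RL compute $C$ through fitted constants $a$ and $b$ (illustrative symbols, not values reported in the episode):

```latex
S(C) \approx a + b \log_{10} C
\quad\Longrightarrow\quad
S(10C) - S(C) \approx b \left( \log_{10} (10C) - \log_{10} C \right) = b
```

Every tenfold increase in compute buys the same fixed gain $b$, so doubling that gain to $2b$ takes 100x the compute.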

The cost structure differs dramatically: pre-training is a one-time expense that gives permanent capabilities, while inference scaling costs money per query but can be adjusted based on user demand and willingness to pay.
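A back-of-the-envelope sketch of that contrast: the one-time pre-training cost is amortized over every query served, while inference-time compute adds a per-query cost that grows with the number of reasoning tokens. Every number and name below (`cost_per_query`, the dollar figures, the token price) is a made-up placeholder, not a figure from the episode.

```python
def cost_per_query(pretrain_cost, num_queries, tokens_per_answer, price_per_token):
    """Amortized pre-training cost per query plus the marginal inference cost.
    All inputs are hypothetical placeholders."""
    amortized = pretrain_cost / num_queries
    inference = tokens_per_answer * price_per_token
    return amortized + inference

PRETRAIN = 100_000_000      # one-time cost in dollars (hypothetical)
QUERIES = 10_000_000_000    # lifetime queries served (hypothetical)
PRICE = 1e-6                # dollars per generated token (hypothetical)

quick = cost_per_query(PRETRAIN, QUERIES, tokens_per_answer=500, price_per_token=PRICE)
deep = cost_per_query(PRETRAIN, QUERIES, tokens_per_answer=20_000, price_per_token=PRICE)
print(f"short answer: ${quick:.4f}, long reasoning trace: ${deep:.4f}")
```

The amortized term shrinks as usage grows, while the inference term is paid on every query and can be dialed up or down per request, which is why inference spending can track user demand and willingness to pay.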

Post-training Revolution: RLVR and Reasoning Models

RLVR enables models to learn through trial and error on verifiable tasks like math and coding, with training runs now lasting weeks and showing continuous improvement, unlike traditional RLHF, which plateaus.
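The "verifiable" part means a program, not a human judge, scores each attempt. A minimal sketch for math problems, assuming the model is prompted to put its final answer in a LaTeX-style `\boxed{...}` and that answers are compared as exact strings (real pipelines typically normalize or symbolically compare answers, and coding tasks instead run unit tests). `extract_boxed_answer` and `verifiable_reward` are hypothetical helper names.

```python
import re

def extract_boxed_answer(completion):
    """Return the contents of the last \\boxed{...} in the completion, or None.
    Assumes no nested braces inside the box (a simplification)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion, reference):
    """Binary reward: 1.0 if the extracted answer matches the reference exactly,
    else 0.0. No learned reward model is involved."""
    answer = extract_boxed_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

samples = [
    "First compute 12 * 3 = 36, then add 6. \\boxed{42}",
    "I think the answer is \\boxed{41}",
    "The answer is 42.",  # no boxed answer, so it cannot be verified
]
print([verifiable_reward(s, "42") for s in samples])
```

In an RL loop, a reward like this scores many sampled completions per problem, and the policy is updated to make the high-reward completions more likely.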

"Just 50 steps, like in a few minutes with RLVR, the model went from 15% to 50% accuracy" - Sebastian demonstrating RLVR's rapid capability unlocking on math problems.

Inference-time scaling allows models to 'think' for minutes or hours before responding, generating hidden reasoning traces that dramatically improve accuracy on complex problems.
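Hidden reasoning traces are one form of inference-time scaling; an older and simpler form is self-consistency, where several answers are sampled and a majority vote picks the final one. The sketch below swaps in that technique (it is not necessarily how the reasoning models discussed work) and uses a hypothetical `sample_answer` stub that is right 60% of the time, to show how spending more samples per question can raise accuracy.

```python
import random
from collections import Counter

def sample_answer(rng, p_correct=0.6):
    """Hypothetical stand-in for one model sample: returns the correct answer
    "42" with probability p_correct, otherwise one of a few wrong answers."""
    if rng.random() < p_correct:
        return "42"
    return rng.choice(["41", "43", "24"])

def majority_vote(rng, n_samples):
    """Self-consistency: draw n_samples answers and return the most common one."""
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of simulated questions answered correctly with n_samples votes."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == "42" for _ in range(trials))
    return hits / trials

for n in (1, 5, 25):
    print(n, accuracy(n))
```

More samples cost proportionally more compute per question, which mirrors the pay-per-query trade-off of letting a model think longer.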

The training pipeline now consists of pre-training (knowledge acquisition), mid-training (specialized skills like long context), and post-training (capability unlocking through RLVR and RLHF).

The Coding Revolution and Developer Experience

Recent surveys show developers are shipping 50%+ AI-generated code, with senior developers more likely to use AI extensively than junior developers, suggesting expertise enables better AI utilization.

Claude Opus 4.5 has generated massive hype for coding tasks, with many finding it superior to other models for complex programming work and architectural decisions.

"I use basically half and half Cursor and Claude Code because I find them to be fundamentally different experiences and both useful" - Lex on the complementary nature of different AI coding tools.

The debate over learning and struggle continues: while AI makes coding faster, there's concern about junior developers missing fundamental learning experiences that come from working through problems independently.

Education and Learning in the AI Era

Build a Large Language Model from Scratch exemplifies the philosophy that building systems from scratch is the most effective way to understand them, providing hands-on experience with transformer architectures.

The educational challenge is finding the right balance between AI assistance and independent struggle - too much AI help prevents deep learning, while too little wastes time on mundane tasks.

Nathan's RLHF book addresses the philosophical complexities of preference optimization, explaining why RLHF is "never, ever fully solvable" given the fundamental difficulty of quantifying human preferences.
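A common way RLHF pipelines turn human preferences into numbers is a reward model trained with a Bradley-Terry style pairwise loss: given a chosen and a rejected response, the model is pushed to score the chosen one higher. The sketch below computes that loss for scalar scores; the score values are made up, and this is one standard formulation rather than the only one used in practice.

```python
import math

def preference_probability(r_chosen, r_rejected):
    """Bradley-Terry: P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))

def pairwise_loss(r_chosen, r_rejected):
    """Negative log-likelihood of the human's choice; small when the reward
    model already scores the chosen response well above the rejected one."""
    return -math.log(preference_probability(r_chosen, r_rejected))

print(round(pairwise_loss(2.0, 0.0), 3))  # model agrees with the human label
print(round(pairwise_loss(0.0, 2.0), 3))  # model disagrees with the human label
```

Collapsing varied and sometimes conflicting human judgments into a single scalar score is exactly the quantification step that makes the problem hard to fully solve.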

"I think there's a fun thing... losing my mind, that you use the router and the non-thinking model. I'm like, how do you live with that?" - Nathan on the importance of using reasoning models for complex tasks.

Silicon Valley Culture and the Future of Work

The 996 work culture (9am-9pm, 6 days/week) is becoming standard at frontier AI labs, with intense competition driving researchers to work extreme hours despite significant burnout risks.

Apple in China by Patrick McGee illustrates similar patterns of extreme work dedication, including marriage-saving programs for engineers working on supply chain development.

"My friends who are professors seem on average happier than my friends who work at a frontier lab" - Nathan observing the human cost of the AI race.

The Silicon Valley bubble creates both incredible productivity through reality distortion fields and dangerous disconnection from broader human experiences and perspectives worldwide.

Timelines and the Future of Human Civilization

AGI definitions remain contentious, but many converge on "a system that could reproduce most digital economic work" or the "remote worker" standard as a practical benchmark.

The superhuman coder milestone from AI safety frameworks may be achievable within years, but full automation faces the "jagged" nature of AI capabilities - excellent at some tasks, poor at others.

100 years from now, the current AI revolution will likely be remembered as part of the broader computing revolution, similar to how we view the Industrial Revolution's various mechanical innovations today.

"Humans do tend to find a way. I think that's what humans are built for is to have community and find a way to figure out problems" - Nathan expressing cautious optimism about navigating AI's challenges.

Books Mentioned

Build a Large Language Model (From Scratch)

Sebastian Raschka

Build a Reasoning Model (From Scratch)

Sebastian Raschka

The RLHF Book: Reinforcement learning from human feedback, alignment, and post-training LLMs

Nathan Lambert

Season of the Witch: Enchantment, Terror and Deliverance in the City of Love

David Talbot

Attention is All You Need: Simple explanation of the paper

Henri van Maarseveen

Apple in China

Patrick McGee
