Sergey Levine - Building LLMs for the Physical World - [Invest Like the Best, EP.465]

Sergey Levine is a co-founder and researcher at Physical Intelligence, a company developing foundation models for robotics. Patrick O'Shaughnessy, CEO of Positive Sum and an investor in Physical Intelligence, hosts this conversation exploring the challenges and promise of general robotic intelligence.

From Invest Like the Best with Patrick O'Shaughnessy 4 min read


Key Takeaways

01 Physical Intelligence aims to build foundation models that can control any embodied system to do any task, similar to how language models evolved toward general capability.
02 Robotics hardware costs have plummeted, from $400,000 PR2 robots a decade ago to arms costing perhaps $3,000 today.
03 The company's vision-language-action models use chain-of-thought reasoning, allowing robots to think about tasks before executing them.
04 Sergey believes general robotic intelligence may be easier than narrow applications, because it can leverage broader data sources for better world understanding.
05 Current systems can already handle complex dexterous tasks and adapt to different robot embodiments without special programming.
06 The biggest technical risk is handling the breadth of unexpected situations in open-world environments where anything could happen.
07 Timeline uncertainty centers on the bootstrap challenge: reaching sufficient usefulness for real-world deployment and autonomous data collection.



The discussion covers Physical Intelligence's approach to building foundation models that can control any physical robot to perform any task in any environment. Levine explains how this general approach may be more effective than narrow, task-specific robotics solutions, drawing parallels to how language models evolved to handle diverse applications.

Key topics include the technical challenges of robotics learning, the role of multimodal language models in providing common sense reasoning, hardware cost reductions enabling broader experimentation, and the timeline uncertainties around achieving truly useful robotic systems that can bootstrap themselves through real-world data collection.

The Case for General Purpose Robotic Intelligence

Physical Intelligence develops robotic foundation models that can control any embodied system to do any task, on the belief that generality may prove easier than narrow specialization in the long run.

"The reason that language models took over for all of those different application domains is because they can leverage much broader sources of data" - Sergey explains how general models establish better world understanding.

The approach could unlock people's imagination in building robots, much as personal computers set off a Cambrian explosion of applications in the 1990s.

"If there is a solution that someone can build on top of, there's a foundation model that you can prompt that'll provide like basic functionality, and then you can fine-tune it" - Sergey on enabling broader robotics experimentation.

Technical Architecture and Breakthrough Moments

Vision-language-action (VLA) models serve as the foundation: LLMs adapted for robotic control by training on text, then images, then diverse robot data.

Chain-of-thought reasoning allows robots to think before acting: "The robot enters a scene, and instead of directly starting to move, it thinks about what it was asked to do" - Sergey.

A major breakthrough occurred when models could be improved just through high-level semantic supervision rather than requiring more teleoperation data.

"What that means is that the bottleneck had actually shifted from the lowest level, meaning the robot's ability to physically do the task, to this middle level" - Sergey on the evolution toward coaching-based improvement.
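
The split between a reasoning "middle level" and a physical "lowest level" can be pictured with a small Python sketch. This is a toy illustration only, not Physical Intelligence's implementation: every name in it (Observation, think_then_act, toy_planner, toy_controller, coached_planner) is hypothetical, the scene is a list of object names rather than camera images, and "coaching" is modeled as a hand-written change to the planner rather than as model training.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical and exist only to illustrate the idea.

@dataclass
class Observation:
    instruction: str   # what the robot was asked to do
    scene: List[str]   # stand-in for camera input: objects in view

# "Middle level": turns the instruction and scene into a short textual plan.
Planner = Callable[[Observation], List[str]]
# "Lowest level": turns one plan step into a motor command (a string here).
Controller = Callable[[str], str]

def think_then_act(obs: Observation, plan: Planner, control: Controller) -> List[str]:
    """Reason about the task first, then execute each step of the plan."""
    steps = plan(obs)  # thinking happens before any motion
    return [control(step) for step in steps]

def toy_planner(obs: Observation) -> List[str]:
    # Stand-in for the model's chain of thought: one step per visible object.
    return [f"pick up {item}" for item in obs.scene]

def toy_controller(step: str) -> str:
    # Stand-in for low-level action generation.
    return f"EXECUTE<{step}>"

def coached_planner(obs: Observation) -> List[str]:
    # High-level semantic feedback changes the plan only;
    # the low-level controller is left untouched.
    safe = [s for s in toy_planner(obs) if "knife" not in s]
    return safe + ["leave knife for a human"]

if __name__ == "__main__":
    obs = Observation("clear the table", ["cup", "knife"])
    print(think_then_act(obs, toy_planner, toy_controller))
    print(think_then_act(obs, coached_planner, toy_controller))
```

In this toy setup, swapping toy_planner for coached_planner changes what the robot decides to do without any new low-level data, which is the sense in which the bottleneck moves to the middle level.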

Hardware Evolution and Cost Breakthroughs

Robotics hardware costs have dropped dramatically: "When I started working in robotics about a decade ago, I worked with a robot called a PR2, which I believe had a cost of about $400,000" - Sergey.

Current robotic arms cost perhaps one-tenth of the $30,000 systems used when Sergey started his Berkeley lab, with potential for even lower costs.

Low-cost arms wouldn't work in industrial settings with traditional control methods but become viable with advanced learning-based approaches.

The cost reduction involves both hardware and software advances, making general-purpose robotics more practical today than ever before.
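
As a rough check on the figures quoted in this section, the arithmetic can be written out in a few lines of Python. The numbers are the approximate ones given in these notes, not verified prices, and the PR2 was a complete two-armed mobile robot, so the last ratio compares different kinds of hardware and is only indicative.

```python
# Figures as quoted in these notes; all are rough estimates from the conversation.
pr2_cost = 400_000                     # PR2 robot, about a decade ago
lab_arm_cost = 30_000                  # systems used when the Berkeley lab started
current_arm_cost = lab_arm_cost // 10  # "perhaps one-tenth"

arm_reduction = lab_arm_cost / current_arm_cost
pr2_to_arm_ratio = pr2_cost / current_arm_cost

print(f"current arm: ${current_arm_cost:,}")
print(f"arm cost reduction: {arm_reduction:.0f}x")
print(f"PR2 vs current arm: {pr2_to_arm_ratio:.0f}x")
```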

Surprising Capabilities and Robot Olympics Results

Physical Intelligence successfully completed nearly all tasks from Benjie Holson's proposed Robot Olympics, which focuses on everyday human activities.

"We literally use this as a test of our task onboarding process" - Sergey explains how they could handle diverse tasks without developing anything special for each one.

The system demonstrated surprising dexterity and ability to work across different robot embodiments, including multi-fingered hands and varying degrees of freedom.

Tasks included opening doors, washing greasy pans, and using plastic bags to pick up dog waste: mundane activities that challenge current robotic systems.

Timeline Challenges and Bootstrap Problems

"I do think the timeline is uncertain. If anything, my sense of the timeline has gotten more optimistic since we started" - Sergey on the unpredictable nature of progress.

The bootstrap challenge involves reaching sufficient usefulness for real-world deployment so that robots can start collecting data from open-world settings at scale.

Timeline uncertainty is exacerbated by the different possible technology paths: whether systems rely more on demonstrations or on reinforcement learning from autonomous data.

Sergey positions himself as optimistic among established robotics researchers but pessimistic compared to robotics entrepreneurs regarding timelines.

Future Applications and Morphing Robot Concepts

Drawing on Michael Crichton's Prey, Sergey discusses the concept of robots that could morph into optimal shapes for specific tasks rather than being constrained to humanoid forms.

"You could imagine that you're building a house with a robot that is a swarm of 1,000 quadcopters" - Sergey on unconstrained robotic design possibilities.

Foundation models could enable experimentation with diverse form factors, from bulldozers to humanoids to robotic arms, all sharing fundamental interaction understanding.

The key insight is that fundamentals of object interaction, movement, and causality remain conserved across different physical systems.
