Robots that fold laundry, light candles, or even make a sandwich on command might sound like science fiction, but Chelsea Finn’s work is bringing these everyday tasks closer to reality than ever before. In a fascinating conversation, Finn unpacks the immense challenges of teaching robots to handle the messy, unpredictable physical world—and the innovative strategies her team is using to overcome them. From the complexity of simple chores to the promise of general-purpose robotic intelligence, this deep dive reveals how robotics is evolving beyond rigid machines into adaptable helpers.
One of the biggest hurdles in robotics today is that every new task often requires building a whole new system from scratch. “You need to build a different company for logistics, for wet lab automation, for robots in kitchens, for surgical robots,” Finn explains. This means designing new hardware, writing custom software, and figuring out how a robot should move for each specific job. Her company, Physical Intelligence, aims to change that by creating a “general purpose model” for robots—one flexible system that can learn to do any task in any environment. Drawing a parallel to language models behind chatbots, Finn notes, “If you want to build a coding assistant, you don’t nowadays develop something specifically for coding, but you build on models trained on large amounts of data.” The goal is to bring this kind of adaptable intelligence from the digital realm into the physical world.
But teaching robots to perform real-world tasks isn’t just about having lots of data. Finn highlights three main sources—industrial automation, YouTube videos, and simulations—and points out their limitations. Industrial data is massive but repetitive and lacks diversity; YouTube shows humans doing countless actions, but watching isn’t the same as doing; and simulations scale cheaply, but the gap between simulated and real-world physics limits how well learned skills transfer. “Scale is necessary... but it’s not sufficient for the entire problem,” she says. To truly teach robots, the data must be both large and high quality, capturing the complexity of real environments.
This insight is clear in the team’s work on folding laundry—a deceptively difficult task. Folding a single shirt might seem simple, but the robot must handle variability in clothing types, crumpled shapes, and unpredictable positions. Early attempts using imitation learning on shirts presented in controlled conditions worked well, but when faced with mixed laundry piles, the robot’s success rate plummeted to zero. After months of trial and error, the breakthrough came by borrowing ideas from language models: “We pre-train on all the data and then fine-tune on a curated, consistent, high-quality set of demonstration data.” This approach enabled the robot to fold five items in a row, a milestone that made Finn “go home very excited.” Though slow—20 minutes for five items—the progress was undeniable.
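To make the recipe concrete, here is a minimal sketch of the pre-train-then-fine-tune idea in a simple behavior-cloning setup. The dataset stubs, the small network, and the hyperparameters are placeholder assumptions for illustration, not Physical Intelligence’s actual models or data.

```python
# Two-stage recipe: pre-train on the full, heterogeneous corpus,
# then fine-tune on a small, curated, high-quality subset.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_dataset(num_samples, obs_dim=512, act_dim=14):
    # Hypothetical stand-in: flattened observation features -> joint targets.
    obs = torch.randn(num_samples, obs_dim)
    act = torch.randn(num_samples, act_dim)
    return TensorDataset(obs, act)

pretrain_data = make_dataset(10_000)  # large, diverse, mixed-quality demonstrations
finetune_data = make_dataset(500)     # small, consistent, curated demonstrations

policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 14))

def train(policy, dataset, epochs, lr):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, act in loader:
            loss = nn.functional.mse_loss(policy(obs), act)  # behavior-cloning loss
            opt.zero_grad()
            loss.backward()
            opt.step()

train(policy, pretrain_data, epochs=3, lr=1e-3)   # stage 1: broad pre-training
train(policy, finetune_data, epochs=10, lr=1e-4)  # stage 2: curated fine-tuning
```

The key design choice mirrors the quote: the broad stage gives the policy coverage of messy variation, while the curated stage pulls its behavior toward consistent, high-quality executions.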
The team’s improvements didn’t stop there. By scaling up to larger models such as PaliGemma, a 3-billion-parameter vision-language model, the robot gained the ability to handle unseen clothing items and recover from interruptions mid-task. Finn describes moments of impressive dexterity, such as the robot reaching under tricky parts of shorts to flatten them, mimicking human-like finesse. Beyond laundry, the same training recipe helped robots tackle other household chores like cleaning tables, scooping coffee beans, and lighting candles. Testing in over 100 different homes showed the models could generalize to new environments, a huge leap from robots that only work in their training spaces. Of course, there were still amusing failures—like mistaking an oven for a drawer or struggling with a thin cutting board—reminders that the technology is impressive but far from perfect.
Finn’s team also explores how robots can understand and respond to open-ended human prompts using hierarchical vision-language-action (VLA) models. Imagine telling a robot, “Make me a sandwich,” and having it break down that request into smaller steps—“pick up one slice of bread,” “add lettuce,” and so on—while a low-level model handles the physical motions. This two-tiered system acts like a smart planner paired with a skilled doer. Since collecting real human-robot interaction data is costly, the team cleverly uses language models to generate synthetic prompts based on existing robot videos, expanding the robot’s ability to handle diverse instructions without endless real-world examples.
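The division of labor is easier to see in code. Below is a minimal, hypothetical sketch of the two-tier loop: a high-level planner turns the open-ended prompt plus the current observation into short subtask instructions, and a low-level policy turns each instruction into motor commands. The class names, the fixed sandwich plan, and the 14-dimensional action chunk are illustrative assumptions; in a real system both tiers would be learned models, not the stubs shown here.

```python
# Sketch of a hierarchical prompt -> subtask -> action loop.
from dataclasses import dataclass

@dataclass
class Observation:
    image: bytes          # camera frame
    proprioception: list  # joint positions

class HighLevelPlanner:
    """Maps (prompt, observation) -> the next short subtask instruction."""
    def next_subtask(self, prompt: str, obs: Observation) -> str | None:
        # A real planner would query a vision-language model; this stub
        # just walks a fixed plan for the sandwich example.
        plan = ["pick up one slice of bread", "add lettuce",
                "add tomato", "place the top slice of bread"]
        step = getattr(self, "_step", 0)
        if step >= len(plan):
            return None          # task complete
        self._step = step + 1
        return plan[step]

class LowLevelPolicy:
    """Maps (subtask instruction, observation) -> a chunk of motor actions."""
    def act(self, instruction: str, obs: Observation) -> list[float]:
        return [0.0] * 14        # placeholder action chunk

def run(prompt, get_obs, planner, policy, send_actions):
    # Alternate between planning the next subtask and executing it.
    while (subtask := planner.next_subtask(prompt, get_obs())) is not None:
        send_actions(policy.act(subtask, get_obs()))

if __name__ == "__main__":
    obs = Observation(image=b"", proprioception=[0.0] * 14)
    run("Make me a sandwich", lambda: obs, HighLevelPlanner(), LowLevelPolicy(),
        send_actions=lambda chunk: print("executing action chunk of length", len(chunk)))
```

Because the planner re-reads the prompt and the scene at every step, changing the request mid-task only changes which subtasks it emits; the low-level policy never needs to know why.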
What’s truly exciting is the robot’s adaptability. It can adjust to changes on the fly—asked for a vegan sandwich with no pickles, it leaves out the meat, cheese, and pickles; interrupted with a new request, it swaps out ingredients accordingly. This dynamic understanding marks a shift from rigid programming to flexible, context-aware behavior. Finn points out that while large AI models excel at language, they often fall short in robotics because they lack physical and visual experience. Her team’s specialized models outperform these general ones by integrating vision and action.
During the discussion, Finn reflects on the balance between academia and industry in robotics research. Industry labs have more resources but can sometimes waste them on unfocused experiments, while academia’s limited means encourage creative problem-solving. She also emphasizes that while synthetic data and simulations are valuable, there’s no substitute for real robot experiences in building truly intelligent machines. Reinforcement learning, where robots learn by trial and error, can further enhance skills beyond imitation.
Looking ahead, Finn envisions a future where robots possess “physical intelligence”—not just specialized tools but versatile helpers capable of a wide range of tasks. She encourages collaboration through open-source projects, highlighting the importance of community in advancing the field.
Chelsea Finn’s work offers a compelling glimpse into how robots are evolving from rigid automatons into adaptable partners that can understand complex instructions, learn from diverse experiences, and handle the unpredictable nature of the real world. The journey from folding a crumpled shirt to making a personalized sandwich may still be unfolding, but the progress made so far signals a future where robots could genuinely lend a hand with everyday chores.