
This new robot can do your laundry

A new foundation model for robotics!
Created on November 1 | Last edited on November 1
While AI has revolutionized fields like image generation, language translation, and medical discovery, physical intelligence — the ability to manipulate and interact with the physical world — remains largely elusive. Complex tasks such as folding a shirt or organizing objects on a table pose fundamental challenges due to the complexity of real-world environments and dynamic interactions. Developing AI that can handle such physical tasks requires more than digital data; it requires real-world embodied experience and sophisticated control mechanisms.
Physical Intelligence (PI) has introduced π0, a foundation model for robots that promises to be a milestone in the pursuit of general-purpose, dexterous robotic capabilities. Designed to take on diverse manipulation tasks, π0 draws on Internet-scale semantic knowledge as well as real-world data from varied robot experiences. This combination allows π0 to respond to physical commands as flexibly as large language models respond to language instructions.
The promise of generalist robot policies

Current robotics largely relies on narrow, pre-programmed instructions, limiting robots to repetitive tasks in static environments, such as assembly lines. To handle complex, real-world tasks, robots must adapt to unique and varying contexts, a skillset that demands immense data and sophisticated algorithms. π0’s generalist approach seeks to overcome these challenges by aggregating experiences across diverse tasks, which enables it to generalize and perform new tasks with minimal additional training. Similar to how generalist models in language outperformed task-specific models, a broad-trained robot model could apply generalist understanding to specialized tasks, making robot training and usage more efficient and adaptable.

Cross-embodiment training and multi-robot data collection

π0’s training relies on a large-scale dataset gathered from multiple robot configurations, each representing different task sets. This includes data from dexterous tasks such as bussing dishes, packing, folding, and assembling — tasks that require different types of movement, sensory processing, and decision-making. This data diversity provides π0 with a generalized understanding of physical interactions, enabling it to perform zero-shot tasks (tasks it hasn’t been explicitly trained on) with proficiency. By covering a broad spectrum of real-world tasks, π0 establishes a foundation for physical intelligence that can be fine-tuned to specific robot configurations or task complexities.
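One practical question this kind of cross-embodiment training raises is how to mix data from platforms with very different episode counts so that no single robot dominates the batch. The sketch below illustrates one common approach — temperature-weighted mixture sampling — with invented dataset names and episode counts; it is not the actual π0 data pipeline.

```python
# Illustrative sketch of cross-embodiment data mixing (hypothetical names
# and counts, not the actual pi0 pipeline): episodes from different robot
# platforms are pooled, and a temperature < 1 flattens the mixture so
# rarer embodiments are sampled more often than their raw share.

datasets = {
    "single_arm_bussing": 900,   # episode counts (invented for illustration)
    "bimanual_folding": 300,
    "mobile_packing": 120,
}

def sampling_weights(counts, temperature=0.5):
    """Convert per-dataset episode counts into mixture weights.

    temperature = 1.0 reproduces proportional sampling; values below 1
    upweight small datasets, values above 1 exaggerate large ones.
    """
    scaled = {k: v ** temperature for k, v in counts.items()}
    total = sum(scaled.values())
    return {k: v / total for k, v in scaled.items()}

weights = sampling_weights(datasets)
print({k: round(w, 3) for k, w in weights.items()})
```

With temperature 0.5, the largest dataset's share drops below its raw proportion (900 of 1320 episodes, ≈0.68), giving the folding and packing data a larger voice in each training batch.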

Semantic understanding from vision-language pretraining

A unique strength of π0 is its integration of a vision-language model (VLM) trained on Internet-scale data, which endows it with pre-existing knowledge about objects and interactions. While VLMs typically output language tokens, π0 adapts these models to output high-frequency motor commands, achieving the real-time control essential for dexterous manipulation. Through a novel flow-matching technique, π0 extends its VLM from text-based outputs to continuous action outputs, which are necessary for complex, high-frequency robot tasks such as folding laundry or assembling items.
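At a high level, flow matching generates an action by integrating a learned velocity field that transports a noise sample toward a chunk of continuous motor commands. The sketch below shows that inference loop in miniature; the network, dimensions, and toy velocity field are all stand-ins for illustration, not the actual π0 architecture or API.

```python
import numpy as np

# Minimal sketch of flow-matching action generation (hypothetical, not the
# pi0 implementation): a velocity field v(A, t, obs) is integrated from
# t = 0 to t = 1, carrying Gaussian noise toward an action chunk.

HORIZON = 50      # action chunk length in timesteps (illustrative)
ACTION_DIM = 7    # e.g. joint targets for a 7-DoF arm (illustrative)

def velocity_field(actions, t, obs):
    """Stand-in for a learned network v_theta(A, t, observation).

    Here it is just a toy field that pulls every sample toward zero;
    in practice this would be a conditioned neural network.
    """
    return -actions

def sample_action_chunk(obs, steps=10, seed=0):
    """Euler-integrate the flow ODE from noise to an action chunk."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((HORIZON, ACTION_DIM))  # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        a = a + dt * velocity_field(a, t, obs)  # one Euler step
    return a

chunk = sample_action_chunk(obs=None)
print(chunk.shape)  # (50, 7)
```

Because the whole chunk is produced in a fixed, small number of integration steps, this style of sampling can run fast enough for high-frequency control — one motivation the π0 work gives for choosing flow matching over token-by-token action decoding.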


Post-training for complex dexterous tasks

Fine-tuning π0 on high-quality datasets allows it to master complex, multi-stage tasks like laundry folding or table bussing. The post-training phase reinforces π0’s generalist knowledge by focusing on precision and efficiency within specific tasks, much like large language models are fine-tuned to improve their response alignment with user needs. In the laundry folding task, for instance, π0 can autonomously retrieve clothing from a dryer, fold it, and organize it on a table. Likewise, in table bussing, π0 demonstrates versatility by employing strategies like stacking plates or shaking off debris before disposal, showcasing emergent behaviors that arise from its robust training foundation.

Evaluating π0 against prior models

Testing π0 on a range of zero-shot and fine-tuned tasks, Physical Intelligence compared it to OpenVLA, Octo, and smaller π0 variants. Tasks included complex real-world applications such as bussing cluttered tables and handling deformable objects like laundry. Across all tasks, π0 consistently outperformed previous models, with the larger version proving significantly more effective than the smaller variant. The evaluations highlight π0’s superiority in handling multi-stage behaviors, deformable object manipulation, and the ability to adapt strategies based on task context.

Future directions for generalist robot policies

Physical Intelligence envisions π0 as a foundation model for general-purpose robots, capable of learning new tasks with minimal additional training. Future developments will focus on enhancing π0’s long-horizon planning, robustness, and safety mechanisms. A collaborative approach, involving partnerships with robotics labs and hardware developers, will be crucial in refining π0’s capabilities and extending its application to real-world use cases.
With π0, Physical Intelligence aims to revolutionize robotic autonomy, bringing robots closer to achieving true physical intelligence. For collaboration or employment inquiries, contact Physical Intelligence at research@physicalintelligence.company.