PaLM-SayCan: Robots Following Instructions Through Feasible Actions
PaLM-SayCan uses Google's PaLM model to drive robots to take useful, feasible actions in response to user-provided instructions.
Google Research and Everyday Robots have collaborated to develop SayCan, a novel approach to AI-driven robotics that uses language models to follow instructions with actions that are both feasible and useful.
SayCan was first introduced in April of this year, but the new PaLM implementation and the open-sourcing of various resources have sparked fresh discussion around major improvements to the architecture. The project is presented in the paper "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances" alongside a blog post breaking it down.
How does PaLM-SayCan work?
PaLM-SayCan uses Google's PaLM language model to determine the steps needed to carry out a user-provided instruction. It weighs each candidate action by its relevance to the instruction and by whether it is possible in the current context.
Say: Instruction relevance
Any language model could be plugged into SayCan, but the researchers found that Google's PaLM gave the best results. The language model is responsible for choosing the best solution from an array of possible responses by weighing their relevance and usefulness to the user's instruction.
For example, with the instruction "How would you put an apple on the table?" it would rate "Find an apple" highly and "Place the coke" low.
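To make that concrete, here is a minimal sketch of how the "Say" side can be scored in Python. The prompt format and the log_likelihood callable are illustrative assumptions, not the paper's exact implementation:

```python
import math
from typing import Callable

def say_scores(
    instruction: str,
    steps_so_far: list[str],
    skills: list[str],
    log_likelihood: Callable[[str, str], float],
) -> dict[str, float]:
    """Score each candidate skill by how likely the language model
    considers it as the next step of the plan. `log_likelihood(prompt,
    completion)` is a hypothetical stand-in for a scoring call against
    PaLM (or any other language model)."""
    prompt = (
        f"Instruction: {instruction}\n"
        + "".join(f"Step {i + 1}: {step}\n" for i, step in enumerate(steps_so_far))
        + f"Step {len(steps_so_far) + 1}:"
    )
    # Exponentiate the log-likelihoods so the scores act like probabilities
    # that can later be multiplied with the value function's feasibility scores.
    return {skill: math.exp(log_likelihood(prompt, " " + skill)) for skill in skills}
```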
Can: Action feasibility
At the same time, the robot observes its environment and processes that observation through a value function to weigh whether any particular response is possible at the given moment. This value function ignores the instruction entirely and works solely to weigh the feasibility of every action in the action space.
In its initial position, it would rate actions like "Find an apple" and "Go to the table" highly because those are perfectly feasible from the neutral state, while an action like "Pick up the apple" would be rated low because the robot has not yet found an apple to pick up.
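A sketch of the "Can" side, under the same caveats: value_fn stands in for the trained affordance model, and its signature here is an assumption made for illustration:

```python
from typing import Any, Callable

def can_scores(
    observation: Any,
    skills: list[str],
    value_fn: Callable[[Any, str], float],
) -> dict[str, float]:
    """Score each skill's feasibility from the current observation alone,
    ignoring the instruction. `value_fn(observation, skill)` is a hypothetical
    stand-in for the learned value function and should return the estimated
    probability of success in [0, 1]."""
    return {skill: value_fn(observation, skill) for skill in skills}
```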
SayCan: Weighing actions from both sides
By multiplying the weights from both sides, the best action to perform can be selected. This combined decision-making leads to better solutions by encouraging actions that are both useful to the instruction and achievable in the current environment, while discouraging actions that are useful but not feasible and actions that are easily doable but not relevant.

Following up on the established example, the first step the model settles on is "Find an apple". Even though the language model rates "Place the apple" as a better option, the value function knows the robot cannot place an apple it is not yet holding. At the same time, though the value function thinks going straight to a table or counter is the easiest thing to do, the language model knows it would be better to look around for an apple first.
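Combining the two sides is just a product and an argmax. The numbers below are made up to mirror the apple example: the language model prefers "Place the apple", but its low feasibility score pushes the combined choice to "Find an apple":

```python
def select_skill(say: dict[str, float], can: dict[str, float]) -> str:
    """Pick the skill that maximizes relevance times feasibility."""
    return max(say, key=lambda skill: say[skill] * can[skill])

# Illustrative scores only, not real model outputs.
say = {"Find an apple": 0.6, "Go to the table": 0.3, "Place the apple": 0.8}
can = {"Find an apple": 0.9, "Go to the table": 0.9, "Place the apple": 0.1}
print(select_skill(say, can))  # -> "Find an apple"
```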
Creating and following a list of steps
The model does this whole dance a few times to build a list of steps, appending each new step as it goes and updating its knowledge of what it has already done and what is possible going forward. The robot will first find an apple, then pick it up, then go to the table, and so on until the task is finished.
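Put together with the helpers above, that loop could look like the sketch below. robot.observe(), robot.execute(), and the "done" terminating skill are assumed interfaces for illustration:

```python
def run_saycan(instruction, robot, skills, log_likelihood, value_fn, max_steps=10):
    """Hypothetical planning loop: score, pick, execute, append, repeat."""
    steps = []
    for _ in range(max_steps):
        observation = robot.observe()  # assumed robot API
        say = say_scores(instruction, steps, skills, log_likelihood)
        can = can_scores(observation, skills, value_fn)
        skill = select_skill(say, can)
        if skill == "done":  # assumed terminating skill
            break
        robot.execute(skill)  # assumed robot API
        steps.append(skill)
    return steps
```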
Play with SayCan yourself
Alongside the new PaLM implementation of SayCan comes an open-sourced tabletop simulation you can use to try SayCan for yourself. This open-source setup is built to run on GPT-3, however, so you won't get to play with the best-performing PaLM version.
A notebook with all the required code is available here: https://github.com/google-research/google-research/blob/master/saycan/SayCan-Robot-Pick-Place.ipynb