Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild
Or, teaching four-legged robots to walk in the real world
Introduction
Legged robots can carry out missions in environments that are too remote or too dangerous for humans, such as irradiated areas or the surfaces of other planets. Unlike wheeled or tracked robots of similar size, legged robots can walk over challenging terrain with steep slopes, steps, and gaps.
There has been notable progress in legged robotics in recent years, including BigDog: The Rough-Terrain Quadruped Robot, Learning agile and dynamic motor skills for legged robots, and Learning Quadrupedal Locomotion over Challenging Terrain. There have also been several commercial applications of legged robots in the real world, such as Spot by Boston Dynamics, A1 by Unitree Robotics, and Digit and Cassie by Agility Robotics.
Here are a few of those examples in a W&B Table:
Progress in legged robotics in recent years
Commercial Applications of Legged Robots in the Real World
In spite of the recent progress made with legged robots and their widespread commercial usage, they have been unable to match the performance of animals in traversing challenging real-world terrain. Bipedal animals like humans and quadrupeds like dogs can briskly walk or run in such environments by foreseeing the upcoming terrain and planning their footsteps based on visual information. This is because animals have evolved naturally to combine proprioception and exteroception to adapt to highly irregular terrain shapes and surface properties such as slipperiness or softness, even when visual perception is limited. In other words: we anticipate our next footfalls and instinctually adapt to our environment.
With that preamble, let's look at the question the recent paper Learning robust perceptive locomotion for quadrupedal robots in the wild poses:
Is it possible to endow legged robots with the ability to combine proprioception and exteroception to adapt to highly irregular terrain shape and surface properties irrespective of visual perception?
💡
To put it plainly: can we help robots anticipate changes in terrain too?
Before we proceed any further, let's clarify a little jargon we introduced above:
- Proprioception is the sense of the body's own state: joint positions and velocities, body orientation, and contact forces. It remains reliable even when vision is limited, which is what lets an animal keep walking over real-life terrain, react to unexpected stimuli (for example, the floor it is traversing collapses), and make the necessary corrections to its kinematic parameters.
- Exteroception is the perception of the external environment, primarily through vision. It lets the animal perceive the upcoming terrain at a distance and plan its footsteps before making contact.
Ok, with that done, let's look at some of the challenges here.
What Doesn't Work: Challenges for Building Legged Robots
One of the biggest difficulties in building legged robots that can traverse challenging real-world terrain is reliably interpreting incomplete and noisy perception. Exteroceptive information provided by the sensors onboard the robot is incomplete and often unreliable in most real-world scenarios. Here's why:
Shortcomings of Onboard Sensors
- Stereo camera-based depth sensors, which most existing legged robots rely on, require texture to perform stereo matching and consequently struggle with low-texture surfaces or when parts of the image are under or overexposed.
- Time of Flight cameras often fail to perceive dark surfaces and become noisy under sunlight.
- Depth sensors by nature cannot distinguish soft unstable surfaces such as vegetation from rigid ones.
- An elevation map is often used to represent geometric terrain information extracted from depth sensor measurements. Since it relies on the robot's estimated pose, it is affected by errors in that estimate.
Shortcomings of Existing Approaches
- Pre-computed Terrain Maps: Conventional approaches assume that terrain information and any uncertainties encoded in the map are reasonably accurate, and the focus shifts solely to generating the motion. This works well for offline approaches, which rely on a pre-scanned terrain map, formulate a hand-crafted cost function over it, and optimize a trajectory that is simply replayed on the robot. However, the assumption of perfect knowledge of the full terrain breaks down for online methods, which must use onboard sensors and computation resources to construct a map and continuously replan trajectories during execution. Real-world terrain can't be perfectly mapped.
- Optimization of Footholds: Given accurate terrain information, picking footholds and generating trajectories is achieved in one of the following ways:
- Reducing the planning time with heuristics, or using Convolutional Neural Networks to calculate foothold costs more efficiently, which enables faster locomotion.
- Leveraging a pre-planned motion reference and optimizing the robot's motion online using onboard LiDAR sensor data.
- Data-driven Approaches: These methods can incorporate more complex dynamics without compromising real-time performance. Learning-based quadrupedal and bipedal locomotion for simulated characters has been achieved using reinforcement learning, and recently such RL-based locomotion controllers have been successfully transferred to physical robots. However, these methods don't use any visual information.
- Locomotion Learning + Exteroceptive Information: The paper RLOC: Terrain-Aware Legged Locomotion using Reinforcement Learning and Optimal Control combines a learning-based foothold planner with a model-based whole-body motion controller and transfers the resulting policies to the real world in a laboratory setting. However, the approach is limited to rigid terrain with mostly flat surfaces, is still constrained in its deployment range, and its performance is tightly bound to the quality of the map, which often becomes unreliable in the field.
In both model-based and learning-based approaches, the assumption of flawless map quality precludes the application of these methods in uncontrolled outdoor environments. In all the aforementioned methods, handling uncertainties in terrain perception has remained an unsolved problem. These controllers avoid catastrophic failures by simply refraining from using visual information in outdoor environments or by adding heuristically defined reflex rules.
💡
A Solution to Achieve Robust Perceptive Locomotion
The authors of the paper Learning robust perceptive locomotion for quadrupedal robots in the wild present a terrain-aware locomotion controller for quadrupedal robots that overcomes limitations of previous approaches and enables robust traversal of harsh natural terrain at unprecedented speeds. At its core, the controller is based on a principled solution to incorporating exteroceptive perception into locomotion control.
Controller
A robot's controller is the computing system that commands its actuators in order to control the movement of its limbs or legs. The controller in this case is trained with a reinforcement learning paradigm called the Teacher-Student framework. In this paradigm, one agent, the Teacher, advises another, the Student, by suggesting actions the Student should take while learning a specific task in a sequential decision problem; the Teacher is limited by a budget, which is essentially the number of times such advice can be given.
Problem Formulation
The control problem is formulated in discrete-time dynamics, where the environment is fully defined by the state $s_t$ at time step $t$. The policy performs an action $a_t$ and observes the environment via an observation $o_t$, which comes from an observation model $o_t \sim O(o_t \mid s_t)$. Then, the environment moves to the next state with transition probability $P(s_{t+1} \mid s_t, a_t)$ and returns a reward $r_t$.
When all states are observable such that $o_t = s_t$, this can be considered a Markov Decision Process or MDP. When there is unobservable information, however, such as external forces or full terrain information in this case, the dynamics are modeled as a Partially Observable Markov Decision Process or POMDP.
💡
The reinforcement learning objective is to find a policy $\pi^*$ that maximizes the expected discounted reward over the future trajectory:

$$\pi^* = \arg\max_{\pi} \, \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r_t \right]$$

...where $\gamma \in (0, 1)$ is the discount factor.
Although fully-observable MDPs can be solved with many existing reinforcement learning algorithms, solving POMDPs is more challenging because the environment state is only partially observable. This is often overcome by constructing a belief state from a history of observations, in an attempt to capture the full state. In deep reinforcement learning, this can be done by
- stacking a sequence of previous observations or
- by using architectures that can compress past information such as Recurrent Neural Networks or Temporal Convolutional Networks.
The belief state is the short-term memory of the agent: it maintains the model of the current environment that must be carried between time steps.
💡
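As a concrete illustration (not from the paper), the sketch below contrasts the two strategies: stacking a fixed window of past observations versus compressing the history with a GRU whose hidden state plays the role of the belief state. All dimensions are illustrative.

```python
import torch
import torch.nn as nn

obs_dim, belief_dim, k = 32, 64, 8  # illustrative sizes

history = [torch.randn(obs_dim) for _ in range(k)]  # last k observations

# Strategy 1: stack the previous observations into one flat input vector.
stacked = torch.cat(history)  # shape: (k * obs_dim,)

# Strategy 2: compress the history step by step with a recurrent network;
# the hidden state acts as the agent's belief state between time steps.
gru = nn.GRUCell(obs_dim, belief_dim)
belief = torch.zeros(1, belief_dim)
for obs in history:
    belief = gru(obs.unsqueeze(0), belief)  # belief shape: (1, belief_dim)
```

The stacked variant has a hard memory horizon of k steps, while the recurrent variant can in principle retain information arbitrarily far back, which matters for the belief state estimation discussed later.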
Teacher Policy
A Teacher Policy is first trained via reinforcement learning using the privileged learning paradigm proposed in the paper Learning by Cheating, by virtue of which it has full access to privileged information, such as noiseless terrain measurements, ground friction, and the disturbances introduced during training, in the form of the ground-truth state of the environment. This privileged training enables the teacher policy to discover the optimal behavior given perfect knowledge of the terrain. The teacher is trained to follow a random target velocity command over randomly generated terrain with random disturbances, given by

$$\mathbf{v}^{\text{cmd}} = \left(v_x, \, v_y, \, \omega_z\right)$$

...where:
- $v_x$, $v_y$ represent the longitudinal and lateral components of the random target velocity and
- $\omega_z$ represents the yaw velocity
Policy Architecture
The authors train the Teacher Policy using Proximal Policy Optimization (PPO), where the teacher is modeled as a Gaussian policy given by

$$\pi_\theta\left(a_t \mid o_t\right) = \mathcal{N}\left(\mu_\theta(o_t), \, \sigma^2 I\right)$$

...where:
- $\mu_\theta$ is implemented by a multilayer perceptron parameterized by $\theta$ and
- $\sigma^2$ represents the variance for each action dimension.
The policy consists of three MLP components:
- The exteroceptive encoder receives the height samples $o_t^{\text{ext}}$ and outputs a smaller latent representation $l_t^{\text{ext}}$, such that $l_t^{\text{ext}} = g_{\text{ext}}\left(o_t^{\text{ext}}\right)$
- The privileged encoder receives the privileged state $s_t^{\text{priv}}$ and outputs a latent representation $l_t^{\text{priv}}$
- The main MLP network, which takes the proprioceptive observation together with both latent vectors and outputs the action.
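To make this concrete, here is a minimal PyTorch sketch of the three-component teacher architecture described above. All layer sizes, activations, and dimensions are illustrative assumptions; the actual network configuration is in the paper's supplementary material.

```python
import torch
import torch.nn as nn

class TeacherPolicy(nn.Module):
    """Gaussian teacher policy with exteroceptive and privileged encoders."""

    def __init__(self, prop_dim=48, ext_dim=208, priv_dim=50, act_dim=16):
        super().__init__()
        # Exteroceptive encoder: height samples -> compact latent l_ext.
        self.ext_encoder = nn.Sequential(
            nn.Linear(ext_dim, 128), nn.LeakyReLU(), nn.Linear(128, 64))
        # Privileged encoder: ground-truth state -> latent l_priv.
        self.priv_encoder = nn.Sequential(
            nn.Linear(priv_dim, 64), nn.LeakyReLU(), nn.Linear(64, 32))
        # Main MLP: [proprioception, l_ext, l_priv] -> action mean.
        self.body = nn.Sequential(
            nn.Linear(prop_dim + 64 + 32, 256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.LeakyReLU(), nn.Linear(128, act_dim))
        # Learned log standard deviation, one per action dimension.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, o_prop, o_ext, s_priv):
        l_ext = self.ext_encoder(o_ext)
        l_priv = self.priv_encoder(s_priv)
        mean = self.body(torch.cat([o_prop, l_ext, l_priv], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())
```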
Teacher Observation
The teacher observation is defined as

$$o_t = \left(o_t^{\text{prop}}, \, o_t^{\text{ext}}, \, s_t^{\text{priv}}\right)$$

...where:
- $o_t^{\text{prop}}$ is the proprioceptive observation containing the body velocity, orientation, joint position and velocity history, action history, and each leg's phase.
- $o_t^{\text{ext}}$ is the exteroceptive observation, a vector of height samples around each foot at five different radii.
- $s_t^{\text{priv}}$ is the privileged state, which includes contact states, contact forces, contact normals, friction coefficient, thigh and shank contact states, external forces and torques applied to the body, and swing phase duration.
Action Space
The action space is inspired by Central Pattern Generators (CPGs) for producing rhythmic motor patterns. Each leg $i$ keeps a phase variable $\phi_i$ and defines a nominal trajectory based on that phase. The nominal trajectory is a stepping motion of the foot tip, and the nominal target for each joint actuator is calculated from it using inverse kinematics. The action from the policy is the phase difference $\Delta\phi_i$ and the residual joint position targets $\Delta q_i$. For more detailed information on the action space, refer to supplementary section S5 of the paper.
Central Pattern Generators are neuronal circuits that when activated can produce rhythmic motor patterns such as walking, breathing, flying, and swimming in the absence of sensory or descending inputs that carry specific timing information.
💡
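Here is a rough Python sketch of how such a CPG-style action space can work. The trajectory shape, timing convention, and the q_nominal_fn inverse-kinematics helper are hypothetical stand-ins for illustration, not the paper's implementation (see supplementary section S5 for that).

```python
import numpy as np

def nominal_foot_height(phi, max_clearance=0.2):
    """Nominal vertical stepping trajectory driven by the leg phase.

    phi in [0, 2*pi): first half is swing (foot lifted), second half is
    stance (foot on the ground). The trajectory shape is illustrative.
    """
    return max_clearance * np.sin(phi) if phi < np.pi else 0.0

def step_leg(phi, base_freq, dt, delta_phi, delta_q, q_nominal_fn):
    """Advance one leg by one control step.

    delta_phi and delta_q are the policy's actions: a phase difference and
    residual joint position targets. q_nominal_fn is a hypothetical
    inverse-kinematics routine mapping a foot target to joint angles.
    """
    # Advance the phase at the base frequency, plus the policy's offset.
    phi = (phi + 2.0 * np.pi * base_freq * dt + delta_phi) % (2.0 * np.pi)
    foot_height = nominal_foot_height(phi)
    q_nominal = np.asarray(q_nominal_fn(foot_height))  # IK: foot -> joints
    q_target = q_nominal + np.asarray(delta_q)         # add policy residuals
    return phi, q_target

# Example with a trivial stand-in IK for a 3-joint leg:
phi, q = step_leg(phi=0.0, base_freq=1.25, dt=0.02, delta_phi=0.0,
                  delta_q=np.zeros(3),
                  q_nominal_fn=lambda h: np.array([0.0, 0.7 + h, -1.4]))
```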
Reward Function
The authors design a positive reward for following the command velocity and a negative reward for violating some imposed constraints. The velocity-tracking term takes the form

$$r_v = \exp\left(-\left\lVert \mathbf{v}^{\text{cmd}}_{xy} - \mathbf{v}_{xy} \right\rVert^2\right)$$

...where:
- $\mathbf{v}^{\text{cmd}}_{xy}$ is the desired horizontal velocity and
- $\mathbf{v}_{xy}$ is the current horizontal body velocity with respect to the body frame.
The same reward structure is applied to the yaw command as well. The velocity component orthogonal to the desired velocity is penalized, as is the body velocity around the roll, pitch, and yaw axes. Additionally, the authors use shaping rewards for body orientation, joint torque, joint velocity, joint acceleration, foot slippage, and shank and knee collisions. The body orientation reward discourages awkward body postures, the joint-related terms discourage overly aggressive motion, and the foot slippage and collision terms discourage slipping and collisions. The authors tuned the reward terms by observing the policy's behavior in simulation, checking the smoothness of the locomotion in addition to traversal performance. For more detailed information on reward terms, refer to supplementary section S7 of the paper.
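The sketch below illustrates this reward structure in Python. The functional forms and weights are illustrative assumptions, not the coefficients from supplementary section S7.

```python
import numpy as np

def velocity_tracking_reward(v_body_xy, v_cmd_xy):
    """Positive reward for following the commanded horizontal velocity."""
    err = np.asarray(v_cmd_xy) - np.asarray(v_body_xy)
    return float(np.exp(-np.dot(err, err)))

def shaping_penalties(tau, qd, qdd, foot_slip_speed,
                      w=(1e-4, 1e-3, 1e-6, 0.1)):
    """Negative shaping terms; the weights w are illustrative choices."""
    w_tau, w_qd, w_qdd, w_slip = w
    return -(w_tau * np.sum(tau ** 2)        # joint torque penalty
             + w_qd * np.sum(qd ** 2)        # joint velocity penalty
             + w_qdd * np.sum(qdd ** 2)      # joint acceleration penalty
             + w_slip * np.sum(foot_slip_speed ** 2))  # foot slippage

# Example: near-perfect tracking with modest joint effort.
r = velocity_tracking_reward([0.95, 0.0], [1.0, 0.0]) \
    + shaping_penalties(np.ones(12), np.ones(12), np.ones(12), np.zeros(4))
```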

Overview of the Teacher Policy Training. Source: Figure 5.1 from the paper
Student Policy
After a teacher policy that can traverse varied terrain with the help of privileged information has been trained, it is distilled into a student policy that only has access to information available on the real robot. The authors use the same training environment as for the teacher policy, but add noise to the student's height sample observation, given by $\tilde{o}_t^{\text{ext}} = n\left(o_t^{\text{ext}}, z\right)$, where $n$ is a noise model applied to the height sample input and $z$ parameterizes its magnitude.
Height Sample Randomization
During student training, the authors inject random noise into the height samples using the parameterized noise model $n\left(o_t^{\text{ext}}, z\right)$. Two different types of measurement noise are applied when sampling the heights:
- Shifting scan points laterally
- Perturbing the height values
Each noise value is sampled from a Gaussian distribution $\mathcal{N}\left(0, z_k\right)$, where each $z_k$ is the variance controlling a different noise component for leg $k$. Both types of noise are applied in three different scopes, each with its own noise variance:
- per scan point (sampled at each time step)
- per foot (both episodic and sampled at each time step)
- per episode (constant for all points)
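A minimal NumPy sketch of such a multi-scope noise model is shown below. The parameter names and magnitudes in z are illustrative assumptions, not the paper's values, and the lateral shifting of scan points is omitted (with access to the simulated heightfield, you would re-query it at laterally perturbed locations).

```python
import numpy as np

def noisy_height_samples(heights, z, rng):
    """Perturb clean height samples with per-point, per-foot, and
    per-episode noise. heights has shape (n_feet, n_points)."""
    n_feet, n_points = heights.shape
    noisy = heights.copy()
    noisy += rng.normal(0.0, z["point_step"], (n_feet, n_points))  # per point, each step
    noisy += rng.normal(0.0, z["foot_step"], (n_feet, 1))          # per foot, each step
    noisy += z["foot_episode"][:, None]  # per foot, fixed for the episode
    noisy += z["episode"]                # one global offset per episode
    return noisy

# Example episode setup (all magnitudes are illustrative):
rng = np.random.default_rng(0)
z = {
    "point_step": 0.02,
    "foot_step": 0.01,
    "foot_episode": rng.normal(0.0, 0.05, size=4),  # drawn once per episode
    "episode": float(rng.normal(0.0, 0.03)),
}
clean = np.zeros((4, 52))  # e.g., 4 feet x 52 samples over flat ground
print(noisy_height_samples(clean, z, rng).shape)  # -> (4, 52)
```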

Height Scan Noise Model. Source: Figure 7(A) from the paper
Additionally, the authors define the following three mapping conditions with associated noise parameters to simulate changing map quality and error sources:
- Nominal noise, assuming good map quality during regular operation. This condition is selected at the beginning of a training episode 60% of the time.
- Large offsets through high per-foot noise, simulating map offsets due to pose estimation drift or deformable terrain. Selected 30% of the time.
- Large noise magnitude for each scan point, simulating a complete lack of terrain information due to occlusion or mapping failure. Selected 10% of the time.
Finally, each training terrain is divided into cells, and an additional offset is added to each height sample depending on which cell it was sampled from. This simulates transitions between areas with different terrain characteristics, such as vegetation and deep snow. The noise parameter vector $z$ is also part of a learning curriculum, and its magnitude increases linearly with training duration.

The different noise configurations $z$ used to simulate different operating conditions. "Zero noise" is applied during teacher training, while "nominal noise" represents normal mapping conditions during student training. "Large offset" noise simulates large map offsets due to pose estimation drift or deformable terrain surfaces. "Large noise" simulates a complete lack of terrain information due to occlusion or sensor failure. Source: Figure 7(B) of the paper.
For more detail on the height sample representation, refer to supplementary section S8 of the paper.
Recurrent Belief State Network
When the exteroceptive noise is large, the terrain becomes effectively unobservable, so the dynamics are treated as a POMDP. In addition, the privileged states are not observable because the robot lacks sensors to measure them directly. The policy therefore needs to exploit sequential correlations to estimate the unobservable states. The authors use a recurrent belief state encoder that combines sequences of both exteroception and proprioception to estimate the unobservable states as a belief state. The belief state encoder learns to integrate proprioceptive and exteroceptive data without resorting to heuristics.
The recurrent belief state encoder learns an adaptive gating factor that controls how much exteroceptive information to pass through. First, the RNN encoder is given by

$$x_t = \mathrm{RNN}\left(x_{t-1}, \, o_t^{\text{prop}}, \, l_t^{\text{ext}}\right)$$

...where:
- $o_t^{\text{prop}}$ represents proprioception,
- $l_t^{\text{ext}}$ represents the exteroceptive features computed from the noisy observations, and
- $x_t$ is the hidden state.
Following this, the attention vector $\alpha_t$ is computed from the hidden state $x_t$. It controls how much exteroceptive information enters the final belief state $b_t$. This set of operations is given by

$$\alpha_t = \sigma\left(g_\alpha(x_t)\right), \qquad b_t = g_b(x_t) + \alpha_t \odot l_t^{\text{ext}}$$

...where:
- $g_\alpha$ and $g_b$ are fully-connected networks and
- $\sigma$ represents the sigmoid activation function

Source: Figure 6(C) from the paper
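A compact PyTorch sketch of a gated recurrent encoder of this form follows, assuming the composition written above; the layer sizes and the exact way the gated features enter the belief state are illustrative.

```python
import torch
import torch.nn as nn

class BeliefEncoder(nn.Module):
    """Recurrent belief state encoder with a learned exteroceptive gate."""

    def __init__(self, prop_dim=48, ext_latent_dim=64, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRUCell(prop_dim + ext_latent_dim, hidden_dim)
        self.g_alpha = nn.Linear(hidden_dim, ext_latent_dim)  # gate head
        self.g_b = nn.Linear(hidden_dim, ext_latent_dim)      # belief head

    def forward(self, o_prop, l_ext, hidden=None):
        # Update the hidden state from proprioception + exteroceptive features.
        hidden = self.rnn(torch.cat([o_prop, l_ext], dim=-1), hidden)
        # The gate decides how much exteroception to trust at this step.
        alpha = torch.sigmoid(self.g_alpha(hidden))
        belief = self.g_b(hidden) + alpha * l_ext  # gated belief state
        return belief, alpha, hidden

# Example step with batch size 1 and illustrative dimensions:
enc = BeliefEncoder()
belief, alpha, hidden = enc(torch.randn(1, 48), torch.randn(1, 64))
```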
For the decoder, the same gating mechanism is used to reconstruct the privileged information and the noiseless height samples. This yields a reconstruction loss that encourages the belief state to capture veridical information about the environment.

Source: Figure 6(D) from the paper

Overview of the Student Policy Training. Source: Figure 5.2 from the paper
The Student Policy is learned by imitation learning, a paradigm in which an agent learns a behavior by observing an expert's demonstrations, and the learned skills can generalize to unseen scenarios. This process not only extracts information about the behavior and the surrounding environment, but also learns the mapping between observations and actions.
💡
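A minimal sketch of what such a distillation objective can look like: a behavior-cloning term that imitates the teacher's action, plus the reconstruction term described above. The mean-squared losses and the weighting are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def student_loss(student_action, teacher_action,
                 reconstruction, target_priv_and_heights, recon_weight=0.5):
    """Imitation loss plus reconstruction loss for student training.

    The student's action imitates the (frozen) teacher's action, while
    the decoder output is trained to recover the privileged information
    and the noiseless height samples from the belief state.
    """
    action_loss = F.mse_loss(student_action, teacher_action.detach())
    recon_loss = F.mse_loss(reconstruction, target_priv_and_heights)
    return action_loss + recon_weight * recon_loss
```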
Deploying the Models
The trained controller was deployed on the ANYmal-C robot by ANYbotics in a zero-shot manner, without any fine-tuning, with two different sensor configurations: one based on LiDAR and one based on active stereo depth cameras.
The controller takes in the observations from the onboard sensors together with the desired velocity command, and outputs the action, given by the target position of each joint of the robot.
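Conceptually, deployment reduces to a fixed-rate loop like the sketch below. Every interface here (robot.read_proprioception, elevation_map.sample_around_feet, and so on) is a hypothetical stand-in for the actual drivers and mapping pipeline, not an API from the paper.

```python
import time

CONTROL_DT = 1.0 / 50.0  # the policy runs at 50 Hz

def control_loop(robot, policy, elevation_map, belief_encoder):
    """Simplified 50 Hz deployment loop with hypothetical interfaces."""
    hidden = None
    while True:
        t0 = time.monotonic()
        o_prop = robot.read_proprioception()          # IMU, joint states, history
        heights = elevation_map.sample_around_feet()  # latest 2.5D map samples
        v_cmd = robot.read_velocity_command()         # operator's desired velocity
        belief, _, hidden = belief_encoder(o_prop, heights, hidden)
        q_target = policy(o_prop, belief, v_cmd)      # joint position targets
        robot.send_joint_targets(q_target)            # tracked by joint PD control
        time.sleep(max(0.0, CONTROL_DT - (time.monotonic() - t0)))
```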
Elevation Map
A robot-centric 2.5D elevation map is built at 20 Hz by estimating the robot's pose and registering the point-cloud readings from the sensors accordingly. The policy runs at 50 Hz and samples heights from the latest elevation map, filling in a randomly sampled value wherever no map information is available at a query location. The authors built the elevation mapping pipeline on a GPU to parallelize point-cloud processing, updating the map in a Kalman-filter fashion and additionally performing drift compensation and ray casting to obtain a more consistent map. This fast mapping implementation was crucial to maintaining high processing rates and keeping up with the fast locomotion speeds achieved by the controller.
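The per-cell fusion behind a Kalman-filter-style map update can be sketched in a few lines. This is the textbook one-dimensional update, shown only to illustrate the idea; it is not the authors' GPU implementation.

```python
def update_height_cell(h, var, z, var_z):
    """Fuse a new height measurement z (variance var_z) into a map cell
    holding estimate h (variance var); returns the updated pair."""
    k = var / (var + var_z)      # Kalman gain: how much to trust the measurement
    h_new = h + k * (z - h)      # blend the old estimate with the measurement
    var_new = (1.0 - k) * var    # the fused estimate is more certain
    return h_new, var_new

# Example: a confident cell barely moves toward a noisy outlier reading.
print(update_height_cell(h=0.10, var=0.0001, z=0.50, var_z=0.01))
```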

The robot-centric elevation map is used by the locomotion controller to anticipate upcoming terrain. The red dots along the foothold represent the exteroceptive observation which is a vector of height samples around each foot with five different radii. Source: https://www.youtube.com/watch?v=zXbb6KQ0x
The Elevation Mapping pipeline was inspired by the paper Probabilistic Terrain Mapping for Mobile Robots With Uncertain Localization.
💡

Overview of the Deployment Process. Source: Figure 5.3 from the paper
Results
Robust Locomotion in the Wild
The proposed controller was deployed in a wide variety of terrain including alpine, forest, underground, and urban environments as showcased in the following panel. The controller was consistently robust and had zero falls during all deployments. Because of the exteroceptive perception, the robot could anticipate the terrain and adapt its motion to achieve fast and smooth walking. This was particularly notable for structures that require high foot clearance, such as stairs and large obstacles. Notable performance highlights include:
- Traversal of Natural Environments: ANYmal-C with the proposed controller was able to successfully traverse challenging natural environments with steep inclination, slippery surfaces, grass, and snow as demonstrated by figures A-J in the following panel. The performance of the controller was robust in these conditions, even when occlusion and surface properties such as high reflectance impeded exteroception.
- Traversal of Underground Environments: ANYmal-C with the proposed controller was also robustly deployed in underground environments with loose gravel, sand, dust, water, and limited illumination as demonstrated by figures K-N in the following panel.
- Traversal of Urban Environments: Urban environments present several unique and important challenges, such as the traversal of stairs. While SoTA quadruped robots such as Spot from Boston Dynamics require a dedicated engagement mode and proper orientation for traversing stairs, the proposed controller does not require any special mode and can traverse stairs natively in any direction and orientation, such as sideways, diagonally, or turning around on the stairway, as demonstrated by figures O-R in the following panel.
- Robust to Combinations of Different Challenges: As demonstrated by figure R in the following panel, the controller remained consistently robust, with zero failures, while traversing snow-covered stairs. Snow not only makes stairs slippery; it also causes the sensors to receive incomplete and erroneous exteroceptive data. Depth sensors either fail due to the high reflectivity of snow or estimate the surface profile to be on top of the snow, whereas the robot's legs sink below this level. Foot slippage in snow can also cause large drift in the kinematic pose estimation, making the map even more inconsistent.
Robust Locomotion in the Wild by ANYmal-C
A Hike in the Alps
To further evaluate the robustness of the proposed controller, the authors conducted a hiking experiment testing whether ANYmal could complete an hour-long hiking loop on the Etzel mountain in Switzerland. The hiking route was 2.2 km long, with an elevation gain of 120 m. Completing the trail required traversing steep inclines, high steps, rocky surfaces, slippery ground, and tree roots. Despite all these challenges, the robot successfully completed the entire hike without any failure, stopping only to fix a detached shoe and swap batteries. Key highlights from this experiment:
- ANYmal was able to reach the summit in 31 minutes, which is faster than the expected human hiking duration indicated in the official signage.
- ANYmal finished the entire path in 78 minutes, virtually the same duration suggested by a hiking planner, which predicted 76 minutes and rates the hike as difficult. The difficulty levels (easy, moderate, or difficult) are calculated by combining the required fitness level, sport type, and technical complexity, as per the Komoot Help Guides.
- During the hike the controller faced various challenges:
- The ascending path reached inclinations of up to 38% with rocky and wet surfaces.
- On the descent through a forest, tree roots formed intricate obstacles and the ground proved very slippery.
- Vegetation above the robot sometimes introduced severe artifacts into the estimated elevation map.
Despite all the challenges in the terrain, ANYmal finished the hike without any human help and without a single fall, stopping only to fix a detached shoe and swap batteries.
💡
A Hike in the Alps
Exteroceptive Challenges
The robot perceives the environment in the form of height samples from an elevation map constructed from point-cloud input. The authors used LiDAR in some experiments (figures 3-7 in the following table) and active stereo cameras in others (figures 2 and 3 in the following table) to test the robustness of the controller with respect to the sensing modality. The controller often encountered circumstances in which exteroception provides incomplete or misleading input, as shown in the cases demonstrated in the following table (figures 2-7). The estimated elevation map can be unreliable due to sensing failures, limitations of the 2.5D height map representation, or viewpoint restrictions of onboard sensing.
- Since most depth sensors rely on light to infer distance, either through time-of-flight measurements or stereo disparity, they commonly struggle with reflective or translucent surfaces.
- Figure 2 in the following table demonstrates one such case of sensing failure due to a reflective metal floor that induced large depth outliers which appear as a trench in the elevation map.
- Figure 3 in the following table demonstrates another such case of sensing failure, due to snow. Since snow is highly reflective and has very little texture, the stereo cameras could not infer depth, which led to an empty map.
- The 2.5D elevation map representation cannot accurately represent overhanging objects such as tree branches or low ceilings as demonstrated by figure 4 in the following table. These were integrated into the height field and were misrepresented as tall obstacles.
- Since the sensors cannot distinguish between rigid or soft materials, the map gave misleading information in soft vegetation or deep snow as demonstrated by figure 5 in the following table.
- Slippery or deformable surfaces caused odometry drift because they violate the assumption of stable footholds, commonly adopted by kinematic pose estimators. Since map construction relies on such pose estimation to register consecutive input point clouds, the map became inaccurate in such circumstances as demonstrated by figure 6 in the following table.
- Since the sensors were only located on the robot itself, areas behind structures were occluded and not presented in the map, which was especially problematic during uphill walking as demonstrated by figure 7 in the following table.
Note that the proposed controller could handle all of these challenging conditions gracefully, without a single failure. This was made possible by training the belief state estimator to assess the reliability of exteroceptive information. When exteroceptive information was incomplete, noisy, or misleading, the controller could always gracefully degrade to proprioceptive locomotion, which was shown to be robust. The controller thus aims to achieve the best of both worlds: achieving fast predictive locomotion when exteroceptive information is informative, but seamlessly retaining the robustness of proprioceptive control when it is not.
Exteroceptive Challenges
Note that the aforementioned table was constructed based on Figure 3 from the paper.
💡
Evaluating the Contribution of Exteroception
The authors conduct several controlled experiments to quantitatively evaluate the contribution of exteroception and compare the proposed controller to a proprioceptive baseline that does not use exteroception.
Success Rate of Overcoming Fixed-height Steps
Wooden steps of various heights (from 12 cm to 36.5 cm) were placed ahead of the robot, which performed 10 trials to overcome each step under a fixed velocity command. A trial was considered successful if the robot overcame the step within 5 seconds. The success rate of the proprioceptive baseline dropped at a 20 cm step height, where the front legs started frequently getting stuck on the step. Even when the front legs successfully overcame the step, the hind legs often failed to fully step up. In contrast, the proposed controller reliably traversed steps of up to 30.5 cm in height. Since the controller could anticipate the step, it lifted its legs higher without making physical contact first and leaned its body forward to let the hind legs swing over the step (Figure 4A). Up to this height, the dominant failure mode was the robot evading the step sideways rather than falling. When approaching steps higher than 32 cm, the controller hesitated to walk forward because it learned that steps of such height are at or above the robot's physical limits and likely to incur a high cost.
Success Rate of Proposed vs. Baseline Controller Dealing with Fixed-height Steps. Source: Figures 4A-B from the paper
Tracking a Path over an Obstacle Course
The robot was given a fixed path over the obstacles and tracked it using a pure pursuit controller. The path traversed several types of obstacles: an inclined platform, a raised platform, stairs, and a pile of blocks. The platforms are 20 cm high, the stairs are each 17 cm high and 29 cm deep, and the blocks are each 20 cm in both height and depth. The proposed controller followed the given path smoothly without any assistance: the exteroceptive perception provided advance information on the upcoming obstacles, allowing the controller to adjust the robot's motion before making contact and facilitating fast, smooth motion through the obstacle course. The baseline, however, failed to track the path without human assistance. During execution, it got stuck on all three obstacles, and the authors had to lift and push the robot to continue the experiment.

Source: Figure 4C from the paper
Comparison of Locomotion Speed
The authors measured the maximum locomotion speed of both controllers over flat ground and in the presence of obstacles. Each controller was given a constant forward, lateral, or turning command, and the velocity was recorded on flat ground and over a 20 cm step. On flat ground, the proposed controller walked at 1.2 m/s in both the forward and lateral directions, while the baseline could only achieve 0.6 m/s. The difference became even more pronounced over the obstacle: the proposed controller traversed it without any notable slow-down, while the baseline was stymied. The turning velocity showed the biggest gap between the two policies. The proposed controller could turn at 3 rad/s while the baseline could only turn at 0.6 rad/s, a five-fold difference.
Note that the baseline controller only receives a directional command and learns to walk as fast as possible in the commanded direction.
💡
Comparison of Locomotion Speed. Source: Figures 4G-H from the paper
Evaluating Robustness of the Controller with Belief State Visualization
The authors conducted a number of controlled experiments to examine how the controller integrates proprioception and exteroception. The experiments were performed with two types of obstacles that provide ambiguous or misleading exteroceptive input: an opaque foam obstacle that appears solid but cannot support a foothold, and a solid but transparent obstacle. Each obstacle was placed ahead of the robot, and the robot was commanded to walk forward at a constant velocity.
- Foam Block Obstacle: The sensors perceived the foam block as solid and the robot consequently prepared to step on it but could not achieve a stable foothold due to the deformation of the foam. We can see from the respective figure in the panel attached below how the internal belief state (blue) was revised as the robot encounters the misleading obstacle: the controller initially trusted the exteroceptive input (red) but quickly revised its estimate of terrain height upon contact. Once the correct belief had been formed, it was retained even after the foot left the ground, showing that the controller retains past information due to its recurrent structure.
- Transparent Obstacle: The transparent obstacle is a block made of clear, acrylic plates, which are not accurately perceived by the onboard sensors. We can see from the respective figure in the panel attached below, that the robot walked as if it were on flat ground until it made contact with the step, at which point it revised its estimate of terrain profile upwards and changed its gait accordingly.
- Sensors Covered: The authors simulated complete exteroception failure by physically covering the sensors, thus making them fully uninformative. The robot was commanded to walk up and down two steps of stairs. With an unobstructed sensor, the controller traversed the stairs gracefully, without any unintended contact with the stair risers, adjusting its footholds and body posture to step down the stairs softly. When the sensors were covered, the map had no information and the controller received random noise as input. In this condition, the robot made contact with the riser of the first stair, which could not be perceived in advance, revised its estimate of the terrain profile, adjusted its gait accordingly, and successfully climbed the stairs. On the way down, the blinded robot made a hard landing with its front feet but kept its balance and stepped down softly with its hind legs.
- Slippery Surface: Lastly, the authors tested the locomotion over an elevated slippery surface. After the robot stepped onto the slippery platform, it detected the low friction and adapted its behavior to step faster and keep its balance. The momentarily sliding feet violated the assumption of the kinematic pose estimator, which in turn destabilized the estimated elevation map and rendered exteroception uninformative during this time. The controller seamlessly fell back on proprioception until the estimated elevation map stabilized and exteroception became informative again.
Evaluating Robustness of the Controller with Belief State Visualization. Source: Figure 4 from the paper
DARPA Subterranean Challenge
The proposed controller was used as the default controller in the DARPA Subterranean Challenge missions of team Cerberus, which won first prize in the finals. In this challenge, the controller drove ANYmal robots to operate autonomously over extended periods of time in underground environments with rough terrain, obstructions, and degraded sensing in the presence of dust, fog, water, and smoke. The controller played a crucial role, enabling four ANYmals to explore over 1,700 m across all three types of courses (tunnel, urban, and cave) without a single fall.
For more detailed information and insights on this amazing work, we recommend you check out the official project page corresponding to the paper.
💡
Possibilities of Future Works
- Future work could explicitly utilize the uncertainty information in the belief state. Currently, the policy uses uncertainty only implicitly to estimate the terrain. For example, in front of a narrow cliff or a stepping stone, the elevation map does not provide sufficient information due to occlusion. Therefore, the policy assumes a continuous surface and, as a result, the robot might step off and fall. Explicitly estimating uncertainty may allow the policy to become more careful when exteroceptive input is unreliable, for example using its foot to probe the ground if it is unsure about it.
- The current implementation of the controller obtains perceptual information through an intermediate state in the form of an elevation map, rather than directly ingesting raw sensor data. This has the advantage that the model is independent of the specific exteroceptive sensors. However, the elevation map representation omits detail that may be present in the raw sensory input and may provide additional information concerning material and texture.
- Elevation map construction in the current implementation of the policy relies on a classical pose estimation module that is not trained jointly with the rest of the system. Appropriately folding the processing of raw sensory input into the network may further enhance the speed and robustness of the controller.
- An occlusion model could also be learned, such that the policy understands that there’s an occlusion behind the cliff and avoids stepping off it.
- Another limitation of the current implementation is its inability to perform locomotion tasks that require maneuvers very different from normal walking, for example recovering from a leg stuck in a narrow hole or climbing onto a high ledge. Future work could attempt to address this.
Conclusion
The proposed approach achieves substantial improvements over the previous SoTA approach in locomotion speed and obstacle traversability while maintaining exceptional robustness. It achieves these improvements by combining multi-modal perception, i.e., both proprioception and exteroception. The belief state encoder is trained end-to-end to integrate proprioceptive and exteroceptive data without resorting to heuristics. It learns to take advantage of the foresight afforded by exteroception to plan footholds and accelerate locomotion when exteroception is reliable, and it can seamlessly fall back to robust proprioceptive locomotion when needed. The learned controller thus combines the best of both worlds: the speed and efficiency afforded by exteroception and the robustness of proprioception. Since the policy has been trained to handle significant noise, bias, and gaps in the elevation map, the robot can continue walking even when mapping fails or the sensors are physically broken.
The novel controller policy proposed by the authors in this paper results in the first rough-terrain legged locomotion controller that combines the speed and grace of vision-based locomotion with the high robustness of proprioception. The authors validated the combination of speed and high robustness achieved by the proposed controller through controlled experiments and extensive deployments in the wild, including an hour-long hiking route in the Alps that is rated difficult by existing hiking planners. The entire route was completed by the robot without human assistance (other than reattaching a detached shoe and swapping the batteries), in the recommended time for completion of this route by human hikers.
The breakthroughs showcased by the authors expand the operational domain of legged robots and open up new frontiers in autonomous navigation. Navigation planners no longer need to identify ground types or switch modes during autonomous operation. As noted above, the proposed controller served as the default controller in the DARPA Subterranean Challenge missions of the winning team Cerberus.