
The ML Tasks Of Autonomous Vehicle Development

This report goes through the different tasks in the autonomous vehicle development lifecycle and the various machine learning techniques associated with them.
While we're still a ways away from level 5 autonomy, many autonomous vehicle researchers agree on the most vital tasks required to get there. In this report, we'll look at many of these tasks and their subtasks and explore how different machine learning methods can be used to solve them. We'll end by putting all the pieces of the puzzle back together and seeing what the end-to-end pipeline looks like.
Let's get going.





Breaking Down Autonomous Vehicle Tasks

Let's start by breaking down the major steps researchers are working on today:

Perception

"Perception" in autonomous driving refers to the task of determining where objects like cars and pedestrians are, which lane we're driving in, etc. This machine learning task coordinates different sensors to perceive the environment around the vehicle.
These sensors include sonar, radar, LiDAR, cameras, and more. There's an added bit of complexity here too: some objects don't move (static elements) while others do (dynamic elements), and it's important to distinguish between the two.

Static Elements

Static elements don't move. You can split them into two broad categories:
On-Road Elements
  1. Roads and lane markings
  2. Markings that designate regions on the road, like zebra crossings, and painted road messages such as "school ahead."
  3. Road obstructions such as cones, lane dividers, and construction signs
  4. Free Space
Off-Road Elements
  1. Curbs that define the boundaries within which the vehicle needs to be driven.
  2. Traffic signals that periodically change and indicate whether the vehicle is allowed to go straight, turn left, or turn right.
  3. Road signs, like those showing the speed limit, giving directions, or indicating that a hospital or school is nearby.

Dynamic Elements

These are the elements that move around the vehicle and need to be modeled in that context.
These include:
  1. Four-wheelers like trucks, buses, cars, and so on.
  2. Two-wheelers, like motorcycles, bicycles, and so forth.
  3. Pedestrians
We split these elements into two separate categories because moving objects need to be modeled differently from stationary ones.

Element Identification

Now that we know what we're looking for and have the data coming in from our various sensors, let's start picking these elements out of it. The most intuitive form of data comes from the cameras.
These images can be analyzed using conventional methods in computer vision such as:
  • 2D Object Detection - 2D object detection aims to find the different objects in an image and label each one with a bounding box and an object class, identifying the elements present near the AV (a minimal code sketch follows this list of camera-based techniques).
  • Semantic Segmentation - With semantic segmentation, each pixel in an image is labeled with the class of object it belongs to. This is more precise than drawing bounding boxes around each object and is important for tasks like finding the drivable area, the location of sidewalks, and so on.
You can explore what this type of data looks like in the interactive visualization panels below in Weights & Biases.

  • Depth Estimation - Depth is extracted from either monocular (single-view) or stereo (multiple views of a scene) images. Traditional methods use multi-view geometry to find the relationship between the images. The panel below shows an example of a depth map, and a minimal stereo sketch follows it.

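To make the multi-view idea concrete, here is a minimal sketch of classical stereo depth using OpenCV's block matcher. It assumes a rectified stereo pair saved as left.png and right.png, and the focal length and baseline values are made up for illustration; a real AV stack would use calibrated cameras and either more robust matching or a learned depth network.

```python
import cv2
import numpy as np

# Hypothetical file names for a rectified stereo pair from the AV's cameras.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classic block matching: for each pixel, find the horizontal shift (disparity)
# between the two views. Larger disparity means the point is closer to the car.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Depth is inversely proportional to disparity: depth = focal_length * baseline / disparity.
focal_length_px, baseline_m = 700.0, 0.54   # assumed calibration values
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_length_px * baseline_m / disparity[valid]
```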

  • Pose Estimation - Pose estimation is a computer vision technique that predicts and tracks the position and orientation of a person or object, for example the heading of a nearby pedestrian or vehicle.
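As an illustration of the 2D object detection step above, here is a minimal sketch that runs a COCO-pretrained detector from torchvision (assuming a recent version, 0.13+) on a single camera frame. The file name frame.jpg and the 0.5 score threshold are placeholders; a production perception stack would use a detector trained on driving data (vehicles, pedestrians, cyclists, traffic lights) and run on a continuous stream of frames. A semantic segmentation model could be swapped in the same way.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a detector pretrained on COCO (includes cars, trucks, people, bicycles, traffic lights).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("frame.jpg").convert("RGB")   # one camera frame (placeholder path)

with torch.no_grad():
    prediction = model([to_tensor(frame)])[0]

# Keep only confident detections; each one is a bounding box plus a class label.
for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if score.item() >= 0.5:
        print(f"class={label.item():3d} score={score.item():.2f} box={box.tolist()}")
```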
For sensors like LiDAR, which provide 3D information, the techniques used by machine learning engineers are quite different. Fundamentally, the tasks include:
  1. 3D Object Detection - Similar to 2D object detection but finds a bounding box in 3D.
  2. Point Cloud Segmentation - Similar to semantic segmentation but works in 3D.
You can see an interactive visualization of LiDAR data in the Weights & Biases panel below.
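As a small illustration of working with raw LiDAR data, the sketch below loads a KITTI-style point cloud (an assumed format: N x 4 floats of x, y, z, reflectance) and performs a crude ground/non-ground split by height, a common preprocessing step before 3D object detection or point cloud segmentation.

```python
import numpy as np

# KITTI-style binary point cloud: N x (x, y, z, reflectance). Assumed format and file name.
points = np.fromfile("lidar_sweep.bin", dtype=np.float32).reshape(-1, 4)

# Crop to a region of interest around the vehicle (meters, sensor frame).
x, y, z = points[:, 0], points[:, 1], points[:, 2]
roi = (np.abs(x) < 50) & (np.abs(y) < 25) & (z > -3) & (z < 2)
points = points[roi]

# Very crude "segmentation": points near the expected ground height are ground,
# everything else is a potential obstacle. Real pipelines fit a ground plane
# (e.g. with RANSAC) or use a learned point cloud segmentation network.
ground_height = -1.7          # assumed sensor height above the road
is_ground = np.abs(points[:, 2] - ground_height) < 0.2
obstacles = points[~is_ground]
print(f"{is_ground.sum()} ground points, {len(obstacles)} obstacle points")
```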







Localization and Mapping

Now that we have an idea of what's around us, we create a "3D map" of all the objects perceived in the previous step so the vehicle can decide what to do next.
In the case of autonomous vehicles, this map is constructed using a priori mapping and simultaneous localization and mapping (SLAM) algorithms. However, localization also has to be considered from multiple perspectives, including:
  1. Local Localization - Local localization finds the position of the vehicle in its "current context," i.e. relative to the elements present in its immediate surroundings, using sensor data.
  2. Global Localization - Global localization entails finding the location of the vehicle in the wider world to make decisions like where it should go next. A common way of doing this is using GPS data (a small sketch follows this list).
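As a minimal sketch of how a GPS fix can feed global localization, the snippet below converts latitude/longitude to local metric coordinates relative to a reference point using an equirectangular approximation. This is an illustrative simplification; real systems use proper geodetic conversions (e.g. UTM or local ENU frames) and fuse GPS with odometry and IMU data.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; good enough for a local approximation

def gps_to_local_xy(lat, lon, ref_lat, ref_lon):
    """Convert a GPS fix to (east, north) meters relative to a reference fix,
    using an equirectangular approximation that is accurate over short distances."""
    d_lat = math.radians(lat - ref_lat)
    d_lon = math.radians(lon - ref_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(ref_lat))
    north = EARTH_RADIUS_M * d_lat
    return east, north

# Example: vehicle fix relative to the start of the route (made-up coordinates).
print(gps_to_local_xy(37.4221, -122.0841, 37.4200, -122.0860))
```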

A Priori Mapping

Isn't it a lot easier to drive on a route that we use very often? We end up memorizing a lot of static elements like lanes, road boundaries, stop signs, and more, so why not do the same thing in autonomous vehicles? To do this, extremely accurate and detailed sensor data, along with GPS information, is collected for certain routes by driving through them.
This dataset is then used to perform both local localization and obstacle detection. Local localization is performed by finding similarities between the newly observed data and the stored dataset, while obstacle detection is done by finding the differences between the two (a toy sketch of this idea follows).
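Here is a toy sketch of that similarity/difference idea on binary occupancy grids: sliding the live scan over the stored a priori map refines the vehicle's position (localization), and the cells that still disagree at the best offset are flagged as potential new obstacles. The grid representation, search radius, and odometry-based starting guess are all assumptions for illustration, not how any particular production system works.

```python
import numpy as np

def localize_and_detect_changes(prior_map, scan, expected_rc, search_radius=10):
    """prior_map, scan: binary occupancy grids (1 = occupied, 0 = free).
    expected_rc: (row, col) where the scan is expected to sit in the map,
    e.g. from odometry. Returns the refined position, its agreement score,
    and the cells where the live scan disagrees with the stored map."""
    h, w = scan.shape
    r0, c0 = expected_rc
    best_score, best_offset = -1.0, (0, 0)
    for dr in range(-search_radius, search_radius + 1):
        for dc in range(-search_radius, search_radius + 1):
            if r0 + dr < 0 or c0 + dc < 0:
                continue                                # skip offsets that fall off the map
            window = prior_map[r0 + dr:r0 + dr + h, c0 + dc:c0 + dc + w]
            if window.shape != scan.shape:
                continue
            score = np.mean(window == scan)             # similarity -> localization
            if score > best_score:
                best_score, best_offset = score, (dr, dc)
    dr, dc = best_offset
    window = prior_map[r0 + dr:r0 + dr + h, c0 + dc:c0 + dc + w]
    changed_cells = np.argwhere(window != scan)         # differences -> candidate new obstacles
    return (r0 + dr, c0 + dc), best_score, changed_cells
```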
This sounds like a great way to establish local context, so why doesn't the task end there? Not only can obstacles change, but weather conditions such as snow or fog also create problems for this technique: the drastic change in the environment's appearance makes map matching difficult.
Additionally, changes to the road, such as new construction zones, speed limits, or traffic lights, also pose problems, as the vehicle assumes that major components of the environment will be identical to the original a priori data. Thus, a vehicle relying only on a priori-based localization and mapping may speed through construction zones or new traffic lights without noticing the change in traffic rules.

SLAM

Consider a home robot vacuum. Without SLAM, it will just move randomly within a room and may not be able to clean the entire floor surface. In addition, this approach uses excessive power, so the battery will run out more quickly. On the other hand, robots with SLAM can use information such as the number of wheel revolutions and data from cameras and other imaging sensors to determine the amount of movement needed. This is called localization. The robot can also simultaneously use the camera and other sensors to create a map of the obstacles in its surroundings and avoid cleaning the same area twice. This is called mapping. Source: What Is SLAM?
Conventionally, once an obstacle is recognized, that area of the map is always avoided, even though the obstacle may itself be moving. This is not a great approach, especially in autonomous driving, where there is a very large number of dynamic obstacles. Hence, newer SLAM algorithms need to determine not just whether an obstacle is present at a certain location, but also the probability that it will be at that location in the future.
Various probabilistic techniques have been researched for discriminating between static and dynamic objects in the environment using SLAM-based methods.
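A basic building block behind such probabilistic maps is the Bayesian occupancy grid, where each cell stores the log-odds that it is occupied and is updated with every new scan. The sketch below is a simplified illustration with made-up update constants and no sensor model or ray casting; cells whose occupancy keeps flipping between scans are candidates for dynamic objects, while persistently occupied cells can be treated as static structure.

```python
import numpy as np

class OccupancyGrid:
    """Minimal log-odds occupancy grid. Positive log-odds = likely occupied."""

    def __init__(self, shape, l_occ=0.85, l_free=-0.4, clamp=5.0):
        self.log_odds = np.zeros(shape)
        self.l_occ, self.l_free, self.clamp = l_occ, l_free, clamp

    def update(self, occupied_cells, free_cells):
        """occupied_cells / free_cells: boolean masks derived from the latest scan."""
        self.log_odds[occupied_cells] += self.l_occ
        self.log_odds[free_cells] += self.l_free
        np.clip(self.log_odds, -self.clamp, self.clamp, out=self.log_odds)

    def occupancy_prob(self):
        # Convert log-odds back to a per-cell probability of occupancy.
        return 1.0 - 1.0 / (1.0 + np.exp(self.log_odds))
```

Cells that stay near probability 1 across many scans correspond to static structure (buildings, parked cars), while cells that oscillate are likely being crossed by dynamic objects and can be handled separately by the planner.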
Though these techniques can potentially allow a vehicle to be driven autonomously in any situation, they can be computationally intensive and also require modeling complexities like human driver behavior, while ensuring that the vehicle can safely react to that behavior.
A popular example of this is the Tesla Model S, which can drive itself on highways but needs human intervention in more complex scenarios like navigating intersections.

[Interactive panels (semantic_view, satellite_view): an example of what global mapping data might look like.]

Path Planning, Decision Making and Motion Control

Now that we know what's around us, where the empty spaces are, where the other cars are, and where they may be some time in the future, the next step is to decide whether to keep going straight, steer, or slow down in order to reach the destination. This process entails path planning and decision-making.
The path planning module selects a route through the road network that takes the vehicle from its current position to the destination while avoiding the obstacles seen at that instant.
In a dynamic setting, where the autonomous vehicle must handle any environment and scenario, a single fixed route is not always feasible.
The alternative is to use algorithms that generate multiple possible future scenarios, in which the vehicle takes different paths to the destination, and then choose one of these paths based on parameters like time taken and feasibility.
Exploring many future possibilities from the current state and choosing the best-case scenario based on some user-defined parameters: that is exactly what reinforcement learning does!
Reinforcement learning is used to create a policy that informs the system what the ‘best’ action is at any given state. If the policy has been built as a goal-reaching problem, then the autonomous vehicle will behave in such a way as to maximize its chances of reaching the goal.
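To make the goal-reaching idea concrete, here is a toy sketch of tabular Q-learning on a tiny grid "road": the agent starts in one corner, is rewarded for reaching the goal cell, penalized for hitting an obstacle cell, and learns a policy mapping each state to its best action. Real AV planners operate on continuous states, rich vehicle dynamics, and far larger action spaces, so this is only an illustration of the principle, with made-up rewards and hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5x5 grid: start at the top-left, goal at the bottom-right,
# one obstacle cell the agent must learn to drive around.
GRID = 5
START, GOAL, OBSTACLE = (0, 0), (4, 4), (2, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right

q_table = np.zeros((GRID, GRID, len(ACTIONS)))

def step(state, action_idx):
    dr, dc = ACTIONS[action_idx]
    r = int(np.clip(state[0] + dr, 0, GRID - 1))
    c = int(np.clip(state[1] + dc, 0, GRID - 1))
    if (r, c) == GOAL:
        return (r, c), 10.0, True       # reached the destination
    if (r, c) == OBSTACLE:
        return (r, c), -5.0, True       # collision ends the episode
    return (r, c), -0.1, False          # small per-step cost favors short paths

alpha, gamma, epsilon = 0.1, 0.95, 0.2
for episode in range(5000):
    state, done = START, False
    while not done:
        if rng.random() < epsilon:                              # explore
            action = int(rng.integers(len(ACTIONS)))
        else:                                                   # exploit current estimate
            action = int(np.argmax(q_table[state[0], state[1]]))
        next_state, reward, done = step(state, action)
        # Standard Q-learning update toward the bootstrapped target.
        target = reward + (0.0 if done else gamma * q_table[next_state[0], next_state[1]].max())
        q_table[state[0], state[1], action] += alpha * (target - q_table[state[0], state[1], action])
        state = next_state

# The greedy policy now encodes the "best" action for every cell.
print(np.argmax(q_table, axis=2))
```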
The selected path is then fed into the behavioral layer, which navigates the route and interacts with other elements while complying with driving conventions and traffic rules.
On a completely empty road this is simple, as everything is static, but in a real-world driving scenario everything must be modeled with some uncertainty, since other vehicles may change position (often unpredictably) in the future. Thus, the behavioral layer must use probabilistic planning formalisms.
Finally, we know where to go, how to get there, and what we need to do at this instant. It's time to actually speed up or slow down while steering left or right. This is where the motion control module kicks in: it executes the plan given to it and makes the vehicle move.
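As a minimal example of the motion control layer, the sketch below shows a textbook PID controller tracking a target speed by outputting a throttle/brake command. The gains and the crude first-order vehicle response are made up for illustration; real controllers are tuned on the actual vehicle dynamics and paired with a lateral (steering) controller such as pure pursuit or MPC.

```python
class PIDController:
    """Classic PID: command = Kp*error + Ki*integral(error) + Kd*d(error)/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def control(self, target, measured, dt):
        error = target - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Made-up gains and a toy longitudinal model: acceleration proportional to throttle.
speed_controller = PIDController(kp=0.8, ki=0.1, kd=0.05)
speed, target_speed, dt = 0.0, 15.0, 0.1     # m/s and seconds

for _ in range(100):
    throttle = speed_controller.control(target_speed, speed, dt)
    throttle = max(-1.0, min(1.0, throttle))  # clamp to actuator limits
    speed += 3.0 * throttle * dt              # toy vehicle response
print(f"speed after 10 s: {speed:.1f} m/s")
```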


Testing Our Autonomous Vehicle

Let's say we build all the modules to complete all the tasks mentioned in the previous sections, chain them together and build an end-to-end autonomous vehicle system. How do we test this?
Putting an AV directly on a real road for testing is extremely dangerous (understatement) because a vehicle going off-course may lead to significant damage to other vehicles and possibly even loss of life. Simulation to the rescue!
Designers can use advanced simulation tools to virtually test how the vehicle will perform in millions of different scenarios quickly and at a fraction of the cost. Through simulation, the vehicle can be "driven" in silico for millions of miles to validate its performance across a wide variety of scenarios, such as low-density traffic or navigating an intersection.
Moreover, for reinforcement learning-driven approaches, simulations are a great way of generating data to train the model, enabling faster development of safe and robust autonomous vehicles.
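A full driving simulator (e.g. CARLA) models sensors, traffic, and physics, but at the core of any simulation loop is a vehicle motion model stepped forward in time. Below is a small sketch of the classic kinematic bicycle model, which is often used for lightweight simulation and for testing planners and controllers before moving to a high-fidelity simulator; the wheelbase and control inputs here are arbitrary illustrative values.

```python
import math
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float = 0.0        # position (m)
    y: float = 0.0
    yaw: float = 0.0      # heading (rad)
    speed: float = 0.0    # longitudinal speed (m/s)

def bicycle_step(state, accel, steer, dt, wheelbase=2.7):
    """Advance the kinematic bicycle model by one time step.
    accel: longitudinal acceleration (m/s^2), steer: front wheel angle (rad)."""
    state.x += state.speed * math.cos(state.yaw) * dt
    state.y += state.speed * math.sin(state.yaw) * dt
    state.yaw += state.speed / wheelbase * math.tan(steer) * dt
    state.speed += accel * dt
    return state

# Drive forward while gently turning left for 5 simulated seconds.
state = VehicleState()
for _ in range(50):
    state = bicycle_step(state, accel=1.0, steer=0.05, dt=0.1)
print(f"x={state.x:.1f} m, y={state.y:.1f} m, yaw={math.degrees(state.yaw):.1f} deg")
```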

Conclusion

In this article, we broke down the problem of creating an autonomous vehicle system into its associated tasks and subtasks and looked at how each of them can be tackled. What's mind-blowing is that as humans we do all of this almost instantaneously while driving, yet each of these modules requires extensive computation and analysis in an autonomous vehicle.
The driving module of the vehicle needs to:
  1. Gather all the data from the sensors (perception)
  2. Create a "map" to find where the obstacles are and which areas are drivable (localization and mapping)
  3. Find possible paths that will avoid the obstacles, be safe and will actually lead to the destination (path planning)
  4. Find feasible steps that can be used to follow the selected path (decision making)
  5. Actually follow the steps by steering left/right while accelerating/decelerating (motion control)
The vehicle repeats these steps at regular intervals to keep track of as many dynamic elements as possible (a minimal sketch of this loop follows).
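To tie the pieces together, here is a pseudocode-style sketch of that loop in Python. Every function here (read_sensors, detect_objects, update_map, plan_path, choose_maneuver, apply_controls, at_destination) is a hypothetical stand-in for one of the modules described above, stubbed out so the sketch runs; the point is simply the order and cadence of the pipeline, not any real API.

```python
import time

# Hypothetical stubs standing in for the modules described above; each would be
# a substantial subsystem in a real AV stack.
def read_sensors():              return {"camera": None, "lidar": None, "gps": (0.0, 0.0)}
def detect_objects(sensors):     return []                            # perception
def update_map(objects):         return {}, (0.0, 0.0, 0.0)           # localization & mapping
def plan_path(world, pose, dst): return [pose, dst]                   # path planning
def choose_maneuver(path, objs): return {"steer": 0.0, "accel": 0.0}  # decision making
def apply_controls(maneuver):    pass                                 # motion control
def at_destination(pose, dst):   return True                          # toy stop condition

CYCLE_TIME_S = 0.1   # re-run the whole pipeline at ~10 Hz (assumed rate)

def drive(destination):
    pose = (0.0, 0.0, 0.0)
    while not at_destination(pose, destination):
        t_start = time.monotonic()
        sensors = read_sensors()                          # 1. gather sensor data
        objects = detect_objects(sensors)                 # 2. perceive static/dynamic elements
        world, pose = update_map(objects)                 # 3. localize and map
        path = plan_path(world, pose, destination)        # 4. plan a path to the destination
        maneuver = choose_maneuver(path, objects)         # 5. decide on the next maneuver
        apply_controls(maneuver)                          # 6. steer / accelerate / brake
        # Keep a fixed cadence so dynamic elements are re-checked frequently.
        time.sleep(max(0.0, CYCLE_TIME_S - (time.monotonic() - t_start)))

drive(destination=(100.0, 50.0))
```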
I hope this article gave you an insight into the different tasks of the complex autonomous vehicle development problem.
