Understanding State of the Art in Deep Learning: 3D Semantic Segmentation
This model takes a point cloud representing a real-world object as input and segments the object into its constituent parts.
Created on February 21 | Last edited on October 4
Deep Learning: 3D Semantic Segmentation
The goal of this model is to take a point cloud representing a real-world object as input and segment the object into its constituent parts. 3D semantic segmentation is a foundational problem in computer vision, with applications ranging from self-driving cars to medical diagnosis.
The model is trained on the ShapeNet dataset. ShapeNet contains 3D point cloud data for objects across 16 categories, along with per-point labels that segment each object into its parts. For example, airplanes are segmented into wing, body, and tail.
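To make the data format concrete, here is a minimal sketch of what one ShapeNet-style training sample looks like. The point count, label scheme, and part names below are illustrative assumptions, not the actual dataset loader:

```python
import numpy as np

# Hypothetical ShapeNet-style sample: a point cloud plus per-point part labels.
rng = np.random.default_rng(0)

num_points = 2048  # points sampled from the object's surface
points = rng.uniform(-1.0, 1.0, size=(num_points, 3))  # (x, y, z) coordinates

# Per-point part labels, e.g. for an airplane: 0 = wing, 1 = body, 2 = tail.
part_names = ["wing", "body", "tail"]
labels = rng.integers(0, len(part_names), size=num_points)

# Each point carries exactly one part label, so the arrays align 1:1;
# the segmentation model's job is to predict `labels` from `points`.
for part_id, name in enumerate(part_names):
    print(f"{name}: {(labels == part_id).sum()} points")
```

A real pipeline would load these arrays from the dataset files and normalize each cloud, but the shapes — an `(N, 3)` coordinate array paired with an `(N,)` label array — are the essential structure.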
This sweep explores the dataset and the capabilities of the model architecture. Running a sweep across each category while varying hyperparameters lets us visualize how the model handles different categories of objects, helping to answer questions such as: Are certain categories more challenging? Which categories take longer to train? Are there categories where we need more data?
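A sweep like this can be expressed as a grid over categories and hyperparameters. The configuration below is a hypothetical sketch in the shape W&B sweeps use; the category list and hyperparameter values are illustrative, not the settings used in these runs:

```python
# Hypothetical W&B-style sweep configuration (illustrative values only).
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        # One run per (category, learning_rate, batch_size) combination.
        "category": {"values": ["Airplane", "Chair", "Lamp", "Table"]},
        "learning_rate": {"values": [1e-3, 1e-4]},
        "batch_size": {"values": [16, 32]},
    },
}

# A grid sweep launches one run per combination of parameter values.
num_runs = 1
for spec in sweep_config["parameters"].values():
    num_runs *= len(spec["values"])
print(num_runs)  # 4 categories * 2 learning rates * 2 batch sizes = 16 runs
```

Because the category is itself a sweep parameter, per-category metrics fall out of the run table directly, which is what makes the comparisons above possible.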
The model architecture used in these training runs is based on U-Net, an increasingly popular segmentation architecture designed for fast inference and training on small amounts of data. U-Net was introduced in the 2015 paper "U-Net: Convolutional Networks for Biomedical Image Segmentation" (Ronneberger et al.).
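The core U-Net idea is an encoder that downsamples to abstract features and a decoder that upsamples back, with skip connections concatenating encoder features into the decoder. The sketch below traces the shapes through one such level using NumPy stand-ins; the sizes and pooling are illustrative, and a real model would use learned convolutions:

```python
import numpy as np

def downsample(x):
    # 2x2 max pooling: halves spatial resolution (stand-in for conv block + pool).
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample(x):
    # Nearest-neighbor upsampling: doubles spatial resolution.
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Input feature map: 64x64 with 8 channels (illustrative sizes).
enc = np.ones((64, 64, 8))

skip = enc                    # saved for the skip connection
bottleneck = downsample(enc)  # (32, 32, 8): coarser, more abstract features

# Decoder: upsample back to 64x64, then concatenate the skip features.
# The skip connection restores fine spatial detail lost by pooling, which
# helps U-Net produce sharp segmentations even with limited training data.
dec = upsample(bottleneck)                     # (64, 64, 8)
merged = np.concatenate([dec, skip], axis=-1)  # (64, 64, 16)
print(merged.shape)
```

The doubled channel count after concatenation is the signature of a U-Net decoder stage: the next convolution sees both coarse context and fine detail.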
Comparing Ground Truth and Final Predictions
Sweep runs (88 runs)
Point Cloud Visualizations
Run table (82 runs)