Introduction

Code | Paper →

In the paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, the authors present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.

Their algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x, y, z) and viewing direction (θ, φ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location.

They synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize the representation is a set of images with known camera poses. They describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis.
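
To make the volume rendering step concrete, here is a minimal numpy sketch of the compositing along a single camera ray. It follows the spirit of the paper's formulation rather than the authors' exact code; the array shapes, function name, and the small 1e-10 stabilizer are my own assumptions.

```python
import numpy as np

def composite_ray(rgb, sigma, deltas):
    """Alpha-composite the network outputs sampled along one camera ray.

    rgb:    (N, 3) predicted colors at N sample points along the ray
    sigma:  (N,)   predicted volume densities at those points
    deltas: (N,)   distances between consecutive sample points
    """
    # Opacity contributed by each segment of the ray.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: how much light survives up to each sample (exclusive cumulative product).
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1] + 1e-10)))
    weights = alpha * transmittance
    # The expected color of the ray is the weighted sum of the per-sample colors.
    return (weights[:, None] * rgb).sum(axis=0)
```

Because every operation here is differentiable, the rendering loss on the final pixel colors can be backpropagated straight through to the network weights.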

If you'd like to learn more about the paper, check out the Two Minute Papers video covering it.

Baseline Model

We've created a colab notebook complete with a hyperparameter sweep, so you can reproduce this analysis in a colab. See if you can improve on the results by tweaking the hyperparameters.

Try this in a colab →

First, let's train a baseline model, and log our model's predictions in wandb. This lets us observe in real time how the model learns the representation of the underlying scene at each iteration.
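
Here is a minimal sketch of what that logging loop can look like. The hyperparameter names are placeholders, and train_step() is a hypothetical function standing in for the notebook's actual training code.

```python
import wandb

# Hypothetical hyperparameter names; the real ones live in the colab notebook.
wandb.init(project="nerf", config={"learning_rate": 5e-4, "embed_size": 6, "num_layers": 6})

for step in range(1000):
    # train_step() is assumed to run one optimization step and return the loss,
    # the PSNR, and a rendered HxWx3 image of a held-out view.
    loss, psnr, rendered_view = train_step()
    wandb.log({
        "loss": loss,
        "psnr": psnr,
        "rendered_view": wandb.Image(rendered_view),  # watch the scene emerge over iterations
    }, step=step)
```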

Rendering New Views From a Learned Neural Representation For a Single Scene

Let's try changing the various hyperparameters of our model and see how that changes its performance. Below we can see that the longer we train, the better our model gets at rendering novel views from its learned neural representation of a single scene.

We stop here at 10,000 iterations, but I encourage you to try training the model for longer. Below is a video the authors of the paper produced after 200,000 iterations; as you can see, the results are remarkably realistic.

After training for 200,000 iterations
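
If you want to log a rendered fly-through like this to your own runs, wandb.Video makes that straightforward. In the sketch below, render_view() and spiral_poses are hypothetical stand-ins for the notebook's rendering code and camera path.

```python
import numpy as np
import wandb

# render_view(pose) and spiral_poses are hypothetical: one renders an HxWx3 uint8
# frame from a camera pose, the other is a list of poses along a circular path.
frames = np.stack([render_view(pose) for pose in spiral_poses])

# wandb.Video expects numpy input shaped (time, channels, height, width).
wandb.log({"spiral_render": wandb.Video(frames.transpose(0, 3, 1, 2), fps=30, format="mp4")})
```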

Effect of Changing Learning Rates

In this section, we vary the learning rate while keeping all other hyperparameters the same. Let's pick a reasonable number of epochs to train our model for, say 1000. Here we can compare the loss curves and see that the ideal learning_rate lies between 3e-4 and 7e-4.

5e-3 was too high, whereas 5e-5 and 5e-6 were too low. If you want to improve your model's performance, I'd recommend trying more values in the range between 5e-4 and 7e-4.

We can also see this pattern reflected in the rendered videos. Our model starts out failing to learn the underlying structure at 5e-3, gets good at learning the 3D representation between 7e-4 and 3e-4, and then degrades as the learning rate shrinks until it cannot capture the scene at all at 5e-6. Keep in mind that we only trained our model for 1000 epochs; if we trained for longer at these smaller learning rates, we might still see really good performance.
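
A sketch of how these comparison runs can be launched, assuming a hypothetical train() function that reads its hyperparameters from the run config while everything else stays fixed:

```python
import wandb

for lr in (5e-3, 7e-4, 5e-4, 3e-4, 5e-5, 5e-6):
    run = wandb.init(project="nerf", config={"learning_rate": lr, "epochs": 1000}, reinit=True)
    train(run.config)  # hypothetical training function; only the learning rate changes per run
    run.finish()
```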

Effect of Changing The Embedding Size

In the next experiment, we fix the learning rate at 5e-4 and train all our models for 1000 epochs as before. This time, we vary the embedding size and find ourselves in a Goldilocks scenario once again.

We can observe from the loss and PSNR plots, and also from the videos rendered from the learned neural representations of the scene, that an embedding size of 2 is too small to capture the complexity of the underlying scene, whereas 10 is simply too large. I would encourage you to explore more embedding values near 6 to see if you can improve on our model's performance.
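
Assuming the embedding here is the positional encoding of the input coordinates (as in the original tiny NeRF example), a minimal sketch of what the embedding size controls looks like this:

```python
import numpy as np

def positional_encoding(x, embed_size=6):
    # Map each coordinate to [x, sin(2^0 x), cos(2^0 x), ..., sin(2^(L-1) x), cos(2^(L-1) x)],
    # where L = embed_size. Larger L lets the network represent higher-frequency detail.
    feats = [x]
    for i in range(embed_size):
        feats.append(np.sin(2.0 ** i * x))
        feats.append(np.cos(2.0 ** i * x))
    return np.concatenate(feats, axis=-1)

# A 3-D point becomes a 3 * (1 + 2 * embed_size)-dimensional feature vector,
# e.g. 39 dimensions for embed_size = 6.
encoded = positional_encoding(np.array([0.1, -0.4, 0.7]), embed_size=6)
```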

Effect of Adding More Layers

Next up, we add more hidden layers, while keeping our learning rate at 5e-4 and embedding size at 6, and training all our models for 1000 epochs as before. Interestingly, we observe that adding more layers doesn't always mean better performance: the network with 6 hidden layers outperformed both the 4- and 8-layer networks, although the margin was small. For this experiment, we can stick with 6 layers and concentrate our efforts on tweaking other parameters that have a bigger impact on the loss and PSNR.
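
As a reference for what is being varied here, below is a minimal Keras sketch of such a fully-connected network, parameterized by depth and width. The function name is my own, and the skip connections of the full NeRF architecture are omitted.

```python
import tensorflow as tf

def build_nerf_mlp(input_dim=39, num_layers=6, dense_units=128):
    # A plain MLP from positionally-encoded coordinates to (r, g, b, density).
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for _ in range(num_layers):
        x = tf.keras.layers.Dense(dense_units, activation="relu")(x)
    outputs = tf.keras.layers.Dense(4)(x)  # 3 color channels + 1 volume density
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```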

Effect of Adding More Neurons

Finally, we keep all the hyperparameters the same as above (learning rate = 5e-4, embedding size = 6, epochs = 1000, 6 hidden layers) and tweak the size of the dense_layers, i.e. the number of neurons per layer. We see a clear pattern: the wider the dense layers, the lower the loss.

My GPU ran out of memory at a layer size of 512, but if you have a bigger GPU, I'd encourage you to try layer sizes greater than 256 to see if the model's performance continues to improve.
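
One way to see why memory grows so quickly with width: each hidden-to-hidden weight matrix is width × width, so the parameter count grows roughly quadratically. Using the hypothetical builder sketched above:

```python
for units in (64, 128, 256, 512):
    model = build_nerf_mlp(input_dim=39, num_layers=6, dense_units=units)
    print(f"width {units}: {model.count_params():,} parameters")
```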

Running A Hyperparameter Sweep To Find The Best Model

Finally, we run a hyperparameter sweep to test the learning rate, embedding size, and other hyperparameters in combination, exploring the hyperparameter space more thoroughly to find the best-performing model. With W&B, you can run a hyperparameter sweep easily by specifying the parameters you'd like to try and the search strategy in a .yaml file (a sketch of the equivalent configuration is shown below).

See how you can launch a hyperparameter sweep in 5 mins →
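
A sketch of what that sweep configuration can look like, written here as a Python dict (the same structure can equivalently go in a .yaml file and be launched from the command line). The exact value lists are illustrative, mirroring the ranges explored above, and train() is a hypothetical training function that calls wandb.init() and reads wandb.config.

```python
import wandb

sweep_config = {
    "method": "random",  # or "grid" / "bayes"
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [5e-3, 7e-4, 5e-4, 3e-4, 5e-5]},
        "embed_size":    {"values": [2, 4, 6, 8, 10]},
        "num_layers":    {"values": [4, 6, 8]},
        "dense_units":   {"values": [64, 128, 256]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="nerf")
wandb.agent(sweep_id, function=train, count=20)  # train() is hypothetical; it should read wandb.config
```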

Parallel Co-ordinates Plot

Once you've run a sweep, you can visualize its results with a parallel co-ordinates plot (as seen above), which maps all your runs w.r.t. the metric of your choice, so you can spot patterns easily. This plot is useful for homing in on combinations of hyperparameters that led to the best model performance.

Hyperparameter Importance Plot

The hyperparameter importance plot (seen here on the right) surfaces which hyperparameters were the best predictors of, and most highly correlated with, desirable values of your metrics.

Correlation is the linear correlation between the hyperparameter and the chosen metric (in this case, loss). For example, a strongly negative correlation between epochs and loss means that runs trained for more epochs tended to end with a lower loss, and vice versa.

The parameter importances are the result of training a random forest with the hyperparameters as inputs and the metric (loss) as the target output. Here we can see that batch_size and optimizer_sgd (i.e., using the SGD optimizer) were the most important hyperparameters for predicting the loss values.

You can use both the parameter importance and parallel co-ordinates plots to home in on hyperparameter values to try next.

Try it out yourself

We've created a colab notebook complete with a hyperparameter sweep, so you can reproduce this analysis in a colab. See if you can improve on the results by tweaking the hyperparameters.

Try this in a colab →