Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
In this article, we explore how to achieve photorealistic rendering of large unbounded 3D scenes from novel camera angles while preserving fine-grained details.
In 2020, the paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis presented a method for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views.
Mip-NeRF, a follow-up work, demonstrated that it was possible to extend NeRF to represent the scene at a continuously-valued scale, effectively reducing objectionable aliasing artifacts and significantly improving the performance of the original approach.
Neural Radiance Fields have demonstrated impressive view synthesis results on objects and small bounded regions of space. However, the question that we ask ourselves today is
Is it possible for NeRF-based models to effectively represent an unbounded scene, where the camera may point in any direction and content may exist at any distance?
This is the problem that the authors of the paper Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields attempt to address. In this paper, the authors present Mip-NeRF 360, an extension of Mip-NeRF that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes.
The authors demonstrate the capability of the model to produce realistic synthesized views and detailed depth maps for highly intricate, unbounded real-world scenes in which the camera rotates 360 degrees around a point.
Mip-NeRF 360 rendering unbounded real-world scenes in which the camera rotates 360 degrees around a point.
This article was written as a Weights & Biases Report, which is a project management and collaboration tool for machine learning projects. Reports let you organize and embed visualizations, describe your findings, share updates with collaborators, and more. To learn more about reports, check out Collaborative Reports.
Here's what we'll be covering:
Table of Contents
- What Are Neural Radiance Fields?
- The Novelty of Mip-NeRF 360
- Results
- Limitations of Mip-NeRF 360
- Potential for Negative Impact
- Conclusion
- Similar Posts
What Are Neural Radiance Fields?
In order to understand what exactly a Neural Radiance Field is, let's break down the phrase into its component parts:
- The word neural obviously means that there's a Neural Network involved
- Radiance refers to the radiance of the scene that the Neural Network outputs, i.e., how much light is emitted by a point in space in each direction, and
- The word Field means that the Neural Network models a continuous and non-discretized representation of the scene it learns.
Putting it all together, we can define a Neural Radiance Field as a neural network that maps a point and a viewing direction in 3D space to the amount of light emitted by that point in that direction, in a continuous (non-discrete) manner. Functionally, it allows us to synthesize novel (new) views of complex scenes.

The Neural Network in this case is a simple Multi-Layer Perceptron (MLP) with ReLU activations.
- The MLP consists of 9 fully-connected layers of width 256.
- The input to the MLP consists of 2 components:
- $\mathbf{x} = (x, y, z)$, which denotes the spatial position of a given point in 3D space.
- $\mathbf{d} = (\theta, \phi)$, which denotes a given viewing direction from the point.
- Together, $(x, y, z, \theta, \phi)$ forms a single continuous 5D coordinate which is fed to the MLP.
- The output of the MLP consists of 2 components as well:
- $\mathbf{c} = (r, g, b)$, which denotes the color emitted from the point along the viewing direction in RGB colorspace.
- $\sigma$, which denotes the density or transparency of the point.
- The value of $\sigma$ lies in the range $[0, \infty)$.
- A value of $0$ means there is nothing at the point (the point is transparent), while a large value means that the point is (nearly) opaque.
This architecture ensures that the output color can vary when observed from different angles, allowing NeRF to represent reflections and glossy materials, but that the underlying geometry represented by $\sigma$ is only a function of position.
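To make this concrete, here is a minimal, untrained NumPy sketch of an MLP with this structure. The layer sizes and names are illustrative (much smaller than the actual NeRF network, and not taken from any official implementation); the point is that density is predicted from the encoded position alone, while color also depends on the encoded viewing direction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class TinyNeRFMLP:
    """A toy, untrained NeRF-style MLP (a sketch, not the paper's exact architecture)."""

    def __init__(self, pos_dim, dir_dim, width=256, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda fan_in, fan_out: rng.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))
        self.w1 = init(pos_dim, width)         # encoded position -> hidden
        self.w2 = init(width, width)           # hidden -> hidden
        self.w_sigma = init(width, 1)          # hidden -> density (position only)
        self.w3 = init(width + dir_dim, 128)   # [feature, encoded direction] -> hidden
        self.w_rgb = init(128, 3)              # hidden -> RGB

    def forward(self, x_enc, d_enc):
        h = relu(relu(x_enc @ self.w1) @ self.w2)
        sigma = relu(h @ self.w_sigma)         # density >= 0, independent of the view direction
        h_dir = relu(np.concatenate([h, d_enc], axis=-1) @ self.w3)
        rgb = 1.0 / (1.0 + np.exp(-(h_dir @ self.w_rgb)))  # color in [0, 1], view-dependent
        return rgb, sigma
```

Note how `sigma` is computed before the viewing direction is concatenated in; this is what makes the recovered geometry view-independent while the color remains view-dependent.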
In order to enable the NeRF MLPs to represent higher-frequency detail, the inputs $\mathbf{x}$ and $\mathbf{d}$ are each preprocessed by a component-wise sinusoidal positional encoding $\gamma(\cdot)$ given by
$$\gamma(p) = \left(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right)$$
The Positional Encoding strategy is proposed by the paper Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains.
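Below is a minimal NumPy sketch of this encoding. The number of frequency bands `num_freqs` ($L$ in the formula above) is a hyperparameter; the NeRF paper uses $L = 10$ for positions and $L = 4$ for viewing directions.

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """Component-wise sinusoidal encoding gamma(p) of a batch of coordinates.

    p: array of shape (..., D), e.g. D = 3 for spatial positions.
    Returns an array of shape (..., 2 * num_freqs * D).
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi           # 2^0 * pi, ..., 2^(L-1) * pi
    scaled = p[..., None, :] * freqs[:, None]              # (..., L, D)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)

# Example: encoding one 3D point with L = 10 bands yields 60 features.
x = np.array([[0.1, -0.4, 0.7]])
print(positional_encoding(x).shape)  # (1, 60)
```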
Novel View Synthesis
Each pixel in an image corresponds to a ray propagating through 3D space, given by $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, where the ray parameter $t$ is a real number. In order to calculate the color $C(\mathbf{r})$ of the ray, NeRF randomly samples distances $\{t_k\}$ along the ray and passes the points $\mathbf{r}(t_k)$ and direction $\mathbf{d}$ through its MLPs to calculate $\sigma_k$ and $\mathbf{c}_k$. The resulting output color is given by:
$$C(\mathbf{r}) = \sum_k T_k \left(1 - e^{-\sigma_k \delta_k}\right)\mathbf{c}_k$$
where:
- $T_k$ is the transmittance at the $k$-th sample, given by $T_k = \exp\!\left(-\sum_{k' < k} \sigma_{k'} \delta_{k'}\right)$
- $\delta_k$ is the change in the ray parameter, $\delta_k = t_{k+1} - t_k$

A summary of the volume rendering pipeline
Transmittance represents how visible a point is from a particular input camera. Points in free space or on the surface of the first intersected object will have transmittance near 1, and points inside or behind the first visible object will have transmittance near 0. If a point is seen from some viewpoints but not others, the regressed transmittance value will be the average over all training cameras and lie between zero and one, indicating that the point is partially observed.
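As a concrete illustration, here is a minimal NumPy sketch of the compositing equation above for a single ray, computing the weights $w_k = T_k\,(1 - e^{-\sigma_k \delta_k})$ and the final color. The function and variable names are illustrative, not taken from any particular codebase.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Alpha-composite samples along one ray using the quadrature rule above.

    sigmas: (N,) densities at the sampled intervals.
    colors: (N, 3) RGB values at the sampled intervals.
    t_vals: (N + 1,) interval endpoints along the ray, so delta_k = t_{k+1} - t_k.
    """
    deltas = np.diff(t_vals)                               # delta_k
    alphas = 1.0 - np.exp(-sigmas * deltas)                # opacity contributed by each interval
    # T_k = exp(-sum_{k' < k} sigma_{k'} * delta_{k'}), with T_0 = 1.
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]]))
    weights = trans * alphas                               # w_k = T_k * (1 - exp(-sigma_k * delta_k))
    color = (weights[:, None] * colors).sum(axis=0)        # C(r)
    return color, weights
```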
The Novelty of Mip-NeRF 360
We have already discussed that Mip-NeRF 360 is essentially a variant of the original NeRF that uses a coordinate-based MLP to model the volumetric density and color of a given scene and uses a volumetric rendering model similar to ray-tracing in order to render the scene.
However, this ignores both the relative footprint of the corresponding image pixel and the length of the interval along the ray containing the point, resulting in aliasing artifacts when rendering novel camera trajectories. Mip-NeRF solves this issue by using the projected pixel footprint to sample conical frustums along the ray rather than points, and uses multivariate Gaussian distributions with parameters $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ to represent 3D volumes in a given scene.
The authors of Mip-NeRF 360 observe that applying NeRF-like models to large unbounded scenes raises several critical issues...
Parameterization
The Problem
Unbounded 360° scenes can occupy an arbitrarily large region of Euclidean space, but Mip-NeRF requires that 3D scene coordinates lie in a bounded domain. Such unbounded scenes require a different kind of parameterization, which was explored in prior work such as NeRF++ and DONeRF.
These approaches behave somewhat analogously to NDC but in every direction rather than just along the z-axis. In this work, the authors extend this idea to Mip-NeRF and present a method for applying any smooth parameterization to volumes (rather than points), along with their own parameterization for unbounded scenes.
Normalized device coordinate (NDC) space is a screen-independent display coordinate system; it encompasses a cube where the x, y, and z components range from −1 to 1. Although clipping to the view volume is specified to happen in clip space, NDC space can be thought of as the space that defines the view volume.
The Solution
In order to understand how the authors attempt to solve the parameterization of unbounded 360° scenes, let us consider a toy flatland scene with 3 cameras. In Mip-NeRF, these cameras cast Gaussians into the scene, as shown in the following figure.
Now, in a large unbounded scene, this results in Gaussians that are very far away from the origin and very elongated. This is a problem for Mip-NeRF, which requires a bounded coordinate space and works best when the Gaussians are somewhat isotropic.
Gaussians cast by Mip-NeRF
In order to fix this issue, the authors define a warp, given by the following $\operatorname{contract}(\cdot)$ operator, that smoothly maps all coordinates outside of a ball of radius 1 (shown in blue in the following figure) into a ball of radius 2 (shown in orange in the following figure). This warp is designed to counteract the non-linear spacing defined by the Mip-NeRF Gaussians:
$$\operatorname{contract}(\mathbf{x}) = \begin{cases} \mathbf{x} & \lVert\mathbf{x}\rVert \le 1 \\ \left(2 - \dfrac{1}{\lVert\mathbf{x}\rVert}\right)\dfrac{\mathbf{x}}{\lVert\mathbf{x}\rVert} & \lVert\mathbf{x}\rVert > 1 \end{cases}$$
In order to apply the aforementioned $\operatorname{contract}(\cdot)$ to the Mip-NeRF Gaussians, the authors use the following function:
$$f(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \left(f(\boldsymbol{\mu}),\; \mathbf{J}_f(\boldsymbol{\mu})\,\boldsymbol{\Sigma}\,\mathbf{J}_f(\boldsymbol{\mu})^{\top}\right)$$
where $\mathbf{J}_f(\boldsymbol{\mu})$ is the Jacobian of $f$ at $\boldsymbol{\mu}$. Using this, we can apply $\operatorname{contract}(\cdot)$ to $(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, which is functionally equivalent to the classic Extended Kalman filter. This warps the Gaussians so that an unbounded scene is mapped into a ball of radius 2 (shown in orange in the following figure). This non-Euclidean space is where the authors represent the inputs to the MLP.
Warping of Mip-NeRF Gaussians
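Here is a small NumPy sketch of the contraction and of the Gaussian warp described above. The `contract` function follows the definition given earlier; the Jacobian is approximated with finite differences purely for illustration (in practice it would be obtained analytically or via automatic differentiation).

```python
import numpy as np

def contract(x):
    """Leave points inside the unit ball unchanged; map the rest into a ball of radius 2."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, 1.0)  # avoids dividing by tiny norms inside the unit ball
    return np.where(norm <= 1.0, x, (2.0 - 1.0 / safe) * (x / safe))

def contract_gaussian(mu, cov, eps=1e-4):
    """Warp a Gaussian (mu, cov) through contract() by linearizing at its mean.

    This mirrors the Extended-Kalman-filter-style update described above: the mean
    is warped directly and the covariance is conjugated by the Jacobian.
    """
    mu_c = contract(mu)
    # Finite-difference Jacobian of contract() at mu (illustrative only).
    jac = np.stack(
        [(contract(mu + eps * e) - contract(mu - eps * e)) / (2.0 * eps) for e in np.eye(3)],
        axis=-1,
    )
    return mu_c, jac @ cov @ jac.T

# A far-away, elongated Gaussian gets pulled inside the radius-2 ball and shrunk.
mu, cov = np.array([0.0, 0.0, 50.0]), np.diag([1.0, 1.0, 100.0])
print(contract_gaussian(mu, cov)[0])  # approximately [0, 0, 1.98]
```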
Efficiency
The Problem
A fundamental challenge in dealing with unbounded scenes is that such scenes are often large and detailed. Though NeRF-like models can accurately reproduce objects or regions of scenes using a surprisingly small number of weights, the capacity of the NeRF MLP saturates when faced with increasingly intricate scene content.
Moreover, larger scenes require significantly more samples along each ray to localize surfaces accurately. This makes scaling the representation of unbounded scenes using NeRF-like models computationally expensive.
Several works, such as Baking Neural Radiance Fields for Real-Time View Synthesis, have attempted to distill or “bake” a trained NeRF into a format that can be rendered quickly, but these techniques do not accelerate training. Existing rendering literature also explores accelerating ray-tracing through a hierarchical data structure such as octrees or bounding volume hierarchies.
However, such approaches do not naturally generalize to an inverse rendering context, in which the geometry of the scene is unknown and must be recovered, because they assume the scene geometry is known in advance.
The Solution
The authors attempt to "distill" scene geometry from a large NeRF MLP into a small proposal MLP during optimization, which makes training almost 3 times faster. In order to understand this online distillation approach, let us first look at how the training and sampling procedure works for Mip-NeRF:
- In Mip-NeRF, first of all, we define a set of evenly spaced coarse intervals along a ray, which are basically like the endpoints of histogram bins.
- The Gaussians corresponding to each interval are pushed through an MLP, which produces a colored histogram with weights $w$ and colors $c$.
- These weights and colors are then averaged to produce an alpha-composited color for that particular pixel.
- Then, these weights are re-sampled to get a new set of intervals clustered around wherever there is content in the scene.
- These intervals are then pushed through the MLP to get a set of weights and colors, which are used to produce the color of the pixel.
- Mip-NeRF is optimized by minimizing a reconstruction loss between all rendered pixel values and the true pixel color taken from the input images.

The training and sampling procedure of Mip-NeRF
Note that only the fine color is used to render the final image, which shows how computationally wasteful Mip-NeRF is. The only reason the coarse rendering is supervised is to help guide the sampling of the fine histogram. This observation motivates the training and sampling procedure of Mip-NeRF 360, which is described below...
- In Mip-NeRF 360, we start with evenly spaced histogram intervals along a ray.
- These are pushed through a Proposal MLP that produces a set of weights and no colors.
- These weights are then resampled to get a new set of intervals.
- The last set of intervals produced by this Proposal MLP is pushed through a NeRF-MLP that behaves exactly like the MLP in the Mip-NeRF pipeline, which gives us a set of weights and color which are used to render the pixel color.
- This rendering is supervised to be close to the true pixel color taken from an input image using a reconstruction loss between the rendered pixel value and the true pixel color taken from the input images.
- Instead of supervising the Proposal MLP to reconstruct the image accurately, its output weights are supervised to be consistent with the output weights of the NeRF-MLP.

The training and sampling procedure of Mip-NeRF 360
The aforementioned setup of Mip-NeRF 360 allows us to have a very small Proposal MLP that is queried many times and a very large NeRF-MLP that is queried relatively few times, giving the combined model high capacity while keeping it tractable to train.
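The sketch below illustrates the resampling step shared by both pipelines: turning a histogram of weights along a ray into a new set of intervals concentrated where the weights are large. It is a simplified inverse-transform sampler under assumed array shapes, not the exact stratified sampler (with weight padding and annealing) used in the paper's implementation.

```python
import numpy as np

def resample_intervals(t_vals, weights, num_new, rng=None):
    """Resample new interval endpoints from a weight histogram along a ray.

    t_vals:  (N + 1,) endpoints of the current intervals.
    weights: (N,) weights produced by the proposal (or coarse) MLP for those intervals.
    Returns (num_new + 1,) new endpoints clustered where the weights are large.
    """
    rng = rng or np.random.default_rng(0)
    pdf = (weights + 1e-5) / (weights + 1e-5).sum()        # normalize weights into a pdf
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])          # piecewise-linear cdf over t_vals
    u = np.sort(rng.uniform(size=num_new + 1))             # sorted uniform samples in [0, 1)
    return np.interp(u, cdf, t_vals)                       # invert the cdf to get new endpoints
```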
The Proposal Loss Function
In order to make the aforementioned Mip-NeRF 360 pipeline work, we need a loss function that encourages histograms with different bin endpoints to be consistent with each other. Let's take an example to illustrate this scenario. In the following figure, we have a true 1D distribution on the left and two histograms of that true distribution on the right.

Since these two histograms are summaries of the same underlying distribution, we can make some strong assertions regarding how they must relate to each other. For example, the weight of any particular bin in the top histogram must not exceed the sum of the bin weights that overlap with it in the histogram shown below. With this fact, we can construct an upper bound on the weights of one histogram using the weights of the other histogram, as shown in the following figure.

The aforementioned bound holds true because the two histograms are summaries of the same underlying 1D distribution.
During the training of Mip-NeRF 360, the authors impose a loss on the histograms produced by the Proposal MLP and the NeRF-MLP that penalizes any excess mass that violates the aforementioned bound (as shown in the following figure). This encourages the Proposal MLP to learn an upper envelope on the volumetric scene density learned by the NeRF-MLP.

This loss function is given by:
$$\mathcal{L}_{\mathrm{prop}}(\mathbf{t}, \mathbf{w}, \hat{\mathbf{t}}, \hat{\mathbf{w}}) = \sum_i \frac{1}{w_i} \max\!\left(0,\; w_i - \operatorname{bound}(\hat{\mathbf{t}}, \hat{\mathbf{w}}, T_i)\right)^2$$
where $(\mathbf{t}, \mathbf{w})$ is the histogram produced by the NeRF-MLP, $(\hat{\mathbf{t}}, \hat{\mathbf{w}})$ is the histogram produced by the Proposal MLP, and $\operatorname{bound}(\hat{\mathbf{t}}, \hat{\mathbf{w}}, T_i)$ is the sum of the proposal weights $\hat{w}_j$ whose intervals overlap the NeRF interval $T_i$.
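A minimal NumPy sketch of this loss is shown below, computing the overlap bound with a simple loop. The stop-gradient applied to the NeRF histogram during training is omitted, and the variable names are illustrative.

```python
import numpy as np

def proposal_loss(t, w, t_hat, w_hat):
    """Penalize NeRF-MLP histogram weight that exceeds the proposal-MLP upper bound.

    (t, w):         NeRF-MLP histogram (endpoints (N + 1,), weights (N,)).
    (t_hat, w_hat): Proposal-MLP histogram (endpoints (M + 1,), weights (M,)).
    """
    loss = 0.0
    for i in range(len(w)):
        lo, hi = t[i], t[i + 1]
        # Proposal intervals [t_hat[j], t_hat[j+1]) that overlap the NeRF interval [lo, hi).
        overlaps = (t_hat[:-1] < hi) & (t_hat[1:] > lo)
        bound = w_hat[overlaps].sum()
        # Only excess mass above the bound is penalized.
        loss += np.maximum(0.0, w[i] - bound) ** 2 / (w[i] + 1e-7)
    return loss
```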
Ambiguity
The Problem
Though NeRFs are traditionally optimized using many input images of a scene, the problem of recovering a NeRF that produces realistic synthesized views from novel camera angles is still fundamentally underconstrained. For example, a NeRF could recreate all input images by simply reconstructing each image as a textured plane immediately in front of its respective camera.
The original NeRF paper regularized ambiguous scenes by injecting Gaussian noise into the density head of the NeRF MLP before the rectifier, which encourages densities to gravitate towards either zero or infinity. Though this reduces some “floaters” by discouraging semitransparent densities, the authors observe that it is insufficient for the more challenging task of representing unbounded scenes.
Several other regularizers for NeRF have been proposed in previous works, such as a robust loss on density or smoothness penalties on surfaces proposed by UNISURF and NeRFactor. Still, these solutions address different problems than the ones involving unbounded scenes.
The Solution
This ambiguity problem is tackled in the Mip-NeRF 360 pipeline using a straightforward regularizer on each ray's histogram. This regularizer is given by
$$\mathcal{L}_{\mathrm{dist}}(\mathbf{s}, \mathbf{w}) = \iint w_{\mathbf{s}}(u)\, w_{\mathbf{s}}(v)\, \lvert u - v\rvert \, du\, dv$$
where $w_{\mathbf{s}}(u)$ is the interpolation into the step function defined by $(\mathbf{s}, \mathbf{w})$ at $u$.
Here, the authors are minimizing the weighted absolute distance between all pairs of points along the ray, which encourages each histogram to be as close to a delta function as possible.
A Visualization of the Regularizer on a Single Ray Histogram
The double integral form of the regularizer is not easy to evaluate, but the authors derive a nice closed form, shown below, that is trivial to compute:
$$\mathcal{L}_{\mathrm{dist}}(\mathbf{s}, \mathbf{w}) = \sum_{i,j} w_i w_j \left\lvert \frac{s_i + s_{i+1}}{2} - \frac{s_j + s_{j+1}}{2} \right\rvert + \frac{1}{3}\sum_i w_i^2\,(s_{i+1} - s_i)$$
This reformulation also provides some intuition for how the loss behaves: the first term minimizes the weighted distances between all pairs of interval midpoints, and the second term minimizes the weighted size of each individual interval.
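Below is a minimal NumPy sketch of this closed form for a single ray's histogram; the first term handles pairs of interval midpoints and the second handles each interval's own size.

```python
import numpy as np

def distortion_loss(s_vals, weights):
    """Closed-form distortion regularizer on one ray histogram (a sketch).

    s_vals:  (N + 1,) normalized interval endpoints along the ray.
    weights: (N,) histogram weights for those intervals.
    """
    mids = 0.5 * (s_vals[:-1] + s_vals[1:])                # interval midpoints
    sizes = s_vals[1:] - s_vals[:-1]                       # interval lengths
    pairwise = np.abs(mids[:, None] - mids[None, :])       # |midpoint_i - midpoint_j|
    term1 = np.sum(weights[:, None] * weights[None, :] * pairwise)
    term2 = np.sum(weights ** 2 * sizes) / 3.0
    return term1 + term2
```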
Results
The authors showcase the performance of Mip-NeRF 360 on a new dataset of 9 challenging indoor and outdoor scenes. As shown by the following panels, Mip-NeRF 360 can produce photorealistic renderings of a scene from unseen camera positions given a couple of hundred posed images of the scene. The renderings and depth maps produced by the model exhibit a lot of fine-grained detail.
Scenes rendered by Mip-NeRF 360
Ablating Parameterization
In the following panel, we demonstrate the ablation of the parameterization component in the Mip-NeRF 360 pipeline. As we can see, removing it not only produces blurry backgrounds in the renderings, but also causes the depth maps to lack a lot of detail in distant areas.
Ablation study of Parameterization
Ablating Online Distillation
The following panel shows the ablation of the online distillation technique in the Mip-NeRF 360 pipeline, which, as we can see, uniformly lowers the quality of rendering everywhere: there is a notable amount of flickering in the rendered scene, and thin structures are not rendered properly. Though the main aim of this technique is to accelerate the training of the combined model, it also has a notable impact on the quality of the rendering.
Ablation Study of the Distillation Technique
Ablating the Regularizer
The following panel shows the ablation of the proposed regularizer in the Mip-NeRF 360 pipeline, which, as we can see, results in a lot of floater artifacts. These artifacts are often difficult to notice in a single image and aren't penalized much by single-image metrics, but they become very apparent in video and are especially visible in the predicted depth maps.
Ablation Study of the Regularizer
Comparison with Mip-NeRF
The authors compare the performance of Mip-NeRF 360 against Mip-NeRF. For these comparisons, the authors have scaled down the Euclidean space in order to deal with Mip-NeRF's requirement of bounded coordinate space. We can observe that Mip-NeRF struggles due to the Parameterization and Ambiguity factors that we have already discussed; it produces blurry renderings in both the foreground as well as the background with lots of floater artifacts.
Comparison of Mip-NeRF 360 against Mip-NeRF
Comparison with NeRF++
The authors also compare the performance of Mip-NeRF 360 against NeRF++, a NeRF variant that was specifically designed to handle unbounded scenes. It does a much better job of rendering the background than Mip-NeRF, but it still doesn't achieve the same level of accuracy and realism as Mip-NeRF 360.
Comparison of Mip-NeRF 360 with NeRF++
Comparison with Stable View Synthesis + COLMAP
The authors also compare the performance of Mip-NeRF 360 against Stable View Synthesis, the top-performing non-NeRF baseline they used. Although some renderings from Stable View Synthesis look very realistic, it's prone to severe failure modes that result in very blurry renderings. These failure modes seem to occur because Stable View Synthesis relies on proxy geometry produced by COLMAP, which can sometimes be inaccurate. Mip-NeRF 360, on the other hand, doesn't require any geometry as input and can produce more realistic depth maps than COLMAP.
Comparison of Mip-NeRF 360 with Stable View Synthesis + COLMAP
Limitations of Mip-NeRF 360
Although Mip-NeRF 360 significantly outperforms Mip-NeRF and other prior work, it is not perfect.
- As shown in the aforementioned panels, some thin structures and fine details may be missed, such as the tire spokes in the bicycle scene or the veins on the leaves in the stump scene.
- View synthesis quality is likely to degrade if the camera is moved far from the center of the scene.
- Like most NeRF-like models, recovering a scene requires several hours of training on an accelerator, precluding on-device training.
Potential for Negative Impact
The broad use of neural rendering techniques carries several potential negative societal impacts.
- NeRF-like models have recently been incorporated into generative modeling approaches such as StyleNeRF. Generative modeling techniques, in general, can be used to synthesize deep fakes that could be used to mislead people. Although Mip-NeRF 360 doesn't directly concern generative modeling and instead aims to reconstruct accurate physical models of a scene from which new views can be generated, it may be useful for generative approaches that build on NeRF.
- The ability to reconstruct accurate models of a scene from photographs may have modest potential negative impacts. Mip-NeRF 360 could conceivably be used to construct a surveillance system, and such a system could have a negative impact if used negligently or maliciously.
- Mip-NeRF 360 could be used to generate visual effects (a task that is currently labor-intensive), and as such, it may negatively affect job opportunities for artists.
- Training a NeRF is computationally demanding and requires multiple hours of optimization on an accelerator. This expensive training requires energy, and this may be of concern if that energy was produced in a way that damages the climate.
Conclusion
- In this article, we discuss Mip-NeRF 360, a novel NeRF variant that can produce photorealistic renderings of a large unbounded scene from unseen camera positions given a couple of hundred posed images of the scene.
- We briefly discuss the concept of Neural Radiance Fields and novel view synthesis.
- We discuss the problems of parameterizing large unbounded 360° scenes and how the authors solve them by contracting the Mip-NeRF Gaussians into a bounded ball using a procedure functionally equivalent to the classic Extended Kalman filter.
- We discuss the novel training and sampling pipeline proposed by the authors that make the optimization of Mip-NeRF 360 several times faster than Mip-NeRF. We also discuss the proposed loss function for optimizing the Proposal MLP introduced by the authors.
- We also discuss the problems of ambiguity in realistic synthesized views from novel camera angles and how the authors solve this problem by introducing a novel regularizer.
- We showcase the renderings by Mip-NeRF 360 on a new dataset of 9 challenging indoor and outdoor scenes collected by the authors.
- We observe how ablation on individual components in the Mip-NeRF 360 pipeline affects its performance.
- We also observe the comparisons of the renderings by Mip-NeRF 360 against prior works such as Mip-NeRF, NeRF++, and Stable View Synthesis.
- We discuss the limitations of Mip-NeRF 360 and its potential for negative societal impacts.
Similar Posts
Implementing NeRF in JAX
This article uses JAX to create a minimal implementation of 3D volumetric rendering of scenes represented by Neural Radiance Fields, using W&B to track all metrics.
Block-NeRF: Scalable Large Scene Neural View Synthesis
Representing large city-scale environments spanning multiple blocks using Neural Radiance Fields
Extracting Triangular 3D Models, Materials, and Lighting From Images
In this article, we'll explore a novel and efficient approach for joint optimization of topology, materials, and lighting from multi-view image observations.
Barbershop: Hair Transfer with GAN-Based Image Compositing Using Segmentation Masks
A novel GAN-based optimization method for photo-realistic hairstyle transfer
Generating Digital Painting Lighting Effects via RGB-space Geometry
Exploring the paper "Generating Digital Painting Lighting Effects via RGB-space Geometry" in which the authors propose an image processing algorithm to generate digital painting lighting effects from a single image.
3D Image Inpainting With Weights & Biases
In this article, we take a look at a novel way to convert a single RGB-D image into a 3D image, using Weights & Biases to visualize our results.