Computer Vision, NeRF, Gaussian Splatting, 3D Reconstruction

Neural Radiance Fields and 3D Gaussian Splatting: The Future of 3D Reconstruction

From the original NeRF paper to real-time Gaussian splatting, how learned scene representations are eating classical photogrammetry across robotics, VR, film, and autonomous driving.

David Kim

March 8, 2026

12 min read


Photogrammetry was a slow, brittle discipline for two decades. Multi-view stereo pipelines from COLMAP, RealityCapture, and Agisoft Metashape required carefully calibrated captures, struggled with reflective or textureless surfaces, and produced meshes that needed heavy cleanup. The release of Neural Radiance Fields, or NeRF, by Mildenhall and colleagues at Berkeley in 2020 reframed the problem. Instead of recovering geometry as an explicit mesh, NeRF learns a continuous 5D function that maps a 3D point and a viewing direction to a color and a density value, then renders novel views by integrating that function along camera rays.

The NeRF Formulation

The original paper, arXiv:2003.08934, defined the scene as a function F(x, y, z, theta, phi) returning RGB and sigma, where sigma is volume density. A small multilayer perceptron with roughly 8 hidden layers of 256 units represents this function. Training uses posed 2D images, typically with camera poses recovered by COLMAP. Each pixel is rendered with the classical volume rendering integral, discretized through stratified sampling along its ray, and the network is optimized to minimize the squared error between rendered and ground-truth pixel colors.
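The discretized rendering step described above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the paper's implementation: `toy_field` is a hypothetical stand-in for the trained MLP, the function names are made up for this example, and the last sample's interval is assumed to end at the far plane.

```python
import math
import random


def stratified_samples(near, far, n_bins, rng):
    """Draw one uniformly random depth inside each of n_bins equal bins."""
    width = (far - near) / n_bins
    return [near + (i + rng.random()) * width for i in range(n_bins)]


def render_ray(field, origin, direction, near, far, n_bins=64, seed=0):
    """Approximate the volume rendering integral with a quadrature sum.

    `field(point, direction)` stands in for the trained MLP and must
    return ((r, g, b), sigma).
    """
    rng = random.Random(seed)
    depths = stratified_samples(near, far, n_bins, rng)
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that reaches the current sample
    for i, t in enumerate(depths):
        # Distance to the next sample; the last interval ends at `far`
        # (an assumption made for this sketch).
        delta = (depths[i + 1] if i + 1 < n_bins else far) - t
        point = [o + t * d for o, d in zip(origin, direction)]
        rgb, sigma = field(point, direction)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color, transmittance


def toy_field(point, direction):
    """Hypothetical scene: a dense red ball of radius 1 at the origin."""
    inside = sum(p * p for p in point) < 1.0
    return (1.0, 0.0, 0.0), (50.0 if inside else 0.0)


# A ray that passes straight through the ball, and one that misses it.
hit_color, hit_leftover = render_ray(toy_field, [0.0, 0.0, -3.0], [0.0, 0.0, 1.0], 0.0, 6.0)
miss_color, miss_leftover = render_ray(toy_field, [0.0, 5.0, -3.0], [0.0, 0.0, 1.0], 0.0, 6.0)
```

Because every step is differentiable with respect to the color and density returned by the field, the squared pixel error can be backpropagated into the network weights.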

A critical detail is positional encoding. Feeding raw coordinates into the MLP produces blurry reconstructions because ReLU networks are biased toward low-frequency functions. Mildenhall et al. lifted each input coordinate through a sinusoidal mapping whose frequencies double from 2^0 up to 2^9 (ten bands for position, four for viewing direction), which unlocks the recovery of fine textures and sharp edges. This insight, formalized in Tancik et al.'s 2020 paper on Fourier features, became foundational to coordinate-based neural representations in graphics and beyond.
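As a rough illustration of that mapping, the sketch below follows the formula given in the NeRF paper, where each coordinate p becomes sin(2^k pi p) and cos(2^k pi p) for k from 0 to L-1. The function name and the sample point are assumptions made for this example.

```python
import math


def positional_encoding(p, num_bands):
    """Map each coordinate to sin/cos features at frequencies 2^k * pi."""
    features = []
    for value in p:
        for k in range(num_bands):
            angle = (2.0 ** k) * math.pi * value
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# A 3D position with 10 bands becomes a 60-dimensional input vector.
encoded_position = positional_encoding([0.5, -0.25, 0.1], num_bands=10)
# A 3D unit view direction with 4 bands becomes 24-dimensional.
encoded_direction = positional_encoding([0.0, 0.0, 1.0], num_bands=4)
```

Two nearby points differ only slightly in the low-frequency features but can differ strongly in the high-frequency ones, which gives the MLP a handle on fine detail.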

Speedups: Hash Grids, Mip-NeRF, and TensoRF

Vanilla NeRF took roughly a day to train on a single scene and rendered at sub-interactive frame rates. Instant-NGP, introduced by Thomas Müller and colleagues at NVIDIA in 2022, replaced the large global MLP with a multi-resolution hash grid that stores learnable features indexed by spatial hashing, followed by a much smaller network. Training dropped from hours to seconds for many scenes, and the technique has been widely adopted as the default fast NeRF backbone. TensoRF, by Chen et al., decomposes the radiance field into a sum of vector-matrix factors, achieving comparable quality with a smaller memory footprint.
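The hash-grid lookup can be sketched as follows. The per-axis primes and the geometric growth of the grid resolution follow the Instant-NGP paper; everything else (function names, table sizes, random initialization, and the omission of the dense indexing used at coarse levels) is a simplifying assumption for illustration.

```python
import math
import random

PRIMES = (1, 2654435761, 805459861)  # per-axis constants from the paper


def hash_index(ix, iy, iz, table_size):
    """Spatial hash: XOR the integer coordinates scaled by large primes."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


def level_resolution(level, n_levels, n_min, n_max):
    """Grid resolution grows geometrically from n_min toward n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return int(math.floor(n_min * growth ** level))


def encode(point, tables, n_min=16, n_max=512):
    """Concatenate trilinearly interpolated features from every level.

    `point` is assumed to lie in the unit cube [0, 1]^3.
    """
    features = []
    for level, table in enumerate(tables):
        res = level_resolution(level, len(tables), n_min, n_max)
        scaled = [c * res for c in point]
        base = [int(math.floor(s)) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        blended = [0.0] * len(table[0])
        for corner in range(8):  # the 8 corners of the enclosing voxel
            offset = [(corner >> axis) & 1 for axis in range(3)]
            weight = 1.0
            for axis in range(3):
                weight *= frac[axis] if offset[axis] else 1.0 - frac[axis]
            idx = hash_index(*(b + o for b, o in zip(base, offset)), len(table))
            blended = [acc + weight * f for acc, f in zip(blended, table[idx])]
        features.extend(blended)
    return features


rng = random.Random(0)
# 4 levels, 2^10 entries per level, 2 features per entry. The entries are
# random here; during training they are the learnable parameters.
tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(2)] for _ in range(1 << 10)]
          for _ in range(4)]
features = encode([0.3, 0.7, 0.1], tables)
```

Hash collisions are not resolved explicitly. The design relies on gradient descent and the small downstream network to sort out which colliding entries matter, which is part of why the lookup is so cheap.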

Mip-NeRF addressed a different problem: aliasing when rendering at multiple scales. By treating each ray as a cone and integrating positional encodings over the conical frustum, it produces dramatically cleaner anti-aliased views. Mip-NeRF 360 extended the formulation to unbounded scenes through a non-linear scene contraction that maps distant points into a finite domain. NeRF++ had earlier proposed a similar inverted-sphere parameterization. Together these papers turned NeRF from an object-centric toy into a viable representation for full real-world environments.
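The contraction used by Mip-NeRF 360 is compact enough to show directly. The sketch below applies it to a single point; the full method applies it to the Gaussians that approximate each conical frustum, which is omitted here, and the function name is chosen for this example.

```python
import math


def contract(x):
    """Mip-NeRF 360 contraction: identity inside the unit ball, and a
    squash of everything else into the shell between radius 1 and 2."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= 1.0:
        return list(x)
    scale = (2.0 - 1.0 / norm) / norm
    return [scale * c for c in x]


near_point = contract([0.5, 0.0, 0.0])      # unchanged
mid_point = contract([2.0, 0.0, 0.0])       # pulled in to radius 1.5
far_point = contract([1000.0, 0.0, 0.0])    # lands just inside radius 2
```

Nearby content keeps its full resolution while the entire background, however distant, is compressed into a bounded region that a fixed-size model can cover.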

Figure: Coordinate-based networks represent geometry and appearance as continuous functions, sidestepping mesh topology entirely.

Dynamic Scenes and Generative Variants

Static scenes are only the start. D-NeRF added a learned deformation field that warps points from observed time into a canonical space, enabling reconstruction of moving subjects from monocular video. Nerfies and HyperNeRF refined this with higher-dimensional latent codes for topology changes such as opening mouths. K-Planes factorizes the 4D space-time volume into six learnable planes, one for each pair of the x, y, z, and t axes, providing fast training on dynamic captures and supporting both static and time-varying scenes within a unified framework.
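To make the six-plane factorization concrete, here is a hedged single-scale sketch: a point (x, y, z, t) is projected onto each pair of axes, a feature vector is bilinearly interpolated from the corresponding plane, and the six vectors are multiplied elementwise, as in the K-Planes paper. The grid size, feature width, and function names are assumptions for this example, and the multi-scale concatenation and decoder are left out.

```python
import itertools
import random

# All pairs of the four axes (x, y, z, t): xy, xz, xt, yz, yt, zt.
PLANE_PAIRS = list(itertools.combinations(range(4), 2))


def bilinear(grid, u, v):
    """Bilinearly interpolate a feature vector from grid[row][col].

    u and v are coordinates in [0, 1] along the rows and columns.
    """
    res = len(grid) - 1
    fu, fv = u * res, v * res
    i0, j0 = min(int(fu), res - 1), min(int(fv), res - 1)
    du, dv = fu - i0, fv - j0
    out = []
    for k in range(len(grid[0][0])):
        row0 = (1 - dv) * grid[i0][j0][k] + dv * grid[i0][j0 + 1][k]
        row1 = (1 - dv) * grid[i0 + 1][j0][k] + dv * grid[i0 + 1][j0 + 1][k]
        out.append((1 - du) * row0 + du * row1)
    return out


def kplanes_features(point, planes):
    """Multiply the features sampled from all six planes elementwise."""
    combined = None
    for (a, b), grid in zip(PLANE_PAIRS, planes):
        sample = bilinear(grid, point[a], point[b])
        if combined is None:
            combined = sample
        else:
            combined = [c * s for c, s in zip(combined, sample)]
    return combined


rng = random.Random(0)
# Six 8x8 planes with 4 features per cell, randomly initialized here.
planes = [[[[rng.uniform(0.5, 1.5) for _ in range(4)] for _ in range(8)]
           for _ in range(8)] for _ in PLANE_PAIRS]
features = kplanes_features([0.2, 0.4, 0.6, 0.5], planes)
```

The three planes that do not involve t describe the static scene, while the three space-time planes capture how it changes, which is what lets one model handle both static and dynamic captures.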

Generative offshoots include GIRAFFE, EG3D from NVIDIA, and the more recent Zero-1-to-3 and Instant3D systems that generate 3D content from a single image or a text prompt. Radiance fields have also merged with 2D diffusion models such as Imagen and Stable Diffusion to produce text-to-3D pipelines including DreamFusion, Magic3D, and ProlificDreamer, though geometry quality still trails dedicated reconstruction methods.

The Gaussian Splatting Paradigm Shift

In August 2023, Kerbl, Kopanas, Leimkühler, and Drettakis published "3D Gaussian Splatting for Real-Time Radiance Field Rendering" at SIGGRAPH. The method replaces the implicit MLP with millions of explicit anisotropic 3D Gaussians, each carrying a position, covariance, opacity, and view-dependent color encoded via spherical harmonics. Rendering is done through a differentiable tile-based rasterizer that splats the Gaussians directly onto the image plane, sorts them by depth, and alpha-blends them.
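One detail worth illustrating is how each covariance is stored and projected. In the paper, the 3D covariance is factored as Sigma = R S S^T R^T from a rotation (a quaternion) and a per-axis scale, which keeps it a valid covariance during optimization, and the screen-space covariance is approximated as J W Sigma W^T J^T. The sketch below is a dependency-free illustration under those formulas; the helper names and the simple pinhole Jacobian are assumptions for this example, not the reference implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def quat_to_rotation(q):
    """Rotation matrix from a quaternion (w, x, y, z), normalized first."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_3d(scale, quat):
    """Sigma = R S S^T R^T: symmetric positive semi-definite by construction."""
    r = quat_to_rotation(quat)
    s = [[scale[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
    m = matmul(r, s)
    return matmul(m, transpose(m))


def project_covariance(cov, mean_cam, world_to_cam, fx, fy):
    """Sigma' = J W Sigma W^T J^T for a pinhole camera.

    J is the Jacobian of the perspective projection at the Gaussian's
    camera-space mean; world_to_cam is the 3x3 rotation W.
    """
    x, y, z = mean_cam
    jac = [[fx / z, 0.0, -fx * x / (z * z)],
           [0.0, fy / z, -fy * y / (z * z)]]
    t = matmul(jac, world_to_cam)
    return matmul(matmul(t, cov), transpose(t))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov = covariance_3d(scale=(1.0, 2.0, 3.0), quat=(1.0, 0.0, 0.0, 0.0))
cov_2d = project_covariance(cov, mean_cam=(0.0, 0.0, 2.0), world_to_cam=identity,
                            fx=100.0, fy=100.0)
```

Optimizing the scale and quaternion instead of the raw matrix entries means gradient steps can never produce an invalid covariance.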

The result was a step change. Training matches or beats Mip-NeRF 360 quality in well under an hour on a single high-end GPU, and rendering runs in real time at 1080p, often at well above 100 frames per second. Because the representation is explicit, edits, physics integration, and streaming become tractable in ways that were awkward for implicit NeRFs. Within 18 months, follow-ups including Mip-Splatting for anti-aliasing, 4D Gaussian Splatting for dynamic scenes, and SuGaR for surface extraction had turned the technique into the dominant practical 3D representation.

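# Minimal sketch of the Gaussian splatting forward pass for a single pixel.
# project, project_cov, eval_sh, gaussian_2d, and sorted_by_depth are
# placeholder helpers, not functions from a specific library.
pixel_color = 0.0      # accumulated color for this pixel
transmittance = 1.0    # fraction of light not yet absorbed
for gaussian in sorted_by_depth(visible_gaussians):  # nearest first
    mu_2d = project(gaussian.mu, camera)              # screen-space center
    sigma_2d = project_cov(gaussian.cov, camera)      # screen-space covariance
    color = eval_sh(gaussian.sh, view_dir)            # view-dependent color
    alpha = gaussian.opacity * gaussian_2d(pixel, mu_2d, sigma_2d)
    pixel_color = pixel_color + transmittance * alpha * color
    transmittance = transmittance * (1 - alpha)
    if transmittance < 1e-4:  # stop once the pixel is effectively opaque
        break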
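For readers who want to run the blending step, the loop above can be made concrete for splats that have already been projected to screen space. This is a minimal sketch under assumptions: the dictionary layout, function names, and termination threshold are illustrative, and the alpha clamp at 0.99 mirrors a numerical-stability measure in the reference rasterizer.

```python
import math


def gaussian_2d(pixel, mean, cov):
    """Unnormalized 2D Gaussian falloff exp(-0.5 * d^T cov^-1 d)."""
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # d^T cov^-1 d, using the closed-form inverse of a 2x2 matrix.
    power = -0.5 * (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(power)


def composite_pixel(pixel, splats, min_transmittance=1e-4):
    """Front-to-back alpha blending of already-projected splats.

    Each splat is a dict with keys depth, mean, cov, opacity, color.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s["depth"]):  # nearest first
        alpha = min(0.99, s["opacity"] * gaussian_2d(pixel, s["mean"], s["cov"]))
        color = [acc + transmittance * alpha * ch for acc, ch in zip(color, s["color"])]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # the pixel is effectively opaque
    return color, transmittance


unit_cov = ((4.0, 0.0), (0.0, 4.0))
splats = [
    {"depth": 5.0, "mean": (10.0, 10.0), "cov": unit_cov, "opacity": 0.5,
     "color": (0.0, 0.0, 1.0)},  # far, blue
    {"depth": 2.0, "mean": (10.0, 10.0), "cov": unit_cov, "opacity": 0.5,
     "color": (1.0, 0.0, 0.0)},  # near, red
]
center_color, center_leftover = composite_pixel((10.0, 10.0), splats)
```

At the shared center, the near red splat contributes half of the pixel and the far blue splat half of what remains, so the order of the sort directly determines the result.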

Applications Across Industries

Consumer apps such as Luma AI and Polycam now let anyone capture a scene with a phone and publish a Gaussian splat or NeRF that streams in a browser. Apple Vision Pro experiences increasingly rely on splatting for photorealistic environments because the explicit primitives map cleanly to GPU rasterization on mobile silicon. In robotics, NeRF-based representations have been used for grasp planning and visuomotor policy learning, with work from Stanford and Berkeley demonstrating that radiance fields can serve as differentiable world models. In film visual effects, ILM and Wargaming have publicly discussed NeRF and splatting pipelines for set extension and previs.

Autonomous driving has emerged as one of the highest-stakes applications. NVIDIA's neural reconstruction stack, demonstrated in DRIVE Sim and the Omniverse-based simulation tools, uses Gaussian splatting to turn fleet camera footage into editable simulation environments. Waymo published Block-NeRF in 2022 to reconstruct entire San Francisco neighborhoods from millions of street-view images, and successor systems now combine LiDAR priors with splats for accurate, sensor-faithful simulation at city scale.

Robotics: differentiable scene representations for grasp planning and policy learning
VR and AR: photoreal environments on Apple Vision Pro, Quest, and WebXR via splat streaming
Consumer capture: Luma AI, Polycam, KIRI Engine turning phones into 3D scanners
Film and VFX: previs, set extension, and virtual production by ILM, Weta, and DNEG teams
Autonomous driving: NVIDIA neural reconstruction and Waymo Block-NeRF for simulation
Cultural heritage: digitization of museums and archaeological sites with sub-millimeter detail
E-commerce: 3D product views replacing turntable photography on Shopify and Amazon pilots

Gaussian splatting did not just make NeRF faster. It made learned 3D representations practical enough that the question is no longer whether to use them in production, but which variant to standardize on.

Open Problems

Specular and transparent surfaces still confound both NeRF and splatting because view-dependent reflections are hard to separate from geometry. Deformable objects, especially clothing and human skin, remain an active research front despite progress from K-Planes, 4DGS, and HumanGaussian. Capture requirements are another bottleneck: while phone-based capture is improving, reliable reconstructions still favor dense, well-lit, slow camera trajectories.

The deepest open question is how learned 3D representations will integrate with the rest of the graphics stack. Game engines, CAD tools, and physics simulators still assume meshes and textures. Bridges such as 2D Gaussian Splatting for surfaces, NeRF2Mesh, and Gaussian Frosting are early answers, but the broader industry transition will determine whether radiance fields remain a capture-side technology or become the native representation for interactive 3D itself.