[NeurIPS 2019 Highlight] Vincent Sitzmann @ Stanford: Scene Representation Networks

Updated: Dec 22, 2019

This episode is an interview with Vincent Sitzmann from Stanford University, discussing highlights from his paper, "Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations," accepted as an oral presentation at the NeurIPS 2019 conference.

Vincent is a fourth-year Ph.D. student in the Stanford Computational Imaging Laboratory, advised by Prof. Gordon Wetzstein. His research interest lies in 3D-structure-aware neural scene representations, a novel way for AI to represent information about our 3D world. His goal is to enable AI to reason about the world from visual observations, such as inferring a complete model of a scene, including its geometry, materials, and lighting, from only a few observations; a task that is simple for humans but currently impossible for AI.

Interview with Robin.ly:

Robin.ly is a content platform dedicated to helping engineers and researchers develop leadership, entrepreneurship, and AI insights to scale their impacts in the new tech era.

Subscribe to our newsletter to stay updated on more NeurIPS interviews and inspiring AI talks.

Paper At A Glance

The advent of deep learning has given rise to neural scene representations - learned mathematical models of a 3D environment. However, many of these representations do not explicitly reason about geometry and thus do not account for the underlying 3D structure of the scene. In contrast, geometric deep learning has explored 3D-structure-aware representations of scene geometry, but requires explicit 3D supervision. We propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. SRNs represent scenes as continuous functions that map world coordinates to a feature representation of local scene properties. By formulating the image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D observations, without access to depth or geometry. This formulation naturally generalizes across scenes, learning powerful geometry and appearance priors in the process. We demonstrate the potential of SRNs by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.


[Full Paper on arXiv]

Visualization of Scene Representation Networks

Scene Representation Networks map a world coordinate to a learned feature representation of whatever is located at that coordinate.
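At its core, this mapping is a continuous function Phi from 3D world coordinates to feature vectors, parameterized as a fully connected network. The following is a minimal, hypothetical sketch of that idea in numpy; the layer sizes, activations, and feature dimension are illustrative and not the paper's exact configuration.

```python
import numpy as np

# Hypothetical sketch of an SRN: a small MLP Phi mapping a 3D world
# coordinate to a feature vector describing the scene at that point.
# All sizes here are illustrative, not the paper's actual architecture.

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random weights for a fully connected net with the given layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Apply the MLP: ReLU on hidden layers, linear output."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Phi: R^3 -> R^32 (a feature vector of local scene properties, not a color)
phi_params = init_mlp([3, 64, 64, 32], rng)
point = np.array([0.1, -0.4, 2.0])        # a world coordinate
features = mlp_forward(phi_params, point)
print(features.shape)                     # (32,)
```

Because Phi is a continuous function of the coordinate rather than a voxel grid or point cloud, the scene can be queried at arbitrary resolution.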

Visualization of the Neural Renderer

A differentiable neural renderer renders the scene by ray-marching along each camera ray until a surface is discovered.
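The paper's renderer learns the per-step marching distance with an LSTM operating on SRN features. As a self-contained stand-in, the sketch below substitutes a known signed-distance function of a unit sphere for the learned step predictor, so the marching loop itself (step forward until a surface is found) can be shown without any trained weights. All names here are hypothetical.

```python
import numpy as np

# Simplified sketch of ray marching along a camera ray: at each step, query
# the scene at the current point and advance until a surface is discovered.
# The paper learns the step length from SRN features with an LSTM; here a
# known signed-distance function of a unit sphere stands in for it.

def sdf_sphere(p, radius=1.0):
    """Signed distance from point p to a sphere of given radius at the origin."""
    return np.linalg.norm(p) - radius

def ray_march(origin, direction, max_steps=64, eps=1e-4):
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf_sphere(p)            # stand-in for the learned step predictor
        if d < eps:                  # surface found: stop marching
            return t, p
        t += d                       # step forward by the predicted distance
    return None, None                # ray missed the scene

origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])
t_hit, p_hit = ray_march(origin, direction)
print(round(t_hit, 3))               # 2.0: the ray hits the sphere at z = -1
```

In the actual model every operation in this loop is differentiable, which is what allows the SRN to be trained end-to-end from 2D images alone.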


A Hypernetwork (Ha et al. 2016) enables generalization across a class of scenes by predicting the weights of a scene representation network from a low-dimensional scene embedding.
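The hypernetwork idea can be sketched as a second network Psi that maps a low-dimensional scene embedding to the flat weight vector of the scene network Phi. Below, a single linear map stands in for Psi, and all dimensions are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical sketch of a hypernetwork: Psi maps a low-dimensional scene
# embedding z to the weights of the scene MLP Phi, so each scene in a class
# gets its own Phi from a shared Psi. Sizes are illustrative only.

rng = np.random.default_rng(0)

embed_dim = 16
phi_sizes = [3, 32, 32]                      # a tiny Phi for the sketch
n_weights = sum(m * n + n for m, n in zip(phi_sizes[:-1], phi_sizes[1:]))

# Psi: a linear map from embedding to flat weight vector (1184 weights here)
W_psi = rng.standard_normal((embed_dim, n_weights)) * 0.01

def unpack(flat, sizes):
    """Reshape the flat weight vector into (W, b) pairs for each layer."""
    params, i = [], 0
    for m, n in zip(sizes[:-1], sizes[1:]):
        W = flat[i:i + m * n].reshape(m, n); i += m * n
        b = flat[i:i + n]; i += n
        params.append((W, b))
    return params

def phi(params, x):
    """The scene network: ReLU hidden layers, linear output."""
    for j, (W, b) in enumerate(params):
        x = x @ W + b
        if j < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

z = rng.standard_normal(embed_dim)           # one scene's embedding
phi_params = unpack(z @ W_psi, phi_sizes)    # scene-specific Phi weights
out = phi(phi_params, np.array([0.2, 0.1, 1.0]))
print(out.shape)                              # (32,)
```

Because the embedding is low-dimensional and Psi is shared across scenes, priors over geometry and appearance are learned jointly, which is what enables few-shot reconstruction of novel scenes.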



© 2018 by Robin.ly. Provided by CrossCircles Inc.