🏡Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li¹, Tobias Fischer¹, Mattia Segu¹, Marc Pollefeys¹, Luc Van Gool¹, Federico Tombari²,³

CVPR 2024

¹ETH Zürich, ²Google, ³Technical University of Munich

Know Your Neighbors (KYN) excels in disambiguating occluded scene geometry from a single image by utilizing vision-language semantics and spatial reasoning.

TL;DR

  • A new single-view scene reconstruction method that infers faithful scene and object geometry from partial visual observations.
  • A vision-language (VL) modulation module that enriches per-point features with fine-grained semantics derived from visual and text features.
  • A VL spatial attention mechanism that aggregates point representations across the scene, so that density predictions account for the neighboring 3D semantic context.
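To make the two modules more concrete, here is a minimal pure-Python sketch under explicit assumptions. VL modulation is modeled as FiLM-style conditioning, in which a point's text feature predicts a per-channel scale and shift for its visual feature. VL spatial attention is modeled as single-head scaled dot-product self-attention over the point set, followed by a linear density head with a softplus. Both choices, and every name here (`film_modulate`, `self_attention`, `density_head`, `W_gamma`, `W_beta`), are illustrative assumptions rather than the KYN implementation; learned weights are replaced by fixed toy matrices.

```python
import math

# Hypothetical sketch: FiLM-style VL modulation + single-head self-attention.
# Learned layers are replaced by fixed toy weight matrices for illustration.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def film_modulate(vis_feat, text_feat, W_gamma, W_beta):
    """Assumed VL modulation: the text feature predicts a per-channel
    scale (gamma) and shift (beta) that are applied to the visual feature."""
    gamma = matvec(W_gamma, text_feat)
    beta = matvec(W_beta, text_feat)
    return [g * v + b for g, v, b in zip(gamma, vis_feat, beta)]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(points):
    """Single-head scaled dot-product self-attention over N point features.
    Simplification: queries, keys and values are the features themselves
    (identity projections instead of learned ones)."""
    d = len(points[0])
    out = []
    for q in points:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in points]
        weights = softmax(scores)
        # Each output is a convex combination of all point features.
        out.append([sum(w * p[c] for w, p in zip(weights, points)) for c in range(d)])
    return out

def density_head(feat, w, b=0.0):
    """Linear layer followed by a numerically stable softplus (density > 0)."""
    z = sum(wi * fi for wi, fi in zip(w, feat)) + b
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Toy scene: 3 points with 2-D visual features and 2-D text features.
vis = [[1.0, 0.5], [0.2, -0.3], [0.9, 0.4]]
txt = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # e.g. two category embeddings
W_gamma = [[1.0, 0.0], [0.0, 1.0]]          # toy "learned" weights
W_beta = [[0.0, 0.0], [0.0, 0.0]]

modulated = [film_modulate(v, t, W_gamma, W_beta) for v, t in zip(vis, txt)]
context = self_attention(modulated)
densities = [density_head(f, [1.0, 1.0]) for f in context]
print(densities)
```

With these toy weights the text feature acts as a channel mask, so points of the same category receive the same modulation pattern. In a trained model the modulation weights and the attention's query, key, and value projections would all be learned; identity projections only keep the example short.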

Overview

Given an input image \(\textbf{I}_{0}\), we use two image encoders to obtain features \(F_{\text{app}}\) and \(F_{\text{vis}}\), and fuse them into a feature map \(F_{\text{fused}}\). We further extract category-level text features and a segmentation map \(S\). For a given 3D point set \(\mathbf{X}\), we project each point onto the image plane and query the extracted features at the projected locations, yielding point-wise visual and text features. Next, the VL modulation layers endow each point representation with fine-grained semantic information. Finally, the VL spatial attention aggregates these point representations across the 3D scene, yielding density predictions aware of the 3D semantic context.
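The point-wise feature query can be sketched as follows. This is a minimal illustration that assumes a standard pinhole camera with intrinsics (fx, fy, cx, cy), points already expressed in the camera frame, bilinear interpolation of the fused feature map, and a nearest-pixel lookup into the segmentation map \(S\) to select the category text feature. The function names, intrinsics, and toy maps are hypothetical; the paper's actual sampling and camera conventions may differ.

```python
import math

# Hypothetical sketch of the point-wise feature query. Assumes a pinhole
# camera, points in the camera frame, and row-major [row][col] maps.

def project(point, fx, fy, cx, cy):
    """Project a camera-frame point (x, y, z) with z > 0 to pixel coords (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return fx * x / z + cx, fy * y / z + cy

def lerp(a, b, t):
    """Linear interpolation between two feature vectors."""
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]

def bilinear_sample(fmap, u, v):
    """Bilinearly interpolate fmap at (u, v); u indexes columns, v rows.
    Coordinates outside the map are clamped to the border."""
    h, w = len(fmap), len(fmap[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    u0, v0 = math.floor(u), math.floor(v)
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = lerp(fmap[v0][u0], fmap[v0][u1], du)
    bottom = lerp(fmap[v1][u0], fmap[v1][u1], du)
    return lerp(top, bottom, dv)

def query_point(point, fused, seg, text_table, intrinsics):
    """Return (visual_feature, text_feature) for one 3D point.
    The text feature is looked up via the nearest pixel's segmentation label."""
    u, v = project(point, *intrinsics)
    vis_feat = bilinear_sample(fused, u, v)
    h, w = len(seg), len(seg[0])
    row = min(max(math.floor(v + 0.5), 0), h - 1)
    col = min(max(math.floor(u + 0.5), 0), w - 1)
    return vis_feat, text_table[seg[row][col]]

# Toy 2x2 inputs: 1-D fused features and a two-class segmentation map S.
fused = [[[0.0], [1.0]],
         [[2.0], [3.0]]]
seg = [[0, 0],
       [1, 1]]
text_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # hypothetical category embeddings
intrinsics = (1.0, 1.0, 0.5, 0.5)            # fx, fy, cx, cy

vis, txt = query_point((0.2, 0.2, 1.0), fused, seg, text_table, intrinsics)
print(vis, txt)
```

Bilinear interpolation is a common choice for pixel-aligned features because the sampled feature varies smoothly with the projected coordinates; the nearest-pixel lookup for \(S\) reflects that segmentation labels are discrete and should not be blended.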

Visual Comparisons

Side-by-side comparisons: Ours vs. BTS [Wimbauer, 2023]; Ours vs. PixelNeRF [Yu, 2021]; Ours vs. MonoDepth2 [Godard, 2019].

Scene Reconstruction

Whereas previous methods struggle with corrupted shapes and trailing artifacts, our method produces faithful scene geometry, especially in occluded regions.

Object Reconstruction

Our method produces more faithful object geometries for various semantic categories.

BibTeX

@inproceedings{li2024know,
      title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning}, 
      author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
      booktitle={CVPR},
      year={2024}
}
