LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models

NVIDIA Research · Massachusetts Institute of Technology · Harvard University · Georgia Institute of Technology · University of California, Berkeley · Stanford University · University of Southern California

ICLR 2025 Spotlight

TL;DR: LoRA3D lets 3D geometric foundation models (e.g., DUSt3R, MASt3R) self-specialize to your target scene using only ~10 uncalibrated images, in under 5 minutes on a single GPU.

Target scene: Replica office0

Task: 3D reconstruction

[Teaser: interactive visualization of our reconstructed point clouds for Replica office0.]

Target scene: Waymo test segment-10084

Task: Novel view synthesis

[Teaser: novel-view renders on Waymo test segment-10084.]

The videos show novel-view renders of a 3DGS model trained with only 10 input images using InstantSplat (DUSt3R-initialized 3DGS training). The artifacts that remain after self-calibration are primarily caused by incomplete sparse-view observations.

Target scene: TUM fr2_xyz

Task: Camera pose estimation

[Figures: estimated camera poses on TUM fr2_xyz, before and after LoRA3D self-calibration.]

The figures above show estimated and ground-truth camera poses for 10 sample images from TUM fr2_xyz. Self-calibration makes the camera pose estimates more accurate.

Target scene: Waymo test segment-10084

Task: 3D reconstruction

[Figures: MASt3R calibration images and reconstructions of Waymo segment-10084, before and after LoRA3D self-calibration.]

Abstract

Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks. However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained models still struggle to generalize to many challenging circumstances, such as limited view overlap or low lighting. To address this, we propose LoRA3D, an efficient self-calibration pipeline to specialize the pre-trained models to target scenes using their own multi-view predictions. Taking sparse RGB images as input, we leverage robust optimization techniques to refine multi-view predictions and align them into a global coordinate frame. In particular, we incorporate prediction confidence into the geometric optimization process, automatically re-weighting the confidence to better reflect point estimation accuracy. We use the calibrated confidence to generate high-quality pseudo labels for the calibrating views and use low-rank adaptation (LoRA) to fine-tune the models on the pseudo-labeled data. Our method does not require any external priors or manual labels. It completes the self-calibration process on a single standard GPU within just 5 minutes. Each low-rank adapter requires only 18MB of storage. We evaluated our method on more than 160 scenes from the Replica, TUM and Waymo Open datasets, achieving up to 88% performance improvement on 3D reconstruction, multi-view pose estimation and novel-view rendering.
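The self-calibration pipeline freezes the pre-trained model weights and fine-tunes only low-rank adapters on the pseudo-labeled data, which is why each adapter needs just 18MB of storage. Below is a minimal NumPy sketch of the underlying LoRA idea applied to a single linear layer; all names, shapes, and the scaling convention are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Frozen linear layer with a LoRA adapter: y = x(W + BA * alpha/r).

    W: frozen pretrained weight, shape (d_in, d_out) -- never updated
    A: trainable down-projection, shape (d_in, r)
    B: trainable up-projection, shape (r, d_out), zero-initialized
    Only A and B (rank r << d) are trained during self-calibration.
    """
    r = A.shape[1]
    scale = alpha / r
    return x @ W + (x @ A) @ B * scale

rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 16
W = rng.standard_normal((d_in, d_out)) * 0.02  # frozen pretrained weight
A = rng.standard_normal((d_in, r)) * 0.01      # trainable
B = np.zeros((r, d_out))                       # zero-init: adapter delta starts at 0

x = rng.standard_normal((4, d_in))
# With B = 0 the adapted layer exactly reproduces the pretrained output,
# so fine-tuning starts from the pre-trained model's behavior.
assert np.allclose(lora_forward(x, W, A, B), x @ W)

# Only the adapter (2*d*r params) is stored per scene, not the full d*d weight.
full, adapter = d_in * d_out, r * (d_in + d_out)
print(f"adapter params: {adapter} ({adapter / full:.1%} of the full layer)")
```

Because the adapter parameter count grows as r(d_in + d_out) rather than d_in·d_out, a separate adapter can be kept per target scene at a small fraction of the base model's size.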

BibTeX


    @article{lu2024lora3d,
      title={LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models},
      author={Lu, Ziqi and Yang, Heng and Xu, Danfei and Li, Boyi and Ivanovic, Boris and Pavone, Marco and Wang, Yue},
      journal={arXiv preprint arXiv:2412.07746},
      year={2024}
    }