DRoPS: Dynamic 3D Reconstruction of Pre-scanned Objects

¹Weizmann Institute of Science   ²Meta Reality Labs
Paper · arXiv · Code (coming soon)

Abstract

Dynamic scene reconstruction from casual videos has seen remarkable recent progress. Numerous approaches attempt to overcome the ill-posedness of the task by distilling priors from 2D foundation models and by imposing hand-crafted regularization on the optimized motion. However, these methods struggle to reconstruct scenes from extreme novel viewpoints, especially when highly articulated motions are present. In this paper, we present DRoPS, a novel approach that leverages a static pre-scan of the dynamic object as an explicit geometric and appearance prior. While existing state-of-the-art methods fail to fully exploit the pre-scan, DRoPS uses this setup to effectively constrain the solution space and ensure geometric consistency throughout the sequence. Our core novelty is twofold: first, we establish a grid-structured, surface-aligned model by organizing Gaussian primitives into pixel grids anchored to the object surface; second, leveraging this grid structure, we parameterize motion with a CNN conditioned on the grids, injecting strong implicit regularization that correlates the motion of nearby points. Extensive experiments demonstrate that our method significantly outperforms the current state of the art in rendering quality and 3D tracking accuracy.

Pipeline

DRoPS pipeline overview showing structured surface-aligned canonical Gaussians and CNN-based dynamic modeling.

DRoPS Overview. (a) We organize our canonical Gaussians into structured pixel grids that reside on virtual cameras surrounding the object, with each pixel encoding the parameters of its back-projected 3D Gaussian. (b) To reconstruct the dynamic sequence, we model the object deformation with the Deep Motion Prior (DMP), a CNN that maps canonical positions and timestep encodings to 6-DOF motion parameters.
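To make part (a) concrete, below is a minimal sketch of how one pixel grid can be back-projected into canonical 3D Gaussian centers, assuming pinhole virtual cameras with per-pixel depths. The function name, tensor layout, and camera conventions are our illustrative assumptions, not the exact implementation.

import torch

def backproject_grid(depth, K, cam_to_world):
    """Back-project an HxW pixel grid into 3D Gaussian centers (world frame).

    depth:        (H, W) per-pixel depth along the camera ray (assumed given)
    K:            (3, 3) pinhole intrinsics of the virtual camera
    cam_to_world: (4, 4) camera-to-world rigid transform
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()    # (H, W, 3) homogeneous pixels
    rays = pix @ torch.linalg.inv(K).T                               # camera-frame ray directions
    pts_cam = rays * depth.unsqueeze(-1)                             # scale each ray by its depth
    pts = pts_cam @ cam_to_world[:3, :3].T + cam_to_world[:3, 3]     # transform to world frame
    return pts                                                       # (H, W, 3) Gaussian centers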

By utilizing the Deep Motion Prior, DRoPS exploits the spatial inductive bias of convolutional networks to introduce strong regularization to motion modeling and to encourage coherent local motion without relying as heavily on hand-crafted regularization. This results in a geometrically consistent, complete dynamic 3D representation, supporting high-quality novel-view rendering and accurate long-range 3D point tracking, even for highly articulated motions.
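The sketch below illustrates this idea under our own assumptions: a small CNN that consumes the (H, W) grid of canonical positions together with a sinusoidal timestep encoding and predicts per-pixel 6-DOF motion (here, axis-angle rotation plus translation). The depth, widths, and encoding are illustrative; the paper's actual architecture may differ.

import math
import torch
import torch.nn as nn

class DeepMotionPrior(nn.Module):
    """Illustrative CNN motion prior over a grid of canonical Gaussians."""

    def __init__(self, time_dim=16, hidden=64):
        super().__init__()
        self.time_dim = time_dim
        self.net = nn.Sequential(
            nn.Conv2d(3 + time_dim, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 6, 3, padding=1),  # 3 axis-angle + 3 translation per pixel
        )

    def encode_time(self, t, H, W):
        # Sinusoidal encoding of a scalar timestep t, broadcast over the grid.
        freqs = torch.arange(self.time_dim // 2, dtype=torch.float32)
        angles = t * math.pi * (2.0 ** freqs)
        enc = torch.cat([angles.sin(), angles.cos()])        # (time_dim,)
        return enc.view(1, -1, 1, 1).expand(1, -1, H, W)

    def forward(self, canonical_grid, t):
        # canonical_grid: (1, 3, H, W) canonical Gaussian centers; t: float timestep
        _, _, H, W = canonical_grid.shape
        x = torch.cat([canonical_grid, self.encode_time(t, H, W)], dim=1)
        return self.net(x)   # (1, 6, H, W) per-Gaussian 6-DOF motion parameters

Because the convolution kernels are shared across the grid and each output depends on a local neighborhood of canonical positions, nearby Gaussians naturally receive correlated motion, which is the implicit regularization referred to above.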

Panoptic Studio Results

Complete dataset comparison ↗

Evaluation on the CMU Panoptic Studio dataset [4], which features real-world multi-view captures of complex human motions such as juggling, softball, and tennis. We use a single camera view for training and evaluate novel view synthesis on 30 held-out cameras at extreme viewing angles. We compare against state-of-the-art optimization-based (HiMoR [1], OriGS [2]) and generative (Cog-NVS [3]) approaches. DRoPS significantly outperforms the baselines in maintaining a consistent object geometry and sharp appearance, while accurately modeling the scene dynamics.

See more examples here.

Softball

Training Video
Pre-scan
Ground Truth
Ours
HiMoR
OriGS
Cog-NVS

Juggle

Training Video
Pre-scan
Ground Truth
Ours
HiMoR
OriGS
Cog-NVS

Synthetic Animals Results

Evaluation on our synthetic dynamic animals dataset, generated using TrueBones [5] animated characters. Since the data is synthetic, we have access to ground-truth depth and camera parameters, allowing us to validate our approach under controlled conditions. We compare against state-of-the-art optimization-based (HiMoR [1], OriGS [2]) and generative (Cog-NVS [3]) approaches. DRoPS significantly outperforms the baselines in maintaining a consistent object geometry and sharp appearance, while accurately modeling the scene dynamics.

See more examples here.

Camel Run

Training Video
Pre-scan
Ground Truth
Ours
HiMoR
OriGS
Cog-NVS

Bat Attack

Training Video
Pre-scan
Ground Truth
Ours
HiMoR
OriGS
Cog-NVS

In-The-Wild Videos

DRoPS can operate on casual, fully monocular videos where no multi-view pre-scan is available. Instead, the pre-scan is generated from the first video frame using an off-the-shelf image-to-3D model. As shown below, DRoPS achieves high-quality, complete dynamic 3D reconstruction even in this challenging in-the-wild setting. Videos are taken from the DAVIS dataset [6] and the internet.
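As a rough sketch of this pipeline (both helpers are hypothetical placeholders: image_to_3d stands for any off-the-shelf image-to-3D model, and reconstruct_dynamic for the DRoPS optimization given a video and a pre-scan):

def reconstruct_in_the_wild(video_frames):
    # Hypothetical helpers; see the lead-in above.
    prescan = image_to_3d(video_frames[0])             # static pre-scan from the first frame
    return reconstruct_dynamic(video_frames, prescan)  # DRoPS dynamic 3D reconstruction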

Ablation Study

We ablate the key design choices of our method on the CMU Panoptic Studio dataset [4]. Each row shows the result of removing one component, rendered on a held-out test camera. Removing any key component significantly degrades reconstruction quality, highlighting the contribution of each. A hedged sketch of an isometry-style regularizer follows the ablation list below.

See the full ablation study here.

Softball — Test Camera 1

Ours (Full)
w/o CNN, w/ MLP
w/o CNN
w/o Coarse Iso. Loss
w/o Dense Iso. Loss
w/o Rigid Loss
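The ablated losses are not defined on this page; as a point of reference, a common formulation of an isometry regularizer in dynamic Gaussian splatting penalizes changes in distances between neighboring points after deformation, sketched below under that assumption (the coarse and dense variants presumably differ in the point set or neighborhood scale):

import torch

def isometry_loss(canonical_xyz, deformed_xyz, neighbor_idx):
    """Penalize stretching/compression between neighboring Gaussians.

    canonical_xyz, deformed_xyz: (N, 3) point positions
    neighbor_idx: (N, K) long tensor of precomputed canonical neighbors
    """
    d_can = (canonical_xyz[:, None] - canonical_xyz[neighbor_idx]).norm(dim=-1)  # (N, K)
    d_def = (deformed_xyz[:, None] - deformed_xyz[neighbor_idx]).norm(dim=-1)    # (N, K)
    return (d_can - d_def).abs().mean()   # zero iff the deformation is locally isometric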

Long-Range 3D Tracking

DRoPS produces an explicit dynamic 3D representation from which long-range 3D point trajectories emerge naturally. We compare our 3D tracking against the optimization-based baselines HiMoR [1] and OriGS [2], which also build explicit dynamic 3D representations with deforming Gaussian splats. The ground-truth positions are visualized as crosses, the predictions as circles, and the lines connecting them visualize the error.
DRoPS achieves significantly more accurate long-range 3D tracking than the baselines, while maintaining a consistent object geometry and sharp appearance. See our paper for quantitative evaluation and more details.
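A sketch of how such trajectories can be extracted, reusing the hypothetical DeepMotionPrior from the pipeline section and PyTorch3D's axis_angle_to_matrix for the rotation conversion (both are our assumptions, not the authors' exact code):

import torch
from pytorch3d.transforms import axis_angle_to_matrix

def track_points(dmp, canonical_grid, timesteps):
    """canonical_grid: (1, 3, H, W). Returns (T, H, W, 3) 3D trajectories."""
    traj = []
    for t in timesteps:
        motion = dmp(canonical_grid, t)          # (1, 6, H, W) per-Gaussian motion
        rotvec = motion[0, :3].permute(1, 2, 0)  # (H, W, 3) axis-angle rotations
        trans = motion[0, 3:].permute(1, 2, 0)   # (H, W, 3) translations
        R = axis_angle_to_matrix(rotvec)         # (H, W, 3, 3) rotation matrices
        x = canonical_grid[0].permute(1, 2, 0)   # (H, W, 3) canonical positions
        # Rotations applied about the world origin purely for illustration.
        traj.append((R @ x[..., None]).squeeze(-1) + trans)
    return torch.stack(traj)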

Softball

Ours
HiMoR
OriGS

Juggle

Ours
HiMoR
OriGS

Coyote

Ours
HiMoR
OriGS

Camel Run

Ours
HiMoR
OriGS

Text-to-4D Application

DRoPS enables Text-to-4D generation: given a text prompt describing a dynamic subject, we first generate a video using an off-the-shelf text-to-video model, then apply DRoPS with the in-the-wild pipeline for dynamic 3D reconstruction. The colored lines visualize 3D point trajectories emerging from our representation.
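In pseudocode, the application reduces to chaining two stages (text_to_video is a placeholder for any off-the-shelf text-to-video model, and reconstruct_in_the_wild refers to the hypothetical helper sketched in the in-the-wild section):

def text_to_4d(prompt):
    frames = text_to_video(prompt)          # generate a video of the dynamic subject
    return reconstruct_in_the_wild(frames)  # DRoPS in-the-wild dynamic reconstruction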

BibTeX

@inproceedings{drops2026,
    author = {Narek Tumanyan and Samuel Rota Bulò and Denis Rozumny and Lorenzo Porzi and Adam Harley and Tali Dekel and Peter Kontschieder and Jonathon Luiten},
    title  = {DRoPS: Dynamic 3D Reconstruction of Pre-scanned Objects},
    year   = {2026},
}

[1] Yiming Liang, Tianhan Xu, Yuta Kikuchi. HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation, CVPR 2025

[2] Junyi Wu, Jiachen Tao, Haoxuan Wang, Gaowen Liu, Ramana Rao Kompella, Yan Yan. Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos, NeurIPS 2025

[3] Kaihua Chen*, Tarasha Khurana*, Deva Ramanan. Reconstruct, Inpaint, Test-Time Finetune: Dynamic Novel-view Synthesis from Monocular Videos, NeurIPS 2025

[4] Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, Yaser Sheikh. Panoptic Studio: A Massively Multiview System for Social Motion Capture, ICCV 2015

[5] TrueBones. truebones.com

[6] Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbeláez, Alexander Sorkine-Hornung, Luc Van Gool. The 2017 DAVIS Challenge on Video Object Segmentation, arXiv:1704.00675, 2017