Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

¹Westlake University ²HUST ³Hillbot

Synthesising high-quality 4D dynamic objects from a single monocular video

Abstract

We present Motion 3-to-4, a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video and an optional 3D reference mesh. While recent advances have significantly improved 2D, video, and 3D content generation, 4D synthesis remains difficult due to limited training data and the inherent ambiguity of recovering geometry and motion from a monocular viewpoint.

Motion 3-to-4 addresses these challenges by decomposing 4D synthesis into static 3D shape generation and motion reconstruction. Using a canonical reference mesh, our model learns a compact motion latent representation and predicts per-frame vertex trajectories to recover complete, temporally coherent geometry. A scalable frame-wise transformer further makes the model robust to varying sequence lengths. Evaluations on both standard benchmarks and a new dataset with accurate ground-truth geometry show that Motion 3-to-4 achieves higher fidelity and better spatial consistency than prior work.
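To make the shape/motion decomposition concrete, here is a minimal sketch of how per-frame vertex trajectories over a canonical mesh yield a 4D sequence. It assumes that the motion stage outputs one 3D displacement per canonical vertex per frame; the page does not specify the actual output parameterisation. `Mesh`, `animate` and `trajectory` are hypothetical names and are not taken from the authors' code. The sketch shows why this representation is temporally coherent by construction: every frame reuses the canonical face list, so vertex `i` refers to the same surface point in every frame, and the number of frames is not fixed.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Mesh:
    vertices: List[Vec3]                # vertex positions
    faces: List[Tuple[int, int, int]]   # triangle indices (topology)


def animate(canonical: Mesh, offsets: List[List[Vec3]]) -> List[Mesh]:
    """Hypothetical helper: apply per-frame, per-vertex displacements.

    Assumption: offsets[t][i] is the predicted displacement of canonical
    vertex i at frame t. Any number of frames T = len(offsets) is accepted.
    """
    n = len(canonical.vertices)
    frames = []
    for t, frame_offsets in enumerate(offsets):
        if len(frame_offsets) != n:
            raise ValueError(f"frame {t}: expected {n} offsets, got {len(frame_offsets)}")
        moved = [
            (x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(canonical.vertices, frame_offsets)
        ]
        # Sharing the canonical face list keeps vertex correspondence across frames.
        frames.append(Mesh(vertices=moved, faces=canonical.faces))
    return frames


def trajectory(frames: List[Mesh], i: int) -> List[Vec3]:
    """Hypothetical helper: positions of vertex i across all frames."""
    return [f.vertices[i] for f in frames]


if __name__ == "__main__":
    tri = Mesh(vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
               faces=[(0, 1, 2)])
    # Two frames: lift the whole triangle by 0.5, then by 1.0 along z.
    offsets = [[(0.0, 0.0, 0.5 * (t + 1))] * 3 for t in range(2)]
    frames = animate(tri, offsets)
    print(trajectory(frames, 1))  # [(1.0, 0.0, 0.5), (1.0, 0.0, 1.0)]
```

Because the topology is fixed, a per-vertex trajectory can be read off directly for any frame count. This is the property that distinguishes the approach from reconstructing an independent mesh per frame.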

Methodology

Method Diagram

Our framework consists of two main components:

Visual Comparison with State-of-the-Art Methods on the Motion-80 Dataset

Comparing Motion 3-to-4 with baseline methods

Synthesis Results

Real-World Video Reconstruction

3D Animation via VideoGen

Driving static 3D assets with text prompts and generated videos

Motion Retargeting

Roar Motion Transfer
Walk Motion Transfer

Related Work

Citation

@article{chen2026motion3to4,
    title={Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis},
    author={Chen, Hongyuan and Chen, Xingyu and Zhang, Youjia and Xu, Zexiang and Chen, Anpei},
    journal={arXiv preprint arXiv:2601.14253},
    year={2026}
}