Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Sili Chen      Hengkai Guo      Shengnan Zhu      Feihu Zhang     
Zilong Huang      Jiashi Feng      Bingyi Kang     
ByteDance
†Corresponding Author

This work presents Video Depth Anything, a model built on Depth Anything V2 that can be applied to arbitrarily long videos while maintaining strong quality, temporal consistency, and generalization ability.

Long Video Results

Comparison with DepthCrafter on open-world videos

Comparison with DepthCrafter on unprojected point clouds

The camera intrinsics, as well as the aligned scale and shift parameters, are obtained by running the first frame of the video through MoGe, a powerful model for recovering 3D geometry from monocular open-domain images.
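As a rough illustration of this step (not the authors' actual pipeline code), the sketch below aligns the model's affine-invariant inverse-depth prediction on the first frame to MoGe's metric depth via a least-squares scale and shift, then unprojects every frame into a camera-space point cloud. The variables `video_disp`, `metric_depth_ref`, and the intrinsics `fx, fy, cx, cy` are placeholders for the model's per-frame predictions and MoGe's first-frame outputs.

```python
# Minimal sketch, assuming `video_disp` (per-frame inverse-depth predictions),
# `metric_depth_ref` (MoGe's metric depth for the first frame), and the
# intrinsics fx, fy, cx, cy recovered by MoGe are already available.
import numpy as np

def align_scale_shift(pred_disp, metric_depth_ref):
    """Least-squares scale/shift aligning predicted inverse depth on the
    first frame to MoGe's metric depth, in inverse-depth space."""
    target = 1.0 / np.clip(metric_depth_ref, 1e-6, None)
    x = pred_disp.reshape(-1)
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, target.reshape(-1), rcond=None)
    return scale, shift

def unproject(depth, fx, fy, cx, cy):
    """Back-project a metric depth map into a camera-space point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Align once on the first frame, then reuse the same scale/shift for all frames.
scale, shift = align_scale_shift(video_disp[0], metric_depth_ref)
point_clouds = [
    unproject(1.0 / np.clip(scale * d + shift, 1e-6, None), fx, fy, cx, cy)
    for d in video_disp
]
```

Reusing a single scale/shift estimated on the first frame keeps the whole sequence in one consistent metric frame, which is what makes the unprojected point clouds comparable across time.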

Citation

@article{video_depth_anything,
    title={Video Depth Anything: Consistent Depth Estimation for Super-Long Videos},
    author={Chen, Sili and Guo, Hengkai and Zhu, Shengnan and Zhang, Feihu and Huang, Zilong and Feng, Jiashi and Kang, Bingyi},
    journal={arXiv:2501.12375},
    year={2025}
}