Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Sili Chen      Hengkai Guo      Shengnan Zhu      Feihu Zhang     
Zilong Huang      Jiashi Feng      Bingyi Kang     
ByteDance
†Corresponding Author

This work presents Video Depth Anything, a model built on Depth Anything V2 that can be applied to arbitrarily long videos while maintaining strong quality, temporal consistency, and generalization ability.

Long Video Results

Comparison with DepthCrafter on open-world videos

Comparison with DepthCrafter on unprojected point clouds

The camera intrinsics, as well as the aligned scale and shift parameters, are obtained by running the first frame of the video through MoGe, a powerful model for recovering 3D geometry from monocular open-domain images.
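As a rough illustration of this step (not the authors' actual pipeline code), the sketch below aligns the model's affine-invariant inverse-depth prediction on the first frame to MoGe's metric depth via a least-squares scale and shift, then unprojects every frame into a camera-space point cloud. The variables `video_disp`, `metric_depth_ref`, and the intrinsics `fx, fy, cx, cy` are placeholders for the model's per-frame predictions and MoGe's first-frame outputs.

```python
# Minimal sketch, assuming `video_disp` (per-frame inverse-depth predictions),
# `metric_depth_ref` (MoGe's metric depth for the first frame), and the
# intrinsics fx, fy, cx, cy recovered by MoGe are already available.
import numpy as np

def align_scale_shift(pred_disp, metric_depth_ref):
    """Least-squares scale/shift aligning predicted inverse depth on the
    first frame to MoGe's metric depth, in inverse-depth space."""
    target = 1.0 / np.clip(metric_depth_ref, 1e-6, None)
    x = pred_disp.reshape(-1)
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, target.reshape(-1), rcond=None)
    return scale, shift

def unproject(depth, fx, fy, cx, cy):
    """Back-project a metric depth map into a camera-space point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Align once on the first frame, then reuse the same scale/shift for all frames.
scale, shift = align_scale_shift(video_disp[0], metric_depth_ref)
point_clouds = [
    unproject(1.0 / np.clip(scale * d + shift, 1e-6, None), fx, fy, cx, cy)
    for d in video_disp
]
```

Reusing a single scale/shift estimated on the first frame keeps the whole sequence in one consistent metric frame, which is what makes the unprojected point clouds comparable across time.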

Citation

@article{video_depth_anything,
    title={Video Depth Anything: Consistent Depth Estimation for Super-Long Videos},
    author={Chen, Sili and Guo, Hengkai and Zhu, Shengnan and Zhang, Feihu and Huang, Zilong and Feng, Jiashi and Kang, Bingyi},
    journal={arXiv:2501.12375},
    year={2025}
}