Commit bdc4ed6

Merge pull request isl-org#229 from isl-org/MiDaS_v3_1_Readme_Update
Update README.md with links to technical report, video, ZoeDepth and LDM3D
2 parents 1645b7e + cc2935b commit bdc4ed6


README.md

Lines changed: 42 additions & 1 deletion
@@ -11,6 +11,7 @@ and our [preprint](https://arxiv.org/abs/2103.13413):
 > Vision Transformers for Dense Prediction
 > René Ranftl, Alexey Bochkovskiy, Vladlen Koltun
 
+For the latest release MiDaS 3.1, a [technical report](https://arxiv.org/pdf/2307.14460.pdf) and [video](https://www.youtube.com/watch?v=UjaeNNFf9sE&t=3s) are available.
 
 MiDaS was trained on up to 12 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS, KITTI, NYU Depth V2) with
 multi-objective optimization.
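The MiDaS 3.1 release referenced in this hunk is consumed through the repository's torch.hub entry point. As a minimal sketch of relative depth inference following the hub interface documented in the README: the image path is a placeholder, and "DPT_Large" can be swapped for a v3.1 model name such as "DPT_BEiT_L_512" if it is available in your checkout.

```python
import cv2
import torch

# Load a MiDaS model and its matching input transforms via torch.hub.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

# Read an image (placeholder path) and convert BGR -> RGB.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
batch = midas_transforms.dpt_transform(img)

with torch.no_grad():
    prediction = midas(batch)
    # Upsample the prediction back to the input resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# Relative (inverse) depth: larger values correspond to closer surfaces.
relative_depth = prediction.cpu().numpy()
```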
@@ -204,9 +205,16 @@ Test configuration
 
 Speed: 22 FPS
 
+### Applications
+
+MiDaS is used in the following other projects from Intel Labs:
+
+- [ZoeDepth](https://arxiv.org/pdf/2302.12288.pdf) (code available [here](https://github.com/isl-org/ZoeDepth)): MiDaS computes the relative depth map given an image. For metric depth estimation, ZoeDepth can be used, which combines MiDaS with a metric depth binning module appended to the decoder.
+- [LDM3D](https://arxiv.org/pdf/2305.10853.pdf) (Hugging Face model available [here](https://huggingface.co/Intel/ldm3d-4c)): LDM3D is an extension of vanilla Stable Diffusion designed to generate joint image and depth data from a text prompt. The depth maps used for supervision when training LDM3D have been computed using MiDaS.
+
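Since the list above distinguishes relative from metric depth, here is a minimal sketch of metric depth estimation with ZoeDepth, assuming the "ZoeD_N" variant and the `infer_pil` helper documented in the ZoeDepth repository; the image path is a placeholder.

```python
import torch
from PIL import Image

# ZoeD_N: the ZoeDepth variant fine-tuned on NYU Depth v2.
zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
zoe.eval()

image = Image.open("input.jpg").convert("RGB")  # placeholder path

# Unlike raw MiDaS output (relative/inverse depth), this returns
# metric depth in meters as a numpy array.
depth_meters = zoe.infer_pil(image)
```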
 ### Changelog
 
-* [Dec 2022] Released MiDaS v3.1:
+* [Dec 2022] Released [MiDaS v3.1](https://arxiv.org/pdf/2307.14460.pdf):
   - New models based on 5 different types of transformers ([BEiT](https://arxiv.org/pdf/2106.08254.pdf), [Swin2](https://arxiv.org/pdf/2111.09883.pdf), [Swin](https://arxiv.org/pdf/2103.14030.pdf), [Next-ViT](https://arxiv.org/pdf/2207.05501.pdf), [LeViT](https://arxiv.org/pdf/2104.01136.pdf))
   - Training datasets extended from 10 to 12, also including KITTI and NYU Depth V2 using the [BTS](https://github.com/cleinc/bts) split
   - Best model, BEiT<sub>Large 512</sub>, with resolution 512x512, is on average about [28% more accurate](#Accuracy) than MiDaS v3.0
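To make the LDM3D entry in the Applications list above concrete, here is a minimal sketch using the `StableDiffusionLDM3DPipeline` from Hugging Face `diffusers` with the Intel/ldm3d-4c checkpoint linked there; the prompt and output filenames are illustrative.

```python
import torch
from diffusers import StableDiffusionLDM3DPipeline

# Load the 4-channel LDM3D checkpoint linked in the Applications list.
pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a living room with a fireplace"  # illustrative prompt
output = pipe(prompt)

# The pipeline returns aligned RGB and depth images for the same prompt.
output.rgb[0].save("room_rgb.jpg")
output.depth[0].save("room_depth.png")
```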
@@ -249,6 +257,39 @@ If you use a DPT-based model, please also cite:
 }
 ```
 
+Please cite the technical report for MiDaS 3.1 models:
+
+```
+@article{birkl2023midas,
+  title={MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation},
+  author={Reiner Birkl and Diana Wofk and Matthias M{\"u}ller},
+  journal={arXiv preprint arXiv:2307.14460},
+  year={2023}
+}
+```
+
+For ZoeDepth, please use
+
+```
+@article{bhat2023zoedepth,
+  title={Zoedepth: Zero-shot transfer by combining relative and metric depth},
+  author={Bhat, Shariq Farooq and Birkl, Reiner and Wofk, Diana and Wonka, Peter and M{\"u}ller, Matthias},
+  journal={arXiv preprint arXiv:2302.12288},
+  year={2023}
+}
+```
+
+and for LDM3D
+
+```
+@article{stan2023ldm3d,
+  title={LDM3D: Latent Diffusion Model for 3D},
+  author={Stan, Gabriela Ben Melech and Wofk, Diana and Fox, Scottie and Redden, Alex and Saxton, Will and Yu, Jean and Aflalo, Estelle and Tseng, Shao-Yen and Nonato, Fabio and Muller, Matthias and others},
+  journal={arXiv preprint arXiv:2305.10853},
+  year={2023}
+}
+```
+
 ### Acknowledgements
 
 Our work builds on and uses code from [timm](https://github.com/rwightman/pytorch-image-models) and [Next-ViT](https://github.com/bytedance/Next-ViT).
