README.md: 42 additions, 1 deletion
@@ -11,6 +11,7 @@ and our [preprint](https://arxiv.org/abs/2103.13413):
> Vision Transformers for Dense Prediction
> René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

+For the latest release MiDaS 3.1, a [technical report](https://arxiv.org/pdf/2307.14460.pdf) and [video](https://www.youtube.com/watch?v=UjaeNNFf9sE&t=3s) are available.

MiDaS was trained on up to 12 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS, KITTI, NYU Depth V2) with
multi-objective optimization.
@@ -204,9 +205,16 @@ Test configuration
Speed: 22 FPS

+### Applications
+
+MiDaS is used in the following other projects from Intel Labs:
+
+- [ZoeDepth](https://arxiv.org/pdf/2302.12288.pdf) (code available [here](https://github.com/isl-org/ZoeDepth)): MiDaS computes the relative depth map given an image. For metric depth estimation, ZoeDepth can be used, which combines MiDaS with a metric depth binning module appended to the decoder.
+- [LDM3D](https://arxiv.org/pdf/2305.10853.pdf) (Hugging Face model available [here](https://huggingface.co/Intel/ldm3d-4c)): LDM3D is an extension of vanilla stable diffusion designed to generate joint image and depth data from a text prompt. The depth maps used for supervision when training LDM3D have been computed using MiDaS.
+
### Changelog

-* [Dec 2022] Released MiDaS v3.1:
+* [Dec 2022] Released [MiDaS v3.1](https://arxiv.org/pdf/2307.14460.pdf):
  - New models based on 5 different types of transformers ([BEiT](https://arxiv.org/pdf/2106.08254.pdf), [Swin2](https://arxiv.org/pdf/2111.09883.pdf), [Swin](https://arxiv.org/pdf/2103.14030.pdf), [Next-ViT](https://arxiv.org/pdf/2207.05501.pdf), [LeViT](https://arxiv.org/pdf/2104.01136.pdf))
  - Training datasets extended from 10 to 12, including also KITTI and NYU Depth V2 using [BTS](https://github.com/cleinc/bts) split
  - Best model, BEiT<sub>Large 512</sub>, with resolution 512x512, is on average about [28% more accurate](#Accuracy) than MiDaS v3.0
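
The new Applications entry above notes that MiDaS predicts a relative depth map from a single image, with ZoeDepth layered on top when metric depth is needed. As a minimal sketch of that first step (not part of this diff), the snippet below loads a model through the PyTorch Hub entry points published for MiDaS; the specific names (`intel-isl/MiDaS`, `DPT_Large`, the `transforms` helper) and the input path are assumptions to verify against the repository.

```python
# Minimal sketch: relative depth from a single image with MiDaS via PyTorch Hub.
# Hub entry names and the input file "input.jpg" are assumptions, not guarantees.
import cv2
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load a MiDaS model and its matching preprocessing transform (assumed hub names).
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform  # transforms.small_transform for MiDaS_small

# Read a hypothetical image and convert BGR -> RGB for the transform.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img).to(device))
    # Upsample the prediction back to the input resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# `depth` is relative inverse depth; for metric depth, ZoeDepth wraps MiDaS as
# described in the Applications section.
print(depth.shape, float(depth.min()), float(depth.max()))
```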
@@ -249,6 +257,39 @@ If you use a DPT-based model, please also cite:
}
```

+Please cite the technical report for MiDaS 3.1 models:
+
+```
+@article{birkl2023midas,
+  title={MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation},
+  author={Reiner Birkl and Diana Wofk and Matthias M{\"u}ller},
+  journal={arXiv preprint arXiv:2307.14460},
+  year={2023}
+}
+```
+
+For ZoeDepth, please use
+
+```
+@article{bhat2023zoedepth,
+  title={Zoedepth: Zero-shot transfer by combining relative and metric depth},
+  author={Bhat, Shariq Farooq and Birkl, Reiner and Wofk, Diana and Wonka, Peter and M{\"u}ller, Matthias},
+  journal={arXiv preprint arXiv:2302.12288},
+  year={2023}
+}
+```
+
+and for LDM3D
+
+```
+@article{stan2023ldm3d,
+  title={LDM3D: Latent Diffusion Model for 3D},
+  author={Stan, Gabriela Ben Melech and Wofk, Diana and Fox, Scottie and Redden, Alex and Saxton, Will and Yu, Jean and Aflalo, Estelle and Tseng, Shao-Yen and Nonato, Fabio and Muller, Matthias and others},
+  journal={arXiv preprint arXiv:2305.10853},
+  year={2023}
+}
+```
+
### Acknowledgements

Our work builds on and uses code from [timm](https://github.com/rwightman/pytorch-image-models) and [Next-ViT](https://github.com/bytedance/Next-ViT).