<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>BOP-Distrib</title>
<meta name="description" content="">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- <meta property="og:image" content="https://jonbarron.info/zipnerf/img/nottingham.jpg"> -->
<meta property="og:image:type" content="image/png">
<meta property="og:image:width" content="1296">
<meta property="og:image:height" content="840">
<meta property="og:type" content="website" />
<!-- <meta property="og:url" content="https://jonbarron.info/zipnerf/"/> -->
<meta property="og:title" content="BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities" />
  <meta property="og:description" content="6D pose estimation aims at determining the object pose that best explains the camera observation. The unique solution for non-ambiguous objects can turn into a multi-modal pose distribution for symmetrical objects or when occlusions of symmetry-breaking elements happen, depending on the viewpoint. Currently, 6D pose estimation methods are benchmarked on datasets that consider, for their ground truth annotations, visual ambiguities as only related to global object symmetries, whereas they should be defined per-image to account for the camera viewpoint. We thus first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate the state-of-the-art single pose methods and show that this greatly modifies the ranking of these methods. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making it the first benchmark for pose distribution methods on real images." />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities" />
  <meta name="twitter:description" content="6D pose estimation aims at determining the object pose that best explains the camera observation. The unique solution for non-ambiguous objects can turn into a multi-modal pose distribution for symmetrical objects or when occlusions of symmetry-breaking elements happen, depending on the viewpoint. Currently, 6D pose estimation methods are benchmarked on datasets that consider, for their ground truth annotations, visual ambiguities as only related to global object symmetries, whereas they should be defined per-image to account for the camera viewpoint. We thus first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate the state-of-the-art single pose methods and show that this greatly modifies the ranking of these methods. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making it the first benchmark for pose distribution methods on real images." />
<!-- <meta name="twitter:image" content="https://jonbarron.info/zipnerf/img/teaser.jpg" /> -->
<!-- <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>⚡</text></svg>"> -->
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<!--
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
-->
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<!--link rel="stylesheet" href="./static/css/index.css"-->
<link rel="icon" href="./img/favicon.ico">
<link rel='stylesheet' href='https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.css'>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/css/bootstrap.min.css">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.4.0/css/font-awesome.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.css">
<link rel="stylesheet" href="css/app.css">
<link rel="stylesheet" href="css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/codemirror/5.8.0/codemirror.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.5.3/clipboard.min.js"></script>
<script src="js/app.js"></script>
<!--
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
-->
<!-- script src="./static/js/index.js"></script-->
</head>
<body>
<div class="container" id="main">
<div class="row">
<h2 class="col-md-12 text-center">
<b>BOP-Distrib</b>: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities<br>
</h2>
</div>
<div class="row">
<div class="col-md-12 text-center">
<ul class="list-inline">
<li>
<a href="https://scholar.google.fr/citations?user=knXPf8oAAAAJ&hl=fr">
Boris Meden¹
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?user=OD1IpwYAAAAJ&hl=fr">
Asma Brazi¹²
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?user=VIAU0dYAAAAJ&hl=fr">
Fabrice Mayran de Chamisso¹
</a>
</li>
<li>
<a href="https://scholar.google.com/citations?user=Ym-suFYAAAAJ&hl=fr">
Steve Bourgeois¹
</a>
</li>
<li>
<a href="https://vincentlepetit.github.io/">
Vincent Lepetit²
</a>
</li>
<br>¹Université Paris-Saclay, CEA List, F-91120, Palaiseau, France; ²LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
</ul>
</div>
</div>
<div class="row">
<div class="col-md-4 col-md-offset-4 text-center">
<ul class="nav nav-pills nav-justified">
<li>
<a href="https://arxiv.org/abs/2408.17297">
<img src="img/paper_image.png" height="150px" width="116px">
<h4><strong>Paper</strong></h4>
</a>
</li>
<li>
<a href="https://github.com/CEA-LIST">
<img src="img/data.png" height="150px" width="116px" style="padding-bottom:15%;padding-top:9%;">
<h4><strong>Data (To be released)</strong></h4>
</a>
</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<div class="text-center">
<div style="position:relative;padding-top:32%;">
<img src="img/teaser.png" style="position:absolute;top:0;left:0;width:100%">
</div>
<br>
</div>
<h2 class="subtitle has-text-justified">
We provide for the first time 6D pose annotations in the form of a per-image object pose distribution. Current annotations in
BOP [21] datasets are given as a single pose, shown here as a circle in the SO(3) representations. BOP also provides a symmetry pattern
per object, from which a distribution can be computed (the colored points in SO(3)). Such a distribution, however, does not cover many
cases [35]: in this example, when only the core is visible (<span style="background: #66ff66">Case 1</span>), the pose is fully ambiguous and should be represented by a continuous
distribution in SO(3). When the sides of the head are visible (<span style="background: #ccccff">Case 2</span>), there are still ambiguities and the distribution is made of 6 modes.
When the hole is visible (<span style="background: #ff6666">Case 3</span>), the pose distribution should be concentrated around one non-ambiguous pose. Our method annotates
scenes with per-image distributions, taking into account the partial occlusions and allowing us to evaluate a predicted pose properly. We
show that considering these distributions for evaluation results in a significant change of ranking for the BOP challenge. Such ground truth
distributions also become a key asset when it comes to evaluating pose distribution estimation methods [13, 23]. With appropriate metrics,
we demonstrate the first quantitative evaluation of pose distribution methods on real images, as an extension to single pose methods.
</h2>
</div>
<div class="col-md-8 col-md-offset-2">
<h3>
Abstract
</h3>
<p class="text-justify">
6D pose estimation aims at determining the object pose that best explains the camera observation. The unique solution for non-ambiguous objects can turn into a multi-modal pose distribution for symmetrical objects or when occlusions of symmetry-breaking elements happen, depending on the viewpoint. Currently, 6D pose estimation methods are benchmarked on datasets that consider, for their ground truth annotations, visual ambiguities as only related to global object symmetries, whereas they should be defined per-image to account for the camera viewpoint. We thus first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate the state-of-the-art single pose methods and show that this greatly modifies the ranking of these methods. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making it the first benchmark for pose distribution methods on real images.</p>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Per-image pose distribution annotation method
</h3>
<!--
<p class="text-justify">
From a symmetry candidate set (or an SE(3) sampling), we pre-compute the object per-vertex ϵ-sym. Then for a given scene,
we compute the vertices visibility (<span style="color:green">✓</span> and <span style="color:red">✗</span> illustrate respectively if the visibility test passed or not for the vertex) and perform a robust
intersection between their ϵ-sym. This intersection is then pruned with a depth comparison and the result constitutes the symmetries pattern
of this object instance for this image. When multiplied by the ground truth, we obtain the SE(3) distribution of the object instance.
</p>
-->
<div class="text-center">
<div style="position:relative;padding-top:50%;">
<img src="img/Overview.png" style="position:absolute;top:0;left:0;width:100%">
</div>
<br>
</div>
<!--
<div class="text-center">
<div style="position:relative" >
<video controls src="vid/EpsilonSymmetries.mp4" type="video/mp4" autoplay loop muted/>
</div>
<br>
</div>
-->
<div class="text-center">
<div style="position:relative" >
<video controls autoplay loop muted><source src="vid/AnnotationPipeline.mp4" type="video/mp4"></video>
</div>
<br>
</div>
</div>
<div class="col-md-8 col-md-offset-2">
<h3>
Metrics for evaluating pose distribution methods
</h3>
<p class="text-justify">
We adapt Precision and Recall to pose distributions in order to evaluate both how accurate the estimated poses are (Precision) and how well they cover the ground-truth distribution (Recall). Poses are compared using registration errors such as MPD and MSD.
</p>
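As an illustrative sketch only (not the paper's exact formulation), the idea behind this Precision/Recall adaptation can be expressed as nearest-mode matching under a pose distance. Here the rotation geodesic distance stands in for MPD/MSD, which additionally require the object mesh; the function names and threshold are assumptions for illustration.

```python
import numpy as np

def rotation_angle(R1, R2):
    """Geodesic distance (radians) between two rotation matrices."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos, -1.0, 1.0))

def precision_recall(estimated, ground_truth, threshold):
    """Precision: share of estimated poses close to some ground-truth mode.
    Recall: share of ground-truth modes covered by some estimated pose."""
    dists = np.array([[rotation_angle(e, g) for g in ground_truth]
                      for e in estimated])          # shape (n_est, n_gt)
    precision = float(np.mean(dists.min(axis=1) <= threshold))
    recall = float(np.mean(dists.min(axis=0) <= threshold))
    return precision, recall
```

For example, with a ground-truth distribution made of two modes (the identity and a 180° rotation about z) and a single estimate at the identity, precision is 1.0 (the estimate matches a mode) while recall is only 0.5 (one mode is left uncovered), which is exactly the behavior the two metrics are meant to separate.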
<div style="position:relative">
<video controls autoplay loop muted><source src="vid/Metrics.mp4" type="video/mp4"></video>
</div>
<!--
<div style="position:relative">
<img src="img/precision_distrib.png">
</div>
<p class="text-justify">
</p>
<div style="position:relative">
<img src="img/recall_distrib.png">
</div>
-->
</div>
<div class="col-md-8 col-md-offset-2">
<h2>
Results
</h2>
<!--
<h3>
Per-image pose distribution ground truth of T-LESS
</h3>
<p class="text-justify">
We present here the results of our re-annotation procedure, displaying for each object in each image its per-image pose distribution ground truth.
</p>
<div style="position:relative">
<video controls src="vid/BOP_Distrib_newGT_visualizations.mp4" type="video/mp4" autoplay loop muted/>
</div>
-->
</div>
<div class="col-md-8 col-md-offset-2">
<h3>
Pose distribution evaluations of SpyroPose and LiePose
</h3>
<p class="text-justify">
We present here the first quantitative evaluation of pose distribution methods <a target="_blank" rel="noopener noreferrer" href="https://spyropose.github.io/">SpyroPose</a> and <a target="_blank" rel="noopener noreferrer" href="https://ending2015a.github.io/liepose-diffusion-page/">LiePose</a> on real data (<a target="_blank" rel="noopener noreferrer" href="https://cmp.felk.cvut.cz/t-less/">T-LESS</a>). The graphs below also incorporate the results from <a target="_blank" rel="noopener noreferrer" href="https://arxiv.org/pdf/2505.02501">Corr2Distrib</a>.
</p>
<div style="position:relative">
<img src="img/results_distrib_SpyroPose_LiePose.png">
</div>
<div style="position:relative">
<img src="img/PR_distrib.png">
</div>
<div style="position:relative">
<video controls autoplay loop muted><source src="vid/BOP_Distrib_distribution_comparison_SpyroPose_LiePose.mp4" type="video/mp4"></video>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Citation
</h3>
<div class="form-group col-md-10 col-md-offset-1">
<textarea id="bibtex" class="form-control" readonly>
@inproceedings{meden2026bopd,
title={BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities},
author={Meden, Boris and Brazi, Asma and Mayran de Chamisso, Fabrice and Bourgeois, Steve and Lepetit, Vincent},
booktitle={Winter Conference on Applications of Computer Vision (WACV)},
year={2026}
}</textarea>
</div>
</div>
</div>
<div class="row">
<div class="col-md-8 col-md-offset-2">
<h3>
Acknowledgements
</h3>
<p class="text-justify">
The website template was borrowed from <a href="https://cea-list.github.io/RING-NeRF/"> RING-NeRF</a>, <a href="https://dorverbin.github.io/refnerf">Ref-NeRF</a> and <a href="https://nerfies.github.io/">nerfies</a>.
</p>
</div>
</div>
</div>
</body>
</html>