<html>
<head>
<title>COMP5421 Face Detection</title>
<link href='http://fonts.googleapis.com/css?family=Nunito:300|Crimson+Text|Droid+Sans+Mono' rel='stylesheet' type='text/css'>
<!--<link rel="stylesheet" title="Default" href="styles/github.css">-->
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>

<link rel="stylesheet" href="html/highlighting/styles/default.css">
<script src="html/highlighting/highlight.pack.js"></script>

<style type="text/css">
body {
	margin: 0px;
	width: 100%;
	font-family: 'Crimson Text', serif;
	font-size: 20px;
	background: #fcfcfc;
}
h1 {
	font-family: 'Nunito', sans-serif;
	font-weight: normal;
	font-size: 28px;
	margin: 25px 0px 0px 0px;
	text-transform: lowercase;
}
h2 {
	font-family: 'Nunito', sans-serif;
	font-weight: normal;
	font-size: 32px;
	margin: 15px 0px 35px 0px;
	color: #333;
	word-spacing: 3px;
}
h3 {
	font-family: 'Nunito', sans-serif;
	font-weight: normal;
	font-size: 26px;
	margin: 10px 0px 0px 0px;
	color: #333;
	word-spacing: 2px;
}
h4 {
	font-family: 'Nunito', sans-serif;
	font-weight: normal;
	font-size: 22px;
	margin: 10px 0px 10px 0px;
	color: #333;
	word-spacing: 2px;
}
h5 {
	font-family: 'Nunito', sans-serif;
	font-weight: normal;
	font-size: 18px;
	margin: 10px 0px 10px 0px;
	color: #111;
	word-spacing: 2px;
}
p, li {
	color: #444;
}
a {
	color: #C7EDCC;
}
.container {
	margin: 0px auto 0px auto;
	width: 1160px;
}
#header {
	background: #333;
	width: 100%;
}
#headersub {
	color: #ccc;
	width: 960px;
	margin: 0px auto 0px auto;
	padding: 20px 0px 20px 0px;
}
.chart {
	width: 480px;
}
.lol {
	font-size: 16px;
	color: #888;
	font-style: italic;
}
.sep {
	height: 1px;
	width: 100%;
	background: #999;
	margin: 20px 0px 20px 0px;
}
.footer{
	font-size: 16px;
}
.latex {
	width: 100%;
}
.latex img {
	display: block;
	margin: 0px auto 0px auto;
}
pre {
	font-family: 'Droid Sans Mono';
	font-size: 14px;
}
table td {
  text-align: center;
  vertical-align: middle;
}
table td img {
  text-align: center;
  vertical-align: middle;
}
#contents a {
}
</style>
<script type="text/javascript">
    hljs.initHighlightingOnLoad();
</script>
</head>
<body>
<div id="header" >
<div id="headersub">
<h1>Xiyuan Liu, Shan Huang <span style="color: #DE3737; font-size: 20px"></span></h1>
</div>
</div>


<div class="container">
<h2 style="font-family:verdana">COMP5421 / Project 2 / Face Detection</h2>
<div style="float: right; padding: 20px">
<center>
<img src="html/detections_cs143_2013_class_easy_01.jpg.png" width="100%"/>
</center>
</div>
</div>


<div class="container">
<h2><b>Overview</b></h2>
<p>
	In this project we implement a face detector based on SIFT-like Histogram of Oriented Gradients (HoG) features, following Dalal and Triggs's paper.<br>
	We also test several image preprocessing techniques and additional positive training sets to improve performance. Details are explained below.
</p>

<p>
	The program consists of the following steps:<br>
<ol>
	<li>Extract Histogram of Oriented Gradients (HoG) features from positive samples.</li>
	<li>Extract HoG features from random negative samples.</li>
	<li>Train a linear SVM classifier on both positive and negative samples using vl_svmtrain.</li>
	<li>(Extra credit) Perform hard negative mining; details are explained below.</li>
	<li>Run multi-scale sliding windows over the test dataset, determining whether each window contains a face.</li>
	<li>Generate a bounding box for each window whose confidence exceeds the threshold.</li>
	<li>Compute the ROC curve, precision-recall curve, and average precision.</li>
</ol>
</p>
</div>


<div class="container">
<h3><b>Train Linear SVM</b></h3>
<p>
	We extracted 6713 positive features (faces) from the Caltech Web Faces dataset and a total of 50000 random negative features (non-faces) from the SUN dataset.<br>
	We train a linear SVM (vl_svmtrain) with the regularization parameter (lambda) set to 0.0001 to obtain a linear classifier.
</p>
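<p>
	As a rough illustration of what such a solver minimizes, the sketch below writes out the regularized hinge-loss objective and trains on toy data. It is written in Python rather than MATLAB, and it uses plain full-batch subgradient descent as a stand-in for the solver inside vl_svmtrain; the function names, learning rate, epoch count, and toy data are all assumptions made for this example.
</p>
<pre><code class="python">def svm_objective(w, b, X, y, lam):
    """Regularized hinge loss: lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    reg = 0.5 * lam * sum(wj * wj for wj in w)
    hinge = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge += max(0.0, 1.0 - yi * score)
    return reg + hinge / len(X)


def train_linear_svm(X, y, lam=0.0001, lr=0.1, epochs=200):
    """Full-batch subgradient descent (illustrative stand-in, not VLFeat's solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if 1.0 - yi * score > 0:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy 1-D data: +1 plays the role of "face", -1 of "non-face".
X = [[2.0], [3.0], [-2.0], [-3.0]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
scores = [w[0] * xi[0] + b for xi in X]
</code></pre>
<p>
	The sign of each score gives the predicted class, and its magnitude is the confidence that is later compared against Confidence_Threshold. A small lambda such as 0.0001 puts most of the weight on fitting the training samples rather than on shrinking w.
</p>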

<h3><b>Multi-Scaling and Step Size</b></h3>
<p>
	We use multi-scale sliding windows (scales from 0.05 to 1.2 in steps of 0.05) to detect faces in images.<br> To evaluate the effect of the step size (HoG cell size) on the test results, we vary the cell size while keeping all other parameters fixed.<br>
	The following results are obtained with HoG_Template_Size=36, Confidence_Threshold=-0.5.
</p>


<div align="center">
<table border=1>

<tr>
<td><b><font size="5">HoG Cell Size</font></b></td>
<td><b><font size="5">Cell Size = 6</font></b></td>
<td><b><font size="5">Cell Size = 4</font></b></td>
<td><b><font size="5">Cell Size = 3</font></b></td>
</tr>

<tr>
<td><b><font size="5">HoG</font></b></td>
<td> <img src="html/hog_template6.png" width="100%"/> </td>
<td> <img src="html/hog_template4.png" width="100%"/> </td>
<td> <img src="html/hog_template3.png" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Average Precision</font></b></td>
<td> <img src="html/average_precision6.png" width="100%"/> </td>
<td> <img src="html/average_precision4.png" width="100%"/> </td>
<td> <img src="html/average_precision3.png" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Recall(Viola Jones)</font></b></td>
<td> <img src="html/Detection rate6.jpg" width="100%"/> </td>
<td> <img src="html/Detection rate4.jpg" width="100%"/> </td>
<td> <img src="html/Detection rate3.jpg" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Sample Result</font></b></td>
<td> <img src="html/detections_henry6.png" width="100%"/> </td>
<td> <img src="html/detections_henry4.png" width="100%"/> </td>
<td> <img src="html/detections_henry3.png" width="100%"/> </td>
</tr>

</table>
</div>
<p>
	Detection results improve as the HoG cell size gets smaller, but the total running time also increases dramatically. There is a clear tradeoff between average precision and running time.
</p>
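<p>
	The cost side of this tradeoff can be estimated with simple counting. The Python sketch below computes the descriptor length of one 36-pixel template and the number of windows to classify for each cell size. It assumes 31 values per cell (the UoCTTI HoG variant), a window that moves one cell at a time, and a hypothetical 480 by 640 test image; none of these details are stated above, so the numbers are indicative only.
</p>
<pre><code class="python">def hog_feature_dim(template_size, cell_size, dims_per_cell=31):
    """Length of one window descriptor: (cells per side)^2 * dims per cell.

    dims_per_cell=31 assumes the UoCTTI HoG variant (an assumption here).
    """
    cells_per_side = template_size // cell_size
    return cells_per_side * cells_per_side * dims_per_cell


def windows_in_image(height, width, template_size, cell_size):
    """Number of template placements when the window moves one cell at a time."""
    t = template_size // cell_size
    rows = max(0, height // cell_size - t + 1)
    cols = max(0, width // cell_size - t + 1)
    return rows * cols


def total_windows(height, width, template_size, cell_size, scales):
    """Sum the window count over every rescaled copy of the image."""
    return sum(
        windows_in_image(int(height * s), int(width * s), template_size, cell_size)
        for s in scales
    )


# Scales 0.05, 0.10, ..., 1.20 (assumed reading of the scale range in the text).
scales = [round(0.05 * k, 2) for k in range(1, 25)]

for cell in (6, 4, 3):
    print(cell, hog_feature_dim(36, cell), total_windows(480, 640, 36, cell, scales))
</code></pre>
<p>
	Halving the cell size roughly quadruples both the descriptor length and the number of windows, which is consistent with the sharp increase in running time we observe.
</p>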
</div>


<div class="container">
<h3><b>Extra Credit: Hard Negative Mining</b></h3>
<p>
	To refine our classifier, we implement hard negative mining, which consists of the following steps:<br>
<ol>
	<li>Follow the previous steps to obtain the initial SVM.</li>
	<li>Run the SVM on the negative sample dataset used in the previous step.</li>
	<li>Since the negative dataset contains no faces, any detection (confidence above a certain threshold) is a false positive and is recorded.</li>
	<li>Add the recorded negative features to the old negative feature set.</li>
	<li>Retrain the SVM using the old positive feature set and the new negative feature set.</li>
	<li>The enhanced SVM is obtained.</li>
</ol>
<p>The following results show the effect of hard negative mining on the SVM.<br>(HoG_Template_Size=36, HoG_Cell_Size=3, Confidence_Threshold=-0.5)</p>
</p>


<div align="center">
<table border=1 width="1000">

<tr>
<td></td>
<td><b><font size="5">Hard Negative Mining=OFF</font></b></td>
<td><b><font size="5">Hard Negative Mining=ON</font></b></td>
</tr>

<tr>
<td><b><font size="5">Average Precision</font></b></td>
<td> <img src="html/average_precision3.png" width="100%"/> </td>
<td> <img src="html/average_precision3_hnm.png" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Recall(Viola Jones)</font></b></td>
<td> <img src="html/Detection rate3.jpg" width="100%"/> </td>
<td> <img src="html/Detection rate3_hnm.jpg" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Sample Result</font></b></td>
<td> <img src="html/detections_Arsenal3.jpg.png" width="100%"/> </td>
<td> <img src="html/detections_Arsenal3_hnm.jpg.png" width="100%"/> </td>
</tr>

</table>
</div>
<p>
	Hard negative mining improves performance slightly. However, it also increases the training time.
</p>
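<p>
	One round of the mining procedure can be sketched as follows. This is a simplified Python illustration, not our MATLAB code: the trainer is a toy mean-difference classifier standing in for the SVM, and all function names and data values are hypothetical.
</p>
<pre><code class="python">def score(w, b, feature):
    """Linear classifier confidence for one feature vector."""
    return sum(wj * fj for wj, fj in zip(w, feature)) + b


def mine_hard_negatives(w, b, negative_features, threshold):
    """Return features from face-free images that the classifier still accepts."""
    return [f for f in negative_features if score(w, b, f) > threshold]


def hard_negative_round(train_fn, positives, negatives, candidates, threshold):
    """One mining round: train, collect false positives, retrain on the larger set."""
    w, b = train_fn(positives, negatives)
    hard = mine_hard_negatives(w, b, candidates, threshold)
    w, b = train_fn(positives, negatives + hard)
    return w, b, hard


def mean_difference_trainer(positives, negatives):
    """Toy stand-in for SVM training: w points from the negative to the positive mean."""
    d = len(positives[0])
    mp = [sum(p[j] for p in positives) / len(positives) for j in range(d)]
    mn = [sum(q[j] for q in negatives) / len(negatives) for j in range(d)]
    w = [a - c for a, c in zip(mp, mn)]
    b = -sum(wj * (a + c) / 2.0 for wj, a, c in zip(w, mp, mn))
    return w, b


positives = [[4.0], [6.0]]
negatives = [[0.0], [-2.0]]
candidates = [[3.0], [-1.0], [1.0]]  # windows sampled from face-free images
w, b, hard = hard_negative_round(
    mean_difference_trainer, positives, negatives, candidates, 0.0
)
</code></pre>
<p>
	Every mined feature is a false positive by construction, so adding it to the negative set pushes the decision boundary away from the kind of non-face patches the first classifier confused with faces. The second training pass is what accounts for the extra training time.
</p>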
</div>


<div class="container">
<h3><b>Extra Credit: Alternative Positive Training Data</b></h3>

<p>
	We searched for additional face datasets and found the Labeled Faces in the Wild (LFW) dataset from UMass. We selected around 8000 extra face images, resized them to 36&times;36, and mixed them with the Caltech faces dataset. Finally, we divided the combined data into two new datasets, each containing around 10000 face images. The results below illustrate the performance of each dataset.<br>
	(HoG_Template_Size=36, HoG_Cell_Size=3, Confidence_Threshold=-0.5, HNM=OFF)
</p>


<div align="center">
<table border=1 width="1000">

<tr>
<td></td>
<td><b><font size="5">NewFaceSet</font></b></td>
<td><b><font size="5">NewFaceSet2</font></b></td>
</tr>

<tr>
<td><b><font size="5">HoG</font></b></td>
<td> <img src="html/hog_template_NFS.png" width="100%"/> </td>
<td> <img src="html/hog_template_NFS2.png" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Average Precision</font></b></td>
<td> <img src="html/average_precision_NFS.png" width="100%"/> </td>
<td> <img src="html/average_precision_NFS2.png" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Recall(Viola Jones)</font></b></td>
<td> <img src="html/Detection rate_NFS.jpg" width="100%"/> </td>
<td> <img src="html/Detection rate_NFS2.jpg" width="100%"/> </td>
</tr>

<tr>
<td><b><font size="5">Sample Result</font></b></td>
<td> <img src="html/detections_Brazil_NFS.png" width="100%"/> </td>
<td> <img src="html/detections_Brazil_NFS2.png" width="100%"/> </td>
</tr>

</table>
</div>

<p>
	In general, NewFaceSet2 performs better than NewFaceSet. After a rough inspection of the two datasets, we find that NewFaceSet contains many pictures of the same face taken from different directions, which is probably why the HoG image of NewFaceSet is less face-like.
</p>
</div>


<div class="container">
<h3><b>Extra Credit: Interesting Features</b></h3>

<p>
	In search of better detection, we looked into the faces that could not be detected and applied different image augmentation techniques, including:<br>
<ul>
	<li>During multi-scale window detection, apply contrast stretching to each extracted window before computing HoG.<br>
	Reason: We find that many undetected faces are unevenly illuminated, which may result in false negatives.</li>
	<li>When extracting positive samples, flip each image to get a new positive sample and add it to the positive features.<br>
	Reason: Some false negatives are due to faces with different orientations; some faces are even upside down.</li>
	<li>When extracting negative samples, downsize each image at multiple scales before extracting.<br>
	Reason: This is suggested in the comment; however, we believe random negatives should be good enough.</li>
</ul>
<p>Unfortunately, none of the techniques mentioned above noticeably improves the detection results.<br>
(HoG_Template_Size=36, HoG_Cell_Size=3, Confidence_Threshold=-0.5, HNM=OFF)
</p>
</p>

<div align="center">
<table border=1>

<tr>
<td></td>
<td><b><font size="5">Contrast Stretching</font></b></td>
<td><b><font size="5">Flipped Face</font></b></td>
<td><b><font size="5">Downsize Negative Samples</font></b></td>
</tr>

<tr>
<td><b><font size="5">Average Precision</font></b></td>
<td> <img src="html/average_precision_histeql.png" width="100%"/> </td>
<td> <img src="html/average_precision_flipface.png" width="100%"/> </td>
<td> <img src="html/average_precision_downsize.png" width="100%"/> </td>
</tr>

</table>
</div>
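<p>
	For concreteness, the first two augmentations can be sketched in a few lines of Python. The version below uses a simple min-max contrast stretch and a left-to-right mirror on images stored as nested lists; both choices are assumptions made for illustration and may differ from the exact operations used in our MATLAB code.
</p>
<pre><code class="python">def contrast_stretch(window):
    """Linearly rescale pixel values so the window spans the full [0, 1] range."""
    flat = [v for row in window for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat window: nothing to stretch
        return [[0.0 for _ in row] for row in window]
    return [[(v - lo) / (hi - lo) for v in row] for row in window]


def flip_horizontal(image):
    """Mirror an image left to right to create an extra positive sample."""
    return [row[::-1] for row in image]


window = [[0.2, 0.4], [0.3, 0.6]]
stretched = contrast_stretch(window)
mirrored = flip_horizontal(window)
</code></pre>
<p>
	HoG already normalizes gradient energy locally, which may be one reason a global contrast stretch changes the result so little.
</p>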
</div>


<div class="container">
<h3><b>Best Performance</b></h3>
<p>The best average precision we obtained is 0.937, under the following conditions:<br>
	HoG_Template_Size=36, HoG_Cell_Size=3, Confidence_Threshold=-1.1, HNM=ON
</p>
<div align="center">
<table border=1>

<tr>
<td></td>
<td><b><font size="5">HoG</font></b></td>
<td><b><font size="5">Average Precision</font></b></td>
<td><b><font size="5">Recall(Viola Jones)</font></b></td>
<td><b><font size="5">Sample Results</font></b></td>
</tr>

<tr>
<td><b><font size="5">Best Performance</font></b></td>
<td> <img src="html/hog_template_best.png" width="100%"/> </td>
<td> <img src="html/average_precision_best.png" width="100%"/> </td>
<td> <img src="html/detection rate_best.jpg" width="100%"/> </td>
<td> <img src="html/detections_trekcolr_best.png" width="100%"/> </td>
</tr>

</table>
</div>
<p>However, due to the relatively low confidence threshold, there are many false positives.
</p>
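<p>
	The effect of the threshold can be seen on a small made-up example. The Python sketch below computes precision and recall for a set of hypothetical detection confidences at a strict and a loose threshold; the scores and labels are invented for illustration, and the sketch assumes every real face appears among the candidate windows.
</p>
<pre><code class="python">def precision_recall(scores, is_face, threshold):
    """Precision and recall of detections whose confidence exceeds the threshold."""
    kept = [face for s, face in zip(scores, is_face) if s > threshold]
    true_pos = sum(kept)
    total_faces = sum(is_face)
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / total_faces if total_faces else 0.0
    return precision, recall


# Hypothetical detector confidences and whether each window really is a face.
scores = [2.1, 1.4, 0.3, -0.2, -0.7, -0.9, -1.0]
is_face = [1, 1, 1, 0, 1, 0, 0]

strict = precision_recall(scores, is_face, 0.95)
loose = precision_recall(scores, is_face, -1.1)
</code></pre>
<p>
	With the strict threshold every kept detection is a face but half of the faces are missed; with the loose threshold all faces are found at the price of several false positives. Average precision summarizes the whole curve, so it tends to benefit from a low threshold even though the individual result images look cluttered.
</p>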
</div>


<div class="container">
<h3><b>Test Result on Extra Test Scenes</b></h3>
<p>
	HoG_Template_Size=36, HoG_Cell_Size=3, Confidence_Threshold=0.95, HNM=ON
</p>
<center>
<p>
<img src="html/detections_cs143_2011_class_easy.jpg.png" width="45%"/>
<img src="html/detections_cs143_2011_class_hard.jpg.png" width="48%"/>
</p>
</center>
<center>
<p>
<img src="html/detections_cs143_2013_class_easy_01.jpg.png" width="45%"/>
<img src="html/detections_cs143_2013_class_easy_02.jpg.png" width="49%"/>
</p>
</center>
<center>
<p>
<img src="html/detections_cs143_2013_class_hard_01.jpg.png" width="60%"/>
<img src="html/detections_cs143_2013_class_hard_02.jpg.png" width="60%"/>
<img src="html/detections_cs143_2013_class_hard_03.jpg.png" width="60%"/>
</p>
</center>

</div>


<center><p>
	The END
</p></center>

</body>
</html>