<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<!-- Begin Jekyll SEO tag v2.5.0 -->
<title> ARDIS | Blekinge Tekniska Högskola, Department of Computer Science</title>
</head>
<body>
<header class="page-header" role="banner">
<a href='https://www.bth.se/'>
<img src="https://via.tt.se/data/images/00933/401e8804-268f-4b76-bab1-12d227c81b74-w_960_h_960.png" width="100" height="100" alt="logo"/>
</a>
<font size="7" color="#78281f"><center>ARDIS</center></font>
<h2 class = "project-name"><center><img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/Im.png"></center></h2>
<br><h3 class="project-tagline"><center>Department of Computer Science, Blekinge Tekniska Högskola, SE-371 79, Karlskrona, Sweden.</center></h3>
</header>
<hr>
<p align="justify"><b><h3>I. Description of the Data Sets</h3></b></p>
ARDIS (Arkiv Digital Sweden) is a new image-based dataset of historical handwritten digits. The images are extracted from 15,000 Swedish church records written by different priests with various handwriting styles in the nineteenth and twentieth centuries. ARDIS consists of three single-digit datasets and one digit-string dataset. The digit-string dataset includes 10,000 samples in the Red-Green-Blue (RGB) color space, whereas each of the other datasets contains 7,600 single-digit images in different color spaces. Figure 1 illustrates handwritten digit images from the different datasets in ARDIS.
<p><center> <img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/ARDIS_.png" width="500" height="400">
<br><br><br> Figure 1. Examples of handwritten digit images from the different datasets in ARDIS.</center></p>
<p align="justify"><b><h3>II. Use of the Materials</h3></b></p>
The users of the ARDIS Data Set must agree that:
<ol>
<li> The use of the dataset is restricted to research purposes only. </li>
<li> No redistribution of the dataset is allowed. </li>
<li> In any resulting publication of research that uses the dataset, due credit will be given to:
<br> <b>Huseyin Kusetogullari, Amir Yavariabdi, Abbas Cheddad, Håkan Grahn and Johan Hall, 2019, "ARDIS: A Swedish Historical Handwritten Digit Dataset," Neural Computing and Applications, Springer.
DOI: 10.1007/s00521-019-04163-3 </b> </li></ol>
<br>Link to the paper: <a href="https://link.springer.com/article/10.1007/s00521-019-04163-3">click here</a>
<p align="justify"><b><h3>III. Download Links</h3></b></p>
<!--Info of Dataset I-->
<br><font color = "brown">#### ARDIS DATASET_I: </font><mark>This dataset was updated on 2020-04-04.</mark>
<br>This date-string image dataset contains 10,000 images of four-digit strings, cropped automatically from the original full document images, and is divided into the following three parts (
more info here >> <a href = "https://raw.githubusercontent.com/ardisdataset/ARDIS//master/Readme.pdf"> Readme.pdf </a><<) :
<ol>
<li><a href= "https://raw.githubusercontent.com/ardisdataset/ARDIS//
Updates-Date-String/Date Strings Part I.zip">Part I:</a> This set contains 3977 RGB images in JPG format.</li>
<li><a href= "https://raw.githubusercontent.com/ardisdataset/ARDIS//
Updates-Date-String/Date Strings Part II.zip">Part II:</a> This set contains 4503 RGB images in JPG format.</li>
<li><a href= "https://raw.githubusercontent.com/ardisdataset/ARDIS//
Updates-Date-String/Date Strings Part III.zip">Part III:</a> This set contains 1520 RGB images in JPG format.</li>
</ol>
<br> <center><img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/Distribution.png" width="933" height="514">
<br> Figure 2. Date Distribution. </center>
<p>
<!--Info of Dataset II-->
<br><font color = "brown">#### ARDIS DATASET_II: </font>
<br>This dataset contains 7600 corrupted and noisy handwritten digit images. You can use 6600 images for training and 1000 for testing.
<p> ARDIS_DATASET_II download link: <a href="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/ARDIS_DATASET_II.rar">Click here</a>
<p>
<p><center> <img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/digits.png" width="250" height="150">
<br> Figure 3. Corrupted Handwritten Digit Images. </center>
<p>
<!--Info of Dataset III-->
<br><font color = "brown">#### ARDIS DATASET_III: </font>
<br>This dataset contains 7600 handwritten digit images with a clean background. You can use 6600 images for training and 1000 for testing.
<p>
<p> ARDIS_DATASET_III download link: <a href="https://raw.githubusercontent.com/ardisdataset/ARDIS//master/ARDIS_DATASET_III.rar">Click here</a>
<p><center> <img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/digits2.png" width="250" height="150">
<br> Figure 4. Handwritten Digit Images. </center>
<!--Info of Dataset IV-->
<br><font color = "brown">#### ARDIS DATASET_IV: </font>
<br>This dataset contains 6600 training and 1000 testing images in .csv files. The digit images in this dataset are in the same format as those of the MNIST and USPS digit image datasets.
The results of different machine learning methods in our accepted paper show that the ARDIS dataset differs from the MNIST and USPS datasets.
<ol>
<li>ARDIS_train_2828.csv</li>
<li>ARDIS_train_labels.csv</li>
<li>ARDIS_test_2828.csv</li>
<li>ARDIS_test_labels.csv</li>
</ol>
<p> ARDIS_DATASET_IV download link: <a href="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/ARDIS_DATASET_IV.rar">Click here</a></p>
<p>
<p><center> <img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/digits3.png" width="500" height="200">
<br> Figure 5. Illustration of digit values from 0 to 9: a) ARDIS, b) MNIST, and c) USPS </center>
<p>
<p align="justify"><b><h3> IV. Implementation</h3></b></p>
<p>
#### DATASET_IV
<br>#### In Python
<font face="Courier New">
<br>import numpy as np
<br>x_train=np.loadtxt('.../ARDIS_train_2828.csv', dtype='float')
<br>x_test=np.loadtxt('.../ARDIS_test_2828.csv', dtype='float')
<br>y_train=np.loadtxt('.../ARDIS_train_labels.csv', dtype='float')
<br>y_test=np.loadtxt('.../ARDIS_test_labels.csv', dtype='float') </font>
</p>
<p>
<br>#### reshape to be [samples][channels][height][width]
<font face="Courier New">
<br>x_train = x_train.reshape(x_train.shape[0], 1, 28, 28).astype('float32')
<br>x_test = x_test.reshape(x_test.shape[0], 1, 28, 28).astype('float32') </font>
</p>
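<p align="justify">The reshape step above can be tried without the CSV files. The following is a minimal sketch, assuming each row of the loaded array holds 28 x 28 = 784 flattened pixel values (as the file name suffix and the reshape call suggest). The helper name <font face="Courier New">reshape_flat_digits</font> and its <font face="Courier New">channels_first</font> parameter are hypothetical and are not part of the ARDIS distribution.</p>

```python
import numpy as np


def reshape_flat_digits(x, channels_first=True):
    """Reshape rows of 784 flattened pixel values into 28x28 single-channel images.

    Hypothetical helper for illustration; assumes one flattened image per row.
    """
    x = np.asarray(x, dtype="float32")
    if x.ndim != 2 or x.shape[1] != 28 * 28:
        raise ValueError("expected shape (n_samples, 784), got %s" % (x.shape,))
    if channels_first:
        # [samples][channels][height][width], as in the snippet above
        return x.reshape(x.shape[0], 1, 28, 28)
    # [samples][height][width][channels], used by channels-last frameworks
    return x.reshape(x.shape[0], 28, 28, 1)


# Synthetic stand-in for the CSV contents: 5 rows of 784 values each.
demo = np.arange(5 * 784, dtype="float32").reshape(5, 784)
print(reshape_flat_digits(demo).shape)                        # (5, 1, 28, 28)
print(reshape_flat_digits(demo, channels_first=False).shape)  # (5, 28, 28, 1)
```

<p align="justify">Checking the row length before reshaping makes a wrong delimiter or a truncated file fail early with a clear message, instead of producing misaligned images.</p>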
<p align="justify"><b><h3>V. Contributions from other Researchers to ARDIS</h3></b></p>
<br>
<ol type = "1">
<li><b>Brazil: </b> Bounding box annotations (BBA) using Darknet-YOLO
<br>This set contains the BBA of the ARDIS DATASET_I (10,000 four-digit strings): <a href= "https://www.abbascheddad.net/Upload/ARDIS_BBA.zip"> Click here</a>.
<br> If you use these annotations and ARDIS in your research, please credit the authors by citing:
<br><b>For the BBA:</b> Hochuli, A. G.; Britto Jr., A. S.; Barddal, J. P.; Oliveira, L. S.; Sabourin, R. "An End-To-End Approach for Recognition of Modern and Historical Handwritten Numeral Strings,"
In: IEEE International Joint Conference on Neural Networks, 2020.
<br><b> For ARDIS:</b> Kusetogullari, H.; Yavariabdi, A.; Cheddad, A.; Grahn, H.; Hall, J. "ARDIS: A Swedish Historical Handwritten Digit Dataset,"
Neural Computing and Applications, 2019, Springer. DOI: 10.1007/s00521-019-04163-3.
<br><br> How to interpret the annotation file?
<br> <center><img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/Updates-Date-String/BBA_Explained.png" width="300" height="125"></center>
</li>
</ol>
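<p align="justify">For readers who want to turn the annotations into pixel coordinates, the following is a minimal sketch. It assumes the files follow the usual Darknet-YOLO label convention, in which each line reads <font face="Courier New">class x_center y_center width height</font> with the four box values normalized to the image size. This layout is an assumption; the explanatory figure shipped with the annotations is the authoritative reference. The function name <font face="Courier New">yolo_to_pixel_box</font> is hypothetical.</p>

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one label line to (class_id, left, top, right, bottom) in pixels.

    Assumes the common Darknet-YOLO layout: class x_center y_center width height,
    with box values normalized to the range 0..1. Hypothetical helper.
    """
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    class_id = int(fields[0])
    x_center, y_center, width, height = (float(v) for v in fields[1:])
    # Scale the normalized center/size representation to pixel corner coordinates.
    left = (x_center - width / 2.0) * image_width
    top = (y_center - height / 2.0) * image_height
    right = (x_center + width / 2.0) * image_width
    bottom = (y_center + height / 2.0) * image_height
    return class_id, left, top, right, bottom


# Made-up label line for a 200 x 100 pixel image (not taken from ARDIS_BBA).
print(yolo_to_pixel_box("7 0.5 0.5 0.25 0.5", 200, 100))
# (7, 75.0, 25.0, 125.0, 75.0)
```

<p align="justify">Under the same assumption, sorting the boxes of one image by their left coordinate gives the digits of the string in reading order.</p>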
<p align="justify"><b><h3>VI. Feedback or Comments</h3></b></p>
<br> We welcome your feedback and suggestions for improving the dataset.
<p> <img src="https://raw.githubusercontent.com/ardisdataset/ARDIS/master/email.png" width="582" height="343">
</p>
<br>
<b>P.S.: <mark>Interested in whole-page historical handwritten documents? Click here for our <a href= "https://ardisdataset.github.io/SHIBR/" target = "_blank">SHIBR</a> (Swedish Historical Birth Records) dataset. It is a semi-annotated dataset.</mark></b>
<br>
<br> Karlskrona, Sweden on: 2019-04-02 </p>
<p align="justify">Blekinge Institute of Technology</p>
</body>
</html>
<!-- Default Statcounter code for Ardisdataset.github.io
ARDIS https://ardisdataset.github.io/ARDIS/ -->
<script type="text/javascript">
var sc_project=11978811;
var sc_invisible=1;
var sc_security="c8d9473b";
</script>
<script type="text/javascript"
src="https://www.statcounter.com/counter/counter.js"
async></script>
<noscript><div class="statcounter"><a title="web statistics"
href="https://statcounter.com/" target="_blank"><img
class="statcounter"
src="https://c.statcounter.com/11978811/0/c8d9473b/1/"
alt="web statistics"></a></div></noscript>
<!-- End of Statcounter Code -->