This repository provides easy access to various datasets to be used in the Neural Network module of Owl. Currently it mainly includes MNIST and CIFAR-10.
You can get all these data in Owl by executing: Dataset.download_all ().
Thedata are downloaded in the home directory, for example, ~/.owl/dataset on Linux.
The MNIST database of handwritten digits has a training set of 60,000 examples, and a test set of 10,000 examples. Each example is of size 28 x 28.
You can get MNIST data via these Owl functions:
-
Dataset.load_mnist_train_data (): returns a tripletx, y, y'.xis a [60000, 784] ndarray (Owl_dense_ndarray.S.mat) where each row represents a [28, 28] image.yis a [60000, 1] label ndarray. Each row is an integer ranging from 0 to 9, indicating the digit on each image.y'is a [60000, 10] label ndarray. Each one-hot row vector corresponds to one label.
-
Dataset.load_mnist_test_data (): returns a triplet. Similar toload_mnist_train_data, only that it returns the test set, so the example size is 10,000 instead of 60,000. -
Dataset.load_mnist_train_data_arr (): similar toload_mnist_train_data, but returnsxas [60000,28,28,1] ndarray -
Dataset.load_mnist_test_data_arr (): similar toload_mnist_train_data_arr, but it returns the test set, so the example size is 10, 000 instead of 60, 000. -
Dataset.draw_samples x y ndrawsnrandom examples from images ndarrayxand lable ndarrayy.
The CIFAR-10 dataset consists of 60,000 32 x 32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
Due to the limit of file size on Github, the training set is cut into 5 smaller
batch. You can get CIFAR-10 data using Owl:
-
Dataset.load_cifar_train_data batch: returns a tripletx, y, y'.- The input paramter
batchcan range from 1 to 5, indicating which training set batch to choose. xis an [10000, 32, 32, 3] ndarray (Owl_dense_ndarray.S.arr). The last dimension indicates color channels (first Red, then Green, finally Blue).yis an [10000, 1] label ndarray, each number representing an image class.y'is the corresponding [10000, 10] one-hot label ndarray.
- The input paramter
-
Dataset.load_cifar_test_data (): similar toload_cifar_train_data, only that it loads test data. -
Dataset.draw_samples_cifar x y ndrawsnrandom examples from images ndarrayxand lable ndarrayy.
Note that all elements in the loaded matrices and ndarrays are of float32 format.