
Commit 92d3b4a

Authored by denkle, github-actions[bot], and mikeheddes
Add Adult and Abalone datasets (#96)
* Create the first attempt to integrate datasets from "Do we need 100s.."
* [github-action] formatting fixes
* Faster download and extraction
* [github-action] formatting fixes
* Move dataset to Google Drive and add download progress bar
* [github-action] formatting fixes
* Add tqdm dependency
* Revisiting logic of assigning data w.r.t. variables
* [github-action] formatting fixes
* Fix Google Drive download link extraction
* Rework classes to streamline inclusion of new datasets from the collections
* [github-action] formatting fixes
* Revised some logic of classes, resolving the merge conflict
* [github-action] formatting fixes
* Delete collection_datasets.py
* Removed .DS_Store
* Delete __init__.py
* Refactor datasets
* [github-action] formatting fixes
* Update workflow Python version
* Refactor data loading
* Update docs

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: mikeheddes <[email protected]>
Parent: 1fd1344 · Commit: 92d3b4a

File tree: 9 files changed (+472, -9 lines)


.github/workflows/test.yml

Lines changed: 3 additions & 3 deletions
@@ -11,12 +11,12 @@ permissions:
 jobs:
   test:
     name: Test with Python ${{ matrix.python-version }} on ${{ matrix.os }}
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.os }}
     timeout-minutes: 10
     strategy:
       matrix:
-        python-version: ['3.6', '3.8', '3.10']
-        os: [ubuntu-latest, windows-latest, macOS-latest]
+        python-version: ['3.8', '3.9', '3.10']
+        os: [ubuntu-latest, windows-latest, macos-latest]

     steps:
       - uses: actions/checkout@v3

dev-requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -5,4 +5,5 @@ requests
 numpy
 flake8
 pytest
-black
+black
+tqdm

docs/datasets.rst

Lines changed: 14 additions & 0 deletions
@@ -19,3 +19,17 @@ The Torchhd library provides many popular built-in datasets to work with.
     EMGHandGestures
     PAMAP
     CyclePowerPlant
+    Abalone
+    Adult
+
+
+Base classes
+------------------------
+
+.. autosummary::
+    :toctree: generated/
+    :template: class_dataset.rst
+
+    CollectionDataset
+    DatasetFourFold
+    DatasetTrainTest
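The newly documented base classes are what streamline adding further datasets from this collection: as the Abalone and Adult files below show, a concrete dataset only declares its ``name`` and ``classes``. The following is a minimal sketch of how another four-fold collection dataset might be added, assuming the same ``DatasetFourFold`` interface; the dataset name and labels here are hypothetical and not part of this commit.

```python
from typing import List

from torchhd.datasets import DatasetFourFold


class CarEvaluation(DatasetFourFold):
    """Hypothetical collection dataset, added by subclassing DatasetFourFold.

    Downloading, fold selection, and tensor conversion are inherited from the
    base class; the subclass only declares which dataset to load and its labels.
    """

    # Directory/key of this dataset inside the downloaded collection (assumed).
    name = "car_evaluation"

    # Class labels in the order used by the collection's label encoding (assumed).
    classes: List[str] = [
        "unacc",
        "acc",
        "good",
        "vgood",
    ]
```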

setup.py

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@
         "pandas",
         "numpy",
         "requests",
+        "tqdm",
     ],
     packages=find_packages(exclude=["docs", "torchhd.tests", "examples"]),
     python_requires=">=3.6, <4",
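The new ``tqdm`` dependency backs the download progress bar mentioned in the commit message. Below is a minimal sketch of how a streamed ``requests`` download can be wrapped with a ``tqdm`` bar; the helper name and chunk size are illustrative assumptions, not Torchhd's actual implementation.

```python
import requests
from tqdm import tqdm


def download_with_progress(url: str, destination: str, chunk_size: int = 32 * 1024) -> None:
    """Stream a file to disk while showing a tqdm progress bar (illustrative sketch)."""
    response = requests.get(url, stream=True)
    response.raise_for_status()

    # Some servers omit Content-Length; tqdm then falls back to an open-ended bar.
    total = int(response.headers.get("content-length", 0))

    with open(destination, "wb") as file, tqdm(
        total=total, unit="B", unit_scale=True, desc=destination
    ) as progress:
        for chunk in response.iter_content(chunk_size=chunk_size):
            file.write(chunk)
            progress.update(len(chunk))
```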

torchhd/datasets/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,12 @@
 from torchhd.datasets.emg_hand_gestures import EMGHandGestures
 from torchhd.datasets.pamap import PAMAP
 from torchhd.datasets.ccpp import CyclePowerPlant
+from torchhd.datasets.dataset import CollectionDataset
+from torchhd.datasets.dataset import DatasetFourFold
+from torchhd.datasets.dataset import DatasetTrainTest
+from torchhd.datasets.abalone import Abalone
+from torchhd.datasets.adult import Adult
+

 __all__ = [
     "BeijingAirQuality",
@@ -16,4 +22,9 @@
     "EMGHandGestures",
     "PAMAP",
     "CyclePowerPlant",
+    "CollectionDataset",
+    "DatasetFourFold",
+    "DatasetTrainTest",
+    "Abalone",
+    "Adult",
 ]

torchhd/datasets/abalone.py

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+from typing import List
+from torchhd.datasets import DatasetFourFold
+
+
+class Abalone(DatasetFourFold):
+    """`Abalone <https://archive.ics.uci.edu/ml/datasets/abalone>`_ dataset.
+
+    Args:
+        root (string): Root directory containing the files of the dataset.
+        train (bool, optional): If True, returns the training (sub)set from the file storing training data, as further determined by the ``fold`` and ``hyper_search`` arguments.
+            Otherwise, if hyperparameter search is performed (``hyper_search = True``), returns a subset of the training dataset; if not (``hyper_search = False``), returns the subset of the training dataset
+            specified in ``conxuntos_kfold.dat``, provided the fold number is valid. Otherwise an error is raised.
+        fold (int, optional): Specifies which fold number to use. The default value of -1 returns all the training data from the corresponding file.
+            Values between 0 and 3 specify which fold in ``conxuntos_kfold.dat`` to use. Relevant only if ``hyper_search`` is set to False and ``0 <= fold <= 3``.
+            Indices in even rows (zero indexing) of ``conxuntos_kfold.dat`` correspond to train subsets, while indices in odd rows correspond to test subsets.
+        hyper_search (bool, optional): If True, creates the dataset using the indices in ``conxuntos.dat``. This split is used for hyperparameter search. The first row corresponds to train indices (used if ``train = True``),
+            while the second row corresponds to test indices (used if ``train = False``).
+        transform (callable, optional): A function/transform that takes in a torch.FloatTensor
+            and returns a transformed version.
+        target_transform (callable, optional): A function/transform that takes in the
+            target and transforms it.
+        download (bool, optional): If True, downloads the dataset from the internet and
+            puts it in the root directory. If the dataset is already downloaded, it is not
+            downloaded again.
+    """
+
+    name = "abalone"
+    classes: List[str] = [
+        "0",
+        "1",
+        "2",
+    ]
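A short usage sketch for the new class, following the constructor arguments documented above; the root path and batch size are arbitrary, and the DataLoader usage assumes each item is a (features, target) pair as with the other Torchhd datasets.

```python
import torch.utils.data as data
from torchhd.datasets import Abalone

# Arbitrary root directory for this sketch; download=True fetches the files on first use.
train_ds = Abalone("data", train=True, download=True)
test_ds = Abalone("data", train=False)

# Optionally restrict the training data to one of the four folds in conxuntos_kfold.dat.
fold_0 = Abalone("data", train=True, fold=0)

loader = data.DataLoader(train_ds, batch_size=32, shuffle=True)
features, targets = next(iter(loader))  # assumed (FloatTensor, class index) batches
```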

torchhd/datasets/adult.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+from typing import List
+from torchhd.datasets import DatasetTrainTest
+
+
+class Adult(DatasetTrainTest):
+    """`Adult <https://archive.ics.uci.edu/ml/datasets/adult>`_ dataset.
+
+    Args:
+        root (string): Root directory containing the files of the dataset.
+        train (bool, optional): If True, returns the training (sub)set from the file storing training data, as further determined by the ``hyper_search`` argument.
+            Otherwise, if hyperparameter search is performed (``hyper_search = True``), returns a subset of the training dataset; if not (``hyper_search = False``), returns the test set.
+        hyper_search (bool, optional): If True, creates the dataset using the indices in ``conxuntos.dat``. This split is used for hyperparameter search. The first row corresponds to train indices (used if ``train = True``),
+            while the second row corresponds to test indices (used if ``train = False``).
+        transform (callable, optional): A function/transform that takes in a torch.FloatTensor
+            and returns a transformed version.
+        target_transform (callable, optional): A function/transform that takes in the
+            target and transforms it.
+        download (bool, optional): If True, downloads the dataset from the internet and
+            puts it in the root directory. If the dataset is already downloaded, it is not
+            downloaded again.
+    """
+
+    name = "adult"
+    classes: List[str] = [
+        ">50K",
+        "<=50K",
+    ]
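A companion sketch for the train/test variant, including the hyperparameter-search split described in the docstring; the root path is arbitrary.

```python
from torchhd.datasets import Adult

# Regular split: the dataset's own train and test partitions.
train_ds = Adult("data", train=True, download=True)
test_ds = Adult("data", train=False)

# Hyperparameter-search split: both subsets come from conxuntos.dat,
# the first row providing "train" indices and the second row "test" indices.
search_train = Adult("data", train=True, hyper_search=True)
search_val = Adult("data", train=False, hyper_search=True)

print(len(train_ds), len(test_ds), Adult.classes)  # classes: [">50K", "<=50K"]
```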
