Commit 5f62ca1

Add support for empty levels #minor (#37)

* Add support for empty levels
* Refactor predict method for local classifier per parent node
* Refactor predict() method for local classifier per level
* Add black linting and badge
1 parent 233701e commit 5f62ca1

19 files changed: 568 additions and 373 deletions

.github/workflows/deploy-pypi.yml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ jobs:
           python -m pip install .
       - name: Test with pytest
         run: |
-          pytest -v
+          pytest -v --flake8 --pydocstyle --cov=hiclass --cov-fail-under=90 --cov-report html
           coverage xml
       - name: Upload Coverage to Codecov
         if: matrix.os == 'ubuntu-latest'

.github/workflows/test-pr.yml

Lines changed: 7 additions & 2 deletions
@@ -6,7 +6,12 @@ on:
       - main

 jobs:
-  build:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: psf/black@stable
+  test:
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
@@ -29,4 +34,4 @@ jobs:
           python -m pip install .
       - name: Test with pytest
         run: |
-          pytest -v
+          pytest -v --flake8 --pydocstyle --cov=hiclass --cov-fail-under=90 --cov-report html

README.md

Lines changed: 9 additions & 7 deletions
@@ -2,7 +2,7 @@

 HiClass is an open-source Python library for hierarchical classification compatible with scikit-learn.

-[![Deploy PyPI](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/mirand863/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/mirand863/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
+[![Deploy PyPI](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml/badge.svg?event=push)](https://github.com/mirand863/hiclass/actions/workflows/deploy-pypi.yml) [![Documentation Status](https://readthedocs.org/projects/hiclass/badge/?version=latest)](https://hiclass.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/mirand863/hiclass/branch/main/graph/badge.svg?token=PR8VLBMMNR)](https://codecov.io/gh/mirand863/hiclass) [![Downloads PyPI](https://static.pepy.tech/personalized-badge/hiclass?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=pypi)](https://pypi.org/project/hiclass/) [![Downloads Conda](https://img.shields.io/conda/dn/conda-forge/hiclass?label=conda)](https://anaconda.org/conda-forge/hiclass) [![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

 ✨ Here is a **demo** that shows HiClass in action on hierarchical data:

@@ -16,7 +16,7 @@ HiClass is an open-source Python library for hierarchical classification compati
 - [Who is using HiClass?](#who-is-using-hiclass)
 - [Install](#install)
 - [Quick start](#quick-start)
-- [Step-by-step- walk-through](#step-by-step-walk-through)
+- [Step-by-step walk-through](#step-by-step-walk-through)
 - [API documentation](#api-documentation)
 - [FAQ](#faq)
 - [Support](#support)
@@ -34,7 +34,7 @@ HiClass is an open-source Python library for hierarchical classification compati
 - **Hierarchical metrics:** HiClass supports the computation of hierarchical precision, recall and f-score, which are more appropriate for hierarchical data than traditional metrics.
 - **Compatible with pickle:** Easily store trained models on disk for future use.

-**Don't see a feature on this list?** Search our [issue tracker](https://github.com/mirand863/hiclass/issues) if someone has already requested it and add a comment to it explaining your use-case, or open a new issue if not. We prioritize our roadmap based on user feedback, so we'd love to hear from you.
+**Any feature missing on this list?** Search our [issue tracker](https://github.com/mirand863/hiclass/issues) to see if someone has already requested it and add a comment to it explaining your use-case. Otherwise, please open a new issue describing the requested feature and possible use-case scenario. We prioritize our roadmap based on user feedback, so we would love to hear from you.

 ## Benchmarks

@@ -85,7 +85,7 @@ We would love to benchmark with larger datasets, if we can find them in the publ

 Here is our public roadmap: https://github.com/mirand863/hiclass/projects/1.

-We do Just-In-Time planning, and we tend to reprioritize based on your feedback. Hence, items you see on this roadmap are subject to change. We prioritize features based on the number of people asking for it, features/fixes that are small enough and can be addressed while we work on other related features, features/fixes that help improve stability & relevance and features that address interesting use cases that excite us! If you'd like to have a request prioritized, we ask that you add a detailed use-case for it, either as a comment on an existing issue (besides a thumbs-up) or in a new issue. The detailed context helps.
+We do Just-In-Time planning, and we tend to reprioritize based on your feedback. Hence, items you see on this roadmap are subject to change. We prioritize features based on the number of people asking for it, features/fixes that are small enough and can be addressed while we work on other related features, features/fixes that help improve stability & relevance and features that address interesting use cases that excite us! If you would like to have a request prioritized, we ask that you add a detailed use-case for it, either as a comment on an existing issue (besides a thumbs-up) or in a new issue. The detailed context helps.


 ## Who is using HiClass?
@@ -123,7 +123,7 @@ Here's a quick example showcasing how you can train and predict using a local cl
 from hiclass import LocalClassifierPerNode
 from sklearn.ensemble import RandomForestClassifier

-# define data
+# Define data
 X_train = [[1], [2], [3], [4]]
 X_test = [[4], [3], [2], [1]]
 Y_train = [
@@ -152,7 +152,7 @@ from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
 from sklearn.linear_model import LogisticRegression
 from sklearn.pipeline import Pipeline

-# define data
+# Define data
 X_train = [
     'Struggling to repay loan',
     'Unable to get annual report',
@@ -220,7 +220,9 @@ Please reach out to [email protected].

 ## Contributing

-We are a small team on a mission to democratize hierarchical classification, and we'll take all the help we can get! If you'd like to get involved, here's information on [contribution guidelines and how to test the code locally](https://github.com/mirand863/hiclass/blob/main/CONTRIBUTING.md).
+We are a small team on a mission to democratize hierarchical classification, and we will take all the help we can get! If you would like to get involved, here is information on [contribution guidelines and how to test the code locally](https://github.com/mirand863/hiclass/blob/main/CONTRIBUTING.md).
+
+You can contribute in multiple ways, e.g., reporting bugs, writing or translating documentation, reviewing or refactoring code, requesting or implementing new features, etc.

 ## Getting the latest updates

docs/examples/README.rst

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,7 @@
 Gallery of Examples
 ===================

-These examples illustrate the main features of HiClass.
+These examples illustrate the main features of HiClass.
+
+.. toctree::
+   :hidden:

docs/examples/plot_empty_levels.py

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+"""
+==========================
+Different Number of Levels
+==========================
+
+HiClass supports hierarchies with different numbers of levels.
+For this example, we will train a local classifier per node
+with a hierarchy similar to the following image:
+
+.. figure:: ../algorithms/local_classifier_per_node.svg
+   :align: center
+"""
+from sklearn.linear_model import LogisticRegression
+
+from hiclass import LocalClassifierPerNode
+
+# Define data
+X_train = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
+X_test = [[9, 10], [7, 8], [5, 6], [3, 4], [1, 2]]
+Y_train = [
+    ["Bird"],
+    ["Reptile", "Snake"],
+    ["Reptile", "Lizard"],
+    ["Mammal", "Cat"],
+    ["Mammal", "Wolf", "Dog"],
+]
+
+# Use logistic regression classifiers for every node
+lr = LogisticRegression()
+classifier = LocalClassifierPerNode(local_classifier=lr)
+
+# Train local classifier per node
+classifier.fit(X_train, Y_train)
+
+# Predict
+predictions = classifier.predict(X_test)
+print(predictions)
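
The label paths in `Y_train` above have one, two, or three levels. One common way to handle such ragged input is to pad shorter paths with a placeholder so that every row has the same depth. The sketch below illustrates that idea only; the helper `pad_labels` and the empty-string placeholder are assumptions made for illustration and are not taken from the HiClass source.

```python
def pad_labels(y, fill=""):
    """Pad every label path to the depth of the deepest path.

    Hypothetical helper: `fill` marks an empty level and is an
    assumption, not necessarily what HiClass uses internally.
    """
    max_depth = max(len(path) for path in y)
    return [list(path) + [fill] * (max_depth - len(path)) for path in y]


# Same ragged hierarchy as in the example above
Y_train = [
    ["Bird"],
    ["Reptile", "Snake"],
    ["Reptile", "Lizard"],
    ["Mammal", "Cat"],
    ["Mammal", "Wolf", "Dog"],
]

padded = pad_labels(Y_train)
for row in padded:
    print(row)
```

After padding, every row has three entries, and the trailing placeholders mark the levels that are empty for that sample.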

docs/examples/plot_parallel_training.py

Lines changed: 9 additions & 22 deletions
@@ -7,8 +7,9 @@
 Larger datasets require more time for training.
 While by default the models in HiClass are trained using a single core,
 it is possible to train each local classifier in parallel by leveraging the library Ray [1]_.
-In this example, we demonstrate how to train a hierarchical classifier in parallel,
-using all the cores available, on a mock dataset from Kaggle [2]_.
+In this example, we demonstrate how to train a hierarchical classifier in parallel by
+setting the parameter :literal:`n_jobs` to use all the cores available. Training
+is performed on a mock dataset from Kaggle [2]_.

 .. [1] https://www.ray.io/
 .. [2] https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification
@@ -25,29 +26,15 @@
 from hiclass import LocalClassifierPerParentNode


-def download(url: str, path: str) -> None:
-    """
-    Download a file from the internet.
-
-    Parameters
-    ----------
-    url : str
-        The address of the file to be downloaded.
-    path : str
-        The path to store the downloaded file.
-    """
-    response = requests.get(url)
-    with open(path, "wb") as file:
-        file.write(response.content)
-
-
 # Download training data
-training_data_url = "https://zenodo.org/record/6657410/files/train_40k.csv?download=1"
-training_data_path = "train_40k.csv"
-download(training_data_url, training_data_path)
+url = "https://zenodo.org/record/6657410/files/train_40k.csv?download=1"
+path = "train_40k.csv"
+response = requests.get(url)
+with open(path, "wb") as file:
+    file.write(response.content)

 # Load training data into pandas dataframe
-training_data = pd.read_csv(training_data_path).fillna(" ")
+training_data = pd.read_csv(path).fillna(" ")

 # We will use logistic regression classifiers for every parent node
 lr = LogisticRegression(max_iter=1000)
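
The docstring above says that `n_jobs` is set to use all available cores. A minimal sketch of how that value could be derived with the standard library follows; the helper `available_cores` is hypothetical, and the commented-out constructor call assumes that `LocalClassifierPerParentNode` accepts `n_jobs` as the docstring suggests.

```python
import os


def available_cores(default=1):
    """Return the number of logical cores, or `default` if unknown."""
    count = os.cpu_count()  # may return None when the count cannot be determined
    return count if count is not None else default


n_jobs = available_cores()
print(f"Training with n_jobs={n_jobs}")

# Assumed usage, requires hiclass and scikit-learn to be installed:
# from sklearn.linear_model import LogisticRegression
# from hiclass import LocalClassifierPerParentNode
# lr = LogisticRegression(max_iter=1000)
# classifier = LocalClassifierPerParentNode(local_classifier=lr, n_jobs=n_jobs)
```

Falling back to a single job keeps the script usable on platforms where the core count is not reported.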

docs/source/conf.py

Lines changed: 17 additions & 15 deletions
@@ -13,17 +13,18 @@
 #
 import os
 import sys
-sys.path.insert(0, os.path.abspath('./../..'))
-sys.path.insert(0, os.path.abspath('./../../hiclass'))
+
+sys.path.insert(0, os.path.abspath("./../.."))
+sys.path.insert(0, os.path.abspath("./../../hiclass"))
 print(sys.path)

 import sphinx_code_tabs

 # -- Project information -----------------------------------------------------

-project = 'hiclass'
-copyright = '2022, Fabio Malcher Miranda, Niklas Köhnecke'
-author = 'Fabio Malcher Miranda, Niklas Köhnecke'
+project = "hiclass"
+copyright = "2022, Fabio Malcher Miranda, Niklas Köhnecke"
+author = "Fabio Malcher Miranda, Niklas Köhnecke"


 # -- General configuration ---------------------------------------------------
@@ -32,15 +33,15 @@
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
-    'sphinx.ext.autodoc',
-    'sphinx.ext.napoleon',
-    'sphinx.ext.autosectionlabel',
-    'sphinx_code_tabs',
-    'sphinx_gallery.gen_gallery',
+    "sphinx.ext.autodoc",
+    "sphinx.ext.napoleon",
+    "sphinx.ext.autosectionlabel",
+    "sphinx_code_tabs",
+    "sphinx_gallery.gen_gallery",
 ]

 # Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
+templates_path = ["_templates"]

 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
@@ -55,12 +56,13 @@
 use_rtd_scheme = False
 try:
     import sphinx_rtd_theme
+
     extensions.extend(["sphinx_rtd_theme"])
     use_rtd_scheme = True
 except ImportError:
     print("sphinx_rtd_theme was not installed, using alabaster as fallback!")

-html_theme = 'sphinx_rtd_theme' if use_rtd_scheme else 'alabaster'
+html_theme = "sphinx_rtd_theme" if use_rtd_scheme else "alabaster"


 # Add any paths that contain custom static files (such as style sheets) here,
@@ -76,6 +78,6 @@
 html_theme_options["sidebar_width"] = "230px"

 sphinx_gallery_conf = {
-    'examples_dirs': '../examples',
-    'gallery_dirs': 'auto_examples',
-}
+    "examples_dirs": "../examples",
+    "gallery_dirs": "auto_examples",
+}
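
The `try`/`except ImportError` block in `conf.py` picks `sphinx_rtd_theme` when it is installed and falls back to `alabaster` otherwise. As an illustration of the same fallback, the sketch below checks whether the module can be located without importing it. The helper `pick_theme` is hypothetical and is not part of this commit; unlike the original block, it does not extend `extensions`.

```python
import importlib.util


def pick_theme(preferred="sphinx_rtd_theme", fallback="alabaster"):
    """Return `preferred` if a top-level module of that name can be found."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


html_theme = pick_theme()
print(html_theme)
```

This variant only works for top-level module names, since `find_spec` imports parent packages when given a dotted name.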

docs/source/index.rst

Lines changed: 6 additions & 6 deletions
@@ -30,15 +30,15 @@ Welcome to hiclass' documentation!
    :target: https://opensource.org/licenses/BSD-3-Clause
    :alt: License

+.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
+   :target: https://github.com/psf/black
+
 .. toctree::
-   :titlesonly:
+   :includehidden:
+   :maxdepth: 3

    introduction/index
    get_started/index
    auto_examples/index
    algorithms/index
-
-.. toctree::
-   :maxdepth: 3
-
-   api/index
+   api/index
