21 add possibility to add callable or other metrics within the class #23

Merged
Binary file modified .coverage
Binary file not shown.
4 changes: 2 additions & 2 deletions .github/workflows/build_wheels.yml
@@ -16,7 +16,7 @@ on:
 jobs:
   run_pytest:
     name: Run tests on min and max Python versions
-    runs-on: self-hosted
+    runs-on: ubuntu-latest
     strategy:
       fail-fast: true
       matrix:
@@ -61,7 +61,7 @@ jobs:

   build_sdist:
     name: Build source distribution
-    runs-on: self-hosted
+    runs-on: ubuntu-latest
     needs: run_pytest
     steps:
       - uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion .github/workflows/pr_build.yml
@@ -17,7 +17,7 @@ jobs:

   build_test_sdist:
     name: Test source distribution
-    runs-on: self-hosted
+    runs-on: ubuntu-latest
     needs: run_pytest
     strategy:
       fail-fast: true
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -7,7 +7,7 @@ on:
 jobs:
   pytest:
     name: Run pytest
-    runs-on: self-hosted
+    runs-on: ubuntu-latest
     strategy:
       fail-fast: true
       matrix:
44 changes: 44 additions & 0 deletions CHANGELOG.md
@@ -1,7 +1,33 @@
# Changelog

All notable changes to this project will be documented in this file.

## [1.4.0] - 2025-06-19

### Contributors

- [@quentinhaenn](Quentin Haenn) - Main developer and maintainer

### Added

- Added support for custom MDS solvers in the `RadiusClustering` class.
- Updated the documentation to include examples of using custom MDS solvers.
- Added more examples and tutorials to the documentation.

### Changed

- Improved documentation and examples for the `RadiusClustering` class.
- Updated the README to reflect the new features and improvements in version 1.4.0.
- Updated the test cases to ensure compatibility with the new features.
- Refactored the main codebase to improve readability and maintainability.
- Prepared the codebase for future additions of MDS solvers and/or clustering algorithms.

## [1.3.0] - 2025-06-18

### Contributors

- [@quentinhaenn](Quentin Haenn) - Main developer and maintainer

### Added

- Full test coverage for the entire codebase.
@@ -17,3 +43,21 @@
- Updated all the attributes in the `RadiusClustering` class to fit `scikit-learn` standards and conventions.
- Updated the test cases to reflect the changes in the `RadiusClustering` class.
- Updated README and documentation to reflect the new `radius` parameter and the deprecation of `threshold`.

## [1.2.0] - 2024-10

### Contributors

- [@quentinhaenn](Quentin Haenn) - Main developer and maintainer
- [@mickaelbaron](Mickaël Baron) - Contributor and maintainer

### Added

- Added CI/CD pipelines with GitHub Actions for automated testing and deployment.
- Added package metadata for better integration with PyPI.
- Added a badge for the GitHub Actions workflow status in the README.
- Added a badge for the Python version supported in the README.
- Added a badge for the code style (Ruff) in the README.
- Added a badge for the license in the README.
- Added CI/CD pipelines for PyPI deployment (including test coverage, compiling extensions and wheels, and uploading to PyPI).
- Resolved issues with compiling Cython extensions on Windows and macOS.
23 changes: 20 additions & 3 deletions README.md
@@ -17,6 +17,24 @@ Radius clustering is a Python package that implements clustering under radius co
- Compatible with scikit-learn's API for clustering algorithms
- Supports radius-constrained clustering
- Provides options for exact and approximate solutions
- Easy to use and integrate with existing Python data science workflows
- Includes comprehensive documentation and examples
- Full test coverage to ensure reliability and correctness
- Supports custom MDS solvers for flexibility in clustering approaches
- Provides a user-friendly interface for clustering tasks

> [!CAUTION]
> **Deprecation Notice**: The `threshold` parameter of the `RadiusClustering` class is deprecated; use the `radius` parameter instead to specify the clustering radius. `threshold` is planned for complete removal in version 2.0.0. The `radius` parameter is now the standard way to define the radius, in line with our goal of making parameter names more intuitive and user-friendly.

> [!NOTE]
> **NEW VERSIONS**: The package is currently under active development for new features and improvements, including some refactoring and enhancements to the existing codebase. Backwards compatibility is not guaranteed, so please check the [CHANGELOG](CHANGELOG.md) for details on changes and updates.

## Roadmap

- [x] Version 1.4.0:
- [x] Add support for custom MDS solvers
- [x] Improve documentation and examples
- [x] Add more examples and tutorials

## Installation

@@ -38,7 +56,7 @@ from radius_clustering import RadiusClustering
X = np.random.rand(100, 2) # Generate random data

# Create an instance of RadiusClustering
-rad_clustering = RadiusClustering(manner="approx", threshold=0.5)
+rad_clustering = RadiusClustering(manner="approx", radius=0.5)

# Fit the model to the data
rad_clustering.fit(X)
@@ -109,5 +127,4 @@ The Radius Clustering work has been funded by:

- [1] [An iterated greedy algorithm for finding the minimum dominating set in graphs](https://www.sciencedirect.com/science/article/pii/S0378475422005055)
- [2] [An exact algorithm for the minimum dominating set problem](https://dl.acm.org/doi/abs/10.24963/ijcai.2023/622)


- [3] [Clustering under radius constraint using minimum dominating set](https://link.springer.com/chapter/10.1007/978-3-031-62700-2_2)
14 changes: 13 additions & 1 deletion docs/source/api.rst
@@ -1,7 +1,19 @@
API Reference
=============

-.. automodule:: radius_clustering
+This page documents the implementation details of the `radius_clustering` package.

RadiusClustering Class
----------------------

.. autoclass:: radius_clustering.RadiusClustering
:members:
:undoc-members:
:show-inheritance:

Algorithms Module
-----------------
.. automodule:: radius_clustering.algorithms
:members:
:undoc-members:
:show-inheritance:
110 changes: 108 additions & 2 deletions docs/source/usage.rst
@@ -1,7 +1,20 @@
Usage
=====

-Here's a basic example of how to use Radius Clustering:
+This page is a quick guide to using the `radius_clustering` package for clustering tasks. The package provides a simple interface for radius-based clustering of datasets, built on solving the Minimum Dominating Set (MDS) problem.

This page is divided into three main sections:

1. **Basic Usage**: A quick example of how to use the `RadiusClustering` class and perform clustering with several parameters.
2. **Custom Dissimilarity Function**: How to use a custom dissimilarity function with the `RadiusClustering` class.
3. **Custom MDS Solver**: How to implement a custom MDS solver for more advanced clustering tasks, possibly with fewer guarantees on the results.


Basic Usage
-----------------

The `RadiusClustering` class provides a straightforward way to perform clustering based on a specified radius. You can choose between an approximate or exact method for clustering, depending on your needs.

Here's a basic example of how to use Radius Clustering with the `RadiusClustering` class, using the approximate method:

.. code-block:: python

@@ -22,4 +35,97 @@ Here's a basic example of how to use Radius Clustering:
# Get cluster labels
labels = rad.labels_

print(labels)
print(labels)

Similarly, you can use the exact method by changing the `manner` parameter to `"exact"`:

.. code-block:: python

    # [...] Exact same code as above
    rad = RadiusClustering(manner="exact", radius=0.5)  # change this parameter
    # [...] Exact same code as above

Custom Dissimilarity Function
-----------------------------

The main motivation behind the `radius_clustering` package is that users sometimes need a dissimilarity function that is not a metric (or distance) function. In addition, some contexts require a domain-specific dissimilarity function that is not provided by default and must be implemented by the user.

To use a custom dissimilarity function, pass it to the `RadiusClustering` class through the `metric` parameter. Here's an example of how to do this:

.. code-block:: python

from radius_clustering import RadiusClustering
import numpy as np

# Generate random data
X = np.random.rand(100, 2)

# Define a custom dissimilarity function
def dummy_dissimilarity(x, y):
    return np.linalg.norm(x - y) + 0.1  # Example: add a constant to the distance

# Create an instance of RadiusClustering with the custom dissimilarity function
rad = RadiusClustering(manner="approx", radius=0.5, metric=dummy_dissimilarity)

# Fit the model to the data
rad.fit(X)

# Get cluster labels
labels = rad.labels_

print(labels)


.. note::
The custom dissimilarity function will be passed to scikit-learn's `pairwise_distances` function, so it should be compatible with the expected input format and return type. See the scikit-learn documentation for more details on how to implement custom metrics.
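
To make the contract in the note above concrete, the following is a minimal pure-Python sketch of what applying a callable to every pair of rows looks like. It is not the package's code and it does not call scikit-learn; `pairwise_matrix` is a hypothetical helper introduced only for illustration. The point it shows is that the callable receives two samples and returns a single number.

```python
import math


def dummy_dissimilarity(x, y):
    """Euclidean distance plus a constant offset, mirroring the example above."""
    return math.dist(x, y) + 0.1


def pairwise_matrix(points, metric):
    """Hypothetical helper: apply `metric` to every pair of rows in `points`."""
    n = len(points)
    return [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]


points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
D = pairwise_matrix(points, dummy_dissimilarity)

# D is a square matrix with one entry per ordered pair of samples.
for row in D:
    print([round(value, 3) for value in row])
```

Note that with this particular function the diagonal of the sketch's matrix is `0.1` rather than `0`, which is one way a dissimilarity can fail to be a metric.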

Custom MDS Solver
-----------------

The two default solvers provided by the current implementation of the `radius_clustering` package focus on exactness (or near-exactness) of the solution to an NP-hard problem. They may therefore not be suitable for all use cases, especially when performance is a concern.
If you have your own implementation of a Minimum Dominating Set (MDS) solver, you can use it with the `RadiusClustering` class by using the :py:func:`RadiusClustering.set_solver` method. It checks that the solver is compatible with the expected input format and return type, and then uses it to perform clustering.

.. versionadded:: 1.4.0
The :py:func:`RadiusClustering.set_solver` method was added to allow users to set a custom MDS solver.
It is *NOT* backward compatible with previous versions of the package, as it comes with new structure and methods to handle custom solvers.

Here's an example of how to implement a custom MDS solver and use it with the `RadiusClustering` class, using NetworkX's dominating set implementation:

.. code-block:: python

from radius_clustering import RadiusClustering
import time
import numpy as np
import networkx as nx

# Generate random data
X = np.random.rand(100, 2)

# Define a custom MDS solver using NetworkX
def custom_mds_solver(n, edges, nb_edges, random_state=None):
    start = time.time()
    graph = nx.Graph(edges)
    centers = list(nx.algorithms.dominating_set(graph))
    centers.sort()
    end = time.time()
    return centers, end - start

# Create an instance of RadiusClustering with the custom MDS solver
rad = RadiusClustering(manner="approx", radius=0.5)
rad.set_solver(custom_mds_solver)

# Fit the model to the data
rad.fit(X)

# Get cluster labels
labels = rad.labels_

print(labels)

.. note::
The custom MDS solver should accept the same parameters as the default solvers, including the number of points `n`, the edges of the graph `edges`, the number of edges `nb_edges`, and an optional `random_state` parameter for reproducibility. It should return a list of centers and the time taken to compute them.
The `set_solver` method will check that the custom solver is compatible with the expected input format and return type, and will use it to perform clustering.
If the custom solver is not compatible, it will raise a `ValueError` with a descriptive message.
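
As a dependency-free illustration of the signature described in the note above, here is a sketch of a greedy dominating-set heuristic that takes `(n, edges, nb_edges, random_state)` and returns `(centers, elapsed_time)`. It assumes that `edges` is an iterable of `(u, v)` index pairs over nodes `0..n-1`; that format is an assumption of this sketch, not something stated here. `greedy_mds_solver` is a hypothetical name, and the greedy rule is a generic heuristic, not one of the package's solvers.

```python
import time


def greedy_mds_solver(n, edges, nb_edges, random_state=None):
    """Greedy dominating-set heuristic returning (sorted centers, elapsed seconds).

    Assumption: `edges` is an iterable of (u, v) index pairs over nodes 0..n-1.
    `nb_edges` and `random_state` are accepted only to match the signature.
    """
    start = time.perf_counter()
    # Closed neighbourhoods: every node dominates itself and its neighbours.
    neighbours = [{i} for i in range(n)]
    for u, v in edges:
        u, v = int(u), int(v)
        neighbours[u].add(v)
        neighbours[v].add(u)

    uncovered = set(range(n))
    centers = []
    while uncovered:
        # Pick the node covering the most uncovered nodes; ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(neighbours[i] & uncovered), -i))
        centers.append(best)
        uncovered -= neighbours[best]

    centers.sort()
    return centers, time.perf_counter() - start


# Star graph: node 0 is adjacent to every other node, so it dominates the graph alone.
star_centers, star_elapsed = greedy_mds_solver(5, [(0, 1), (0, 2), (0, 3), (0, 4)], 4)
print(star_centers)
```

A function of this shape could then be handed to `set_solver` in the same way as `custom_mds_solver` in the example above, provided the assumed edge format matches what the package actually passes.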

.. attention::
We cannot guarantee that the custom MDS solver will produce the same results as the default solvers, especially if it is not purposely designed to solve the Minimum Dominating Set problem but rather just finds a dominating set. The results may vary depending on the implementation and the specific characteristics of the dataset.
As an example, a benchmark of our solutions and a custom one using NetworkX is available in the `Example Gallery` section of the documentation, which shows that the custom solver may produce different results than the default solvers, especially in terms of the number of clusters and the time taken to compute them (see :ref:`sphx_glr_auto_examples_plot_benchmark_custom.py`).
However, it can be useful for specific use cases where performance is a concern or when you have a custom implementation that fits your needs better.
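
The difference between a dominating set and a minimum dominating set can be seen on a tiny graph. The sketch below is pure Python and independent of the package; `is_dominating` and `brute_force_minimum` are hypothetical helpers, and the exhaustive search is only practical for a handful of nodes. On a path of six nodes, `[0, 2, 4]` is a valid dominating set, yet two centers are enough.

```python
from itertools import combinations


def closed_neighbourhoods(n, edges):
    """Map each node to itself plus its neighbours."""
    neighbours = [{i} for i in range(n)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return neighbours


def is_dominating(n, edges, centers):
    """True if every node is a center or adjacent to one."""
    neighbours = closed_neighbourhoods(n, edges)
    covered = set()
    for center in centers:
        covered |= neighbours[center]
    return len(covered) == n


def brute_force_minimum(n, edges):
    """Exhaustive search by increasing size; exponential, for tiny graphs only."""
    for size in range(1, n + 1):
        for candidate in combinations(range(n), size):
            if is_dominating(n, edges, candidate):
                return list(candidate)
    return []


# Path graph 0-1-2-3-4-5.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
valid_but_larger = [0, 2, 4]
minimum = brute_force_minimum(6, path_edges)

print(is_dominating(6, path_edges, valid_but_larger), len(valid_but_larger))
print(minimum, len(minimum))
```

In clustering terms, each center becomes a cluster, so a solver that returns a valid but non-minimum dominating set yields more clusters than necessary for the same radius.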
