-
Notifications
You must be signed in to change notification settings - Fork 216
[ENH] Add whole-series ROCKAD anomaly detector #2871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
pattplatt
wants to merge
12
commits into
aeon-toolkit:main
Choose a base branch
from
pattplatt:rockad_whole_series
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
1953830
reverted commits to sync with main, added whole series rockad impleme…
pattplatt a242919
added necessary tags
pattplatt 45fd693
fixed import of example code, added skip commands that doctests pass
pattplatt a9fd785
fixed learning_type parameter
pattplatt ae4017a
corrected learning_type tag to contain binary label
pattplatt 3028af4
Merge branch 'main' into rockad_whole_series
MatthewMiddlehurst 8682b2a
Merge branch 'main' into rockad_whole_series
TonyBagnall 9878e10
Merge remote-tracking branch 'origin/main' into rockad_whole_series
MatthewMiddlehurst 7469cfd
n_jobs
MatthewMiddlehurst d167c55
better random
MatthewMiddlehurst 503eeaa
correct check_random_state
MatthewMiddlehurst 6af856c
correct check_n_jobs
MatthewMiddlehurst File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
"""ROCKAD anomaly detector.""" | ||
|
||
__maintainer__ = [] | ||
__all__ = ["ROCKAD"] | ||
|
||
import warnings | ||
from typing import Optional | ||
|
||
import numpy as np | ||
from sklearn.neighbors import NearestNeighbors | ||
from sklearn.preprocessing import PowerTransformer | ||
from sklearn.utils import check_random_state, resample | ||
|
||
from aeon.anomaly_detection.collection.base import BaseCollectionAnomalyDetector | ||
from aeon.transformations.collection.convolution_based import Rocket | ||
from aeon.utils.validation import check_n_jobs | ||
|
||
|
||
class ROCKAD(BaseCollectionAnomalyDetector): | ||
""" | ||
ROCKET-based whole-series Anomaly Detector (ROCKAD). | ||
|
||
ROCKAD [1]_ leverages the ROCKET transformation for feature extraction from | ||
time series data and applies the scikit learn k-nearest neighbors (k-NN) | ||
approach with bootstrap aggregation for robust semi-supervised anomaly detection. | ||
The data gets transformed into the ROCKET feature space. | ||
Then the whole-series are compared based on the feature space by | ||
finding the nearest neighbours. The time-point based ROCKAD anomaly detector | ||
can be found at aeon/anomaly_detection/series/distance_based/_rockad.py | ||
|
||
This class supports both univariate and multivariate time series and | ||
provides options for normalizing features, applying power transformations, | ||
and customizing the distance metric. | ||
|
||
Parameters | ||
---------- | ||
n_estimators : int, default=10 | ||
Number of k-NN estimators to use in the bootstrap aggregation. | ||
n_kernels : int, default=100 | ||
Number of kernels to use in the ROCKET transformation. | ||
normalise : bool, default=False | ||
Whether to normalize the ROCKET-transformed features. | ||
n_neighbors : int, default=5 | ||
Number of neighbors to use for the k-NN algorithm. | ||
n_jobs : int, default=1 | ||
Number of parallel jobs to use for the k-NN algorithm and ROCKET transformation. | ||
metric : str, default="euclidean" | ||
Distance metric to use for the k-NN algorithm. | ||
power_transform : bool, default=True | ||
Whether to apply a power transformation (Yeo-Johnson) to the features. | ||
random_state : int, default=None | ||
Random seed for reproducibility. | ||
|
||
Attributes | ||
---------- | ||
rocket_transformer_ : Optional[Rocket] | ||
Instance of the ROCKET transformer used to extract features, set after fitting. | ||
list_baggers_ : Optional[list[NearestNeighbors]] | ||
List containing k-NN estimators used for anomaly scoring, set after fitting. | ||
power_transformer_ : PowerTransformer | ||
Transformer used to apply power transformation to the features. | ||
|
||
References | ||
---------- | ||
.. [1] Theissler, A., Wengert, M., Gerschner, F. (2023). | ||
ROCKAD: Transferring ROCKET to Whole Time Series Anomaly Detection. | ||
In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent | ||
Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, | ||
vol 13876. Springer, Cham. https://doi.org/10.1007/978-3-031-30047-9_33 | ||
|
||
Examples | ||
-------- | ||
>>> import numpy as np | ||
>>> from aeon.anomaly_detection.collection import ROCKAD | ||
>>> rng = np.random.default_rng(seed=42) | ||
>>> X_train = rng.normal(loc=0.0, scale=1.0, size=(10, 100)) | ||
>>> X_test = rng.normal(loc=0.0, scale=1.0, size=(5, 100)) | ||
>>> X_test[4][50:58] -= 5 | ||
>>> detector = ROCKAD() # doctest: +SKIP | ||
>>> detector.fit(X_train) # doctest: +SKIP | ||
>>> detector.predict(X_test) # doctest: +SKIP | ||
array([24.11974147, 23.93866453, 21.3941765 , 22.26811959, 64.9630108 ]) | ||
""" | ||
|
||
_tags = { | ||
"anomaly_output_type": "anomaly_scores", | ||
"learning_type:semi_supervised": True, | ||
"capability:univariate": True, | ||
"capability:multivariate": True, | ||
"capability:missing_values": False, | ||
"capability:multithreading": True, | ||
"fit_is_empty": False, | ||
} | ||
|
||
def __init__( | ||
self, | ||
n_estimators=10, | ||
n_kernels=100, | ||
normalise=False, | ||
n_neighbors=5, | ||
metric="euclidean", | ||
power_transform=True, | ||
n_jobs=1, | ||
random_state=None, | ||
MatthewMiddlehurst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
): | ||
|
||
self.n_estimators = n_estimators | ||
self.n_kernels = n_kernels | ||
self.normalise = normalise | ||
self.n_neighbors = n_neighbors | ||
self.n_jobs = n_jobs | ||
self.metric = metric | ||
self.power_transform = power_transform | ||
self.random_state = random_state | ||
|
||
self.rocket_transformer_: Optional[Rocket] = None | ||
self.list_baggers_: Optional[list[NearestNeighbors]] = None | ||
self.power_transformer_: Optional[PowerTransformer] = None | ||
|
||
super().__init__() | ||
|
||
def _fit(self, X: np.ndarray, y: Optional[np.ndarray] = None): | ||
self._n_jobs = check_n_jobs(self.n_jobs) | ||
MatthewMiddlehurst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
rng = check_random_state(self.random_state) | ||
|
||
self.rocket_transformer_ = Rocket( | ||
n_kernels=self.n_kernels, | ||
normalise=self.normalise, | ||
n_jobs=self._n_jobs, | ||
random_state=self.random_state, | ||
) | ||
# XT: (n_cases, n_kernels*2) | ||
Xt = self.rocket_transformer_.fit_transform(X) | ||
Xt = Xt.astype(np.float64) | ||
|
||
if self.power_transform: | ||
self.power_transformer_ = PowerTransformer() | ||
try: | ||
Xtp = self.power_transformer_.fit_transform(Xt) | ||
|
||
except Exception: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why does power_transform fail? Perhaps also print out the thrown exception in the warning? |
||
warnings.warn( | ||
"Power Transform failed and thus has been disabled. ", | ||
UserWarning, | ||
stacklevel=2, | ||
) | ||
self.power_transformer_ = None | ||
Xtp = Xt | ||
else: | ||
Xtp = Xt | ||
|
||
self.list_baggers_ = [] | ||
|
||
for _ in range(self.n_estimators): | ||
# Initialize estimator | ||
estimator = NearestNeighbors( | ||
n_neighbors=self.n_neighbors, | ||
n_jobs=self._n_jobs, | ||
metric=self.metric, | ||
algorithm="kd_tree", | ||
) | ||
# Bootstrap Aggregation | ||
Xtp_scaled_sample = resample( | ||
Xtp, | ||
replace=True, | ||
n_samples=None, | ||
random_state=rng.randint(np.iinfo(np.int32).max), | ||
stratify=None, | ||
MatthewMiddlehurst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
|
||
# Fit estimator and append to estimator list | ||
estimator.fit(Xtp_scaled_sample) | ||
self.list_baggers_.append(estimator) | ||
|
||
def _predict(self, X) -> np.ndarray: | ||
MatthewMiddlehurst marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
Return the anomaly scores for the input data. | ||
|
||
Parameters | ||
---------- | ||
X (array-like): The input data. | ||
|
||
Returns | ||
------- | ||
np.ndarray: The predicted probabilities. | ||
""" | ||
y_scores = np.zeros((len(X), self.n_estimators)) | ||
# Transform into rocket feature space | ||
# XT: (n_cases, n_kernels*2) | ||
Xt = self.rocket_transformer_.transform(X) | ||
|
||
Xt = Xt.astype(np.float64) | ||
|
||
if self.power_transformer_ is not None: | ||
# Power Transform using yeo-johnson | ||
Xtp = self.power_transformer_.transform(Xt) | ||
|
||
else: | ||
Xtp = Xt | ||
|
||
for idx, bagger in enumerate(self.list_baggers_): | ||
# Get scores from each estimator | ||
distances, _ = bagger.kneighbors(Xtp) | ||
|
||
# Compute mean distance of nearest points in window | ||
scores = distances.mean(axis=1).reshape(-1, 1) | ||
scores = scores.squeeze() | ||
|
||
y_scores[:, idx] = scores | ||
|
||
# Average the scores to get the final score for each whole-series | ||
collection_anomaly_scores = y_scores.mean(axis=1) | ||
|
||
return collection_anomaly_scores |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
"""Tests for whole-series anomaly detection.""" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
"""Tests for the ROCKAD anomaly detector.""" | ||
|
||
import numpy as np | ||
import pytest | ||
from sklearn.utils import check_random_state | ||
|
||
from aeon.anomaly_detection.collection import ROCKAD | ||
|
||
|
||
def test_rockad_univariate(): | ||
"""Test ROCKAD univariate output.""" | ||
rng = check_random_state(seed=2) | ||
train_series = rng.normal(loc=0.0, scale=1.0, size=(10, 100)) | ||
test_series = rng.normal(loc=0.0, scale=1.0, size=(5, 100)) | ||
|
||
test_series[0][50:58] -= 5 | ||
|
||
ad = ROCKAD(n_estimators=100, n_kernels=10, n_neighbors=9) | ||
|
||
ad.fit(train_series) | ||
pred = ad.predict(test_series) | ||
|
||
assert pred.shape == (5,) | ||
assert pred.dtype == np.float64 | ||
assert 0 <= np.argmax(pred) <= 1 | ||
|
||
|
||
def test_rockad_multivariate(): | ||
"""Test ROCKAD multivariate output.""" | ||
rng = check_random_state(seed=2) | ||
train_series = rng.normal(loc=0.0, scale=1.0, size=(10, 3, 100)) | ||
test_series = rng.normal(loc=0.0, scale=1.0, size=(5, 3, 100)) | ||
|
||
test_series[0][0][50:58] -= 5 | ||
|
||
ad = ROCKAD(n_estimators=1000, n_kernels=100, n_neighbors=9) | ||
|
||
ad.fit(train_series) | ||
pred = ad.predict(test_series) | ||
|
||
assert pred.shape == (5,) | ||
assert pred.dtype == np.float64 | ||
assert 0 <= np.argmax(pred) <= 1 | ||
|
||
|
||
def test_rockad_incorrect_input(): | ||
"""Test ROCKAD with invalid inputs.""" | ||
rng = check_random_state(seed=2) | ||
series = rng.normal(size=(10, 5)) | ||
|
||
with pytest.warns( | ||
UserWarning, match=r"Power Transform failed and thus has been disabled." | ||
): | ||
ad = ROCKAD() | ||
ad.fit(series) | ||
Comment on lines
+46
to
+55
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems to be failing (or inconsistently passing)? |
||
|
||
train_series = rng.normal(loc=0.0, scale=1.0, size=(10, 100)) | ||
test_series = rng.normal(loc=0.0, scale=1.0, size=(3, 100)) | ||
|
||
with pytest.raises( | ||
ValueError, | ||
match=( | ||
r"Expected n_neighbors <= n_samples_fit, but n_neighbors = 100, " | ||
r"n_samples_fit = 10, n_samples = 3" | ||
), | ||
): | ||
ad = ROCKAD(n_estimators=100, n_kernels=10, n_neighbors=100) | ||
ad.fit(train_series) | ||
ad.predict(test_series) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the internal type hints, may as well have type hints on the constructor argument?