ireydiak/pyad

PYAD

Simple Python unsupervised and semi-supervised Anomaly Detection framework that implements deep and shallow machine learning models published in various papers.

Models implemented

| Model | Paper |
| --- | --- |
| ALAD | Adversarially Learned Anomaly Detection |
| DAGMM | [Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection](https://bzong.github.io/doc/iclr18-dagmm.pdf) |
| DeepSVDD | Deep One-Class Classification |
| DROCC | DROCC: Deep Robust One-Class Classification |
| DSEBM | Deep Structured Energy Based Models for Anomaly Detection |
| GOAD | Classification-Based Anomaly Detection for General Data |
| SOM-DAGMM | Self-Organizing Map assisted Deep Autoencoding Gaussian Mixture Model for Intrusion Detection |
| MemAE | Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection |
| NeuTraLAD | Neural Transformation Learning for Deep Anomaly Detection Beyond Images |
| OC-SVM | Support Vector Method for Novelty Detection |
| LOF | LOF: identifying density-based local outliers |

Installation

Using conda:

conda env create -f environment.yaml

Using pip:

pip install -r requirements.txt

Reproduce experiments

The experiments are defined in YAML configuration files under config. Place your data in the data folder and clean it with the preprocessing scripts, or change the variables in _data.yaml to point to a different path. Each experiment's configuration is split across three files: one for the data, one for the trainer, and one for the model.

python main.py --config=./config/<data_fname> --config=./config/<trainer_fname> --config=./config/<model_fname>

where <data_fname>, <trainer_fname>, and <model_fname> refer respectively to the data, trainer, and model configuration files.
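To illustrate how three separate files can describe one experiment, here is a minimal sketch that merges three configuration dictionaries key by key. It assumes the files are combined in the order given, with later files overriding earlier ones on conflicts; the README does not state how main.py actually combines them. The `data` and `model` keys and all values other than the trainer `class_path` are hypothetical.

```python
from copy import deepcopy


def merge_configs(*configs):
    """Recursively merge dicts; later configs win when keys collide."""
    merged = {}
    for cfg in configs:
        _merge_into(merged, cfg)
    return merged


def _merge_into(target, source):
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are mappings: merge them instead of replacing.
            _merge_into(target[key], value)
        else:
            target[key] = deepcopy(value)


if __name__ == "__main__":
    # Hypothetical stand-ins for the parsed data, trainer, and model files.
    data_cfg = {"data": {"init_args": {"data_dir": "./data"}}}
    trainer_cfg = {"trainer": {"class_path": "pyad.models.trainer.ModuleTrainer"}}
    model_cfg = {"model": {"init_args": {"latent_dim": 2}}}

    experiment = merge_configs(data_cfg, trainer_cfg, model_cfg)
    print(sorted(experiment))  # top-level sections of the merged experiment
```

Keeping the three concerns in separate files means the same data and trainer files can be reused unchanged while only the model file varies.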

Example training an AutoEncoder on Arrhythmia:

python main.py --config=./config/_data.yaml --config=./config/_trainer.yaml --config=./config/autoencoder.yaml

PowerShell utility script

We also provide a PowerShell utility script that automates training multiple models on the same dataset. To launch it, run the following commands:

cd scripts
conda activate <my-env>
./train.ps1 <absolute-path-to-config> <data-config-fname> <trainer-config-fname>

where <my-env>, <absolute-path-to-config>, <data-config-fname>, and <trainer-config-fname> refer respectively to the name of your conda environment, the absolute path to the configuration folder, the name of the data configuration file, and the name of the trainer configuration file.

Example on Arrhythmia:

cd scripts
conda activate pyad
./train.ps1 C:\Users\me\path\to\pyad\config\arrhythmia _data.yaml _trainer.yaml
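On systems without PowerShell, the same idea can be sketched in Python. The example below is not a port of train.ps1, whose internals are not shown here; it only assumes that training several models amounts to calling main.py once per model configuration file while reusing the same data and trainer files. The function `build_commands`, its `model_fnames` parameter, and the file name `dagmm.yaml` are hypothetical.

```python
import shlex
import sys
from pathlib import Path


def build_commands(config_dir, data_fname, trainer_fname, model_fnames):
    """Return one main.py command (as an argument list) per model config."""
    config_dir = Path(config_dir)
    # The data and trainer files are shared by every run.
    shared = [
        f"--config={config_dir / data_fname}",
        f"--config={config_dir / trainer_fname}",
    ]
    return [
        [sys.executable, "main.py", *shared, f"--config={config_dir / name}"]
        for name in model_fnames
    ]


if __name__ == "__main__":
    # "dagmm.yaml" is a hypothetical file name used only for illustration.
    commands = build_commands(
        "./config", "_data.yaml", "_trainer.yaml",
        ["autoencoder.yaml", "dagmm.yaml"],
    )
    for cmd in commands:
        # Print instead of launching; pass cmd to
        # subprocess.run(cmd, check=True) to actually start training.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before starting long training runs.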

Neptune logger

Optionally, you can log your experiments using Neptune. To do so, add a logger key to the init_args of the trainer and define NEPTUNE_API_TOKEN and NEPTUNE_PROJECT as environment variables.

# _trainer.yaml
trainer:
  class_path: pyad.models.trainer.ModuleTrainer
  init_args:
    ...params
    logger:
      class_path: pyad.loggers.NeptuneLogger
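Because the logger depends on two environment variables, a quick pre-flight check can catch a missing one before training starts. The following is a minimal sketch; the helper `missing_neptune_vars` is hypothetical and not part of pyad.

```python
import os

# Variable names taken from the README.
REQUIRED_VARS = ("NEPTUNE_API_TOKEN", "NEPTUNE_PROJECT")


def missing_neptune_vars(environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_neptune_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Neptune environment variables are set.")
```

Running this before main.py gives a clear message instead of a failure partway through logger setup.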

About

Implementations of various deep and shallow anomaly detection algorithms for tabular data
