Project repository for DTU 02476 - MLOps course in January 2023.
Chuansheng Liu, Xindi Wu, Chongchong Li, Mouadh Sadani
The goal of the project is to use a convolutional neural network-based architecture to classify images.
Because the task is image classification, we are going to use the PyTorch Image Models (timm) framework to achieve our project goal.
From the framework, we will import and modify the model we need. The framework also provides many tools for data processing, tuning, and training; we will use whichever are useful to our project.
We are going to use ImageNet 1000 (mini), a smaller subset of the ImageNet dataset that keeps all 1,000 classes and contains about 38.7k images. The ImageNet dataset is widely used for classification challenges and is useful for developing computer vision and deep learning algorithms.
The model we expect to use is ResNeSt. It is a ResNet variant which stacks several Split-Attention blocks (composed of feature-map group and split-attention operations). It is easy to work with, computationally efficient, and universally improves the learned feature representations, boosting image-classification performance.
Configure Environment:
pip install -r requirements.txt
pip install -r requirements_tests.txt
or
make requirements
Download data and models:
dvc pull
or download data from: https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000
Train model:
python src/models/train_model.py
or
make train
Inference:
python src/models/predict_model.py
or
make predict
Run unit tests with coverage:
coverage run --source=./src -m pytest tests/
or
make tests
Create API (requires SigNoz):
make api
├── LICENSE
│
├── Makefile <- Makefile with commands like `make train`.
│
├── README.md <- The top-level README for developers using this project.
│
├── app                <- A FastAPI application for inference.
│
├── conf
│ ├── data <- Configurations for dataset.
│ └── experiment <- Configurations for training.
│
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── model_store <- Applications for local and cloud deployment.
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
│
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── features <- Scripts to turn raw data into features for modeling
│ │ └── build_features.py
│ │
│ ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│ │
│ └── visualization <- Scripts to create exploratory and results oriented visualizations
│ └── visualize.py
│
├── tests              <- Unit test code
│
└── tox.ini <- tox file with settings for running tox; see tox.readthedocs.io
Project based on the cookiecutter data science project template.
Please note that all the lists are exhaustive, meaning that I do not expect you to have completed every point on the checklist for the exam.
- Create a git repository
- Make sure that all team members have write access to the GitHub repository
- Create a dedicated environment for your project to keep track of your packages
- Create the initial file structure using cookiecutter
- Fill out the make_dataset.py file such that it downloads whatever data you need
- Add a model file and a training script and get that running
- Remember to fill out the requirements.txt file with whatever dependencies that you are using
- Remember to comply with good coding practices (pep8) while doing the project
- Do a bit of code typing and remember to document essential parts of your code
- Setup version control for your data or part of your data
- Construct one or multiple docker files for your code
- Build the docker files locally and make sure they work as intended
- Write one or multiple configurations files for your experiments
- Use Hydra to load the configurations and manage your hyperparameters
- When you have something that works somewhat, remember at some point to do some profiling and see if you can optimize your code
- Use Weights & Biases to log training progress and other important metrics/artifacts in your code. Additionally, consider running a hyperparameter optimization sweep.
- Use Pytorch-lightning (if applicable) to reduce the amount of boilerplate in your code
- Write unit tests related to the data part of your code
- Write unit tests related to model construction and or model training
- Calculate the coverage.
- Get some continuous integration running on the GitHub repository
- Create a data storage in GCP Bucket for your data and preferably link this with your data version control setup
- Create a trigger workflow for automatically building your docker images
- Get your model training in GCP using either the Engine or Vertex AI
- Create a FastAPI application that can do inference using your model
- If applicable, consider deploying the model locally using torchserve
- Deploy your model in GCP using either Functions or Run as the backend
- Check how robust your model is towards data drifting
- Setup monitoring for the system telemetry of your deployed model
- Setup monitoring for the performance of your deployed model
- If applicable, play around with distributed data loading
- If applicable, play around with distributed model training
- Play around with quantization, compilation and pruning for your trained models to increase inference speed
- Revisit your initial project description. Did the project turn out as you wanted?
- Make sure all group members have an understanding of all parts of the project
- Upload all your code to GitHub