It is challenging to learn machine learning. For me, great examples of common workflows are critical. So I built over 20 well-documented demonstration workflows that apply machine learning to common data science tasks, to support my students in my Data Analytics and Geostatistics, Spatial Data Analytics and Machine Learning courses and anyone else learning data analytics and machine learning.
Michael Pyrcz, Professor, The University of Texas at Austin, Data Analytics, Geostatistics and Machine Learning
Pyrcz, M.J., 2024, MachineLearningDemos: Python Machine Learning Demonstration Workflows Repository (0.0.1). Zenodo. TBD
Here are some highlights from recent updates:
I spent quite a bit of time checking, updating and improving all of the workflows for this first release.
- improved documentation with concepts and theory from my courses to motivate the workflows
- improved code comments
- improved data and model visualization
I'm quite happy with the current state. I feel that this set of well-documented workflows for machine learning in Python now lives up to its goal: to launch anyone into building machine learning workflows! I'm stoked to help out, Michael
A minimum environment includes:
- Python 3.7.10 - due to the dependency of GeostatsPy on the Numba package for code acceleration
- GeostatsPy - I am continuously testing these workflows with the most current version of GeostatsPy (Pyrcz et al., 2021)
- MatPlotLib - plotting
- NumPy - gridded data and array math
- Pandas - tabulated data
- SciPy - statistics module
- scikit-learn - most of the machine learning models
The required datasets are available in the GeoDataSets repository and linked in the workflows.
More than 20 well-documented demonstration workflows for common machine learning tasks in Python.
- utilizing synthetic data from my GeoDataSets repository
- small and often 2D examples for fast run times and ease of interpretation
- often used and cited in my courses for repeatable educational content
Common machine learning workflows that are included:
- multivariate analysis
- feature selection
- feature transformations
- cluster analysis
- principal component analysis
- linear regression
- ridge regression
- LASSO regression
- Bayesian linear regression
- naive Bayes classification
- polynomial regression
- k-nearest neighbours
- decision trees
- bagging trees and random forest
- gradient boosting
- support vector machines
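To give a flavour of what one of the listed methods does, here is a minimal, dependency-free sketch of k-nearest neighbours regression. It is not taken from the repository's workflows, which rely mostly on scikit-learn for the models. The data values and the names `euclidean` and `knn_predict` are made up for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the response at `query` as the mean response of its k nearest training samples."""
    # Rank the training samples by distance to the query, nearest first
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)

# Hypothetical 2D predictor features and a response, made up for illustration
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0, 12.0]

# Average of the three samples closest to the origin
print(knn_predict(train_X, train_y, (0.2, 0.2), k=3))
# Average of the two samples closest to (5.5, 5.0)
print(knn_predict(train_X, train_y, (5.5, 5.0), k=2))
```

The single hyperparameter `k` controls model complexity: a small `k` follows the training data closely, while a large `k` smooths the prediction toward the global mean.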
First, if you haven't installed GeostatsPy, it is available from the GeostatsPy GitHub repository and on the Python Package Index (PyPI).
To install GeostatsPy, use pip:
pip install geostatspy
The functions rely on the following packages:
- numpy - for ndarrays
- pandas - for DataFrames
- numpy.linalg - for linear algebra
- numba - for numerical speed up
- scipy - for fast nearest neighbor search
- matplotlib.pyplot - for plotting
- tqdm - for progress bar
- statsmodels - for weighted (debiased) statistics
These packages should be available with any modern Python distribution (e.g. https://www.anaconda.com/download/).
If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs.
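Before running a workflow, a quick check like the following can report which of the packages listed above are not installed. This is only a sketch: the `REQUIRED` list and the helper name `find_missing` are assumptions made for illustration, using the top-level import names of the listed packages.

```python
import importlib.util

# Top-level import names of the packages listed above (assumed for this sketch)
REQUIRED = ["numpy", "pandas", "numba", "scipy", "matplotlib", "tqdm", "statsmodels"]

def find_missing(packages):
    """Return the package names that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Try: python -m pip install " + " ".join(missing))
else:
    print("All listed packages were found.")
```

The check only confirms that each package can be located. It does not import the packages or verify their versions.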
Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions
With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development.
For more about Michael check out these links:
I hope this content is helpful to those who want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.
- Want to invite me to visit your company for training, mentoring, project review, workflow design and/or consulting? I'd be happy to drop by and work with you!
- Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!
- I can be reached at [email protected].
I'm always happy to discuss,
Michael
Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin
More functionality will be added soon.
