I'm Plamen, a passionate software engineer and open-source contributor since 2018.
My journey in open source began at DAI-Lab, where I tackled a variety of complex challenges that deepened my skills in software design, machine learning, and distributed systems. Since then, I've been actively contributing to open-source projects that aim to push the boundaries of data science, automation, machine learning and synthetic data generation.
I'm currently part of the awesome team at DataCebo, the creators of SDV, the largest ecosystem for synthetic data generation and evaluation. My work spans feature development, code refactoring, and maintaining several open-source tools, including:
- SDV: The Synthetic Data Vault, a powerful synthetic data generation tool that maintains the same format and statistical properties as the real data.
- RDT: Reversible Data Transforms, a Python library for transforming raw data into fully numerical data.
- CTGAN: A collection of deep learning-based synthetic data generators for single table data.
- Copulas: A Python library for modeling multivariate distributions and sampling from them using copula functions.
- DeepEcho: A synthetic data generation Python library for mixed-type, multivariate time series.
- SDMetrics: A library that evaluates synthetic data by comparing it to the real data you're trying to mimic.
- SDGym: Synthetic Data Gym, a framework for benchmarking the performance of synthetic data generators based on SDV and SDMetrics.
Here are some of the other open-source projects I've contributed to over the years:
- SteganoGAN: A tool for creating steganographic images using adversarial training.
- MLPrimitives: Pipelines and primitives for machine learning and data science.
- MLBlocks: A simple framework for composing end-to-end tunable machine learning pipelines.
- BTB: Bayesian Tuning and Bandits, a tool for hyperparameter tuning and model selection.
- AutoBazaar: An AutoML system combining BTB, MLPrimitives, and MLBlocks.
- mit-d3m-ta2: MIT-Featuretools TA2 submission for the D3M program.
- ATM: Auto Tune Models, an AutoML system designed with ease of use in mind.
- Orion: A machine learning library for unsupervised time series anomaly detection.
- SigPro: An end-to-end solution for efficiently applying multiple signal processing techniques to raw time series data.
- Draco: A collection of end-to-end solutions for machine learning problems commonly found in monitoring wind energy production systems.
If you're working on something interesting, especially around synthetic data, AutoML, or time series, I'd love to hear from you!
- 💼 LinkedIn
- 🐦 X (Twitter)