[BLOG] Hyperparameter and Data Centric Model Optimization with MLflow

## Summary

This template is intended to capture a few base requirements that must be met before filing a PR that contains a new blog post submission.

Please fill out this form in its entirety so that an MLflow maintainer can work with you as you draft your blog content and can review your blog submission PR.

PRs that are filed without a linked Blog Post Submission issue and a subsequent agreement on the content and topics covered for the blog post are not guaranteed to be reviewed or merged.

## Acknowledgements

- [x] `ack/guide` I have read through the [contributing guide](https://github.com/mlflow/mlflow-website/blob/main/CONTRIBUTING.md)

- **Not done yet, but I will do this once the proposal is accepted:**
- [ ] `ack/readme` I have configured my local development environment so that I can build a local instance of the MLflow website by following the [development guide](https://github.com/mlflow/mlflow-website/blob/main/DEVELOPMENT_GUIDE.md)

- [x] `ack/legal` I have verified that there are no legal considerations associated with the nature of the blog post, its content, or references to organizations, ideas, or individuals contained within my post. If I mention a particular organization, idea, or person, I will provide evidence of consent to post by any organization or individual that is mentioned prior to filing my PR. 

## Proposed Title

*Hyperparameter and Data Centric Model Optimization with MLflow*

## Abstract

You are working on a new machine learning project, and you are probably not sure whether your data and model parameters will work well enough to get your model into production.

So you start experimenting, changing things here and there, but you are too lazy to structure your project in a way that tells you which model hyperparameters and data splits you used just five minutes ago.

This is where MLflow comes in! It helps you win back the time lost to dead-end experiments and lends a hand with the tedious, time-consuming task of finding the best model.

- Note: You can find a similar paradigm in the MLflow [tutorials](https://mlflow.org/docs/latest/ml/traditional-ml/tutorials/hyperparameter-tuning/notebooks/hyperparameter-tuning-with-child-runs/); this guide builds on that notion of tracking ML grid experiments.

## Types of grid experiments

In a typical ML project, there are two main approaches to training your models.

- Model centric approach: Keep the data fixed and work on model parameters
- Data centric approach: Keep the model fixed and work on the data

Discuss the two approaches in more detail and explain why MLflow, through nested runs, fits both of them well.

- Possibly include a GIF of nested runs and the experiment overview in MLflow, similar to the one in the tutorials

## Coding examples

The coding examples will be written as production-grade, object-oriented code.

1. **Model centric approach**

In this approach, we keep the data the same and improve the model architecture and hyperparameters.

An example regression task with a DNN, using MLflow, TensorFlow, and Optuna.

Each trial is logged as a nested child run; the champion model is the child run that scores best on the test set, and it is logged in the parent run.

2. **Data centric approach**

Once we have found the best model parameters, we can test our model's stability across different data splits using k-fold cross-validation (CV).

Each fold is logged as a nested child run; the champion model is the child run that scores best on its test split, and it is logged in the parent run.

Ingredients:

- `mlflow.set_experiment`, `mlflow.log_input`, `mlflow.data`, `mlflow.log_params`, `mlflow.log_metric`, `mlflow.set_tag`, and a flavor-specific `log_model` (e.g. `mlflow.tensorflow.log_model`)

## Final Thoughts and things to consider:

- Nested runs keep each trial's performance grouped under its parent, which makes comparisons efficient
- Consistent experiment naming to make runs easier to discover
- Tracking large models

## Resources

MLflow tutorials; references on model-centric and data-centric approaches

## Blog Type

- [ ] `blog/how-to`: A how-to guide to using core MLflow functionality, focused on a common use case user journey
- [x] `blog/deep-dive`: An in-depth guide that covers a specific feature in MLflow
- [ ] `blog/use-case`: A comprehensive overview of a real-world project that leverages MLflow
- [x] `blog/best-practices`: A comprehensive tutorial that covers usage patterns of MLflow, focusing on an MLOps journey
- [ ] `blog/tips`: A short blog covering tips and tricks for using MLflow APIs or the MLflow UI components
- [ ] `blog/features`: A feature-focused announcement that introduces a significant new feature that is recently or not-yet released
- [ ] `blog/meetup`: A report on an MLflow community event or other Linux Foundation MLflow Ambassador Program event
- [ ] `blog/news`: Summaries of significant mentions of MLflow or major initiatives for the MLflow project

## Topics Covered in Blog

- [ ] `topic/genai`: Highlights MLflow's use in training, tuning, or deploying GenAI applications
- [x] `topic/tracking`: Covering the use of Model Tracking APIs and integrated Model Flavors
- [ ] `topic/deployment`: Featuring topics related to the deployment of MLflow models and the MLflow Model Registry
- [x] `topic/training`: Concerned with the development loop of training and tuning models using MLflow for tracking
- [ ] `topic/mlflow-service`: Topics related to the deployment of the MLflow Tracking Service or the MLflow Deployments Server
- [x] `topic/core`: Topics covering core MLflow APIs and related features
- [ ] `topic/advanced`: Featuring guides on Custom Model Development or usage of the plugin architecture of MLflow
- [ ] `topic/ui`: Covering features of the MLflow UI
- [ ] `topic/other`: < please fill in >

Thank you for your proposal! An MLflow Maintainer will reach out to you with next steps!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BLOG] Hyperparameter and Data Centric Model Optimization with MLflow #337

Summary

Acknowledgements

Proposed Title

Abstract

Types of grid experiments

Coding examples

Final Thoughts and things to consider:

Resources

Blog Type

Topics Covered in Blog

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[BLOG] Hyperparameter and Data Centric Model Optimization with MLflow #337

Description

Summary

Acknowledgements

Proposed Title

Abstract

Types of grid experiments

Coding examples

Final Thoughts and things to consider:

Resources

Blog Type

Topics Covered in Blog

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions