[BLOG] #428

## Summary

- The MLflow 3.8.x release expands its built-in evaluation scorers, adding 47 third-party integration scorers and their associated metrics, for 66 scorers in total.
- The integration introduces a shared interface for calling judges from third-party frameworks, beginning with RAGAS and DeepEval.
- However simple or complex your agentic workflow is, you can select the appropriate MLflow scorer for your specific evaluation requirements.

In particular, this release integrates a comprehensive set of third-party LLM evaluators from DeepEval and RAGAS, along with additional predefined scorers, giving MLflow an unusually broad suite of evaluation metrics with 66 scorers in total.


## Acknowledgements

- [x] `ack/guide` I have read through the [contributing guide](https://github.com/mlflow/mlflow-website/blob/main/CONTRIBUTING.md)

- [x] `ack/readme` I have configured my local development environment so that I can build a local instance of the MLflow website by following the [development guide](https://github.com/mlflow/mlflow-website/blob/main/DEVELOPMENT_GUIDE.md)

- [x] `ack/legal` I have verified that there are no legal considerations associated with the nature of the blog post, its content, or references to organizations, ideas, or individuals contained within my post. If I mention a particular organization, idea, or person, I will provide evidence of consent to post by any organization or individual that is mentioned prior to filing my PR.

## Proposed Title

Comprehensive Agent Evaluation with DeepEval and Ragas Scorers in MLflow
subtitle: With third-party integrations, MLflow’s total evaluation metrics suite extends to over sixty scorers


## Abstract

The MLflow 3.8 release integrates a comprehensive set of third-party LLM evaluators from DeepEval and RAGAS, along with additional predefined scorers, giving MLflow an unusually broad suite of evaluation metrics with 66 scorers in total.

This post shows how to use the newly integrated third-party scorers through MLflow's unified interface, working through a detailed example. It also explores when and why you would choose the new DeepEval and RAGAS scorers and their respective metrics. The goal is to make it easier to use this extended suite of scorers to assess and measure the quality of agent workflows.


## Blog Type

- [ ] `blog/how-to`: A how-to guide to using core MLflow functionality, focused on a common use case user journey
- [x] `blog/deep-dive`: An in-depth guide that covers a specific feature in MLflow
- [ ] `blog/use-case`: A comprehensive overview of a real-world project that leverages MLflow
- [ ] `blog/best-practices`: A comprehensive tutorial that covers usage patterns of MLflow, focusing on an MLOps journey
- [ ] `blog/tips`: A short blog covering tips and tricks for using MLflow APIs or the MLflow UI components
- [x] `blog/features`: A feature-focused announcement that introduces a significant new feature that is recently or not-yet released
- [ ] `blog/meetup`: A report on an MLflow community event or other Linux Foundation MLflow Ambassador Program event
- [ ] `blog/news`: Summaries of significant mentions of MLflow or major initiatives for the MLflow project

## Topics Covered in Blog

- [ ] `topic/genai`: Highlights MLflow's use in training, tuning, or deploying GenAI applications
- [ ] `topic/tracking`: Covering the use of Model Tracking APIs and integrated Model Flavors
- [ ] `topic/deployment`: Featuring topics related to the deployment of MLflow models and the MLflow Model Registry
- [ ] `topic/training`: Concerned with the development loop of training and tuning models using MLflow for tracking
- [ ] `topic/mlflow-service`: Topics related to the deployment of the MLflow Tracking Service or the MLflow Deployments Server
- [ ] `topic/core`: Topics covering core MLflow APIs and related features
- [ ] `topic/advanced`: Featuring guides on Custom Model Development or usage of the plugin architecture of MLflow
- [ ] `topic/ui`: Covering features of the MLflow UI
- [ ] `topic/other`: Using third-party DeepEval and RAGAS scorers to evaluate agentic workflows, such as multi-turn conversational bots

Thank you for your proposal! An MLflow Maintainer will reach out to you with next steps!

