[BLOG] #428
Open
Labels
ack/guide: I have read through and am familiar with the contributing guide
ack/legal: I have read and understand the legal considerations for blog posting
ack/readme: I have configured my local development environment for building the website locally
blog/deep-dive: I want to write an in-depth guide blog
blog/features: I want to write about an upcoming feature of MLflow
Summary
The MLflow 3.8 release integrates a comprehensive set of third-party LLM evaluators from DeepEval and Ragas, along with additional predefined scorers. This gives MLflow one of the broadest sets of built-in evaluation metrics available, with 66 scorers in total.
Acknowledgements
[x] ack/guide: I have read through the contributing guide
[x] ack/readme: I have configured my local development environment so that I can build a local instance of the MLflow website by following the development guide
[x] ack/legal: I have verified that there are no legal considerations associated with the nature of the blog post, its content, or references to organizations, ideas, or individuals contained within my post. If I mention a particular organization, idea, or person, I will provide evidence of consent to post by any organization or individual that is mentioned prior to filing my PR.
Proposed Title
Comprehensive Agent Evaluation with DeepEval and Ragas Scorers in MLflow
Subtitle: With third-party integrations, MLflow's evaluation suite grows to more than sixty scorers
Abstract
The MLflow 3.8 release integrates a comprehensive set of third-party LLM evaluators from DeepEval and Ragas, along with additional predefined scorers. This gives MLflow one of the broadest sets of built-in evaluation metrics available, with 66 scorers in total.
This post walks through an end-to-end example of using these newly integrated third-party scorers through MLflow's unified evaluation interface. It also explains when and why to choose the DeepEval and Ragas scorers and their respective metrics. The goal is to make this extended suite of scorers easy to apply when assessing and measuring the quality of agent workflows.
Blog Type
blog/how-to: A how-to guide to using core MLflow functionality, focused on a common use case user journey
blog/deep-dive: An in-depth guide that covers a specific feature in MLflow
blog/use-case: A comprehensive overview of a real-world project that leverages MLflow
blog/best-practices: A comprehensive tutorial that covers usage patterns of MLflow, focusing on an MLOps journey
blog/tips: A short blog covering tips and tricks for using MLflow APIs or the MLflow UI components
blog/features: A feature-focused announcement that introduces a significant new feature that is recently or not-yet released
blog/meetup: A report on an MLflow community event or other Linux Foundation MLflow Ambassador Program event
blog/news: Summaries of significant mentions of MLflow or major initiatives for the MLflow project
Topics Covered in Blog
topic/genai: Highlights MLflow's use in training, tuning, or deploying GenAI applications
topic/tracking: Covering the use of Model Tracking APIs and integrated Model Flavors
topic/deployment: Featuring topics related to the deployment of MLflow models and the MLflow Model Registry
topic/training: Concerned with the development loop of training and tuning models using MLflow for tracking
topic/mlflow-service: Topics related to the deployment of the MLflow Tracking Service or the MLflow Deployments Server
topic/core: Topics covering core MLflow APIs and related features
topic/advanced: Featuring guides on Custom Model Development or usage of the plugin architecture of MLflow
topic/ui: Covering features of the MLflow UI
topic/other: Using third-party DeepEval and Ragas scorers to evaluate agentic workflows, such as multi-turn conversational bots
Thank you for your proposal! An MLflow Maintainer will reach out to you with next steps!