
Conversation


@williamcaban williamcaban commented Nov 16, 2025

MLflow Prompt Registry Provider

Summary

This PR adds a new remote MLflow provider for the Prompts API, enabling centralized prompt management and versioning using MLflow's Prompt Registry (MLflow 3.4+).

What's New

Remote Provider: remote::mlflow

A production-ready provider that integrates Llama Stack's Prompts API with MLflow's centralized prompt registry, supporting:

  • Version Control: Immutable prompt versioning with full history
  • Default Version Management: Easy version switching via aliases
  • Auto Variable Extraction: Automatic detection of {{ variable }} placeholders (see the sketch after this list)
  • Centralized Storage: Team collaboration via shared MLflow server
  • Metadata Preservation: Llama Stack metadata stored as MLflow tags
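
To illustrate the auto variable extraction bullet, here is a minimal sketch of how {{ variable }} placeholder detection might work; the provider's actual extraction logic may differ (for example, around dotted or filtered Jinja2 expressions):

import re

# Matches Jinja2-style {{ variable }} placeholders, tolerating surrounding whitespace.
PLACEHOLDER_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def extract_variables(prompt: str) -> list[str]:
    """Return placeholder names in first-seen order, without duplicates."""
    return list(dict.fromkeys(PLACEHOLDER_PATTERN.findall(prompt)))

# Prints ['num_sentences', 'text'] for the template used in the Quick Start below.
print(extract_variables(
    "Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}"
))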

Quick Start

1. Configure Llama Stack

Basic configuration with SQLite (default):

prompts:
  - provider_id: reference-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: sqlite
              db_path: ./prompts.db

With PostgreSQL:

prompts:
  - provider_id: postgres-prompts
    provider_type: inline::reference
    config:
      run_config:
        storage:
          stores:
            prompts:
              type: postgres
              url: postgresql://user:pass@localhost/llama_stack
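
The MLflow backend added by this PR uses provider_type remote::mlflow instead. A rough sketch of what that configuration might look like is shown below; the provider_id and tracking_uri names are illustrative assumptions, auth_credential is the field referenced in the provider code later in this PR, and remote_mlflow.mdx has the authoritative schema:

prompts:
  - provider_id: mlflow-prompts        # illustrative name
    provider_type: remote::mlflow
    config:
      tracking_uri: http://localhost:5000            # assumed field name for the MLflow server URI
      auth_credential: ${env.MLFLOW_TRACKING_TOKEN}  # token can also come from the environment at runtime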

2. Use the Prompts API

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Create a prompt
prompt = client.prompts.create(
    prompt="Summarize the following text in {{ num_sentences }} sentences:\n\n{{ text }}",
    variables=["num_sentences", "text"]
)
print(f"Created prompt: {prompt.prompt_id} (v{prompt.version})")

# Retrieve prompt
retrieved = client.prompts.get(prompt_id=prompt.prompt_id)
print(f"Retrieved: {retrieved.prompt}")

# Update prompt (creates version 2)
updated = client.prompts.update(
    prompt_id=prompt.prompt_id,
    prompt="Summarize in exactly {{ num_sentences }} sentences:\n\n{{ text }}",
    version=1,
    set_as_default=True
)
print(f"Updated to version: {updated.version}")

# List all prompts
prompts = client.prompts.list()
print(f"Found {len(prompts.data)} prompts")

# Delete prompt
client.prompts.delete(prompt_id=prompt.prompt_id)

Collaborator

@mattf mattf left a comment


@williamcaban this is the right direction

a few things jump out for me:

  • the existing impl needs to be moved to be an inline::reference impl (let the llm know it's in src/llama_stack/core/prompts/prompts.py)
  • mlflow credential handling needs to be added, should follow the pattern in inference (provider-data backstopped by config)
  • tests don't need to be standalone

@williamcaban williamcaban force-pushed the feat/mlflow-prompt-registry branch from dd0758c to cb69e41 on November 23, 2025 17:17
@williamcaban williamcaban requested a review from cdoern as a code owner November 23, 2025 17:17
@williamcaban williamcaban marked this pull request as draft November 23, 2025 17:27
@williamcaban williamcaban force-pushed the feat/mlflow-prompt-registry branch from cb69e41 to 1e68fa8 on November 24, 2025 02:29
@williamcaban williamcaban marked this pull request as ready for review November 24, 2025 02:33
@williamcaban williamcaban changed the title from "WIP feat: Add MLflow Prompt Registry provider" to "feat: Add MLflow Prompt Registry provider" on Nov 24, 2025
@williamcaban
Author

PR is now ready for review and includes the following updates:

  • Moved the previous prompts.py to an inline provider (see inline_reference.mdx for details)
  • Defined a remote MLflow provider with authentication support (see remote_mlflow.mdx for details)
  • Removed any dependencies on prompt caching

"""Create ID mapper instance."""
return PromptIDMapper(use_metadata=True)

def test_to_mlflow_name_valid_id(self, mapper):
Collaborator


some of these tests could be shortened. i understand Claude often generates them easily but sometimes they're a little excessive.

Collaborator

@franciscojavierarceo franciscojavierarceo left a comment


some small nits but this is looking very exciting!

@@ -0,0 +1,92 @@
---
Collaborator


thank you for adding the prompts docs!!

we also have this in the UI, maybe we should mention it?

Author


Are you referring to making it available in the UI? If so, could that be a follow-up PR? I would prefer to avoid adding more to this one.

Collaborator

@mattf mattf left a comment


why do we need to maintain a mapping from prompt id to mlflow prompt name?

Author

williamcaban commented Nov 24, 2025

@mattf

why do we need to maintain a mapping from prompt id to mlflow prompt name?

The idea is to distinguish Llama Stack-managed prompts from other prompts that might exist in the same MLflow registry.
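
The commit message describes this as a deterministic, bidirectional mapping (pmpt_<hex> ↔ llama_prompt_<hex>), which suggests a simple prefix translation. A minimal sketch under that assumption follows; the actual PromptIDMapper in this PR may differ in naming and validation details:

import re

# Hedged sketch: pmpt_<hex> identifiers map to llama_prompt_<hex> registry names,
# so Llama Stack-managed prompts are distinguishable from other prompts in MLflow.
PROMPT_ID_PATTERN = re.compile(r"^pmpt_([0-9a-f]+)$")
MLFLOW_NAME_PATTERN = re.compile(r"^llama_prompt_([0-9a-f]+)$")

def to_mlflow_name(prompt_id: str) -> str:
    """Translate a Llama Stack prompt id into an MLflow registry name."""
    match = PROMPT_ID_PATTERN.match(prompt_id)
    if match is None:
        raise ValueError(f"not a Llama Stack prompt id: {prompt_id}")
    return f"llama_prompt_{match.group(1)}"

def to_prompt_id(mlflow_name: str) -> str:
    """Translate an MLflow registry name back into a Llama Stack prompt id."""
    match = MLFLOW_NAME_PATTERN.match(mlflow_name)
    if match is None:
        raise ValueError(f"not a Llama Stack-managed prompt: {mlflow_name}")
    return f"pmpt_{match.group(1)}"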

@williamcaban williamcaban force-pushed the feat/mlflow-prompt-registry branch from 041cd2e to 8f150ec on November 25, 2025 14:25
Collaborator

mattf commented Nov 25, 2025

why do we need to maintain a mapping from prompt id to mlflow prompt name?

The idea is to distinguish Llama Stack-managed prompts from other prompts that might exist in the same MLflow registry.

when would a deployer want to set use_metadata_tags=False? can we always use metadata and skip the id/name translations?

@williamcaban
Author

why do we need to maintain a mapping from prompt id to mlflow prompt name?

The idea is to distinguish Llama Stack-managed prompts from other prompts that might exist in the same MLflow registry.

when would a deployer want to set use_metadata_tags=False? can we always use metadata and skip the id/name translations?

We can remove that option because, in practice, setting it to false has significant downsides. Let me remove the option.

Collaborator

mattf commented Nov 26, 2025

why do we need to maintain a mapping from prompt id to mlflow prompt name?

The idea is to distinguish Llama Stack-managed prompts from other prompts that might exist in the same MLflow registry.

when would a deployer want to set use_metadata_tags=False? can we always use metadata and skip the id/name translations?

We can remove that option because, in practice, setting it to false has significant downsides. Let me remove the option.

thanks. do we still need to have the id mapping?

Collaborator

@mattf mattf left a comment


@williamcaban why create a func for token extraction and a client but then not use them?

by requesting pr review you're asking others to read all this code (over 3k lines). please review it all before requesting.

Comment on lines +120 to +102
if self.config.auth_credential is not None:
import os

# MLflow reads MLFLOW_TRACKING_TOKEN from environment
os.environ["MLFLOW_TRACKING_TOKEN"] = self.config.auth_credential.get_secret_value()
logger.debug("Set MLFLOW_TRACKING_TOKEN from config auth_credential")
Collaborator


because you use env.MLFLOW_TRACKING_TOKEN in the sample_run_config, the auth_credential will already be set from the MLFLOW_TRACKING_TOKEN.

is setting MLFLOW_TRACKING_TOKEN the only way to communicate the token to the client?

Author


It can be either: an environment variable set at runtime or a value in the config file.
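
A minimal sketch of the precedence being discussed, with an explicit config credential taking priority and the ambient environment variable as the fallback (per-request provider-data, mentioned earlier in the review, is omitted; the helper name is hypothetical):

import os

def resolve_mlflow_token(config_credential: str | None) -> str | None:
    """Prefer the credential from the provider config; otherwise fall back to
    the MLFLOW_TRACKING_TOKEN environment variable."""
    if config_credential:
        return config_credential
    return os.environ.get("MLFLOW_TRACKING_TOKEN")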

logger.debug("Set MLFLOW_TRACKING_TOKEN from config auth_credential")

# Initialize client
self.mlflow_client = MlflowClient()
Collaborator


this is unused.

Author


This is used in the edge case of managing prompts created outside Llama Stack. I'll update the code for clarity.

@williamcaban williamcaban force-pushed the feat/mlflow-prompt-registry branch from 79a3454 to 7705651 on November 26, 2025 14:21
Add a new remote provider that integrates MLflow's Prompt Registry with
Llama Stack's prompts API, enabling centralized prompt management and
versioning using MLflow as the backend.

Features:
- Full implementation of Llama Stack Prompts protocol
- Support for prompt versioning and default version management
- Automatic variable extraction from Jinja2-style templates
- MLflow tag-based metadata for efficient prompt filtering
- Flexible authentication (config, environment variables, per-request)
- Bidirectional ID mapping (pmpt_<hex> ↔ llama_prompt_<hex>)
- Comprehensive error handling and validation

Implementation:
- Remote provider: src/llama_stack/providers/remote/prompts/mlflow/
- Inline reference provider: src/llama_stack/providers/inline/prompts/reference/
- MLflow 3.4+ required for Prompt Registry API support
- Deterministic ID mapping ensures consistency across conversions

Testing:
- 15 comprehensive unit tests (config validation, ID mapping)
- 18 end-to-end integration tests (full CRUD workflows)
- GitHub Actions workflow for automated CI testing with MLflow server
- Integration test fixtures with automatic server setup

Documentation:
- Complete provider configuration reference
- Setup and usage examples with code samples
- Authentication options and security best practices

Signed-off-by: William Caban <[email protected]>
Co-Authored-By: Claude <[email protected]>