feat: add new prompt best practices section #894

Open · PeriniM wants to merge 14 commits into main from marco/best-practices-prompt

+137 −1

Commits (14)
- 12558e1 feat: add new prompt best practices section (PeriniM)
- 2c95f9c Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- 3f04124 Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- 95b8d1f Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- 5fec635 Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- 9be8527 Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- fe2ecc9 Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- 9b7c0d9 Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- 2412e87 Update docs/prompt_engineering/tutorials/best_practices.mdx (PeriniM)
- ca62b96 docs(prompts): integrated review (PeriniM)
- 8f4bb33 docs: linting + format (PeriniM)
- 6edbdad docs: removed how to (PeriniM)
- c8da2a3 docs: fix broken links (PeriniM)
- 4398075 docs: added canary deployment part (PeriniM)

docs/prompt_engineering/tutorials/best_practices.mdx

@@ -0,0 +1,134 @@

---
sidebar_label: Best Practices
sidebar_position: 5
---

# Prompt management best practices

Here are our recommended best practices for managing your prompts. This guide will help you establish a robust workflow for developing, testing, and deploying prompts using LangSmith.

## The prompt development lifecycle: iterating in the Playground

We recommend treating prompt development as an iterative, experimental process. The LangSmith Playground is the ideal environment for this initial "development" phase.

Using the [Playground](/prompt_engineering/how_to_guides#playground), you and your team can:

- **Rapidly iterate on prompts:** Modify prompt templates and see how the changes affect the output immediately.
- **Compare different LLMs:** Test the same prompt against various models (e.g., GPT-4o vs. Claude 3 Opus vs. Llama 3) side by side to find the best one for the job. This is crucial, as a prompt's effectiveness can vary significantly between models.
- **Test with diverse inputs:** Run the prompt and model configuration against a range of different inputs to check for edge cases and ensure reliability.
- **Optimize the prompt:** Use the [Prompt Canvas](/prompt_engineering/how_to_guides/prompt_canvas) feature to have an LLM improve your prompt.
  ➡️ **See the blog post:** [LangChain Changelog](https://changelog.langchain.com/announcements/prompt-canvas-for-streamlined-prompt-engineering)
- **Develop and test tool calling:** Configure tools and functions that the LLM can call, and test the full interaction within the Playground.
- **Refine your app:** Run experiments directly against your dataset in the Playground to see changes in real time as you iterate on prompts. Share experiments with teammates to get feedback and collaboratively optimize performance.

Once you are satisfied with a prompt and its configuration in the Playground, you can save it as a new commit to your prompt's history. While the Playground UI is great for experimentation, you can also create and update prompts programmatically for more automated workflows using the LangSmith SDK.

**➡️ See the docs:** [Manage prompts programmatically](/prompt_engineering/how_to_guides/manage_prompts_programatically)

**➡️ See the SDK reference:** [client.create_prompt](https://docs.smith.langchain.com/reference/python/client/langsmith.client.Client#langsmith.client.Client.create_prompt)
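
As a minimal sketch, pushing a new prompt version from code might look like the following. It assumes the Python SDK's `push_prompt` helper and uses a placeholder prompt name and template; adapt both to your project.

```python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate

client = Client()  # reads LANGSMITH_API_KEY from the environment

# A placeholder template for illustration
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant. Answer using the provided context."),
        ("user", "Question: {question}\n\nContext: {context}"),
    ]
)

# Each push creates a new commit in the prompt's history on the Prompt Hub
client.push_prompt("your-prompt-name", object=prompt)
```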

## Manage prompts through different environments

Prompts are not static text; they are a fundamental component of your LLM application's logic, just like source code. A minor change can significantly alter an LLM's response or tool selection, making structured lifecycle management essential. The **LangSmith Prompt Hub** provides a central workspace to manage this complexity. This guide details the complete workflow for using the Hub to test prompts in development, validate them in staging, and deploy them confidently to production.

## Update application prompts based on prompt tags

LangSmith provides a collaborative interface to iterate on prompts and share them with your team. After some initial testing in the Playground, you'll want to see how the prompt behaves within the context of your application. In LangSmith's Prompt Hub, you can apply [_prompt commit tags_](/prompt_engineering/how_to_guides/prompt_tags) to new versions of the prompt, so your application can pick them up without requiring code changes each time.

You can assign a meaningful name (e.g., `dev`, `staging`, `prod`) to a specific version (commit) of your prompt. This allows you to create a dynamic reference to the prompt version you want to use in a particular environment.

For instance, you can have a `dev` tag pointing to the latest, most experimental version of your prompt, a `staging` tag for a more stable version undergoing final testing, and a `prod` tag for the version you trust to be in your live application. As you promote a prompt from development to production, you move the tag from one commit to another within the LangSmith UI.

To implement this workflow, you reference the tag in your application code instead of a static commit hash. This enables you to update the prompt in your application without a new code deployment.

### How to pull a prompt via commit tag in your environments

LangSmith's Prompt Tags feature is designed for exactly this workflow. Instead of hardcoding a specific prompt version in your application, you reference the tag.

For example, your development environment could pull the prompt tagged `dev`, while your production application pulls the one tagged `prod`.

```python
from langsmith import Client

client = Client()

# In your development environment, fetch the latest experimental prompt
prompt_dev = client.pull_prompt("your-prompt-name:dev")

# In your staging environment, fetch the release candidate
prompt_staging = client.pull_prompt("your-prompt-name:staging")

# In production, this code always fetches the stable prompt version currently tagged as "prod"
prompt_prod = client.pull_prompt("your-prompt-name:prod")
```

**➡️ Learn more in the official documentation:** [Prompt Tags](https://docs.smith.langchain.com/prompt_engineering/how_to_guides/prompt_tags)



### Best practice: evaluate prompt changes before promotion to production

A typical pipeline consists of several automated stages that run in sequence. If any stage fails, the pipeline stops and notifies the team.



- **Stage 1: Trigger**
  The pipeline starts automatically when a new prompt version is created.
  - **How:** This can be a `git push` to your main branch or a **webhook** triggered from LangSmith on every new prompt commit.
- **Stage 2: Linting & unit tests**
  This stage performs quick, low-cost checks.
  - **Linting:** A simple script checks for basic syntax. For example, does the prompt template contain all the required input variables (e.g., `{question}`, `{context}`)?
  - **Unit Tests:** These verify the _structure_ of the output, not the quality. You can use a framework like `pytest` to make a few calls to the new prompt and assert things like (see the first sketch after this list):
    - "Does the output always return valid JSON?"
    - "Does it contain the expected keys?"
    - "Is the list length correct?"
- **Stage 3: Quality evaluation**
  The new prompt version is run against your evaluation dataset to ensure it meets quality standards.
  - **How:** This can be done in a few ways:
    - **Programmatic SDK Evals:** For checks against known ground truths, a script can use the LangSmith SDK's `evaluate` [function](https://docs.smith.langchain.com/evaluation#7-run-and-view-results). This executes the new prompt against every example in your dataset, and the results are automatically scored by your chosen programmatic evaluators (e.g., for JSON validity, string matching). A minimal sketch follows this list.
    - **Advanced Qualitative Evals with `openevals`:** For more nuanced quality checks (like helpfulness, style, or adherence to complex instructions), you can leverage the `openevals` library. This library integrates directly with `pytest` and allows you to define sophisticated "LLM-as-a-judge" evaluations. You can create tests that use another LLM to score the quality of your prompt's output. The [LangSmith integration](https://github.com/langchain-ai/openevals?tab=readme-ov-file#langsmith-integration) automatically traces and visualizes all these evaluation runs in LangSmith, which provides a detailed view of the results.
  - **The Check:** The pipeline then compares the new prompt's aggregate evaluation scores (e.g., average correctness, helpfulness score, latency, cost) against the current production prompt's scores.
- **Stage 4: Continuous deployment (promotion)**
  Based on the evaluation results, the prompt is automatically promoted.
  - **Pass/Fail Logic:** The pipeline checks whether the new prompt is "better" than the current one based on criteria such as a higher correctness score, no drop in helpfulness, and staying within the cost budget.
  - **Promotion to `staging`:** If it passes, a script uses the LangSmith SDK to move the `staging` tag to this new commit hash. Your staging application, which pulls the `your-prompt:staging` tag, will automatically start using the new prompt.
  - **Promotion to `prod`:** This is often a **manual step**: a team member can move the `prod` tag in the LangSmith UI after reviewing performance in staging. However, if your evaluation pipeline is trustworthy and consistently reflects real-world performance, this step can be automated (e.g., [canary deployments](https://semaphore.io/blog/what-is-canary-deployment) with rollback monitoring), and the `prod` tag can be advanced programmatically using the LangSmith SDK.
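
The two sketches below are illustrative only. They assume a prompt named `your-prompt-name` with `question` and `context` input variables, a dataset named `your-eval-dataset`, JSON output with an `answer` key, and an OpenAI model via `langchain_openai`; none of these come from this PR, so adjust them to your setup.

A possible `pytest` structural check (Stage 2):

```python
# test_prompt_structure.py
import json

from langchain_openai import ChatOpenAI
from langsmith import Client

client = Client()


def test_output_is_valid_json_with_expected_keys():
    # Pull the candidate version and bind it to a model
    prompt = client.pull_prompt("your-prompt-name:dev")
    chain = prompt | ChatOpenAI(model="gpt-4o-mini")

    result = chain.invoke(
        {"question": "What is LangSmith?", "context": "LangSmith is a platform for LLM apps."}
    )

    # Structural assertions only: valid JSON and the keys we rely on downstream
    payload = json.loads(result.content)
    assert "answer" in payload
    assert isinstance(payload.get("sources", []), list)
```

A possible quality gate with the SDK's `evaluate` function (Stage 3), using a toy exact-match evaluator as a stand-in for your real scorers:

```python
from langchain_openai import ChatOpenAI
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()
candidate_prompt = client.pull_prompt("your-prompt-name:dev")
model = ChatOpenAI(model="gpt-4o-mini")


def run_candidate(inputs: dict) -> dict:
    # Target function: run the candidate prompt on one dataset example
    response = (candidate_prompt | model).invoke(inputs)
    return {"output": response.content}


def correctness(outputs: dict, reference_outputs: dict) -> bool:
    # Placeholder programmatic evaluator: exact match against the ground truth
    return outputs["output"].strip() == reference_outputs["answer"].strip()


results = evaluate(
    run_candidate,
    data="your-eval-dataset",
    evaluators=[correctness],
    experiment_prefix="prompt-candidate",
)

# Compare the aggregate scores with the experiment for the current `prod` prompt
# before a script (or a reviewer) moves the `staging` tag to the new commit.
```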

## Sync prompts in production

For better version control, collaboration, and integration with your existing CI/CD pipelines, synchronize your LangSmith prompts with an external source code repository. This gives you a full commit history alongside your application code.

### Best practice: use webhooks for synchronization

The most effective way to automate this is by using webhooks. You can configure LangSmith to send a notification to a service every time a new version of a prompt is saved. This creates a seamless bridge between the user-friendly prompt editing environment in LangSmith and your version control system.

### Webhook synchronization flow

This diagram shows the sequence of events when a prompt is updated in LangSmith and automatically synced to a GitHub repository.



### How to implement this with LangSmith

LangSmith allows you to configure a webhook for your workspace that will fire on every prompt commit. You can point this webhook to your own service (like an AWS Lambda function or a small server) to handle the synchronization logic.

**➡️ Learn more in the official documentation:** [Trigger a webhook on prompt commit](https://docs.smith.langchain.com/prompt_engineering/how_to_guides/trigger_webhook), [How to Sync Prompts with GitHub](https://docs.smith.langchain.com/prompt_engineering/tutorials/prompt_commit)
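
As a rough sketch of such a receiver, the FastAPI endpoint below accepts the webhook call and hands the data to your own sync logic. The payload field names and the `sync_prompt_to_github` helper are illustrative assumptions, not part of the LangSmith API; check the webhook docs above for the exact payload schema.

```python
from fastapi import FastAPI, Request

app = FastAPI()


def sync_prompt_to_github(name: str, commit: str, manifest: dict) -> None:
    """Placeholder for your own logic, e.g. write the manifest to a file and open a PR."""
    ...


@app.post("/langsmith/prompt-commit")
async def on_prompt_commit(request: Request):
    payload = await request.json()

    # Illustrative field names only; the real webhook payload may differ
    prompt_name = payload.get("prompt_name")
    commit_hash = payload.get("commit_hash")
    manifest = payload.get("manifest")

    sync_prompt_to_github(prompt_name, commit_hash, manifest)
    return {"status": "ok"}
```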

## Use a prompt in production without repeated API calls

### Best practice: cache prompts in your application

To avoid putting an API call to LangSmith in the "hot path" of your application, you should implement a caching strategy. The prompt doesn't change on every request, so you can fetch it once and reuse it. Caching not only improves performance but also increases resilience, as your application can continue to function using the last-known prompt even if it temporarily can't reach the LangSmith API.

### Caching strategies

There are two primary strategies for caching prompts, each with its own trade-offs. Minimal sketches of both follow the list below.

- **1. Local in-memory caching**
  This is the simplest caching method. The prompt is fetched from LangSmith and stored directly in the memory of your application instance.
  - **How it works**: On application startup, or on the first request for a prompt, fetch it from LangSmith and store it in a global variable or a simple cache object. Set a Time-To-Live (TTL) on the cached item (e.g., 5-10 minutes). Subsequent requests use the in-memory version until the TTL expires, at which point it's fetched again.
  - **Pros**: Extremely fast access with sub-millisecond latency; no additional infrastructure required.
  - **Cons**: The cache is lost if the application restarts. Each instance of your application (if you have more than one server) will have its own separate cache, which could lead to brief inconsistencies when a prompt is updated.
  - **Best for**: Single-instance applications, development environments, or applications where immediate consistency across all nodes is not critical.
- **2. Distributed caching**
  This approach uses an external, centralized caching service that is shared by all instances of your application.
  - **How it works**: Your application instances connect to a shared caching service like [Redis](https://redis.io/) or [Memcached](https://memcached.org/). When a prompt is needed, the application first checks the distributed cache. If it's not there (a "cache miss"), it fetches the prompt from LangSmith, stores it in the cache, and then uses it.
  - **Pros**: The cache is persistent and is not lost on application restarts. All application instances share the same cache, ensuring consistency. Highly scalable for large, distributed systems.
  - **Best for**: Scalable, multi-instance production applications where consistency and resilience are top priorities. Using a service like Redis is the industry-standard approach for robust application caching.
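
As a minimal sketch of the in-memory approach, the helper below wraps `client.pull_prompt` with a TTL and falls back to the last-known version if LangSmith is unreachable. The prompt name, TTL value, and cache shape are placeholders.

```python
import time

from langsmith import Client

client = Client()

TTL_SECONDS = 600  # refresh prompts every 10 minutes
_cache: dict[str, tuple[float, object]] = {}


def get_prompt(name: str):
    """Return a cached prompt, re-fetching from LangSmith once the TTL expires."""
    now = time.time()
    cached = _cache.get(name)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]
    try:
        prompt = client.pull_prompt(name)
        _cache[name] = (now, prompt)
        return prompt
    except Exception:
        # Resilience: keep serving the last-known version if the API is unreachable
        if cached:
            return cached[1]
        raise


# Always resolves whatever commit is currently tagged "prod"
prompt = get_prompt("your-prompt-name:prod")
```

A distributed variant might store the serialized prompt in Redis so every instance shares one cache. This sketch assumes the prompt serializes cleanly with `langchain_core`'s `dumps`/`loads` helpers and that a Redis server is reachable at the given address.

```python
import redis

from langchain_core.load import dumps, loads
from langsmith import Client

client = Client()
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

TTL_SECONDS = 600


def get_prompt(name: str):
    """Check the shared Redis cache first; fall back to LangSmith on a cache miss."""
    cached = r.get(f"prompt:{name}")
    if cached is not None:
        return loads(cached)  # rebuild the prompt object from its JSON representation

    prompt = client.pull_prompt(name)
    r.set(f"prompt:{name}", dumps(prompt), ex=TTL_SECONDS)
    return prompt


prompt = get_prompt("your-prompt-name:prod")
```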

Binary file added: docs/prompt_engineering/tutorials/static/prompt-best-practice-cicd.png (+51.8 KB)

Binary file added: docs/prompt_engineering/tutorials/static/prompt-best-practice-git.png (+69.9 KB)

Binary file added: docs/prompt_engineering/tutorials/static/prompt-best-practice-sequence.png (+83.6 KB)