Skip to content

docs(examples): add NemoGuard colang 2.0 example configs #1289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions examples/configs/nemoguards_v2/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# NeMoGuard Safety Rails Example

This example showcases the use of NVIDIA's NeMoGuard NIMs for comprehensive AI safety including content moderation, topic control, and jailbreak detection.

## Configuration Files

- `config.yml` - Defines the models configuration including the main LLM and three NeMoGuard NIMs for safety checks
- `prompts.yml` - Contains prompt templates for content safety and topic control checks
- `rails.co` - Implements input and output rails that integrate content safety, topic safety, and jailbreak detection checks
- `main.co` - The entry point Colang 2 file that imports core functionality and activates the LLM continuation flow

## NeMoGuard NIMs Used

1. **Content Safety** (`nvidia/llama-3.1-nemoguard-8b-content-safety`) - Checks for unsafe content across 23 safety categories
2. **Topic Control** (`nvidia/llama-3.1-nemoguard-8b-topic-control`) - Ensures conversations stay within allowed topics
3. **Jailbreak Detection** - Detects and prevents jailbreak attempts (configured via `nim_server_endpoint`)

## Documentation

For more details about NeMoGuard NIMs and deployment options, see:

- [NeMo Guardrails Documentation](https://docs.nvidia.com/nemo/guardrails/index.html)
- [Llama 3.1 NemoGuard 8B ContentSafety NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/)
- [Llama 3.1 NemoGuard 8B TopicControl NIM](https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/)
- [NemoGuard JailbreakDetect NIM](https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/)
- [NeMoGuard Models on NVIDIA API Catalog](https://build.nvidia.com/search?q=nemoguard)
18 changes: 18 additions & 0 deletions examples/configs/nemoguards_v2/config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
colang_version: 2.x
models:
- type: main
engine: nim
model: meta/llama-3.3-70b-instruct
- type: content_safety
engine: nim
model: nvidia/llama-3.1-nemoguard-8b-content-safety
- type: topic_control
engine: nim
model: nvidia/llama-3.1-nemoguard-8b-topic-control

rails:
config:
jailbreak_detection:
nim_base_url: "https://ai.api.nvidia.com"
nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
api_key_env_var: NVIDIA_API_KEY
5 changes: 5 additions & 0 deletions examples/configs/nemoguards_v2/main.co
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import core
import llm

flow main
activate llm continuation
104 changes: 104 additions & 0 deletions examples/configs/nemoguards_v2/prompts.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
prompts:
- task: topic_safety_check_input $model=topic_control
content: |
You are to act as a customer service agent, providing users with factual information in accordance to the knowledge base. Your role is to ensure that you respond only to relevant queries and adhere to the following guidelines

Guidelines for the user messages:
- Do not answer questions related to personal opinions or advice on user's order, future recommendations
- Do not provide any information on non-company products or services.
- Do not answer enquiries unrelated to the company policies.
- Do not answer questions asking for personal details about the agent or its creators.
- Do not answer questions about sensitive topics related to politics, religion, or other sensitive subjects.
- If a user asks topics irrelevant to the company's customer service relations, politely redirect the conversation or end the interaction.
- Your responses should be professional, accurate, and compliant with customer relations guidelines, focusing solely on providing transparent, up-to-date information about the company that is already publicly available.
- allow user comments that are related to small talk and chit-chat.

- task: content_safety_check_input $model=content_safety
content: |
Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence.
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Other.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical.
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

user: {{ user_input }}

<END CONVERSATION>

Provide your safety assessment for the above conversation in the below output JSON format:
{"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

Do not include anything other than the output JSON in your response.
Output JSON:
output_parser: nemoguard_parse_prompt_safety
max_tokens: 50

- task: content_safety_check_output $model=content_safety
content: |
Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence.
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Other.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical.
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

user: {{ user_input }}

response: agent: {{ bot_response }}

<END CONVERSATION>

Provide your safety assessment for the above conversation in the below output JSON format:
{"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

Do not include anything other than the output JSON in your response.
Output JSON:
output_parser: nemoguard_parse_response_safety
max_tokens: 50
12 changes: 12 additions & 0 deletions examples/configs/nemoguards_v2/rails.co
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
import guardrails
import nemoguardrails.library.content_safety
import nemoguardrails.library.topic_safety
import nemoguardrails.library.jailbreak_detection

flow input rails $input_text
content safety check input $model="content_safety"
topic safety check input $model="topic_control"
jailbreak detection model

flow output rails $output_text
content safety check output $model="content_safety"