diff --git a/docs/user-guides/community/pangea.md b/docs/user-guides/community/pangea.md new file mode 100644 index 000000000..57aabb518 --- /dev/null +++ b/docs/user-guides/community/pangea.md @@ -0,0 +1,85 @@ +# Pangea AI Guard integration + +The Pangea guardrail uses configurable detection policies (called *recipes*) from the [AI Guard service](https://pangea.cloud/docs/ai-guard/) to identify and mitigate risks in AI application traffic, including: + +- Prompt injection attacks (with over 99% efficacy) +- 50+ types of PII and sensitive content, with support for custom patterns +- Toxicity, violence, self-harm, and other unwanted content +- Malicious links, IPs, and domains +- 100 spoken languages, with allowlist and denylist controls + +All detections are logged in an audit trail for analysis, attribution, and incident response. +You can also configure webhooks to trigger alerts for specific detection types. + +The following environment variable is required to use the Pangea AI Guard integration: + +- `PANGEA_API_TOKEN`: Pangea API token with access to the AI Guard service. + +You can also optionally set: + +- `PANGEA_BASE_URL_TEMPLATE`: Template for constructing the base URL for API requests. The `{SERVICE_NAME}` placeholder will be replaced with the service name slug. Defaults to `https://{SERVICE_NAME}.aws.us.pangea.cloud`, which resolves to `https://ai-guard.aws.us.pangea.cloud` for Pangea's hosted (SaaS) deployment.
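For self-hosted or non-default deployments, you can override the template to point at your own endpoint. A minimal sketch of how the placeholder is expanded (the variable names below are illustrative, not part of the library API):

```python
import os

# Fall back to Pangea's hosted (SaaS) deployment when the variable is unset.
template = os.getenv(
    "PANGEA_BASE_URL_TEMPLATE", "https://{SERVICE_NAME}.aws.us.pangea.cloud"
)

# The integration substitutes the service name slug, "ai-guard" here.
base_url = template.format(SERVICE_NAME="ai-guard")
```

With the default template, `base_url` becomes `https://ai-guard.aws.us.pangea.cloud`.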
+ +## Setup + +Colang v1: + +```yaml +# config.yml + +rails: + config: + pangea: + input: + recipe: pangea_prompt_guard + output: + recipe: pangea_llm_response_guard + + input: + flows: + - pangea ai guard input + + output: + flows: + - pangea ai guard output +``` + +Colang v2: + +```yaml +# config.yml + +colang_version: "2.x" + +rails: + config: + pangea: + input: + recipe: pangea_prompt_guard + output: + recipe: pangea_llm_response_guard +``` + +``` +# rails.co + +import guardrails +import nemoguardrails.library.pangea + +flow input rails $input_text + pangea ai guard input + +flow output rails $output_text + pangea ai guard output +``` + +## Next steps + +- Explore example configurations for integrating Pangea AI Guard with your preferred Colang version: + - [Pangea AI Guard for NeMo Guardrails v1](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/pangea) + - [Pangea AI Guard for NeMo Guardrails v2](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/pangea_v2) + - [Pangea AI Guard without LLM (guardrails only)](https://github.com/NVIDIA/NeMo-Guardrails/tree/develop/examples/configs/pangea_v2_no_llm) – Use this setup to evaluate AI Guard’s detection and response capabilities independently. +- Adjust your detection policies to fit your application’s risk profile. See the [AI Guard Recipes](https://pangea.cloud/docs/ai-guard/recipes) documentation for configuration details. +- Enable [AI Guard webhooks](https://pangea.cloud/docs/ai-guard/recipes#add-webhooks-to-detectors) to receive real-time alerts for detections in your NeMo Guardrails-powered application. +- Monitor and analyze detection activity in the [AI Guard Activity Log](https://pangea.cloud/docs/ai-guard/activity-log) for auditing and attribution. +- Learn more about [AI Guard Deployment Options](https://pangea.cloud/docs/deployment-models/) to understand how and where AI Guard can run to protect your AI applications. 
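When a rail triggers, the library flows either emit a `PangeaAiGuardRailException` (when `enable_rails_exceptions` is set) or fall back to a canned refusal, and transformed text replaces the original message. A minimal Python sketch of that decision logic (the function name and `message` key are illustrative, not part of the library API):

```python
def apply_guard_result(result, original, enable_rails_exceptions=False):
    """Mirror the flow logic: block, transform, or pass the message through."""
    if result.get("blocked"):
        if enable_rails_exceptions:
            # The real flows emit a PangeaAiGuardRailException event instead.
            raise RuntimeError("Blocked by the Pangea AI Guard flow.")
        # Matches the "bot inform answer unknown" fallback.
        return "I don't know the answer to that."
    if result.get("transformed"):
        # Use the redacted/defanged text returned by AI Guard.
        return result.get("message", original)
    return original

redacted = apply_guard_result(
    {"blocked": False, "transformed": True, "message": "James Bond's email is <EMAIL>"},
    "James Bond's email is j.bond@mi6.co.uk",
)
```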
diff --git a/docs/user-guides/guardrails-library.md b/docs/user-guides/guardrails-library.md index a6af17311..b2f703aee 100644 --- a/docs/user-guides/guardrails-library.md +++ b/docs/user-guides/guardrails-library.md @@ -25,6 +25,7 @@ NeMo Guardrails comes with a library of built-in guardrails that you can easily - [Private AI PII detection](#private-ai-pii-detection) - [Fiddler Guardrails for Safety and Hallucination Detection](#fiddler-guardrails-for-safety-and-hallucination-detection) - [Prompt Security Protection](#prompt-security-protection) + - [Pangea AI Guard](#pangea-ai-guard) - OpenAI Moderation API - *[COMING SOON]* 4. Other @@ -866,6 +867,26 @@ rails: For more details, check out the [Prompt Security Integration](./community/prompt-security.md) page. +### Pangea AI Guard + +NeMo Guardrails supports using [Pangea AI Guard](https://pangea.cloud/services/ai-guard/) for protecting data and +interactions with LLMs within AI-powered applications. + +#### Example usage + +```yaml +rails: + input: + flows: + - pangea ai guard input + + output: + flows: + - pangea ai guard output +``` + +For more details, check out the [Pangea AI Guard Integration](./community/pangea.md) page. 
+ ## Other ### Jailbreak Detection diff --git a/docs/user-guides/llm-support.md b/docs/user-guides/llm-support.md index 3437ebb8c..7cecd735f 100644 --- a/docs/user-guides/llm-support.md +++ b/docs/user-guides/llm-support.md @@ -40,6 +40,7 @@ If you want to use an LLM and you cannot see a prompt in the [prompts folder](ht | Patronus Evaluate API _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | | Fiddler Fast Faitfhulness Hallucination Detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | Fiddler Fast Safety & Jailbreak Detection _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | +| Pangea AI Guard integration _(LLM independent)_ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | Table legend: diff --git a/examples/configs/pangea/README.md b/examples/configs/pangea/README.md new file mode 100644 index 000000000..686b6dcb3 --- /dev/null +++ b/examples/configs/pangea/README.md @@ -0,0 +1,14 @@ +# Pangea Example + +This example demonstrates how to integrate with the [Pangea AI Guard](https://pangea.cloud/services/ai-guard/) API for protecting data and interactions with LLMs within AI-powered applications. + +To test this configuration, you can use the CLI chat by running the following command from the `examples/configs/pangea` directory: + +```bash +poetry run nemoguardrails chat --config=. +``` + +Documentation: + +- [Full Pangea integration guide](../../../docs/user-guides/community/pangea.md) +- [Configuration options and setup instructions](../../../docs/user-guides/community/pangea.md#setup) diff --git a/examples/configs/pangea/config.yml b/examples/configs/pangea/config.yml new file mode 100644 index 000000000..89ba759bc --- /dev/null +++ b/examples/configs/pangea/config.yml @@ -0,0 +1,24 @@ +models: + - type: main + engine: openai + model: gpt-4o-mini + +instructions: + - type: general + content: | + You are a helpful assistant.
+ +rails: + config: + pangea: + input: + recipe: pangea_prompt_guard + output: + recipe: pangea_llm_response_guard + + input: + flows: + - pangea ai guard input + output: + flows: + - pangea ai guard output diff --git a/examples/configs/pangea_v2/README.md b/examples/configs/pangea_v2/README.md new file mode 100644 index 000000000..8aa5b9b3f --- /dev/null +++ b/examples/configs/pangea_v2/README.md @@ -0,0 +1,14 @@ +# Pangea Example + +This example demonstrates how to integrate with the [Pangea AI Guard](https://pangea.cloud/services/ai-guard/) API for protecting data and interactions with LLMs within AI-powered applications. + +To test this configuration, you can use the CLI chat by running the following command from the `examples/configs/pangea_v2` directory: + +```bash +poetry run nemoguardrails chat --config=. +``` + +Documentation: + +- [Full Pangea integration guide](../../../docs/user-guides/community/pangea.md) +- [Configuration options and setup instructions](../../../docs/user-guides/community/pangea.md#setup) diff --git a/examples/configs/pangea_v2/config.yml b/examples/configs/pangea_v2/config.yml new file mode 100644 index 000000000..6110d4d97 --- /dev/null +++ b/examples/configs/pangea_v2/config.yml @@ -0,0 +1,19 @@ +colang_version: "2.x" + +models: + - type: main + engine: openai + model: gpt-4o-mini + +instructions: + - type: general + content: | + You are a helpful assistant.
+ +rails: + config: + pangea: + input: + recipe: pangea_prompt_guard + output: + recipe: pangea_llm_response_guard diff --git a/examples/configs/pangea_v2/main.co b/examples/configs/pangea_v2/main.co new file mode 100644 index 000000000..e95376eab --- /dev/null +++ b/examples/configs/pangea_v2/main.co @@ -0,0 +1,5 @@ +import core +import llm + +flow main + activate llm continuation diff --git a/examples/configs/pangea_v2/rails.co b/examples/configs/pangea_v2/rails.co new file mode 100644 index 000000000..635748084 --- /dev/null +++ b/examples/configs/pangea_v2/rails.co @@ -0,0 +1,8 @@ +import guardrails +import nemoguardrails.library.pangea + +flow input rails $input_text + pangea ai guard input + +flow output rails $output_text + pangea ai guard output diff --git a/examples/configs/pangea_v2_no_llm/config.yml b/examples/configs/pangea_v2_no_llm/config.yml new file mode 100644 index 000000000..93a55c408 --- /dev/null +++ b/examples/configs/pangea_v2_no_llm/config.yml @@ -0,0 +1,12 @@ +colang_version: "2.x" + +# No models section - guardrails only mode +# No LLM is required since we're only using Pangea APIs + +rails: + config: + pangea: + input: + recipe: pangea_prompt_guard + output: + recipe: pangea_llm_response_guard diff --git a/examples/configs/pangea_v2_no_llm/main.co b/examples/configs/pangea_v2_no_llm/main.co new file mode 100644 index 000000000..94ce17784 --- /dev/null +++ b/examples/configs/pangea_v2_no_llm/main.co @@ -0,0 +1,12 @@ +import core + +flow main + activate message handler + +# Allow continuation after blocked messages in guardrails only mode +flow message handler + when user said something + global $user_message + # At this point, $user_message contains the processed value from input rails + bot say "Processed message: {$user_message}" + activate message handler # Reactivate for next message diff --git a/examples/configs/pangea_v2_no_llm/rails.co b/examples/configs/pangea_v2_no_llm/rails.co new file mode 100644 index 000000000..635748084 --- 
/dev/null +++ b/examples/configs/pangea_v2_no_llm/rails.co @@ -0,0 +1,8 @@ +import guardrails +import nemoguardrails.library.pangea + +flow input rails $input_text + pangea ai guard input + +flow output rails $output_text + pangea ai guard output diff --git a/nemoguardrails/library/pangea/__init__.py b/nemoguardrails/library/pangea/__init__.py new file mode 100644 index 000000000..9ba9d4310 --- /dev/null +++ b/nemoguardrails/library/pangea/__init__.py @@ -0,0 +1,14 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/nemoguardrails/library/pangea/actions.py b/nemoguardrails/library/pangea/actions.py new file mode 100644 index 000000000..f29f7907d --- /dev/null +++ b/nemoguardrails/library/pangea/actions.py @@ -0,0 +1,150 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import os +from collections.abc import Mapping +from typing import Any, Optional + +import httpx +from pydantic import BaseModel +from pydantic_core import to_json +from typing_extensions import Literal, cast + +from nemoguardrails.actions import action +from nemoguardrails.rails.llm.config import PangeaRailConfig, RailsConfig + +log = logging.getLogger(__name__) + + +class Message(BaseModel): + role: str + content: str + + +class TextGuardResult(BaseModel): + prompt_messages: Optional[list[Message]] = None + """Updated structured prompt, if applicable.""" + + blocked: Optional[bool] = None + """Whether or not the prompt triggered a block detection.""" + + transformed: Optional[bool] = None + """Whether or not the original input was transformed.""" + + # Additions. + bot_message: Optional[str] = None + user_message: Optional[str] = None + + +class TextGuardResponse(BaseModel): + result: TextGuardResult + + +def get_pangea_config(config: RailsConfig) -> PangeaRailConfig: + if not hasattr(config.rails.config, "pangea") or config.rails.config.pangea is None: + return PangeaRailConfig() + + return cast(PangeaRailConfig, config.rails.config.pangea) + + +@action(is_system_action=True) +async def pangea_ai_guard( + mode: Literal["input", "output"], + config: RailsConfig, + context: Mapping[str, Any] = {}, + user_message: Optional[str] = None, + bot_message: Optional[str] = None, +) -> TextGuardResult: + pangea_base_url_template = os.getenv( + "PANGEA_BASE_URL_TEMPLATE", "https://{SERVICE_NAME}.aws.us.pangea.cloud" + ) + pangea_api_token = os.getenv("PANGEA_API_TOKEN") + + if not pangea_api_token: + raise ValueError("PANGEA_API_TOKEN environment variable is not set.") + + pangea_config = get_pangea_config(config) + + user_message = user_message or context.get("user_message") + bot_message = bot_message or context.get("bot_message") + + if not 
any([user_message, bot_message]): + raise ValueError("Either user_message or bot_message must be provided.") + + messages: list[Message] = [] + if config.instructions: + messages.extend( + [ + Message(role="system", content=instruction.content) + for instruction in config.instructions + ] + ) + if user_message: + messages.append(Message(role="user", content=user_message)) + if mode == "output" and bot_message: + messages.append(Message(role="assistant", content=bot_message)) + + recipe = ( + pangea_config.input.recipe + if mode == "input" and pangea_config.input + else ( + pangea_config.output.recipe + if mode == "output" and pangea_config.output + else None + ) + ) + + async with httpx.AsyncClient( + base_url=pangea_base_url_template.format(SERVICE_NAME="ai-guard") + ) as client: + data = {"messages": messages, "recipe": recipe} + # Remove `None` values. + data = {k: v for k, v in data.items() if v is not None} + + response = await client.post( + "/v1/text/guard", + content=to_json(data), + headers={ + "Accept": "application/json", + "Authorization": f"Bearer {pangea_api_token}", + "Content-Type": "application/json", + "User-Agent": "NeMo Guardrails (https://github.com/NVIDIA/NeMo-Guardrails)", + }, + ) + try: + response.raise_for_status() + text_guard_response = TextGuardResponse(**response.json()) + except Exception as e: + log.error("Error calling Pangea AI Guard API: %s", e) + return TextGuardResult( + prompt_messages=messages, + blocked=False, + transformed=False, + bot_message=bot_message, + user_message=user_message, + ) + + result = text_guard_response.result + prompt_messages = result.prompt_messages or [] + + result.bot_message = next( + (m.content for m in prompt_messages if m.role == "assistant"), bot_message + ) + result.user_message = next( + (m.content for m in prompt_messages if m.role == "user"), user_message + ) + + return result diff --git a/nemoguardrails/library/pangea/flows.co b/nemoguardrails/library/pangea/flows.co new file mode 100644 
index 000000000..5be9f2b4f --- /dev/null +++ b/nemoguardrails/library/pangea/flows.co @@ -0,0 +1,31 @@ +# INPUT RAILS + +flow pangea ai guard input + $result = await PangeaAiGuardAction(mode="input") + + if $result.blocked + if $system.config.enable_rails_exceptions + send PangeaAiGuardRailException(message="Input not allowed. The input was blocked by the 'pangea ai guard input' flow.") + else + bot inform answer unknown + abort + + if $result.transformed + global $user_message + $user_message = $result.user_message + +# OUTPUT RAILS + +flow pangea ai guard output + $result = await PangeaAiGuardAction(mode="output") + + if $result.blocked + if $system.config.enable_rails_exceptions + send PangeaAiGuardRailException(message="Response not allowed. The response was blocked by the 'pangea ai guard output' flow.") + else + bot inform answer unknown + abort + + if $result.transformed + global $bot_message + $bot_message = $result.bot_message diff --git a/nemoguardrails/library/pangea/flows.v1.co b/nemoguardrails/library/pangea/flows.v1.co new file mode 100644 index 000000000..c754eb4dc --- /dev/null +++ b/nemoguardrails/library/pangea/flows.v1.co @@ -0,0 +1,31 @@ +# INPUT RAILS + +define subflow pangea ai guard input + $result = execute pangea_ai_guard(mode="input") + + if $result.blocked + if $config.enable_rails_exceptions + create event PangeaAiGuardRailException(message="Input not allowed. The input was blocked by the 'pangea ai guard input' flow.") + else + bot inform answer unknown + stop + + if $result.transformed + $bot_message = $result.bot_message + $user_message = $result.user_message + +# OUTPUT RAILS + +define subflow pangea ai guard output + $result = execute pangea_ai_guard(mode="output") + + if $result.blocked + if $config.enable_rails_exceptions + create event PangeaAiGuardRailException(message="Response not allowed. 
The response was blocked by the 'pangea ai guard output' flow.") + else + bot inform answer unknown + stop + + if $result.transformed + $bot_message = $result.bot_message + $user_message = $result.user_message diff --git a/nemoguardrails/rails/llm/config.py b/nemoguardrails/rails/llm/config.py index d0e0cf03e..2aa347f8d 100644 --- a/nemoguardrails/rails/llm/config.py +++ b/nemoguardrails/rails/llm/config.py @@ -752,6 +752,28 @@ class ClavataRailConfig(BaseModel): ) +class PangeaRailOptions(BaseModel): + """Configuration data for the Pangea AI Guard API""" + + recipe: str = Field( + description="""Recipe key of a configuration of data types and settings defined in the Pangea User Console. It + specifies the rules that are to be applied to the text, such as defang malicious URLs.""" + ) + + +class PangeaRailConfig(BaseModel): + """Configuration data for the Pangea AI Guard API""" + + input: Optional[PangeaRailOptions] = Field( + default=None, + description="Pangea configuration for an Input Guardrail", + ) + output: Optional[PangeaRailOptions] = Field( + default=None, + description="Pangea configuration for an Output Guardrail", + ) + + class RailsConfigData(BaseModel): """Configuration data for specific rails that are supported out-of-the-box.""" @@ -800,6 +822,11 @@ class RailsConfigData(BaseModel): description="Configuration for Clavata.", ) + pangea: Optional[PangeaRailConfig] = Field( + default_factory=PangeaRailConfig, + description="Configuration for Pangea.", + ) + class Rails(BaseModel): """Configuration of specific rails.""" diff --git a/tests/test_pangea_ai_guard.py b/tests/test_pangea_ai_guard.py new file mode 100644 index 000000000..79f2c822d --- /dev/null +++ b/tests/test_pangea_ai_guard.py @@ -0,0 +1,171 @@ +# SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import pytest +from pytest_httpx import HTTPXMock + +from nemoguardrails import RailsConfig +from tests.utils import TestChat + +input_rail_config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + input: + flows: + - pangea ai guard input + """ +) +output_rail_config = RailsConfig.from_content( + yaml_content=""" + models: [] + rails: + output: + flows: + - pangea ai guard output + """ +) + + +@pytest.mark.unit +@pytest.mark.parametrize("config", (input_rail_config, output_rail_config)) +def test_pangea_ai_guard_blocked( + httpx_mock: HTTPXMock, monkeypatch: pytest.MonkeyPatch, config: RailsConfig +): + monkeypatch.setenv("PANGEA_API_TOKEN", "test-token") + httpx_mock.add_response( + is_reusable=True, + json={ + "result": { + "blocked": True, + "transformed": False, + "prompt_messages": [], + } + }, + ) + + chat = TestChat( + config, + llm_completions=[ + " express greeting", + ' "James Bond\'s email is j.bond@mi6.co.uk"', + ], + ) + + chat >> "Hi!" + chat << "I don't know the answer to that." 
+ + +@pytest.mark.unit +def test_pangea_ai_guard_input_transform( + httpx_mock: HTTPXMock, monkeypatch: pytest.MonkeyPatch +): + monkeypatch.setenv("PANGEA_API_TOKEN", "test-token") + httpx_mock.add_response( + is_reusable=True, + json={ + "result": { + "blocked": False, + "transformed": True, + "prompt_messages": [ + { + "role": "user", + "content": "James Bond's email is ", + }, + { + "role": "assistant", + "content": "Oh, that is interesting.", + }, + ], + } + }, + ) + + chat = TestChat(input_rail_config, llm_completions=[' "Oh, that is interesting."']) + + chat >> "James Bond's email is j.bond@mi6.co.uk" + chat << "Oh, that is interesting." + + +@pytest.mark.unit +def test_pangea_ai_guard_output_transform( + httpx_mock: HTTPXMock, monkeypatch: pytest.MonkeyPatch +): + monkeypatch.setenv("PANGEA_API_TOKEN", "test-token") + httpx_mock.add_response( + is_reusable=True, + json={ + "result": { + "blocked": False, + "transformed": True, + "prompt_messages": [ + { + "role": "assistant", + "content": "James Bond's email is ", + } + ], + } + }, + ) + + chat = TestChat( + output_rail_config, + llm_completions=[ + " express greeting", + ' "James Bond\'s email is j.bond@mi6.co.uk"', + ], + ) + + chat >> "Hi!" + chat << "James Bond's email is " + + +@pytest.mark.unit +@pytest.mark.parametrize("status_code", frozenset({429, 500, 502, 503, 504})) +def test_pangea_ai_guard_error( + httpx_mock: HTTPXMock, monkeypatch: pytest.MonkeyPatch, status_code: int +): + monkeypatch.setenv("PANGEA_API_TOKEN", "test-token") + httpx_mock.add_response( + is_reusable=True, status_code=status_code, json={"result": {}} + ) + + chat = TestChat(output_rail_config, llm_completions=[" Hello!"]) + + chat >> "Hi!" + chat << "Hello!" + + +@pytest.mark.unit +def test_pangea_ai_guard_missing_env_var(): + chat = TestChat(input_rail_config, llm_completions=[]) + chat >> "Hi!" + chat << "I'm sorry, an internal error has occurred." 
+ + +@pytest.mark.unit +def test_pangea_ai_guard_malformed_response( + httpx_mock: HTTPXMock, monkeypatch: pytest.MonkeyPatch +): + monkeypatch.setenv("PANGEA_API_TOKEN", "test-token") + httpx_mock.add_response(is_reusable=True, text="definitely not valid JSON") + + chat = TestChat( + input_rail_config, + llm_completions=[' "James Bond\'s email is j.bond@mi6.co.uk"'], + ) + + chat >> "Hi!" + chat << "James Bond's email is j.bond@mi6.co.uk"
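The action in `nemoguardrails/library/pangea/actions.py` serializes the conversation and the configured recipe into the `/v1/text/guard` request body, dropping unset fields before the POST. A stdlib-only sketch of that payload assembly (the helper name is hypothetical):

```python
import json

def build_guard_payload(messages, recipe=None):
    # Mirror the action's behaviour: include "recipe" only when one is configured.
    data = {"messages": messages, "recipe": recipe}
    return {k: v for k, v in data.items() if v is not None}

payload = build_guard_payload(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi!"},
    ],
    recipe="pangea_prompt_guard",
)
# Sent as the POST body with an Authorization: Bearer header.
body = json.dumps(payload)
```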