
docs(examples): add Colang 2.0 example for sensitive data detection #1301

Merged
examples/configs/sensitive_data_detection_v2/README.md (41 additions, 0 deletions)
@@ -0,0 +1,41 @@
# Presidio-based Sensitive Data Detection Example

This example demonstrates how to detect and redact sensitive data using [Presidio](https://github.com/Microsoft/presidio).

## Prerequisites

- Presidio (`presidio-analyzer` and `presidio-anonymizer`)

You can install it with:

```bash
poetry run pip install presidio-analyzer presidio-anonymizer
```

> **Note**
>
> Installing Presidio may pull in a version of `numpy` that this project does not support. To restore the supported version, run:
> ```bash
> poetry install
> ```

- `en_core_web_lg` spaCy model

You can download it with:

```bash
poetry run python -m spacy download en_core_web_lg
```

## Running the example

To test this configuration, run the CLI chat from the `examples/configs/sensitive_data_detection_v2` directory:

```bash
poetry run nemoguardrails chat --config=.
```

## Documentation

- [Presidio-based Sensitive Data Detection configuration](../../../docs/user-guides/guardrails-library.md#presidio-based-sensitive-data-detection)
- [Presidio Integration guide](../../../docs/user-guides/community/presidio.md)
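Before running the example, a quick way to confirm that the prerequisites listed in the README are importable is a short Python check like the one below. It is a sketch and not part of this PR; it assumes the pip packages expose the top-level modules `presidio_analyzer` and `presidio_anonymizer`, and that the spaCy model installs as a Python package named `en_core_web_lg`.

```python
import importlib.util


def check_prerequisites() -> dict[str, bool]:
    """Report whether each module this example needs can be found.

    Assumed module names: presidio-analyzer -> presidio_analyzer,
    presidio-anonymizer -> presidio_anonymizer, and the spaCy model
    en_core_web_lg installed as a package of the same name.
    """
    modules = ["presidio_analyzer", "presidio_anonymizer", "en_core_web_lg"]
    # find_spec returns None for a missing top-level module without raising.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with `poetry run python` so it checks the same environment the chat CLI uses.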
examples/configs/sensitive_data_detection_v2/config.yml (29 additions, 0 deletions)
@@ -0,0 +1,29 @@
colang_version: "2.x"

models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  config:
    sensitive_data_detection:
      input:
        score_threshold: 0.4
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - PHONE_NUMBER
          - CREDIT_CARD
          - US_SSN
          - LOCATION

      output:
        score_threshold: 0.4
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - PHONE_NUMBER
          - CREDIT_CARD
          - US_SSN
          - LOCATION
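The `score_threshold` and `entities` keys above decide which detections get masked. The following pure-Python sketch illustrates that filtering under stated assumptions: `Detection` and `mask` are hypothetical stand-ins, not Presidio or NeMo Guardrails APIs; a detection is kept only if its type is listed and its score is at or above the threshold (inclusive comparison assumed); spans do not overlap; and kept spans are replaced with `<ENTITY_TYPE>` placeholders.

```python
from dataclasses import dataclass

# Entity types and threshold mirrored from the `input` section of config.yml.
ENTITIES = {"PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "US_SSN", "LOCATION"}
SCORE_THRESHOLD = 0.4


@dataclass
class Detection:
    """Hypothetical detector result: entity type, character span [start, end), and score."""

    entity_type: str
    start: int
    end: int
    score: float


def mask(text: str, detections: list[Detection], entities: set[str], score_threshold: float) -> str:
    """Replace each kept detection with an <ENTITY_TYPE> placeholder.

    Assumes spans do not overlap and that the threshold comparison is inclusive.
    """
    kept = [d for d in detections if d.entity_type in entities and d.score >= score_threshold]
    # Work right to left so replacing one span does not shift the offsets of earlier spans.
    for d in sorted(kept, key=lambda d: d.start, reverse=True):
        text = text[: d.start] + f"<{d.entity_type}>" + text[d.end :]
    return text


if __name__ == "__main__":
    message = "Hi, I'm John, email john@example.com"
    found = [
        Detection("PERSON", 8, 12, 0.85),
        Detection("EMAIL_ADDRESS", 20, 36, 1.0),
        Detection("PERSON", 0, 2, 0.3),  # low-confidence false positive on "Hi"
    ]
    print(mask(message, found, ENTITIES, SCORE_THRESHOLD))
    # → Hi, I'm <PERSON>, email <EMAIL_ADDRESS>
```

Here the 0.3-confidence `PERSON` hit falls below the 0.4 threshold and is left alone. Lowering `score_threshold` masks more aggressively at the cost of more false positives; raising it risks leaving real entities unmasked.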
examples/configs/sensitive_data_detection_v2/flows.co (10 additions, 0 deletions)
@@ -0,0 +1,10 @@
import guardrails
import nemoguardrails.library.sensitive_data_detection

flow input rails $input_text
  """Check user utterances before they get further processed."""
  await mask sensitive data on input

flow output rails $output_text
  """Check response before sending it to user."""
  await mask sensitive data on output
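The two flows above wire masking in on both sides of the LLM call: `input rails` run on the user message before generation, and `output rails` run on the bot message before it is returned. A hypothetical Python sketch of that ordering follows. `mask_emails` is a deliberately simplistic regex stand-in for `MaskSensitiveDataAction` that only handles email addresses, and `llm` is any callable; nothing here reflects the actual NeMo Guardrails runtime.

```python
import re
from typing import Callable

# Hypothetical stand-in for MaskSensitiveDataAction: it only recognizes
# email addresses, via a deliberately simple regex.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text: str) -> str:
    return EMAIL.sub("<EMAIL_ADDRESS>", text)


def run_turn(user_message: str, llm: Callable[[str], str]) -> str:
    # Input rail: mask the user message before the LLM ever sees it.
    user_message = mask_emails(user_message)
    bot_message = llm(user_message)
    # Output rail: mask the bot message before it is returned to the user.
    return mask_emails(bot_message)


if __name__ == "__main__":
    def echo_llm(prompt: str) -> str:
        return f"You said: {prompt}. Contact support@example.com."

    print(run_turn("My email is jane@example.org", echo_llm))
    # → You said: My email is <EMAIL_ADDRESS>. Contact <EMAIL_ADDRESS>.
```

Masking on both sides matters because the output rail also catches sensitive data the model produces on its own, such as the support address above, not just data echoed back from the user.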
examples/configs/sensitive_data_detection_v2/main.co (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
import core
import llm

flow main
  activate llm continuation
nemoguardrails/library/sensitive_data_detection/flows.co (4 additions, 1 deletion)
@@ -11,6 +11,7 @@ flow detect sensitive data on input

 flow mask sensitive data on input
   """Mask any sensitive data found in the user input."""
+  global $user_message
   $user_message = await MaskSensitiveDataAction(source="input", text=$user_message)


@@ -28,10 +29,11 @@ flow detect sensitive data on output

 flow mask sensitive data on output
   """Mask any sensitive data found in the bot output."""
+  global $bot_message
   $bot_message = await MaskSensitiveDataAction(source="output", text=$bot_message)


-# RETRIVAL RAILS
+# RETRIEVAL RAILS


 flow detect sensitive data on retrieval
@@ -45,4 +47,5 @@ flow detect sensitive data on retrieval

 flow mask sensitive data on retrieval
   """Mask any sensitive data found in the relevant chunks from the knowledge base."""
+  global $relevant_chunks
   $relevant_chunks = await MaskSensitiveDataAction(source="retrieval", text=$relevant_chunks)
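The substantive change in this library file appears to be the three added `global` declarations. As we understand Colang 2.0 scoping, assigning to a variable inside a flow without declaring it `global` binds a flow-local variable, so the masked text would not replace the message that other flows read. Python's `global` keyword behaves analogously, which the following sketch demonstrates. It is an analogy only, not Colang, and `user_message` here is an ordinary Python variable.

```python
# Python analogy only (this is not Colang): assigning to a module-level name
# inside a function binds a new local unless the name is declared `global`.
user_message = "call me at 555-0100"


def mask_without_global() -> None:
    # Binds a new local variable; the module-level value is left untouched.
    user_message = "<PHONE_NUMBER>"


def mask_with_global() -> None:
    global user_message
    # Rebinds the module-level variable, so every reader sees the masked text.
    user_message = "<PHONE_NUMBER>"


def current_user_message() -> str:
    return user_message
```

Calling `mask_without_global()` leaves `current_user_message()` returning the original phone number, while `mask_with_global()` changes it to `<PHONE_NUMBER>`, mirroring the behavior the added declarations are meant to fix.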