OpenAI Privacy Filter model support #14

@its-nikhil

Feature Description

Add support for OpenAI Privacy Filter model https://huggingface.co/openai/privacy-filter

Use Case

PII Detection. Privacy Filter predicts spans across eight categories:

  • private_person
  • private_address
  • private_email
  • private_phone
  • private_url
  • private_date
  • account_number
  • secret
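As a rough illustration of how detected spans could be turned into redacted text, here is a minimal sketch. The `Span` type, the `redact` helper, and the character-offset span format are hypothetical and chosen for illustration only; they are not the model's or localmod's actual output format.

```python
from __future__ import annotations

from dataclasses import dataclass

# The eight categories listed in this issue.
CATEGORIES = frozenset({
    "private_person", "private_address", "private_email", "private_phone",
    "private_url", "private_date", "account_number", "secret",
})


@dataclass(frozen=True)
class Span:
    """Hypothetical detected span: character offsets plus a category label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def redact(text: str, spans: list[Span]) -> str:
    """Replace each span with a [LABEL] placeholder, scanning left to right."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.label not in CATEGORIES:
            raise ValueError(f"unknown category: {span.label}")
        if span.start < cursor:
            continue  # skip a span that overlaps one already redacted
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label.upper()}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "Email jane@example.com or call 555-0100."
    email = text.index("jane@example.com")
    phone = text.index("555-0100")
    spans = [
        Span(email, email + len("jane@example.com"), "private_email"),
        Span(phone, phone + len("555-0100"), "private_phone"),
    ]
    print(redact(text, spans))
```

Keeping the category in the placeholder means downstream consumers can still see what kind of data was removed.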

Proposed Solution

Privacy Filter is a small model with frontier personal-data detection capability. It is designed for high-throughput privacy workflows and performs context-aware detection of PII in unstructured text. It can run locally, so PII can be masked or redacted without leaving your machine. It processes long inputs efficiently, making redaction decisions in a single, fast pass.

Alternatives Considered

The existing regex-based solution. Privacy Filter gives much better results. On the PII-Masking-300k benchmark, Privacy Filter achieves an F1 score of 96% (94.04% precision and 98.04% recall). On a corrected version of the benchmark that accounts for dataset annotation issues identified during review, the F1 score is 97.43% (96.79% precision and 98.08% recall).
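For reference, the F1 scores quoted above follow from the precision and recall figures, since F1 is their harmonic mean. A short check, using only the numbers quoted in this issue (the function name is illustrative, and the inputs are already rounded, so the results agree only to rounding):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures quoted in this issue for PII-Masking-300k.
original = f1_score(0.9404, 0.9804)
corrected = f1_score(0.9679, 0.9808)

print(f"original benchmark:  {original:.2%}")
print(f"corrected benchmark: {corrected:.2%}")
```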

Example Usage

# How you imagine using this feature
from localmod.classifiers.pii import PIIDetector
    
detector = PIIDetector()

# ...

Additional Context

None

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
