Creating PII Recognizers for text from transcribed audio #1939

@RicardoAALL

Hello, I'm quite new to the world of integrating models into software, so I was hoping for some advice on how to proceed.

I'm working with conversational text transcribed from audio, but the text is too unstructured for the built-in recognizers.

For example, the conversation may go something like:

EMPLOYEE: If you'd like we can move onto the payment.
CUSTOMER: Yes, I'd like to pay with my credit card.
EMPLOYEE: Okay, what is the number on the card.
CUSTOMER: 1-2-3-4.
EMPLOYEE: 1-2-3-4.
CUSTOMER: 5-6-7-8.
EMPLOYEE: 5-6-7-8.
CUSTOMER: Sorry, it's actually 5-6-7-7.
EMPLOYEE: Okay, just to confirm 1-2-3-4-5-6-7-7.
CUSTOMER: Yes, that's right.
EMPLOYEE: Okay, go ahead.
...

I've built my own credit card recognizer, but it proved to be either too heavy-handed, matching all numbers, or not heavy-handed enough, skipping all PII. I tried modifying confidence scores, adjusting default thresholds, and creating a context-aware enhancer, but I haven't made much progress on the issue.
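
The two failure modes can be reproduced with a simplified sketch in plain Python. This is not the actual recognizer: the `LOOSE` and `STRICT` patterns, the `flagged` helper, and the 13 to 19 digit range are assumptions chosen for illustration, and the sample turns are taken from the conversation above.

```python
import re

# A few turns from the example conversation above.
TURNS = [
    "CUSTOMER: 1-2-3-4.",
    "EMPLOYEE: 1-2-3-4.",
    "CUSTOMER: Sorry, it's actually 5-6-7-7.",
    "EMPLOYEE: Okay, just to confirm 1-2-3-4-5-6-7-7.",
    "CUSTOMER: Yes, that's right.",
]

# Loose (hypothetical): any run of two or more single digits joined by hyphens.
LOOSE = re.compile(r"\b\d(?:-\d)+\b")

# Strict (hypothetical): 13 to 19 hyphen-separated digits within one utterance.
STRICT = re.compile(r"\b\d(?:-\d){12,18}\b")


def flagged(pattern, turns):
    """Return the turns that contain at least one match for the pattern."""
    return [turn for turn in turns if pattern.search(turn)]


loose_hits = flagged(LOOSE, TURNS)
strict_hits = flagged(STRICT, TURNS)

print(f"loose pattern flags {len(loose_hits)} of {len(TURNS)} turns")
print(f"strict pattern flags {len(strict_hits)} of {len(TURNS)} turns")
```

The loose pattern flags every turn that contains a digit group, while the strict pattern flags none, because the number is never spoken as one complete sequence in a single turn.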

I'd appreciate any help or insight on how best to tackle this problem. Thank you!
