docs(analyzer): explain custom Docker language images #1919
@@ -75,3 +75,28 @@ the `docker build` phase and the models defined in it are installed automaticall
 
 For `transformers` based models, the configuration [can be found here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/conf/transformers.yaml).
 A docker file supporting transformers models [can be found here](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/Dockerfile.transformers).
+
+### Building custom Docker images for more languages
+
+If you want to support languages beyond English in a custom Docker image, start with the NLP configuration file that the image copies during build:
+
+- `presidio-analyzer/presidio_analyzer/conf/default.yaml` for the standard spaCy-based image
+- `presidio-analyzer/presidio_analyzer/conf/transformers.yaml` for the transformers image
+- `presidio-analyzer/presidio_analyzer/conf/stanza.yaml` for the Stanza image
+
+Then pass that file to the Docker build through `NLP_CONF_FILE`. For example:
+
+```bash
+docker build -f presidio-analyzer/Dockerfile \
+  --build-arg NLP_CONF_FILE=presidio_analyzer/conf/default.yaml \
Suggested change:
-  --build-arg NLP_CONF_FILE=presidio_analyzer/conf/default.yaml \
+  --build-arg NLP_CONF_FILE=presidio-analyzer/presidio_analyzer/conf/default.yaml \
This section suggests only editing/passing `NLP_CONF_FILE`, but enabling additional languages in the container also typically requires updating `supported_languages` in `presidio-analyzer/presidio_analyzer/conf/default_analyzer.yaml` and ensuring the language has appropriate entries enabled in `presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml`. Since the Dockerfiles already expose `ANALYZER_CONF_FILE` and `RECOGNIZER_REGISTRY_CONF_FILE` build args, consider documenting those alongside `NLP_CONF_FILE` (and mentioning that all three configs must be consistent) to avoid unsupported-language errors and the recognizer warnings mentioned below.
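
To make the point above concrete, here is a minimal sketch of a build that passes all three build args together. Only the build arg names and the config file names come from this thread. The `presidio-analyzer` build context, the paths written relative to that context, the `presidio-analyzer:multilang` image tag, and the `DOCKER` override used for dry runs are assumptions for illustration; adjust them to whatever the Dockerfile actually expects.

```shell
#!/bin/sh
# Sketch only: pass the three related config files to one docker build.
# All values below are assumptions; keep the three files consistent with
# each other (same set of languages in each).

BUILD_CONTEXT="presidio-analyzer"              # assumed build context
IMAGE_TAG="presidio-analyzer:multilang"        # hypothetical image tag

# Paths are assumed to be relative to BUILD_CONTEXT.
NLP_CONF_FILE="presidio_analyzer/conf/default.yaml"
ANALYZER_CONF_FILE="presidio_analyzer/conf/default_analyzer.yaml"
RECOGNIZER_REGISTRY_CONF_FILE="presidio_analyzer/conf/default_recognizers.yaml"

build_image() {
  # DOCKER may be overridden (for example DOCKER=echo) to print the
  # command instead of running it.
  "${DOCKER:-docker}" build \
    -f "${BUILD_CONTEXT}/Dockerfile" \
    --build-arg "NLP_CONF_FILE=${NLP_CONF_FILE}" \
    --build-arg "ANALYZER_CONF_FILE=${ANALYZER_CONF_FILE}" \
    --build-arg "RECOGNIZER_REGISTRY_CONF_FILE=${RECOGNIZER_REGISTRY_CONF_FILE}" \
    -t "${IMAGE_TAG}" \
    "${BUILD_CONTEXT}"
}
```

Running `DOCKER=echo build_image` prints the assembled command for review, and a plain `build_image` runs the build. Keeping the three paths in adjacent variables makes it harder to update one config and forget the other two.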