BasicLangExtractRecognizer silently drops provider.language_model_params (timeout, num_ctx, ...) #1942

@lsternlicht

Description

Summary

BasicLangExtractRecognizer loads langextract.model.provider.language_model_params from the YAML into self._language_model_params and forwards it to langextract.extract(..., language_model_params=...). But because the same call also passes config=ModelConfig(...) (built from provider.kwargs only), langextract/extraction.py takes the elif config: branch at line 240, which never reads language_model_params; only the else: branch at line 255 does.

Net effect: every value under provider.language_model_params in the YAML, including timeout and num_ctx, is silently dropped. Ollama falls back to its 120 s default timeout, which is the source of the confusing

InferenceRuntimeError: Ollama Model timed out (timeout=120, num_threads=None)

that users see even after setting timeout: 1800 (or any other value) in the YAML. The default bundled config at presidio-analyzer/presidio_analyzer/conf/langextract_config_basic.yaml is also affected: the timeout: 240 and num_ctx: 8192 it ships with have no effect at runtime.

Reproduction

# repro_config.yaml
lm_recognizer:
  supported_entities: [PERSON]
langextract:
  prompt_file: "langextract_prompts/default_pii_phi_prompt.j2"
  examples_file: "langextract_prompts/default_pii_phi_examples.yaml"
  model:
    model_id: "gemma3:27b"
    provider:
      name: "ollama"
      kwargs:
        model_url: "http://localhost:11434"
      language_model_params:
        timeout: 1800
        num_ctx: 8192
  entity_mappings:
    person: PERSON

# repro.py
from presidio_analyzer.predefined_recognizers.third_party.basic_langextract_recognizer import BasicLangExtractRecognizer

r = BasicLangExtractRecognizer(config_path="repro_config.yaml")
cfg = r._get_provider_params()["config"]
print(cfg.provider_kwargs)
# {'model_url': 'http://localhost:11434'}
# ^^ timeout and num_ctx are missing — they are stored on r._language_model_params
#    but never reach the provider.
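Until the fix lands, a workaround follows from the same observation: provider.kwargs does survive into ModelConfig.provider_kwargs and reaches the provider constructor, so moving the values there should take effect. A hedged sketch (untested; assumes OllamaLanguageModel accepts these as constructor keyword arguments, as described under "Root cause" below):

```yaml
# Hypothetical workaround: place the values under provider.kwargs,
# which does survive into ModelConfig.provider_kwargs.
model:
  model_id: "gemma3:27b"
  provider:
    name: "ollama"
    kwargs:
      model_url: "http://localhost:11434"
      timeout: 1800
      num_ctx: 8192
```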

Running any model whose first inference takes longer than 120 s on the host (e.g. gemma3:27b) then fails with:

langextract.core.exceptions.InferenceRuntimeError: Ollama Model timed out (timeout=120, num_threads=None)

Root cause

BasicLangExtractRecognizer._get_provider_params() returns {"config": self.lx_model_config}. In langextract/extraction.py:

elif config:                                       # taken for BasicLangExtractRecognizer
    language_model = factory.create_model(
        config=config,
        examples=...,
        use_schema_constraints=use_schema_constraints,
        fence_output=fence_output,
    )
else:                                              # only here is language_model_params read
    base_lm_kwargs = {...}
    base_lm_kwargs.update(language_model_params or {})
    filtered_kwargs = {k: v for k, v in base_lm_kwargs.items() if v is not None}
    config = factory.ModelConfig(model_id=model_id, provider_kwargs=filtered_kwargs)
    language_model = factory.create_model(config=config, ...)

Only provider.kwargs survives as ModelConfig.provider_kwargs and reaches OllamaLanguageModel(**provider_kwargs), whose constructor accepts timeout= and forwards num_ctx via **kwargs to every _ollama_query() call.
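The two branches can be boiled down to a few lines of plain Python (a simplified stand-in for the langextract code above, not the real implementation) to show why one recognizer loses the parameters and the other does not:

```python
# Simplified stand-in for the branch logic in langextract/extraction.py:
# when a config is passed, language_model_params is never consulted.
def resolve_provider_kwargs(config_provider_kwargs=None, language_model_params=None):
    if config_provider_kwargs is not None:      # mirrors the `elif config:` branch
        return dict(config_provider_kwargs)     # language_model_params ignored
    base_lm_kwargs = dict(language_model_params or {})  # mirrors the `else:` branch
    return {k: v for k, v in base_lm_kwargs.items() if v is not None}

# BasicLangExtractRecognizer path (passes a config): params dropped.
dropped = resolve_provider_kwargs(
    {"model_url": "http://localhost:11434"},
    {"timeout": 1800, "num_ctx": 8192},
)
# AzureOpenAILangExtractRecognizer path (no config): params applied.
applied = resolve_provider_kwargs(None, {"timeout": 1800, "num_ctx": 8192})

assert "timeout" not in dropped
assert applied["timeout"] == 1800
```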

Why AzureOpenAILangExtractRecognizer is not affected

Its _get_provider_params() returns {"model_id": ..., "language_model_params": ...} — no config, so lx.extract() takes the else: branch where language_model_params is actually applied. The bug is specific to BasicLangExtractRecognizer.

Why existing unit tests did not catch this

presidio-analyzer/tests/test_basic_langextract_recognizer.py:568-570 asserts:

assert "language_model_params" in call_kwargs
assert call_kwargs["language_model_params"]["timeout"] == 180
assert call_kwargs["language_model_params"]["num_ctx"] == 8192

i.e., it verifies that language_model_params is passed to lx.extract(), but never that it reaches the ModelConfig or the provider. The assertions pass while the effective behaviour is broken. The fix should therefore add regression tests asserting that the values also arrive on ModelConfig.provider_kwargs.
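The stronger assertion could look like the sketch below. ModelConfig here is a stand-in dataclass for langextract.factory.ModelConfig, and call_kwargs mimics what a mocked lx.extract() would receive after the fix; the names are illustrative, not the actual test code:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:  # stand-in for langextract.factory.ModelConfig
    model_id: str
    provider_kwargs: dict = field(default_factory=dict)

# What the mocked lx.extract() should receive once the fix merges
# language_model_params into provider_kwargs:
call_kwargs = {
    "language_model_params": {"timeout": 180, "num_ctx": 8192},
    "config": ModelConfig(
        model_id="gemma3:27b",
        provider_kwargs={
            "model_url": "http://localhost:11434",
            "timeout": 180,
            "num_ctx": 8192,
        },
    ),
}

# Existing assertions (necessary but not sufficient):
assert call_kwargs["language_model_params"]["timeout"] == 180
# Missing assertions: the values must reach the config that actually wins:
assert call_kwargs["config"].provider_kwargs["timeout"] == 180
assert call_kwargs["config"].provider_kwargs["num_ctx"] == 8192
```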

Proposed fix

Merge self._language_model_params into self.provider_kwargs before building lx_factory.ModelConfig, using setdefault so explicit provider.kwargs: entries still win. One-line behaviour change in basic_langextract_recognizer.py. PR forthcoming.
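A minimal sketch of the proposed merge semantics (variable names taken from the description above; the real change lives inside basic_langextract_recognizer.py before the ModelConfig is built):

```python
provider_kwargs = {"model_url": "http://localhost:11434"}   # provider.kwargs
language_model_params = {                                   # provider.language_model_params
    "timeout": 1800,
    "num_ctx": 8192,
    "model_url": "http://should-not-win",  # deliberate conflict for illustration
}

# setdefault: only fill in keys the user did not set explicitly,
# so provider.kwargs entries still win on conflict.
for key, value in language_model_params.items():
    provider_kwargs.setdefault(key, value)

assert provider_kwargs["timeout"] == 1800
assert provider_kwargs["num_ctx"] == 8192
assert provider_kwargs["model_url"] == "http://localhost:11434"
```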

Environment

  • presidio-analyzer 2.2.362
  • langextract 1.x
  • Python 3.12, macOS 15.5 (Darwin 25.5.0)
  • Ollama 0.x, model gemma3:27b / gemma4:31b
