# `DataDefinition` should restrict column processing, not just annotate it #1765

## Description

When creating a `Dataset` with an explicit `DataDefinition` specifying `numerical_columns` and `categorical_columns`, Evidently still iterates over and attempts to infer types for **all columns in the DataFrame**, not just the ones specified. This leads to unexpected errors and defeats the purpose of providing an explicit schema.

## Steps to Reproduce

```python
import pandas as pd
from evidently import Dataset
from evidently.core.datasets import DataDefinition

df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": ["x", "y", "z", "x", "y"],
    "internal_col": [None, None, None, None, None],  # Not relevant to analysis
})

# Explicitly define only the columns I care about
definition = DataDefinition(
    numerical_columns=["feature_a"],
    categorical_columns=["feature_b"],
)

# This still fails because Evidently processes "internal_col"
dataset = Dataset.from_pandas(df, data_definition=definition)
```

## Expected Behavior

When I provide a `DataDefinition` with explicit column lists, Evidently should **only** process those columns. The `DataDefinition` should act as a contract/schema that restricts scope, not just a hint that gets merged with auto-inference.

Columns not mentioned in the `DataDefinition` should be ignored entirely.

## Actual Behavior

Evidently iterates over all columns in the DataFrame and runs `infer_column_type` on each, regardless of whether they appear in the `DataDefinition`. This causes:

1. **Unexpected errors** from columns the user explicitly chose not to include (e.g., all-null columns, malformed data)
2. **Wasted computation** on columns that won't be used in the analysis
3. **Confusion** about what `DataDefinition` actually does

## Current Workaround

Subset the DataFrame manually before passing to Evidently:

```python
cols = ["feature_a", "feature_b"]
dataset = Dataset.from_pandas(df[cols], data_definition=definition)
```

This works but is redundant: I'm specifying the columns twice (once in the subset, once in `DataDefinition`).
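
In the meantime, one way to reduce the duplication is to derive the subset from the same lists that go into `DataDefinition`. A minimal sketch in plain Python, so it does not depend on Evidently internals; `declared_columns` is a hypothetical helper name, not part of Evidently:

```python
def declared_columns(available, *column_lists):
    """Return the declared columns, de-duplicated, in declaration order.

    `available` is the iterable of column names actually present
    (e.g. `df.columns`). Raises KeyError if a declared column is missing.
    """
    available = set(available)
    declared = []
    for columns in column_lists:
        for name in columns or []:  # tolerate None for an unused column type
            if name not in declared:
                declared.append(name)
    missing = [name for name in declared if name not in available]
    if missing:
        raise KeyError(f"Declared columns not found in data: {missing}")
    return declared


numerical = ["feature_a"]
categorical = ["feature_b"]
cols = declared_columns(
    ["feature_a", "feature_b", "internal_col"], numerical, categorical
)
print(cols)  # ['feature_a', 'feature_b']
```

With the DataFrame from the reproduction, `df[cols]` can then be passed to `Dataset.from_pandas` alongside a `DataDefinition` built from the same `numerical` and `categorical` lists, so the column names are written only once.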

## Suggested Behavior

If `DataDefinition` specifies columns explicitly, only process those columns:

```python
def _generate_data_definition(self, data, reserved_fields, service_columns):
    # If the user provided explicit column lists, only iterate over those
    columns_to_process = set()
    if self._data_definition:
        if self._data_definition.numerical_columns:
            columns_to_process.update(self._data_definition.numerical_columns)
        if self._data_definition.categorical_columns:
            columns_to_process.update(self._data_definition.categorical_columns)
        # ... etc for other column types

    # Fall back to all columns only if no explicit definition was provided
    if not columns_to_process:
        columns_to_process = set(data.columns)

    for column in columns_to_process:
        ...  # existing logic

Alternatively, add a parameter like `strict=True` to `DataDefinition` that enforces this behavior for users who want it.
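
To make the `strict` alternative more concrete, here is a rough sketch of the column-selection step as a standalone function. The `strict` flag, the function name `resolve_columns`, and the non-strict default are all assumptions for illustration, not existing Evidently API:

```python
def resolve_columns(data_columns, declared, strict=False):
    """Pick which columns to run type inference on.

    data_columns: column names present in the DataFrame, in order.
    declared: column names listed in the DataDefinition (may be empty).
    strict: hypothetical flag; when True, only declared columns are processed.
    """
    declared = set(declared)
    if strict and declared:
        # Keep the DataFrame's column order; ignore everything undeclared.
        return [name for name in data_columns if name in declared]
    # Non-strict: process every column, as described under "Actual Behavior".
    return list(data_columns)


columns = ["feature_a", "feature_b", "internal_col"]
declared = ["feature_a", "feature_b"]
print(resolve_columns(columns, declared, strict=True))   # ['feature_a', 'feature_b']
print(resolve_columns(columns, declared, strict=False))  # ['feature_a', 'feature_b', 'internal_col']
```

Making it opt-in would keep existing behavior for anyone relying on auto-inference of undeclared columns.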
