Detect inconsistent attribute values and suggest normalization mappings

## Context

Many product catalogs contain inconsistencies in how attribute values are written, such as `"Red"`, `"red"`, `"RED"`, or `"rouge"`. These inconsistencies harm the quality of filters, facets, analytics, and search relevance.

Today, this creates friction and frustration: although data quality isn’t Algolia’s direct responsibility, the platform is often perceived as “not smart enough” to unify and make sense of inconsistent data. MCP can change that by automating the analysis and offering safe, scalable corrections based on actual usage.

## Opportunity

This is a high-leverage opportunity for MCP to:
1. **Detect** attributes that suffer from inconsistent or overly diverse values.
2. **Cluster** variant values into canonical groups.
3. **Suggest** normalization mappings in a structured format.
4. **Propose or preview** an enrichment or transformation config to apply these mappings.

The outcome: cleaner data, better UX, and smarter merchandising, with minimal manual effort.
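
As a rough illustration of the **Detect** step, the sketch below scores each string attribute by how many of its distinct values collapse once case and whitespace are folded. The record shape (plain dicts with an `objectID`), the function names, and the `0.2` threshold are all assumptions made for this example; they are not part of MCP or any Algolia API.

```python
from collections import defaultdict


def inconsistency_score(values):
    """Share of distinct raw values that disappear after folding case/whitespace."""
    raw = set(values)
    if not raw:
        return 0.0
    folded = {" ".join(v.split()).casefold() for v in raw}
    return 1 - len(folded) / len(raw)


def flag_attributes(records, threshold=0.2):
    """Return {attribute: score} for attributes whose score reaches the threshold.

    Assumes each record is a dict; only top-level string values are inspected.
    """
    values_by_attribute = defaultdict(list)
    for record in records:
        for name, value in record.items():
            if name != "objectID" and isinstance(value, str):
                values_by_attribute[name].append(value)
    scores = {n: inconsistency_score(v) for n, v in values_by_attribute.items()}
    return {n: s for n, s in scores.items() if s >= threshold}
```

A score near 0 means formatting is already consistent; a higher score suggests the attribute is a normalization candidate. This heuristic only catches casing and whitespace differences, so spelling and language variants would need additional signals.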

## Proposed behavior

The MCP node should:
- Analyze attributes with high cardinality and inconsistent formatting.
- Identify likely normalization candidates (e.g. color, size, brand).
- Cluster values using heuristics (e.g. casing, spelling distance, language overlap).
- Generate a normalization plan mapping variants to canonical values.
- Provide:
  - A justification of why each attribute was flagged.
  - A sample diff showing the proposed changes.
  - Safe next-step actions (e.g. generate transformer config, preview enrichment).
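
One possible way to approximate the clustering heuristics listed above (casing, spelling distance, language overlap) is sketched here. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for a spelling-distance measure, and a tiny hard-coded `TRANSLATIONS` table as a placeholder for real locale data. Both choices, along with the `0.85` threshold, are assumptions for illustration only.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical translation table; a real implementation would need proper
# locale-aware data. Keys and values are already case-folded.
TRANSLATIONS = {"rouge": "red", "bleu": "blue"}


def canonical_key(value):
    """Fold case and whitespace, then apply the translation table."""
    folded = " ".join(value.split()).casefold()
    return TRANSLATIONS.get(folded, folded)


def cluster_values(values, similarity_threshold=0.85):
    """Group raw attribute values under a canonical key."""
    clusters = defaultdict(set)
    for raw in values:
        key = canonical_key(raw)
        # Reuse the first existing cluster whose key is spelled closely enough;
        # otherwise start a new cluster under this key.
        match = next(
            (k for k in clusters
             if SequenceMatcher(None, k, key).ratio() >= similarity_threshold),
            key,
        )
        clusters[match].add(raw)
    return {k: sorted(v) for k, v in clusters.items()}
```

Because the first matching cluster wins, results depend on input order; a production version would likely want a more stable clustering strategy and a review step before any mapping is applied.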

## Example prompt

```
Can you analyze my index and find facets or other kinds of attributes with inconsistent values? I'd like to normalize them if possible.
```

## Expected output

```json
{
  "attribute": "color",
  "reasonFlagged": "High number of variants with similar semantics and inconsistent formatting",
  "normalizationPlan": [
    {
      "canonical": "red",
      "variants": ["Red", "RED", "rouge"]
    },
    {
      "canonical": "blue",
      "variants": ["Bleu", "Blue"]
    }
  ],
  "previewDiff": [
    { "objectID": "001", "original": "Red", "suggested": "red" },
    { "objectID": "002", "original": "rouge", "suggested": "red" }
  ],
  "recommendedNextActions": [
    "🔧 Generate a transformer config to normalize 'color' values using this mapping",
    "🧪 Preview enrichment for 500 products to validate impact",
    "📘 Read the guide on attribute normalization best practices: https://algolia.com/doc/"
  ]
}
```
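
To show how a `previewDiff` like the one above could be derived from a `normalizationPlan`, here is a minimal sketch. It assumes records are plain dicts carrying an `objectID` and a single string value for the attribute; the helper names `build_lookup` and `preview_diff` are hypothetical and not an existing API.

```python
def build_lookup(plan):
    """Flatten the plan into a variant -> canonical dictionary."""
    lookup = {}
    for group in plan["normalizationPlan"]:
        for variant in group["variants"]:
            lookup[variant] = group["canonical"]
    return lookup


def preview_diff(plan, records):
    """List the records whose attribute value would change under the plan."""
    attribute = plan["attribute"]
    lookup = build_lookup(plan)
    diff = []
    for record in records:
        original = record.get(attribute)
        # Assumption: only single string values are handled; lists are skipped.
        if not isinstance(original, str):
            continue
        suggested = lookup.get(original, original)
        if suggested != original:
            diff.append({
                "objectID": record["objectID"],
                "original": original,
                "suggested": suggested,
            })
    return diff
```

The function never modifies the records, which keeps the preview safe to run before any transformation is configured.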

## Value

- Enables clients to clean data at scale without manual audits.
- Improves consistency in filtering, facets, and UX.
- Gives CSMs and merchandisers actionable insights with direct execution paths.
- Shows Algolia as a proactive partner, not just a search engine.

## Notes

- Could evolve to include confidence scores or user feedback on suggested clusters.
- Cluster logic could optionally use locale info or historical query logs for disambiguation.
- An Algolia transformation could be proposed or configured to clean the data on a regular schedule (e.g. daily).
- Pairing this with analytics (e.g. how often a variant is filtered) could guide prioritization.
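
The prioritization idea in the last note could look roughly like the sketch below, which ranks clusters by how often their non-canonical variants are used as filters. The `filter_counts` dictionary is a hypothetical input (variant value to filter usage count); how such counts would actually be obtained from analytics is left open here.

```python
def prioritize(groups, filter_counts):
    """Rank clusters by total filter usage of their non-canonical variants."""
    ranked = []
    for group in groups:
        affected = sum(
            filter_counts.get(variant, 0)
            for variant in group["variants"]
            if variant != group["canonical"]  # canonical usage needs no fix
        )
        ranked.append((group["canonical"], affected))
    # Highest affected usage first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```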
