
Detect inconsistent attribute values and suggest normalization mappings #32

@emirbelkahia

Description

Context

Many product catalogs contain inconsistencies in how attribute values are written, such as "Red", "red", "RED", or "rouge" (French for "red"). These inconsistencies harm the quality of filters, facets, analytics, and search relevance.

Today, this creates friction and frustration: although data quality isn’t Algolia’s direct responsibility, the platform is often perceived as “not smart enough” to unify and make sense of inconsistent data. MCP can change that by automating the analysis and offering safe, scalable corrections based on actual usage.

Opportunity

This is a high-leverage opportunity for MCP to:

  1. Detect attributes that suffer from inconsistent or overly diverse values.
  2. Cluster variant values into canonical groups.
  3. Suggest normalization mappings in a structured format.
  4. Propose or preview an enrichment or transformation config to apply these mappings.

The outcome: cleaner data, better UX, and smarter merchandising, with minimal manual effort.
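A minimal sketch of step 1 (detection), assuming the records have already been fetched as plain Python dicts: it flags an attribute when a noticeable share of its distinct values collapse together once case and surrounding whitespace are ignored. The helper names, the record shape, and the 0.2 threshold are illustrative assumptions, not part of any Algolia or MCP API.

```python
from collections import defaultdict


def normalize_key(value: str) -> str:
    """Trivial normalization: trim surrounding whitespace and ignore case."""
    return value.strip().casefold()


def flag_inconsistent_attributes(records, min_collapse_ratio=0.2):
    """Return {attribute: reason} for string attributes whose distinct raw
    values collapse noticeably once case and whitespace are ignored.

    collapse ratio = 1 - distinct_normalized / distinct_raw
    """
    raw_values = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            # Only plain string attributes are inspected in this sketch.
            if attribute == "objectID" or not isinstance(value, str):
                continue
            raw_values[attribute].add(value)

    flagged = {}
    for attribute, values in raw_values.items():
        normalized = {normalize_key(v) for v in values}
        ratio = 1 - len(normalized) / len(values)
        if ratio >= min_collapse_ratio:
            flagged[attribute] = (
                f"{len(values)} distinct values collapse to {len(normalized)} "
                f"when case and whitespace are ignored (ratio {ratio:.2f})"
            )
    return flagged


records = [
    {"objectID": "001", "color": "Red", "brand": "Acme"},
    {"objectID": "002", "color": "RED", "brand": "Globex"},
    {"objectID": "003", "color": "red", "brand": "Initech"},
    {"objectID": "004", "color": "Blue", "brand": "Acme"},
]
print(flag_inconsistent_attributes(records))
# {'color': '4 distinct values collapse to 2 when case and whitespace are ignored (ratio 0.50)'}
```

This signal only catches casing and spacing variants; "rouge" would not be grouped with "red" without the clustering heuristics described under Proposed behavior. A real implementation would probably also skip attributes with very few distinct values, where a single duplicate can swing the ratio.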

Proposed behavior

The MCP node should:

  • Analyze attributes with high cardinality and inconsistent formatting.
  • Identify likely normalization candidates (e.g. color, size, brand).
  • Cluster values using heuristics (e.g. casing, spelling distance, cross-language equivalents such as "rouge" and "red").
  • Generate a normalization plan mapping variants to canonical values.
  • Provide:
    • A justification of why each attribute was flagged.
    • A sample diff showing the proposed changes.
    • Safe next-step actions (e.g. generate transformer config, preview enrichment).
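One way the clustering and plan-generation steps could fit together, sketched under several assumptions: value frequencies per attribute are already available (for example from facet counts), the translation table is a tiny hard-coded stand-in for real locale data, and the canonical value is simply the case-folded form of the most frequent spelling in each cluster. `TRANSLATIONS`, `levenshtein`, and `build_normalization_plan` are hypothetical names for this example.

```python
from collections import Counter

# Hypothetical cross-language lookup used for the "language overlap" heuristic.
# A real implementation might rely on locale data or a language model instead.
TRANSLATIONS = {"rouge": "red", "rot": "red", "bleu": "blue", "blau": "blue"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def build_normalization_plan(value_counts, max_distance=1, min_fuzzy_length=4):
    """Cluster raw values and use each cluster's key as its canonical value.

    Heuristics, in order: case-folding, translation lookup, then merging into an
    existing cluster whose key is within `max_distance` edits. Short values skip
    the fuzzy step so that sizes like "S" and "M" are never merged.
    """
    clusters = {}  # canonical key -> Counter of raw spellings
    # Most frequent values are processed first, so they become cluster keys.
    for raw, count in sorted(value_counts.items(), key=lambda kv: -kv[1]):
        key = raw.strip().casefold()
        key = TRANSLATIONS.get(key, key)
        match = None
        if len(key) >= min_fuzzy_length:
            match = next(
                (k for k in clusters
                 if len(k) >= min_fuzzy_length and levenshtein(k, key) <= max_distance),
                None,
            )
        cluster = clusters.setdefault(match if match is not None else key, Counter())
        cluster[raw] += count
    return [
        {"canonical": key, "variants": sorted(members)}
        for key, members in clusters.items()
    ]


value_counts = {"Red": 120, "red": 40, "RED": 15, "rouge": 8,
                "Blue": 90, "blue": 30, "Bleu": 5, "Grey": 50, "Gray": 20}
for entry in build_normalization_plan(value_counts):
    print(entry)
# {'canonical': 'red', 'variants': ['RED', 'Red', 'red', 'rouge']}
# {'canonical': 'blue', 'variants': ['Bleu', 'Blue', 'blue']}
# {'canonical': 'grey', 'variants': ['Gray', 'Grey']}
```

Unlike the `red` entry in the expected output, this sketch lists the canonical spelling among its variants, which keeps the mapping total. The fixed edit-distance threshold and minimum length are guesses; the confidence scores mentioned in the Notes would be a natural replacement for hard cut-offs.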

Example prompt

Can you analyze my index and find facets or other kinds of attributes with inconsistent values? I'd like to normalize them if possible.

Expected output

{
  "attribute": "color",
  "reasonFlagged": "High number of variants with similar semantics and inconsistent formatting",
  "normalizationPlan": [
    {
      "canonical": "red",
      "variants": ["Red", "RED", "rouge"]
    },
    {
      "canonical": "blue",
      "variants": ["Bleu", "Blue", "blue"]
    }
  ],
  "previewDiff": [
    { "objectID": "001", "original": "Red", "suggested": "red" },
    { "objectID": "002", "original": "rouge", "suggested": "red" }
  ],
  "recommendedNextActions": [
    "🔧 Generate a transformer config to normalize 'color' values using this mapping",
    "🧪 Preview enrichment for 500 products to validate impact",
    "📘 Read the guide on attribute normalization best practices: https://algolia.com/doc/"
  ]
}
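The `previewDiff` field can be produced mechanically from a plan: invert it into a variant-to-canonical lookup and compare that against each record. The sketch below is a hypothetical dry-run helper over records already held in memory; it does not call the Algolia API or write anything, and the `limit` parameter is an illustrative cap.

```python
def build_preview_diff(records, attribute, plan, limit=100):
    """Dry run: list how `attribute` would change under `plan`, without writing anything."""
    # Invert the plan into a raw-variant -> canonical lookup.
    mapping = {
        variant: entry["canonical"]
        for entry in plan
        for variant in entry["variants"]
    }
    diff = []
    for record in records:
        original = record.get(attribute)
        if not isinstance(original, str):
            continue  # missing or non-string values are left out of this sketch
        suggested = mapping.get(original)
        if suggested is not None and suggested != original:
            diff.append({"objectID": record["objectID"],
                         "original": original,
                         "suggested": suggested})
            if len(diff) >= limit:
                break
    return diff


plan = [
    {"canonical": "red", "variants": ["Red", "RED", "rouge"]},
    {"canonical": "blue", "variants": ["Bleu", "Blue", "blue"]},
]
records = [
    {"objectID": "001", "color": "Red"},
    {"objectID": "002", "color": "rouge"},
    {"objectID": "003", "color": "red"},
    {"objectID": "004", "color": "Blue"},
]
print(build_preview_diff(records, "color", plan, limit=2))
# [{'objectID': '001', 'original': 'Red', 'suggested': 'red'},
#  {'objectID': '002', 'original': 'rouge', 'suggested': 'red'}]
```

The same inverted lookup is what a transformer config would ultimately need to apply; how that config is expressed in Algolia's transformation tooling is left open here.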

Value

  • Enables clients to clean data at scale without manual audits.
  • Improves consistency in filtering, facets, and UX.
  • Gives CSMs and merchandisers actionable insights with direct execution paths.
  • Shows Algolia as a proactive partner, not just a search engine.

Notes

  • Could evolve to include confidence scores or user feedback on suggested clusters.
  • Cluster logic could optionally use locale info or historical query logs for disambiguation.
  • An Algolia transformation could be proposed or configured to clean the data on a regular basis (e.g. daily).
  • Pairing this with analytics (e.g. how often a variant is filtered) could guide prioritization.
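For the prioritization idea in the last note, one simple score is how much filter traffic lands on non-canonical variants, since that is where shoppers' intent is split across facet values. The sketch assumes a hypothetical `filter_counts` mapping from raw value to number of filter uses (for example exported from analytics); it does not call any specific Algolia Analytics endpoint, and `prioritize_clusters` is an illustrative name.

```python
def prioritize_clusters(plan, filter_counts):
    """Rank clusters by how much filter traffic lands on non-canonical variants.

    `filter_counts` is a hypothetical {raw value: number of filter uses} mapping.
    """
    ranked = []
    for entry in plan:
        canonical = entry["canonical"]
        # Include the canonical value even if the plan does not list it as a variant.
        values = set(entry["variants"]) | {canonical}
        total = sum(filter_counts.get(v, 0) for v in values)
        stray = total - filter_counts.get(canonical, 0)
        share = stray / total if total else 0.0
        ranked.append({"canonical": canonical,
                       "strayFilters": stray,
                       "strayShare": round(share, 2)})
    # Clusters with the most misdirected filter usage come first.
    ranked.sort(key=lambda item: item["strayFilters"], reverse=True)
    return ranked


plan = [
    {"canonical": "red", "variants": ["Red", "RED", "rouge"]},
    {"canonical": "blue", "variants": ["Bleu", "Blue", "blue"]},
]
filter_counts = {"red": 400, "Red": 300, "RED": 20, "rouge": 80,
                 "blue": 500, "Blue": 50, "Bleu": 10}
print(prioritize_clusters(plan, filter_counts))
# [{'canonical': 'red', 'strayFilters': 400, 'strayShare': 0.5},
#  {'canonical': 'blue', 'strayFilters': 60, 'strayShare': 0.11}]
```

Ranking by absolute stray traffic favors high-volume attributes; sorting by `strayShare` instead would surface smaller attributes that are badly fragmented.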
