Open
Description
Context
Many product catalogs contain inconsistencies in how attribute values are written, such as "Red", "red", "RED", or "rouge". These inconsistencies harm the quality of filters, facets, analytics, and search relevance.
Today, this creates friction and frustration: although data quality isn’t Algolia’s direct responsibility, the platform is often perceived as “not smart enough” to unify and make sense of inconsistent data. MCP can change that by automating the analysis and offering safe, scalable corrections based on actual usage.
Opportunity
This is a high-leverage opportunity for MCP to:
- Detect attributes that suffer from inconsistent or overly diverse values.
- Cluster variant values into canonical groups.
- Suggest normalization mappings in a structured format.
- Propose or preview an enrichment or transformation config to apply these mappings.
The outcome: cleaner data, better UX, and smarter merchandising, with minimal manual effort.
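As a rough illustration of the detection step, the sketch below flags an attribute when its distinct values collapse after trimming and case folding. The function name `flag_inconsistent_attributes`, the `min_collapse` parameter, and the sample records are hypothetical; a real detector would likely combine several signals.

```python
from collections import defaultdict


def flag_inconsistent_attributes(records, min_collapse=1):
    """Return {attribute: reason} for string attributes whose distinct
    values collapse when compared trimmed and case-insensitively."""
    raw = defaultdict(set)
    for record in records:
        for attr, value in record.items():
            # Only plain string attributes are considered in this sketch.
            if attr != "objectID" and isinstance(value, str):
                raw[attr].add(value)

    flagged = {}
    for attr, values in raw.items():
        folded = {v.strip().casefold() for v in values}
        if len(values) - len(folded) >= min_collapse:
            flagged[attr] = (
                f"{len(values)} distinct values collapse to "
                f"{len(folded)} after case folding"
            )
    return flagged


# Hypothetical sample records, not taken from a real index.
records = [
    {"objectID": "001", "color": "Red", "brand": "Acme"},
    {"objectID": "002", "color": "red", "brand": "Acme"},
    {"objectID": "003", "color": "RED", "brand": "Globex"},
]
flagged = flag_inconsistent_attributes(records)
print(flagged)
```

Here only `color` is flagged, because `brand` has no case-only duplicates.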
Proposed behavior
The MCP node should:
- Analyze attributes with high cardinality and inconsistent formatting.
- Identify likely normalization candidates (e.g. color, size, brand).
- Cluster values using heuristics (e.g. casing, spelling distance, language overlap).
- Generate a normalization plan mapping variants to canonical values.
- Provide:
  - A justification of why each attribute was flagged.
  - A sample diff showing the proposed changes.
  - Safe next-step actions (e.g. generate transformer config, preview enrichment).
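The clustering heuristics listed above could be sketched roughly as follows. This is only one possible approach, not a proposed implementation: casing is handled with `casefold`, spelling distance with the standard library's `difflib.SequenceMatcher`, and language overlap with a tiny hand-written `TRANSLATIONS` table that is purely hypothetical. The function names and the `threshold` default are assumptions as well.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for "language overlap"; a real system would need
# a proper multilingual dictionary or model.
TRANSLATIONS = {"rouge": "red", "bleu": "blue"}


def canonical_key(value):
    """Trim, case-fold, then translate through the table when possible."""
    key = value.strip().casefold()
    return TRANSLATIONS.get(key, key)


def cluster_values(values, threshold=0.8):
    """Greedily group values whose keys are spelled similarly.

    The first key seen for a cluster becomes its canonical value, so the
    result depends on input order.
    """
    clusters = {}  # canonical value -> set of raw variants
    for value in values:
        key = canonical_key(value)
        # Reuse the first existing cluster that is close enough in spelling.
        match = next(
            (c for c in clusters
             if SequenceMatcher(None, c, key).ratio() >= threshold),
            key,
        )
        clusters.setdefault(match, set()).add(value)
    # Variants exclude the canonical value itself, since mapping it is a no-op.
    return [
        {"canonical": c, "variants": sorted(v - {c})}
        for c, v in clusters.items()
    ]


plan = cluster_values(["Red", "RED", "rouge", "Bleu", "Blue", "blue"])
print(plan)
```

The threshold would need tuning per attribute: too low and unrelated values merge, too high and typos are missed.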
Example prompt
Can you analyze my index and find facets or other kinds of attributes with inconsistent values? I'd like to normalize them if possible.
Expected output
{
  "attribute": "color",
  "reasonFlagged": "High number of variants with similar semantics and inconsistent formatting",
  "normalizationPlan": [
    {
      "canonical": "red",
      "variants": ["Red", "RED", "rouge"]
    },
    {
      "canonical": "blue",
      "variants": ["Bleu", "Blue", "blue"]
    }
  ],
  "previewDiff": [
    { "objectID": "001", "original": "Red", "suggested": "red" },
    { "objectID": "002", "original": "rouge", "suggested": "red" }
  ],
  "recommendedNextActions": [
    "🔧 Generate a transformer config to normalize 'color' values using this mapping",
    "🧪 Preview enrichment for 500 products to validate impact",
    "📘 Read the guide on attribute normalization best practices: https://algolia.com/doc/"
  ]
}
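To show how `previewDiff` might be derived from `normalizationPlan`, here is a minimal sketch that inverts the plan into a variant-to-canonical lookup and lists only the records that would change. The helper names `build_lookup` and `preview_diff` and the sample records are hypothetical.

```python
def build_lookup(plan):
    """Invert the plan into a {variant: canonical} mapping."""
    return {
        variant: group["canonical"]
        for group in plan
        for variant in group["variants"]
    }


def preview_diff(records, attribute, plan):
    """List the records whose attribute value would change, without
    modifying anything."""
    lookup = build_lookup(plan)
    diff = []
    for record in records:
        original = record.get(attribute)
        if not isinstance(original, str):
            continue  # this sketch skips missing or non-string values
        suggested = lookup.get(original, original)
        if suggested != original:
            diff.append({
                "objectID": record["objectID"],
                "original": original,
                "suggested": suggested,
            })
    return diff


plan = [
    {"canonical": "red", "variants": ["Red", "RED", "rouge"]},
    {"canonical": "blue", "variants": ["Bleu", "Blue", "blue"]},
]
# Hypothetical records; "003" is already canonical and is left out of the diff.
records = [
    {"objectID": "001", "color": "Red"},
    {"objectID": "002", "color": "rouge"},
    {"objectID": "003", "color": "red"},
]
diff = preview_diff(records, "color", plan)
print(diff)
```

Because the diff is computed without writing to the index, it can serve as the "safe" preview step before any transformation is configured.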
Value
- Enables clients to clean data at scale without manual audits.
- Improves consistency in filtering, facets, and UX.
- Gives CSMs and merchandisers actionable insights with direct execution paths.
- Shows Algolia as a proactive partner, not just a search engine.
Notes
- Could evolve to include confidence scores or user feedback on suggested clusters.
- Cluster logic could optionally use locale info or historical query logs for disambiguation.
- An Algolia transformation could be proposed or configured to clean the data on a regular schedule (e.g. daily).
- Pairing this with analytics (e.g. how often a variant is filtered) could guide prioritization.
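The prioritization idea in the last note could look something like the sketch below, which ranks clusters by how often their non-canonical variants are used as filters. The `filter_counts` numbers are invented for illustration, and `prioritize` is a hypothetical helper; how such counts would actually be retrieved from analytics is left open.

```python
def prioritize(plan, filter_counts):
    """Rank canonical values by total filter usage of their variants,
    most used first."""
    ranked = []
    for group in plan:
        usage = sum(filter_counts.get(v, 0) for v in group["variants"])
        ranked.append({"canonical": group["canonical"], "variantUsage": usage})
    return sorted(ranked, key=lambda item: item["variantUsage"], reverse=True)


plan = [
    {"canonical": "red", "variants": ["Red", "RED", "rouge"]},
    {"canonical": "blue", "variants": ["Bleu", "Blue"]},
]
# Made-up counts of how often each raw value was used as a filter.
filter_counts = {"Red": 120, "rouge": 15, "Bleu": 300}
priorities = prioritize(plan, filter_counts)
print(priorities)
```

With these made-up numbers, the `blue` cluster would be normalized first because its variants are filtered on more often.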