asya999 left a comment
A few comments on the high-value ones
from: "reviews",
let: { pid: "$_id" },
pipeline: [
{ $match: { $expr: { $eq: ["$productId", "$$pid"] } } },
This should use the localField/foreignField syntax for the join equality rather than $expr.
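For reference, here is a minimal sketch of the suggested form. It assumes MongoDB 5.0+, where $lookup accepts localField/foreignField together with an optional pipeline, and it reuses the reviews.productId field from the excerpt. The createdAt field in the usage line is hypothetical. The stage is built as a plain object so it can be passed to db.products.aggregate(...).

```javascript
// Build a $lookup stage that joins products to reviews on equality.
// localField/foreignField state the join key directly, instead of
// declaring a variable with let and comparing it inside $expr.
function reviewsLookupStage(extraStages = []) {
  return {
    $lookup: {
      from: "reviews",
      localField: "_id",         // products._id
      foreignField: "productId", // reviews.productId
      // Optional stages run on the matched reviews. Combining
      // localField/foreignField with pipeline requires MongoDB 5.0+.
      ...(extraStages.length > 0 ? { pipeline: extraStages } : {}),
      as: "reviews",
    },
  };
}

// Usage: keep only the five most recent reviews per product
// ("createdAt" is a hypothetical field used for illustration).
const stage = reviewsLookupStage([{ $sort: { createdAt: -1 } }, { $limit: 5 }]);
console.log(JSON.stringify(stage));
```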
skills/mongodb-schema-design/references/antipattern-massive-arrays.md (outdated, resolved)
skills/mongodb-schema-design/references/antipattern-schema-drift.md (outdated, resolved)
@@ -0,0 +1,103 @@
---
title: Avoid Unbounded Arrays
Is this significantly different enough from massive arrays?
I don't know, but when we did the work around this anti-pattern in Compass, it always seemed like an issue to me that we talked about "unbounded" arrays but really meant "large" arrays (since we weren't actually observing whether they were growing or not).
I've been thinking about this. Unbounded means that, conceptually, the array never stops growing (a list of my orders, for example). But it's true that if an array is likely to stay small (a list of my shipping addresses), it isn't as big a problem.
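One common way to keep a conceptually unbounded list bounded in the parent document is $push with $each and $slice, which trims the array on every write while the full history lives in a separate collection. This is a minimal sketch; the recentOrders field, the cap of 10, and the helper names are hypothetical.

```javascript
// Build an update that appends one item and keeps only the newest `max`
// entries. A negative $slice keeps the last N elements of the array.
function cappedPushUpdate(field, item, max) {
  if (!Number.isInteger(max) || max <= 0) {
    throw new RangeError("max must be a positive integer");
  }
  return { $push: { [field]: { $each: [item], $slice: -max } } };
}

// In-memory mirror of the same semantics, for illustration only.
function applyCappedPush(array, item, max) {
  return [...array, item].slice(-max);
}

// Usage (mongosh, hypothetical names):
// db.users.updateOne({ _id: userId },
//   cappedPushUpdate("recentOrders", orderSummary, 10))
```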
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Adds a new MongoDB Agent skill focused on schema design patterns and anti-patterns, providing reference markdown rules that can be triggered during schema modeling, migrations, and performance troubleshooting.
Changes:
- Introduces the mongodb-schema-design skill definition with a rule catalog and trigger phrases.
- Adds reference documents covering schema anti-patterns, fundamentals, relationship modeling, design patterns, and schema validation.
- Includes “Verify with” diagnostic snippets and official MongoDB documentation links throughout references.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| skills/mongodb-schema-design/SKILL.md | Defines the skill, trigger phrases, and an index of all reference rules |
| skills/mongodb-schema-design/references/antipattern-bloated-documents.md | Guidance to split hot/cold fields to avoid oversized hot-path documents |
| skills/mongodb-schema-design/references/antipattern-excessive-lookups.md | Recommends denormalization/extended refs to reduce frequent $lookup on hot paths |
| skills/mongodb-schema-design/references/antipattern-massive-arrays.md | Caps large arrays and suggests recent-subset + overflow collection pattern |
| skills/mongodb-schema-design/references/antipattern-schema-drift.md | Explains schema drift risks and enforcing structure with validation/versioning |
| skills/mongodb-schema-design/references/antipattern-unbounded-arrays.md | Avoids unbounded arrays to prevent 16MB risk and performance degradation |
| skills/mongodb-schema-design/references/antipattern-unnecessary-collections.md | Discourages collection-per-partition designs; prefers indexing/bucketing/time-series |
| skills/mongodb-schema-design/references/antipattern-unnecessary-indexes.md | Index audit/removal process (hide → monitor → drop) to reduce write/cache overhead |
| skills/mongodb-schema-design/references/fundamental-16mb-awareness.md | Explains the 16MB BSON limit and patterns to design around it |
| skills/mongodb-schema-design/references/fundamental-data-together.md | Core modeling principle: store data accessed together; reduce app-side joins |
| skills/mongodb-schema-design/references/fundamental-document-model.md | Avoids 1:1 SQL table mapping; advocates aggregate-oriented documents |
| skills/mongodb-schema-design/references/fundamental-embed-vs-reference.md | Decision framework for embedding vs referencing with examples and diagnostics |
| skills/mongodb-schema-design/references/fundamental-schema-validation.md | Introduces JSON Schema validation and phased rollout approach |
| skills/mongodb-schema-design/references/pattern-approximation.md | Batching counters to reduce write load when exact real-time counts aren’t required |
| skills/mongodb-schema-design/references/pattern-archive.md | Archival strategies using $merge, deletion, scheduling, and storage options |
| skills/mongodb-schema-design/references/pattern-attribute.md | Key-value attribute array pattern for sparse/variable fields and indexing |
| skills/mongodb-schema-design/references/pattern-bucket.md | Bucket pattern for bounded grouping aligned to access patterns (pagination/time windows) |
| skills/mongodb-schema-design/references/pattern-computed.md | Pre-computing expensive aggregations and maintaining materialized results |
| skills/mongodb-schema-design/references/pattern-document-versioning.md | Full snapshot revision history pattern for audit/compliance/rollback |
| skills/mongodb-schema-design/references/pattern-extended-reference.md | Caching selected referenced fields to reduce repeated $lookup |
| skills/mongodb-schema-design/references/pattern-outlier.md | Outlier isolation pattern for exceptional documents with very large arrays |
| skills/mongodb-schema-design/references/pattern-polymorphic.md | Single-collection polymorphic pattern with discriminator + indexing/validation strategies |
| skills/mongodb-schema-design/references/pattern-schema-versioning.md | Schema evolution using schemaVersion and online migration approaches |
| skills/mongodb-schema-design/references/pattern-subset.md | Hot/cold split (subset pattern) to improve working-set efficiency |
| skills/mongodb-schema-design/references/pattern-time-series-collections.md | Time series collections guidance (metaField, granularity, TTL, sharding notes) |
| skills/mongodb-schema-design/references/relationship-many-to-many.md | Many-to-many modeling strategies (directional embedding, bidirectional, reference-only) |
| skills/mongodb-schema-design/references/relationship-one-to-few.md | Embedding bounded arrays for 1:few relationships with validation limits |
| skills/mongodb-schema-design/references/relationship-one-to-many.md | Referencing pattern for 1:many with indexing and orphan checks |
| skills/mongodb-schema-design/references/relationship-one-to-one.md | Embedding one-to-one data to reduce round-trips and preserve atomicity |
| skills/mongodb-schema-design/references/relationship-one-to-squillions.md | “One-to-squillions” pattern: separate collection + summaries to avoid unbounded arrays |
| skills/mongodb-schema-design/references/relationship-tree-structures.md | Tree/hierarchy modeling patterns and comparisons (parent refs, ancestors, paths, nested sets) |
| skills/mongodb-schema-design/references/validation-action-levels.md | Guidance for choosing validationLevel/validationAction during migrations |
| skills/mongodb-schema-design/references/validation-json-schema.md | JSON Schema validation structures, nested/array/conditional validation examples |
| skills/mongodb-schema-design/references/validation-rollout-strategy.md | Staged validation rollout plan (warn → backfill → error) with downgrade considerations |
skills/mongodb-schema-design/references/relationship-one-to-many.md (outdated, resolved)
skills/mongodb-schema-design/references/validation-json-schema.md (outdated, resolved)
Pull request overview
Adds a new MongoDB Schema Design skill package with reference material covering schema fundamentals, relationship modeling patterns, design patterns, validation rollout guidance, and common anti-patterns.
Changes:
- Introduces skills/mongodb-schema-design/SKILL.md to define triggers/scope and provide a categorized index of rules.
- Adds a comprehensive set of reference docs under skills/mongodb-schema-design/references/ covering patterns, anti-patterns, relationships, and validation.
- Provides “verify with” sections and example commands/snippets throughout to help operationalize guidance.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| skills/mongodb-schema-design/SKILL.md | Defines the skill metadata, trigger phrases, and an index of all rule documents. |
| skills/mongodb-schema-design/references/antipattern-bloated-documents.md | Adds guidance for splitting hot/cold fields to reduce working set pressure. |
| skills/mongodb-schema-design/references/antipattern-excessive-lookups.md | Adds guidance to reduce frequent $lookup on hot paths via denormalization/extended refs. |
| skills/mongodb-schema-design/references/antipattern-massive-arrays.md | Adds guidance to cap/overflow large arrays even when bounded. |
| skills/mongodb-schema-design/references/antipattern-schema-drift.md | Adds guidance to prevent inconsistent document shapes via validation/versioning. |
| skills/mongodb-schema-design/references/antipattern-unbounded-arrays.md | Adds guidance to avoid unbounded arrays that risk 16MB and degrade updates/indexes. |
| skills/mongodb-schema-design/references/antipattern-unnecessary-collections.md | Adds guidance to avoid “collection-per-partition” designs; prefer indexes/bucketing/TS. |
| skills/mongodb-schema-design/references/antipattern-unnecessary-indexes.md | Adds guidance for auditing/hiding/dropping redundant or unused indexes. |
| skills/mongodb-schema-design/references/fundamental-16mb-awareness.md | Adds guidance on the 16MB BSON limit and mitigation strategies (ref, GridFS, monitoring). |
| skills/mongodb-schema-design/references/fundamental-data-together.md | Adds “accessed together stored together” framing and examples. |
| skills/mongodb-schema-design/references/fundamental-document-model.md | Adds “don’t 1:1 map SQL tables” guidance and migration framing. |
| skills/mongodb-schema-design/references/fundamental-embed-vs-reference.md | Adds a decision framework and examples for embedding vs referencing. |
| skills/mongodb-schema-design/references/fundamental-schema-validation.md | Adds baseline schema validation guidance and how to adopt on existing collections. |
| skills/mongodb-schema-design/references/pattern-approximation.md | Adds batching/approximate counter pattern to reduce hot write load. |
| skills/mongodb-schema-design/references/pattern-archive.md | Adds archive pattern options and operational guidance for moving cold/historical data. |
| skills/mongodb-schema-design/references/pattern-attribute.md | Adds attribute pattern for sparse/variable fields with a single multikey index. |
| skills/mongodb-schema-design/references/pattern-bucket.md | Adds bucket pattern guidance for grouping/pagination-style access. |
| skills/mongodb-schema-design/references/pattern-computed.md | Adds computed-value pattern guidance to avoid repeated expensive aggregations. |
| skills/mongodb-schema-design/references/pattern-document-versioning.md | Adds document version history pattern (revisions collection) distinct from schema versioning. |
| skills/mongodb-schema-design/references/pattern-extended-reference.md | Adds extended reference pattern for caching frequently-read referenced fields. |
| skills/mongodb-schema-design/references/pattern-outlier.md | Adds outlier pattern guidance to isolate exceptional large documents/arrays. |
| skills/mongodb-schema-design/references/pattern-polymorphic.md | Adds polymorphic single-collection discriminator pattern and indexing/validation considerations. |
| skills/mongodb-schema-design/references/pattern-schema-versioning.md | Adds schema versioning guidance for online migrations and backwards compatibility. |
| skills/mongodb-schema-design/references/pattern-subset.md | Adds subset (hot/cold split) pattern for cache efficiency. |
| skills/mongodb-schema-design/references/pattern-time-series-collections.md | Adds time series collection guidance (metaField, granularity, indexing, sharding notes). |
| skills/mongodb-schema-design/references/relationship-many-to-many.md | Adds many-to-many modeling strategies (embed directionally, bidirectional embed, reference-only). |
| skills/mongodb-schema-design/references/relationship-one-to-few.md | Adds one-to-few embedding guidance with enforcement strategies. |
| skills/mongodb-schema-design/references/relationship-one-to-many.md | Adds one-to-many referencing guidance with indexing and integrity checks. |
| skills/mongodb-schema-design/references/relationship-one-to-one.md | Adds one-to-one embedding guidance and when to avoid it. |
| skills/mongodb-schema-design/references/relationship-one-to-squillions.md | Adds “one-to-squillions” guidance to avoid unbounded arrays; keep summaries in parent. |
| skills/mongodb-schema-design/references/relationship-tree-structures.md | Adds multiple tree modeling patterns and comparison guidance. |
| skills/mongodb-schema-design/references/validation-action-levels.md | Adds guidance for choosing validationLevel/validationAction, including staged rollout. |
| skills/mongodb-schema-design/references/validation-json-schema.md | Adds JSON schema validation guidance with examples for nested docs/arrays/conditionals. |
| skills/mongodb-schema-design/references/validation-rollout-strategy.md | Adds a “warn → error” production rollout strategy including downgrade considerations. |
Comments suppressed due to low confidence (3)
skills/mongodb-schema-design/references/validation-json-schema.md:1 - products is referenced without being defined in this snippet, so the example as written won’t run and may distract from the validation point. Consider either defining products (e.g., from a query) or rewriting the “Later in your application” section to avoid using an undeclared variable.
skills/mongodb-schema-design/references/validation-json-schema.md:1 - This example mixes bsonType (object) with JSON Schema type (string/number). Even if intentional for numeric widening, the mix is easy to misread as an inconsistency or error. Consider standardizing on bsonType everywhere, and for numeric fields representing “any numeric BSON type” via an explicit bsonType array (or adding a short note explaining why type is being used here).
skills/mongodb-schema-design/references/relationship-tree-structures.md:1 - Nearby examples use async/await for findOne(), but this snippet uses a synchronous findOne() result immediately. To avoid confusing readers who use the Node driver (where findOne() is async), consider aligning these examples on a single style (either make this snippet explicitly mongosh style, or add await and mark it as driver code).
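A minimal sketch of the bsonType suggestion from the second suppressed comment, using bsonType throughout. The field names (name, price, stock) are hypothetical. The "number" alias matches double, int, long, and decimal, so it covers numeric widening without mixing in JSON Schema type; the explicit array on stock is the equivalent long form.

```javascript
// Validator that uses bsonType everywhere. "number" is the $type alias
// for double, int, long, and decimal; an explicit list works too.
const productValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "price"],
    properties: {
      name: { bsonType: "string" },
      price: { bsonType: "number", minimum: 0 },
      // Equivalent explicit form of the same numeric check:
      stock: { bsonType: ["int", "long", "double", "decimal"], minimum: 0 },
    },
  },
};

// Usage (mongosh): db.createCollection("products", { validator: productValidator })
```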
skills/mongodb-schema-design/references/pattern-time-series-collections.md (resolved)
Pull request overview
Adds a new MongoDB Schema Design skill to the skills/ catalog, providing a structured set of schema design fundamentals, relationship patterns, design patterns, schema validation guidance, and anti-patterns to improve LLM guidance for MongoDB data modeling.
Changes:
- Introduces skills/mongodb-schema-design/SKILL.md to define the skill and index the rule/reference set.
- Adds 33 reference documents covering fundamentals, relationship modeling, design patterns, schema validation, and common anti-patterns.
- Includes “Verify with” sections and operational guidance intended to help users validate recommendations against real workloads.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| skills/mongodb-schema-design/SKILL.md | Skill definition + categorized index of all reference rules |
| skills/mongodb-schema-design/references/antipattern-bloated-documents.md | Anti-pattern guidance for hot/cold splitting and cache efficiency |
| skills/mongodb-schema-design/references/antipattern-excessive-lookups.md | Anti-pattern guidance to reduce repeated $lookup on hot paths |
| skills/mongodb-schema-design/references/antipattern-massive-arrays.md | Anti-pattern guidance for overly large (even if bounded) arrays |
| skills/mongodb-schema-design/references/antipattern-schema-drift.md | Anti-pattern guidance to prevent schema drift via validation/versioning |
| skills/mongodb-schema-design/references/antipattern-unbounded-arrays.md | Anti-pattern guidance to avoid unbounded arrays and 16MB risk |
| skills/mongodb-schema-design/references/antipattern-unnecessary-collections.md | Anti-pattern guidance to avoid over-partitioning into many collections |
| skills/mongodb-schema-design/references/antipattern-unnecessary-indexes.md | Anti-pattern guidance for auditing/removing redundant/unused indexes |
| skills/mongodb-schema-design/references/fundamental-16mb-awareness.md | Fundamental constraints and mitigation patterns for the 16MB limit |
| skills/mongodb-schema-design/references/fundamental-data-together.md | Core principle: store co-accessed data together; query-shaped docs |
| skills/mongodb-schema-design/references/fundamental-document-model.md | Fundamental shift away from SQL table-mapping toward aggregates |
| skills/mongodb-schema-design/references/fundamental-embed-vs-reference.md | Embed vs reference decision framework with examples/verification |
| skills/mongodb-schema-design/references/fundamental-schema-validation.md | Baseline guidance for using MongoDB schema validation |
| skills/mongodb-schema-design/references/pattern-approximation.md | Pattern for batching high-frequency counters (approximate values) |
| skills/mongodb-schema-design/references/pattern-archive.md | Pattern for moving historical data out of hot collections |
| skills/mongodb-schema-design/references/pattern-attribute.md | Pattern for sparse/variable attributes using {k,v} arrays |
| skills/mongodb-schema-design/references/pattern-bucket.md | Pattern for bucketing series data into bounded “page-like” docs |
| skills/mongodb-schema-design/references/pattern-computed.md | Pattern for precomputing expensive aggregations for fast reads |
| skills/mongodb-schema-design/references/pattern-document-versioning.md | Pattern for storing full historical snapshots in a revisions collection |
| skills/mongodb-schema-design/references/pattern-extended-reference.md | Pattern for denormalizing display fields to reduce $lookup |
| skills/mongodb-schema-design/references/pattern-outlier.md | Pattern for isolating exceptional large documents/arrays into overflow |
| skills/mongodb-schema-design/references/pattern-polymorphic.md | Pattern for heterogeneous docs with a discriminator and index strategies |
| skills/mongodb-schema-design/references/pattern-schema-versioning.md | Pattern for evolving schemas safely with schemaVersion |
| skills/mongodb-schema-design/references/pattern-subset.md | Pattern for separating hot vs cold fields into different collections |
| skills/mongodb-schema-design/references/pattern-time-series-collections.md | Pattern guidance for MongoDB time series collections |
| skills/mongodb-schema-design/references/relationship-many-to-many.md | Modeling options for many-to-many with primary query direction |
| skills/mongodb-schema-design/references/relationship-one-to-few.md | Modeling 1-to-few with embedded arrays + bounding strategies |
| skills/mongodb-schema-design/references/relationship-one-to-many.md | Modeling 1-to-many with references + indexing/verification |
| skills/mongodb-schema-design/references/relationship-one-to-one.md | Modeling 1-to-1 via embedding when co-accessed |
| skills/mongodb-schema-design/references/relationship-one-to-squillions.md | Modeling extreme fan-out (“squillions”) with refs + summaries |
| skills/mongodb-schema-design/references/relationship-tree-structures.md | Modeling hierarchical/tree data with multiple supported patterns |
| skills/mongodb-schema-design/references/validation-action-levels.md | Guidance on validationLevel/validationAction selection and rollout |
| skills/mongodb-schema-design/references/validation-json-schema.md | How to define validators with JSON Schema and test/verify them |
| skills/mongodb-schema-design/references/validation-rollout-strategy.md | Safe staged rollout strategy for validation (warn → error) |
{ $group: {
_id: null,
avgTotal: { $avg: "$total" },
avgImages: { $avg: "$imagesSize" },
If we are worried about this, I wonder whether $max would also be valuable: either $max of $bsonSize or $max of $size of the array. But the LLM can't run this (can it?). Do we provide it so that the LLM can suggest it to the user as a verification method?
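If this check stays, here is a sketch of the suggested $max variant, which reports worst cases alongside averages. It assumes MongoDB 4.4+ for $bsonSize and a hypothetical images array field. As the comment notes, the LLM would hand this to the user to run rather than execute it itself.

```javascript
// Aggregation that reports both the average and the maximum document size
// (via $bsonSize, MongoDB 4.4+) and array length. $ifNull substitutes []
// when the field is missing or null, because $size errors on non-arrays.
function sizeReportPipeline(arrayField) {
  const arrayLen = { $size: { $ifNull: [`$${arrayField}`, []] } };
  return [
    {
      $group: {
        _id: null,
        avgDocBytes: { $avg: { $bsonSize: "$$ROOT" } },
        maxDocBytes: { $max: { $bsonSize: "$$ROOT" } },
        avgArrayLen: { $avg: arrayLen },
        maxArrayLen: { $max: arrayLen },
      },
    },
  ];
}

// Usage (mongosh, hypothetical collection): db.products.aggregate(sizeReportPipeline("images"))
```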
## Verify with

```javascript
// Find pipelines with multiple $lookup stages
```
Wouldn't it be better to add an $or filter specifically for aggregations with $lookup? I know it's a little ugly given our command format, but it would find every $lookup pipeline, rather than just the ones that exceed an arbitrary threshold.
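A client-side sketch of the "find every $lookup pipeline" idea. Rather than counting stages against a threshold, it flags any aggregate whose pipeline contains $lookup anywhere, including inside nested stages such as $facet. It assumes profiler entries shaped like { op: "command", command: { aggregate, pipeline } }; the server-side $or filter the comment proposes would be the more efficient form, and the helper names here are hypothetical.

```javascript
// Recursively check whether a value (pipeline, stage, or sub-document)
// contains a $lookup key anywhere, e.g. inside $facet or $unionWith.
function containsLookup(value) {
  if (Array.isArray(value)) return value.some(containsLookup);
  if (value !== null && typeof value === "object") {
    return Object.entries(value).some(
      ([key, v]) => key === "$lookup" || containsLookup(v)
    );
  }
  return false;
}

// Keep only profiler entries for aggregate commands that use $lookup.
function lookupAggregates(profileEntries) {
  return profileEntries.filter(
    (e) => e.command !== undefined && e.command.aggregate !== undefined &&
      containsLookup(e.command.pipeline)
  );
}

// Usage (mongosh):
// lookupAggregates(db.system.profile.find({ op: "command" }).toArray())
```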
{ $indexStats: {} }
])
// Look for "productId_1" with high ops - good
// Missing index = every $lookup is a collection scan
Couldn't it be a compound index with a productId prefix? And with multiple join conditions, productId wouldn't even need to be first.
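A rough sketch of the check the comment describes, assuming $indexStats output documents with name, key, and accesses.ops fields. An index is treated as supporting an equality join when the join fields, in any order, make up its leading keys, so a compound index such as { productId: 1, createdAt: -1 } qualifies. This is a simplified heuristic, not the query planner's actual logic, and the helper names are hypothetical.

```javascript
// True if the first N keys of the index are exactly the N equality join
// fields, in any order (equality predicates can use any prefix ordering).
function supportsEqualityJoin(keyPattern, joinFields) {
  const leading = Object.keys(keyPattern).slice(0, joinFields.length);
  return leading.length === joinFields.length &&
    joinFields.every((f) => leading.includes(f));
}

// Filter $indexStats results down to indexes usable for the join.
function joinIndexes(indexStats, joinFields) {
  return indexStats
    .filter((s) => supportsEqualityJoin(s.key, joinFields))
    .map((s) => ({ name: s.name, ops: s.accesses ? s.accesses.ops : 0 }));
}

// Usage (mongosh):
// joinIndexes(db.reviews.aggregate([{ $indexStats: {} }]).toArray(), ["productId"])
```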
{ _id: 3, firstName: "Carol", lastName: "Smith", email: "carol@ex.com" }

// Version 4 (2023) - email is now array
{ _id: 4, firstName: "Dave", lastName: "Jones", emails: ["dave@ex.com", "d@work.com"] }
The issue isn't that it's an array (equality matches both a scalar and an array containing the same value); it's that the field name changed from email to emails.
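A simplified in-memory model of top-level equality matching illustrates the point: { email: x } matches a scalar email or an email array containing x, but never a document that stores addresses under emails. The model ignores dotted paths and other operators, and document _id 5 is a hypothetical addition for contrast.

```javascript
// Simplified model of MongoDB's top-level equality match for strings:
// a field matches if it equals the value, or is an array containing it.
function matchesEquality(doc, field, value) {
  const v = doc[field];
  return Array.isArray(v) ? v.includes(value) : v === value;
}

const docs = [
  { _id: 3, email: "carol@ex.com" },                // scalar
  { _id: 5, email: ["erin@ex.com", "e@work.com"] }, // array, same field name
  { _id: 4, emails: ["dave@ex.com", "d@work.com"] }, // renamed field
];

console.log(matchesEquality(docs[0], "email", "carol@ex.com")); // true
console.log(matchesEquality(docs[1], "email", "e@work.com"));   // true
console.log(matchesEquality(docs[2], "email", "dave@ex.com"));  // false
```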
}

// Queries fail silently or unintentionally return all documents
db.users.find({ email: "test@ex.com" }) // Misses users with emails[] array
I'd phrase this in terms of the field name: the query misses users whose addresses are stored under emails instead of email.
bsonType: "object",
required: ["email", "profile"],
properties: {
email: {
Why wouldn't we allow an array (under the correct field name), since some people could have multiple email addresses?
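A sketch of what that could look like in $jsonSchema, keeping the field name email while allowing either a single address or an array of them. The minItems: 1 constraint is an assumption. items and minItems apply only when the value is an array, so a plain string still validates.

```javascript
// email may be a single string or a non-empty array of strings, all under
// the same field name, so queries like { email: "x" } keep matching both.
const emailProperty = {
  bsonType: ["string", "array"],
  items: { bsonType: "string" },
  minItems: 1,
};

const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["email", "profile"],
    properties: {
      email: emailProperty,
      profile: { bsonType: "object" },
    },
  },
};

// Usage (mongosh): db.runCommand({ collMod: "users", validator: userValidator })
```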
}

// Optional heavy check for maintenance windows:
// validate can be slow and can take an exclusive lock on the collection.
This looks for corruption; is it related to the validator and schema? I'm concerned there's no explanation of what it's useful for.
const validator = info?.options?.validator
db.users.find({ $nor: [validator] })

// Optional heavyweight check (slow, can block due to a collection lock):
Again, I'm not sure why this is here. Also, I'd prefer we don't add to the confusion about blocking and locking.
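For contrast with validate, the $nor query in the excerpt is the schema-related check: wrapping the collection's validator in $nor returns documents that would fail it. A minimal sketch that also guards against a collection with no validator, where $nor: [undefined] would likely be rejected; the helper name and the users collection are hypothetical.

```javascript
// Build a query that matches documents failing the given validator.
// Returns null when there is no validator to check against.
function nonConformingQuery(validator) {
  if (validator === undefined || validator === null ||
      Object.keys(validator).length === 0) {
    return null;
  }
  return { $nor: [validator] };
}

// Usage (mongosh):
// const info = db.getCollectionInfos({ name: "users" })[0];
// const q = nonConformingQuery(info?.options?.validator);
// if (q) db.users.find(q);
```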
| Large child documents | User → Orders | Orders have line items, addresses |
| Independent queries | Department → Employees | Query employees directly |
| Different lifecycles | Author → Articles | Archive articles separately |
| Frequent child updates | Post → Comments | Adding comments shouldn't lock post |
we don't lock on updates.
}
}
},
validationLevel: "moderate" // Don't block existing invalid docs
The word "block" is ambiguous; can we say "error on" instead (everywhere)?
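For illustration, here is a collMod command whose comments use the "error on" wording. The collection name, validator, and helper name are placeholders. With validationLevel "moderate", the validator applies to inserts and to updates of documents that already pass it, while updates to existing invalid documents are not checked. validationAction "error" makes failing writes return an error instead of only logging a warning.

```javascript
// Build a collMod command that applies a validator without erroring on
// updates to documents that were already invalid before the change.
function applyValidatorCommand(collection, validator) {
  return {
    collMod: collection,
    validator,
    validationLevel: "moderate", // don't error on updates to existing invalid docs
    validationAction: "error",   // error on other writes that fail validation
  };
}

// Usage (mongosh): db.runCommand(applyValidatorCommand("users", validator))
```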
MongoDB Agent Skill Submission
Skill Information
Skill Name: MongoDB Schema Design
Skill Directory: skills/mongodb-schema-design
Use Case
We want to give our users accurate and helpful schema modeling guidance, for example during general performance investigations or when designing schemas for the first time.
Value Proposition
We know that existing LLMs don't perform as well as they could on MongoDB schema design questions, and we want to close that gap.
Special Considerations
None (yet). We will likely want more MCP server integration in the future, especially for gathering workload information.
Validation Prompts
Author Self-Validation
skill-validator locally
SME Review
SME: @asya999, @johnlpage
Additional Context
The main gaps currently are: