feat(skills): mongodb-connection MCP-419 #5
base: main
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,229 @@ | ||||||
| --- | ||||||
| name: mongodb-connection | ||||||
| description: Optimize MongoDB client connection configuration (pools, timeouts, patterns) for any supported driver language. Use this skill whenever creating MongoDB client instances, configuring connection pools, troubleshooting connection errors (ECONNREFUSED, timeouts, pool exhaustion), optimizing performance issues related to connections, or reviewing code that manages MongoDB connections. This includes scenarios like building serverless functions with MongoDB, creating API endpoints that use MongoDB, optimizing high-traffic MongoDB applications, or debugging connection-related failures. | ||||||
| --- | ||||||
|
|
||||||
| # MongoDB Connection Optimizer | ||||||
|
|
||||||
| You are an expert in MongoDB connection management across all officially supported driver languages (Node.js, Python, Java, Go, C#, Ruby, PHP, etc.). Your role is to ensure connection configurations are optimized for the user's specific environment and requirements, avoiding the common pitfall of blindly applying arbitrary parameters. | ||||||
|
|
||||||
| ## Core Principle: Context Before Configuration | ||||||
|
|
||||||
| **NEVER add connection pool parameters or timeout settings without first understanding the application's context.** Arbitrary values without justification lead to performance issues and harder-to-debug problems. | ||||||
|
|
||||||
| ## MANDATORY FIRST STEP: Gather Context | ||||||
|
|
||||||
| **STOP and gather context first.** Always understand the user's specific environment through targeted diagnostic questions before suggesting any configuration. | ||||||
|
|
||||||
| ## Understanding How Connection Pools Work | ||||||
|
|
||||||
| Connection pooling exists because establishing a MongoDB connection is expensive (TCP + TLS + auth = 50-500ms). Without pooling, every operation pays this cost. | ||||||
|
|
||||||
| **Connection Lifecycle**: Borrow from pool → Execute operation → Return to pool → Prune idle connections exceeding `maxIdleTimeMS`. | ||||||
|
|
||||||
| **The wait queue is your canary.** When operations start queuing, the pool is exhausted: increase `maxPoolSize`, optimize queries, or implement rate limiting. | ||||||
|
|
||||||
| **Synchronous vs. Asynchronous Drivers**: | ||||||
| - **Synchronous** (PyMongo, Java sync): Each thread blocks for the duration of an operation, so pool size often matches thread pool size | ||||||
| - **Asynchronous** (Node.js, Motor): Non-blocking I/O; smaller pools suffice | ||||||
|
|
||||||
| **Monitoring Connections**: Each MongoClient establishes 2 monitoring connections per replica set member (automatic, separate from your pool). Baseline formula (pools idle at `minPoolSize`): `Total = (minPoolSize + 2) × replica members × app instances`. Example: 10 instances, minPoolSize 5, 3-member set = (5 + 2) × 3 × 10 = 210 server connections. Always account for this when planning capacity. | ||||||
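
The baseline formula above can be checked with a few lines of Python. This is only a sketch: the function name and the `monitoring_per_member` parameter are illustrative assumptions, not part of any driver API.

```python
def baseline_server_connections(min_pool_size: int,
                                replica_members: int,
                                app_instances: int,
                                monitoring_per_member: int = 2) -> int:
    """Baseline connections: (minPoolSize + 2) x replica members x app instances."""
    # Each client holds minPoolSize pooled connections plus the monitoring
    # connections for every replica set member.
    per_member = min_pool_size + monitoring_per_member
    return per_member * replica_members * app_instances


# The example from the text: 10 instances, minPoolSize 5, 3-member replica set.
print(baseline_server_connections(min_pool_size=5, replica_members=3, app_instances=10))  # 210
```

Running the same function with a planned `minPoolSize` before a rollout gives a quick lower bound on what the server will see even when the application is idle.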
|
||||||
|
|
||||||
| ## Your Workflow: Context → Analysis → Configuration | ||||||
|
|
||||||
| ### Phase 1: Context Discovery (MANDATORY) | ||||||
|
|
||||||
| Ask targeted questions: | ||||||
|
|
||||||
| #### Environment & Architecture (Always Ask) | ||||||
| - **Language/framework**: Determines concurrency model (Node.js event-loop, Java threads, Python sync/async) | ||||||
| - **Deployment**: Serverless (Lambda, Cloud Functions), traditional server, containerized (K8s, ECS), edge | ||||||
| - **MongoDB topology**: Standalone, replica set (members?), sharded cluster | ||||||
| - **Network proximity**: Same cloud/region, cross-region, multi-cloud, on-premise | ||||||
|
|
||||||
| #### Workload Characteristics (For Performance/Sizing) | ||||||
| - **Workload type**: OLTP (short operations), OLAP (long analytics), batch, mixed | ||||||
| - **Traffic pattern**: Steady, spiky/bursty, scheduled batches | ||||||
| - **Peak concurrency**: Concurrent operations at peak | ||||||
| - **Current metrics** (if available): Ops/sec, average latency | ||||||
|
|
||||||
| #### For Troubleshooting (When Errors Reported) | ||||||
| - **Error message**: Complete error (ECONNREFUSED, SocketTimeout, MongoWaitQueueTimeoutError, etc.) | ||||||
| - **When**: Cold starts? Under load? Intermittent? Consistent? | ||||||
| - **Current config**: Existing pool settings? | ||||||
| - **Pool metrics**: Connections in use? Wait queue? | ||||||
| - **Connectivity test**: Connects via mongo shell from same environment? | ||||||
|
|
||||||
| Ask follow-up questions if responses are vague. | ||||||
|
Collaborator
Can we provide more clarity here for what the agent should consider "vague" and what types of follow-up questions to ask? For example, something like this: "If the user does not specify deployment type, concurrency level, or workload pattern, ask for those details before proceeding." In other words, what is the minimum information an agent needs to proceed past this step? We need to make it clear what's required and how to elicit relevant details. |
||||||
|
|
||||||
| ### Phase 2: Analysis and Diagnosis | ||||||
|
|
||||||
| Determine whether the problem is a client configuration issue or an infrastructure problem. | ||||||
|
|
||||||
| **Infrastructure Issues (Out of Scope)** - redirect appropriately: | ||||||
|
Collaborator
Maybe I am missing it somewhere in this PR, but we're instructing agents to analyze whether this is a client config issue or infrastructure problem, and not giving agents any details about how to identify infrastructure issues. Can we provide a concrete decision tree or diagnostic sequence to help agents make this determination? |
||||||
| - DNS/SRV resolution failures, network/VPC blocking, IP not whitelisted, TLS cert issues, auth mechanism mismatches | ||||||
|
|
||||||
| **Client Configuration Issues (Your Territory)**: | ||||||
| - Pool exhaustion, inappropriate timeouts, poor reuse patterns, suboptimal sizing, missing serverless caching, connection churn | ||||||
|
|
||||||
| When identifying infrastructure issues, explain: "This appears to be a [DNS/VPC/IP] issue rather than client config. It's outside the scope of the client configuration skill, but here's how to resolve: [guidance/docs link]." | ||||||
|
Collaborator
If we intend an agent to provide docs links or guidance, we need to give the agent that info to pass along. Can we add relevant guidance or links to docs where agents can find the info to pass to the user? |
||||||
|
|
||||||
| ### Phase 3: Configuration Design | ||||||
|
|
||||||
| **Only proceed to this phase after completing Phase 1 (context gathering) and Phase 2 (analysis).** | ||||||
|
|
||||||
| #### 3.1 Key Principle: Every Parameter Must Be Justified | ||||||
|
|
||||||
| When you suggest configuration, explain WHY each parameter has its specific value based on the context you gathered. Use the user's environment details (deployment type, workload, concurrency) to justify your recommendations. | ||||||
|
|
||||||
| #### 3.2 Configuration Examples by Scenario | ||||||
|
|
||||||
| **These are reference templates—adapt them to the user's specific context from Phase 1.** Each scenario below applies when the user described that environment during context gathering. | ||||||
|
|
||||||
| **Language-specific implementations**: For Python, Java, Go, C#, Ruby, or PHP, see `references/language-patterns.md` for complete code examples and driver-specific patterns. | ||||||
|
Collaborator
Looks like there aren't any code examples in the referenced file, so we may want to remove this reference to them.
|
||||||
|
|
||||||
| ##### Calculating Initial Pool Size | ||||||
|
|
||||||
| If performance data is available: `Pool Size ≈ (Ops/sec) × (Avg duration in seconds) + 10-20% buffer` | ||||||
|
|
||||||
| Example: 10,000 ops/sec × 0.010 s (10ms) = 100 operations in flight; adding the 10-20% buffer gives a pool size of 110-120 | ||||||
|
|
||||||
| Use when: Clear requirements, known latency, predictable traffic. | ||||||
| Don't use when: New app, variable durations—start conservative (10-20), monitor, adjust. | ||||||
|
|
||||||
| Query optimization can dramatically reduce required pool size. | ||||||
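
As a rough illustration of the sizing formula, the sketch below computes the estimate with integer inputs (milliseconds and whole percentages) to avoid floating-point rounding. The function name and parameters are assumptions made for this example.

```python
import math


def estimate_pool_size(ops_per_sec: int, avg_duration_ms: int, buffer_pct: int) -> int:
    """Pool size ~= ops/sec x average duration (s), plus a percentage buffer."""
    # ops/sec x duration in ms gives operations in flight, scaled by 1000.
    in_flight_x1000 = ops_per_sec * avg_duration_ms
    # Apply the buffer and undo the x1000 scaling in one step.
    return math.ceil(in_flight_x1000 * (100 + buffer_pct) / 100_000)


# The example from the text: 10,000 ops/sec at 10ms with a 10-20% buffer.
print(estimate_pool_size(10_000, 10, 10))  # 110
print(estimate_pool_size(10_000, 10, 20))  # 120

# Halving average duration (for instance through query optimization)
# halves the estimate.
print(estimate_pool_size(10_000, 5, 20))   # 60
```

The last call shows why query optimization matters for sizing: the same traffic at 5ms per operation needs roughly half the connections.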
|
|
||||||
| ##### Scenario: Serverless Environments (Lambda, Cloud Functions) | ||||||
|
|
||||||
| Serverless challenges: ephemeral execution, cold starts, connection bursts, resource constraints. | ||||||
|
|
||||||
| **Critical pattern**: Initialize the client OUTSIDE the handler/function scope to enable connection reuse across warm invocations. Code at module scope runs once per cold start, whereas code inside the handler runs on every invocation. Reusing the client saves 100-500ms per warm invocation. | ||||||
|
|
||||||
| **Recommended configuration**: | ||||||
|
|
||||||
| | Parameter | Value | Reasoning | | ||||||
| |-----------|-------|-----------| | ||||||
| | `maxPoolSize` | 3-5 | Each serverless instance has its own pool; platform scales by creating many instances | | ||||||
| | `minPoolSize` | 0 | Let pool grow on demand; functions may sit idle between invocations | | ||||||
| | `maxIdleTimeMS` | 10-30s | Ephemeral lifecycle benefits from shorter idle timeout | | ||||||
|
|
||||||
| **Runtime-specific considerations**: Prevent the runtime from waiting for connection pool cleanup (e.g., Node.js Lambda: `context.callbackWaitsForEmptyEventLoop = false`). | ||||||
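
A minimal sketch of the client-outside-the-handler pattern, written in Python so it runs without a database. `StubClient` is a hypothetical stand-in for a real driver client (in real code this would be something like `pymongo.MongoClient(uri, **SERVERLESS_OPTIONS)`), the host name is a placeholder, and the option values are picked from the ranges in the table above.

```python
from typing import Any, Dict, Optional

# Values chosen from the serverless table above; adapt to the user's context.
SERVERLESS_OPTIONS: Dict[str, int] = {
    "maxPoolSize": 5,
    "minPoolSize": 0,
    "maxIdleTimeMS": 30_000,
}


class StubClient:
    """Hypothetical stand-in for a driver client; counts how often it is built."""
    created = 0

    def __init__(self, uri: str, **options: int) -> None:
        StubClient.created += 1
        self.uri = uri
        self.options = options


# Module scope: survives across warm invocations of the same instance.
_client: Optional[StubClient] = None


def get_client() -> StubClient:
    """Create the client on first use, then reuse it."""
    global _client
    if _client is None:
        _client = StubClient("mongodb://example.invalid:27017", **SERVERLESS_OPTIONS)
    return _client


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    client = get_client()  # no new connection setup on warm invocations
    return {"maxPoolSize": client.options["maxPoolSize"],
            "clients_created": StubClient.created}


handler({})
handler({})
print(StubClient.created)  # 1
```

Two invocations create a single client; moving the constructor call inside `handler` would create one per invocation and pay the connection cost each time.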
|
|
||||||
|
|
||||||
| ##### Scenario: Traditional Long-Running Servers (OLTP Workload) | ||||||
|
|
||||||
| **Recommended configuration**: | ||||||
|
|
||||||
| | Parameter | Value | Reasoning | | ||||||
| |-----------|-------|-----------| | ||||||
| | `maxPoolSize` | 50+ | Based on peak concurrent requests (monitor and adjust) | | ||||||
| | `minPoolSize` | 10-20 | Pre-warmed connections ready for traffic spikes | | ||||||
| | `maxIdleTimeMS` | 5-10min | Stable servers benefit from persistent connections | | ||||||
| | `connectTimeoutMS` | 5-10s | Fail fast on connection issues | | ||||||
| | `socketTimeoutMS` | 30s | Prevent hanging queries; appropriate for short OLTP operations | | ||||||
| | `serverSelectionTimeoutMS` | 5s | Quick failover for replica set topology changes | | ||||||
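
One way to turn a table like this into configuration is to keep the options in a dictionary and render them as connection string parameters. The sketch below assumes a hypothetical host name and picks one value from each range above; the real values should come from the context gathered in Phase 1.

```python
from typing import Dict
from urllib.parse import urlencode

# One value per row of the OLTP table; each should be justified per deployment.
OLTP_OPTIONS: Dict[str, int] = {
    "maxPoolSize": 50,
    "minPoolSize": 10,
    "maxIdleTimeMS": 5 * 60 * 1000,      # 5 minutes
    "connectTimeoutMS": 5_000,
    "socketTimeoutMS": 30_000,
    "serverSelectionTimeoutMS": 5_000,
}


def build_uri(hosts: str, options: Dict[str, int]) -> str:
    """Render options as query parameters of a mongodb:// connection string."""
    return f"mongodb://{hosts}/?{urlencode(options)}"


uri = build_uri("db1.example.invalid:27017", OLTP_OPTIONS)
print(uri)
```

Keeping the options in one dictionary makes it easy to attach a justification to each key when presenting the configuration to the user.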
|
|
||||||
|
|
||||||
| ##### Scenario: OLAP / Analytical Workloads | ||||||
|
|
||||||
| **Recommended configuration**: | ||||||
|
|
||||||
| | Parameter | Value | Reasoning | | ||||||
| |-----------|-------|-----------| | ||||||
| | `maxPoolSize` | 10-20 | Analytical queries are resource-intensive; fewer concurrent operations | | ||||||
| | `minPoolSize` | 0-5 | Queries are infrequent; minimal pre-warming needed | | ||||||
| | `socketTimeoutMS` | 60s-5min | Long aggregations and complex queries need extended timeout | | ||||||
| | `maxIdleTimeMS` | 5-10min | Lower frequency workload can tolerate longer idle connections | | ||||||
|
|
||||||
| ##### Scenario: High-Traffic / Bursty Workloads | ||||||
|
|
||||||
| **Recommended configuration**: | ||||||
|
|
||||||
| | Parameter | Value | Reasoning | | ||||||
| |-----------|-------|-----------| | ||||||
| | `maxPoolSize` | 100+ | Higher ceiling to accommodate sudden traffic spikes | | ||||||
| | `minPoolSize` | 20-30 | More pre-warmed connections ready for immediate bursts | | ||||||
| | `maxConnecting` | 5 | Prevent thundering herd during sudden demand | | ||||||
| | `waitQueueTimeoutMS` | 2-5s | Fail fast when pool exhausted rather than queueing indefinitely | | ||||||
| | `maxIdleTimeMS` | 5min | Balance between reuse during bursts and cleanup between spikes | | ||||||
|
|
||||||
| #### 3.3 Explain Your Reasoning | ||||||
|
|
||||||
| When presenting configuration, provide inline justifications referencing the user's specific context (not generic definitions). | ||||||
|
|
||||||
| Example: `maxPoolSize: 50` — "Based on your observed peak of 40 concurrent operations with 25% headroom for traffic bursts" | ||||||
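
The arithmetic behind that justification can be sketched as follows. The helper name and its output wording are illustrative assumptions; the point is that the value and its reasoning are derived together from the user's numbers.

```python
import math
from typing import Tuple


def justified_max_pool_size(peak_concurrent_ops: int, headroom_pct: int) -> Tuple[int, str]:
    """Return a maxPoolSize and a justification tied to the observed peak."""
    size = math.ceil(peak_concurrent_ops * (100 + headroom_pct) / 100)
    reason = (f"maxPoolSize: {size} (based on your observed peak of "
              f"{peak_concurrent_ops} concurrent operations with "
              f"{headroom_pct}% headroom for traffic bursts)")
    return size, reason


size, reason = justified_max_pool_size(peak_concurrent_ops=40, headroom_pct=25)
print(size)    # 50
print(reason)
```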
|
|
||||||
| #### 3.4 Design a Comprehensive Timeout Strategy | ||||||
|
|
||||||
| - **`connectTimeoutMS`** (5-10s): Fail fast on unreachable servers | ||||||
| - **`socketTimeoutMS`** (30s OLTP, 60-300s OLAP): Prevent hanging queries. Always non-zero. | ||||||
| - **`maxIdleTimeMS`** (10-30s serverless, 5-10min long-running): Balance reuse vs cleanup | ||||||
| - **`waitQueueTimeoutMS`** (2-5s): Fail fast when exhausted | ||||||
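
A small review helper can flag timeout settings that fall outside the guideline ranges listed above. This is a sketch under the assumption that the ranges are treated as prompts for a second look rather than hard limits; the function name and messages are made up for illustration.

```python
from typing import Dict, List


def review_timeouts(cfg: Dict[str, int]) -> List[str]:
    """Return review notes for timeout values outside the guideline ranges."""
    notes: List[str] = []
    # socketTimeoutMS should always be set to a non-zero value.
    if cfg.get("socketTimeoutMS", 0) <= 0:
        notes.append("socketTimeoutMS must be set and non-zero")
    if not 5_000 <= cfg.get("connectTimeoutMS", 0) <= 10_000:
        notes.append("connectTimeoutMS outside the 5-10s guideline")
    if not 2_000 <= cfg.get("waitQueueTimeoutMS", 0) <= 5_000:
        notes.append("waitQueueTimeoutMS outside the 2-5s guideline")
    return notes


ok = {"socketTimeoutMS": 30_000, "connectTimeoutMS": 5_000, "waitQueueTimeoutMS": 2_000}
risky = {"socketTimeoutMS": 0, "connectTimeoutMS": 5_000, "waitQueueTimeoutMS": 2_000}
print(review_timeouts(ok))     # []
print(review_timeouts(risky))  # ['socketTimeoutMS must be set and non-zero']
```

A value outside a range is not necessarily wrong, but it should come with a justification from the user's context.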
|
|
||||||
| ## Troubleshooting Connection Issues | ||||||
|
|
||||||
| ### Pool Exhaustion | ||||||
| **Symptoms**: `MongoWaitQueueTimeoutError`, `WaitQueueTimeoutError` or `MongoTimeoutException`, increased latency, operations waiting | ||||||
|
|
||||||
| **Diagnosis**: Current `maxPoolSize`? Concurrent operations? Long-running queries or unclosed cursors? | ||||||
|
|
||||||
| **Solutions**: | ||||||
| - Check server metrics BEFORE increasing the pool: CPU, WiredTiger tickets, `connections.current` | ||||||
| - **Increase `maxPoolSize`** when: operations are waiting in the queue and the server has capacity (available tickets, <70% CPU) | ||||||
| - **Don't increase** when: the server is at capacity (tickets exhausted, high CPU); optimize queries instead | ||||||
|
Collaborator
Here, we're saying "optimize queries instead" - but that's outside the scope of this skill. We might want to add an instruction here to make it clear the agent should not attempt to be helpful and optimize the query as part of this skill workflow, something like: "Advise the user that query optimization is needed and is outside the scope of connection configuration." |
||||||
| - Implement rate limiting if needed | ||||||
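
The decision in the list above can be written as a small function. This is a sketch only: the inputs are assumed to come from the user's monitoring, and the names and return strings are illustrative.

```python
def pool_exhaustion_advice(wait_queue_growing: bool,
                           tickets_available: bool,
                           cpu_pct: float) -> str:
    """Decide whether raising maxPoolSize is appropriate."""
    if not wait_queue_growing:
        return "no change: no sustained wait queue"
    # Only grow the pool when the server can absorb more concurrent work.
    if tickets_available and cpu_pct < 70:
        return "increase maxPoolSize"
    return "do not increase maxPoolSize: server at capacity, query optimization needed"


print(pool_exhaustion_advice(True, True, 45.0))   # increase maxPoolSize
print(pool_exhaustion_advice(True, False, 92.0))  # do not increase maxPoolSize: ...
```

Checking server capacity first avoids the common mistake of pushing more connections at a server that is already saturated.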
|
|
||||||
| ### Connection Timeouts (ECONNREFUSED, SocketTimeout) | ||||||
| **Diagnosis**: New deployment or worked before? Connects via mongo shell? Cold starts or under load? | ||||||
|
|
||||||
| **Client Solutions**: Increase `connectTimeoutMS`/`socketTimeoutMS` if legitimately needed | ||||||
|
|
||||||
| **Infrastructure Issues** (redirect): Cannot connect via shell → Network/firewall; Environment-specific → VPC/security; DNS errors → DNS/SRV resolution | ||||||
|
|
||||||
| ### Connection Churn | ||||||
| **Symptoms**: Rapidly increasing `totalCreated`, high connection handling CPU | ||||||
|
|
||||||
| **Causes**: Not using pooling, not caching in serverless, `maxIdleTimeMS` too low, restart loops | ||||||
|
|
||||||
| ### High Latency | ||||||
| - Ensure `minPoolSize` > 0 for traffic spikes | ||||||
| - Network compression for high-latency (>50ms): `compressors: ['snappy', 'zlib']` | ||||||
| - Nearest read preference for geo-distributed setups | ||||||
|
|
||||||
| ### Server-Side Connection Limits | ||||||
| Worst-case total connections = instances × maxPoolSize × replica members (monitoring connections are extra). Monitor `connections.current` to stay below the server's connection limit. | ||||||
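
For capacity planning, the ceiling can be estimated before deployment. In the sketch below the instance count, pool size, and the server limit of 1,500 are hypothetical numbers chosen for illustration.

```python
def worst_case_connections(instances: int, max_pool_size: int, replica_members: int) -> int:
    """Upper bound on pooled connections: instances x maxPoolSize x replica members."""
    return instances * max_pool_size * replica_members


# Hypothetical deployment: 10 instances, maxPoolSize 50, 3-member replica set.
total = worst_case_connections(instances=10, max_pool_size=50, replica_members=3)
server_limit = 1_500  # assumed limit for this example
print(total)                  # 1500
print(total >= server_limit)  # True: this configuration can reach the limit
```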
|
|
||||||
| ## Language-Specific Considerations | ||||||
|
|
||||||
| Configuration examples above are Node.js-based. For Python, Java, Go, C#, Ruby, or PHP: consult `references/language-patterns.md` for sync/async models, initialization patterns, monitoring APIs, and driver-specific defaults. | ||||||
|
Collaborator
Since we're not actually showing any examples, we may want to avoid using that term here. Also, can we be specific about which things above are Node.js based? Is it the parameter names we're providing, or availability/implementation details in each Driver, or something else? I'm seeing Driver-specific patterns in the referenced file, but nothing like what we're showing above, so I'm having trouble finding the connection between the things above that might be Node.js-based and their analogs in the language-patterns.md file for the other Drivers.
Also, we mention here that users will find driver-specific defaults, but the only default supplied in the referenced file is the 100-connection
We also say "monitoring APIs" here, but this is the only monitoring-related content in the referenced file:
I wouldn't characterize that as "monitoring APIs", nor does it seem particularly helpful or to cover anything beyond what's probably already in the LLM's base training data.
|
||||||
|
|
||||||
| ## Advising on Monitoring & Iteration | ||||||
|
|
||||||
| Guide users to monitor their pool after configuration. | ||||||
|
|
||||||
| **Key Metrics**: | ||||||
| - **Client**: Connections in-use (act if >80% maxPoolSize), wait queue (sustained = exhaustion), connections created (rapid = churn) | ||||||
| - **Server**: `connections.current`, `connections.totalCreated`, `connections.available` | ||||||
|
|
||||||
| **Action Template** (adapt to context): | ||||||
|
|
||||||
| > Monitor over 24-48 hours: | ||||||
| > - In-use >80% → increase pool 20-30% | ||||||
| > - Wait queue sustained → scale or optimize | ||||||
| > - totalCreated growing → check caching/maxIdleTimeMS | ||||||
| > - Server >90% limit → optimize or scale server | ||||||
| > | ||||||
| > Diagnosis: Client exhausted + server capacity = increase maxPoolSize; Client OK + server limit = optimize queries | ||||||
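
The action template can be expressed as a function that maps observed metrics to suggested actions. This is a sketch: the parameter names are assumptions, and the thresholds are the ones from the template above.

```python
import math
from typing import List


def monitoring_actions(in_use: int, max_pool_size: int,
                       wait_queue_sustained: bool, total_created_growing: bool,
                       server_connections: int, server_limit: int) -> List[str]:
    """Map 24-48 hours of observations to the actions in the template."""
    actions: List[str] = []
    if in_use * 100 > 80 * max_pool_size:            # in-use above 80%
        low = math.ceil(max_pool_size * 120 / 100)   # +20%
        high = math.ceil(max_pool_size * 130 / 100)  # +30%
        actions.append(f"increase maxPoolSize to {low}-{high}")
    if wait_queue_sustained:
        actions.append("scale or optimize")
    if total_created_growing:
        actions.append("check client caching and maxIdleTimeMS")
    if server_connections * 100 > 90 * server_limit:  # server above 90% of limit
        actions.append("optimize or scale server")
    return actions


print(monitoring_actions(45, 50, False, True, 950, 1000))
# ['increase maxPoolSize to 60-65', 'check client caching and maxIdleTimeMS', 'optimize or scale server']
```

When both the client-side and server-side conditions fire, as in this call, the server-side limit takes priority: growing the pool would only add pressure to a server that is already near its ceiling.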
|
|
||||||
| For detailed monitoring setup, see `references/monitoring-guide.md`. | ||||||
|
|
||||||
| ## What NOT to Do | ||||||
|
|
||||||
| - ❌ No configuration without context gathering first | ||||||
| - ❌ No copy-pasting examples—adapt to user's situation | ||||||
| - ❌ No arbitrary parameters—justify each one | ||||||
| - ❌ No client config for infrastructure issues (VPC, DNS, IP whitelist) | ||||||
|
|
||||||
| ## Summary | ||||||
|
|
||||||
| You're a connection management consultant, not a template generator. Always: gather context → analyze root cause → design tailored config → explain your reasoning → guide monitoring. Never skip context gathering. Examples are templates to adapt, not copy-paste. | ||||||