40 changes: 20 additions & 20 deletions mongodb-atlas-mixin/README.md
@@ -11,18 +11,18 @@ The MongoDB Atlas mixin contains the following dashboards:

and the following alerts:

- MongoDBAtlasHighNumberOfSlowNetworkRequests
- MongoDBAtlasCollExclusiveDeadlocks
- MongoDBAtlasCollIntentExclDeadlocks
- MongoDBAtlasCollSharedDeadlocks
- MongoDBAtlasCollIntentSharedDeadlocks
- MongoDBAtlasDBExclusiveDeadlocks
- MongoDBAtlasDBIntentExclDeadlocks
- MongoDBAtlasDBSharedDeadlocks
- MongoDBAtlasDBIntentSharedDeadlocks
- MongoDBAtlasSlowNetworkRequests
- MongoDBAtlasDiskSpaceLow
- MongoDBAtlasSlowHardwareIO
- MongoDBAtlasHighNumberOfTimeoutElections
- MongoDBAtlasHighNumberOfCollectionExclusiveDeadlocks
- MongoDBAtlasHighNumberOfCollectionIntentExclusiveDeadlocks
- MongoDBAtlasHighNumberOfCollectionSharedDeadlocks
- MongoDBAtlasHighNumberOfCollectionIntentSharedDeadlocks
- MongoDBAtlasHighNumberOfDatabaseExclusiveDeadlocks
- MongoDBAtlasHighNumberOfDatabaseIntentExclusiveDeadlocks
- MongoDBAtlasHighNumberOfDatabaseSharedDeadlocks
- MongoDBAtlasHighNumberOfDatabaseIntentSharedDeadlocks
- MongoDBAtlasElectionTimeouts

**Please note:**
- Some metrics may be reset if the MongoDB Atlas cluster is ever reset.
@@ -78,18 +78,18 @@ This mixin includes the MongoDB Atlas sharding overview dashboard, however the m

## Alerts overview

- MongoDBAtlasHighNumberOfSlowNetworkRequests: There is a high number of slow network requests.
- MongoDBAtlasCollExclusiveDeadlocks: There is a high number of collection exclusive deadlocks occurring.
- MongoDBAtlasCollIntentExclDeadlocks: There is a high number of collection intent-exclusive deadlocks occurring.
- MongoDBAtlasCollSharedDeadlocks: There is a high number of collection shared deadlocks occurring.
- MongoDBAtlasCollIntentSharedDeadlocks: There is a high number of collection intent-shared deadlocks occurring.
- MongoDBAtlasDBExclusiveDeadlocks: There is a high number of database exclusive deadlocks occurring.
- MongoDBAtlasDBIntentExclDeadlocks: There is a high number of database intent-exclusive deadlocks occurring.
- MongoDBAtlasDBSharedDeadlocks: There is a high number of database shared deadlocks occurring.
- MongoDBAtlasDBIntentSharedDeadlocks: There is a high number of database intent-shared deadlocks occurring.
- MongoDBAtlasSlowNetworkRequests: There is a high number of slow network requests.
- MongoDBAtlasDiskSpaceLow: Hardware is running out of disk space.
- MongoDBAtlasSlowHardwareIO: Read and write I/Os are taking too long to complete.
- MongoDBAtlasHighNumberOfTimeoutElections: There is a high number of elections being called due to the primary node timing out.
- MongoDBAtlasHighNumberOfCollectionExclusiveDeadlocks: There is a high number of collection exclusive-lock deadlocks.
- MongoDBAtlasHighNumberOfCollectionIntentExclusiveDeadlocks: There is a high number of collection intent-exclusive-lock deadlocks.
- MongoDBAtlasHighNumberOfCollectionSharedDeadlocks: There is a high number of collection shared-lock deadlocks.
- MongoDBAtlasHighNumberOfCollectionIntentSharedDeadlocks: There is a high number of collection intent-shared-lock deadlocks.
- MongoDBAtlasHighNumberOfDatabaseExclusiveDeadlocks: There is a high number of database exclusive-lock deadlocks.
- MongoDBAtlasHighNumberOfDatabaseIntentExclusiveDeadlocks: There is a high number of database intent-exclusive-lock deadlocks.
- MongoDBAtlasHighNumberOfDatabaseSharedDeadlocks: There is a high number of database shared-lock deadlocks.
- MongoDBAtlasHighNumberOfDatabaseIntentSharedDeadlocks: There is a high number of database intent-shared-lock deadlocks.
- MongoDBAtlasElectionTimeouts: There is a high number of elections being called due to the primary node timing out.

Default thresholds can be configured in `config.libsonnet`.
```js
@@ -1,14 +1,14 @@
{
prometheusAlerts+:: {
groups+: [
new(this): {
groups: [
{
name: 'mongodb-atlas-alerts',
name: this.config.uid + '-alerts',
rules: [
{
alert: 'MongoDBAtlasHighNumberOfCollectionExclusiveDeadlocks',
alert: 'MongoDBAtlasCollExclusiveDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Collection_deadlockCount_W[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -18,14 +18,14 @@
description:
(
'The number of collection exclusive-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfCollectionIntentExclusiveDeadlocks',
alert: 'MongoDBAtlasCollIntentExclDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Collection_deadlockCount_w[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -35,14 +35,14 @@
description:
(
'The number of collection intent-exclusive-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfCollectionSharedDeadlocks',
alert: 'MongoDBAtlasCollSharedDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Collection_deadlockCount_R[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -52,14 +52,14 @@
description:
(
'The number of collection shared-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfCollectionIntentSharedDeadlocks',
alert: 'MongoDBAtlasCollIntentSharedDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Collection_deadlockCount_r[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -69,14 +69,14 @@
description:
(
'The number of collection intent-shared-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfDatabaseExclusiveDeadlocks',
alert: 'MongoDBAtlasDBExclusiveDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Database_deadlockCount_W[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -86,14 +86,14 @@
description:
(
'The number of database exclusive-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfDatabaseIntentExclusiveDeadlocks',
alert: 'MongoDBAtlasDBIntentExclDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Database_deadlockCount_w[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -103,14 +103,14 @@
description:
(
'The number of database intent-exclusive-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfDatabaseSharedDeadlocks',
alert: 'MongoDBAtlasDBSharedDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Database_deadlockCount_R[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -120,14 +120,14 @@
description:
(
'The number of database shared-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfDatabaseIntentSharedDeadlocks',
alert: 'MongoDBAtlasDBIntentSharedDeadlocks',
expr: |||
sum without(cl_role,process_port,rs_nm,rs_state) (increase(mongodb_locks_Database_deadlockCount_r[5m])) > %(alertsDeadlocks)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -137,14 +137,14 @@
description:
(
'The number of database intent-shared-lock deadlocks occurring on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsDeadlocks)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfSlowNetworkRequests',
alert: 'MongoDBAtlasSlowNetworkRequests',
expr: |||
sum without (cl_role,rs_nm,rs_state,process_port) (increase(mongodb_network_numSlowSSLOperations[5m])) + sum without (cl_role,rs_nm,rs_state,process_port) (increase(mongodb_network_numSlowDNSOperations[5m])) > %(alertsSlowNetworkRequests)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -154,14 +154,14 @@
description:
(
'The number of DNS and SSL operations taking more than 1 second to complete on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsSlowNetworkRequests)s.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasDiskSpaceLow',
expr: |||
100 * ((sum without (disk_name) (hardware_disk_metrics_disk_space_used_bytes)) / clamp_min((sum without (disk_name) (hardware_disk_metrics_disk_space_used_bytes)) + (sum without (disk_name) (hardware_disk_metrics_disk_space_free_bytes)), 1)) > %(alertsHighDiskUsage)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -171,14 +171,14 @@
description:
(
'The amount of hardware disk space being used on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}}%% which is above the threshold of %(alertsHighDiskUsage)s%%.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasSlowHardwareIO',
expr: |||
(sum without (disk_name) (increase(hardware_disk_metrics_read_time_milliseconds[5m])) + sum without (disk_name) (increase(hardware_disk_metrics_write_time_milliseconds[5m]))) / 1000 > %(alertsSlowHardwareIO)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -188,14 +188,14 @@
description:
(
'The latency time for read and write I/Os on node {{$labels.instance}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} seconds which is above the threshold of %(alertsSlowHardwareIO)s seconds.'
) % $._config,
) % this.config,
},
},
{
alert: 'MongoDBAtlasHighNumberOfTimeoutElections',
alert: 'MongoDBAtlasElectionTimeouts',
expr: |||
sum without (cl_role,process_port,instance,rs_state) (increase(mongodb_electionMetrics_electionTimeout_called[5m])) > %(alertsHighTimeoutElections)s
||| % $._config,
||| % this.config,
'for': '5m',
labels: {
severity: 'warning',
@@ -204,8 +204,8 @@
summary: 'There is a high number of elections being called due to the primary node timing out.',
description:
(
'The number of elections being called due to the primary node timing out in replica set {{$labels.rs_m}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsHighTimeoutElections)s.'
) % $._config,
'The number of elections being called due to the primary node timing out in replica set {{$labels.rs_nm}} in cluster {{$labels.cl_name}} is {{printf "%%.0f" $value}} which is above the threshold of %(alertsHighTimeoutElections)s.'
) % this.config,
},
},
],
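
The rules above share one pattern: the alerts library is a function of `this`, the group name is built from `this.config.uid`, and thresholds are spliced into expressions with `%(name)s` placeholders. A minimal sketch of that pattern follows, assuming a trimmed-down config object; the alert name `ExampleDiskSpaceLow` and the metric `disk_used_percent` are hypothetical and are not part of the mixin.

```jsonnet
// Hypothetical, trimmed-down alerts library modelled on the diff above.
local alerts = {
  new(this): {
    groups: [
      {
        // Group name derives from the configured uid, e.g. 'mongodb-atlas-alerts'.
        name: this.config.uid + '-alerts',
        rules: [
          {
            alert: 'ExampleDiskSpaceLow',  // hypothetical alert name
            // '%(key)s' is replaced by the field `key` of the object right of '%'.
            expr: 'disk_used_percent > %(alertsHighDiskUsage)s' % this.config,
            annotations: {
              // '%%' collapses to a literal '%', so Prometheus still sees '%.0f'.
              description: 'Usage is {{printf "%%.0f" $value}}%% which is above the threshold of %(alertsHighDiskUsage)s%%.' % this.config,
            },
          },
        ],
      },
    ],
  },
};

// Stand-in for the object the mixin would pass in; only the fields used above.
alerts.new({ config: { uid: 'mongodb-atlas', alertsHighDiskUsage: 90 } })
```

Evaluating this yields a group named `mongodb-atlas-alerts` whose rule expression is `disk_used_percent > 90`. Passing the config as an argument, rather than reading a global `$._config`, keeps the library usable with more than one config object.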
53 changes: 40 additions & 13 deletions mongodb-atlas-mixin/config.libsonnet
@@ -1,18 +1,45 @@
{
_config+:: {
// sharding dashboard flag
enableShardingOverview: false,
local this = self,

dashboardTags: ['mongodb-atlas-mixin'],
dashboardPeriod: 'now-30m',
dashboardTimezone: 'default',
dashboardRefresh: '1m',
// Basic filtering - MongoDB Atlas uses job and cl_name (cluster name) as primary filters
filteringSelector: 'job="integrations/mongodb-atlas"',
groupLabels: ['job', 'cl_name'],
instanceLabels: ['instance'],

// alerts thresholds
alertsDeadlocks: 10, // count
alertsSlowNetworkRequests: 10, // count
alertsHighDiskUsage: 90, // percentage: 0-100
alertsSlowHardwareIO: 3, // seconds
alertsHighTimeoutElections: 10, // count
// Dashboard settings
dashboardTags: ['mongodb-atlas-mixin'],
uid: 'mongodb-atlas',
dashboardNamePrefix: 'MongoDB Atlas',
dashboardRefresh: '1m',
dashboardPeriod: 'now-30m',
dashboardTimezone: 'default',

// Sharding dashboard flag: set to true to generate the sharding overview dashboard
enableShardingOverview: true,

// Logs configuration (MongoDB Atlas does not have Loki logs by default)
enableLokiLogs: false, // Not supported by the MongoDB Atlas mixin; there are no logs to monitor yet
logLabels: [],
extraLogLabels: [],
logsVolumeGroupBy: 'level',
showLogsVolume: false,

// Alert thresholds with units
alertsDeadlocks: 10, // count
alertsSlowNetworkRequests: 10, // count
alertsHighDiskUsage: 90, // %
alertsSlowHardwareIO: 3, // seconds
alertsHighTimeoutElections: 10, // count

// Metrics source
metricsSource: 'prometheus',

// Import signal definitions (organized by dashboard)
signals+: {
cluster: (import './signals/cluster.libsonnet')(this),
elections: (import './signals/elections.libsonnet')(this),
operations: (import './signals/operations.libsonnet')(this),
performance: (import './signals/performance.libsonnet')(this),
sharding: (import './signals/sharding.libsonnet')(this),
},
}
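
Because the new config binds `local this = self`, anything derived from `this` should pick up overrides applied later with `+`. The sketch below illustrates that late binding on a stand-in object; the `diskAlertExpr` field and the metric `disk_used_percent` are hypothetical and only mimic how a signal or alert definition might read the config.

```jsonnet
// Stand-in for config.libsonnet: a plain object that refers to itself via `this`.
local defaults = {
  local this = self,
  uid: 'mongodb-atlas',
  alertsHighDiskUsage: 90,  // %
  // Hypothetical derived field; evaluated lazily against the final object.
  diskAlertExpr: 'disk_used_percent > %(alertsHighDiskUsage)s' % this,
};

// A user override layered on top, in the same way a threshold would be changed.
local overridden = defaults + { alertsHighDiskUsage: 85 };

{
  default: defaults.diskAlertExpr,  // 'disk_used_percent > 90'
  overridden: overridden.diskAlertExpr,  // 'disk_used_percent > 85'
}
```

`self` is resolved against the final, merged object, so the override reaches `diskAlertExpr` without redefining it.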