Add metrics for couchdb processes

A common failure mode for the CHT is high CPU usage by CouchDB.

High CPU usage can be monitored in Watchdog using Node Exporter with Prometheus, or with any other infrastructure monitoring tool. However, CPU usage alone does not reveal the root cause of a performance issue. Currently, it’s often necessary to infer what CouchDB is doing from its logs, HAProxy logs, or other indirect sources, rather than directly observing which operations are consuming CPU.

Several tools can inspect the Erlang process list, such as [etop](https://www.erlang.org/doc/apps/observer/etop.html) or [recon](https://ferd.github.io/recon/overview.html), and could help identify which processes are consuming CPU. It may also be possible to export relevant metrics directly to Prometheus.
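As a rough illustration of the "export to Prometheus" idea, here is a minimal Python sketch that turns a list of CouchDB active tasks into Prometheus text-format gauge lines. It rests on several assumptions: it uses the payload shape returned by CouchDB's `GET /_active_tasks` endpoint rather than the Erlang process list discussed above, the metric name `couchdb_active_tasks` and the function name are hypothetical, and the sample payload is hand-written for illustration, not taken from a real deployment. It is not a proposal for how Watchdog should implement this.

```python
from collections import Counter


def render_active_task_metrics(active_tasks):
    """Render a parsed /_active_tasks payload as Prometheus text-format lines.

    `active_tasks` is a list of dicts, each expected to carry a "type" key
    (for example "indexer" or "replication"). The metric name used here is
    hypothetical, not an existing Watchdog or CouchDB metric.
    """
    # Count tasks per type; tasks without a "type" key are grouped as "unknown".
    counts = Counter(task.get("type", "unknown") for task in active_tasks)

    lines = [
        "# HELP couchdb_active_tasks Number of active CouchDB tasks by type.",
        "# TYPE couchdb_active_tasks gauge",
    ]
    # Sort for stable output between scrapes.
    for task_type in sorted(counts):
        lines.append(
            f'couchdb_active_tasks{{type="{task_type}"}} {counts[task_type]}'
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hand-written sample payload, for illustration only.
    sample = [
        {"type": "indexer", "database": "medic", "progress": 42},
        {"type": "indexer", "database": "medic-sentinel", "progress": 7},
        {"type": "replication", "progress": 99},
    ]
    print(render_active_task_metrics(sample), end="")
```

In a real exporter the payload would be fetched from CouchDB on each scrape and served over HTTP. Counting tasks by type only shows that, say, several view indexers are running; it does not attribute CPU to them, which is where process-level tools such as etop or recon would still be needed.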

Regardless of the implementation, CHT Watchdog should provide enough information to answer the common question:  
> "What is this CHT deployment doing that requires so much CPU?"


Add metrics for couchdb processes #171
