Server monitoring agent. Single binary replaces node_exporter + postgres_exporter + process-exporter + nginx-exporter. Auto-discovery, Prometheus-compatible metrics.
```shell
curl -fsSL https://topsrv.io/install.sh | TOPSRV_TOKEN=xxx sudo -E bash
```

Options via environment variables:
| Variable | Default | Description |
|---|---|---|
| TOPSRV_TOKEN | — | Push token |
| TOPSRV_ENDPOINT | https://push.topsrv.io/v1/write | Push endpoint |
| VERSION | latest | Specific version to install |
| INSTALL_DIR | /usr/local/bin | Binary install path |
Manual install:

```shell
# Download from https://github.com/vmkteam/topsrv/releases
tar -xzf topsrv_*_linux_amd64.tar.gz
sudo install -m 0755 topsrv /usr/local/bin/
```

Run:
```shell
# Scrape mode — Prometheus pulls /metrics
topsrv -config /etc/topsrv/topsrv.toml -verbose
curl localhost:9100/metrics
```

Minimal config (/etc/topsrv/topsrv.toml):
```toml
[Server]
Listen = ":9100"
```

That's it. System, disk, network, netstat, process, and S.M.A.R.T. metrics are collected automatically. PostgreSQL and Nginx are auto-discovered if running.
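On the Prometheus side, pull mode only needs a standard scrape job. A minimal sketch (the job name and target are placeholders for your host):

```yaml
scrape_configs:
  - job_name: topsrv
    static_configs:
      - targets: ["myhost.example.com:9100"]
```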
| Collector | Metrics | Replaces |
|---|---|---|
| System | CPU per-core, memory, load, swap, uptime, host info, context switches | node_exporter |
| Disk | IO (read/write bytes/ops/time), filesystem (space/inodes) | node_exporter |
| Network | IO per interface, interface info (MAC/IP/MTU/status) | node_exporter |
| Netstat | TCP connections by state/direction/port and remote-peer scope (loopback/private/public), TCP+UDP listening ports by scope and owning process, TCP retransmits, UDP/IP errors | node_exporter |
| Process | CPU, memory, disk IO, threads, FDs, worst_fd_ratio per process group | process-exporter |
| PostgreSQL | Connections (by state/addr/app), transactions, longest transaction age, checkpoints, bgwriter, locks, replication, WAL, wraparound, pg_stat_statements, tables (top 50) | postgres_exporter |
| Nginx | stub_status, access log parsing (text & JSON log_format, response time histogram, status codes, cache, 4xx/5xx URIs, bytes by URI) | nginx-exporter + mtail |
| Angie | JSON API (server zones, upstreams, SSL, caches, rate limiting, slabs) + access log parsing | — |
| S.M.A.R.T. | Disk health (ATA attributes, NVMe health log, temperature, wear, errors) | smartctl_exporter |
| SSL Certificates | Certificate expiry monitoring (auto-discovered from nginx/angie config) | — |
| Bot-logs (opt-in) | Ships UA-classified bot events from nginx access logs to topsrv.io as gzipped ndjson with disk-backed WAL. 38 families: global/RU/Asian search, AI 2026 crawlers, SEO tools, social link previews, archive | — |
| Packages | Installed-package inventory: dpkg/rpm/apk parsed pure-Go (no shell-out). Aggregates on /metrics; full snapshot (NEVRA, vendor, GPG key, signature digest, modularityLabel, autoInstalled, repoOrigin, licenses) pushed to /v1/inventory for CVE matching | apt-prom-exporter / pkg-exporter |
On startup topsrv scans running processes and detects known services — no configuration needed:
| Process | Type | Action |
|---|---|---|
| postgres / postmaster | postgresql | Auto-connects as topsrv user (password derived from Push.Token); falls back to hint if no token |
| nginx | nginx | Parses nginx.conf → finds log_format + access_log |
| angie | angie | Parses angie.conf → finds api /status/, log_format + access_log |
| redis-server | redis | Detected |
| pgbouncer | pgbouncer | Detected |
| php-fpm | php-fpm | Detected |
For Nginx, auto-discovery parses nginx.conf (following include directives) and extracts log_format definitions and the access_log paths that use $request_time. Both text and JSON (escape=json) log formats are auto-detected.
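For reference, a text log_format that discovery picks up is the classic combined format extended with timings (the format name `timed` here is an arbitrary example):

```nginx
log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" '
                 '"$http_user_agent" $request_time $upstream_response_time';

access_log /var/log/nginx/access.log timed;
```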
Pull mode (default) — Prometheus scrapes /metrics:

```toml
[Server]
Listen = ":9100"
```

Push mode — agent pushes metrics to topsrv.io, VictoriaMetrics, or any compatible remote-write endpoint:

```toml
[Push]
Endpoint = "https://push.topsrv.io/v1/write" # or your VictoriaMetrics URL
Token = "ts_xxx"                             # get your token at https://topsrv.io
Interval = "30s"
SpoolDir = "/var/lib/topsrv/spool"           # disk buffer for retries on network failure
```

Push-only mode — no HTTP server, only push. Omit [Server] or set Listen = "".
Both modes can work simultaneously.
```toml
[Server]
Listen = ":9100"   # Prometheus /metrics endpoint

[Push]
Endpoint = ""      # Push URL
Token = ""         # Bearer token for push auth
Interval = "30s"   # Push interval
SpoolDir = ""      # Disk buffer for retries

[Update]
Enabled = false    # Auto-update via control plane
Interval = "15m"   # Check interval
Channel = "stable" # stable / beta

# PostgreSQL (optional — auto-discovery detects the process, DSN needed for connection)
# [Postgres]
# DSN = "postgres://topsrv:pass@localhost:5432/postgres?sslmode=disable"
# Disabled = true # set to skip PG monitoring even if discovery finds postgres

# Nginx (optional — auto-discovery parses nginx.conf automatically)
# [Nginx]
# StubStatusURL = "http://127.0.0.1/stub_status"
# LogFormat = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time $upstream_response_time'
# ExtraLabels = ["server_name"]
# AccessLogs = ["/var/log/nginx/access.log"]

# Angie (optional — auto-discovery parses angie.conf automatically)
# [Angie]
# StatusURL = "http://127.0.0.1:8080/status/"
# StubStatusURL = "http://127.0.0.1/stub_status"
# LogFormat = '$remote_addr ...'
# ExtraLabels = ["server_name"]
# AccessLogs = ["/var/log/angie/access.log"]

# S.M.A.R.T. disk health (always enabled, requires CAP_SYS_RAWIO + CAP_SYS_ADMIN)
# [Smart]
# Disabled = true # set to disable
# Interval = "5m"

# Package inventory (optional — auto-discovery detects dpkg/rpm/apk via files).
# On /metrics only aggregates are exposed (counts, scan duration, error counter).
# Full per-package snapshot is pushed to /v1/inventory with kind="packages".
# [Packages]
# Disabled = false     # set to fully skip the collector
# Interval = "6h"      # snapshot scan period (±10% jitter to avoid herds)
# Managers = []        # auto-detect by default; e.g. ["dpkg","rpm"] to force a subset
# DisablePush = false  # set to skip POSTing snapshots to /v1/inventory (keep /metrics only)
# MaxPackages = 10000  # safety cap; logs a warning and truncates if exceeded

# Bot-logs (optional — ships UA-classified bot events to topsrv.io for analytics)
# [BotLogs]
# Enabled = true
# Token = ""            # required — issued by topsrv.io per project
# Endpoint = ""         # default: [Push].Endpoint with /v1/bot-logs path
# BatchSize = 5000      # events per batch
# BatchInterval = "30s" # flush interval
# SpoolDir = ""         # default: [Push].SpoolDir; a "botlog/" subdir is created inside
# MaxSpoolMB = 200      # WAL disk budget
# UATruncate = 1024     # truncate user-agent at this length
# URITruncate = 2048    # truncate request URI at this length
# ExtraUAPatterns = ["MyCustomCrawler/"] # local additions to the bot list

# Field aliases — only set when discovery cannot infer the right name
# (e.g. operator-defined `set $custom $http_referer;`). Empty falls back.
# [BotLogs.FieldAliases]
# Referer = "ref"
```

| Parameter | Default | Description |
|---|---|---|
| Server.Listen | :9100 | HTTP listen address for /metrics |
| Push.Endpoint | — | Remote write URL |
| Push.Token | — | Bearer token |
| Push.Interval | 30s | Push frequency |
| Push.SpoolDir | — | Disk spool path (retries on network failure) |
| Update.Enabled | false | Enable auto-update |
| Update.Interval | 15m | Update check interval |
| Update.Channel | stable | Update channel (stable/beta) |
| Postgres.DSN | — | PostgreSQL connection string |
| Postgres.Disabled | false | Skip PostgreSQL monitoring even if discovery finds a local postgres process |
| Nginx.StubStatusURL | — | nginx stub_status URL |
| Nginx.LogFormat | — | nginx log_format string (gonx format) |
| Nginx.ExtraLabels | [] | Log fields to add as metric labels |
| Nginx.AccessLogs | [] | Paths to access log files |
| Angie.StatusURL | — | Angie JSON API URL (e.g. http://127.0.0.1:8080/status/) |
| Angie.StubStatusURL | — | Fallback stub_status URL |
| Angie.LogFormat | — | angie log_format string (gonx format) |
| Angie.ExtraLabels | [] | Log fields to add as metric labels |
| Angie.AccessLogs | [] | Paths to access log files |
| Smart.Disabled | false | Disable S.M.A.R.T. collector |
| Smart.Interval | 5m | S.M.A.R.T. polling interval |
| Packages.Disabled | false | Disable package inventory collector |
| Packages.Interval | 6h | Inventory scan period (±10% jitter); pushed to /v1/inventory |
| Packages.Managers | [] | Force subset of managers (dpkg/rpm/apk); empty = auto-detect |
| Packages.DisablePush | false | Skip POSTing snapshots to /v1/inventory (keeps /metrics aggregates) |
| Packages.MaxPackages | 10000 | Truncate snapshot if exceeded (logs a warning) |
| BotLogs.Enabled | false | Ship UA-classified bot events to topsrv.io |
| BotLogs.Token | — | Bot-logs ingest bearer token (separate from Push.Token) |
| BotLogs.Endpoint | derived | Ingest URL; defaults to [Push].Endpoint with /v1/bot-logs path |
| BotLogs.BatchSize | 5000 | Events per batch |
| BotLogs.BatchInterval | 30s | Flush interval |
| BotLogs.SpoolDir | derived | Parent dir for WAL spool; botlog/ subdir is created inside. Defaults to [Push].SpoolDir |
| BotLogs.MaxSpoolMB | 200 | Disk budget for spool subdir |
| BotLogs.UATruncate | 1024 | Max UA length per event |
| BotLogs.URITruncate | 2048 | Max URI length per event |
| BotLogs.ExtraUAPatterns | [] | Local additions to known-bots UA patterns |
| BotLogs.FieldAliases.* | auto | Per-format field-name overrides (UserAgent/Host/ServerName/RemoteAddr/Referer). Discovery auto-detects from log_format; only set when the operator uses non-standard variables in the nginx config |
All config flags can be set via environment variables with the TOPSRV_ prefix:

```shell
TOPSRV_CONFIG=/etc/topsrv/topsrv.toml topsrv
```

Supports PG15+ (version-gated features: pg_stat_wal PG14+, pg_stat_statements.toplevel PG14+, pg_stat_checkpointer PG17, shared_blk_*_time PG17).
Coverage:
- Connections — by state, by client_addr, by application_name, max
- Transactions — commit/rollback, deadlocks, temp files/bytes per database, longest transaction age
- Checkpoints & bgwriter — timed/requested, checkpoint time, buffers (checkpoint/bgwriter/backend)
- Autovacuum — common vs wraparound workers, max_workers
- Locks — by mode + granted label, `blocked_backends` (via `pg_blocking_pids()`), max lock wait duration
- Wait events — sampled from `pg_stat_activity` (backend_type, datname, application_name, wait_event_type, wait_event, state) — pganalyze/APM-style "why is it slow right now"
- Replication — lag bytes, lag seconds by stage (write/flush/replay), `sync_state`, slots (retained bytes, active/inactive)
- WAL — LSN position, files count via `pg_ls_waldir()`, plus `pg_stat_wal` (records, FPI, `buffers_full`, write/sync time) on PG14+
- Archiver — `pg_stat_archiver` with `result=archived|failed`, timestamp pattern (`time() - last_timestamp_seconds` for age)
- Wraparound — `xid_age` vs `autovacuum_freeze_max_age`, per-database + cluster max
- Database sizes — per-database byte size
- Tables (top 50 by size) — seq/idx scans, tuple ops, dead tuples, autovacuum count, `last_maintenance_timestamp{op=vacuum|analyze}`, `mod_since_analyze` — with `database`, `schema`, `table` labels
- Indexes (top 50) — `scans_total`, `size_bytes` — with `database`, `schema`, `table`, `index` labels
- Bloat estimation (ioguix heuristic, refreshed every 15 min) — top 50 tables + top 50 btree indexes by wasted bytes. Catalog-only (never reads heap pages) — safe on multi-TB databases. Points at tables for `pg_repack`/`CLUSTER` and indexes for `REINDEX CONCURRENTLY`
- pg_stat_statements — union of top 20 by time, calls, blocks read, blocks dirtied (DML pressure), and WAL bytes (PG13+, if available) — ~80–100 unique queries covering both read-heavy and write-heavy workloads; outlier-aware duration histogram; full query text pushed separately to control plane (`/v1/meta`)
- Settings — curated GUCs (`shared_buffers`, `work_mem`, `max_connections`, etc.) normalized to bytes/seconds
- Stats reset timestamps — `pg_stat_database`, `pg_stat_bgwriter`, `pg_stat_wal` (PG17), `pg_stat_archiver` — for detecting unexpected resets
topsrv connects with application_name=topsrv (visible in pg_stat_activity) and selects the largest database on startup for per-DB views (pg_stat_user_tables/pg_stat_user_indexes).
1. Create a monitoring role. The built-in pg_monitor role (PG10+) grants read access to all statistics views and functions including pg_ls_waldir() — no custom functions or schemas needed:
```shell
sudo -u postgres psql -d postgres
```

```sql
CREATE ROLE topsrv LOGIN PASSWORD 'CHANGE_ME';
GRANT pg_monitor TO topsrv;
```

2. Allow connections in pg_hba.conf:
```
# local agent (typical setup)
host  all  topsrv  127.0.0.1/32  scram-sha-256
local all  topsrv                scram-sha-256
```

Reload config:

```sql
SELECT pg_reload_conf();
```

3. Configure topsrv — add DSN to config:
```toml
[Postgres]
DSN = "postgres://topsrv:CHANGE_ME@localhost:5432/postgres?sslmode=disable"
```

4. (Optional) Enable pg_stat_statements for query-level metrics (top 20 queries, duration histogram):
Add to postgresql.conf:
```
shared_preload_libraries = 'pg_stat_statements' # requires restart
pg_stat_statements.max = 500
pg_stat_statements.track = top
track_io_timing = on # recommended for block read/write time
```

Restart PostgreSQL, then create the extension in the same database that topsrv connects to (the one specified in Postgres.DSN):
```sql
-- connect to the database from DSN (e.g. postgres)
\c postgres
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
GRANT SELECT ON pg_stat_statements TO topsrv;
```

Note: the `pg_stat_statements` view is only visible in the database where the extension is created. If your DSN points to `mydb`, run `CREATE EXTENSION` in `mydb`, not in `postgres`. The `topsrv` role also needs an explicit `SELECT` grant on the view — `pg_monitor` alone is not sufficient.
If pg_stat_statements is not installed, topsrv silently skips query metrics — everything else works.
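At this point you can sanity-check the grants by connecting as the `topsrv` role and reading a couple of the covered views (illustrative queries; any pg_stat view works):

```sql
-- e.g.: psql -h 127.0.0.1 -U topsrv -d postgres
SELECT count(*) FROM pg_stat_activity;   -- readable via pg_monitor
SELECT count(*) FROM pg_ls_waldir();     -- WAL dir listing, also via pg_monitor
```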
NewCollector is lazy — the connection pool is created on first Collect(). If PostgreSQL is down at topsrv startup (e.g. systemd boot ordering), topsrv_pg_up will report 0 and the non-PG collectors continue working. The next scrape retries the connection automatically; no topsrv restart is needed once PG comes back.
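Since topsrv_pg_up cleanly distinguishes this state, a minimal Prometheus alert on it might look like the following sketch (the rule name and the 5m window are placeholders, not shipped defaults):

```yaml
groups:
  - name: topsrv-postgres
    rules:
      - alert: PostgresDown
        expr: topsrv_pg_up == 0
        for: 5m
        labels:
          severity: critical
```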
To skip PG monitoring entirely (even when auto-discovery finds a local process), set Postgres.Disabled = true in the config.
Two collectors:
stub_status — connections, requests, nginx_up (0/1).
access log — tail + parsing via configurable LogFormat. Supports both text (gonx) and JSON (escape=json) formats:
- `topsrv_nginx_request_duration_seconds` — histogram (buckets: 5ms…10s)
- `topsrv_nginx_upstream_duration_seconds` — upstream histogram
- `topsrv_nginx_http_requests_total{status}` — by status code
- `topsrv_nginx_cache_requests_total{status}` — HIT/MISS/EXPIRED
- `topsrv_nginx_5xx_requests_total{status,uri}` — 5xx with URL normalization (`/users/123` → `/users/:id`)
- `topsrv_nginx_4xx_requests_total{status,uri}` — 4xx with URL normalization
- `topsrv_nginx_response_bytes_total` — total response bytes
- `topsrv_nginx_response_bytes_by_uri_total{uri}` — response bytes by normalized URI
- Custom labels — `ExtraLabels` adds log fields as metric labels (server_name, http_platform, etc.)
SSL certificates — auto-discovered ssl_certificate paths from config. topsrv_ssl_certificate_expiry_seconds{path,cn,issuer} exposes NotAfter as Unix timestamp (one series per cert file); topsrv_ssl_certificate_san_info{path,domain} enumerates every DNS name in CN ∪ SANs so multi-host (SAN) certs aren't reduced to their CN in dashboards. Re-reads every 5 minutes to detect Let's Encrypt renewals.
Angie (nginx fork) is supported with a dedicated JSON API collector providing detailed per-zone, per-upstream, SSL, cache, and rate limiting metrics — features not available in nginx free.
Three collectors:
JSON API (/status/) — connections, server zones, upstreams (per-peer state, health, requests), caches, rate limiting, shared memory slabs. Requires the `api /status/;` directive in the Angie config.
stub_status — fallback when API is not configured (same 7 metrics as nginx).
access log — same as nginx: request duration histogram, status codes, cache status, 4xx/5xx by URI, bytes by URI.
Auto-discovery parses angie.conf and detects both api /status/; and stub_status directives. If Angie API is available, it takes priority over stub_status.
```nginx
http {
    server {
        listen 80;
        server_name example.com;
        status_zone http_main;     # enable per-zone metrics

        location / {
            proxy_pass http://backend;
        }
    }

    upstream backend {
        zone backend_zone 64k;     # required for upstream metrics
        server 10.0.0.1:8080;
    }

    server {
        listen 127.0.0.1:8080;
        location /status/ {
            api /status/;
            allow 127.0.0.1;
            deny all;
        }
    }
}
```

When [BotLogs].Enabled = true, every parsed nginx access-log line is matched
against a built-in UA fingerprint table (38 families, 100+ patterns) covering:
- Search engines — Googlebot, Bingbot, DuckDuckBot, Applebot, YandexBot (10 subtypes), Mail.RU_Bot, StackRambler, SputnikBot, Baiduspider, Sogou, 360Spider, PetalBot, Naver Yeti, Daum
- AI / LLM — GPTBot / OAI-SearchBot / ChatGPT-User, ClaudeBot / Claude-User / Claude-SearchBot, PerplexityBot / Perplexity-User, MistralAI-User, cohere-ai, Amazonbot, Diffbot, AI2Bot, YouBot, Timpibot, Meta-ExternalAgent / Fetcher, Bytespider, TikTokSpider, CCBot
- SEO crawlers — AhrefsBot, SemrushBot (+ subtypes), MJ12bot, DotBot, rogerbot, BLEXBot, DataForSeoBot
- Social / messengers — Twitterbot, LinkedInBot, Pinterestbot, Slackbot, Discordbot, TelegramBot, WhatsApp, vkShare, SkypeUriPreview
- Archive — ia_archiver, archive.org_bot
Matched events become JSON records with botFamily / botName /
serverName / remoteAddr / uri / status / timing, are batched into
gzipped ndjson, and POSTed to the topsrv.io /v1/bot-logs endpoint. On
transient send failures the batch lands in SpoolDir/botlog/ for replay on
the next interval; permanent rejections (HTTP 4xx, except 408/429) are
discarded without retry to avoid corrupted batches blocking the queue.
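A shipped event line might look like this sketch — the field names come from the list above, but the exact wire schema (and any additional fields such as a timestamp) is an assumption:

```json
{"botFamily":"search","botName":"Googlebot","serverName":"example.com","remoteAddr":"66.249.66.1","uri":"/products/:id","status":200,"timing":0.012}
```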
Enabling [BotLogs] extends nginx auto-discovery to log_formats without
$request_time so bot events from API/auxiliary vhosts flow through too —
timing histograms are simply skipped per-line for those. Installs without
[BotLogs] keep the original behavior.
Required log_format variables: `$http_user_agent`, `$host` /
`$server_name`, `$remote_addr`, `$http_referer`. Field names are
auto-detected from each tailed log_format — common alternates work
out of the box (`http_host`, `realip_remote_addr`, `http_x_real_ip`,
`http_x_forwarded_for`, the `http_referrer` typo, custom JSON keys). The
resolution and its provenance (override / detected / default) are
emitted as a `botlog: resolved field aliases` log line at startup.
When discovery cannot infer the right name (e.g. operator-defined
`set $custom $http_referer;`), override via [BotLogs.FieldAliases].
If a format genuinely lacks UA, a `botlog_no_ua_field` warning is
raised and `topsrv_collector_config_warnings_total{kind=...}` ticks.
Operator-supplied [Nginx].ExtraLabels are kept as metric labels;
required botlog fields are unioned into ExtractFields for parsing.
Metrics:
- `topsrv_botlog_events_total{state=enqueued|sent|spooled|dropped}`
- `topsrv_botlog_match_total{family}`
- `topsrv_botlog_send_errors_total{kind}`
- `topsrv_botlog_batch_duration_seconds`
- `topsrv_botlog_spool_files`, `topsrv_botlog_spool_bytes`
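These feed naturally into alerting — for example, an illustrative PromQL expression for sustained drops:

```promql
sum(rate(topsrv_botlog_events_total{state="dropped"}[10m])) > 0
```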
Full list of all metrics: docs/metrics.md
Requires GOEXPERIMENT=jsonv2 (set automatically by Makefile and goreleaser).
```shell
make build            # Build binary
make test             # Unit tests
make test-integration # Integration tests (requires Docker)
make fmt              # Format code (golangci-lint fmt)
make lint             # Lint (golangci-lint run)
make run              # Run with local config
make demo             # Start demo (VictoriaMetrics + topsrv)
```

From source:

```shell
git clone https://github.com/vmkteam/topsrv.git
cd topsrv
make build
sudo install -m 0755 bin/topsrv /usr/local/bin/
```

Apache-2.0