
topsrv

Server monitoring agent. A single binary that replaces node_exporter + postgres_exporter + process-exporter + nginx-exporter, with auto-discovery and Prometheus-compatible metrics.

Install

curl -fsSL https://topsrv.io/install.sh | TOPSRV_TOKEN=xxx sudo -E bash

Options via environment variables:

| Variable | Default | Description |
|---|---|---|
| TOPSRV_TOKEN | — | Push token |
| TOPSRV_ENDPOINT | https://push.topsrv.io/v1/write | Push endpoint |
| VERSION | latest | Specific version to install |
| INSTALL_DIR | /usr/local/bin | Binary install path |

Manual install:

# Download from https://github.com/vmkteam/topsrv/releases
tar -xzf topsrv_*_linux_amd64.tar.gz
sudo install -m 0755 topsrv /usr/local/bin/

Quick start

Run:

# Scrape mode — Prometheus pulls /metrics
topsrv -config /etc/topsrv/topsrv.toml -verbose

curl localhost:9100/metrics

Minimal config (/etc/topsrv/topsrv.toml):

[Server]
Listen = ":9100"

That's it. System, disk, network, netstat, process, and S.M.A.R.T. metrics are collected automatically. PostgreSQL and Nginx are auto-discovered if running.
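
To keep the agent running across reboots you can run it under systemd — a minimal sketch; the install script may already register a service, and the unit name, paths, and restart policy here are illustrative:

```shell
# Write a minimal unit file (illustrative — adjust paths to your install).
cat > topsrv.service <<'EOF'
[Unit]
Description=topsrv monitoring agent
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/topsrv -config /etc/topsrv/topsrv.toml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
grep ExecStart topsrv.service
```

Copy it to /etc/systemd/system/topsrv.service, then `systemctl daemon-reload && systemctl enable --now topsrv`.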

Features

| Collector | Metrics | Replaces |
|---|---|---|
| System | CPU per-core, memory, load, swap, uptime, host info, context switches | node_exporter |
| Disk | IO (read/write bytes/ops/time), filesystem (space/inodes) | node_exporter |
| Network | IO per interface, interface info (MAC/IP/MTU/status) | node_exporter |
| Netstat | TCP connections by state/direction/port and remote-peer scope (loopback/private/public), TCP+UDP listening ports by scope and owning process, TCP retransmits, UDP/IP errors | node_exporter |
| Process | CPU, memory, disk IO, threads, FDs, worst_fd_ratio per process group | process-exporter |
| PostgreSQL | Connections (by state/addr/app), transactions, longest transaction age, checkpoints, bgwriter, locks, replication, WAL, wraparound, pg_stat_statements, tables (top 50) | postgres_exporter |
| Nginx | stub_status, access log parsing (text & JSON log_format, response time histogram, status codes, cache, 4xx/5xx URIs, bytes by URI) | nginx-exporter + mtail |
| Angie | JSON API (server zones, upstreams, SSL, caches, rate limiting, slabs) + access log parsing | — |
| S.M.A.R.T. | Disk health (ATA attributes, NVMe health log, temperature, wear, errors) | smartctl_exporter |
| SSL Certificates | Certificate expiry monitoring (auto-discovered from nginx/angie config) | — |
| Bot-logs (opt-in) | Ships UA-classified bot events from nginx access logs to topsrv.io as gzipped ndjson with a disk-backed WAL. 38 families: global/RU/Asian search, AI 2026 crawlers, SEO tools, social link previews, archive | — |
| Packages | Installed-package inventory: dpkg/rpm/apk parsed in pure Go (no shell-out). Aggregates on /metrics; full snapshot (NEVRA, vendor, GPG key, signature digest, modularityLabel, autoInstalled, repoOrigin, licenses) pushed to /v1/inventory for CVE matching | apt-prom-exporter / pkg-exporter |

Auto-discovery

On startup topsrv scans running processes and detects known services — no configuration needed:

| Process | Type | Action |
|---|---|---|
| postgres / postmaster | postgresql | Auto-connects as topsrv user (password derived from Push.Token); falls back to hint if no token |
| nginx | nginx | Parses nginx.conf → finds log_format + access_log |
| angie | angie | Parses angie.conf → finds api /status/, log_format + access_log |
| redis-server | redis | Detected |
| pgbouncer | pgbouncer | Detected |
| php-fpm | php-fpm | Detected |

For Nginx, auto-discovery parses nginx.conf including include directives, extracts log_format and access_log paths with $request_time. Both text and JSON (escape=json) log formats are auto-detected.
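
For reference, a format that auto-discovery can pick up looks like the following — a sketch based on the default LogFormat shown in the configuration reference; the format name and log path are illustrative:

```nginx
# $request_time is what enables the response-time histogram.
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" $request_time $upstream_response_time';
access_log /var/log/nginx/access.log main;
```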

Configuration

Operating modes

Pull mode (default) — Prometheus scrapes /metrics:

[Server]
Listen = ":9100"
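
On the Prometheus side this is an ordinary scrape job — a sketch; the job name and target host are illustrative:

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: topsrv
    static_configs:
      - targets: ["server1.example.com:9100"]
```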

Push mode — agent pushes metrics to topsrv.io, VictoriaMetrics, or any compatible remote-write endpoint:

[Push]
Endpoint = "https://push.topsrv.io/v1/write"   # or your VictoriaMetrics URL
Token    = "ts_xxx"         # get your token at https://topsrv.io
Interval = "30s"
SpoolDir = "/var/lib/topsrv/spool"   # disk buffer for retries on network failure

Push-only mode — no HTTP server, only push. Omit [Server] or set Listen = "".

Both modes can work simultaneously.
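
A combined configuration for running both modes at once might look like this (token and spool path are illustrative):

```toml
[Server]
Listen = ":9100"                                 # Prometheus can still scrape /metrics

[Push]
Endpoint = "https://push.topsrv.io/v1/write"
Token    = "ts_xxx"
Interval = "30s"
SpoolDir = "/var/lib/topsrv/spool"               # buffer pushes across network failures
```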

Full reference

[Server]
Listen = ":9100"            # Prometheus /metrics endpoint

[Push]
Endpoint = ""               # Push URL
Token    = ""               # Bearer token for push auth
Interval = "30s"            # Push interval
SpoolDir = ""               # Disk buffer for retries

[Update]
Enabled  = false            # Auto-update via control plane
Interval = "15m"            # Check interval
Channel  = "stable"         # stable / beta

# PostgreSQL (optional — auto-discovery detects the process, DSN needed for connection)
# [Postgres]
# DSN      = "postgres://topsrv:pass@localhost:5432/postgres?sslmode=disable"
# Disabled = true    # set to skip PG monitoring even if discovery finds postgres

# Nginx (optional — auto-discovery parses nginx.conf automatically)
# [Nginx]
# StubStatusURL = "http://127.0.0.1/stub_status"
# LogFormat     = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time $upstream_response_time'
# ExtraLabels   = ["server_name"]
# AccessLogs    = ["/var/log/nginx/access.log"]

# Angie (optional — auto-discovery parses angie.conf automatically)
# [Angie]
# StatusURL     = "http://127.0.0.1:8080/status/"
# StubStatusURL = "http://127.0.0.1/stub_status"
# LogFormat     = '$remote_addr ...'
# ExtraLabels   = ["server_name"]
# AccessLogs    = ["/var/log/angie/access.log"]

# S.M.A.R.T. disk health (always enabled, requires CAP_SYS_RAWIO + CAP_SYS_ADMIN)
# [Smart]
# Disabled = true    # set to disable
# Interval = "5m"

# Package inventory (optional — auto-discovery detects dpkg/rpm/apk via files).
# On /metrics only aggregates are exposed (counts, scan duration, error counter).
# Full per-package snapshot is pushed to /v1/inventory with kind="packages".
# [Packages]
# Disabled      = false      # set to fully skip the collector
# Interval      = "6h"       # snapshot scan period (±10% jitter to avoid herds)
# Managers      = []         # auto-detect by default; e.g. ["dpkg","rpm"] to force a subset
# DisablePush   = false      # set to skip POSTing snapshots to /v1/inventory (keep /metrics only)
# MaxPackages   = 10000      # safety cap; logs a warning and truncates if exceeded

# Bot-logs (optional — ships UA-classified bot events to topsrv.io for analytics)
# [BotLogs]
# Enabled       = true
# Token         = ""        # required — issued by topsrv.io per project
# Endpoint      = ""        # default: [Push].Endpoint with /v1/bot-logs path
# BatchSize     = 5000      # events per batch
# BatchInterval = "30s"     # flush interval
# SpoolDir      = ""        # default: [Push].SpoolDir; a "botlog/" subdir is created inside
# MaxSpoolMB    = 200       # WAL disk budget
# UATruncate    = 1024      # truncate user-agent at this length
# URITruncate   = 2048      # truncate request URI at this length
# ExtraUAPatterns = ["MyCustomCrawler/"]  # local additions to the bot list
# Field aliases — only set when discovery cannot infer the right name
# (e.g. operator-defined `set $custom $http_referer;`). Empty falls back.
# [BotLogs.FieldAliases]
# Referer = "ref"

| Parameter | Default | Description |
|---|---|---|
| Server.Listen | :9100 | HTTP listen address for /metrics |
| Push.Endpoint | — | Remote write URL |
| Push.Token | — | Bearer token |
| Push.Interval | 30s | Push frequency |
| Push.SpoolDir | — | Disk spool path (retries on network failure) |
| Update.Enabled | false | Enable auto-update |
| Update.Interval | 15m | Update check interval |
| Update.Channel | stable | Update channel (stable/beta) |
| Postgres.DSN | — | PostgreSQL connection string |
| Postgres.Disabled | false | Skip PostgreSQL monitoring even if discovery finds a local postgres process |
| Nginx.StubStatusURL | — | nginx stub_status URL |
| Nginx.LogFormat | — | nginx log_format string (gonx format) |
| Nginx.ExtraLabels | [] | Log fields to add as metric labels |
| Nginx.AccessLogs | [] | Paths to access log files |
| Angie.StatusURL | — | Angie JSON API URL (e.g. http://127.0.0.1:8080/status/) |
| Angie.StubStatusURL | — | Fallback stub_status URL |
| Angie.LogFormat | — | angie log_format string (gonx format) |
| Angie.ExtraLabels | [] | Log fields to add as metric labels |
| Angie.AccessLogs | [] | Paths to access log files |
| Smart.Disabled | false | Disable S.M.A.R.T. collector |
| Smart.Interval | 5m | S.M.A.R.T. polling interval |
| Packages.Disabled | false | Disable package inventory collector |
| Packages.Interval | 6h | Inventory scan period (±10% jitter); pushed to /v1/inventory |
| Packages.Managers | [] | Force subset of managers (dpkg/rpm/apk); empty = auto-detect |
| Packages.DisablePush | false | Skip POSTing snapshots to /v1/inventory (keeps /metrics aggregates) |
| Packages.MaxPackages | 10000 | Truncate snapshot if exceeded (logs a warning) |
| BotLogs.Enabled | false | Ship UA-classified bot events to topsrv.io |
| BotLogs.Token | — | Bot-logs ingest bearer token (separate from Push.Token) |
| BotLogs.Endpoint | derived | Ingest URL; defaults to [Push].Endpoint with /v1/bot-logs path |
| BotLogs.BatchSize | 5000 | Events per batch |
| BotLogs.BatchInterval | 30s | Flush interval |
| BotLogs.SpoolDir | derived | Parent dir for WAL spool; a botlog/ subdir is created inside. Defaults to [Push].SpoolDir |
| BotLogs.MaxSpoolMB | 200 | Disk budget for the spool subdir |
| BotLogs.UATruncate | 1024 | Max UA length per event |
| BotLogs.URITruncate | 2048 | Max URI length per event |
| BotLogs.ExtraUAPatterns | [] | Local additions to known-bots UA patterns |
| BotLogs.FieldAliases.* | auto | Per-format field-name overrides (UserAgent/Host/ServerName/RemoteAddr/Referer). Discovery auto-detects from log_format; only set when the operator uses non-standard variables in the nginx config |

Environment variables

All config flags can be set via environment variables with TOPSRV_ prefix:

TOPSRV_CONFIG=/etc/topsrv/topsrv.toml topsrv

PostgreSQL

Supports PG15+ (version-gated features: pg_stat_wal PG14+, pg_stat_statements.toplevel PG14+, pg_stat_checkpointer PG17, shared_blk_*_time PG17).

Coverage:

  • Connections — by state, by client_addr, by application_name, max
  • Transactions — commit/rollback, deadlocks, temp files/bytes per database, longest transaction age
  • Checkpoints & bgwriter — timed/requested, checkpoint time, buffers (checkpoint/bgwriter/backend)
  • Autovacuum — common vs wraparound workers, max_workers
  • Locks — by mode + granted label, blocked_backends (via pg_blocking_pids()), max lock wait duration
  • Wait events — sampled from pg_stat_activity (backend_type, datname, application_name, wait_event_type, wait_event, state) — pganalyze/APM-style "why is it slow right now"
  • Replication — lag bytes, lag seconds by stage (write/flush/replay), sync_state, slots (retained bytes, active/inactive)
  • WAL — LSN position, files count via pg_ls_waldir(), plus pg_stat_wal (records, FPI, buffers_full, write/sync time) on PG14+
  • Archiver — pg_stat_archiver with result=archived|failed, timestamp pattern (time() - last_timestamp_seconds for age)
  • Wraparound — xid_age vs autovacuum_freeze_max_age, per-database + cluster max
  • Database sizes — per-database byte size
  • Tables (top 50 by size) — seq/idx scans, tuple ops, dead tuples, autovacuum count, last_maintenance_timestamp{op=vacuum|analyze}, mod_since_analyze — with database, schema, table labels
  • Indexes (top 50) — scans_total, size_bytes — with database, schema, table, index labels
  • Bloat estimation (ioguix heuristic, refreshed every 15 min) — top 50 tables + top 50 btree indexes by wasted bytes. Catalog-only (never reads heap pages) — safe on multi-TB databases. Points at tables for pg_repack / CLUSTER and indexes for REINDEX CONCURRENTLY
  • pg_stat_statements — union of top 20 by time, calls, blocks read, blocks dirtied (DML pressure), and WAL bytes (PG13+, if available) — ~80–100 unique queries covering both read-heavy and write-heavy workloads; outlier-aware duration histogram; full query text pushed separately to control plane (/v1/meta)
  • Settings — curated GUCs (shared_buffers, work_mem, max_connections etc.) normalized to bytes/seconds
  • Stats reset timestamps — pg_stat_database, pg_stat_bgwriter, pg_stat_wal (PG17), pg_stat_archiver — for detecting unexpected resets

topsrv connects with application_name=topsrv (visible in pg_stat_activity) and selects the largest database on startup for per-DB views (pg_stat_user_tables/pg_stat_user_indexes).

Setup

1. Create a monitoring role. The built-in pg_monitor role (PG10+) grants read access to all statistics views and functions including pg_ls_waldir() — no custom functions or schemas needed:

sudo -u postgres psql -d postgres

CREATE ROLE topsrv LOGIN PASSWORD 'CHANGE_ME';
GRANT pg_monitor TO topsrv;

2. Allow connections in pg_hba.conf:

# local agent (typical setup)
host all topsrv 127.0.0.1/32 scram-sha-256
local all topsrv scram-sha-256

Reload config:

SELECT pg_reload_conf();

3. Configure topsrv — add DSN to config:

[Postgres]
DSN = "postgres://topsrv:CHANGE_ME@localhost:5432/postgres?sslmode=disable"

4. (Optional) Enable pg_stat_statements for query-level metrics (top 20 queries, duration histogram):

Add to postgresql.conf:

shared_preload_libraries = 'pg_stat_statements'   # requires restart
pg_stat_statements.max = 500
pg_stat_statements.track = top
track_io_timing = on                               # recommended for block read/write time

Restart PostgreSQL, then create the extension in the same database that topsrv connects to (the one specified in Postgres.DSN):

-- connect to the database from DSN (e.g. postgres)
\c postgres
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
GRANT SELECT ON pg_stat_statements TO topsrv;

Note: pg_stat_statements view is only visible in the database where the extension is created. If your DSN points to mydb, run CREATE EXTENSION in mydb, not in postgres. The topsrv role also needs an explicit SELECT grant on the view — pg_monitor alone is not sufficient.

If pg_stat_statements is not installed, topsrv silently skips query metrics — everything else works.

Behaviour when PG is unreachable

NewCollector is lazy — the connection pool is created on first Collect(). If PostgreSQL is down at topsrv startup (e.g. systemd boot ordering), topsrv_pg_up will report 0 and the non-PG collectors continue working. The next scrape retries the connection automatically; no topsrv restart is needed once PG comes back.

To skip PG monitoring entirely (even when auto-discovery finds a local process), set Postgres.Disabled = true in the config.

Nginx

Two collectors:

stub_status — connections, requests, nginx_up (0/1).

access log — tail + parsing via configurable LogFormat. Supports both text (gonx) and JSON (escape=json) formats:

  • topsrv_nginx_request_duration_seconds — histogram (buckets: 5ms…10s)
  • topsrv_nginx_upstream_duration_seconds — upstream histogram
  • topsrv_nginx_http_requests_total{status} — by status code
  • topsrv_nginx_cache_requests_total{status} — HIT/MISS/EXPIRED
  • topsrv_nginx_5xx_requests_total{status,uri} — 5xx with URL normalization (/users/123 → /users/:id)
  • topsrv_nginx_4xx_requests_total{status,uri} — 4xx with URL normalization
  • topsrv_nginx_response_bytes_total — total response bytes
  • topsrv_nginx_response_bytes_by_uri_total{uri} — response bytes by normalized URI
  • Custom labels — ExtraLabels adds log fields as metric labels (server_name, http_platform, etc.)
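
The URL normalization used for the 4xx/5xx and bytes-by-URI metrics can be approximated like this — illustrative only; the real normalization is internal to topsrv, and this sed one-liner only mimics the numeric-segment case:

```shell
# Collapse numeric path segments to :id (rough imitation of topsrv's normalization).
echo "/users/123/orders/456" | sed -E 's#/[0-9]+#/:id#g'
# → /users/:id/orders/:id
```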

SSL certificates — auto-discovered ssl_certificate paths from config. topsrv_ssl_certificate_expiry_seconds{path,cn,issuer} exposes NotAfter as Unix timestamp (one series per cert file); topsrv_ssl_certificate_san_info{path,domain} enumerates every DNS name in CN ∪ SANs so multi-host (SAN) certs aren't reduced to their CN in dashboards. Re-reads every 5 minutes to detect Let's Encrypt renewals.
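
What the expiry metric encodes is the certificate's NotAfter field, exposed as a Unix timestamp. You can inspect the same field by hand with openssl — a sketch using a throwaway self-signed cert (in a real setup the paths come from the discovered nginx/angie config):

```shell
# Generate a short-lived self-signed cert just to have something to inspect.
openssl req -x509 -newkey rsa:2048 -nodes -days 30 -subj "/CN=example.com" \
  -keyout /tmp/topsrv-demo.key -out /tmp/topsrv-demo.crt 2>/dev/null
# Print the NotAfter date and subject that the metric labels (cn, issuer) are built from.
openssl x509 -in /tmp/topsrv-demo.crt -noout -enddate -subject
```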

Angie

Angie (nginx fork) is supported with a dedicated JSON API collector providing detailed per-zone, per-upstream, SSL, cache, and rate limiting metrics — features not available in nginx free.

Three collectors:

JSON API (/status/) — connections, server zones, upstreams (per-peer state, health, requests), caches, rate limiting, shared memory slabs. Requires api /status/; directive in Angie config.

stub_status — fallback when API is not configured (same 7 metrics as nginx).

access log — same as nginx: request duration histogram, status codes, cache status, 4xx/5xx by URI, bytes by URI.

Auto-discovery parses angie.conf and detects both api /status/; and stub_status directives. If Angie API is available, it takes priority over stub_status.

Angie config for monitoring

http {
    server {
        listen 80;
        server_name example.com;
        status_zone http_main;        # enable per-zone metrics

        location / {
            proxy_pass http://backend;
        }
    }

    upstream backend {
        zone backend_zone 64k;        # required for upstream metrics
        server 10.0.0.1:8080;
    }

    server {
        listen 127.0.0.1:8080;
        location /status/ {
            api /status/;
            allow 127.0.0.1;
            deny all;
        }
    }
}

Bot-logs (optional)

When [BotLogs].Enabled = true, every parsed nginx access-log line is matched against a built-in UA fingerprint table (38 families, 100+ patterns) covering:

  • Search engines — Googlebot, Bingbot, DuckDuckBot, Applebot, YandexBot (10 subtypes), Mail.RU_Bot, StackRambler, SputnikBot, Baiduspider, Sogou, 360Spider, PetalBot, Naver Yeti, Daum
  • AI / LLM — GPTBot / OAI-SearchBot / ChatGPT-User, ClaudeBot / Claude-User / Claude-SearchBot, PerplexityBot / Perplexity-User, MistralAI-User, cohere-ai, Amazonbot, Diffbot, AI2Bot, YouBot, Timpibot, Meta-ExternalAgent / Fetcher, Bytespider, TikTokSpider, CCBot
  • SEO crawlers — AhrefsBot, SemrushBot (+ subtypes), MJ12bot, DotBot, rogerbot, BLEXBot, DataForSeoBot
  • Social / messengers — Twitterbot, LinkedInBot, Pinterestbot, Slackbot, Discordbot, TelegramBot, WhatsApp, vkShare, SkypeUriPreview
  • Archive — ia_archiver, archive.org_bot

Matched events become JSON records with botFamily / botName / serverName / remoteAddr / uri / status / timing, are batched into gzipped ndjson, and POSTed to the topsrv.io /v1/bot-logs endpoint. On transient send failures the batch lands in SpoolDir/botlog/ for replay on the next interval; permanent rejections (HTTP 4xx, except 408/429) are discarded without retry to avoid corrupted batches blocking the queue.
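
The wire format can be sketched as follows — field names are taken from the description above, while the values and file name are invented:

```shell
# One JSON object per line (ndjson), gzip-compressed — the framing used for bot-log batches.
printf '%s\n' '{"botFamily":"search","botName":"Googlebot","serverName":"example.com","remoteAddr":"66.249.66.1","uri":"/blog/post-1","status":200,"timing":0.012}' > batch.ndjson
gzip -f batch.ndjson          # produces batch.ndjson.gz, the payload that gets POSTed
gunzip -c batch.ndjson.gz     # round-trips to the original line
```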

Enabling [BotLogs] extends nginx auto-discovery to log_formats without $request_time so bot events from API/auxiliary vhosts flow through too — timing histograms are simply skipped per-line for those. Installs without [BotLogs] keep the original behavior.

Required log_format variables: $http_user_agent, $host / $server_name, $remote_addr, $http_referer. Field names are auto-detected from each tailed log_format — common alternates work out of the box (http_host, realip_remote_addr, http_x_real_ip, http_x_forwarded_for, the common http_referrer misspelling, custom JSON keys). The resolution and its provenance (override / detected / default) are emitted as a "botlog: resolved field aliases" log line at startup. When discovery cannot infer the right name (e.g. an operator-defined set $custom $http_referer;), override it via [BotLogs.FieldAliases]. If a format genuinely lacks a UA field, a botlog_no_ua_field warning is raised and topsrv_collector_config_warnings_total{kind=...} ticks. Operator-supplied [Nginx].ExtraLabels are kept as metric labels; required botlog fields are unioned into ExtractFields for parsing.

Metrics: topsrv_botlog_events_total{state=enqueued|sent|spooled|dropped}, topsrv_botlog_match_total{family}, topsrv_botlog_send_errors_total{kind}, topsrv_botlog_batch_duration_seconds, topsrv_botlog_spool_files, topsrv_botlog_spool_bytes.

Metrics reference

Full list of all metrics: docs/metrics.md

Development

Requires GOEXPERIMENT=jsonv2 (set automatically by Makefile and goreleaser).

make build              # Build binary
make test               # Unit tests
make test-integration   # Integration tests (requires Docker)
make fmt                # Format code (golangci-lint fmt)
make lint               # Lint (golangci-lint run)
make run                # Run with local config
make demo               # Start demo (VictoriaMetrics + topsrv)

From source:

git clone https://github.com/vmkteam/topsrv.git
cd topsrv
make build
sudo install -m 0755 bin/topsrv /usr/local/bin/

License

Apache-2.0
