ghprowl — GitHub Prowler

A multi-target watcher for credentials and internal references that leak into public GitHub. Engineers commit tokens to sample repos, paste connection strings into gists, and push dotfiles that point at internal hosts. ghprowl watches many programs at once, finds those leaks soon after they appear, and pings you without ever moving the secret off your machine.

It is config-driven: a generic engine in lib/, one small config.env + marker catalog per target. The engine ships in this repo; per-target data never does (see Repo layout & hygiene).

Built for authorized bug-bounty / security research. Public data only. See Scope & ethics.

The design story: Rare, Not Random — the reasoning behind the method, the engineering tradeoffs, and an A/B against a hand-tuned baseline (more recall, a third of the clone load).

Why it works

Three ideas do most of the lifting.

1. Rare, not random

The naive way to find a program's leaks is to search GitHub for its name. That drowns you in tutorials and forks. ghprowl instead builds a marker catalog — endpoints, internal hostnames, env-var names, custom headers, proto packages, SDK module names — and ranks every marker by its global code-search frequency, then queries the rarest first.

A marker with a global count of 0–10 is a near-perfect signal: almost the only code that contains internal-gateway.corp.example is code that shouldn't be public. A marker with a count of 50k (api_key, AUTH_TOKEN, a vendor's public SDK URL) is noise. Rarity is precision.

This applies to hand-added markers too — "distinctive" is not the same as "rare." A public SDK repository URL or a session-cookie name is distinctive to a target yet appears in every integrator's build config. Rank everything; trust the rare.
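A minimal sketch of the ranking step in bash, assuming a hypothetical catalog layout of `kind<TAB>marker<TAB>global_count`. `global_count` shows one plausible way to fetch a count with `gh api` against GitHub's code-search endpoint; `rank_markers` and the file layout are illustrative names, not ghprowl's internals.

```bash
#!/usr/bin/env bash
# Sketch only: the catalog layout is hypothetical, not ghprowl's actual format.
#   markers.tsv: kind<TAB>marker<TAB>global_count

# global_count MARKER -> total hits for the exact string in GitHub code search.
# Needs an authenticated gh and counts against the search rate limit.
global_count() {
  gh api -X GET search/code -f q="\"$1\"" --jq .total_count
}

# rank_markers FILE [N] -> the N rarest markers (all of them if N is omitted)
rank_markers() {
  local file=$1 n=${2:-0}
  # Numeric sort on the count column, ties broken on the marker text.
  sort -t $'\t' -k3,3n -k2,2 "$file" |
    if (( n > 0 )); then head -n "$n"; else cat; fi
}
```

For example, `rank_markers markers.tsv 25` would print the 25 rarest entries. Nothing about a marker's kind earns it a better rank in this sketch; only its count does.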

2. Two-depth ledger

Watching every candidate by cloning it doesn't scale. ghprowl splits findings into two tiers:

deep — high-confidence (a rare-marker hit). Cloned and gitleaks-scanned every cycle.
light — speculative (a name-pattern match, a contributor, a fork). Tracked only; promoted to deep automatically when it later trips a marker (sweep).

A third state, confidence none, quarantines squatters and recon dumps so they never waste a clone. The result: a wide net with a small, high-signal clone set. Most light entries are tripwires: empty or unrelated accounts that pay off on the day one of them fat-fingers a push.
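The promotion rule can be sketched as a pure function over a ledger file. The layout (`tier<TAB>entity`) and the name `sweep_ledger` are hypothetical, and how ghprowl decides that an entity trips a marker is not shown, so the hit list is simply an input here. The sketch assumes `none` entries are never promoted, matching the quarantine described above.

```bash
#!/usr/bin/env bash
# Hypothetical ledger layout: tier<TAB>entity, tier one of deep | light | none.
# sweep_ledger LEDGER HITS -> prints the updated ledger
#   HITS: one entity per line that now trips a marker
sweep_ledger() {
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { hit[$1] = 1; next }      # first file: entities with a marker hit
    $1 == "light" && ($2 in hit) { $1 = "deep" }    # promotion: light -> deep
    { print }                                       # deep stays deep; none stays quarantined
  ' "$2" "$1"
}
```

Matching on `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps an empty hit list from swallowing the ledger.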

3. The escape hatch

Auto-onboarding (setup) derives markers from public sources — program scope, org-repo READMEs, on-disk recon. That reliably reconstructs a target's service inventory. It cannot reconstruct the non-public gold: a client_id captured from traffic, an internal API-gateway id, a proto package name. Those are the highest-signal markers, and they are added by hand as manual markers, which are always queried regardless of the per-run budget. The escape hatch is mandatory, not optional — but its markers get rarity-ranked like everything else.
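Budget selection with the escape hatch might look like this. It assumes a ranked file (rarest first) whose first column is the marker kind, with hand-added markers tagged `manual`; `select_markers` is a hypothetical name. Manual markers pass through unconditionally but keep their rank position, so they are still ordered by rarity.

```bash
#!/usr/bin/env bash
# select_markers RANKED_TSV BUDGET
#   RANKED_TSV: kind<TAB>marker<TAB>global_count, already sorted rarest first (assumed layout)
#   BUDGET:     how many non-manual markers to query this run
select_markers() {
  awk -F'\t' -v budget="$2" '
    $1 == "manual" { print; next }   # escape hatch: always queried, outside the budget
    spent < budget { print; spent++ } # everything else: rarest first until the budget is spent
  ' "$1"
}
```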

Install

Requirements: gh (authenticated), gitleaks, jq, curl. Optional: qrencode (for topics qr), sqlite3 + a HackerOne scope cache (richer setup).
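A quick preflight for the required tools, written as a hypothetical helper (`missing_tools` is not a ghprowl command):

```bash
#!/usr/bin/env bash
# missing_tools TOOL... -> prints each tool not found on PATH; non-zero if any is missing
missing_tools() {
  local t rc=0
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || { printf '%s\n' "$t"; rc=1; }
  done
  return "$rc"
}
```

`missing_tools gh gitleaks jq curl` prints anything missing and returns non-zero; `gh auth status` separately confirms that gh is logged in.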

git clone https://github.com/DoctorGoz/ghprowl ~/.ghprowl
mkdir -p ~/.local/bin                                # make sure the PATH dir exists
ln -s ~/.ghprowl/bin/ghprowl ~/.local/bin/ghprowl    # put it on PATH
ghprowl list

The clone is the install: ~/.ghprowl is the working tree, and your per-target data lives inside it under targets/<handle>/ (gitignored).

Quickstart

ghprowl setup acme-corp        # onboard: derive scope -> markers -> FIT check -> DRAFT config (stops for review)
#                              # REVIEW targets/acme-corp/{config.env,markers.tsv}; add manual gold markers
ghprowl discover acme-corp     # widen the net: rare-marker code-search + pivots, tiered into the ledger
ghprowl watch acme-corp        # detect: clone+gitleaks the deep set, alert on a LIVE token
ghprowl status                 # dashboard across all targets

setup deliberately stops for review — it produces a draft, never a live config, and reports a FIT verdict (if a target has no meaningful public GitHub footprint, it says so rather than running a dead watcher). That review step — prune noisy markers, paste in your non-public gold — is where the precision comes from.

Commands

Command	What it does
`setup <handle>`	Onboard a target: derive scope → markers → FIT check → draft `config.env` + catalog. Stops for review.
`rank <target>`	(Re)rank the marker catalog by global rarity, rarest first.
`discover <target>`	Widen the net: rare-marker code-search + contributor/fork pivots, tiered into the ledger.
`sweep <target> [N]`	Promotion pass: light → deep when a tracked entity now trips a marker.
`watch <target>`	Detection: clone/refresh the deep set, gitleaks-scan changed repos, alert on a LIVE hit.
`status [target]`	No-network dashboard: per-target tiers, markers, alerts, last run.
`topics [target]`	List each target's ntfy topic (the subscribe key) + URL.
`topics qr <target>`	Render the subscribe URL as a scannable terminal QR (no phone typing).
`topics test <target>`	Send a harmless confirmation ping to verify a subscription end-to-end.
`list`	List configured targets.

watch, discover, and sweep accept --all instead of a target to run every configured target — the entry point for cron.
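One way `--all` could fan out, assuming a configured target is any directory under `targets/` that has a `config.env`, excluding `_template`. Both functions are hypothetical, and folding per-target failures into a single non-zero exit is an assumption chosen to suit cron.

```bash
#!/usr/bin/env bash
# list_targets ROOT -> configured target handles, one per line
list_targets() {
  local root=$1 dir
  for dir in "$root"/targets/*/; do
    [[ -f $dir/config.env ]] || continue       # no config yet: not a target
    dir=${dir%/}; dir=${dir##*/}               # keep just the handle
    [[ $dir == _template ]] && continue        # the shipped template is not a target
    printf '%s\n' "$dir"
  done
}

# run_all ROOT COMMAND -> runs COMMAND once per target; non-zero if any run failed
run_all() {
  local root=$1 cmd=$2 t rc=0
  while IFS= read -r t; do
    "$cmd" "$t" || rc=1                        # keep going; remember the failure
  done < <(list_targets "$root")
  return "$rc"
}
```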

Alerting

A hit is announced three ways, and the secret never leaves the host:

ntfy push — a heads-up only ("Actionable hit in <repo>. Check the local file."). The token is never in the message. Each target gets a unique auto-minted topic; subscribe once (ghprowl topics qr <target> → scan).
URGENT-LIVE-TOKEN.txt — local, LIVE hits only, with the repo, commit URL, and the exact git show command to view the secret offline.
alerts/ALERTS.md — local append-only log of every hit.

A baseline (first) scan of a repo suppresses historical findings and alerts only on live tokens, so onboarding a target doesn't bury you. watch exits non-zero on a hit for cron-side handling.
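A sketch of the push side, assuming the public ntfy.sh server (override with the hypothetical `NTFY_SERVER` variable) and hypothetical helper names. ntfy accepts a plain HTTP POST of the message body to the topic URL, which is all `notify` does.

```bash
#!/usr/bin/env bash
# Assumption: public ntfy.sh unless NTFY_SERVER points at a self-hosted instance.
NTFY_SERVER=${NTFY_SERVER:-https://ntfy.sh}

# mint_topic -> a random topic name. Anyone who knows an ntfy topic can
# subscribe to it, so the name has to be unguessable.
mint_topic() {
  printf 'ghprowl-%s\n' "$(od -An -N12 -tx1 /dev/urandom | tr -d ' \n')"
}

# notify TOPIC REPO -> push a heads-up naming only the repo, never the secret.
# With DRY_RUN=1 the request is printed instead of sent.
notify() {
  local topic=$1 repo=$2 msg
  msg="Actionable hit in $repo. Check the local file."
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf 'POST %s/%s: %s\n' "$NTFY_SERVER" "$topic" "$msg"
  else
    curl -fsS -d "$msg" "$NTFY_SERVER/$topic" >/dev/null
  fi
}
```

The design choice here is that `notify` has no parameter that could carry a secret: the message is built from the repo name alone, so a token cannot slip into a push by accident.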

Per-target config (`targets/<handle>/config.env`)

Key	Purpose
`GP_GH_ORGS`	Org(s) that root discovery and are watched directly.
`GP_DOMAINS`	In-scope domains (host markers + issuer derivation).
`GP_ISSUER_SUBSTR`	JWT `iss` substrings that mark a token as this target's (the live check).
`GP_SOFT_KEYWORDS`	Broad relevance keywords (issuer/code detection).
`GP_SWEEP_KEYWORDS`	Curated brand-coined subset for the name-sweep + interest test. Prune generic words here.
`GP_NAME_PATTERNS`	Work-account login patterns to sweep (e.g. `-acme` / `acme-`).
`GP_FORK_PIVOT`	`0` by default; `1` enables bounded fork enumeration — only sane on small targets.
`GP_NTFY_TOPIC`	Alert topic (auto-minted; the token itself is never transmitted).
`GP_MARKER_BUDGET`	How many of the rarest markers to spend the code-search budget on per `discover`.
`GP_MAX_REPO_MB`	`watch` skips repos larger than this (default 500) so one monorepo can't wedge a cycle.
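To illustrate what `GP_ISSUER_SUBSTR` matches against, here is a sketch that decodes a JWT's payload offline and tests its `iss` claim. It needs only coreutils `base64` and jq (already a requirement). The function names are hypothetical; the sketch does not verify signatures and is not ghprowl's full live check, only the issuer test. How ghprowl splits several substrings out of the config value is unknown, so they are passed as separate arguments.

```bash
#!/usr/bin/env bash
# b64url_decode STR -> raw bytes. JWT segments use -_ instead of +/ and drop padding.
b64url_decode() {
  local s=${1//-/+}
  s=${s//_//}
  case $(( ${#s} % 4 )) in
    2) s+='==' ;;
    3) s+='=' ;;
  esac
  printf '%s' "$s" | base64 -d
}

# jwt_iss TOKEN -> the payload's "iss" claim (empty if absent)
jwt_iss() {
  local payload
  payload=$(cut -d. -f2 <<<"$1")          # header.PAYLOAD.signature
  b64url_decode "$payload" | jq -r '.iss // empty'
}

# is_target_token TOKEN SUBSTR... -> success if iss contains any of the substrings
is_target_token() {
  local iss sub
  iss=$(jwt_iss "$1" 2>/dev/null) || return 1   # not decodable as a JWT payload
  shift
  for sub in "$@"; do
    [[ -n $sub && $iss == *"$sub"* ]] && return 0
  done
  return 1
}
```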

The marker catalog (markers.tsv → ranked markers-ranked.tsv) carries kinds like endpoint, internal-host, env-var, header, proto-pkg, and manual (your hand-added gold).

How a watch cycle works

For each repo in the deep set (org repos + marker-hit leaks):

Big-repo guard — check GitHub's reported size first; skip + warn if over GP_MAX_REPO_MB (override a specific repo via force-big.txt). Unattended cron can't prompt, so the design is skip-and-warn, not ask.
Clone or fetch; skip unchanged repos via a per-repo commit-set checkpoint.
gitleaks full-history scan → an issuer-aware post-filter keeps only target-relevant secrets.
Alert on a live hit (and only live hits on a baseline scan).
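The big-repo guard and the unchanged-repo skip can be sketched as two small checks. GitHub's REST API reports a repository's `size` in kilobytes, hence the conversion. Treating `force-big.txt` as one `owner/repo` per line, and fingerprinting the ref set as the commit-set checkpoint, are both assumptions; all function names are hypothetical.

```bash
#!/usr/bin/env bash
# repo_size_kb OWNER/REPO -> GitHub's reported size (the REST API uses KB)
repo_size_kb() {
  gh api "repos/$1" --jq .size
}

# too_big SIZE_KB MAX_MB [FORCE_FILE REPO] -> success if the repo should be skipped
too_big() {
  local size_kb=$1 max_mb=$2 force=${3:-} repo=${4:-}
  if [[ -n $force && -f $force ]] && grep -qxF "$repo" "$force"; then
    return 1                                  # listed in the override file: never skip
  fi
  (( size_kb > max_mb * 1024 ))
}

# refs_fingerprint DIR -> hash over every ref's commit id; same hash = nothing new to scan
refs_fingerprint() {
  git -C "$1" for-each-ref --format='%(objectname) %(refname)' |
    sort | sha256sum | cut -d' ' -f1
}
```

A cron-side caller would warn and move on when `too_big` succeeds, and rerun gitleaks only when `refs_fingerprint` differs from the value stored on the previous cycle.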

A concurrency flock per operation means a long cycle and the next cron tick never collide.
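A per-operation lock of this kind can be built on util-linux `flock(1)`. This is a sketch with a hypothetical helper; `-n` makes an overlapping run give up at once instead of queueing, and `-E 75` gives that case its own exit status so a cron wrapper can tell "skipped" from "failed".

```bash
#!/usr/bin/env bash
# with_lock LOCKFILE CMD... -> run CMD while holding an exclusive lock on LOCKFILE.
# Exits 75 without running CMD if another process already holds the lock.
with_lock() {
  local lock=$1; shift
  flock -n -E 75 "$lock" "$@"
}
```

For example, `with_lock targets/acme-corp/.watch.lock ghprowl watch acme-corp` (a hypothetical lock path) would let a long watch cycle finish while the next cron tick exits with 75.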

Cron

*/30 * * * *  ~/.local/bin/ghprowl watch --all     >/dev/null 2>&1   # detect
17 5   * * *  ~/.local/bin/ghprowl discover --all   >/dev/null 2>&1   # widen
37 */6 * * *  ~/.local/bin/ghprowl sweep --all      >/dev/null 2>&1   # promote

Scope & ethics

ghprowl is a defensive / authorized-research tool, built to operate within those bounds:

Public data only. It reads public repositories and public code search. Nothing else.
Offline verification. Detection runs locally; a found secret is reported, never used.
The token never moves. Push notifications say "check the host"; the secret stays on disk.
Logins are pivots, not people. Account handles are search leads, not targets for profiling.

Use it on programs you are authorized to test, and follow each program's disclosure rules.

Repo layout & hygiene

bin/ghprowl            # thin dispatcher
lib/                   # generic engine (target-agnostic)
targets/_template/     # the only target dir that ships
targets/<handle>/      # YOUR per-target data — gitignored, never committed

Everything operator-specific (scope, markers, ntfy topics, ledgers, clones, alerts) lives under targets/<handle>/ and is gitignored. Before committing, confirm nothing slipped through:

git diff --cached --name-only -- targets/ | grep -v '^targets/_template/'   # should print nothing

License

MIT — see LICENSE.
