A multi-target watcher for credentials and internal references that leak into public GitHub. Engineers commit tokens to sample repos, paste connection strings into gists, and push dotfiles that point at internal hosts. ghprowl watches many programs at once, finds those leaks the moment they appear, and pings you — without ever moving the secret off your machine.
It is config-driven: a generic engine in lib/, one small config.env + marker catalog per target.
The engine ships in this repo; per-target data never does (see Hygiene).
Built for authorized bug-bounty / security research. Public data only. See Scope & ethics.
The design story: Rare, Not Random — the reasoning behind the method, the engineering tradeoffs, and an A/B against a hand-tuned baseline (more recall, a third of the clone load).
Three ideas do most of the lifting.
The naive way to find a program's leaks is to search GitHub for its name. That drowns you in tutorials and forks. ghprowl instead builds a marker catalog — endpoints, internal hostnames, env-var names, custom headers, proto packages, SDK module names — and ranks every marker by its global code-search frequency, then queries the rarest first.
A marker with a global count of 0–10 is a near-perfect signal: almost the only code that contains
internal-gateway.corp.example is code that shouldn't be public. A marker with a count of 50k
(api_key, AUTH_TOKEN, a vendor's public SDK URL) is noise. Rarity is precision.
This applies to hand-added markers too — "distinctive" is not the same as "rare." A public SDK repository URL or a session-cookie name is distinctive to a target yet appears in every integrator's build config. Rank everything; trust the rare.
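The ranking step can be sketched in a few lines of shell. This is a minimal illustration, not the engine's code: it assumes a hypothetical two-column, tab-separated input (marker, global code-search count), and the counts below are invented for the example. The real markers.tsv layout may differ.

```shell
#!/usr/bin/env bash
# Sketch only: rank markers by global code-search count, rarest first.
# Assumed (hypothetical) input: "<marker><TAB><count>", one per line.
rank_markers() {
  sort -t "$(printf '\t')" -k2,2n -k1,1
}

# Keep the N rarest markers from an already ranked list.
rarest() {
  head -n "$1"
}

# Illustrative data; these counts are made up.
printf '%s\t%s\n' \
  'api_key' 50000 \
  'internal-gateway.corp.example' 3 \
  'X-Acme-Trace' 120 \
  | rank_markers | rarest 2
```

With this input the noisy `api_key` row is the one that falls outside the cut, which is the whole point of ranking before querying.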
Watching every candidate by cloning it doesn't scale. ghprowl splits findings into two tiers:
- deep: high-confidence (a rare-marker hit). Cloned and gitleaks-scanned every cycle.
- light: speculative (a name-pattern match, a contributor, a fork). Tracked only; promoted to deep automatically when it later trips a marker (sweep).
A third state, confidence none, quarantines squatters and recon dumps so they never waste a
clone. The result: a wide net with a small, high-signal clone set. Most light entries are tripwires —
empty or unrelated accounts whose value is the day one of them fat-fingers a push.
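One way to picture the tiering is as a small lookup plus a promotion rule. The sketch below is an assumption-laden illustration: the evidence labels (`marker-hit`, `squatter`, and so on) are hypothetical names chosen for the example, and it assumes quarantined entries are never promoted.

```shell
#!/usr/bin/env bash
# Sketch only: map a candidate's evidence to a tier.
# The evidence labels are hypothetical; ghprowl's ledger may use other names.
tier_for() {
  case "$1" in
    marker-hit)                    echo deep  ;;  # rare-marker hit: clone + scan every cycle
    name-pattern|contributor|fork) echo light ;;  # speculative: tracked, not cloned
    squatter|recon-dump)           echo none  ;;  # quarantined: never cloned
    *)                             return 1   ;;  # unknown evidence: refuse to guess
  esac
}

# Promotion, as sweep does conceptually: a light entry that now trips a
# marker becomes deep. Other tiers are left alone (an assumption here).
promote() {  # usage: promote <current-tier> <trips-marker: 0|1>
  if [ "$1" = light ] && [ "$2" = 1 ]; then
    echo deep
  else
    echo "$1"
  fi
}
```

Keeping promotion one-directional keeps the clone set small: an entry only costs a clone once it has earned it.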
Auto-onboarding (setup) derives markers from public sources — program scope, org-repo READMEs,
on-disk recon. That reliably reconstructs a target's service inventory. It cannot reconstruct the
non-public gold: a client_id captured from traffic, an internal API-gateway id, a proto package
name. Those are the highest-signal markers, and they are added by hand as manual markers, which are
always queried regardless of the per-run budget. The escape hatch is mandatory, not optional —
but its markers get rarity-ranked like everything else.
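The interaction between manual markers and the per-run budget might look like the following. This is a sketch under stated assumptions: the three-column layout (kind, marker, count) and the function name `select_markers` are hypothetical, and the sample rows are invented.

```shell
#!/usr/bin/env bash
# Sketch only: choose which markers to query in one run.
# Assumed (hypothetical) input on stdin: "<kind><TAB><marker><TAB><count>".
select_markers() {  # usage: select_markers <budget> < catalog.tsv
  local budget=$1 tab tmp
  tab=$(printf '\t')
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  # Manual markers bypass the budget, but are still listed in rarity order.
  awk -F'\t' '$1 == "manual"' "$tmp" | sort -t "$tab" -k3,3n
  # Everything else competes for the budget: only the rarest rows survive.
  awk -F'\t' '$1 != "manual"' "$tmp" | sort -t "$tab" -k3,3n | head -n "$budget"
  rm -f "$tmp"
}
```

For example, with a budget of 1, a manual marker whose count is 900 is still emitted, while only the single rarest derived marker joins it.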
Requirements: gh (authenticated), gitleaks,
jq, curl. Optional: qrencode (for topics qr), sqlite3 + a HackerOne scope cache (richer setup).
    git clone https://github.com/DoctorGoz/ghprowl ~/.ghprowl
    ln -s ~/.ghprowl/bin/ghprowl ~/.local/bin/ghprowl   # put it on PATH
    ghprowl list

The clone is the install: ~/.ghprowl is the working tree, and your per-target data lives inside it under targets/<handle>/ (gitignored).
    ghprowl setup acme-corp      # onboard: derive scope -> markers -> FIT check -> DRAFT config (stops for review)
                                 # REVIEW targets/acme-corp/{config.env,markers.tsv}; add manual gold markers
    ghprowl discover acme-corp   # widen the net: rare-marker code-search + pivots, tiered into the ledger
    ghprowl watch acme-corp      # detect: clone+gitleaks the deep set, alert on a LIVE token
    ghprowl status               # dashboard across all targets

setup deliberately stops for review: it produces a draft, never a live config, and reports a FIT verdict (if a target has no meaningful public GitHub footprint, it says so rather than running a dead watcher). That review step (prune noisy markers, paste in your non-public gold) is where the precision comes from.
| Command | What it does |
|---|---|
| `setup <handle>` | Onboard a target: derive scope → markers → FIT check → draft config.env + catalog. Stops for review. |
| `rank <target>` | (Re)rank the marker catalog by global rarity, rarest first. |
| `discover <target>` | Widen the net: rare-marker code-search + contributor/fork pivots, tiered into the ledger. |
| `sweep <target> [N]` | Promotion pass: light → deep when a tracked entity now trips a marker. |
| `watch <target>` | Detection: clone/refresh the deep set, gitleaks-scan changed repos, alert on a LIVE hit. |
| `status [target]` | No-network dashboard: per-target tiers, markers, alerts, last run. |
| `topics [target]` | List each target's ntfy topic (the subscribe key) + URL. |
| `topics qr <target>` | Render the subscribe URL as a scannable terminal QR (no phone typing). |
| `topics test <target>` | Send a harmless confirmation ping to verify a subscription end-to-end. |
| `list` | List configured targets. |
watch, discover, and sweep accept --all instead of a target to run every configured target —
the entry point for cron.
A hit is announced three ways, and the secret never leaves the host:
- ntfy push: a heads-up only ("Actionable hit in <repo>. Check the local file."). The token is never in the message. Each target gets a unique auto-minted topic; subscribe once (ghprowl topics qr <target> → scan).
- URGENT-LIVE-TOKEN.txt: local, LIVE hits only, with the repo, commit URL, and the exact git show command to view the secret offline.
- alerts/ALERTS.md: local append-only log of every hit.
A baseline (first) scan of a repo suppresses historical findings and alerts only on live tokens,
so onboarding a target doesn't bury you. watch exits non-zero on a hit for cron-side handling.
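The "heads-up only" property is easiest to keep if the message builder never receives the secret in the first place. The sketch below shows that shape; `alert_message`, `send_alert`, and the `NTFY_BASE` variable are hypothetical names for this example, and publishing with a plain `curl -d` POST to the topic URL follows ntfy's documented HTTP interface rather than anything confirmed about ghprowl's internals.

```shell
#!/usr/bin/env bash
# Sketch only: build the push text from the repo name alone, so the
# secret has no path into the notification.
alert_message() {  # usage: alert_message <owner/repo>
  printf 'Actionable hit in %s. Check the local file.' "$1"
}

# Publish to an ntfy topic. NTFY_BASE is an assumed override for
# self-hosted servers; it defaults to the public ntfy.sh instance.
send_alert() {  # usage: send_alert <topic> <owner/repo>
  curl -fsS -H 'Priority: urgent' -d "$(alert_message "$2")" \
    "${NTFY_BASE:-https://ntfy.sh}/$1"
}
```

Since watch exits non-zero on a hit, a cron wrapper could also branch on the exit status, for example `ghprowl watch --all || logger 'ghprowl: hit'`.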
| Key | Purpose |
|---|---|
| `GP_GH_ORGS` | Org(s) that root discovery and are watched directly. |
| `GP_DOMAINS` | In-scope domains (host markers + issuer derivation). |
| `GP_ISSUER_SUBSTR` | JWT iss substrings that mark a token as this target's (the live check). |
| `GP_SOFT_KEYWORDS` | Broad relevance keywords (issuer/code detection). |
| `GP_SWEEP_KEYWORDS` | Curated brand-coined subset for the name-sweep + interest test. Prune generic words here. |
| `GP_NAME_PATTERNS` | Work-account login patterns to sweep (e.g. `*-acme` / `acme-*`). |
| `GP_FORK_PIVOT` | 0 by default; 1 enables bounded fork enumeration (only sane on small targets). |
| `GP_NTFY_TOPIC` | Alert topic (auto-minted; the token itself is never transmitted). |
| `GP_MARKER_BUDGET` | How many of the rarest markers to spend the code-search budget on per discover run. |
| `GP_MAX_REPO_MB` | watch skips repos larger than this (default 500) so one monorepo can't wedge a cycle. |
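For orientation, a filled-in config.env might look roughly like this. Every value is illustrative, and the list syntax (space-separated, double-quoted) is an assumption; the draft that setup generates is the authority on the real format. Only the defaults for GP_FORK_PIVOT and GP_MAX_REPO_MB come from the table above.

```shell
# targets/acme-corp/config.env (illustrative values; list syntax is assumed)
GP_GH_ORGS="acme-corp"
GP_DOMAINS="acme.example corp.acme.example"
GP_ISSUER_SUBSTR="auth.acme.example"
GP_SOFT_KEYWORDS="acme acmepay"
GP_SWEEP_KEYWORDS="acmepay"                   # brand-coined only; generic words pruned
GP_NAME_PATTERNS="*-acme acme-*"
GP_FORK_PIVOT=0                               # documented default
GP_NTFY_TOPIC="ghprowl-acme-corp-REPLACE-ME"  # placeholder; normally auto-minted
GP_MARKER_BUDGET=40                           # made-up number for the example
GP_MAX_REPO_MB=500                            # documented default
```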
The marker catalog (markers.tsv → ranked markers-ranked.tsv) carries kinds like endpoint,
internal-host, env-var, header, proto-pkg, and manual (your hand-added gold).
For each repo in the deep set (org repos + marker-hit leaks):
- Big-repo guard: check GitHub's reported size first; skip + warn if over GP_MAX_REPO_MB (override a specific repo via force-big.txt). Unattended cron can't prompt, so the design is skip-and-warn, not ask.
- Clone or fetch; skip unchanged repos via a per-repo commit-set checkpoint.
- gitleaks full-history scan → an issuer-aware post-filter keeps only target-relevant secrets.
- Alert on a live hit (and only live hits on a baseline scan).
A concurrency flock per operation means a long cycle and the next cron tick never collide.
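The big-repo guard from the steps above can be sketched as a pure size check plus an override lookup. This is an illustration, not the engine's code: the function names are hypothetical, force-big.txt is assumed to hold one owner/repo per line, and a megabyte is treated as 1024 KB. GitHub's REST API reports a repository's `size` in kilobytes, which is why the comparison converts units.

```shell
#!/usr/bin/env bash
# Sketch only. The size could be fetched with something like:
#   size_kb=$(gh api "repos/$owner/$repo" --jq .size)   # GitHub reports KB
too_big() {  # usage: too_big <size_kb> <max_mb>; succeeds when over the limit
  [ "$1" -gt $(( $2 * 1024 )) ]
}

# Override list: assumed format is one owner/repo per line.
forced() {  # usage: forced <owner/repo> <force-file>
  [ -f "$2" ] && grep -qxF "$1" "$2"
}

# Skip-and-warn, never prompt: unattended cron has nobody to ask.
should_clone() {  # usage: should_clone <owner/repo> <size_kb> <max_mb> <force-file>
  if too_big "$2" "$3" && ! forced "$1" "$4"; then
    echo "skip $1: $2 KB exceeds $3 MB" >&2
    return 1
  fi
}

# A per-operation lock could be taken like this (the lock path is hypothetical):
#   exec 9>"/tmp/ghprowl-watch.lock"; flock -n 9 || exit 0
```

Checking the reported size before cloning means an oversized repo costs one API call instead of a wedged cycle.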
    */30 * * * *  ~/.local/bin/ghprowl watch --all    >/dev/null 2>&1   # detect
    17 5 * * *    ~/.local/bin/ghprowl discover --all >/dev/null 2>&1   # widen
    37 */6 * * *  ~/.local/bin/ghprowl sweep --all    >/dev/null 2>&1   # promote

ghprowl is a defensive / authorized-research tool, built to operate within those bounds:
- Public data only. It reads public repositories and public code search. Nothing else.
- Offline verification. Detection runs locally; a found secret is reported, never used.
- The token never moves. Push notifications say "check the host"; the secret stays on disk.
- Logins are pivots, not people. Account handles are search leads, not targets for profiling.
Use it on programs you are authorized to test, and follow each program's disclosure rules.
    bin/ghprowl          # thin dispatcher
    lib/                 # generic engine (target-agnostic)
    targets/_template/   # the only target dir that ships
    targets/<handle>/    # YOUR per-target data: gitignored, never committed
Everything operator-specific (scope, markers, ntfy topics, ledgers, clones, alerts) lives under
targets/<handle>/ and is gitignored. Before committing, confirm nothing slipped through:
    git diff --cached --name-only | grep -v _template   # should list only engine/docs files

MIT. See LICENSE.