
feat: search CLI (query / similar / rehydrate / purge / audit) #12

Open
chrisaddams wants to merge 3 commits into main from feat/search


@chrisaddams
Contributor

Summary

Adds full-text search and search-index management to the CLI, plus an audit command that flags data leaks in public search.

anythink search query <text>          [--entities] [--filter] [--sort] [--facet]
                                      [--highlight] [--page] [--limit]
                                      [--public] [--json]
anythink search similar <entity> <id> [--limit] [--public] [--json]
anythink search rehydrate [<entity>]  [-y]   # admin
anythink search purge     [<entity>]  [-y]   # admin
anythink search audit <entity>        [--query] [--sample] [--json]

Notable bits

Security: --public is genuinely anonymous

The shared HttpClient sets the user's bearer token as a default header on every request. That's wrong for --public: the audit's whole purpose is to reflect what an unauthenticated visitor sees, and a token-bearing request would instead return the authenticated user's view, filtered by row-level security (RLS).

Fixed by holding a second HttpClient (_anonymousHttp) with no auth headers, used for /search/public and /search/public/similar. Two tests assert that the production constructor produces an anonymous client with no Authorization or X-API-Key headers, so a regression of the original bug would be caught.
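To illustrate the idea outside the codebase, here is a minimal Python sketch of the two-client split. It is not the CLI's C# implementation: `SketchClient`, `build_request`, `has_auth_headers`, the `/search` authenticated path, and the example host are all hypothetical names chosen for the illustration. Only the `/search/public` routes and the two header names come from this PR.

```python
import urllib.request

# Header names the PR's tests check for on the anonymous client.
AUTH_HEADER_NAMES = ("Authorization", "X-API-Key")


class SketchClient:
    """Hypothetical stand-in for AnythinkClient: one header set per audience."""

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        # Authenticated requests carry the bearer token by default.
        self._auth_headers = {
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        }
        # Anonymous requests are built from a separate dict that never sees the token.
        self._anonymous_headers = {"Accept": "application/json"}

    def build_request(self, path: str, public: bool = False) -> urllib.request.Request:
        headers = self._anonymous_headers if public else self._auth_headers
        # Copy the dict so callers cannot mutate the shared defaults.
        return urllib.request.Request(self.base_url + path, headers=dict(headers))


def has_auth_headers(request: urllib.request.Request) -> bool:
    # urllib normalises header names with str.capitalize(), so compare in lowercase.
    names = {name.lower() for name, _ in request.header_items()}
    return any(header.lower() in names for header in AUTH_HEADER_NAMES)
```

Keeping the anonymous headers in their own object, rather than stripping the token per request, means a new public route cannot accidentally inherit the default credentials.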

search audit

Compares what the entity's schema says should be public-searchable (is_public=true + per-field publicly_searchable=true) with what /search/public actually returns. Any field present in public results but not on the allowlist is flagged.

✗ LEAK DETECTED — 1 field(s) returned in public search are not marked publicly_searchable:
  • draft_notes

Fix by either:
  • Marking these fields as publicly_searchable=true if they should be public
  • Or fixing the index/serialization so they aren't included

Exits with code 1 on leaks — useful for CI/CD security gates.
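The comparison at the heart of the audit is a set difference. The Python sketch below shows that logic under simple assumptions: public hits are flat dictionaries keyed by field name, and the allowlist is a plain list of field names. `AuditReport`, `audit_public_fields`, and the `ignore` parameter (for system fields that should not count as leaks) are hypothetical and not taken from the CLI's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AuditReport:
    leaked: list[str]  # returned by public search but not allowlisted
    absent: list[str]  # allowlisted but not present in the sampled results

    @property
    def exit_code(self) -> int:
        # A non-zero exit lets a CI job fail when a leak is found.
        return 1 if self.leaked else 0


def audit_public_fields(allowlist, public_hits, ignore=()):
    """Compare allowlisted field names with the fields seen in public results."""
    returned = set()
    for hit in public_hits:
        returned.update(hit.keys())
    returned -= set(ignore)
    allowed = set(allowlist)
    return AuditReport(
        leaked=sorted(returned - allowed),
        absent=sorted(allowed - returned),
    )
```

For example, an allowlist of `title` and `summary` against a sampled hit containing `title` and `draft_notes` would report `draft_notes` as leaked and `summary` as configured but absent.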

Wire format lock-in

SearchResult uses snake_case (page_size, total_items, has_next_page, retrieval_time, facet_distribution). The initial draft used camelCase from a stale spec, which would have silently shown "0 matches" forever. A test now asserts the snake_case parsing.
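The failure mode is easy to reproduce with any lenient parser. The Python sketch below assumes that missing keys fall back to defaults, which is one plausible way a naming mismatch turns into "0 matches" with no error. `SearchPage` and `parse_search_page` are hypothetical names, and only three of the five fields are modelled.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchPage:
    page_size: int = 0
    total_items: int = 0
    has_next_page: bool = False


def parse_search_page(payload: str) -> SearchPage:
    data = json.loads(payload)
    # Missing keys fall back to defaults, which is how a naming mismatch hides.
    return SearchPage(
        page_size=data.get("page_size", 0),
        total_items=data.get("total_items", 0),
        has_next_page=data.get("has_next_page", False),
    )
```

A lock-in test in this style parses a snake_case payload and asserts non-default values, so a change to the expected property names fails loudly instead of rendering an empty result.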

BACKLOG.md

Captures outstanding CLI fixes and AnyAPI server-side items discovered during integrations + step-delete + search work. Two relevant entries:

  • /search/public returns 500 on getahead-prod regardless of auth (server-side bug, blocking the audit's visitor-view call)
  • workflows trigger payload-wrapping bug, plus several QoL items

Tests

  • 13 new in tests/SearchTests.cs covering wire format, query-string assembly, public/auth route split, similar-docs URL building, rehydrate / purge per-entity vs global, admin 403 propagation
  • 2 reflection-based assertions verifying the production constructor produces an anonymous HttpClient with no auth headers
  • All 207 tests passing
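As a rough picture of what the query-string and route-split tests cover, here is a small Python sketch. The parameter names (`q`, `filter`, `sort`, `page`, `limit`) and the authenticated `/search` path are assumptions made for the example, since this PR does not spell out the wire-level names; only `/search/public` is taken from the description. `build_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_search_url(text, *, public=False, filter_expr=None, sort=None,
                     page=None, limit=None):
    """Assemble a relative search URL, omitting options that were not set."""
    # Route split: public searches go to the unauthenticated endpoint.
    path = "/search/public" if public else "/search"
    params = {"q": text}
    optional = (("filter", filter_expr), ("sort", sort), ("page", page), ("limit", limit))
    for name, value in optional:
        if value is not None:
            params[name] = value
    # urlencode percent-encodes values and renders spaces as '+'.
    return f"{path}?{urlencode(params)}"
```

Asserting on the full URL string, as the tests in this PR do for URL building, catches both a wrong route and a dropped or misnamed parameter in one check.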

Verified end-to-end

  • search query "workout" against getahead-prod returned 2 mental_edge_options with proper __name labels — confirmed wire format is correct
  • search audit mental_edge_options correctly reported that the entity is not public, returned 3 results from the authenticated search, and short-circuited the audit cleanly
  • Authenticated-vs-anonymous: production AnythinkClient(token: "...") produces an _anonymousHttp with no Authorization header
  • Build clean, 0 warnings

chrisaddams and others added 2 commits May 10, 2026 14:39
Search
------
- search query <text>: full-text search across one or more entities. Supports
  --filter (with _geoRadius / _geoBoundingBox), --sort, --facet, --highlight,
  --page / --limit, --public (route to /search/public), --json (raw output).
- search similar <entity> <id>: vector search for documents similar to a
  given record. --public for the unauthenticated equivalent.
- search rehydrate [<entity>]: rebuild the search index from the database.
  Per-entity or whole-tenant. Admin-gated server-side.
- search purge [<entity>]: wipe the search index. Confirms by default; -y
  for automation. Admin-gated.
- search audit <entity>: compares fields configured publicly_searchable
  with what /search/public actually returns; flags any leaks. Exits 1 on
  detected leaks for CI/CD integration. Also reports configured-but-absent
  fields (allowlisted but not present in sample data).

Security: anonymous public requests
-----------------------------------
The shared HttpClient sets the user's bearer token as a default header on
every request. That's wrong for --public and the audit's "visitor view"
call — we'd be reflecting the authenticated user's RLS, not the public
visitor's experience.

AnythinkClient now holds a second HttpClient (`_anonymousHttp`) with no
auth headers, used for /search/public and /search/public/similar. The
test mocks share the same HttpClient instance so URL/body assertions
still work, but production correctly drops the token.

Two tests assert that the production constructor produces an anonymous
client with no Authorization or X-API-Key headers, so a regression of
the original bug would be caught.

Wire format
-----------
SearchResult uses snake_case JSON property names (page_size, total_items,
has_next_page, retrieval_time, facet_distribution). Verified against the
live API; the explore agent had originally reported camelCase, which would
have silently rendered "0 matches" forever. Lock-in test included.

Backlog
-------
Added BACKLOG.md tracking outstanding CLI fixes and AnyAPI server-side
items so they don't get lost between sessions. Includes follow-ups
discovered during this work.

Tests: 207/207 (13 new search tests + 2 anonymous-headers assertions).
Verified end-to-end against staging:
- Authenticated search returned matching records with __name labels
- search audit on a non-public entity short-circuits cleanly
- Build clean, 0 warnings.
Comment thread BACKLOG.md Outdated
Per @morgang5522's review on this PR — the backlog should live in
the GitHub project, not in code. Adds BACKLOG.md (and
MIGRATIONS-PLAN.md for the same reason) to .gitignore and untracks
the existing file; the local file stays on disk as a working copy.