feat: analytics dashboard with word cloud and NLP pipeline #127

Open
IonesioJunior wants to merge 13 commits into main from analytics_tmp

IonesioJunior (Member) commented Apr 2, 2026

Summary

  • Analytics backend: Separate SQLite database (analytics.db) with QueryEvent model capturing every endpoint query (fire-and-forget via asyncio.create_task). Three API endpoints serve the dashboard: summary stats, time-bucketed series, and top users.
  • Analytics frontend: Full dashboard page with stat cards, Chart.js line/bar charts, user leaderboard, filter dropdowns (time range + endpoint), loading skeletons, and error states. Replaces the previous mock data store with real API integration.
  • Word cloud feature: spaCy NLP pipeline (normalization, stop word removal, lemmatization, n-gram extraction) with batch processing via nlp.pipe(). Canvas-based word cloud renderer with spiral placement, overlap detection, and random rotation. Users can toggle between single words, two-word phrases, and three-word phrases.
  • MPP enhancements: SYFT_MPP_DEV_BYPASS env var to skip blockchain charge during dev, wallet_private_key support for fee_payer gas sponsorship.

Test plan

  • 79 backend unit tests passing (pytest): repository aggregations, collector, handler business logic, schema validation, text processing NLP, word cloud handler, MPP policy
  • 28 frontend unit tests passing (vitest): formatter utilities, Pinia store (fetch, abort, errors, filters, export, word cloud)
  • Integration tested against local SyftHub stack: 35 real queries fired from 5 authenticated users via satellite tokens, analytics verified via API and raw DB
  • All quality gates: ruff format + ruff check, bun run check (format + lint + typecheck)

IrinaMBejan and others added 13 commits April 1, 2026 16:13
…port

Add comprehensive analytics page with stat cards, query volume trends,
user activity, revenue overview, most active users, and trending topics.
Includes filter bar, export button, and Stats nav tab. All mock data.

Made-with: Cursor

Whitespace and line-wrapping changes only; no behavior change.

Add SYFT_MPP_DEV_BYPASS env var to skip on-chain payment during
development while preserving the full policy flow (price matching,
revenue tracking). Add wallet_private_key support to sponsor gas
fees for client transactions via fee_payer.

Includes 8 unit tests covering bypass behavior and fee_payer wiring.
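
For readers unfamiliar with this kind of switch, here is a minimal sketch of how an env-var bypass could skip only the charge while leaving the rest of the policy flow intact. It is not the PR's code: the accepted truthy strings, `dev_bypass_enabled`, `settle_payment`, and the `charge_on_chain` callable are all hypothetical names chosen for illustration.

```python
import os
from typing import Callable, Mapping, Optional

# Assumption: which strings count as "enabled" is a guess, not taken from the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def dev_bypass_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when SYFT_MPP_DEV_BYPASS is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("SYFT_MPP_DEV_BYPASS", "").strip().lower() in _TRUTHY


def settle_payment(
    price: float,
    charge_on_chain: Callable[[float], str],  # hypothetical: returns a transaction id
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Run the flow, skipping only the on-chain charge when the bypass is on."""
    if dev_bypass_enabled(env):
        # Revenue is still reported so tracking behaves as it would in production.
        return {"charged": False, "revenue": price, "tx": None}
    return {"charged": True, "revenue": price, "tx": charge_on_chain(price)}
```

Passing `env` explicitly keeps the check easy to unit test without mutating the process environment.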

…hboard endpoints

Introduce a separate analytics SQLite database (analytics.db) with a
QueryEvent model for recording every endpoint query. Events are captured
fire-and-forget via asyncio.create_task so they never block responses.

Three new API endpoints serve the dashboard:
- GET /analytics/summary — stat cards (endpoints, queries, revenue, users)
- GET /analytics/time-series — gap-filled bucketed series (daily/weekly/monthly)
- GET /analytics/top-users — ranked by query count

Also adds count_published and count_created_in_range to EndpointRepository.

Includes 51 unit tests: repository aggregations, collector, handler
business logic, schema validation, and endpoint count methods.
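
As an illustration of what "gap-filled" means for the time-series endpoint, the sketch below fills missing days with zero so a chart gets one point per bucket. It covers the daily case only and assumes the aggregation has already produced a date-to-count mapping; `gap_fill_daily` is a hypothetical helper, not the repository method added here.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def gap_fill_daily(
    counts: Dict[date, int], start: date, end: date
) -> List[Tuple[str, int]]:
    """Return one (ISO day, count) pair per day in [start, end].

    Days absent from `counts` are emitted with a count of 0.
    """
    series = []
    day = start
    while day <= end:
        series.append((day.isoformat(), counts.get(day, 0)))
        day += timedelta(days=1)
    return series
```

Weekly and monthly buckets would need a different step and a rule for aligning bucket starts, which this sketch leaves out.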

Replace hardcoded mock data with live API calls to the new analytics
backend. The Pinia store manages abort-aware fetching for summary,
time series, and top users with independent loading/error states.

Key changes:
- New API client, types, and Pinia store for analytics
- AnalyticsPage rewritten with loading skeletons, error states, and
  direct value binding (eliminates label indirection for filters)
- New formatCompactNumber and formatCurrency utilities
- Delete old analyticsData.ts mock store

Includes 26 unit tests: formatter edge cases and full store coverage
(fetch, abort, errors, filters, export).

Auto-format test files with ruff, fix import sorting, remove unused
imports and variables flagged by ruff check.

The multi-line template expression caused a Vue compiler parse error
at runtime. Collapse to a single semicolon-separated line.

The dataset dropdown added unnecessary complexity to the analytics
filters. Queries are scoped by endpoint (which already implies a
dataset), making the separate dataset filter redundant.

Store the raw user query text in QueryEvent.query_text so it can be
analyzed for word cloud generation. Text is extracted from
request.messages (string or ChatMessageRequest list) in the endpoint
handler and passed through the fire-and-forget event collector.
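
A rough sketch of extracting text from either message shape is shown below. The `ChatMessage` dataclass is a hypothetical stand-in for ChatMessageRequest, and both its `role`/`content` fields and the choice to keep only user-role messages are assumptions made for illustration; the handler in this PR may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ChatMessage:
    """Hypothetical stand-in for ChatMessageRequest (fields are assumed)."""

    role: str
    content: str


def extract_query_text(messages: Union[str, List[ChatMessage]]) -> Optional[str]:
    """Return the user's text, or None when there is nothing to store."""
    if isinstance(messages, str):
        return messages.strip() or None
    # Assumption: only user-authored turns are relevant for topic analysis.
    parts = [
        m.content.strip()
        for m in messages
        if m.role == "user" and m.content.strip()
    ]
    return "\n".join(parts) or None
```

Returning `None` for empty input lets the caller store a NULL rather than an empty string.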

New text processing pipeline using spaCy for query analysis:
- Normalization (lowercase, strip URLs/emails/punctuation/numbers)
- Stop word removal (standard English + custom domain filter)
- Lemmatization (reducing word forms to roots)
- N-gram extraction (unigrams, bigrams, trigrams)
- Batch processing via nlp.pipe() for efficiency

New API endpoint GET /analytics/word-cloud returns word frequency
data with configurable ngram_size (1-3) and max_words (10-200).

Includes 21 tests: 14 for text processing, 7 for handler + repository.
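
To make the pipeline stages concrete, here is a simplified standard-library sketch of normalization, stop word removal, and n-gram counting. It deliberately does not use spaCy, so there is no lemmatization and no nlp.pipe() batching; the stop word set is a tiny illustrative subset and the function names are hypothetical. The parameter bounds mirror the ranges stated above.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Illustrative subset only; the PR combines spaCy's list with a domain filter.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "in", "at", "for", "and", "what", "how"}

_URL_OR_EMAIL = re.compile(r"https?://\S+|www\.\S+|\S+@\S+")
_NON_LETTERS = re.compile(r"[^a-z\s]+")


def normalize(text: str) -> List[str]:
    """Lowercase, strip URLs/emails/punctuation/numbers, drop stop words."""
    text = _URL_OR_EMAIL.sub(" ", text.lower())
    text = _NON_LETTERS.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return contiguous n-token phrases joined by spaces."""
    return [" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def word_frequencies(
    queries: Iterable[str], ngram_size: int = 1, max_words: int = 50
) -> List[Tuple[str, int]]:
    """Count n-grams across all queries and return the most frequent ones."""
    if not 1 <= ngram_size <= 3:
        raise ValueError("ngram_size must be between 1 and 3")
    if not 10 <= max_words <= 200:
        raise ValueError("max_words must be between 10 and 200")
    counts: Counter = Counter()
    for query in queries:
        counts.update(ngrams(normalize(query), ngram_size))
    return counts.most_common(max_words)
```

Note that in this sketch stop words are removed before n-grams are built, so a phrase can span a dropped word; whether the PR's pipeline does the same is not stated here.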

Full-width Query Topics panel at the bottom of the analytics
dashboard with a canvas-based word cloud renderer (spiral placement,
overlap detection, random rotation and colors). Users can switch
between single words, two-word phrases, and three-word phrases via
a dropdown that re-fetches from the word-cloud API.

Includes custom useWordCloud composable (zero dependencies), store
integration with abort-aware fetching, and 2 new store tests.
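
The spiral placement with overlap detection mentioned above can be sketched as follows. The composable itself is TypeScript drawing on a canvas; this is a language-neutral illustration written in Python, with hypothetical names and constants, and it omits the random rotation and colors. Each word walks outward along an Archimedean spiral from the center until its bounding box fits inside the canvas without touching an already placed box.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def overlaps(a: Box, b: Box) -> bool:
    """Axis-aligned bounding-box intersection test."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_words(
    sizes: List[Tuple[str, float, float]],  # (word, box width, box height)
    width: float,
    height: float,
    step: float = 0.2,      # angle increment in radians (assumed value)
    growth: float = 2.0,    # spiral growth per radian (assumed value)
    max_steps: int = 5000,
) -> Dict[str, Box]:
    """Place words largest-first; words that never fit are skipped."""
    placed: Dict[str, Box] = {}
    cx, cy = width / 2, height / 2
    for word, w, h in sizes:
        for i in range(max_steps):
            angle = i * step
            radius = growth * angle  # Archimedean spiral: r = a * theta
            x = cx + radius * math.cos(angle) - w / 2
            y = cy + radius * math.sin(angle) - h / 2
            box = (x, y, w, h)
            inside = x >= 0 and y >= 0 and x + w <= width and y + h <= height
            if inside and not any(overlaps(box, other) for other in placed.values()):
                placed[word] = box
                break
    return placed
```

Sorting the input so the most frequent (largest) words come first tends to put them near the center, since earlier words claim the innermost free positions.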

- update analytics API call to include max_words: 10 in /analytics/word-cloud params
- delete frontend/src/composables/useWordCloud.ts
- overhaul AnalyticsPage.vue: rename section to "Most Queried Topics", swap Cloud icon for Search, and render word cloud as a list with progress bars and loading/error/no-data states
- update script imports to remove Cloud and add Search
- ensure rendering uses wordCloudWords with appropriate loading/error handling

Resolve conflicts between analytics branch and main's wallets refactor:

- endpoints/handlers.py: keep event_collector (analytics) + adopt
  wallet_repository replacing settings_repository; policy metadata now
  batch-fetches wallets by ID from WalletRepository
- mpp_accounting_type.py: keep SYFT_MPP_DEV_BYPASS and fee_payer/
  wallet_private_key; adopt wallet config read from metadata['wallets']['mpp']
  instead of flat wallet_address/mpp_secret_key fields
- settings/entities.py: accept removal of mpp_secret_key (moved to wallets)
- main.py: wire both analytics collector and WalletRepository into
  EndpointHandler; include both analytics and wallet routes
- uv.lock: accept main's version
- tests/conftest.py: import Wallet entity for SQLAlchemy mapper resolution
- tests/test_mpp_accounting.py: update _make_context() to use new
  metadata['wallets']['mpp'] structure

2 participants