ProductPulse is a modular, rule-based analytics decision engine built to convert raw operational signals from a SaaS product into explainable business decisions. It serves as an automated reasoning layer that calculates KPIs, detects revenue leakage, evaluates churn risk, and generates prioritized, explainable recommendations without relying on opaque machine learning models.
Note: All data used and generated by this project is strictly synthetic. No real customer data exists in this repository. The project does not currently use ML predictions and is not a production SaaS deployment.
- Key Features
- Why It Matters
- Dashboard Preview
- Architecture
- Quick Start
- Demo Flow
- Portfolio Case Study
- Configuration
- How It Works
- Development
- Documentation
- Advanced Analytics
- Skill Mapping
- Limitations & Future Work
- License
Enforces that metrics computed at runtime strictly align with a declarative YAML catalog. Every KPI must have documented ownership, risk factors, and formulas before execution.
Evaluates multiple configured operational signals (e.g., usage drops, high
support volume) to flag at-risk accounts deterministically using a safe
operator dispatch table — no eval, no ML.
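Below is a minimal sketch of a safe operator dispatch table. The rule fields (`signal`, `op`, `threshold`, `weight`) and the signal names are hypothetical. The real schema lives in config/churn_rules.yaml and may differ. Comparison names are looked up in a fixed mapping to functions from Python's `operator` module, so rule strings are never executed as code.

```python
import operator
from typing import Any

# Hypothetical whitelist: only these comparison names may appear in a rule.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Hypothetical rules, shaped like entries that could be loaded from YAML.
CHURN_RULES = [
    {"name": "usage_drop", "signal": "usage_change_pct", "op": "<=", "threshold": -30, "weight": 0.5},
    {"name": "support_spike", "signal": "tickets_30d", "op": ">=", "threshold": 5, "weight": 0.3},
]


def evaluate_churn(account: dict[str, Any], rules: list[dict[str, Any]]) -> tuple[float, list[str]]:
    """Return a weighted risk score and the names of the rules that fired."""
    score = 0.0
    fired: list[str] = []
    for rule in rules:
        op_name = rule["op"]
        if op_name not in OPERATORS:
            # Unknown operators are rejected instead of being evaluated.
            raise ValueError(f"unsupported operator: {op_name!r}")
        value = account.get(rule["signal"])
        # Missing signals never trigger a rule.
        if value is not None and OPERATORS[op_name](value, rule["threshold"]):
            score += rule["weight"]
            fired.append(rule["name"])
    return score, fired


score, fired = evaluate_churn({"usage_change_pct": -45, "tickets_30d": 2}, CHURN_RULES)
# score == 0.5, fired == ["usage_drop"]
```

Returning the fired rule names together with the score keeps each flag explainable: the caller can show exactly which conditions produced it.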
Scans billing states and transaction histories to flag missed collections, failed payments, and unpaid active subscriptions.
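Here is a hedged sketch of per-record leakage flags. The `BillingRecord` fields and the exact definition of a "missed collection" (amount paid below amount due) are assumptions for illustration, not the project's schema. The flags are not mutually exclusive in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillingRecord:
    # Hypothetical fields; the project's actual billing schema may differ.
    customer_id: str
    subscription_active: bool
    payment_status: str  # e.g. "paid", "failed", "unpaid"
    amount_due: float
    amount_paid: float


def leakage_flags(record: BillingRecord) -> list[str]:
    """Return the revenue-leakage categories that apply to one billing record."""
    flags: list[str] = []
    if record.payment_status == "failed":
        flags.append("failed_payment")
    if record.subscription_active and record.payment_status == "unpaid":
        flags.append("unpaid_active_subscription")
    # Assumed definition: any shortfall between billed and collected amounts.
    if record.amount_paid < record.amount_due:
        flags.append("missed_collection")
    return flags


records = [
    BillingRecord("c1", True, "paid", 100.0, 100.0),
    BillingRecord("c2", True, "failed", 100.0, 0.0),
    BillingRecord("c3", True, "unpaid", 50.0, 0.0),
]
# Keep only customers with at least one leakage flag.
leaks = {r.customer_id: found for r in records if (found := leakage_flags(r))}
# leaks == {"c2": ["failed_payment", "missed_collection"],
#           "c3": ["unpaid_active_subscription", "missed_collection"]}
```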
Rule-based engine translating risks and anomalies into actionable business interventions with impact scores and confidence levels.
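The following sketch shows one way to turn triggered rules into a prioritized queue. The trigger-to-action table, the 0-100 impact scale and the ranking rule (impact × confidence) are all assumptions. The project's real triggers live in config/recommendation_rules.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    customer_id: str
    action: str
    impact: float      # hypothetical 0-100 business impact score
    confidence: float  # hypothetical 0-1 confidence level

    @property
    def priority(self) -> float:
        # Assumed ranking rule: expected impact = impact * confidence.
        return self.impact * self.confidence


# Hypothetical trigger -> (action, impact, confidence) table.
RULES = {
    "usage_drop": ("Schedule CSM check-in", 70.0, 0.8),
    "failed_payment": ("Retry payment and notify billing owner", 90.0, 0.9),
}


def recommend(triggers: dict[str, list[str]]) -> list[Recommendation]:
    """Map each customer's triggered rules to actions, highest priority first."""
    recs = [
        Recommendation(customer_id, *RULES[name])
        for customer_id, names in triggers.items()
        for name in names
        if name in RULES  # triggers without a configured action are skipped
    ]
    return sorted(recs, key=lambda r: r.priority, reverse=True)


queue = recommend({"c1": ["usage_drop"], "c2": ["failed_payment", "unknown_signal"]})
# [r.customer_id for r in queue] == ["c2", "c1"]
```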
Includes a local-only Streamlit dashboard with an executive cockpit, prioritized actions, Customer 360 drill-down, KPI review, data quality review, risk queues, decision traces, and metric lineage. Features include:
- Searchable action, customer, recommendation, and trace views
- Top Actions overview metrics, owner workload, and prioritized action queue
- Customer 360 overview metrics, prioritized queue, and tabbed drill-downs
- Quick-view presets for common analysis workflows
- Decision Brief summaries with Markdown exports
- Owner workload summaries for handoff queues
- Scenario simulator with custom assumptions, cohort retention, and funnel analysis views
- Actionable data freshness, empty-state guidance, and CSV exports
No data leaves your machine. No cloud deployment or authentication required.
Unified pipeline managing synthetic data generation, loading, validation, analytics calculation, and reporting in a single deterministic run.
ProductPulse demonstrates how product, finance, and customer-success signals can be converted into a governed decision system without adding fragile SaaS infrastructure or opaque model behavior.
The project is designed to show senior-level engineering judgment across:
- Business value: turns churn, usage, billing, and support signals into prioritized intervention queues.
- Explainability: every recommendation can be traced back to deterministic rules, metric definitions, and decision evidence.
- Local-first privacy: all data is synthetic and all processing runs on the developer machine.
- Portfolio readiness: generated snapshots, reports, docs, CI, and dashboard flows are committed for reproducible review.
The committed screenshots below show the local Streamlit demo after a fresh pipeline run. They are useful for quick portfolio review before launching the app locally.
| Top Actions | Customer 360 |
|---|---|
| *Screenshot: Top Actions view* | *Screenshot: Customer 360 view* |
The system utilizes a clean, adapter-driven architecture separating IO from
business logic. The src/core/ module is completely agnostic to file formats
or data sources, making the logic testable and governance-friendly.
See docs/ARCHITECTURE.md for the detailed layer map,
data flow, and architecture decisions.
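The adapter boundary can be illustrated with a small structural-typing sketch. `AccountSource`, `InMemorySource` and `count_active_accounts` are hypothetical names, not modules from src/. The core function depends only on a `typing.Protocol`, so it can be tested without files or databases.

```python
from typing import Iterable, Protocol


class AccountSource(Protocol):
    """Hypothetical port: anything that can yield account rows."""

    def load_accounts(self) -> Iterable[dict]: ...


def count_active_accounts(source: AccountSource) -> int:
    """Core logic: depends only on the port, never on files or databases."""
    return sum(1 for row in source.load_accounts() if row.get("status") == "active")


class InMemorySource:
    """Test adapter; a CSV or SQLite adapter would implement the same method."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def load_accounts(self) -> Iterable[dict]:
        return iter(self._rows)


source = InMemorySource([{"status": "active"}, {"status": "churned"}, {"status": "active"}])
print(count_active_accounts(source))  # 2
```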
```mermaid
flowchart LR
    CLI["CLI / Make targets"] --> Pipeline["ProductAnalyticsPipeline"]
    Pipeline --> Loader["DataLoader"]
    Pipeline --> Validation["Validation Gate"]
    Pipeline --> Engines["Core Analytics Engines"]
    Engines --> Decisions["Decision Layer"]
    Decisions --> Serializer["OutputSerializer"]
    Serializer --> CSV["CSV snapshots"]
    Serializer --> SQLite["SQLite dashboard DB"]
    Decisions --> Reports["Markdown reports"]
    SQLite --> Dashboard["Streamlit dashboard"]
```
- Language: Python 3.10+
- Libraries: Pandas, NumPy, PyYAML, Streamlit
- Testing: Pytest (708 tests, 100% coverage gate)
- Linting: Ruff
- Type Checking: Pyrefly
```text
business-product-analytics/
├── config/          # Declarative configuration and governance files
├── data/            # Synthetic, processed, and export data
├── reports/         # Output markdown reports
├── notebooks/       # Exploratory analysis notebooks
├── src/
│   ├── adapters/    # IO boundary (data loaders, writers, synthetic generator)
│   ├── core/        # Business logic, engines, and orchestrator
│   ├── models/      # Typed domain objects
│   └── utils/       # Shared helpers and paths
├── app/             # Streamlit dashboard
└── tests/           # Pytest suite
```
- Clone the repository.
- Set up a virtual environment:

  ```bash
  python3 -m venv .venv
  ```

- Install the project in editable mode:

  ```bash
  make setup
  ```

To run the pipeline and launch the dashboard in one step:

```bash
make demo
```

This regenerates the local synthetic analytics snapshot and then launches the Streamlit dashboard.
To run the same demo in Docker:

```bash
make docker-demo
```

This builds a local Docker image, regenerates the synthetic analytics snapshot inside the container, and serves the dashboard at http://localhost:8501. Stop the container with:

```bash
make docker-down
```

To run the pipeline without launching the dashboard:

```bash
make run
```

This generates synthetic data, runs all analytics engines, and writes CSV, SQLite, and Markdown report artifacts.
To rebuild the demo snapshot from scratch:

```bash
make reset-demo
```

This removes the local SQLite demo database and regenerates the deterministic synthetic analytics snapshot from the configured seed. To launch only the dashboard:

```bash
make dashboard
```

If the dashboard reports a missing database or stale data, rerun `make reset-demo`.
The `productpulse` CLI is installed by `make setup` and supports:

```bash
productpulse run          # Run the full analytics pipeline
productpulse reset-demo   # Rebuild the deterministic local demo snapshot
productpulse status       # Show pipeline and database status
productpulse dashboard    # Launch the Streamlit dashboard
```

For a guided review path, see docs/DEMO_FLOW.md.
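The four `productpulse` subcommands fit a subparser-per-command CLI. Below is a minimal sketch using the standard `argparse` module. The handlers are placeholders that return a label, and nothing here reflects how the project's entry point is actually implemented.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented productpulse subcommands."""
    parser = argparse.ArgumentParser(prog="productpulse")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("run", help="Run the full analytics pipeline")
    sub.add_parser("reset-demo", help="Rebuild the deterministic local demo snapshot")
    sub.add_parser("status", help="Show pipeline and database status")
    sub.add_parser("dashboard", help="Launch the Streamlit dashboard")
    return parser


def main(argv: list[str] | None = None) -> str:
    args = build_parser().parse_args(argv)
    # Placeholder dispatch: real handlers would call the pipeline or Streamlit.
    # Returning a label keeps the dispatch easy to test.
    handlers = {
        "run": lambda: "pipeline run",
        "reset-demo": lambda: "demo reset",
        "status": lambda: "status report",
        "dashboard": lambda: "dashboard launched",
    }
    return handlers[args.command]()
```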
Recommended reviewer path:
- Run `make ci` to verify quality gates.
- Run `make reset-demo` to regenerate a clean synthetic analytics snapshot.
- Run `make status` to confirm SQLite table health.
- Run `make dashboard` and review Executive Cockpit, Top Actions, Customer 360, Scenario Simulator, Decision Traces, and Metric Lineage.
- Open the generated Markdown reports in `reports/` for the executive summary, intervention plan, risk register, metric definitions, and data quality evidence.
For a concise reviewer narrative, see docs/PORTFOLIO_CASE_STUDY.md. It summarizes the problem, architecture, UX, quality gates, tradeoffs, screenshots, and review path in one document.
All business rules and metric definitions reside in the config/ directory:
- `churn_rules.yaml`: churn driver weights and thresholds
- `metric_catalog.yaml`: tracked KPIs with ownership and formulas
- `recommendation_rules.yaml`: recommendation triggers and actions
Synthetic data generation is anchored by reproducibility.as_of_date and
reproducibility.random_seed in config/config.yaml, so repeated pipeline
runs produce stable CSV and report snapshots.
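The effect of anchoring generation to a fixed date and seed is shown in the sketch below. It uses the standard library's `random.Random`. The function name, row shape and anchor date are hypothetical, and the project's generator may use NumPy instead.

```python
import random
from datetime import date, timedelta


def generate_signups(as_of_date: date, random_seed: int, n: int = 5) -> list[tuple[str, date]]:
    """Generate hypothetical signup rows anchored to a fixed date and seed."""
    rng = random.Random(random_seed)  # private RNG: no dependence on global state
    rows = []
    for i in range(n):
        # Signup dates fall between 0 and 364 days before the anchor date.
        signup = as_of_date - timedelta(days=rng.randint(0, 364))
        rows.append((f"cust_{i:03d}", signup))
    return rows


anchor = date(2024, 1, 31)  # hypothetical as_of_date
same = generate_signups(anchor, 42) == generate_signups(anchor, 42)  # True
```

Deriving every date from `as_of_date` instead of `date.today()` keeps snapshots stable no matter when the pipeline runs.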
Data flows from the data/synthetic/ directory, is processed in memory by
Pandas, and output artifacts land in data/exports/. A local SQLite database
at data/local/productpulse.db serves the Streamlit dashboard.
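The sketch below writes snapshot tables to SQLite and reports row counts per table, a simple stand-in for the kind of health check `make status` performs. It uses only the standard `sqlite3` module with an in-memory database. The table names and columns are hypothetical, and the project's serializer may use pandas instead.

```python
import sqlite3


def write_snapshot(conn: sqlite3.Connection, table: str, rows: list[tuple[str, float]]) -> None:
    """Replace a hypothetical two-column snapshot table with fresh rows."""
    # Table names cannot be bound as parameters, so they must come from trusted code.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" (customer_id TEXT, value REAL)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    conn.commit()


def table_health(conn: sqlite3.Connection) -> dict[str, int]:
    """Return row counts for every table, ordered by table name."""
    names = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0] for name in names}


conn = sqlite3.connect(":memory:")  # the project uses data/local/productpulse.db
write_snapshot(conn, "churn_risk", [("c1", 0.5), ("c2", 0.8)])
write_snapshot(conn, "kpis", [("all", 42.0)])
print(table_health(conn))  # {'churn_risk': 2, 'kpis': 1}
```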
Note: For demonstration purposes, generated data and markdown reports are committed to this repository as snapshots. See docs/GENERATED_ARTIFACTS.md for handling noisy diffs during local development.
The project employs a strict Governance Layer to prevent analytical drift.
The MetricGovernance engine verifies that the KPIEngine is only computing
metrics that have been documented with ownership, risk factors, and formulas
in the metric catalog.
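A governance check of this kind reduces to comparing the set of computed metrics against catalog entries. The sketch below is a simplified version. The catalog contents are hypothetical, and in this version a field only has to be present (an empty `risk_factors` list still counts as documented).

```python
# Hypothetical catalog entries; the real ones live in config/metric_catalog.yaml.
METRIC_CATALOG = {
    "mrr": {"owner": "Finance", "formula": "sum(active subscription price)", "risk_factors": ["billing lag"]},
    "activation_rate": {"owner": "Product", "formula": "activated / signups", "risk_factors": []},
}

REQUIRED_FIELDS = ("owner", "formula", "risk_factors")


def governance_violations(computed: set[str], catalog: dict[str, dict]) -> list[str]:
    """List problems that should block the run before any KPI is published."""
    problems = []
    for metric in sorted(computed):
        entry = catalog.get(metric)
        if entry is None:
            problems.append(f"{metric}: not in metric catalog")
            continue
        # Fields must be present; an empty value still counts in this sketch.
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            problems.append(f"{metric}: missing {', '.join(missing)}")
    return problems


print(governance_violations({"mrr", "nps"}, METRIC_CATALOG))
# ['nps: not in metric catalog']
```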
The decision engine applies deterministic thresholding rather than stochastic modeling, so every flag is fully explainable. When a customer is flagged as "Critical Risk," the system produces an exact trace of which rule was triggered and what the recommended intervention is.
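Here is a minimal sketch of threshold-based tiering that emits a trace record alongside the decision. The tier cut-offs, tier names other than "Critical Risk", and the interventions are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from most to least severe: (minimum score, tier name).
TIERS = [(0.7, "Critical Risk"), (0.4, "High Risk"), (0.0, "Low Risk")]

# Hypothetical tier -> intervention mapping.
INTERVENTIONS = {
    "Critical Risk": "Escalate to account owner",
    "High Risk": "Schedule CSM check-in",
    "Low Risk": "Monitor",
}


@dataclass(frozen=True)
class DecisionTrace:
    customer_id: str
    score: float
    threshold: float
    tier: str
    fired_rules: tuple[str, ...]
    intervention: str


def classify(customer_id: str, score: float, fired_rules: tuple[str, ...]) -> DecisionTrace:
    """Assign the first tier whose threshold the score meets and record why."""
    # Scores below every threshold fall back to the least severe tier.
    threshold, tier = next(((t, name) for t, name in TIERS if score >= t), TIERS[-1])
    return DecisionTrace(customer_id, score, threshold, tier, fired_rules, INTERVENTIONS[tier])


trace = classify("c7", 0.8, ("usage_drop", "support_spike"))
# trace.tier == "Critical Risk", trace.threshold == 0.7
```

The trace stores the threshold that was crossed and the rules that fired, not just the final tier, so a reviewer can reconstruct the decision without rerunning anything.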
```text
Synthetic CSVs → DataLoader → Validation Gate → Analytics Engines
    → Decision Layer → CSV + SQLite + Markdown Reports
```
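The ordering of that flow, and in particular the validation gate stopping the run before any analytics execute, can be sketched as plain function composition. All names here are hypothetical stand-ins for the loader, the validation gate and the engines. Serialization and report writing are omitted.

```python
from typing import Callable

Rows = list[dict]


def validate(rows: Rows) -> Rows:
    """Validation gate: halt the run rather than compute KPIs on bad rows."""
    bad = [r for r in rows if not r.get("customer_id") or "mrr" not in r or r["mrr"] < 0]
    if bad:
        raise ValueError(f"validation failed for {len(bad)} row(s)")
    return rows


def compute_kpis(rows: Rows) -> dict[str, float]:
    """Stand-in for the analytics engines."""
    return {"total_mrr": sum(r["mrr"] for r in rows), "accounts": float(len(rows))}


def run_pipeline(load: Callable[[], Rows]) -> dict[str, float]:
    """Load, validate, then analyze; serialization and reports would follow."""
    return compute_kpis(validate(load()))


kpis = run_pipeline(lambda: [{"customer_id": "c1", "mrr": 100.0}, {"customer_id": "c2", "mrr": 50.0}])
# kpis == {"total_mrr": 150.0, "accounts": 2.0}
```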
```bash
make setup         # Install in editable mode with dev dependencies
make run           # Run the full pipeline
make reset-demo    # Rebuild the deterministic local demo snapshot
make dashboard     # Launch Streamlit dashboard
make demo          # Run pipeline, then launch Streamlit dashboard
make docker-check  # Validate Compose config and build the Docker demo image
make docker-build  # Build the local Docker demo image
make docker-demo   # Build and run the containerized local demo
make docker-down   # Stop the Docker demo stack
make status        # Show pipeline status
make test          # Run pytest suite
make lint          # Run Ruff linter
make format        # Apply Ruff formatting
make typecheck     # Run Pyrefly type checker
make coverage      # Run tests with coverage report
make ci            # Full CI: compile + lint + typecheck + coverage
```

`make ci` compiles Python modules, runs Ruff lint checks, runs Pyrefly type checks, and executes the full pytest suite with the branch-aware coverage gate.
make docker-check validates the Docker Compose configuration and builds the
containerized demo image. GitHub Actions runs both checks on every push and PR
to main.
- Use `pytest` for all unit testing.
- Maintain a strict boundary between `src/adapters` (IO) and `src/core` (logic).
- Do not commit changes to `data/` or `reports/` unless updating the official snapshot.
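One way to enforce the adapters/core boundary automatically is an import scan with the standard `ast` module. The sketch below is hypothetical. The forbidden prefixes are assumptions, and relative imports are not resolved.

```python
import ast

# Hypothetical rule: core modules must not import IO adapters or IO-bound libraries.
FORBIDDEN_PREFIXES = ("src.adapters", "sqlite3", "streamlit")


def forbidden_imports(source: str) -> list[str]:
    """Return imported module names in `source` that violate the core boundary."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]  # relative imports are not resolved here
        else:
            continue
        found.extend(n for n in names if n.startswith(FORBIDDEN_PREFIXES))
    return found


core_module = "import pandas as pd\nfrom src.adapters.loader import DataLoader\n"
print(forbidden_imports(core_module))  # ['src.adapters.loader']
```

In a pytest suite, a test could read each file under `src/core/` and assert that `forbidden_imports` returns an empty list.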
| Document | Description |
|---|---|
| CONTRIBUTING.md | Development setup, architecture boundaries, contribution guide |
| CHANGELOG.md | Version history and notable changes |
| PROJECT_STATUS.md | Current implementation phase and capabilities |
| RELEASE_CHECKLIST.md | Pre-release verification steps |
| docs/ARCHITECTURE.md | Layer diagram, data flow, ADRs, and scale boundaries |
| docs/CONTRIBUTING.md | Docs-silo bridge to the canonical contribution guide |
| docs/DEMO_FLOW.md | Guided portfolio and stakeholder review path |
| docs/GENERATED_ARTIFACTS.md | Git policy for generated data and reports |
| docs/METRIC_CATALOG.md | Human-readable metric catalog guide |
| docs/PORTFOLIO_CASE_STUDY.md | Portfolio case study with problem, architecture, UX, tradeoffs, and evidence |
| reports/executive_summary.md | Auto-generated executive analytics summary |
| reports/metric_definitions.md | Full metric catalog with formulas |
| reports/data_quality_report.md | Per-dataset quality scores |
| reports/intervention_plan.md | Prioritized interventions |
| reports/risk_register.md | Churn and revenue leakage risk register |
The pipeline also publishes dashboard-ready advanced analytics:
- `scenario_analysis`: modeled monthly and annualized upside from churn, activation, ARPU, and failed-payment recovery scenarios, plus dashboard what-if controls for custom assumptions
- `cohort_summary`: signup cohort retention and revenue by period
- `funnel_summary` and `segment_funnel`: activation, key action, paid conversion, and month-1 retention funnel views
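The monthly and annualized upside figures can be illustrated with simple arithmetic. The formulas below are simplifying assumptions, not the pipeline's documented scenario model:

- churn upside = MRR × (baseline churn − scenario churn)
- recovery upside = failed-payment MRR × recovery rate

Each is annualized by multiplying by 12, with no compounding, expansion or new revenue.

```python
def churn_scenario_upside(mrr: float, baseline_churn: float, scenario_churn: float) -> tuple[float, float]:
    """Monthly and annualized MRR retained by lowering the monthly churn rate."""
    monthly = mrr * (baseline_churn - scenario_churn)
    return monthly, monthly * 12  # simple annualization, no compounding


def payment_recovery_upside(failed_mrr: float, recovery_rate: float) -> tuple[float, float]:
    """Monthly and annualized MRR recovered from failed payments."""
    monthly = failed_mrr * recovery_rate
    return monthly, monthly * 12


monthly, annual = churn_scenario_upside(mrr=200_000, baseline_churn=0.05, scenario_churn=0.04)
# roughly 2,000 per month and 24,000 per year (floating-point values, not exact)
```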
| Skill | Evidence | Business Value | Certification Alignment |
|---|---|---|---|
| Analytics Engineering | src/core/metric_governance.py | Ensures KPI consistency and mitigates reporting risk. | Google Data Analytics |
| Data Governance | config/metric_catalog.yaml | Democratizes data definitions for cross-functional teams. | Microsoft Data Analyst |
| Diagnostic Analytics | src/core/churn_risk.py | Converts operational data into retention prioritization. | Google Advanced Data Analytics |
| Business Logic Orchestration | src/core/pipeline.py | Automates repetitive analytical reporting workflows. | Google Project Management |
| Domain-Driven Design | src/models/ | Creates a typed vocabulary for analytical entities. | IBM Data Science |
Limitations:
- Synthetic data does not perfectly mirror long-tail enterprise edge cases.
- In-memory Pandas processing limits dataset size (the pipeline does not currently stream from a data warehouse).
- Recommendations are rule-based, not AI-driven.

Future work:
- An LLM summarization layer for executive reports.
- Advanced statistical cohort retention modeling.
MIT License



