tryapitsynandrey-web/ProductPulse

ProductPulse — Business Product Analytics Decision Engine

ProductPulse is a modular, rule-based analytics decision engine built to convert raw operational signals from a SaaS product into explainable business decisions. It serves as an automated reasoning layer that calculates KPIs, detects revenue leakage, evaluates churn risk, and generates prioritized, explainable recommendations without relying on opaque machine learning models.

Note: All data used and generated by this project is strictly synthetic. No real customer data exists in this repository. The project does not currently use ML predictions and is not a production SaaS deployment.

Key Features

Metric Governance Engine

Enforces that metrics computed at runtime strictly align with a declarative YAML catalog. Every KPI must have documented ownership, risk factors, and formulas before execution.

Rule-Based Churn Scoring

Evaluates multiple configured operational signals (e.g., usage drops, high support volume) to flag at-risk accounts deterministically using a safe operator dispatch table — no eval, no ML.
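A minimal sketch of what a safe operator dispatch table can look like, using Python's standard `operator` module. The rule fields (`signal`, `op`, `threshold`, `weight`), the signal names, and the integer point weights are assumptions for illustration; the actual schema lives in config/churn_rules.yaml and may differ.

```python
import operator

# Hypothetical dispatch table: operator names from config map to functions,
# so rule text is never passed to eval().
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Illustrative rules only; not the project's real thresholds or weights.
RULES = [
    {"signal": "usage_drop_pct", "op": ">=", "threshold": 30, "weight": 50},
    {"signal": "support_tickets_30d", "op": ">", "threshold": 5, "weight": 30},
]


def score_account(signals, rules=RULES):
    """Return (score, triggered signal names) for one account."""
    score = 0
    triggered = []
    for rule in rules:
        compare = OPERATORS.get(rule["op"])
        if compare is None:
            # Unknown operators fail loudly instead of being interpreted.
            raise ValueError(f"Unsupported operator: {rule['op']!r}")
        value = signals.get(rule["signal"])
        if value is not None and compare(value, rule["threshold"]):
            score += rule["weight"]
            triggered.append(rule["signal"])
    return score, triggered
```

Because the operator is looked up rather than evaluated, a typo or an unexpected string in the rules file raises an error instead of executing arbitrary code.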

Revenue Leakage Detection

Scans billing states and transaction histories to flag missed collections, failed payments, and unpaid active subscriptions.
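As a rough illustration, a leakage scan over billing states could look like the sketch below. The `Subscription` fields, the status strings, and the finding labels are hypothetical stand-ins, not the project's actual models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical billing record used only for this sketch."""
    customer_id: str
    status: str               # e.g. "active" or "cancelled"
    last_payment_status: str  # e.g. "paid", "failed", or "unpaid"
    mrr: float


def find_leakage(subscriptions):
    """Return (customer_id, reason, mrr_at_risk) for suspicious billing states."""
    findings = []
    for sub in subscriptions:
        if sub.last_payment_status == "failed":
            findings.append((sub.customer_id, "failed_payment", sub.mrr))
        elif sub.status == "active" and sub.last_payment_status == "unpaid":
            findings.append((sub.customer_id, "unpaid_active_subscription", sub.mrr))
    return findings
```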

Prioritized Recommendations

Rule-based engine translating risks and anomalies into actionable business interventions with impact scores and confidence levels.
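One way to order recommendations deterministically is sketched below. The priority formula (impact score multiplied by a confidence weight), the weight values, and the field names are assumptions made for this example; the project's own ranking logic may differ.

```python
# Hypothetical confidence weights; not taken from the project's config.
CONFIDENCE_WEIGHT = {"high": 3, "medium": 2, "low": 1}


def prioritize(recommendations):
    """Sort by weighted impact (descending), then customer_id for stable ties."""
    def sort_key(rec):
        weighted = rec["impact_score"] * CONFIDENCE_WEIGHT[rec["confidence"]]
        return (-weighted, rec["customer_id"])

    return sorted(recommendations, key=sort_key)
```

The explicit tie-breaker keeps the queue order identical across runs, which matters when snapshots are committed and diffed.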

Local Streamlit Dashboard

Includes a local-only Streamlit dashboard with an executive cockpit, prioritized actions, Customer 360 drill-down, KPI review, data quality review, risk queues, decision traces, and metric lineage. Features include:

  • Searchable action, customer, recommendation, and trace views
  • Top Actions overview metrics, owner workload, and prioritized action queue
  • Customer 360 overview metrics, prioritized queue, and tabbed drill-downs
  • Quick-view presets for common analysis workflows
  • Decision Brief summaries with Markdown exports
  • Owner workload summaries for handoff queues
  • Scenario simulator with custom assumptions, cohort retention, and funnel analysis views
  • Actionable data freshness, empty-state guidance, and CSV exports

No data leaves your machine. No cloud deployment or authentication required.

End-to-End Orchestration

Unified pipeline managing synthetic data generation, loading, validation, analytics calculation, and reporting in a single deterministic run.

Why It Matters

ProductPulse demonstrates how product, finance, and customer-success signals can be converted into a governed decision system without adding fragile SaaS infrastructure or opaque model behavior.

The project is designed to show senior-level engineering judgment across:

  • Business value: turns churn, usage, billing, and support signals into prioritized intervention queues.
  • Explainability: every recommendation can be traced back to deterministic rules, metric definitions, and decision evidence.
  • Local-first privacy: all data is synthetic and all processing runs on the developer machine.
  • Portfolio readiness: generated snapshots, reports, docs, CI, and dashboard flows are committed for reproducible review.

Dashboard Preview

The committed screenshots below show the local Streamlit demo after a fresh pipeline run. They are useful for quick portfolio review before launching the app locally.

Screenshots (images not reproduced here): Executive Cockpit, Top Actions, Customer 360, Scenario Simulator.

Architecture

The system uses a clean, adapter-driven architecture that separates IO from business logic. The src/core/ module is agnostic to file formats and data sources, which keeps the logic testable and governance-friendly. See docs/ARCHITECTURE.md for the detailed layer map, data flow, and architecture decisions.
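The IO/logic split can be sketched with a small port and adapter pair. `UsageSource`, `InMemoryUsageSource`, and `active_customer_count` are hypothetical names for illustration and do not appear in the repository.

```python
from typing import Iterable, Protocol


class UsageSource(Protocol):
    """Hypothetical port: anything that can yield usage rows."""

    def load_usage(self) -> Iterable[dict]:
        ...


class InMemoryUsageSource:
    """Adapter stand-in; a real adapter might read CSV files or SQLite."""

    def __init__(self, rows):
        self._rows = list(rows)

    def load_usage(self):
        return iter(self._rows)


def active_customer_count(source: UsageSource) -> int:
    """Core logic: depends only on the port, never on a file format."""
    return len({row["customer_id"] for row in source.load_usage() if row["events"] > 0})
```

Because the core function only sees the port, tests can pass an in-memory source and never touch the filesystem.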

Architecture Snapshot

flowchart LR
    CLI["CLI / Make targets"] --> Pipeline["ProductAnalyticsPipeline"]
    Pipeline --> Loader["DataLoader"]
    Pipeline --> Validation["Validation Gate"]
    Pipeline --> Engines["Core Analytics Engines"]
    Engines --> Decisions["Decision Layer"]
    Decisions --> Serializer["OutputSerializer"]
    Serializer --> CSV["CSV snapshots"]
    Serializer --> SQLite["SQLite dashboard DB"]
    Decisions --> Reports["Markdown reports"]
    SQLite --> Dashboard["Streamlit dashboard"]

Tech Stack

  • Language: Python 3.10+
  • Libraries: Pandas, NumPy, PyYAML, Streamlit
  • Testing: Pytest (708 tests, 100% coverage gate)
  • Linting: Ruff
  • Type Checking: Pyrefly

Project Structure

business-product-analytics/
├── config/              # Declarative configuration and governance files
├── data/                # Synthetic, processed, and export data
├── reports/             # Output markdown reports
├── notebooks/           # Exploratory analysis notebooks
├── src/
│   ├── adapters/        # IO boundary (data loaders, writers, synthetic generator)
│   ├── core/            # Business logic, engines, and orchestrator
│   ├── models/          # Typed domain objects
│   └── utils/           # Shared helpers and paths
├── app/                 # Streamlit dashboard
└── tests/               # Pytest suite

Quick Start

Installation

  1. Clone the repository
  2. Set up a virtual environment: python3 -m venv .venv
  3. Install the project in editable mode: make setup

One-Command Demo

make demo

This regenerates the local synthetic analytics snapshot and then launches the Streamlit dashboard.

Docker Demo

make docker-demo

This builds a local Docker image, regenerates the synthetic analytics snapshot inside the container, and serves the dashboard at http://localhost:8501.

Stop the container with:

make docker-down

Run the Pipeline

make run

This generates synthetic data, runs all analytics engines, and writes CSV, SQLite, and Markdown report artifacts.

Reset Demo Snapshot

make reset-demo

This removes the local SQLite demo database and regenerates the deterministic synthetic analytics snapshot from the configured seed.

Launch the Dashboard

make dashboard

If the dashboard reports a missing database or stale data, rerun make reset-demo.

CLI Reference

The productpulse CLI is installed by make setup and supports:

productpulse run        # Run the full analytics pipeline
productpulse reset-demo # Rebuild the deterministic local demo snapshot
productpulse status     # Show pipeline and database status
productpulse dashboard  # Launch the Streamlit dashboard

Demo Flow

For a guided review path, see docs/DEMO_FLOW.md.

Recommended reviewer path:

  1. Run make ci to verify quality gates.
  2. Run make reset-demo to regenerate a clean synthetic analytics snapshot.
  3. Run make status to confirm SQLite table health.
  4. Run make dashboard and review Executive Cockpit, Top Actions, Customer 360, Scenario Simulator, Decision Traces, and Metric Lineage.
  5. Open generated Markdown reports in reports/ for the executive summary, intervention plan, risk register, metric definitions, and data quality evidence.

Portfolio Case Study

For a concise reviewer narrative, see docs/PORTFOLIO_CASE_STUDY.md. It summarizes the problem, architecture, UX, quality gates, tradeoffs, screenshots, and review path in one document.

Configuration

Business Rules (YAML)

All business rules and metric definitions reside in the config/ directory:

  • churn_rules.yaml — churn driver weights and thresholds
  • metric_catalog.yaml — tracked KPIs with ownership and formulas
  • recommendation_rules.yaml — recommendation triggers and actions

Reproducibility

Synthetic data generation is anchored by reproducibility.as_of_date and reproducibility.random_seed in config/config.yaml, so repeated pipeline runs produce stable CSV and report snapshots.
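The idea of anchoring generation to a fixed date and seed can be sketched as follows. The function name and the row shape are assumptions; only the two inputs correspond to the `reproducibility.as_of_date` and `reproducibility.random_seed` settings named above.

```python
import random
from datetime import date, timedelta


def generate_signups(as_of_date, random_seed, n=5):
    """Generate n synthetic signups that depend only on the two inputs."""
    rng = random.Random(random_seed)  # local RNG; global random state is untouched
    return [
        {
            "customer_id": f"C{i:03d}",
            # Dates are offsets from as_of_date, never from today's date.
            "signup_date": as_of_date - timedelta(days=rng.randint(0, 365)),
        }
        for i in range(n)
    ]
```

Calling the function twice with the same date and seed returns identical rows, because nothing reads the wall clock or shared random state.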

Data Storage

Input data is read from the data/synthetic/ directory, processed in memory with Pandas, and written as output artifacts to data/exports/. A local SQLite database at data/local/productpulse.db serves the Streamlit dashboard.

Note: For demonstration purposes, generated data and markdown reports are committed to this repository as snapshots. See docs/GENERATED_ARTIFACTS.md for handling noisy diffs during local development.
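The SQLite handoff to the dashboard can be sketched with the standard `sqlite3` module. The `risk_queue` table and its columns are hypothetical and do not reflect the real schema in data/local/productpulse.db.

```python
import sqlite3


def write_risk_queue(conn, rows):
    """Replace the contents of a hypothetical risk_queue table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS risk_queue ("
        "customer_id TEXT PRIMARY KEY, risk_level TEXT, score INTEGER)"
    )
    conn.execute("DELETE FROM risk_queue")  # full refresh on every pipeline run
    conn.executemany("INSERT INTO risk_queue VALUES (?, ?, ?)", rows)
    conn.commit()


def top_risks(conn, limit=10):
    """Read the highest-scoring accounts, as a dashboard query might."""
    cursor = conn.execute(
        "SELECT customer_id, risk_level, score FROM risk_queue "
        "ORDER BY score DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()
```

For a quick experiment, `sqlite3.connect(":memory:")` gives a throwaway database that leaves no file behind.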

How It Works

Governance Layer

The project employs a strict Governance Layer to prevent analytical drift. The MetricGovernance engine verifies that the KPIEngine is only computing metrics that have been documented with ownership, risk factors, and formulas in the metric catalog.
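A simplified view of such a governance check is sketched below. The catalog is represented as a plain dict, and the required field names (`owner`, `risk_factors`, `formula`) are assumptions based on the description above rather than the actual `MetricGovernance` implementation.

```python
# Assumed required fields; the real catalog schema may name them differently.
REQUIRED_FIELDS = ("owner", "risk_factors", "formula")


def check_governance(catalog, computed_metrics):
    """Return a list of violations; an empty list means the run may proceed."""
    violations = []
    for name in computed_metrics:
        entry = catalog.get(name)
        if entry is None:
            violations.append(f"{name}: not in catalog")
            continue
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            violations.append(f"{name}: missing {', '.join(missing)}")
    return violations
```

A pipeline could refuse to publish any KPI while this list is non-empty, which is one way to stop undocumented metrics from reaching reports.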

Decision Engine Logic

The decision engine applies deterministic thresholding rather than stochastic modeling, so every decision can be explained by the rule that produced it. When a customer is flagged as "Critical Risk," the system produces an exact trace of which rule was triggered and what the recommended intervention is.
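A thresholding step that emits its own trace might look like this sketch. The score bands, the "High Risk" and "No Flag" labels, and the intervention text are hypothetical; only the "Critical Risk" label comes from the description above.

```python
# Hypothetical bands: (minimum score, risk label, recommended intervention).
RISK_BANDS = [
    (80, "Critical Risk", "Schedule executive outreach"),
    (50, "High Risk", "Assign customer-success follow-up"),
]


def decide(customer_id, score, bands=RISK_BANDS):
    """Return a decision record naming the exact rule that fired."""
    # Check the highest threshold first so the strictest matching band wins.
    for min_score, label, intervention in sorted(bands, reverse=True):
        if score >= min_score:
            return {
                "customer_id": customer_id,
                "score": score,
                "risk_level": label,
                "triggered_rule": f"score >= {min_score}",
                "intervention": intervention,
            }
    return {
        "customer_id": customer_id,
        "score": score,
        "risk_level": "No Flag",
        "triggered_rule": None,
        "intervention": None,
    }
```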

Data Flow

Synthetic CSVs → DataLoader → Validation Gate → Analytics Engines
  → Decision Layer → CSV + SQLite + Markdown Reports
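The flow above can be approximated as a chain of plain functions. This is a simplified sketch, not the real `ProductAnalyticsPipeline`: the `validate` rule, the `run_pipeline` signature, and the engine and sink callables are all assumptions.

```python
def validate(rows):
    """Validation gate: stop the run if any row lacks a customer_id."""
    missing = [index for index, row in enumerate(rows) if not row.get("customer_id")]
    if missing:
        raise ValueError(f"rows missing customer_id: {missing}")
    return rows


def run_pipeline(load, engines, sinks):
    """Load, validate, run each named engine, then hand results to every sink."""
    rows = validate(list(load()))
    results = {name: engine(rows) for name, engine in engines.items()}
    for sink in sinks:
        sink(results)  # e.g. CSV writer, SQLite writer, report renderer
    return results
```

Since validation runs before any engine, a bad input file fails the run before partial artifacts are written.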

Development

Local Commands

make setup         # Install in editable mode with dev dependencies
make run           # Run the full pipeline
make reset-demo    # Rebuild the deterministic local demo snapshot
make dashboard     # Launch Streamlit dashboard
make demo          # Run pipeline, then launch Streamlit dashboard
make docker-check  # Validate Compose config and build the Docker demo image
make docker-build  # Build the local Docker demo image
make docker-demo   # Build and run the containerized local demo
make docker-down   # Stop the Docker demo stack
make status        # Show pipeline status
make test          # Run pytest suite
make lint          # Run Ruff linter
make format        # Apply Ruff formatting
make typecheck     # Run Pyrefly type checker
make coverage      # Run tests with coverage report
make ci            # Full CI: compile + lint + typecheck + coverage

CI Pipeline

make ci compiles Python modules, runs Ruff lint checks, runs Pyrefly type checks, and executes the full pytest suite with the branch-aware coverage gate. make docker-check validates the Docker Compose configuration and builds the containerized demo image. GitHub Actions runs both checks on every push and PR to main.

Guidelines

  • Use pytest for all unit testing.
  • Maintain a strict boundary between src/adapters (IO) and src/core (logic).
  • Do not commit changes to data/ or reports/ unless updating the official snapshot.

Documentation

| Document | Description |
| --- | --- |
| CONTRIBUTING.md | Development setup, architecture boundaries, contribution guide |
| CHANGELOG.md | Version history and notable changes |
| PROJECT_STATUS.md | Current implementation phase and capabilities |
| RELEASE_CHECKLIST.md | Pre-release verification steps |
| docs/ARCHITECTURE.md | Layer diagram, data flow, ADRs, and scale boundaries |
| docs/CONTRIBUTING.md | Docs-silo bridge to the canonical contribution guide |
| docs/DEMO_FLOW.md | Guided portfolio and stakeholder review path |
| docs/GENERATED_ARTIFACTS.md | Git policy for generated data and reports |
| docs/METRIC_CATALOG.md | Human-readable metric catalog guide |
| docs/PORTFOLIO_CASE_STUDY.md | Portfolio case study with problem, architecture, UX, tradeoffs, and evidence |
| reports/executive_summary.md | Auto-generated executive analytics summary |
| reports/metric_definitions.md | Full metric catalog with formulas |
| reports/data_quality_report.md | Per-dataset quality scores |
| reports/intervention_plan.md | Prioritized interventions |
| reports/risk_register.md | Churn and revenue leakage risk register |

Advanced Analytics

The pipeline also publishes dashboard-ready advanced analytics:

  • scenario_analysis — modeled monthly and annualized upside from churn, activation, ARPU, and failed-payment recovery scenarios, plus dashboard what-if controls for custom assumptions
  • cohort_summary — signup cohort retention and revenue by period
  • funnel_summary and segment_funnel — activation, key action, paid conversion, and month-1 retention funnel views
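As an illustration of the funnel views, step and overall conversion rates can be derived from ordered stage counts. The stage names, the input shape, and the output field names are assumptions, not the actual `funnel_summary` schema.

```python
def funnel_conversion(stage_counts):
    """stage_counts: ordered list of (stage, count), from top of funnel down."""
    results = []
    first = stage_counts[0][1] if stage_counts else 0
    previous = None
    for stage, count in stage_counts:
        results.append({
            "stage": stage,
            "count": count,
            # Conversion from the previous stage; undefined for the first stage
            # or when the previous stage is empty.
            "step_rate": count / previous if previous else None,
            # Conversion from the top of the funnel.
            "overall_rate": count / first if first else None,
        })
        previous = count
    return results
```

For example, counts of 1000 signups, 500 activations, and 250 paid conversions give step rates of 0.5 at each transition and an overall paid rate of 0.25.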

Skill Mapping

| Skill | Evidence | Business Value | Certification Alignment |
| --- | --- | --- | --- |
| Analytics Engineering | src/core/metric_governance.py | Ensures KPI consistency and mitigates reporting risk. | Google Data Analytics |
| Data Governance | config/metric_catalog.yaml | Democratizes data definitions for cross-functional teams. | Microsoft Data Analyst |
| Diagnostic Analytics | src/core/churn_risk.py | Converts operational data into retention prioritization. | Google Advanced Data Analytics |
| Business Logic Orchestration | src/core/pipeline.py | Automates repetitive analytical reporting workflows. | Google Project Management |
| Domain Driven Design | src/models/ | Creates a typed vocabulary for analytical entities. | IBM Data Science |

Limitations & Future Work

Current Limitations

  • Synthetic data does not perfectly mirror long-tail enterprise edge cases.
  • In-memory Pandas processing limits dataset size; the pipeline does not currently stream from a data warehouse.
  • Recommendations are rule-based, not AI-driven.

Future Improvements

  • Implementation of an LLM summarization layer for executive reports.
  • Advanced statistical cohort retention modeling.

License

MIT License