
RAMSey — Master Plan

Version: 1.1
Date: 2026-02-10
Status: Pre-development planning
Changelog: v1.1 — Incorporated review feedback: solver worker boundary, analysis jobs lifecycle, ValueRef in ModelIR, initial state distributions, dependency definitions, audit log, unit enforcement, repair policy schema, numeric metadata in solver output.


Table of Contents

  1. Project Overview
  2. Tech Stack
  3. Architecture
  4. Authentication & Authorization
  5. Data Model & Database Schema
  6. Diagram Types
  7. Analysis Engine
  8. AI Assistance
  9. Export Pipeline
  10. Real-Time Collaboration
  11. UI/UX Design Principles
  12. Deployment & Infrastructure
  13. CI/CD Pipeline
  14. Testing Strategy
  15. Day-One DevOps & Quality
  16. Phased Implementation Roadmap
  17. Future Considerations

1. Project Overview

What is RAMSey?

RAMSey is a modern, web-based, collaborative tool for creating, analyzing, and exporting RAMS (Reliability, Availability, Maintainability, Safety) diagrams. It replaces legacy desktop tools like PRISM's editor with a real-time collaborative environment featuring AI-assisted diagram generation and publication-quality export.

Core Value Propositions

  • Multi-user real-time collaboration — Google Docs-like editing for RAMS diagrams
  • AI copilot — Natural language to diagram, validation, Q&A about models
  • Publication-quality export — LaTeX/TikZ output ready for scientific papers
  • Integrated analysis — Markov solvers, fault tree analysis, importance measures, and more
  • Explainable results — Every computation includes assumptions, warnings, and audit trails
  • Modern UX — Minimalist, professional diagrams; fast, responsive interface

Target Users

  • Reliability engineers
  • Safety analysts
  • Academic researchers
  • Students in RAMS/dependability courses
  • Engineering teams performing system safety assessments

2. Tech Stack

Frontend

Component         Technology                 Purpose
Framework         React 19+                  UI framework
Language          TypeScript (strict mode)   Type safety
Build tool        Vite                       Fast HMR, ESM-based builds
Canvas            React Flow                 Node/edge diagram rendering
State management  Zustand                    Lightweight; recommended by React Flow
UI components     shadcn/ui                  Accessible, customizable component library
Styling           Tailwind CSS               Utility-first CSS, dark mode support
Layout engine     ELK.js                     Auto-layout for all diagram types
Client export     html-to-image              PNG/JPEG export from canvas
Data grid         TanStack Table             FMEA table view

Backend

Component    Technology                      Purpose
Runtime      Node.js                         Server runtime
Framework    Fastify                         HTTP framework (fast, TypeScript-native)
Language     TypeScript (strict mode)        Type safety
ORM          Prisma                          Type-safe database access, migrations
Database     PostgreSQL 16                   Primary data store
Cache        Redis                           Session cache, pub/sub, result caching
Auth         Better Auth                     OAuth, organizations, RBAC
CRDT sync    y-websocket                     Real-time collaboration server
Job queue    BullMQ (Redis-backed)           Analysis job lifecycle, cancellation, retries
PDF export   Puppeteer (headless Chromium)   Server-side PDF generation

AI

Component         Technology                   Purpose
SDK               Vercel AI SDK (ai package)   Provider-agnostic LLM integration
Default provider  Claude (Anthropic)           Primary LLM for AI features
Pattern           Tool-use agent               Diagram manipulation via function calling

Infrastructure

Component         Technology                         Purpose
Frontend hosting  Vercel                             Static SPA, CDN, preview deploys
Backend hosting   Docker on Fly.io / Railway / VPS   Persistent server (WebSockets)
Containers        Docker + Docker Compose            Local dev and production
Orchestration     Kubernetes (future)                Horizontal scaling when needed
CI/CD             GitHub Actions                     Automated testing and deployment
Error tracking    Sentry                             Frontend + backend error monitoring

3. Architecture

High-Level System Diagram

┌──────────────────────────────────────────────────────────────────┐
│                          Client (Browser)                         │
│                                                                   │
│  ┌───────────┐  ┌──────────┐  ┌───────────┐  ┌───────────────┐  │
│  │ React Flow│  │ Zustand  │  │ Yjs Doc   │  │ AI Chat Panel │  │
│  │ (canvas)  │◄─┤ (state)  │◄─┤ (CRDT)    │  │ (Vercel AI)   │  │
│  └─────┬─────┘  └──────────┘  └─────┬─────┘  └───────┬───────┘  │
│        │                             │                 │          │
│  ┌─────┴─────────────────────────────┴─────────────────┴───────┐ │
│  │                Diagram → ModelIR Serializer                  │ │
│  └─────────────────────────┬───────────────────────────────────┘ │
│                            │                                      │
│  ┌─────────────────────────┴───────────────────────────────────┐ │
│  │           Client Analysis Engine (Web Worker)                │ │
│  │    Validation, small computations, sensitivity sliders       │ │
│  └─────────────────────────┬───────────────────────────────────┘ │
│                            │ (large models delegated to server)   │
└────────────────────────────┼─────────────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────────────┐
│                     Server │                                      │
│                                                                   │
│  ┌────────────┐ ┌─────────────┐ ┌────────────┐ ┌─────────────┐  │
│  │ Fastify API│ │ Better Auth │ │ y-websocket│ │ AI endpoint │  │
│  │ (REST)     │ │ (OAuth/RBAC)│ │ (CRDT sync)│ │ (LLM proxy) │  │
│  └─────┬──────┘ └─────────────┘ └────────────┘ └─────────────┘  │
│        │                                                          │
│        │ submits jobs via BullMQ                                  │
│        ▼                                                          │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │         Solver Worker (separate process / container)         │ │
│  │   Large CTMC, BDD-based FTA, batch runs, Monte Carlo        │ │
│  │   Consumes jobs from Redis queue                             │ │
│  │   Designed for future migration to Rust/Python/WASM          │ │
│  └─────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌─────┴──────────────────┐  ┌─────────────────────────────────┐ │
│  │      PostgreSQL        │  │           Redis                 │ │
│  │  users, projects,      │  │  sessions, pub/sub, job queue,  │ │
│  │  diagrams, snapshots,  │  │  analysis cache, rate limiting  │ │
│  │  audit log, jobs       │  │                                 │ │
│  └────────────────────────┘  └─────────────────────────────────┘ │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │         Export Service (Headless Chromium container)         │ │
│  └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

Key Architecture Principles

  1. Diagram is UI, Model is the product — Visual canvas and computation layer are decoupled via ModelIR
  2. Offline-first collaboration — Yjs CRDTs queue edits and merge on reconnect
  3. Hybrid computation — Small models run client-side (Web Worker), large models run server-side
  4. Solver as replaceable engine — API server never performs heavy math in-process. Solvers run in a separate worker process/container, communicating via job queue (BullMQ/Redis). Interface is designed so solvers can migrate to Rust/Python/WASM without breaking the platform
  5. AI as agent — The LLM has tools to read and manipulate diagrams, not just chat
  6. Plugin-ready diagram types — Each diagram type is a self-contained module with its own nodes, edges, validation, layout, and serializer
  7. Stateless backend containers — Sessions in Redis, CRDT state in PostgreSQL, enabling horizontal scaling
  8. Everything must be reproducible — Every analysis result stores ModelIR schema version, solver version, options, tolerances, and computation timestamp. Researchers can cite exact solver configurations in publications

4. Authentication & Authorization

Auth Library: Better Auth

  • Framework-agnostic, works with Fastify
  • Prisma adapter for PostgreSQL
  • Built-in plugins: organization (teams), rbac (permissions), two-factor (future)
  • Session-based auth with secure httpOnly cookies

OAuth Providers

Provider              Priority  Audience
Google                v1        Universal
GitHub                v1        Developers, researchers
Microsoft / Azure AD  v1        University, enterprise SSO
Generic OIDC          v2        Institutional identity providers
Email / password      v1        Fallback with email verification

Auth Flow (SPA + Separate API)

Browser (Vite SPA)
  │
  ├── Auth pages (login/register) → calls Better Auth endpoints
  │                                  hosted on Fastify server
  ├── OAuth redirect flow         → handled by Better Auth
  │
  └── API calls with session cookie → Fastify validates session
                                      via Better Auth middleware

Team / Organization System

Better Auth's organization plugin manages:

  • Teams (called "organizations" in Better Auth)
  • Members with roles
  • Invitations (by email or invite link)

Team roles:

Role    Permissions
admin   Manage members, delete projects, full control over team resources
member  Create/edit diagrams in team projects, cannot manage team settings

Project-Level Permissions

Separate from team roles — applied per-project:

Role    Permissions
owner   Full control, delete project, manage shares
editor  Create/edit/delete diagrams within the project
viewer  Read-only access to all diagrams in the project

Permission Resolution Order

When a user accesses a project:

1. Is user the project creator?              → owner
2. Has direct project_share entry?           → use that role
3. Is project owned by a team?
   └── Is user a team member?
       ├── team role = admin                 → editor (of project content)
       └── team role = member                → editor
4. Has valid share_link token in URL?        → use link's role
5. None of the above                         → 403 Forbidden

Policy note: Team admin grants team management powers (members, settings, delete team projects) but does NOT automatically grant owner on every project. Project ownership is explicit — only the creator or someone explicitly granted owner via project_shares can delete a project or manage its shares. This prevents team admins from silently becoming owners of all team content.
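The resolution order above can be sketched as a pure function. This is illustrative only: `ProjectContext` and its fields are placeholder shapes standing in for data the real service would load from Postgres, not the actual Prisma types.

```typescript
type ProjectRole = 'owner' | 'editor' | 'viewer'

// Illustrative context shape — the real service would assemble this
// from projects, project_shares, members, and share_links.
interface ProjectContext {
  creatorId: string
  ownerType: 'user' | 'team'
  directShares: Record<string, ProjectRole>        // userId → role from project_shares
  teamMembers: Record<string, 'admin' | 'member'>  // userId → team role
  linkRole?: ProjectRole                           // role carried by a valid share_link token
}

// Returns the effective role, or null (→ 403 Forbidden).
function resolveProjectRole(userId: string, ctx: ProjectContext): ProjectRole | null {
  if (ctx.creatorId === userId) return 'owner'                   // 1. project creator
  if (ctx.directShares[userId]) return ctx.directShares[userId]  // 2. direct share
  if (ctx.ownerType === 'team' && ctx.teamMembers[userId]) {
    return 'editor'  // 3. team admin and member both get editor on content, per the policy note
  }
  if (ctx.linkRole) return ctx.linkRole                          // 4. share link
  return null                                                    // 5. 403
}
```

Note that step 3 deliberately maps team admin to editor, not owner, matching the policy note above.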

Link Sharing (Google Docs-style)

  • Project owners can generate a share link with a role (editor or viewer)
  • Links contain a UUID token: /invite/:token
  • Links can be set to expire or deactivated
  • Opening a share link grants the user the specified role via project_shares

5. Data Model & Database Schema

All IDs are UUIDs. Timestamps use timestamptz.

Better Auth Managed Tables

These are created and managed by Better Auth + plugins:

-- Better Auth core
users (id, email, name, image, emailVerified, createdAt, updatedAt)
sessions (id, userId, token, expiresAt, ...)
accounts (id, userId, provider, providerAccountId, ...)
verifications (id, identifier, value, expiresAt, ...)

-- Better Auth organization plugin
organizations (id, name, slug, logo, metadata, createdAt)
members (id, organizationId, userId, role, createdAt)
invitations (id, organizationId, email, role, status, ...)

Application Tables

-- ─── Projects ───

CREATE TABLE projects (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    owner_type      VARCHAR(20) NOT NULL CHECK (owner_type IN ('user', 'team')),
    owner_id        UUID NOT NULL,  -- references users.id OR organizations.id
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Diagrams ───

CREATE TABLE diagrams (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    name            VARCHAR(255) NOT NULL,
    type            VARCHAR(50) NOT NULL CHECK (type IN (
                        'markov_chain',
                        'fault_tree',
                        'event_tree',
                        'reliability_block_diagram',
                        'bow_tie',
                        'fmea'
                    )),
    yjs_state       BYTEA,              -- persisted Yjs document
    thumbnail       BYTEA,              -- dashboard preview image
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Diagram Snapshots (version history) ───

CREATE TABLE diagram_snapshots (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    yjs_state       BYTEA NOT NULL,
    label           VARCHAR(255),       -- optional named version
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Project Sharing (direct user access) ───

CREATE TABLE project_shares (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role            VARCHAR(20) NOT NULL CHECK (role IN ('owner', 'editor', 'viewer')),
    granted_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(project_id, user_id)
);

-- ─── Share Links (URL-based access) ───

CREATE TABLE share_links (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    token           UUID UNIQUE NOT NULL DEFAULT gen_random_uuid(),
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    role            VARCHAR(20) NOT NULL CHECK (role IN ('editor', 'viewer')),
    created_by      UUID NOT NULL REFERENCES users(id),
    expires_at      TIMESTAMPTZ,        -- NULL = never expires
    is_active       BOOLEAN NOT NULL DEFAULT true,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Comments / Annotations ───

CREATE TABLE comments (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES users(id),
    node_id         VARCHAR(255),       -- anchored to a specific node (nullable)
    position_x      DOUBLE PRECISION,   -- canvas position (nullable)
    position_y      DOUBLE PRECISION,
    content         TEXT NOT NULL,
    resolved        BOOLEAN NOT NULL DEFAULT false,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Notifications ───

CREATE TABLE notifications (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    type            VARCHAR(50) NOT NULL CHECK (type IN (
                        'project_shared',
                        'team_invite',
                        'comment_added',
                        'comment_resolved',
                        'analysis_complete',
                        'mention'
                    )),
    payload         JSONB NOT NULL,     -- flexible data per notification type
    read            BOOLEAN NOT NULL DEFAULT false,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Diagram Templates ───

CREATE TABLE diagram_templates (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    type            VARCHAR(50) NOT NULL,  -- same enum as diagrams.type
    model_ir        JSONB NOT NULL,        -- template stored as ModelIR
    is_builtin      BOOLEAN NOT NULL DEFAULT false,  -- system vs user-created
    created_by      UUID REFERENCES users(id),       -- NULL for built-in
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Analysis Jobs (lifecycle tracking) ───

CREATE TABLE analysis_jobs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    requested_by    UUID NOT NULL REFERENCES users(id),
    content_hash    VARCHAR(64) NOT NULL,  -- SHA-256 of ModelIR + options
    method          VARCHAR(50) NOT NULL,  -- analysis method name
    options         JSONB NOT NULL,
    status          VARCHAR(20) NOT NULL DEFAULT 'queued'
                        CHECK (status IN ('queued', 'running', 'succeeded', 'failed', 'canceled')),
    progress        DOUBLE PRECISION DEFAULT 0
                        CHECK (progress >= 0 AND progress <= 1),
    priority        INTEGER NOT NULL DEFAULT 0,
    worker_id       VARCHAR(100),          -- identifies which worker picked up the job
    error_message   TEXT,
    error_stack     TEXT,
    queued_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at      TIMESTAMPTZ,
    finished_at     TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Analysis Results (cached) ───

CREATE TABLE analysis_results (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id          UUID REFERENCES analysis_jobs(id),  -- link to job that produced this
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    content_hash    VARCHAR(64) NOT NULL,  -- SHA-256 of ModelIR + options
    solver_name     VARCHAR(100) NOT NULL,
    solver_version  VARCHAR(20) NOT NULL,
    options         JSONB NOT NULL,
    results         JSONB NOT NULL,
    trace           JSONB NOT NULL,        -- assumptions, warnings, audit
    numeric_metadata JSONB NOT NULL,       -- method, tolerance, iterations, residual norms
    warnings        JSONB,
    error_bounds    JSONB,                 -- { lower, upper } where applicable
    compute_time_ms INTEGER NOT NULL,
    executed_on     VARCHAR(20) NOT NULL CHECK (executed_on IN ('client', 'server')),
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(diagram_id, content_hash)
);

-- ─── Audit Log ───

CREATE TABLE audit_log (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID REFERENCES users(id),   -- NULL for system events
    action          VARCHAR(100) NOT NULL,        -- e.g., 'project.create', 'share_link.use'
    object_type     VARCHAR(50) NOT NULL,         -- e.g., 'project', 'diagram', 'team'
    object_id       UUID,
    metadata        JSONB,                        -- action-specific details
    ip_address      INET,
    session_id      UUID,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Audited events:
-- auth.login, auth.logout, auth.oauth_connect
-- project.create, project.delete, project.update
-- diagram.create, diagram.delete
-- share_link.create, share_link.use, share_link.revoke
-- project_share.grant, project_share.revoke
-- team.create, team.member_add, team.member_remove, team.role_change
-- analysis.submit, analysis.complete, analysis.fail
-- export.generate, export.download

-- ─── Indexes ───

CREATE INDEX idx_diagrams_project ON diagrams(project_id);
CREATE INDEX idx_diagrams_type ON diagrams(type);
CREATE INDEX idx_snapshots_diagram ON diagram_snapshots(diagram_id);
CREATE INDEX idx_project_shares_user ON project_shares(user_id);
CREATE INDEX idx_project_shares_project ON project_shares(project_id);
CREATE INDEX idx_share_links_token ON share_links(token);
CREATE INDEX idx_comments_diagram ON comments(diagram_id);
CREATE INDEX idx_notifications_user ON notifications(user_id, read);
CREATE INDEX idx_analysis_results_hash ON analysis_results(diagram_id, content_hash);
CREATE INDEX idx_analysis_jobs_diagram ON analysis_jobs(diagram_id);
CREATE INDEX idx_analysis_jobs_status ON analysis_jobs(status);
CREATE INDEX idx_analysis_jobs_hash ON analysis_jobs(content_hash);
CREATE INDEX idx_audit_log_user ON audit_log(user_id, created_at);
CREATE INDEX idx_audit_log_object ON audit_log(object_type, object_id);
CREATE INDEX idx_audit_log_action ON audit_log(action, created_at);
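The content_hash columns on analysis_jobs and analysis_results are defined as SHA-256 over ModelIR + options. A sketch of how they might be computed deterministically, using Node's built-in crypto; the recursive key sort is an assumed canonicalization step (so semantically equal inputs always hash identically), not a committed spec:

```typescript
import { createHash } from 'node:crypto'

// Canonical JSON: recursively sort object keys so that key order
// never changes the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : 1))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// SHA-256 over ModelIR + options, yielding the 64-char hex string
// stored in the content_hash column.
function contentHash(modelIR: unknown, options: unknown): string {
  return createHash('sha256')
    .update(canonicalize({ modelIR, options }))
    .digest('hex')
}
```

The same hash serves both deduplication (the UNIQUE(diagram_id, content_hash) constraint) and cache lookups before enqueuing a new job.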

6. Diagram Types

Supported Types (v1)

Type UI Model Layout Algorithm
Markov Chain Nodes (states) + edges (transitions) Force-directed (ELK.js)
Fault Tree (FTA) Tree with logic gate nodes Top-down hierarchical (ELK.js)
Event Tree (ETA) Horizontal branching tree Left-to-right (ELK.js)
Reliability Block Diagram (RBD) Blocks in series/parallel/k-of-n Left-to-right flow (ELK.js)
Bow-Tie Fault tree + central event + event tree Symmetric left-center-right (ELK.js)
FMEA Data table (not a canvas diagram) N/A — uses TanStack Table

Plugin Architecture

Each diagram type is a self-contained module:

src/diagram-types/
├── index.ts                    // diagram type registry
├── markov-chain/
│   ├── nodes/                  // custom React Flow node components
│   │   ├── StateNode.tsx       // circle node for states
│   │   └── index.ts
│   ├── edges/                  // custom edge components
│   │   └── TransitionEdge.tsx  // labeled edge with rate/probability
│   ├── toolbar/                // type-specific toolbar items
│   ├── validation.ts           // outgoing probabilities sum to 1, etc.
│   ├── serializer.ts           // React Flow state → ModelIR
│   ├── layout.ts               // ELK.js config for this type
│   ├── tikz-export.ts          // ModelIR → TikZ code
│   └── index.ts                // diagram type definition
├── fault-tree/
│   ├── nodes/
│   │   ├── GateNode.tsx        // AND/OR/k-of-n/NOT gates
│   │   ├── BasicEventNode.tsx
│   │   └── TopEventNode.tsx
│   ├── ...
├── event-tree/
│   ├── ...
├── rbd/
│   ├── ...
├── bow-tie/
│   ├── ...
└── fmea/
    ├── columns.ts              // TanStack Table column definitions
    ├── validation.ts
    ├── serializer.ts
    └── ...

Diagram Type Definition Interface

interface DiagramTypeDefinition {
  id: DiagramType
  name: string
  description: string
  icon: React.ComponentType

  // React Flow configuration
  nodeTypes: Record<string, React.ComponentType>
  edgeTypes: Record<string, React.ComponentType>

  // Default nodes/edges for a new diagram
  defaultContent: () => { nodes: Node[]; edges: Edge[] }

  // Toolbar items specific to this type
  toolbarItems: ToolbarItem[]

  // Validation rules
  validate: (nodes: Node[], edges: Edge[]) => ValidationResult

  // Convert to ModelIR for analysis
  serialize: (nodes: Node[], edges: Edge[]) => ModelIR

  // Auto-layout configuration for ELK.js
  layoutOptions: ElkLayoutOptions

  // TikZ export
  toTikZ: (modelIR: ModelIR) => string
}
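The registry in src/diagram-types/index.ts can then be a simple map keyed by id, so the editor shell stays generic. A sketch with a trimmed-down stand-in for DiagramTypeDefinition; the function names are illustrative:

```typescript
// Trimmed-down stand-in for DiagramTypeDefinition (illustrative).
interface DiagramTypeStub {
  id: string
  name: string
}

const registry = new Map<string, DiagramTypeStub>()

// Fails fast on duplicate registration so plugin modules can
// self-register at import time without silently shadowing each other.
function registerDiagramType(def: DiagramTypeStub): void {
  if (registry.has(def.id)) throw new Error(`duplicate diagram type: ${def.id}`)
  registry.set(def.id, def)
}

function getDiagramType(id: string): DiagramTypeStub {
  const def = registry.get(id)
  if (!def) throw new Error(`unknown diagram type: ${id}`)
  return def
}

registerDiagramType({ id: 'markov_chain', name: 'Markov Chain' })
registerDiagramType({ id: 'fault_tree', name: 'Fault Tree' })
```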

7. Analysis Engine

Core Principle: ModelIR Decouples Diagrams from Solvers

Every diagram type serializes to a shared Intermediate Representation (ModelIR). Solvers consume only ModelIR. This enables:

  • Cross-diagram consistency
  • Fewer duplicated rules
  • New diagram types without rewriting math
  • Cross-type conversions (e.g., RBD ↔ equivalent FTA)

ModelIR Schema

ValueRef — Parameterized Values

All numeric properties in the IR use ValueRef instead of raw numbers. This enables named parameters, scenario management, and expressions:

// ValueRef replaces raw numbers throughout the IR.
// Allows literal values, parameter references, or expressions.
type ValueRef =
  | number                           // literal: 0.001
  | { param: string }                // reference: { param: "lambda_1" }
  | { expr: string }                 // expression: { expr: "2 * lambda_1" }
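Resolving a ValueRef against the model's parameter table might look like the sketch below. Only the trivial "number * param" expression form is handled here; a real build would use a proper expression parser.

```typescript
type ValueRef = number | { param: string } | { expr: string }

// Resolve a ValueRef to a plain number using the Parameter table.
// Expression support is deliberately minimal: "<number> * <param>" only.
function resolveValue(ref: ValueRef, params: Record<string, number>): number {
  if (typeof ref === 'number') return ref               // literal
  if ('param' in ref) {                                  // parameter reference
    if (!(ref.param in params)) throw new Error(`unknown parameter: ${ref.param}`)
    return params[ref.param]
  }
  const m = ref.expr.match(/^(\d+(?:\.\d+)?)\s*\*\s*(\w+)$/)
  if (!m) throw new Error(`unsupported expression: ${ref.expr}`)
  return Number(m[1]) * resolveValue({ param: m[2] }, params)
}
```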

Unit System

All values in the IR carry units. The normalization step converts to base units before solving:

interface UnitConfig {
  timeBase: 'hours' | 'days' | 'years'   // solver operates in this unit
  rateBase: '1/h' | '1/d' | '1/y'       // derived from timeBase
}

  • User-facing values store original units for display
  • Normalization converts to timeBase before any solver runs
  • Solver results are returned in base units, then converted for display
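A minimal sketch of the rate normalization step, assuming a 365-day (8,760-hour) year; the conversion table is illustrative:

```typescript
// Hours per unit of each supported time base (8,760 h/year assumed).
const HOURS: Record<'hours' | 'days' | 'years', number> = {
  hours: 1,
  days: 24,
  years: 8760,
}

// Convert a rate (events per fromUnit) into events per hour,
// e.g. 0.024/d → 0.001/h as in the trace example later in this section.
function rateToPerHour(value: number, fromUnit: 'hours' | 'days' | 'years'): number {
  return value / HOURS[fromUnit]
}

// Convert a base-unit result back for display.
function rateFromPerHour(value: number, toUnit: 'hours' | 'days' | 'years'): number {
  return value * HOURS[toUnit]
}
```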

Core Schema

interface ModelIR {
  version: string                    // schema version for reproducibility
  type: DiagramType
  unitConfig: UnitConfig             // base units for this model
  components: Component[]
  events: Event[]
  gates: Gate[]                      // FTA, Bow-tie
  states: State[]                    // Markov
  transitions: Transition[]          // Markov, ETA
  blocks: Block[]                    // RBD
  barriers: Barrier[]                // Bow-tie
  dependencies: Dependency[]
  parameters: Parameter[]            // named values (lambda, mu, etc.)
  distributions: Distribution[]      // exponential, Weibull, etc.
  initialCondition: InitialCondition // how the model starts
  missionTime?: ValueRef
  repairPolicy?: RepairPolicy
}

interface Component {
  id: string
  name: string
  failureRate?: ValueRef             // ValueRef, not number
  repairRate?: ValueRef              // ValueRef, not number
  distribution?: DistributionRef
  metadata: Record<string, unknown>
}

interface State {
  id: string
  label: string
  type: 'operational' | 'degraded' | 'failed' | 'absorbing'
}

// Initial condition supports single state or distribution across states
type InitialCondition =
  | { type: 'single'; stateId: string }
  | { type: 'distribution'; probabilities: Record<string, number> }

interface Transition {
  id: string
  from: string
  to: string
  rate?: ValueRef                    // ValueRef, not number
  probability?: ValueRef             // ValueRef, not number
  label?: string
  condition?: string
}

interface Gate {
  id: string
  type: 'AND' | 'OR' | 'NOT' | 'K_OF_N' | 'XOR'
  k?: number                         // for K_OF_N
  inputs: string[]                   // child event/gate IDs
  output: string                     // parent event ID
}

interface Parameter {
  name: string                       // e.g., "lambda_1"
  value: number
  unit: string                       // e.g., "1/h" — required, not optional
  description?: string
}

interface Distribution {
  id: string
  type: 'exponential' | 'weibull' | 'lognormal' | 'constant'
  params: Record<string, ValueRef>   // ValueRef, not number
}

Dependency Definitions

Dependencies model correlations and shared-cause relationships between components. Critical for realistic FTA, RBD, and bow-tie analysis:

type Dependency =
  | CommonCauseFailure
  | FunctionalDependency
  | ConditionalProbability
  | InhibitCondition

// Components that share a common failure cause (Beta-factor, MGL, Alpha-factor)
interface CommonCauseFailure {
  type: 'common_cause'
  id: string
  group: string[]                    // component IDs in the CCF group
  model: 'beta_factor' | 'mgl' | 'alpha_factor'
  params: Record<string, ValueRef>   // e.g., { beta: 0.1 }
}

// Component B fails whenever component A fails
interface FunctionalDependency {
  type: 'functional'
  id: string
  source: string                     // component that triggers
  targets: string[]                  // components that depend on it
}

// Probability of B given A (for event trees, dependent branches)
interface ConditionalProbability {
  type: 'conditional'
  id: string
  given: string                      // conditioning event/component
  target: string                     // dependent event/component
  probability: ValueRef
}

// Inhibit gate condition (FTA)
interface InhibitCondition {
  type: 'inhibit'
  id: string
  gate: string                       // gate ID
  condition: string                  // condition event ID
  probability: ValueRef
}
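For example, under the beta-factor model a component's total failure rate λ splits into an independent part (1 − β)λ plus a common-cause rate βλ shared by the whole CCF group. A sketch of that split (plain numbers here for brevity; the real IR values are ValueRefs):

```typescript
// Beta-factor split: total rate λ → independent (1 − β)·λ per component,
// plus a shared common-cause rate β·λ for the CCF group.
function betaFactorSplit(
  lambda: number,
  beta: number,
): { independent: number; commonCause: number } {
  if (beta < 0 || beta > 1) throw new Error('beta must be in [0, 1]')
  return { independent: (1 - beta) * lambda, commonCause: beta * lambda }
}
```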

Repair Policy

Defines how repairs are modeled. Schema supports future complexity; v1 implements unlimited only:

interface RepairPolicy {
  type: 'unlimited' | 'single_repairman' | 'priority_queue' | 'k_repairmen'
  maxSimultaneousRepairs?: number    // for k_repairmen
  priorityOrder?: string[]           // component IDs in repair priority
  preemptive?: boolean               // can a higher-priority repair interrupt?
}
// v1 default: { type: 'unlimited' }
// Schema allows future extension without breaking changes

Analysis Request / Response Contract

interface AnalyzeRequest {
  modelIR: ModelIR
  method: AnalysisMethod
  options: AnalysisOptions
  executionTarget: 'client' | 'server' | 'auto'
}

type AnalysisMethod =
  // Markov
  | 'steady_state'
  | 'transient'
  | 'mttf'
  | 'availability'
  // Fault tree
  | 'minimal_cut_sets'
  | 'top_event_probability'
  | 'importance_measures'
  // Event tree
  | 'outcome_probabilities'
  | 'path_ranking'
  // RBD
  | 'system_reliability'
  | 'system_availability'
  // Bow-tie
  | 'end_state_frequencies'
  | 'barrier_effectiveness'
  // FMEA
  | 'rpn_calculation'
  | 'criticality_analysis'
  // General
  | 'validate'

interface AnalysisOptions {
  tolerance?: number                 // convergence tolerance
  maxIterations?: number
  truncationLimit?: number           // cut set order limit
  timePoints?: number[]              // for transient analysis
  missionTime?: number
  confidenceLevel?: number
  method?: string                    // solver algorithm selection
}

interface AnalyzeResponse {
  status: 'success' | 'warning' | 'error'
  solver: {
    name: string
    version: string
  }
  modelIRVersion: string             // schema version used
  contentHash: string                // SHA-256 of ModelIR + options

  metrics: Record<string, number | number[]>
  contributions?: ContributionTable[]
  cutSets?: CutSet[]
  importanceMeasures?: ImportanceMeasure[]

  // Structured numeric metadata — mandatory for every result
  numericMetadata: {
    method: string                   // e.g., "uniformization", "sparse_lu", "bdd"
    tolerance: number                // convergence tolerance used
    iterations?: number              // iterations to convergence
    residualNorm?: number            // final residual norm
    truncation?: {
      enabled: boolean
      threshold: number              // e.g., 1e-12
      cutSetsDropped?: number        // how many cut sets were truncated
    }
    stiffnessDetected?: boolean      // for CTMC solvers
    methodAutoSelected?: boolean     // was the method chosen automatically?
  }

  trace: {
    assumptions: string[]            // e.g., "component independence assumed"
    normalizations: string[]         // e.g., "rates converted from 1/d to 1/h"
    unitConversions: string[]        // e.g., "lambda_1: 0.024/d → 0.001/h"
    simplifications: string[]        // e.g., "k-of-n gate expanded to OR/AND"
    methodDetails: string            // human-readable explanation of method
  }

  warnings: Warning[]
  errorBounds?: {
    lower: number
    upper: number
    description: string              // e.g., "cut set truncation error bound"
  }
  computeTimeMs: number
  timestamp: string                  // ISO 8601 — when this result was computed
}

interface Warning {
  severity: 'info' | 'warning' | 'error'
  code: string
  message: string
  affectedComponents: string[]
}

Solver Details Per Analysis Type

A. Markov Chain / CTMC

Analysis      Method                               Notes
Steady-state  Solve πQ = 0 with normalization      Sparse linear solvers for large models
Transient     Uniformization (default)             Krylov methods for large sparse systems
MTTF          Absorbing CTMC methods               Solve linear system for expected absorption time
Availability  Steady-state P(operational states)   Sum of operational + degraded state probabilities

Robustness features:

  • Automatic stiffness detection with method selection
  • State-space explosion mitigation: basic lumping, symmetry reduction
  • "Model size / confidence" panel predicting runtime and suggesting simplifications

B. Fault Tree Analysis

| Analysis | Method | Notes |
| --- | --- | --- |
| Minimal cut sets | BDD-based (exact) for moderate size | Approximate/truncated for large trees |
| Top event probability | Exact via BDD when possible | Rare-event approximation as labeled option |
| Importance measures | Birnbaum, Fussell-Vesely, RAW, RRW | Computed from cut set results |

Differentiator: Show error bounds when truncating cut sets, show which cut sets were dropped.
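A hedged sketch of how those labeled approximations relate (data shapes assumed): given minimal cut sets and basic-event probabilities, the rare-event sum always dominates the min-cut upper bound, and reporting both makes the approximation error visible to the user.

```typescript
// Sketch (shapes assumed): top event probability from minimal cut sets.
// Both results are upper bounds for coherent trees with independent
// basic events; their gap is a cheap error indicator.

type CutSet = string[];

function topEventProbability(
  cutSets: CutSet[],
  q: Record<string, number>
): { rareEvent: number; upperBound: number; gap: number } {
  const cutProbs = cutSets.map(cs => cs.reduce((p, e) => p * q[e], 1));
  // Rare-event approximation: sum of cut set probabilities (can exceed 1).
  const rareEvent = cutProbs.reduce((a, b) => a + b, 0);
  // Min-cut upper bound: 1 - prod(1 - P(Ci)).
  const upperBound = 1 - cutProbs.reduce((p, c) => p * (1 - c), 1);
  return { rareEvent, upperBound, gap: rareEvent - upperBound };
}

// Example: top = C1 ∪ C2 with C1 = {A, B}, C2 = {C}.
const result = topEventProbability(
  [["A", "B"], ["C"]],
  { A: 0.01, B: 0.02, C: 0.001 }
);
// rareEvent = 0.01 * 0.02 + 0.001 = 0.0012
```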

C. Event Tree Analysis

| Analysis | Method | Notes |
| --- | --- | --- |
| Outcome probabilities | Conditional probability propagation | Handle dependencies between branches |
| Path ranking | Ranked by probability | Top contributing paths |

Differentiator: "What-if branch probability slider" for fast recomputation.
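The propagation step can be sketched as follows (shapes assumed; real branch points may carry conditional dependencies rather than a single independent failure probability): multiply the initiating-event frequency along each success/failure path and rank the outcomes.

```typescript
// Minimal sketch (assumed data shapes): propagate an initiating-event
// frequency through a chain of independent branch points and rank paths.

interface BranchPoint { name: string; pFail: number }

function outcomeFrequencies(
  initiatingFreq: number,
  branches: BranchPoint[]
): { path: string[]; frequency: number }[] {
  // Enumerate all success/failure combinations (2^n paths).
  let paths = [{ path: [] as string[], frequency: initiatingFreq }];
  for (const b of branches) {
    const next: typeof paths = [];
    for (const p of paths) {
      next.push({ path: [...p.path, `${b.name}:ok`],   frequency: p.frequency * (1 - b.pFail) });
      next.push({ path: [...p.path, `${b.name}:fail`], frequency: p.frequency * b.pFail });
    }
    paths = next;
  }
  return paths.sort((a, b) => b.frequency - a.frequency); // rank by frequency
}

const outcomes = outcomeFrequencies(0.1, [
  { name: "alarm", pFail: 0.01 },
  { name: "sprinkler", pFail: 0.05 },
]);
// Top path: both barriers succeed, frequency = 0.1 * 0.99 * 0.95 = 0.09405
```

Because this is a pure product along each path, the what-if slider only needs to rescale the affected factor, which is what makes fast recomputation feasible.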

D. Reliability Block Diagram

| Analysis | Method | Notes |
| --- | --- | --- |
| System reliability (non-repairable) | Combinatorial (series/parallel/k-of-n) | Support exponential + Weibull distributions |
| System availability (repairable) | CTMC or Markov approximation | Auto-convert to CTMC for repairable k-of-n |

Differentiator: Auto-convert RBD ↔ equivalent FTA with side-by-side comparison.
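The combinatorial path for non-repairable systems reduces to a few closed forms; a sketch under assumed names (exponential components at mission time t, identical blocks for the k-of-n case):

```typescript
// Sketch: combinatorial RBD reliability for non-repairable exponential
// components at mission time t. Names are illustrative.

const expRel = (lambda: number, t: number) => Math.exp(-lambda * t);

const series = (rs: number[]) => rs.reduce((a, r) => a * r, 1);
const parallel = (rs: number[]) => 1 - rs.reduce((a, r) => a * (1 - r), 1);

// k-of-n identical components: sum_{i=k..n} C(n,i) R^i (1-R)^(n-i)
function kOfN(k: number, n: number, r: number): number {
  const choose = (n: number, i: number): number =>
    i === 0 || i === n ? 1 : (choose(n - 1, i - 1) * n) / i;
  let sum = 0;
  for (let i = k; i <= n; i++)
    sum += choose(n, i) * r ** i * (1 - r) ** (n - i);
  return sum;
}

// 2-of-3 pumps, lambda = 1e-4/h, 1000 h mission, in series with a controller.
const rPump = expRel(1e-4, 1000); // ≈ 0.9048
const rSystem = series([kOfN(2, 3, rPump), expRel(1e-5, 1000)]);
```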

E. Bow-Tie

  • Combined IR linking initiating event frequency, preventive barriers, and mitigative barriers
  • Compute end-state frequencies directly with attribution
  • Differentiator: Barrier management dashboard showing risk reduction per barrier

F. FMEA

| Analysis | Method | Notes |
| --- | --- | --- |
| RPN | Severity × Occurrence × Detection | Classic method, explicitly labeled |
| Criticality | MIL-STD-1629 style | Alternative to RPN |
| Custom scoring | Configurable weights | User-defined risk formula |

Differentiator: Connect FMEA items to model components, show how improvements change system-level risk.
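The scoring itself is straightforward; a sketch with assumed field names, matching the classic RPN and the configurable-weights row above:

```typescript
// Sketch (field names assumed): classic RPN plus a configurable
// weighted product for user-defined scoring.

interface FmeaItem { id: string; severity: number; occurrence: number; detection: number }

const rpn = (i: FmeaItem) => i.severity * i.occurrence * i.detection;

// Custom scoring: weighted product so weights shift emphasis between factors.
// With all weights = 1 this degenerates to the classic RPN.
function weightedScore(i: FmeaItem, w = { s: 1, o: 1, d: 1 }): number {
  return i.severity ** w.s * i.occurrence ** w.o * i.detection ** w.d;
}

const items: FmeaItem[] = [
  { id: "seal-leak", severity: 8, occurrence: 3, detection: 4 },
  { id: "bearing-wear", severity: 6, occurrence: 5, detection: 2 },
];
const ranked = [...items].sort((a, b) => rpn(b) - rpn(a));
// rpn(seal-leak) = 96, rpn(bearing-wear) = 60
```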

Computation Architecture

Design Rule: Fastify Never Computes

The API server (Fastify) is strictly an orchestrator. It accepts analysis requests, validates them, submits jobs, and returns results. All heavy math runs in separate processes.

Client-Side (Web Worker)

  • Runs in dedicated Web Worker to keep UI responsive
  • Handles: validation, normalization, small/medium computations
  • Threshold: models with < ~50 states (Markov) or < ~100 basic events (FTA)
  • Used for: sensitivity sliders, instant recalc, FMEA RPN
  • Same analyze(modelIR, options) → result interface as server-side

Server-Side (Solver Worker — Separate Process)

  • Runs as a separate process or container, NOT inside Fastify
  • Communicates via BullMQ (Redis-backed job queue)
  • Handles: large CTMC, exact BDD-based FTA, Monte Carlo (future), batch runs
  • Result caching via Redis: hash(ModelIR + options) → cached result
  • Audit logging for reproducibility and compliance

Solver Worker boundary is designed for future migration:

  • v1: Node.js worker process (same language, separate process)
  • Future: Rust (performance), Python/scipy (numerical libraries), or WASM
  • The interface (AnalyzeRequest → AnalyzeResponse) stays the same regardless of implementation language
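The Redis cache key described above (hash(ModelIR + options) → cached result) needs a canonical serialization so semantically identical requests hit the same entry. A sketch using node:crypto (helper names are assumptions, not the actual implementation):

```typescript
// Sketch: content-hash cache key. Object keys are sorted recursively so
// that key order in the ModelIR JSON does not change the hash.
import { createHash } from "node:crypto";

function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map(k => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function cacheKey(modelIR: object, options: object): string {
  return createHash("sha256")
    .update(canonicalize({ modelIR, options }))
    .digest("hex");
}

// Key order must not matter:
const k1 = cacheKey({ a: 1, b: 2 }, { method: "uniformization" });
const k2 = cacheKey({ b: 2, a: 1 }, { method: "uniformization" });
// k1 === k2
```

Hashing the normalized ModelIR (after unit conversion and ValueRef resolution) rather than the raw diagram state would also let visually different but equivalent models share cache entries.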

Job Lifecycle

Client submits analysis request
    │
    ▼
Fastify API validates request
    │
    ├── Small model? → return to client for Web Worker execution
    │
    └── Large model? → submit to BullMQ job queue
                          │
                          ▼
                    Solver Worker picks up job
                    ├── status: queued → running
                    ├── reports progress (0..1)
                    ├── on success: status → succeeded, result cached
                    └── on failure: status → failed, error logged
                          │
                          ▼
                    Client polls or receives WebSocket notification
                    Result returned from cache

Job API endpoints:

  • POST /api/analysis/jobs — submit analysis job
  • GET /api/analysis/jobs/:id — poll status + progress
  • POST /api/analysis/jobs/:id/cancel — cancel running/queued job
  • GET /api/analysis/jobs/:id/result — retrieve cached result

Auto-Detection

executionTarget: 'auto' →
  if (stateCount < 50 && !isStiff)      → client (Web Worker)
  if (stateCount < 500)                 → server (sync worker)
  if (stateCount >= 500 || monteCarlo)  → server (queued job)

User sees: "Running locally..." or "Running on server (estimated ~3s)..." or "Queued (position 2)..."
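The auto-detection rule above is a pure function over model statistics; a direct transcription (threshold values taken from the pseudocode, names assumed and subject to profiling):

```typescript
// Sketch: executionTarget: 'auto' resolution. Thresholds are illustrative
// placeholders to be tuned against real benchmark data.

type Target = "client" | "server-sync" | "server-queued";

interface ModelStats { stateCount: number; isStiff: boolean; monteCarlo: boolean }

function pickExecutionTarget(m: ModelStats): Target {
  if (m.monteCarlo || m.stateCount >= 500) return "server-queued"; // BullMQ job
  if (m.stateCount < 50 && !m.isStiff) return "client";            // Web Worker
  return "server-sync";                                            // sync worker
}

pickExecutionTarget({ stateCount: 30, isStiff: false, monteCarlo: false }); // "client"
pickExecutionTarget({ stateCount: 30, isStiff: true, monteCarlo: false });  // "server-sync"
pickExecutionTarget({ stateCount: 800, isStiff: false, monteCarlo: false }); // "server-queued"
```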

Math Libraries

| Need | Library | Purpose |
| --- | --- | --- |
| Matrix operations | mathjs or ml-matrix | Steady-state solving, transient analysis |
| BDD | Custom implementation | Exact fault tree cut set enumeration |
| Graph algorithms | graphology | Reachability, cycle detection, components |
| Distributions | jstat | Weibull, exponential, lognormal |

Engine File Structure

The engine is a shared package used by both the client Web Worker and the server Solver Worker. This ensures identical results regardless of where computation runs.

packages/engine/                     // shared package (no framework dependencies)
├── src/
│   ├── ir/
│   │   ├── schema.ts                // ModelIR TypeScript types + ValueRef
│   │   ├── validate.ts              // IR validation (structural correctness)
│   │   ├── normalize.ts             // Canonicalization: unit conversion,
│   │   │                            // expand k-of-n, resolve ValueRefs,
│   │   │                            // convert to base units
│   │   └── units.ts                 // Unit conversion utilities
│   ├── serializers/
│   │   ├── markov.ts                // React Flow state → ModelIR
│   │   ├── fault-tree.ts
│   │   ├── event-tree.ts
│   │   ├── rbd.ts
│   │   ├── bow-tie.ts
│   │   └── fmea.ts
│   ├── solvers/
│   │   ├── interface.ts             // Shared Solver interface:
│   │   │                            // analyze(ModelIR, options) → AnalyzeResponse
│   │   ├── registry.ts              // Solver registry (method → solver mapping)
│   │   ├── markov/
│   │   │   ├── steady-state.ts
│   │   │   ├── transient.ts
│   │   │   └── mttf.ts
│   │   ├── fta/
│   │   │   ├── cut-sets.ts
│   │   │   ├── probability.ts
│   │   │   └── importance.ts
│   │   ├── eta/
│   │   │   └── outcome-probability.ts
│   │   ├── rbd/
│   │   │   ├── reliability.ts
│   │   │   └── availability.ts
│   │   ├── bow-tie/
│   │   │   └── end-state-frequency.ts
│   │   └── fmea/
│   │       ├── rpn.ts
│   │       └── criticality.ts
│   └── test-harness/
│       ├── golden-models/           // Known models with verified outputs
│       └── cross-check.ts          // Dual-method verification
│
├── package.json                     // standalone package, no React/Fastify deps

packages/frontend/
├── src/engine/
│   └── worker.ts                    // Web Worker: imports from @ramsey/engine

packages/backend/
├── src/worker/
│   ├── solver-worker.ts             // BullMQ worker: imports from @ramsey/engine
│   └── job-queue.ts                 // BullMQ queue setup + job submission

Solver Test Harness

  • Golden models: Library of known small models with published/hand-verified outputs
  • Cross-checking: Compute via two methods where possible (FTA cut sets vs BDD, RBD vs equivalent FTA)
  • Property tests: Probabilities in [0,1], monotonicity under component improvement
  • Numeric diagnostics: Convergence flags, residual norms, truncation error estimates
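The cross-checking idea above can be sketched in a few lines (illustrative, for the simplest case): a parallel RBD's unreliability computed directly must match the same system expressed as an FTA AND gate over component failures.

```typescript
// Sketch of a dual-method cross-check: parallel RBD vs equivalent FTA.
// Both paths compute system unreliability; they must agree to tolerance.

const parallelReliability = (rs: number[]) =>
  1 - rs.reduce((a, r) => a * (1 - r), 1);

// Equivalent FTA: top event = AND of independent component failures.
const andGateFailure = (qs: number[]) => qs.reduce((a, q) => a * q, 1);

function crossCheck(rs: number[]): boolean {
  const viaRbd = 1 - parallelReliability(rs);        // system unreliability
  const viaFta = andGateFailure(rs.map(r => 1 - r)); // same, via the FTA path
  return Math.abs(viaRbd - viaFta) < 1e-12;
}

// Property-style sweep over random component reliabilities.
let ok = true;
for (let trial = 0; trial < 1000; trial++) {
  const rs = Array.from({ length: 3 }, () => Math.random());
  ok = ok && crossCheck(rs);
}
```

The real harness would drive this through the two actual solver implementations rather than inline formulas, so a regression in either path fails the comparison.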

8. AI Assistance

Concept: AI as a Diagram Agent

The AI chat is not just Q&A — it's an agent with tools to read, manipulate, and analyze diagrams.

Capabilities

| Capability | Description |
| --- | --- |
| Natural language → Diagram | "Create a Markov chain for a redundant pump system with repair" |
| Q&A about diagram | "Is this chain irreducible?" / "What's the MTTF?" |
| DSL → Diagram | Paste DSL in chat, AI parses and generates (when DSL module is built) |
| Validation & error checking | "Check my diagram for problems" — finds structural issues, suggests fixes |

Tool System

The AI agent has access to these tools via Vercel AI SDK function calling:

// Diagram manipulation
add_node(type, label, position?, properties?)
add_edge(from, to, label?, properties?)
remove_node(id)
remove_edge(id)
update_node(id, changes)
update_edge(id, changes)
auto_layout()
select_nodes(ids[])
clear_diagram()

// Analysis (calls actual solver, no hallucinated math)
run_steady_state()
run_transient(time)
run_mttf()
run_availability()
run_cut_sets()
run_rpn()
validate_diagram()

// Context
get_diagram_state()
get_selected_nodes()
get_diagram_metadata()

Context Serialization

Before each AI request, the diagram is serialized into a compact text format:

Current diagram (Markov Chain): "Redundant Pump System"
States: S0("Both OK", initial), S1("One failed"), S2("Both failed", absorbing)
Transitions: S0->S1(rate=2*lambda), S1->S2(rate=lambda), S2->S1(rate=mu), S1->S0(rate=mu)
Parameters: lambda=0.001, mu=0.05
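A sketch of the serializer producing this format (data shapes assumed; the real function would dispatch per diagram type): one line per section keeps the context compact and token-cheap.

```typescript
// Sketch (shapes assumed): turn Markov diagram state into the compact
// context format shown above.

interface MarkovState { id: string; label: string; initial?: boolean; absorbing?: boolean }
interface MarkovTransition { from: string; to: string; rate: string }

function serializeMarkovContext(
  title: string,
  states: MarkovState[],
  transitions: MarkovTransition[],
  params: Record<string, number>
): string {
  const stateStr = states
    .map(s => {
      const tags = [s.initial && "initial", s.absorbing && "absorbing"].filter(Boolean);
      return `${s.id}("${s.label}"${tags.length ? ", " + tags.join(", ") : ""})`;
    })
    .join(", ");
  const transStr = transitions
    .map(t => `${t.from}->${t.to}(rate=${t.rate})`)
    .join(", ");
  const paramStr = Object.entries(params).map(([k, v]) => `${k}=${v}`).join(", ");
  return [
    `Current diagram (Markov Chain): "${title}"`,
    `States: ${stateStr}`,
    `Transitions: ${transStr}`,
    `Parameters: ${paramStr}`,
  ].join("\n");
}

const ctx = serializeMarkovContext(
  "Redundant Pump System",
  [
    { id: "S0", label: "Both OK", initial: true },
    { id: "S2", label: "Both failed", absorbing: true },
  ],
  [{ from: "S0", to: "S2", rate: "2*lambda" }],
  { lambda: 0.001, mu: 0.05 }
);
```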

Streaming UX

When the AI generates a diagram:

  1. AI text response streams in the chat panel
  2. Nodes and edges appear on canvas as each tool call resolves
  3. Final auto-layout once generation completes

The user watches the AI "draw" the diagram in real-time.

AI Request Flow

Client                              Server
──────                              ──────
Chat UI                             POST /api/ai/chat
  │                                   │
  │  message + diagram state    ────▶ │  system prompt
  │  + conversation history           │  + diagram context
  │                                   │  + tool definitions
  │                                   │
  │  streaming response + tool  ◄──── │  streams from LLM
  │  calls                            │  executes analysis tools
  │                                   │  server-side
  │  applies diagram changes          │
  │  via Yjs (synced to all           │
  │  collaborators)                   │

API Key Management

  • Platform key: RAMSey provides AI features via a shared key, usage-limited per user/team
  • BYO key: Users can configure their own Claude/OpenAI API key in settings for unlimited use

9. Export Pipeline

Client-Side Exports (instant)

| Format | Method | Notes |
| --- | --- | --- |
| SVG | React Flow toSVG() + style cleanup | Clean vector output, infinite scalability |
| PNG | html-to-image | Configurable DPI (1x, 2x, 4x) |
| JPEG | html-to-image | Configurable DPI, quality setting |

Server-Side Exports

| Format | Method | Notes |
| --- | --- | --- |
| PDF | Puppeteer (headless Chromium) | Exact rendering with vector graphics |
| LaTeX/TikZ | Custom serializer per diagram type | Publication-ready TikZ code |

LaTeX/TikZ Export

Each diagram type has a dedicated TikZ serializer. Example Markov chain output:

\begin{tikzpicture}[->, >=stealth', auto, node distance=3cm,
  thick, every state/.style={circle, draw, minimum size=1.2cm}]

  \node[state]         (S0) {$S_0$};
  \node[state]         (S1) [right of=S0] {$S_1$};
  \node[state]         (S2) [below of=S1] {$S_2$};

  \path (S0) edge [bend left]  node {$\lambda_1$} (S1)
        (S1) edge [bend left]  node {$\mu_1$}     (S0)
        (S1) edge              node {$\lambda_2$} (S2)
        (S2) edge [bend right] node {$\mu_2$}     (S0);

\end{tikzpicture}

Users can paste directly into Overleaf or any LaTeX editor.
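A serializer producing output like the above can be sketched as plain string assembly (types and layout rule are assumptions — absolute coordinates here instead of the relative placement in the example; the real per-type serializers will be richer):

```typescript
// Sketch of a TikZ serializer: nodes at absolute coordinates, edges with
// optional bend. Illustrative only; not the production serializer.

interface TikzNode { id: string; label: string; x: number; y: number }
interface TikzEdge { from: string; to: string; label: string; bend?: "left" | "right" }

function toTikz(nodes: TikzNode[], edges: TikzEdge[]): string {
  const nodeLines = nodes.map(
    n => `  \\node[state] (${n.id}) at (${n.x}, ${n.y}) {$${n.label}$};`
  );
  const edgeLines = edges.map(e => {
    const opt = e.bend ? ` [bend ${e.bend}]` : "";
    return `        (${e.from}) edge${opt} node {$${e.label}$} (${e.to})`;
  });
  return [
    "\\begin{tikzpicture}[->, >=stealth', auto, thick,",
    "  every state/.style={circle, draw, minimum size=1.2cm}]",
    ...nodeLines,
    "  \\path",
    ...edgeLines.map((l, i) => (i === edges.length - 1 ? l + ";" : l)),
    "\\end{tikzpicture}",
  ].join("\n");
}

const tikz = toTikz(
  [{ id: "S0", label: "S_0", x: 0, y: 0 }, { id: "S1", label: "S_1", x: 3, y: 0 }],
  [{ from: "S0", to: "S1", label: "\\lambda_1", bend: "left" }]
);
```

Keeping labels in math mode (`$...$`) is what lets exported diagrams match the serif/math typography goal from the UI/UX section.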

Export Service (Docker Container)

Runs headless Chromium in isolation. Receives diagram state, renders, exports. Keeps the main backend lean.

Export Dialog UI

Export Diagram
─────────────────────────────
Format:      [SVG] [PNG] [JPEG] [PDF] [LaTeX]
Scale:       [1x] [2x] [4x]          (PNG/JPEG only)
Background:  [Transparent] [White]    (PNG/SVG only)
Include:     [x] Labels  [x] Probabilities  [ ] Grid
             [x] Title   [ ] Legend
─────────────────────────────
                          [Export]

10. Real-Time Collaboration

CRDT: Yjs

  • y-websocket handles WebSocket sync, awareness (cursors/presence), and reconnection
  • Offline-first: Edits queue locally and merge on reconnect
  • Sub-documents: Large diagrams can be split into syncable chunks
  • UndoManager: Built-in undo/redo per user session

Awareness Features

  • Cursor positions on canvas — see where collaborators are pointing
  • User colors — each collaborator gets a distinct color
  • Selection highlights — see what others have selected
  • Presence indicators — user avatars in the toolbar showing who's online

Persistence

  • Yjs document state periodically persisted to PostgreSQL (diagrams.yjs_state)
  • Named snapshots saved to diagram_snapshots for version history
  • Auto-save interval: configurable (default: every 30 seconds of inactivity)
  • Manual "Save version" button for named snapshots

Scaling

For multiple backend instances:

  • Redis pub/sub for cross-instance y-websocket message relay
  • Stateless backend containers — all persistent state in PostgreSQL + Redis

11. UI/UX Design Principles

Visual Style

  • Minimalist and professional — diagrams must look publication-ready
  • Thin lines, muted colors, no drop shadows or gradients
  • Serif/math fonts for labels (consistent with LaTeX output)
  • Clean white or transparent backgrounds
  • Color palette: muted blues, grays, with red/amber for warnings/failures

Layout

┌─────────────────────────────────────────────────────────────┐
│  Toolbar: [File] [Edit] [View] [Diagram] [Analysis] [AI]   │
│  ──────────────────────────────────────────────────────────  │
│  ┌──────┐ ┌──────────────────────────┐ ┌────────────────┐   │
│  │      │ │                          │ │                │   │
│  │ Side │ │     Diagram Canvas       │ │   AI Chat /    │   │
│  │ bar  │ │     (React Flow)         │ │   Properties / │   │
│  │      │ │                          │ │   Analysis     │   │
│  │ Node │ │                          │ │   Panel        │   │
│  │ palet│ │                          │ │                │   │
│  │ te + │ │                          │ │   (collapsible │   │
│  │ props│ │                          │ │    tabs)       │   │
│  │      │ │                          │ │                │   │
│  └──────┘ └──────────────────────────┘ └────────────────┘   │
│  Status bar: [Collaborators] [Zoom] [Connection] [Autosave] │
└─────────────────────────────────────────────────────────────┘

Dashboard

┌─────────────────────────────────────────────────────────────┐
│  RAMSey    [My Projects] [Teams] [Templates]    [Profile]   │
│  ──────────────────────────────────────────────────────────  │
│                                                              │
│  Recent Projects                                             │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐                │
│  │ thumbnail │  │ thumbnail │  │    +      │                │
│  │           │  │           │  │  New      │                │
│  │ Project A │  │ Project B │  │  Project  │                │
│  │ 3 diagrams│  │ 1 diagram │  │           │                │
│  │ Team Alpha│  │ Personal  │  │           │                │
│  └───────────┘  └───────────┘  └───────────┘                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

URL Structure

/dashboard                            Personal overview
/project/:projectId                   Project view (list diagrams)
/project/:projectId/d/:diagramId      Diagram editor
/team/:teamSlug                       Team dashboard
/team/:teamSlug/settings              Team management
/invite/:token                        Share link entry point
/templates                            Browse templates
/login                                Auth pages
/register

12. Deployment & Infrastructure

Docker Compose (Development)

services:
  frontend:
    # Vite dev server
    build: ./packages/frontend
    ports: ["5173:5173"]
    volumes: ["./packages/frontend/src:/app/src"]

  backend:
    # Fastify API + y-websocket + Better Auth (orchestrator only, no heavy math)
    build: ./packages/backend
    ports: ["3000:3000"]
    depends_on: [postgres, redis]
    environment:
      DATABASE_URL: postgresql://ramsey:ramsey@postgres:5432/ramsey
      REDIS_URL: redis://redis:6379

  solver-worker:
    # Separate process for analysis computation
    build: ./packages/backend
    command: ["node", "dist/worker/solver-worker.js"]
    depends_on: [postgres, redis]
    environment:
      DATABASE_URL: postgresql://ramsey:ramsey@postgres:5432/ramsey
      REDIS_URL: redis://redis:6379
    # Can be scaled: docker compose up --scale solver-worker=3

  postgres:
    image: postgres:16-alpine
    ports: ["5432:5432"]
    environment:
      POSTGRES_USER: ramsey
      POSTGRES_PASSWORD: ramsey
      POSTGRES_DB: ramsey
    volumes: ["pgdata:/var/lib/postgresql/data"]

  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  export-service:
    # Headless Chromium for PDF export
    build: ./packages/export-service
    ports: ["3001:3001"]

volumes:
  pgdata:

Production Deployment

| Component | Host | Notes |
| --- | --- | --- |
| Frontend (Vite SPA) | Vercel | Static files, global CDN, preview deploys |
| Backend (Fastify + WS) | Fly.io / Railway / VPS | Must support persistent WebSocket connections |
| Solver Worker | Same container host as backend | Separate container, scalable independently |
| PostgreSQL | Managed (Neon, Supabase, RDS) or self-hosted | Managed recommended for production |
| Redis | Managed (Upstash, ElastiCache) or self-hosted | Upstash has a generous free tier |
| Export service | Same container host as backend | Separate container, internal network |

Kubernetes (Future)

Design for it now, deploy on it later:

  • Stateless backend containers (sessions in Redis, state in PostgreSQL)
  • Horizontal pod autoscaler on backend based on WebSocket connection count
  • Horizontal pod autoscaler on solver workers based on job queue depth
  • Separate deployments for API, y-websocket, solver worker, export service

13. CI/CD Pipeline

GitHub Actions Workflow

On Pull Request:
├── Lint (ESLint + Prettier)
├── Type check (tsc --noEmit)
├── Unit tests (Vitest)
├── Build check (vite build)
├── Solver golden model tests
└── Vercel preview deploy (automatic)

On Merge to Main:
├── All PR checks
├── E2E tests (Playwright against preview)
├── Vercel production deploy (frontend)
└── Docker build + push → deploy backend

On Release Tag:
└── Versioned Docker image push to container registry

14. Testing Strategy

Unit Tests (Vitest)

  • Analysis engine: Every solver tested against golden models with known outputs
  • ModelIR serializers: Diagram state → IR conversion correctness
  • Validation rules: Per diagram type
  • Permission resolver: Role resolution logic
  • TikZ serializer: Output matches expected LaTeX

Integration Tests

  • API endpoints: Auth flows, project CRUD, sharing, analysis requests
  • CRDT sync: Multi-client edit merging

E2E Tests (Playwright)

  • Core user flows: Register → create project → create diagram → add nodes → run analysis → export
  • Collaboration: Two browser instances editing simultaneously
  • Share link flow: Generate link → open in incognito → verify access level

Golden Model Test Harness

  • Library of known small models with published or hand-verified results
  • Cross-checking via dual methods (FTA cut sets vs BDD, RBD vs equivalent FTA)
  • Property-based tests: probabilities in [0,1], monotonicity, convergence
  • Numerical diagnostics: residual norms, convergence rates

15. Day-One DevOps & Quality

| Tool | Purpose |
| --- | --- |
| ESLint + Prettier | Code formatting and linting |
| Husky + lint-staged | Pre-commit hooks (lint, type check) |
| Sentry | Error tracking (frontend + backend) |
| Pino (structured logging) | JSON logs with requestId, userId, projectId, diagramId context. Fastify uses Pino natively — adopt from day one |
| GitHub branch protection | Require PR reviews + passing CI for main |
| Conventional Commits | Standardized commit messages |
| Dependabot / Renovate | Automated dependency updates |

16. Phased Implementation Roadmap

Phase 0 — Project Scaffolding

  • Monorepo setup (Turborepo or npm workspaces) with shared @ramsey/engine package
  • Vite + React + TypeScript frontend scaffold
  • Fastify + TypeScript backend scaffold with Pino structured logging
  • Prisma + PostgreSQL schema + migrations (including audit_log, analysis_jobs tables)
  • Docker Compose for local development (frontend, backend, solver-worker, postgres, redis)
  • ESLint, Prettier, Husky, lint-staged
  • GitHub Actions basic CI (lint + type check + build)
  • Better Auth integration (Google + GitHub OAuth)

Phase 1 — Core Drawing (Markov Chains)

  • React Flow canvas with custom Markov chain nodes/edges
  • Node palette sidebar (drag to create)
  • Edge creation (click source → click target)
  • Node/edge property panel (labels, rates, probabilities)
  • Basic validation (probability sums, unreachable states)
  • Auto-layout with ELK.js
  • Undo/redo (Yjs UndoManager)
  • Save/load diagram to PostgreSQL

Phase 2 — Collaboration

  • Yjs + y-websocket integration
  • Multi-user editing on same diagram
  • Cursor/presence awareness (user colors, positions)
  • Conflict-free concurrent edits
  • Offline editing with sync on reconnect
  • Diagram version history (snapshots)

Phase 3 — Projects, Teams, Sharing

  • Dashboard (project list, thumbnails)
  • Project CRUD (create, rename, delete)
  • Multiple diagrams per project
  • Team creation and member management
  • Project sharing (direct + link)
  • Permission enforcement across all endpoints
  • Notifications

Phase 4 — Additional Diagram Types

  • Fault Tree (FTA) — gate nodes, basic events, top-down layout
  • Event Tree (ETA) — horizontal branching
  • Reliability Block Diagram (RBD) — series/parallel/k-of-n blocks
  • Bow-Tie — combined FTA + ETA with central event
  • FMEA — table-based editor with TanStack Table

Phase 5 — Analysis Engine

  • ModelIR schema with ValueRef, dependencies, repair policy, unit config
  • IR validation + normalization (unit conversion, ValueRef resolution, canonicalization)
  • Solver interface (analyze(ModelIR, options) → AnalyzeResponse) with numeric metadata
  • Markov solvers: steady-state, transient, MTTF, availability
  • FTA solvers: minimal cut sets, top event probability, importance measures
  • ETA solver: outcome probabilities, path ranking
  • RBD solver: system reliability, availability
  • Bow-tie solver: end-state frequencies
  • FMEA solver: RPN, criticality
  • Web Worker for client-side computation (shared engine package)
  • BullMQ job queue + Solver Worker (separate process)
  • Job lifecycle: submit, poll, cancel, result retrieval
  • Content-hash caching via Redis
  • Audit logging for analysis runs
  • Golden model test harness
  • Results panel UI with explainability (assumptions, warnings, traces, numeric metadata)

Phase 6 — Export

  • SVG export (React Flow native)
  • PNG / JPEG export (html-to-image, configurable DPI)
  • PDF export (Puppeteer server-side)
  • LaTeX/TikZ export (per diagram type serializer)
  • Export dialog UI

Phase 7 — AI Assistance

  • AI chat panel UI
  • Vercel AI SDK integration with Claude
  • Diagram context serialization
  • Tool definitions for diagram manipulation
  • Natural language → diagram generation
  • Diagram Q&A (context-aware)
  • AI-powered validation and error checking
  • Streaming UX (nodes appear as AI generates)
  • BYO API key support

Phase 8 — Polish & Production

  • Dark mode
  • Keyboard shortcuts
  • Diagram templates (built-in library)
  • Comments/annotations on diagrams
  • Sentry error tracking
  • Performance optimization (large diagrams)
  • Production deployment (Vercel + container host)
  • Documentation / help system

Future Phases

  • DSL (text → diagram) — dedicated syntax per diagram type
  • PRISM model import
  • Monte Carlo / rare-event simulation
  • Scenario management (parameter sets with diffable results)
  • Public REST API for programmatic access
  • i18n (internationalization)
  • Kubernetes deployment
  • SAML/OIDC for enterprise SSO

17. Future Considerations

Items to Revisit

| Item | Notes |
| --- | --- |
| DSL design | Deferred — will be added after diagram drawing works |
| PRISM import | Parse .prism files → ModelIR for migration |
| Monte Carlo simulation | Server-side, queued jobs, for rare-event analysis |
| Enterprise SSO | Generic OIDC / SAML via Better Auth |
| Public API | REST API for programmatic model creation and analysis |
| Mobile support | Desktop-first; responsive sidebar is sufficient |
| Kubernetes | When horizontal scaling is needed |

Architectural Decisions Log

| Decision | Choice | Rationale |
| --- | --- | --- |
| Frontend framework | Vite + React | Fast HMR, modern ESM, TypeScript-first |
| Diagram library | React Flow | Purpose-built for node/edge UIs |
| CRDT | Yjs + y-websocket | Most mature JS CRDT, offline-first, awareness protocol |
| Auth | Better Auth | Framework-agnostic, organization + RBAC plugins, active development |
| Backend framework | Fastify | Fast, TypeScript-native, plugin system |
| ORM | Prisma | Type-safe, auto-migrations, PostgreSQL support |
| State management | Zustand | Lightweight, recommended by React Flow |
| Layout engine | ELK.js | Covers all needed layout algorithms |
| AI SDK | Vercel AI SDK | Provider-agnostic, streaming, tool-use support |
| ID strategy | UUID everywhere | No sequential ID exposure, safe for sharing |
| Team roles | admin / member | Simple, extensible via Better Auth |
| Project roles | owner / editor / viewer | Granular per-project access |
| Team admin ≠ project owner | Explicit policy | Team admins manage membership, not automatic project owners |
| Computation | Hybrid client/server | Snappy UX for small models, server power for large |
| Solver boundary | Separate worker process (BullMQ) | API never computes; solvers replaceable with Rust/Python |
| Job queue | BullMQ (Redis-backed) | Job lifecycle, cancellation, retries, scaling |
| ModelIR values | ValueRef (literal / param / expr) | Enables scenarios, parameter sweeps, shared parameters |
| Unit handling | Normalize to base units | Consistent solver input, display in original units |
| Structured logging | Pino | Native to Fastify, JSON logs with request context |
| Audit trail | audit_log table | Compliance, trust, who-did-what traceability |
| Frontend hosting | Vercel | Static SPA, CDN, preview deploys |
| Backend hosting | Docker on Fly.io / Railway / VPS | WebSocket support, persistent processes |

This document is the single source of truth for RAMSey's architecture and implementation plan. Update it as decisions evolve.