
RAMSey — Master Plan

Version: 1.1
Date: 2026-02-10
Status: Pre-development planning
Changelog: v1.1 — Incorporated review feedback: solver worker boundary, analysis jobs lifecycle, ValueRef in ModelIR, initial state distributions, dependency definitions, audit log, unit enforcement, repair policy schema, numeric metadata in solver output.


Table of Contents

  1. Project Overview
  2. Tech Stack
  3. Architecture
  4. Authentication & Authorization
  5. Data Model & Database Schema
  6. Diagram Types
  7. Analysis Engine
  8. AI Assistance
  9. Export Pipeline
  10. Real-Time Collaboration
  11. UI/UX Design Principles
  12. Deployment & Infrastructure
  13. CI/CD Pipeline
  14. Testing Strategy
  15. Day-One DevOps & Quality
  16. Phased Implementation Roadmap
  17. Future Considerations

1. Project Overview

What is RAMSey?

RAMSey is a modern, web-based, collaborative tool for creating, analyzing, and exporting RAMS (Reliability, Availability, Maintainability, Safety) diagrams. It replaces legacy desktop tools like PRISM's editor with a real-time collaborative environment featuring AI-assisted diagram generation and publication-quality export.

Core Value Propositions

  • Multi-user real-time collaboration — Google Docs-like editing for RAMS diagrams
  • AI copilot — Natural language to diagram, validation, Q&A about models
  • Publication-quality export — LaTeX/TikZ output ready for scientific papers
  • Integrated analysis — Markov solvers, fault tree analysis, importance measures, and more
  • Explainable results — Every computation includes assumptions, warnings, and audit trails
  • Modern UX — Minimalist, professional diagrams; fast, responsive interface

Target Users

  • Reliability engineers
  • Safety analysts
  • Academic researchers
  • Students in RAMS/dependability courses
  • Engineering teams performing system safety assessments

2. Tech Stack

Frontend

Component         Technology                 Purpose
Framework         React 19+                  UI framework
Language          TypeScript (strict mode)   Type safety
Build tool        Vite                       Fast HMR, ESM-based builds
Canvas            React Flow                 Node/edge diagram rendering
State management  Zustand                    Lightweight; recommended by React Flow
UI components     shadcn/ui                  Accessible, customizable component library
Styling           Tailwind CSS               Utility-first CSS, dark mode support
Layout engine     ELK.js                     Auto-layout for all diagram types
Client export     html-to-image              PNG/JPEG export from canvas
Data grid         TanStack Table             FMEA table view

Backend

Component    Technology                      Purpose
Runtime      Node.js                         Server runtime
Framework    Fastify                         HTTP framework (fast, TypeScript-native)
Language     TypeScript (strict mode)        Type safety
ORM          Prisma                          Type-safe database access, migrations
Database     PostgreSQL 16                   Primary data store
Cache        Redis                           Session cache, pub/sub, result caching
Auth         Better Auth                     OAuth, organizations, RBAC
CRDT sync    y-websocket                     Real-time collaboration server
Job queue    BullMQ (Redis-backed)           Analysis job lifecycle, cancellation, retries
PDF export   Puppeteer (headless Chromium)   Server-side PDF generation

AI

Component         Technology                   Purpose
SDK               Vercel AI SDK (ai package)   Provider-agnostic LLM integration
Default provider  Claude (Anthropic)           Primary LLM for AI features
Pattern           Tool-use agent               Diagram manipulation via function calling

Infrastructure

Component         Technology                         Purpose
Frontend hosting  Vercel                             Static SPA, CDN, preview deploys
Backend hosting   Docker on Fly.io / Railway / VPS   Persistent server (WebSockets)
Containers        Docker + Docker Compose            Local dev and production
Orchestration     Kubernetes (future)                Horizontal scaling when needed
CI/CD             GitHub Actions                     Automated testing and deployment
Error tracking    Sentry                             Frontend + backend error monitoring

3. Architecture

High-Level System Diagram

┌──────────────────────────────────────────────────────────────────┐
│                          Client (Browser)                         │
│                                                                   │
│  ┌───────────┐  ┌──────────┐  ┌───────────┐  ┌───────────────┐  │
│  │ React Flow│  │ Zustand  │  │ Yjs Doc   │  │ AI Chat Panel │  │
│  │ (canvas)  │◄─┤ (state)  │◄─┤ (CRDT)    │  │ (Vercel AI)   │  │
│  └─────┬─────┘  └──────────┘  └─────┬─────┘  └───────┬───────┘  │
│        │                             │                 │          │
│  ┌─────┴─────────────────────────────┴─────────────────┴───────┐ │
│  │                Diagram → ModelIR Serializer                  │ │
│  └─────────────────────────┬───────────────────────────────────┘ │
│                            │                                      │
│  ┌─────────────────────────┴───────────────────────────────────┐ │
│  │           Client Analysis Engine (Web Worker)                │ │
│  │    Validation, small computations, sensitivity sliders       │ │
│  └─────────────────────────┬───────────────────────────────────┘ │
│                            │ (large models delegated to server)   │
└────────────────────────────┼─────────────────────────────────────┘
                             │
┌────────────────────────────┼─────────────────────────────────────┐
│                     Server │                                      │
│                                                                   │
│  ┌────────────┐ ┌─────────────┐ ┌────────────┐ ┌─────────────┐  │
│  │ Fastify API│ │ Better Auth │ │ y-websocket│ │ AI endpoint │  │
│  │ (REST)     │ │ (OAuth/RBAC)│ │ (CRDT sync)│ │ (LLM proxy) │  │
│  └─────┬──────┘ └─────────────┘ └────────────┘ └─────────────┘  │
│        │                                                          │
│        │ submits jobs via BullMQ                                  │
│        ▼                                                          │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │         Solver Worker (separate process / container)         │ │
│  │   Large CTMC, BDD-based FTA, batch runs, Monte Carlo        │ │
│  │   Consumes jobs from Redis queue                             │ │
│  │   Designed for future migration to Rust/Python/WASM          │ │
│  └─────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌─────┴──────────────────┐  ┌─────────────────────────────────┐ │
│  │      PostgreSQL        │  │           Redis                 │ │
│  │  users, projects,      │  │  sessions, pub/sub, job queue,  │ │
│  │  diagrams, snapshots,  │  │  analysis cache, rate limiting  │ │
│  │  audit log, jobs       │  │                                 │ │
│  └────────────────────────┘  └─────────────────────────────────┘ │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │         Export Service (Headless Chromium container)         │ │
│  └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘

Key Architecture Principles

  1. Diagram is UI, Model is the product — Visual canvas and computation layer are decoupled via ModelIR
  2. Offline-first collaboration — Yjs CRDTs queue edits and merge on reconnect
  3. Hybrid computation — Small models run client-side (Web Worker), large models run server-side
  4. Solver as replaceable engine — API server never performs heavy math in-process. Solvers run in a separate worker process/container, communicating via job queue (BullMQ/Redis). Interface is designed so solvers can migrate to Rust/Python/WASM without breaking the platform
  5. AI as agent — The LLM has tools to read and manipulate diagrams, not just chat
  6. Plugin-ready diagram types — Each diagram type is a self-contained module with its own nodes, edges, validation, layout, and serializer
  7. Stateless backend containers — Sessions in Redis, CRDT state in PostgreSQL, enabling horizontal scaling
  8. Everything must be reproducible — Every analysis result stores ModelIR schema version, solver version, options, tolerances, and computation timestamp. Researchers can cite exact solver configurations in publications

4. Authentication & Authorization

Auth Library: Better Auth

  • Framework-agnostic, works with Fastify
  • Prisma adapter for PostgreSQL
  • Built-in plugins: organization (teams), rbac (permissions), two-factor (future)
  • Session-based auth with secure httpOnly cookies

OAuth Providers

Provider              Priority  Audience
Google                v1        Universal
GitHub                v1        Developers, researchers
Microsoft / Azure AD  v1        University, enterprise SSO
Generic OIDC          v2        Institutional identity providers
Email / password      v1        Fallback with email verification

Auth Flow (SPA + Separate API)

Browser (Vite SPA)
  │
  ├── Auth pages (login/register) → calls Better Auth endpoints
  │                                  hosted on Fastify server
  ├── OAuth redirect flow         → handled by Better Auth
  │
  └── API calls with session cookie → Fastify validates session
                                      via Better Auth middleware

Team / Organization System

Better Auth's organization plugin manages:

  • Teams (called "organizations" in Better Auth)
  • Members with roles
  • Invitations (by email or invite link)

Team roles:

Role    Permissions
admin   Manage members, delete projects, full control over team resources
member  Create/edit diagrams in team projects, cannot manage team settings

Project-Level Permissions

Separate from team roles — applied per-project:

Role    Permissions
owner   Full control, delete project, manage shares
editor  Create/edit/delete diagrams within the project
viewer  Read-only access to all diagrams in the project

Permission Resolution Order

When a user accesses a project:

1. Is user the project creator?              → owner
2. Has direct project_share entry?           → use that role
3. Is project owned by a team?
   └── Is user a team member?
       ├── team role = admin                 → editor (of project content)
       └── team role = member                → editor
4. Has valid share_link token in URL?        → use link's role
5. None of the above                         → 403 Forbidden

Policy note: Team admin grants team management powers (members, settings, delete team projects) but does NOT automatically grant owner on every project. Project ownership is explicit — only the creator or someone explicitly granted owner via project_shares can delete a project or manage its shares. This prevents team admins from silently becoming owners of all team content.
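The resolution order above can be sketched as a pure function. This is illustrative only: `ProjectContext` and its fields are placeholder shapes standing in for data the real service would load from Postgres, not the actual Prisma types.

```typescript
type ProjectRole = 'owner' | 'editor' | 'viewer'

// Illustrative context shape — the real service would assemble this
// from projects, project_shares, members, and share_links.
interface ProjectContext {
  creatorId: string
  ownerType: 'user' | 'team'
  directShares: Record<string, ProjectRole>        // userId → role from project_shares
  teamMembers: Record<string, 'admin' | 'member'>  // userId → team role
  linkRole?: ProjectRole                           // role carried by a valid share_link token
}

// Returns the effective role, or null (→ 403 Forbidden).
function resolveProjectRole(userId: string, ctx: ProjectContext): ProjectRole | null {
  if (ctx.creatorId === userId) return 'owner'                   // 1. project creator
  if (ctx.directShares[userId]) return ctx.directShares[userId]  // 2. direct share
  if (ctx.ownerType === 'team' && ctx.teamMembers[userId]) {
    return 'editor'  // 3. team admin and member both get editor on content, per the policy note
  }
  if (ctx.linkRole) return ctx.linkRole                          // 4. share link
  return null                                                    // 5. 403
}
```

Note that step 3 deliberately maps team admin to editor, not owner, matching the policy note above.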

Link Sharing (Google Docs-style)

  • Project owners can generate a share link with a role (editor or viewer)
  • Links contain a UUID token: /invite/:token
  • Links can be set to expire or deactivated
  • Opening a share link grants the user the specified role via project_shares

5. Data Model & Database Schema

All IDs are UUIDs. Timestamps use timestamptz.

Better Auth Managed Tables

These are created and managed by Better Auth + plugins:

-- Better Auth core
users (id, email, name, image, emailVerified, createdAt, updatedAt)
sessions (id, userId, token, expiresAt, ...)
accounts (id, userId, provider, providerAccountId, ...)
verifications (id, identifier, value, expiresAt, ...)

-- Better Auth organization plugin
organizations (id, name, slug, logo, metadata, createdAt)
members (id, organizationId, userId, role, createdAt)
invitations (id, organizationId, email, role, status, ...)

Application Tables

-- ─── Projects ───

CREATE TABLE projects (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    owner_type      VARCHAR(20) NOT NULL CHECK (owner_type IN ('user', 'team')),
    owner_id        UUID NOT NULL,  -- references users.id OR organizations.id
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Diagrams ───

CREATE TABLE diagrams (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    name            VARCHAR(255) NOT NULL,
    type            VARCHAR(50) NOT NULL CHECK (type IN (
                        'markov_chain',
                        'fault_tree',
                        'event_tree',
                        'reliability_block_diagram',
                        'bow_tie',
                        'fmea'
                    )),
    yjs_state       BYTEA,              -- persisted Yjs document
    thumbnail       BYTEA,              -- dashboard preview image
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Diagram Snapshots (version history) ───

CREATE TABLE diagram_snapshots (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    yjs_state       BYTEA NOT NULL,
    label           VARCHAR(255),       -- optional named version
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Project Sharing (direct user access) ───

CREATE TABLE project_shares (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role            VARCHAR(20) NOT NULL CHECK (role IN ('owner', 'editor', 'viewer')),
    granted_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(project_id, user_id)
);

-- ─── Share Links (URL-based access) ───

CREATE TABLE share_links (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    token           UUID UNIQUE NOT NULL DEFAULT gen_random_uuid(),
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    role            VARCHAR(20) NOT NULL CHECK (role IN ('editor', 'viewer')),
    created_by      UUID NOT NULL REFERENCES users(id),
    expires_at      TIMESTAMPTZ,        -- NULL = never expires
    is_active       BOOLEAN NOT NULL DEFAULT true,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Comments / Annotations ───

CREATE TABLE comments (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    user_id         UUID NOT NULL REFERENCES users(id),
    node_id         VARCHAR(255),       -- anchored to a specific node (nullable)
    position_x      DOUBLE PRECISION,   -- canvas position (nullable)
    position_y      DOUBLE PRECISION,
    content         TEXT NOT NULL,
    resolved        BOOLEAN NOT NULL DEFAULT false,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Notifications ───

CREATE TABLE notifications (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    type            VARCHAR(50) NOT NULL CHECK (type IN (
                        'project_shared',
                        'team_invite',
                        'comment_added',
                        'comment_resolved',
                        'analysis_complete',
                        'mention'
                    )),
    payload         JSONB NOT NULL,     -- flexible data per notification type
    read            BOOLEAN NOT NULL DEFAULT false,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Diagram Templates ───

CREATE TABLE diagram_templates (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    type            VARCHAR(50) NOT NULL,  -- same enum as diagrams.type
    model_ir        JSONB NOT NULL,        -- template stored as ModelIR
    is_builtin      BOOLEAN NOT NULL DEFAULT false,  -- system vs user-created
    created_by      UUID REFERENCES users(id),       -- NULL for built-in
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Analysis Jobs (lifecycle tracking) ───

CREATE TABLE analysis_jobs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    requested_by    UUID NOT NULL REFERENCES users(id),
    content_hash    VARCHAR(64) NOT NULL,  -- SHA-256 of ModelIR + options
    method          VARCHAR(50) NOT NULL,  -- analysis method name
    options         JSONB NOT NULL,
    status          VARCHAR(20) NOT NULL DEFAULT 'queued'
                        CHECK (status IN ('queued', 'running', 'succeeded', 'failed', 'canceled')),
    progress        DOUBLE PRECISION DEFAULT 0
                        CHECK (progress >= 0 AND progress <= 1),
    priority        INTEGER NOT NULL DEFAULT 0,
    worker_id       VARCHAR(100),          -- identifies which worker picked up the job
    error_message   TEXT,
    error_stack     TEXT,
    queued_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at      TIMESTAMPTZ,
    finished_at     TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- ─── Analysis Results (cached) ───

CREATE TABLE analysis_results (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id          UUID REFERENCES analysis_jobs(id),  -- link to job that produced this
    diagram_id      UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
    content_hash    VARCHAR(64) NOT NULL,  -- SHA-256 of ModelIR + options
    solver_name     VARCHAR(100) NOT NULL,
    solver_version  VARCHAR(20) NOT NULL,
    options         JSONB NOT NULL,
    results         JSONB NOT NULL,
    trace           JSONB NOT NULL,        -- assumptions, warnings, audit
    numeric_metadata JSONB NOT NULL,       -- method, tolerance, iterations, residual norms
    warnings        JSONB,
    error_bounds    JSONB,                 -- { lower, upper } where applicable
    compute_time_ms INTEGER NOT NULL,
    executed_on     VARCHAR(20) NOT NULL CHECK (executed_on IN ('client', 'server')),
    created_by      UUID NOT NULL REFERENCES users(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(diagram_id, content_hash)
);

-- ─── Audit Log ───

CREATE TABLE audit_log (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id         UUID REFERENCES users(id),   -- NULL for system events
    action          VARCHAR(100) NOT NULL,        -- e.g., 'project.create', 'share_link.use'
    object_type     VARCHAR(50) NOT NULL,         -- e.g., 'project', 'diagram', 'team'
    object_id       UUID,
    metadata        JSONB,                        -- action-specific details
    ip_address      INET,
    session_id      UUID,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Audited events:
-- auth.login, auth.logout, auth.oauth_connect
-- project.create, project.delete, project.update
-- diagram.create, diagram.delete
-- share_link.create, share_link.use, share_link.revoke
-- project_share.grant, project_share.revoke
-- team.create, team.member_add, team.member_remove, team.role_change
-- analysis.submit, analysis.complete, analysis.fail
-- export.generate, export.download

-- ─── Indexes ───

CREATE INDEX idx_diagrams_project ON diagrams(project_id);
CREATE INDEX idx_diagrams_type ON diagrams(type);
CREATE INDEX idx_snapshots_diagram ON diagram_snapshots(diagram_id);
CREATE INDEX idx_project_shares_user ON project_shares(user_id);
CREATE INDEX idx_project_shares_project ON project_shares(project_id);
CREATE INDEX idx_share_links_token ON share_links(token);
CREATE INDEX idx_comments_diagram ON comments(diagram_id);
CREATE INDEX idx_notifications_user ON notifications(user_id, read);
CREATE INDEX idx_analysis_results_hash ON analysis_results(diagram_id, content_hash);
CREATE INDEX idx_analysis_jobs_diagram ON analysis_jobs(diagram_id);
CREATE INDEX idx_analysis_jobs_status ON analysis_jobs(status);
CREATE INDEX idx_analysis_jobs_hash ON analysis_jobs(content_hash);
CREATE INDEX idx_audit_log_user ON audit_log(user_id, created_at);
CREATE INDEX idx_audit_log_object ON audit_log(object_type, object_id);
CREATE INDEX idx_audit_log_action ON audit_log(action, created_at);
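The content_hash columns on analysis_jobs and analysis_results are defined as SHA-256 over ModelIR + options. A sketch of how they might be computed deterministically, using Node's built-in crypto; the recursive key sort is an assumed canonicalization step (so semantically equal inputs always hash identically), not a committed spec:

```typescript
import { createHash } from 'node:crypto'

// Canonical JSON: recursively sort object keys so that key order
// never changes the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : 1))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// SHA-256 over ModelIR + options, yielding the 64-char hex string
// stored in the content_hash column.
function contentHash(modelIR: unknown, options: unknown): string {
  return createHash('sha256')
    .update(canonicalize({ modelIR, options }))
    .digest('hex')
}
```

The same hash serves both deduplication (the UNIQUE(diagram_id, content_hash) constraint) and cache lookups before enqueuing a new job.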

6. Diagram Types

Supported Types (v1)

Type UI Model Layout Algorithm
Markov Chain Nodes (states) + edges (transitions) Force-directed (ELK.js)
Fault Tree (FTA) Tree with logic gate nodes Top-down hierarchical (ELK.js)
Event Tree (ETA) Horizontal branching tree Left-to-right (ELK.js)
Reliability Block Diagram (RBD) Blocks in series/parallel/k-of-n Left-to-right flow (ELK.js)
Bow-Tie Fault tree + central event + event tree Symmetric left-center-right (ELK.js)
FMEA Data table (not a canvas diagram) N/A — uses TanStack Table

Plugin Architecture

Each diagram type is a self-contained module:

src/diagram-types/
├── index.ts                    // diagram type registry
├── markov-chain/
│   ├── nodes/                  // custom React Flow node components
│   │   ├── StateNode.tsx       // circle node for states
│   │   └── index.ts
│   ├── edges/                  // custom edge components
│   │   └── TransitionEdge.tsx  // labeled edge with rate/probability
│   ├── toolbar/                // type-specific toolbar items
│   ├── validation.ts           // outgoing probabilities sum to 1, etc.
│   ├── serializer.ts           // React Flow state → ModelIR
│   ├── layout.ts               // ELK.js config for this type
│   ├── tikz-export.ts          // ModelIR → TikZ code
│   └── index.ts                // diagram type definition
├── fault-tree/
│   ├── nodes/
│   │   ├── GateNode.tsx        // AND/OR/k-of-n/NOT gates
│   │   ├── BasicEventNode.tsx
│   │   └── TopEventNode.tsx
│   ├── ...
├── event-tree/
│   ├── ...
├── rbd/
│   ├── ...
├── bow-tie/
│   ├── ...
└── fmea/
    ├── columns.ts              // TanStack Table column definitions
    ├── validation.ts
    ├── serializer.ts
    └── ...

Diagram Type Definition Interface

interface DiagramTypeDefinition {
  id: DiagramType
  name: string
  description: string
  icon: React.ComponentType

  // React Flow configuration
  nodeTypes: Record<string, React.ComponentType>
  edgeTypes: Record<string, React.ComponentType>

  // Default nodes/edges for a new diagram
  defaultContent: () => { nodes: Node[]; edges: Edge[] }

  // Toolbar items specific to this type
  toolbarItems: ToolbarItem[]

  // Validation rules
  validate: (nodes: Node[], edges: Edge[]) => ValidationResult

  // Convert to ModelIR for analysis
  serialize: (nodes: Node[], edges: Edge[]) => ModelIR

  // Auto-layout configuration for ELK.js
  layoutOptions: ElkLayoutOptions

  // TikZ export
  toTikZ: (modelIR: ModelIR) => string
}
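The registry in src/diagram-types/index.ts can then be a simple map keyed by id, so the editor shell stays generic. A sketch with a trimmed-down stand-in for DiagramTypeDefinition; the function names are illustrative:

```typescript
// Trimmed-down stand-in for DiagramTypeDefinition (illustrative).
interface DiagramTypeStub {
  id: string
  name: string
}

const registry = new Map<string, DiagramTypeStub>()

// Fails fast on duplicate registration so plugin modules can
// self-register at import time without silently shadowing each other.
function registerDiagramType(def: DiagramTypeStub): void {
  if (registry.has(def.id)) throw new Error(`duplicate diagram type: ${def.id}`)
  registry.set(def.id, def)
}

function getDiagramType(id: string): DiagramTypeStub {
  const def = registry.get(id)
  if (!def) throw new Error(`unknown diagram type: ${id}`)
  return def
}

registerDiagramType({ id: 'markov_chain', name: 'Markov Chain' })
registerDiagramType({ id: 'fault_tree', name: 'Fault Tree' })
```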

7. Analysis Engine

Core Principle: ModelIR Decouples Diagrams from Solvers

Every diagram type serializes to a shared Intermediate Representation (ModelIR). Solvers consume only ModelIR. This enables:

  • Cross-diagram consistency
  • Fewer duplicated rules
  • New diagram types without rewriting math
  • Cross-type conversions (e.g., RBD ↔ equivalent FTA)

ModelIR Schema

ValueRef — Parameterized Values

All numeric properties in the IR use ValueRef instead of raw numbers. This enables named parameters, scenario management, and expressions:

// ValueRef replaces raw numbers throughout the IR.
// Allows literal values, parameter references, or expressions.
type ValueRef =
  | number                           // literal: 0.001
  | { param: string }                // reference: { param: "lambda_1" }
  | { expr: string }                 // expression: { expr: "2 * lambda_1" }
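Resolving a ValueRef against the model's parameter table might look like the sketch below. Only the trivial "number * param" expression form is handled here; a real build would use a proper expression parser.

```typescript
type ValueRef = number | { param: string } | { expr: string }

// Resolve a ValueRef to a plain number using the Parameter table.
// Expression support is deliberately minimal: "<number> * <param>" only.
function resolveValue(ref: ValueRef, params: Record<string, number>): number {
  if (typeof ref === 'number') return ref               // literal
  if ('param' in ref) {                                  // parameter reference
    if (!(ref.param in params)) throw new Error(`unknown parameter: ${ref.param}`)
    return params[ref.param]
  }
  const m = ref.expr.match(/^(\d+(?:\.\d+)?)\s*\*\s*(\w+)$/)
  if (!m) throw new Error(`unsupported expression: ${ref.expr}`)
  return Number(m[1]) * resolveValue({ param: m[2] }, params)
}
```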

Unit System

All values in the IR carry units. The normalization step converts to base units before solving:

interface UnitConfig {
  timeBase: 'hours' | 'days' | 'years'   // solver operates in this unit
  rateBase: '1/h' | '1/d' | '1/y'       // derived from timeBase
}

  • User-facing values store original units for display
  • Normalization converts to timeBase before any solver runs
  • Solver results are returned in base units, then converted for display
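A minimal sketch of the rate normalization step, assuming a 365-day (8,760-hour) year; the conversion table is illustrative:

```typescript
// Hours per unit of each supported time base (8,760 h/year assumed).
const HOURS: Record<'hours' | 'days' | 'years', number> = {
  hours: 1,
  days: 24,
  years: 8760,
}

// Convert a rate (events per fromUnit) into events per hour,
// e.g. 0.024/d → 0.001/h as in the trace example later in this section.
function rateToPerHour(value: number, fromUnit: 'hours' | 'days' | 'years'): number {
  return value / HOURS[fromUnit]
}

// Convert a base-unit result back for display.
function rateFromPerHour(value: number, toUnit: 'hours' | 'days' | 'years'): number {
  return value * HOURS[toUnit]
}
```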

Core Schema

interface ModelIR {
  version: string                    // schema version for reproducibility
  type: DiagramType
  unitConfig: UnitConfig             // base units for this model
  components: Component[]
  events: Event[]
  gates: Gate[]                      // FTA, Bow-tie
  states: State[]                    // Markov
  transitions: Transition[]          // Markov, ETA
  blocks: Block[]                    // RBD
  barriers: Barrier[]                // Bow-tie
  dependencies: Dependency[]
  parameters: Parameter[]            // named values (lambda, mu, etc.)
  distributions: Distribution[]      // exponential, Weibull, etc.
  initialCondition: InitialCondition // how the model starts
  missionTime?: ValueRef
  repairPolicy?: RepairPolicy
}

interface Component {
  id: string
  name: string
  failureRate?: ValueRef             // ValueRef, not number
  repairRate?: ValueRef              // ValueRef, not number
  distribution?: DistributionRef
  metadata: Record<string, unknown>
}

interface State {
  id: string
  label: string
  type: 'operational' | 'degraded' | 'failed' | 'absorbing'
}

// Initial condition supports single state or distribution across states
type InitialCondition =
  | { type: 'single'; stateId: string }
  | { type: 'distribution'; probabilities: Record<string, number> }

interface Transition {
  id: string
  from: string
  to: string
  rate?: ValueRef                    // ValueRef, not number
  probability?: ValueRef             // ValueRef, not number
  label?: string
  condition?: string
}

interface Gate {
  id: string
  type: 'AND' | 'OR' | 'NOT' | 'K_OF_N' | 'XOR'
  k?: number                         // for K_OF_N
  inputs: string[]                   // child event/gate IDs
  output: string                     // parent event ID
}

interface Parameter {
  name: string                       // e.g., "lambda_1"
  value: number
  unit: string                       // e.g., "1/h" — required, not optional
  description?: string
}

interface Distribution {
  id: string
  type: 'exponential' | 'weibull' | 'lognormal' | 'constant'
  params: Record<string, ValueRef>   // ValueRef, not number
}

Dependency Definitions

Dependencies model correlations and shared-cause relationships between components. Critical for realistic FTA, RBD, and bow-tie analysis:

type Dependency =
  | CommonCauseFailure
  | FunctionalDependency
  | ConditionalProbability
  | InhibitCondition

// Components that share a common failure cause (Beta-factor, MGL, Alpha-factor)
interface CommonCauseFailure {
  type: 'common_cause'
  id: string
  group: string[]                    // component IDs in the CCF group
  model: 'beta_factor' | 'mgl' | 'alpha_factor'
  params: Record<string, ValueRef>   // e.g., { beta: 0.1 }
}

// Component B fails whenever component A fails
interface FunctionalDependency {
  type: 'functional'
  id: string
  source: string                     // component that triggers
  targets: string[]                  // components that depend on it
}

// Probability of B given A (for event trees, dependent branches)
interface ConditionalProbability {
  type: 'conditional'
  id: string
  given: string                      // conditioning event/component
  target: string                     // dependent event/component
  probability: ValueRef
}

// Inhibit gate condition (FTA)
interface InhibitCondition {
  type: 'inhibit'
  id: string
  gate: string                       // gate ID
  condition: string                  // condition event ID
  probability: ValueRef
}
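For example, under the beta-factor model a component's total failure rate λ splits into an independent part (1 − β)λ plus a common-cause rate βλ shared by the whole CCF group. A sketch of that split (plain numbers here for brevity; the real IR values are ValueRefs):

```typescript
// Beta-factor split: total rate λ → independent (1 − β)·λ per component,
// plus a shared common-cause rate β·λ for the CCF group.
function betaFactorSplit(
  lambda: number,
  beta: number,
): { independent: number; commonCause: number } {
  if (beta < 0 || beta > 1) throw new Error('beta must be in [0, 1]')
  return { independent: (1 - beta) * lambda, commonCause: beta * lambda }
}
```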

Repair Policy

Defines how repairs are modeled. Schema supports future complexity; v1 implements unlimited only:

interface RepairPolicy {
  type: 'unlimited' | 'single_repairman' | 'priority_queue' | 'k_repairmen'
  maxSimultaneousRepairs?: number    // for k_repairmen
  priorityOrder?: string[]           // component IDs in repair priority
  preemptive?: boolean               // can a higher-priority repair interrupt?
}
// v1 default: { type: 'unlimited' }
// Schema allows future extension without breaking changes

Analysis Request / Response Contract

interface AnalyzeRequest {
  modelIR: ModelIR
  method: AnalysisMethod
  options: AnalysisOptions
  executionTarget: 'client' | 'server' | 'auto'
}

type AnalysisMethod =
  // Markov
  | 'steady_state'
  | 'transient'
  | 'mttf'
  | 'availability'
  // Fault tree
  | 'minimal_cut_sets'
  | 'top_event_probability'
  | 'importance_measures'
  // Event tree
  | 'outcome_probabilities'
  | 'path_ranking'
  // RBD
  | 'system_reliability'
  | 'system_availability'
  // Bow-tie
  | 'end_state_frequencies'
  | 'barrier_effectiveness'
  // FMEA
  | 'rpn_calculation'
  | 'criticality_analysis'
  // General
  | 'validate'

interface AnalysisOptions {
  tolerance?: number                 // convergence tolerance
  maxIterations?: number
  truncationLimit?: number           // cut set order limit
  timePoints?: number[]              // for transient analysis
  missionTime?: number
  confidenceLevel?: number
  method?: string                    // solver algorithm selection
}

interface AnalyzeResponse {
  status: 'success' | 'warning' | 'error'
  solver: {
    name: string
    version: string
  }
  modelIRVersion: string             // schema version used
  contentHash: string                // SHA-256 of ModelIR + options

  metrics: Record<string, number | number[]>
  contributions?: ContributionTable[]
  cutSets?: CutSet[]
  importanceMeasures?: ImportanceMeasure[]

  // Structured numeric metadata — mandatory for every result
  numericMetadata: {
    method: string                   // e.g., "uniformization", "sparse_lu", "bdd"
    tolerance: number                // convergence tolerance used
    iterations?: number              // iterations to convergence
    residualNorm?: number            // final residual norm
    truncation?: {
      enabled: boolean
      threshold: number              // e.g., 1e-12
      cutSetsDropped?: number        // how many cut sets were truncated
    }
    stiffnessDetected?: boolean      // for CTMC solvers
    methodAutoSelected?: boolean     // was the method chosen automatically?
  }

  trace: {
    assumptions: string[]            // e.g., "component independence assumed"
    normalizations: string[]         // e.g., "rates converted from 1/d to 1/h"
    unitConversions: string[]        // e.g., "lambda_1: 0.024/d → 0.001/h"
    simplifications: string[]        // e.g., "k-of-n gate expanded to OR/AND"
    methodDetails: string            // human-readable explanation of method
  }

  warnings: Warning[]
  errorBounds?: {
    lower: number
    upper: number
    description: string              // e.g., "cut set truncation error bound"
  }
  computeTimeMs: number
  timestamp: string                  // ISO 8601 — when this result was computed
}

interface Warning {
  severity: 'info' | 'warning' | 'error'
  code: string
  message: string
  affectedComponents: string[]
}

Solver Details Per Analysis Type

A. Markov Chain / CTMC

Analysis      Method                               Notes
Steady-state  Solve πQ = 0 with normalization      Sparse linear solvers for large models
Transient     Uniformization (default)             Krylov methods for large sparse systems
MTTF          Absorbing CTMC methods               Solve linear system for expected absorption time
Availability  Steady-state P(operational states)   Sum of operational + degraded state probabilities

Robustness features:

  • Automatic stiffness detection with method selection
  • State-space explosion mitigation: basic lumping, symmetry reduction
  • "Model size / confidence" panel predicting runtime and suggesting simplifications

B. Fault Tree Analysis

| Analysis | Method | Notes |
| --- | --- | --- |
| Minimal cut sets | BDD-based (exact) for moderate size | Approximate/truncated for large trees |
| Top event probability | Exact via BDD when possible | Rare-event approximation as labeled option |
| Importance measures | Birnbaum, Fussell-Vesely, RAW, RRW | Computed from cut set results |

Differentiator: Show error bounds when truncating cut sets, show which cut sets were dropped.
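A hedged sketch of how those labeled approximations relate (data shapes assumed): given minimal cut sets and basic-event probabilities, the rare-event sum always dominates the min-cut upper bound, and reporting both makes the approximation error visible to the user.

```typescript
// Sketch (shapes assumed): top event probability from minimal cut sets.
// Both results are upper bounds for coherent trees with independent
// basic events; their gap is a cheap error indicator.

type CutSet = string[];

function topEventProbability(
  cutSets: CutSet[],
  q: Record<string, number>
): { rareEvent: number; upperBound: number; gap: number } {
  const cutProbs = cutSets.map(cs => cs.reduce((p, e) => p * q[e], 1));
  // Rare-event approximation: sum of cut set probabilities (can exceed 1).
  const rareEvent = cutProbs.reduce((a, b) => a + b, 0);
  // Min-cut upper bound: 1 - prod(1 - P(Ci)).
  const upperBound = 1 - cutProbs.reduce((p, c) => p * (1 - c), 1);
  return { rareEvent, upperBound, gap: rareEvent - upperBound };
}

// Example: top = C1 ∪ C2 with C1 = {A, B}, C2 = {C}.
const result = topEventProbability(
  [["A", "B"], ["C"]],
  { A: 0.01, B: 0.02, C: 0.001 }
);
// rareEvent = 0.01 * 0.02 + 0.001 = 0.0012
```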

C. Event Tree Analysis

| Analysis | Method | Notes |
| --- | --- | --- |
| Outcome probabilities | Conditional probability propagation | Handle dependencies between branches |
| Path ranking | Ranked by probability | Top contributing paths |

Differentiator: "What-if branch probability slider" for fast recomputation.
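The propagation step can be sketched as follows (shapes assumed; real branch points may carry conditional dependencies rather than a single independent failure probability): multiply the initiating-event frequency along each success/failure path and rank the outcomes.

```typescript
// Minimal sketch (assumed data shapes): propagate an initiating-event
// frequency through a chain of independent branch points and rank paths.

interface BranchPoint { name: string; pFail: number }

function outcomeFrequencies(
  initiatingFreq: number,
  branches: BranchPoint[]
): { path: string[]; frequency: number }[] {
  // Enumerate all success/failure combinations (2^n paths).
  let paths = [{ path: [] as string[], frequency: initiatingFreq }];
  for (const b of branches) {
    const next: typeof paths = [];
    for (const p of paths) {
      next.push({ path: [...p.path, `${b.name}:ok`],   frequency: p.frequency * (1 - b.pFail) });
      next.push({ path: [...p.path, `${b.name}:fail`], frequency: p.frequency * b.pFail });
    }
    paths = next;
  }
  return paths.sort((a, b) => b.frequency - a.frequency); // rank by frequency
}

const outcomes = outcomeFrequencies(0.1, [
  { name: "alarm", pFail: 0.01 },
  { name: "sprinkler", pFail: 0.05 },
]);
// Top path: both barriers succeed, frequency = 0.1 * 0.99 * 0.95 = 0.09405
```

Because this is a pure product along each path, the what-if slider only needs to rescale the affected factor, which is what makes fast recomputation feasible.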

D. Reliability Block Diagram

| Analysis | Method | Notes |
| --- | --- | --- |
| System reliability (non-repairable) | Combinatorial (series/parallel/k-of-n) | Support exponential + Weibull distributions |
| System availability (repairable) | CTMC or Markov approximation | Auto-convert to CTMC for repairable k-of-n |

Differentiator: Auto-convert RBD ↔ equivalent FTA with side-by-side comparison.
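The combinatorial path for non-repairable systems reduces to a few closed forms; a sketch under assumed names (exponential components at mission time t, identical blocks for the k-of-n case):

```typescript
// Sketch: combinatorial RBD reliability for non-repairable exponential
// components at mission time t. Names are illustrative.

const expRel = (lambda: number, t: number) => Math.exp(-lambda * t);

const series = (rs: number[]) => rs.reduce((a, r) => a * r, 1);
const parallel = (rs: number[]) => 1 - rs.reduce((a, r) => a * (1 - r), 1);

// k-of-n identical components: sum_{i=k..n} C(n,i) R^i (1-R)^(n-i)
function kOfN(k: number, n: number, r: number): number {
  const choose = (n: number, i: number): number =>
    i === 0 || i === n ? 1 : (choose(n - 1, i - 1) * n) / i;
  let sum = 0;
  for (let i = k; i <= n; i++)
    sum += choose(n, i) * r ** i * (1 - r) ** (n - i);
  return sum;
}

// 2-of-3 pumps, lambda = 1e-4/h, 1000 h mission, in series with a controller.
const rPump = expRel(1e-4, 1000); // ≈ 0.9048
const rSystem = series([kOfN(2, 3, rPump), expRel(1e-5, 1000)]);
```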

E. Bow-Tie

  • Combined IR linking initiating event frequency, preventive barriers, and mitigative barriers
  • Compute end-state frequencies directly with attribution
  • Differentiator: Barrier management dashboard showing risk reduction per barrier

F. FMEA

| Analysis | Method | Notes |
| --- | --- | --- |
| RPN | Severity × Occurrence × Detection | Classic method, explicitly labeled |
| Criticality | MIL-STD-1629 style | Alternative to RPN |
| Custom scoring | Configurable weights | User-defined risk formula |

Differentiator: Connect FMEA items to model components, show how improvements change system-level risk.
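The scoring itself is straightforward; a sketch with assumed field names, matching the classic RPN and the configurable-weights row above:

```typescript
// Sketch (field names assumed): classic RPN plus a configurable
// weighted product for user-defined scoring.

interface FmeaItem { id: string; severity: number; occurrence: number; detection: number }

const rpn = (i: FmeaItem) => i.severity * i.occurrence * i.detection;

// Custom scoring: weighted product so weights shift emphasis between factors.
// With all weights = 1 this degenerates to the classic RPN.
function weightedScore(i: FmeaItem, w = { s: 1, o: 1, d: 1 }): number {
  return i.severity ** w.s * i.occurrence ** w.o * i.detection ** w.d;
}

const items: FmeaItem[] = [
  { id: "seal-leak", severity: 8, occurrence: 3, detection: 4 },
  { id: "bearing-wear", severity: 6, occurrence: 5, detection: 2 },
];
const ranked = [...items].sort((a, b) => rpn(b) - rpn(a));
// rpn(seal-leak) = 96, rpn(bearing-wear) = 60
```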

Computation Architecture

Design Rule: Fastify Never Computes

The API server (Fastify) is strictly an orchestrator. It accepts analysis requests, validates them, submits jobs, and returns results. All heavy math runs in separate processes.

Client-Side (Web Worker)

  • Runs in dedicated Web Worker to keep UI responsive
  • Handles: validation, normalization, small/medium computations
  • Threshold: models with < ~50 states (Markov) or < ~100 basic events (FTA)
  • Used for: sensitivity sliders, instant recalc, FMEA RPN
  • Same analyze(modelIR, options) → result interface as server-side

Server-Side (Solver Worker — Separate Process)

  • Runs as a separate process or container, NOT inside Fastify
  • Communicates via BullMQ (Redis-backed job queue)
  • Handles: large CTMC, exact BDD-based FTA, Monte Carlo (future), batch runs
  • Result caching via Redis: hash(ModelIR + options) → cached result
  • Audit logging for reproducibility and compliance

Solver Worker boundary is designed for future migration:

  • v1: Node.js worker process (same language, separate process)
  • Future: Rust (performance), Python/scipy (numerical libraries), or WASM
  • The interface (AnalyzeRequest → AnalyzeResponse) stays the same regardless of implementation language
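The Redis cache key described above (hash(ModelIR + options) → cached result) needs a canonical serialization so semantically identical requests hit the same entry. A sketch using node:crypto (helper names are assumptions, not the actual implementation):

```typescript
// Sketch: content-hash cache key. Object keys are sorted recursively so
// that key order in the ModelIR JSON does not change the hash.
import { createHash } from "node:crypto";

function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map(k => `${JSON.stringify(k)}:${canonicalize(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function cacheKey(modelIR: object, options: object): string {
  return createHash("sha256")
    .update(canonicalize({ modelIR, options }))
    .digest("hex");
}

// Key order must not matter:
const k1 = cacheKey({ a: 1, b: 2 }, { method: "uniformization" });
const k2 = cacheKey({ b: 2, a: 1 }, { method: "uniformization" });
// k1 === k2
```

Hashing the normalized ModelIR (after unit conversion and ValueRef resolution) rather than the raw diagram state would also let visually different but equivalent models share cache entries.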

Job Lifecycle

Client submits analysis request
    │
    ▼
Fastify API validates request
    │
    ├── Small model? → return to client for Web Worker execution
    │
    └── Large model? → submit to BullMQ job queue
                          │
                          ▼
                    Solver Worker picks up job
                    ├── status: queued → running
                    ├── reports progress (0..1)
                    ├── on success: status → succeeded, result cached
                    └── on failure: status → failed, error logged
                          │
                          ▼
                    Client polls or receives WebSocket notification
                    Result returned from cache

Job API endpoints:

  • POST /api/analysis/jobs — submit analysis job
  • GET /api/analysis/jobs/:id — poll status + progress
  • POST /api/analysis/jobs/:id/cancel — cancel running/queued job
  • GET /api/analysis/jobs/:id/result — retrieve cached result

Auto-Detection

executionTarget: 'auto' →
  if (stateCount < 50 && !isStiff)      → client (Web Worker)
  if (stateCount < 500)                 → server (sync worker)
  if (stateCount >= 500 || monteCarlo)  → server (queued job)

User sees: "Running locally..." or "Running on server (estimated ~3s)..." or "Queued (position 2)..."
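The auto-detection rule above is a pure function over model statistics; a direct transcription (threshold values taken from the pseudocode, names assumed and subject to profiling):

```typescript
// Sketch: executionTarget: 'auto' resolution. Thresholds are illustrative
// placeholders to be tuned against real benchmark data.

type Target = "client" | "server-sync" | "server-queued";

interface ModelStats { stateCount: number; isStiff: boolean; monteCarlo: boolean }

function pickExecutionTarget(m: ModelStats): Target {
  if (m.monteCarlo || m.stateCount >= 500) return "server-queued"; // BullMQ job
  if (m.stateCount < 50 && !m.isStiff) return "client";            // Web Worker
  return "server-sync";                                            // sync worker
}

pickExecutionTarget({ stateCount: 30, isStiff: false, monteCarlo: false }); // "client"
pickExecutionTarget({ stateCount: 30, isStiff: true, monteCarlo: false });  // "server-sync"
pickExecutionTarget({ stateCount: 800, isStiff: false, monteCarlo: false }); // "server-queued"
```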

Math Libraries

| Need | Library | Purpose |
| --- | --- | --- |
| Matrix operations | mathjs or ml-matrix | Steady-state solving, transient analysis |
| BDD | Custom implementation | Exact fault tree cut set enumeration |
| Graph algorithms | graphology | Reachability, cycle detection, components |
| Distributions | jstat | Weibull, exponential, lognormal |

Engine File Structure

The engine is a shared package used by both the client Web Worker and the server Solver Worker. This ensures identical results regardless of where computation runs.

packages/engine/                     // shared package (no framework dependencies)
├── src/
│   ├── ir/
│   │   ├── schema.ts                // ModelIR TypeScript types + ValueRef
│   │   ├── validate.ts              // IR validation (structural correctness)
│   │   ├── normalize.ts             // Canonicalization: unit conversion,
│   │   │                            // expand k-of-n, resolve ValueRefs,
│   │   │                            // convert to base units
│   │   └── units.ts                 // Unit conversion utilities
│   ├── serializers/
│   │   ├── markov.ts                // React Flow state → ModelIR
│   │   ├── fault-tree.ts
│   │   ├── event-tree.ts
│   │   ├── rbd.ts
│   │   ├── bow-tie.ts
│   │   └── fmea.ts
│   ├── solvers/
│   │   ├── interface.ts             // Shared Solver interface:
│   │   │                            // analyze(ModelIR, options) → AnalyzeResponse
│   │   ├── registry.ts              // Solver registry (method → solver mapping)
│   │   ├── markov/
│   │   │   ├── steady-state.ts
│   │   │   ├── transient.ts
│   │   │   └── mttf.ts
│   │   ├── fta/
│   │   │   ├── cut-sets.ts
│   │   │   ├── probability.ts
│   │   │   └── importance.ts
│   │   ├── eta/
│   │   │   └── outcome-probability.ts
│   │   ├── rbd/
│   │   │   ├── reliability.ts
│   │   │   └── availability.ts
│   │   ├── bow-tie/
│   │   │   └── end-state-frequency.ts
│   │   └── fmea/
│   │       ├── rpn.ts
│   │       └── criticality.ts
│   └── test-harness/
│       ├── golden-models/           // Known models with verified outputs
│       └── cross-check.ts          // Dual-method verification
│
├── package.json                     // standalone package, no React/Fastify deps

packages/frontend/
├── src/engine/
│   └── worker.ts                    // Web Worker: imports from @ramsey/engine

packages/backend/
├── src/worker/
│   ├── solver-worker.ts             // BullMQ worker: imports from @ramsey/engine
│   └── job-queue.ts                 // BullMQ queue setup + job submission

Solver Test Harness

  • Golden models: Library of known small models with published/hand-verified outputs
  • Cross-checking: Compute via two methods where possible (FTA cut sets vs BDD, RBD vs equivalent FTA)
  • Property tests: Probabilities in [0,1], monotonicity under component improvement
  • Numeric diagnostics: Convergence flags, residual norms, truncation error estimates
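The cross-checking idea above can be sketched in a few lines (illustrative, for the simplest case): a parallel RBD's unreliability computed directly must match the same system expressed as an FTA AND gate over component failures.

```typescript
// Sketch of a dual-method cross-check: parallel RBD vs equivalent FTA.
// Both paths compute system unreliability; they must agree to tolerance.

const parallelReliability = (rs: number[]) =>
  1 - rs.reduce((a, r) => a * (1 - r), 1);

// Equivalent FTA: top event = AND of independent component failures.
const andGateFailure = (qs: number[]) => qs.reduce((a, q) => a * q, 1);

function crossCheck(rs: number[]): boolean {
  const viaRbd = 1 - parallelReliability(rs);        // system unreliability
  const viaFta = andGateFailure(rs.map(r => 1 - r)); // same, via the FTA path
  return Math.abs(viaRbd - viaFta) < 1e-12;
}

// Property-style sweep over random component reliabilities.
let ok = true;
for (let trial = 0; trial < 1000; trial++) {
  const rs = Array.from({ length: 3 }, () => Math.random());
  ok = ok && crossCheck(rs);
}
```

The real harness would drive this through the two actual solver implementations rather than inline formulas, so a regression in either path fails the comparison.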

8. AI Assistance

Concept: AI as a Diagram Agent

The AI chat is not just Q&A — it's an agent with tools to read, manipulate, and analyze diagrams.

Capabilities

| Capability | Description |
| --- | --- |
| Natural language → Diagram | "Create a Markov chain for a redundant pump system with repair" |
| Q&A about diagram | "Is this chain irreducible?" / "What's the MTTF?" |
| DSL → Diagram | Paste DSL in chat, AI parses and generates (when DSL module is built) |
| Validation & error checking | "Check my diagram for problems" — finds structural issues, suggests fixes |

Tool System

The AI agent has access to these tools via Vercel AI SDK function calling:

// Diagram manipulation
add_node(type, label, position?, properties?)
add_edge(from, to, label?, properties?)
remove_node(id)
remove_edge(id)
update_node(id, changes)
update_edge(id, changes)
auto_layout()
select_nodes(ids[])
clear_diagram()

// Analysis (calls actual solver, no hallucinated math)
run_steady_state()
run_transient(time)
run_mttf()
run_availability()
run_cut_sets()
run_rpn()
validate_diagram()

// Context
get_diagram_state()
get_selected_nodes()
get_diagram_metadata()

Context Serialization

Before each AI request, the diagram is serialized into a compact text format:

Current diagram (Markov Chain): "Redundant Pump System"
States: S0("Both OK", initial), S1("One failed"), S2("Both failed", absorbing)
Transitions: S0->S1(rate=2*lambda), S1->S2(rate=lambda), S2->S1(rate=mu), S1->S0(rate=mu)
Parameters: lambda=0.001, mu=0.05
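A sketch of the serializer producing this format (data shapes assumed; the real function would dispatch per diagram type): one line per section keeps the context compact and token-cheap.

```typescript
// Sketch (shapes assumed): turn Markov diagram state into the compact
// context format shown above.

interface MarkovState { id: string; label: string; initial?: boolean; absorbing?: boolean }
interface MarkovTransition { from: string; to: string; rate: string }

function serializeMarkovContext(
  title: string,
  states: MarkovState[],
  transitions: MarkovTransition[],
  params: Record<string, number>
): string {
  const stateStr = states
    .map(s => {
      const tags = [s.initial && "initial", s.absorbing && "absorbing"].filter(Boolean);
      return `${s.id}("${s.label}"${tags.length ? ", " + tags.join(", ") : ""})`;
    })
    .join(", ");
  const transStr = transitions
    .map(t => `${t.from}->${t.to}(rate=${t.rate})`)
    .join(", ");
  const paramStr = Object.entries(params).map(([k, v]) => `${k}=${v}`).join(", ");
  return [
    `Current diagram (Markov Chain): "${title}"`,
    `States: ${stateStr}`,
    `Transitions: ${transStr}`,
    `Parameters: ${paramStr}`,
  ].join("\n");
}

const ctx = serializeMarkovContext(
  "Redundant Pump System",
  [
    { id: "S0", label: "Both OK", initial: true },
    { id: "S2", label: "Both failed", absorbing: true },
  ],
  [{ from: "S0", to: "S2", rate: "2*lambda" }],
  { lambda: 0.001, mu: 0.05 }
);
```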

Streaming UX

When the AI generates a diagram:

  1. AI text response streams in the chat panel
  2. Nodes and edges appear on canvas as each tool call resolves
  3. Final auto-layout once generation completes

The user watches the AI "draw" the diagram in real-time.

AI Request Flow

Client                              Server
──────                              ──────
Chat UI                             POST /api/ai/chat
  │                                   │
  │  message + diagram state    ────▶ │  system prompt
  │  + conversation history           │  + diagram context
  │                                   │  + tool definitions
  │                                   │
  │  streaming response + tool  ◄──── │  streams from LLM
  │  calls                            │  executes analysis tools
  │                                   │  server-side
  │  applies diagram changes          │
  │  via Yjs (synced to all           │
  │  collaborators)                   │

API Key Management

  • Platform key: RAMSey provides AI features via a shared key, usage-limited per user/team
  • BYO key: Users can configure their own Claude/OpenAI API key in settings for unlimited use

9. Export Pipeline

Client-Side Exports (instant)

| Format | Method | Notes |
| --- | --- | --- |
| SVG | React Flow toSVG() + style cleanup | Clean vector output, infinite scalability |
| PNG | html-to-image | Configurable DPI (1x, 2x, 4x) |
| JPEG | html-to-image | Configurable DPI, quality setting |

Server-Side Exports

| Format | Method | Notes |
| --- | --- | --- |
| PDF | Puppeteer (headless Chromium) | Exact rendering with vector graphics |
| LaTeX/TikZ | Custom serializer per diagram type | Publication-ready TikZ code |

LaTeX/TikZ Export

Each diagram type has a dedicated TikZ serializer. Example Markov chain output:

\begin{tikzpicture}[->, >=stealth', auto, node distance=3cm,
  thick, every state/.style={circle, draw, minimum size=1.2cm}]

  \node[state]         (S0) {$S_0$};
  \node[state]         (S1) [right of=S0] {$S_1$};
  \node[state]         (S2) [below of=S1] {$S_2$};

  \path (S0) edge [bend left]  node {$\lambda_1$} (S1)
        (S1) edge [bend left]  node {$\mu_1$}     (S0)
        (S1) edge              node {$\lambda_2$} (S2)
        (S2) edge [bend right] node {$\mu_2$}     (S0);

\end{tikzpicture}

Users can paste directly into Overleaf or any LaTeX editor.
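A serializer producing output like the above can be sketched as plain string assembly (types and layout rule are assumptions — absolute coordinates here instead of the relative placement in the example; the real per-type serializers will be richer):

```typescript
// Sketch of a TikZ serializer: nodes at absolute coordinates, edges with
// optional bend. Illustrative only; not the production serializer.

interface TikzNode { id: string; label: string; x: number; y: number }
interface TikzEdge { from: string; to: string; label: string; bend?: "left" | "right" }

function toTikz(nodes: TikzNode[], edges: TikzEdge[]): string {
  const nodeLines = nodes.map(
    n => `  \\node[state] (${n.id}) at (${n.x}, ${n.y}) {$${n.label}$};`
  );
  const edgeLines = edges.map(e => {
    const opt = e.bend ? ` [bend ${e.bend}]` : "";
    return `        (${e.from}) edge${opt} node {$${e.label}$} (${e.to})`;
  });
  return [
    "\\begin{tikzpicture}[->, >=stealth', auto, thick,",
    "  every state/.style={circle, draw, minimum size=1.2cm}]",
    ...nodeLines,
    "  \\path",
    ...edgeLines.map((l, i) => (i === edges.length - 1 ? l + ";" : l)),
    "\\end{tikzpicture}",
  ].join("\n");
}

const tikz = toTikz(
  [{ id: "S0", label: "S_0", x: 0, y: 0 }, { id: "S1", label: "S_1", x: 3, y: 0 }],
  [{ from: "S0", to: "S1", label: "\\lambda_1", bend: "left" }]
);
```

Keeping labels in math mode (`$...$`) is what lets exported diagrams match the serif/math typography goal from the UI/UX section.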

Export Service (Docker Container)

Runs headless Chromium in isolation. Receives diagram state, renders, exports. Keeps the main backend lean.

Export Dialog UI

Export Diagram
─────────────────────────────
Format:      [SVG] [PNG] [JPEG] [PDF] [LaTeX]
Scale:       [1x] [2x] [4x]          (PNG/JPEG only)
Background:  [Transparent] [White]    (PNG/SVG only)
Include:     [x] Labels  [x] Probabilities  [ ] Grid
             [x] Title   [ ] Legend
─────────────────────────────
                          [Export]

10. Real-Time Collaboration

CRDT: Yjs

  • y-websocket handles WebSocket sync, awareness (cursors/presence), and reconnection
  • Offline-first: Edits queue locally and merge on reconnect
  • Sub-documents: Large diagrams can be split into syncable chunks
  • UndoManager: Built-in undo/redo per user session

Awareness Features

  • Cursor positions on canvas — see where collaborators are pointing
  • User colors — each collaborator gets a distinct color
  • Selection highlights — see what others have selected
  • Presence indicators — user avatars in the toolbar showing who's online

Persistence

  • Yjs document state periodically persisted to PostgreSQL (diagrams.yjs_state)
  • Named snapshots saved to diagram_snapshots for version history
  • Auto-save interval: configurable (default: every 30 seconds of inactivity)
  • Manual "Save version" button for named snapshots

Scaling

For multiple backend instances:

  • Redis pub/sub for cross-instance y-websocket message relay
  • Stateless backend containers — all persistent state in PostgreSQL + Redis

11. UI/UX Design Principles

Visual Style

  • Minimalist and professional — diagrams must look publication-ready
  • Thin lines, muted colors, no drop shadows or gradients
  • Serif/math fonts for labels (consistent with LaTeX output)
  • Clean white or transparent backgrounds
  • Color palette: muted blues, grays, with red/amber for warnings/failures

Layout

┌─────────────────────────────────────────────────────────────┐
│  Toolbar: [File] [Edit] [View] [Diagram] [Analysis] [AI]   │
│  ──────────────────────────────────────────────────────────  │
│  ┌──────┐ ┌──────────────────────────┐ ┌────────────────┐   │
│  │      │ │                          │ │                │   │
│  │ Side │ │     Diagram Canvas       │ │   AI Chat /    │   │
│  │ bar  │ │     (React Flow)         │ │   Properties / │   │
│  │      │ │                          │ │   Analysis     │   │
│  │ Node │ │                          │ │   Panel        │   │
│  │ palet│ │                          │ │                │   │
│  │ te + │ │                          │ │   (collapsible │   │
│  │ props│ │                          │ │    tabs)       │   │
│  │      │ │                          │ │                │   │
│  └──────┘ └──────────────────────────┘ └────────────────┘   │
│  Status bar: [Collaborators] [Zoom] [Connection] [Autosave] │
└─────────────────────────────────────────────────────────────┘

Dashboard

┌─────────────────────────────────────────────────────────────┐
│  RAMSey    [My Projects] [Teams] [Templates]    [Profile]   │
│  ──────────────────────────────────────────────────────────  │
│                                                              │
│  Recent Projects                                             │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐                │
│  │ thumbnail │  │ thumbnail │  │    +      │                │
│  │           │  │           │  │  New      │                │
│  │ Project A │  │ Project B │  │  Project  │                │
│  │ 3 diagrams│  │ 1 diagram │  │           │                │
│  │ Team Alpha│  │ Personal  │  │           │                │
│  └───────────┘  └───────────┘  └───────────┘                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

URL Structure

/dashboard                            Personal overview
/project/:projectId                   Project view (list diagrams)
/project/:projectId/d/:diagramId      Diagram editor
/team/:teamSlug                       Team dashboard
/team/:teamSlug/settings              Team management
/invite/:token                        Share link entry point
/templates                            Browse templates
/login                                Auth pages
/register

12. Deployment & Infrastructure

Docker Compose (Development)

services:
  frontend:
    # Vite dev server
    build: ./packages/frontend
    ports: ["5173:5173"]
    volumes: ["./packages/frontend/src:/app/src"]

  backend:
    # Fastify API + y-websocket + Better Auth (orchestrator only, no heavy math)
    build: ./packages/backend
    ports: ["3000:3000"]
    depends_on: [postgres, redis]
    environment:
      DATABASE_URL: postgresql://ramsey:ramsey@postgres:5432/ramsey
      REDIS_URL: redis://redis:6379

  solver-worker:
    # Separate process for analysis computation
    build: ./packages/backend
    command: ["node", "dist/worker/solver-worker.js"]
    depends_on: [postgres, redis]
    environment:
      DATABASE_URL: postgresql://ramsey:ramsey@postgres:5432/ramsey
      REDIS_URL: redis://redis:6379
    # Can be scaled: docker compose up --scale solver-worker=3

  postgres:
    image: postgres:16-alpine
    ports: ["5432:5432"]
    environment:
      POSTGRES_USER: ramsey
      POSTGRES_PASSWORD: ramsey
      POSTGRES_DB: ramsey
    volumes: ["pgdata:/var/lib/postgresql/data"]

  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]

  export-service:
    # Headless Chromium for PDF export
    build: ./packages/export-service
    ports: ["3001:3001"]

volumes:
  pgdata:

Production Deployment

| Component | Host | Notes |
| --- | --- | --- |
| Frontend (Vite SPA) | Vercel | Static files, global CDN, preview deploys |
| Backend (Fastify + WS) | Fly.io / Railway / VPS | Must support persistent WebSocket connections |
| Solver Worker | Same container host as backend | Separate container, scalable independently |
| PostgreSQL | Managed (Neon, Supabase, RDS) or self-hosted | Managed recommended for production |
| Redis | Managed (Upstash, ElastiCache) or self-hosted | Upstash has a generous free tier |
| Export service | Same container host as backend | Separate container, internal network |

Kubernetes (Future)

Design for it now, deploy on it later:

  • Stateless backend containers (sessions in Redis, state in PostgreSQL)
  • Horizontal pod autoscaler on backend based on WebSocket connection count
  • Horizontal pod autoscaler on solver workers based on job queue depth
  • Separate deployments for API, y-websocket, solver worker, export service

13. CI/CD Pipeline

GitHub Actions Workflow

On Pull Request:
├── Lint (ESLint + Prettier)
├── Type check (tsc --noEmit)
├── Unit tests (Vitest)
├── Build check (vite build)
├── Solver golden model tests
└── Vercel preview deploy (automatic)

On Merge to Main:
├── All PR checks
├── E2E tests (Playwright against preview)
├── Vercel production deploy (frontend)
└── Docker build + push → deploy backend

On Release Tag:
└── Versioned Docker image push to container registry

14. Testing Strategy

Unit Tests (Vitest)

  • Analysis engine: Every solver tested against golden models with known outputs
  • ModelIR serializers: Diagram state → IR conversion correctness
  • Validation rules: Per diagram type
  • Permission resolver: Role resolution logic
  • TikZ serializer: Output matches expected LaTeX

Integration Tests

  • API endpoints: Auth flows, project CRUD, sharing, analysis requests
  • CRDT sync: Multi-client edit merging

E2E Tests (Playwright)

  • Core user flows: Register → create project → create diagram → add nodes → run analysis → export
  • Collaboration: Two browser instances editing simultaneously
  • Share link flow: Generate link → open in incognito → verify access level

Golden Model Test Harness

  • Library of known small models with published or hand-verified results
  • Cross-checking via dual methods (FTA cut sets vs BDD, RBD vs equivalent FTA)
  • Property-based tests: probabilities in [0,1], monotonicity, convergence
  • Numerical diagnostics: residual norms, convergence rates

15. Day-One DevOps & Quality

| Tool | Purpose |
| --- | --- |
| ESLint + Prettier | Code formatting and linting |
| Husky + lint-staged | Pre-commit hooks (lint, type check) |
| Sentry | Error tracking (frontend + backend) |
| Pino (structured logging) | JSON logs with requestId, userId, projectId, diagramId context. Fastify uses Pino natively — adopt from day one |
| GitHub branch protection | Require PR reviews + passing CI for main |
| Conventional Commits | Standardized commit messages |
| Dependabot / Renovate | Automated dependency updates |

16. Phased Implementation Roadmap

Phase 0 — Project Scaffolding

  • Monorepo setup (Turborepo or npm workspaces) with shared @ramsey/engine package
  • Vite + React + TypeScript frontend scaffold
  • Fastify + TypeScript backend scaffold with Pino structured logging
  • Prisma + PostgreSQL schema + migrations (including audit_log, analysis_jobs tables)
  • Docker Compose for local development (frontend, backend, solver-worker, postgres, redis)
  • ESLint, Prettier, Husky, lint-staged
  • GitHub Actions basic CI (lint + type check + build)
  • Better Auth integration (Google + GitHub OAuth)

Phase 1 — Core Drawing (Markov Chains)

  • React Flow canvas with custom Markov chain nodes/edges
  • Node palette sidebar (drag to create)
  • Edge creation (click source → click target)
  • Node/edge property panel (labels, rates, probabilities)
  • Basic validation (probability sums, unreachable states)
  • Auto-layout with ELK.js
  • Undo/redo (Yjs UndoManager)
  • Save/load diagram to PostgreSQL

Phase 2 — Collaboration

  • Yjs + y-websocket integration
  • Multi-user editing on same diagram
  • Cursor/presence awareness (user colors, positions)
  • Conflict-free concurrent edits
  • Offline editing with sync on reconnect
  • Diagram version history (snapshots)

Phase 3 — Projects, Teams, Sharing

  • Dashboard (project list, thumbnails)
  • Project CRUD (create, rename, delete)
  • Multiple diagrams per project
  • Team creation and member management
  • Project sharing (direct + link)
  • Permission enforcement across all endpoints
  • Notifications

Phase 4 — Additional Diagram Types

  • Fault Tree (FTA) — gate nodes, basic events, top-down layout
  • Event Tree (ETA) — horizontal branching
  • Reliability Block Diagram (RBD) — series/parallel/k-of-n blocks
  • Bow-Tie — combined FTA + ETA with central event
  • FMEA — table-based editor with TanStack Table

Phase 5 — Analysis Engine

  • ModelIR schema with ValueRef, dependencies, repair policy, unit config
  • IR validation + normalization (unit conversion, ValueRef resolution, canonicalization)
  • Solver interface (analyze(ModelIR, options) → AnalyzeResponse) with numeric metadata
  • Markov solvers: steady-state, transient, MTTF, availability
  • FTA solvers: minimal cut sets, top event probability, importance measures
  • ETA solver: outcome probabilities, path ranking
  • RBD solver: system reliability, availability
  • Bow-tie solver: end-state frequencies
  • FMEA solver: RPN, criticality
  • Web Worker for client-side computation (shared engine package)
  • BullMQ job queue + Solver Worker (separate process)
  • Job lifecycle: submit, poll, cancel, result retrieval
  • Content-hash caching via Redis
  • Audit logging for analysis runs
  • Golden model test harness
  • Results panel UI with explainability (assumptions, warnings, traces, numeric metadata)

Phase 6 — Export

  • SVG export (React Flow native)
  • PNG / JPEG export (html-to-image, configurable DPI)
  • PDF export (Puppeteer server-side)
  • LaTeX/TikZ export (per diagram type serializer)
  • Export dialog UI

Phase 7 — AI Assistance

  • AI chat panel UI
  • Vercel AI SDK integration with Claude
  • Diagram context serialization
  • Tool definitions for diagram manipulation
  • Natural language → diagram generation
  • Diagram Q&A (context-aware)
  • AI-powered validation and error checking
  • Streaming UX (nodes appear as AI generates)
  • BYO API key support

Phase 8 — Polish & Production

  • Dark mode
  • Keyboard shortcuts
  • Diagram templates (built-in library)
  • Comments/annotations on diagrams
  • Sentry error tracking
  • Performance optimization (large diagrams)
  • Production deployment (Vercel + container host)
  • Documentation / help system

Future Phases

  • DSL (text → diagram) — dedicated syntax per diagram type
  • PRISM model import
  • Monte Carlo / rare-event simulation
  • Scenario management (parameter sets with diffable results)
  • Public REST API for programmatic access
  • i18n (internationalization)
  • Kubernetes deployment
  • SAML/OIDC for enterprise SSO

17. Future Considerations

Items to Revisit

| Item | Notes |
| --- | --- |
| DSL design | Deferred — will be added after diagram drawing works |
| PRISM import | Parse .prism files → ModelIR for migration |
| Monte Carlo simulation | Server-side, queued jobs, for rare-event analysis |
| Enterprise SSO | Generic OIDC / SAML via Better Auth |
| Public API | REST API for programmatic model creation and analysis |
| Mobile support | Desktop-first; responsive sidebar is sufficient |
| Kubernetes | When horizontal scaling is needed |

Architectural Decisions Log

| Decision | Choice | Rationale |
| --- | --- | --- |
| Frontend framework | Vite + React | Fast HMR, modern ESM, TypeScript-first |
| Diagram library | React Flow | Purpose-built for node/edge UIs |
| CRDT | Yjs + y-websocket | Most mature JS CRDT, offline-first, awareness protocol |
| Auth | Better Auth | Framework-agnostic, organization + RBAC plugins, active development |
| Backend framework | Fastify | Fast, TypeScript-native, plugin system |
| ORM | Prisma | Type-safe, auto-migrations, PostgreSQL support |
| State management | Zustand | Lightweight, recommended by React Flow |
| Layout engine | ELK.js | Covers all needed layout algorithms |
| AI SDK | Vercel AI SDK | Provider-agnostic, streaming, tool-use support |
| ID strategy | UUID everywhere | No sequential ID exposure, safe for sharing |
| Team roles | admin / member | Simple, extensible via Better Auth |
| Project roles | owner / editor / viewer | Granular per-project access |
| Team admin ≠ project owner | Explicit policy | Team admins manage membership, not automatic project owners |
| Computation | Hybrid client/server | Snappy UX for small models, server power for large |
| Solver boundary | Separate worker process (BullMQ) | API never computes; solvers replaceable with Rust/Python |
| Job queue | BullMQ (Redis-backed) | Job lifecycle, cancellation, retries, scaling |
| ModelIR values | ValueRef (literal / param / expr) | Enables scenarios, parameter sweeps, shared parameters |
| Unit handling | Normalize to base units | Consistent solver input, display in original units |
| Structured logging | Pino | Native to Fastify, JSON logs with request context |
| Audit trail | audit_log table | Compliance, trust, who-did-what traceability |
| Frontend hosting | Vercel | Static SPA, CDN, preview deploys |
| Backend hosting | Docker on Fly.io / Railway / VPS | WebSocket support, persistent processes |

This document is the single source of truth for RAMSey's architecture and implementation plan. Update it as decisions evolve.