Version: 1.1
Date: 2026-02-10
Status: Pre-development planning
Changelog: v1.1 — Incorporated review feedback: solver worker boundary, analysis jobs lifecycle, ValueRef in ModelIR, initial state distributions, dependency definitions, audit log, unit enforcement, repair policy schema, numeric metadata in solver output.
- Project Overview
- Tech Stack
- Architecture
- Authentication & Authorization
- Data Model & Database Schema
- Diagram Types
- Analysis Engine
- AI Assistance
- Export Pipeline
- Real-Time Collaboration
- UI/UX Design Principles
- Deployment & Infrastructure
- CI/CD Pipeline
- Testing Strategy
- Day-One DevOps & Quality
- Phased Implementation Roadmap
- Future Considerations
RAMSey is a modern, web-based, collaborative tool for creating, analyzing, and exporting RAMS (Reliability, Availability, Maintainability, Safety) diagrams. It replaces legacy desktop tools like PRISM's editor with a real-time collaborative environment featuring AI-assisted diagram generation and publication-quality export.
- Multi-user real-time collaboration — Google Docs-like editing for RAMS diagrams
- AI copilot — Natural language to diagram, validation, Q&A about models
- Publication-quality export — LaTeX/TikZ output ready for scientific papers
- Integrated analysis — Markov solvers, fault tree analysis, importance measures, and more
- Explainable results — Every computation includes assumptions, warnings, and audit trails
- Modern UX — Minimalist, professional diagrams; fast, responsive interface
- Reliability engineers
- Safety analysts
- Academic researchers
- Students in RAMS/dependability courses
- Engineering teams performing system safety assessments
| Component | Technology | Purpose |
|---|---|---|
| Framework | React 19+ | UI framework |
| Language | TypeScript (strict mode) | Type safety |
| Build tool | Vite | Fast HMR, ESM-based builds |
| Canvas | React Flow | Node/edge diagram rendering |
| State management | Zustand | Lightweight; recommended by the React Flow docs |
| UI components | shadcn/ui | Accessible, customizable component library |
| Styling | Tailwind CSS | Utility-first CSS, dark mode support |
| Layout engine | ELK.js | Auto-layout for all diagram types |
| Client export | html-to-image | PNG/JPEG export from canvas |
| Data grid | TanStack Table | FMEA table view |
| Component | Technology | Purpose |
|---|---|---|
| Runtime | Node.js | Server runtime |
| Framework | Fastify | HTTP framework (fast, TypeScript-native) |
| Language | TypeScript (strict mode) | Type safety |
| ORM | Prisma | Type-safe database access, migrations |
| Database | PostgreSQL 16 | Primary data store |
| Cache | Redis | Session cache, pub/sub, result caching |
| Auth | Better Auth | OAuth, organizations, RBAC |
| CRDT sync | y-websocket | Real-time collaboration server |
| Job queue | BullMQ (Redis-backed) | Analysis job lifecycle, cancellation, retries |
| PDF export | Puppeteer (headless Chromium) | Server-side PDF generation |
| Component | Technology | Purpose |
|---|---|---|
| SDK | Vercel AI SDK (ai package) | Provider-agnostic LLM integration |
| Default provider | Claude (Anthropic) | Primary LLM for AI features |
| Pattern | Tool-use agent | Diagram manipulation via function calling |
| Component | Technology | Purpose |
|---|---|---|
| Frontend hosting | Vercel | Static SPA, CDN, preview deploys |
| Backend hosting | Docker on Fly.io / Railway / VPS | Persistent server (WebSockets) |
| Containers | Docker + Docker Compose | Local dev and production |
| Orchestration | Kubernetes (future) | Horizontal scaling when needed |
| CI/CD | GitHub Actions | Automated testing and deployment |
| Error tracking | Sentry | Frontend + backend error monitoring |
┌──────────────────────────────────────────────────────────────────┐
│ Client (Browser) │
│ │
│ ┌───────────┐ ┌──────────┐ ┌───────────┐ ┌───────────────┐ │
│ │ React Flow│ │ Zustand │ │ Yjs Doc │ │ AI Chat Panel │ │
│ │ (canvas) │◄─┤ (state) │◄─┤ (CRDT) │ │ (Vercel AI) │ │
│ └─────┬─────┘ └──────────┘ └─────┬─────┘ └───────┬───────┘ │
│ │ │ │ │
│ ┌─────┴─────────────────────────────┴─────────────────┴───────┐ │
│ │ Diagram → ModelIR Serializer │ │
│ └─────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┴───────────────────────────────────┐ │
│ │ Client Analysis Engine (Web Worker) │ │
│ │ Validation, small computations, sensitivity sliders │ │
│ └─────────────────────────┬───────────────────────────────────┘ │
│ │ (large models delegated to server) │
└────────────────────────────┼─────────────────────────────────────┘
│
┌────────────────────────────┼─────────────────────────────────────┐
│ Server │ │
│ │
│ ┌────────────┐ ┌─────────────┐ ┌────────────┐ ┌─────────────┐ │
│ │ Fastify API│ │ Better Auth │ │ y-websocket│ │ AI endpoint │ │
│ │ (REST) │ │ (OAuth/RBAC)│ │ (CRDT sync)│ │ (LLM proxy) │ │
│ └─────┬──────┘ └─────────────┘ └────────────┘ └─────────────┘ │
│ │ │
│ │ submits jobs via BullMQ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Solver Worker (separate process / container) │ │
│ │ Large CTMC, BDD-based FTA, batch runs, Monte Carlo │ │
│ │ Consumes jobs from Redis queue │ │
│ │ Designed for future migration to Rust/Python/WASM │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────┴──────────────────┐ ┌─────────────────────────────────┐ │
│ │ PostgreSQL │ │ Redis │ │
│ │ users, projects, │ │ sessions, pub/sub, job queue, │ │
│ │ diagrams, snapshots, │ │ analysis cache, rate limiting │ │
│ │ audit log, jobs │ │ │ │
│ └────────────────────────┘ └─────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Export Service (Headless Chromium container) │ │
│ └─────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
- Diagram is UI, Model is the product — Visual canvas and computation layer are decoupled via ModelIR
- Offline-first collaboration — Yjs CRDTs queue edits and merge on reconnect
- Hybrid computation — Small models run client-side (Web Worker), large models run server-side
- Solver as replaceable engine — API server never performs heavy math in-process. Solvers run in a separate worker process/container, communicating via job queue (BullMQ/Redis). Interface is designed so solvers can migrate to Rust/Python/WASM without breaking the platform
- AI as agent — The LLM has tools to read and manipulate diagrams, not just chat
- Plugin-ready diagram types — Each diagram type is a self-contained module with its own nodes, edges, validation, layout, and serializer
- Stateless backend containers — Sessions in Redis, CRDT state in PostgreSQL, enabling horizontal scaling
- Everything must be reproducible — Every analysis result stores ModelIR schema version, solver version, options, tolerances, and computation timestamp. Researchers can cite exact solver configurations in publications
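The reproducibility principle implies a deterministic cache key for every analysis. A minimal sketch, assuming key-sorted canonical JSON as the serialization (names like `contentHash` are illustrative, not the final API), of deriving the SHA-256 value stored in the `content_hash` columns:

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so semantically identical models
// serialize to identical JSON regardless of property order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    return Object.fromEntries(
      Object.keys(obj).sort().map((k) => [k, canonicalize(obj[k])])
    );
  }
  return value;
}

// SHA-256 over canonical JSON of { modelIR, options }: the cache key
// that analysis_results.content_hash and analysis_jobs.content_hash hold.
function contentHash(modelIR: object, options: object): string {
  const payload = JSON.stringify(canonicalize({ modelIR, options }));
  return createHash("sha256").update(payload).digest("hex");
}
```

Because the key covers both the model and the solver options, changing a tolerance invalidates the cache entry even when the diagram is untouched.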
- Framework-agnostic, works with Fastify
- Prisma adapter for PostgreSQL
- Built-in plugins: organization (teams), rbac (permissions), two-factor (future)
- Session-based auth with secure httpOnly cookies
| Provider | Priority | Audience |
|---|---|---|
| Google | v1 | Universal |
| GitHub | v1 | Developers, researchers |
| Microsoft / Azure AD | v1 | University, enterprise SSO |
| Generic OIDC | v2 | Institutional identity providers |
| Email / password | v1 | Fallback with email verification |
Browser (Vite SPA)
│
├── Auth pages (login/register) → calls Better Auth endpoints
│ hosted on Fastify server
├── OAuth redirect flow → handled by Better Auth
│
└── API calls with session cookie → Fastify validates session
via Better Auth middleware
Better Auth's organization plugin manages:
- Teams (called "organizations" in Better Auth)
- Members with roles
- Invitations (by email or invite link)
Team roles:
| Role | Permissions |
|---|---|
| admin | Manage members, delete projects, full control over team resources |
| member | Create/edit diagrams in team projects, cannot manage team settings |
Separate from team roles — applied per-project:
| Role | Permissions |
|---|---|
| owner | Full control, delete project, manage shares |
| editor | Create/edit/delete diagrams within the project |
| viewer | Read-only access to all diagrams in the project |
When a user accesses a project:
1. Is user the project creator? → owner
2. Has direct project_share entry? → use that role
3. Is project owned by a team?
└── Is user a team member?
├── team role = admin → editor (of project content)
└── team role = member → editor
4. Has valid share_link token in URL? → use link's role
5. None of the above → 403 Forbidden
Policy note: Team admin grants team management powers (members, settings, delete team projects) but does NOT automatically grant owner on every project. Project ownership is explicit — only the creator or someone explicitly granted owner via project_shares can delete a project or manage its shares. This prevents team admins from silently becoming owners of all team content.
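The resolution order above can be sketched as a pure function. The record shapes here are simplified stand-ins for the actual Better Auth and Prisma rows, not the real types:

```typescript
type ProjectRole = "owner" | "editor" | "viewer";

interface AccessContext {
  userId: string;
  project: { createdBy: string; ownerType: "user" | "team"; ownerId: string };
  directShare?: { role: ProjectRole };           // project_shares row, if any
  teamMembership?: { role: "admin" | "member" }; // members row, if any
  linkRole?: "editor" | "viewer";                // from a valid share_links token
}

// Returns the effective role, or null, which maps to 403 Forbidden.
function resolveProjectRole(ctx: AccessContext): ProjectRole | null {
  if (ctx.project.createdBy === ctx.userId) return "owner"; // 1. creator
  if (ctx.directShare) return ctx.directShare.role;         // 2. direct share
  if (ctx.project.ownerType === "team" && ctx.teamMembership) {
    return "editor"; // 3. team admin and member both edit project content
  }
  if (ctx.linkRole) return ctx.linkRole;                    // 4. share link
  return null;                                              // 5. no access
}
```

Note that per the policy note, a team admin falls through to `editor` like any member; only step 1 or an explicit `owner` share grants ownership.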
- Project owners can generate a share link with a role (editor or viewer)
- Links contain a UUID token: /invite/:token
- Links can be set to expire or deactivated
- Opening a share link grants the user the specified role via project_shares
All IDs are UUIDs. Timestamps use timestamptz.
These are created and managed by Better Auth + plugins:
-- Better Auth core
users (id, email, name, image, emailVerified, createdAt, updatedAt)
sessions (id, userId, token, expiresAt, ...)
accounts (id, userId, provider, providerAccountId, ...)
verifications (id, identifier, value, expiresAt, ...)
-- Better Auth organization plugin
organizations (id, name, slug, logo, metadata, createdAt)
members (id, organizationId, userId, role, createdAt)
invitations (id, organizationId, email, role, status, ...)

-- ─── Projects ───
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
owner_type VARCHAR(20) NOT NULL CHECK (owner_type IN ('user', 'team')),
owner_id UUID NOT NULL, -- references users.id OR organizations.id
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Diagrams ───
CREATE TABLE diagrams (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
name VARCHAR(255) NOT NULL,
type VARCHAR(50) NOT NULL CHECK (type IN (
'markov_chain',
'fault_tree',
'event_tree',
'reliability_block_diagram',
'bow_tie',
'fmea'
)),
yjs_state BYTEA, -- persisted Yjs document
thumbnail BYTEA, -- dashboard preview image
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Diagram Snapshots (version history) ───
CREATE TABLE diagram_snapshots (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
diagram_id UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
yjs_state BYTEA NOT NULL,
label VARCHAR(255), -- optional named version
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Project Sharing (direct user access) ───
CREATE TABLE project_shares (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
role VARCHAR(20) NOT NULL CHECK (role IN ('owner', 'editor', 'viewer')),
granted_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(project_id, user_id)
);
-- ─── Share Links (URL-based access) ───
CREATE TABLE share_links (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
token UUID UNIQUE NOT NULL DEFAULT gen_random_uuid(),
project_id UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
role VARCHAR(20) NOT NULL CHECK (role IN ('editor', 'viewer')),
created_by UUID NOT NULL REFERENCES users(id),
expires_at TIMESTAMPTZ, -- NULL = never expires
is_active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Comments / Annotations ───
CREATE TABLE comments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
diagram_id UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id),
node_id VARCHAR(255), -- anchored to a specific node (nullable)
position_x DOUBLE PRECISION, -- canvas position (nullable)
position_y DOUBLE PRECISION,
content TEXT NOT NULL,
resolved BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Notifications ───
CREATE TABLE notifications (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
type VARCHAR(50) NOT NULL CHECK (type IN (
'project_shared',
'team_invite',
'comment_added',
'comment_resolved',
'analysis_complete',
'mention'
)),
payload JSONB NOT NULL, -- flexible data per notification type
read BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Diagram Templates ───
CREATE TABLE diagram_templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
type VARCHAR(50) NOT NULL, -- same enum as diagrams.type
model_ir JSONB NOT NULL, -- template stored as ModelIR
is_builtin BOOLEAN NOT NULL DEFAULT false, -- system vs user-created
created_by UUID REFERENCES users(id), -- NULL for built-in
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Analysis Jobs (lifecycle tracking) ───
CREATE TABLE analysis_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
diagram_id UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
requested_by UUID NOT NULL REFERENCES users(id),
content_hash VARCHAR(64) NOT NULL, -- SHA-256 of ModelIR + options
method VARCHAR(50) NOT NULL, -- analysis method name
options JSONB NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'queued'
CHECK (status IN ('queued', 'running', 'succeeded', 'failed', 'canceled')),
progress DOUBLE PRECISION DEFAULT 0
CHECK (progress >= 0 AND progress <= 1),
priority INTEGER NOT NULL DEFAULT 0,
worker_id VARCHAR(100), -- identifies which worker picked up the job
error_message TEXT,
error_stack TEXT,
queued_at TIMESTAMPTZ NOT NULL DEFAULT now(),
started_at TIMESTAMPTZ,
finished_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- ─── Analysis Results (cached) ───
CREATE TABLE analysis_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID REFERENCES analysis_jobs(id), -- link to job that produced this
diagram_id UUID NOT NULL REFERENCES diagrams(id) ON DELETE CASCADE,
content_hash VARCHAR(64) NOT NULL, -- SHA-256 of ModelIR + options
solver_name VARCHAR(100) NOT NULL,
solver_version VARCHAR(20) NOT NULL,
options JSONB NOT NULL,
results JSONB NOT NULL,
trace JSONB NOT NULL, -- assumptions, warnings, audit
numeric_metadata JSONB NOT NULL, -- method, tolerance, iterations, residual norms
warnings JSONB,
error_bounds JSONB, -- { lower, upper } where applicable
compute_time_ms INTEGER NOT NULL,
executed_on VARCHAR(20) NOT NULL CHECK (executed_on IN ('client', 'server')),
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE(diagram_id, content_hash)
);
-- ─── Audit Log ───
CREATE TABLE audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id), -- NULL for system events
action VARCHAR(100) NOT NULL, -- e.g., 'project.create', 'share_link.use'
object_type VARCHAR(50) NOT NULL, -- e.g., 'project', 'diagram', 'team'
object_id UUID,
metadata JSONB, -- action-specific details
ip_address INET,
session_id UUID,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Audited events:
-- auth.login, auth.logout, auth.oauth_connect
-- project.create, project.delete, project.update
-- diagram.create, diagram.delete
-- share_link.create, share_link.use, share_link.revoke
-- project_share.grant, project_share.revoke
-- team.create, team.member_add, team.member_remove, team.role_change
-- analysis.submit, analysis.complete, analysis.fail
-- export.generate, export.download
-- ─── Indexes ───
CREATE INDEX idx_diagrams_project ON diagrams(project_id);
CREATE INDEX idx_diagrams_type ON diagrams(type);
CREATE INDEX idx_snapshots_diagram ON diagram_snapshots(diagram_id);
CREATE INDEX idx_project_shares_user ON project_shares(user_id);
CREATE INDEX idx_project_shares_project ON project_shares(project_id);
CREATE INDEX idx_share_links_token ON share_links(token);
CREATE INDEX idx_comments_diagram ON comments(diagram_id);
CREATE INDEX idx_notifications_user ON notifications(user_id, read);
CREATE INDEX idx_analysis_results_hash ON analysis_results(diagram_id, content_hash);
CREATE INDEX idx_analysis_jobs_diagram ON analysis_jobs(diagram_id);
CREATE INDEX idx_analysis_jobs_status ON analysis_jobs(status);
CREATE INDEX idx_analysis_jobs_hash ON analysis_jobs(content_hash);
CREATE INDEX idx_audit_log_user ON audit_log(user_id, created_at);
CREATE INDEX idx_audit_log_object ON audit_log(object_type, object_id);
CREATE INDEX idx_audit_log_action ON audit_log(action, created_at);

| Type | UI Model | Layout Algorithm |
|---|---|---|
| Markov Chain | Nodes (states) + edges (transitions) | Force-directed (ELK.js) |
| Fault Tree (FTA) | Tree with logic gate nodes | Top-down hierarchical (ELK.js) |
| Event Tree (ETA) | Horizontal branching tree | Left-to-right (ELK.js) |
| Reliability Block Diagram (RBD) | Blocks in series/parallel/k-of-n | Left-to-right flow (ELK.js) |
| Bow-Tie | Fault tree + central event + event tree | Symmetric left-center-right (ELK.js) |
| FMEA | Data table (not a canvas diagram) | N/A — uses TanStack Table |
Each diagram type is a self-contained module:
src/diagram-types/
├── index.ts // diagram type registry
├── markov-chain/
│ ├── nodes/ // custom React Flow node components
│ │ ├── StateNode.tsx // circle node for states
│ │ └── index.ts
│ ├── edges/ // custom edge components
│ │ └── TransitionEdge.tsx // labeled edge with rate/probability
│ ├── toolbar/ // type-specific toolbar items
│ ├── validation.ts // outgoing probabilities sum to 1, etc.
│ ├── serializer.ts // React Flow state → ModelIR
│ ├── layout.ts // ELK.js config for this type
│ ├── tikz-export.ts // ModelIR → TikZ code
│ └── index.ts // diagram type definition
├── fault-tree/
│ ├── nodes/
│ │ ├── GateNode.tsx // AND/OR/k-of-n/NOT gates
│ │ ├── BasicEventNode.tsx
│ │ └── TopEventNode.tsx
│ ├── ...
├── event-tree/
│ ├── ...
├── rbd/
│ ├── ...
├── bow-tie/
│ ├── ...
└── fmea/
├── columns.ts // TanStack Table column definitions
├── validation.ts
├── serializer.ts
└── ...
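As a concrete instance of a per-type validation.ts, here is a sketch of the "outgoing probabilities sum to 1" rule for probability-labeled Markov chains. The shapes are simplified from the React Flow node/edge types the real module receives:

```typescript
// Simplified stand-ins for React Flow state.
interface MarkovEdge { from: string; to: string; probability: number }
interface ValidationIssue { code: string; message: string; nodeId?: string }

// Each state's outgoing probabilities must sum to 1, within a tolerance
// for floating-point round-off. States with no outgoing edges are treated
// as absorbing and are legal.
function validateOutgoingProbabilities(
  stateIds: string[],
  edges: MarkovEdge[],
  tol = 1e-9
): ValidationIssue[] {
  const issues: ValidationIssue[] = [];
  for (const id of stateIds) {
    const out = edges.filter((e) => e.from === id);
    if (out.length === 0) continue; // absorbing state
    const sum = out.reduce((s, e) => s + e.probability, 0);
    if (Math.abs(sum - 1) > tol) {
      issues.push({
        code: "PROB_SUM",
        message: `outgoing probabilities of ${id} sum to ${sum}, expected 1`,
        nodeId: id,
      });
    }
  }
  return issues;
}
```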
interface DiagramTypeDefinition {
id: DiagramType
name: string
description: string
icon: React.ComponentType
// React Flow configuration
nodeTypes: Record<string, React.ComponentType>
edgeTypes: Record<string, React.ComponentType>
// Default nodes/edges for a new diagram
defaultContent: () => { nodes: Node[]; edges: Edge[] }
// Toolbar items specific to this type
toolbarItems: ToolbarItem[]
// Validation rules
validate: (nodes: Node[], edges: Edge[]) => ValidationResult
// Convert to ModelIR for analysis
serialize: (nodes: Node[], edges: Edge[]) => ModelIR
// Auto-layout configuration for ELK.js
layoutOptions: ElkLayoutOptions
// TikZ export
toTikZ: (modelIR: ModelIR) => string
}

Every diagram type serializes to a shared Intermediate Representation (ModelIR). Solvers consume only ModelIR. This enables:
- Cross-diagram consistency
- Fewer duplicated rules
- New diagram types without rewriting math
- Cross-type conversions (e.g., RBD ↔ equivalent FTA)
All numeric properties in the IR use ValueRef instead of raw numbers. This enables named parameters, scenario management, and expressions:
// ValueRef replaces raw numbers throughout the IR.
// Allows literal values, parameter references, or expressions.
type ValueRef =
| number // literal: 0.001
| { param: string } // reference: { param: "lambda_1" }
| { expr: string } // expression: { expr: "2 * lambda_1" }

All values in the IR carry units. The normalization step converts to base units before solving:
interface UnitConfig {
timeBase: 'hours' | 'days' | 'years' // solver operates in this unit
rateBase: '1/h' | '1/d' | '1/y' // derived from timeBase
}

- User-facing values store original units for display
- Normalization converts to timeBase before any solver runs
- Solver results are returned in base units, then converted for display
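A sketch of the two normalization steps, assuming a lookup-based resolveValueRef (expression ValueRefs are deliberately out of scope here; the real normalize.ts would use a safe expression evaluator) and hour-based rate conversion:

```typescript
type ValueRef = number | { param: string } | { expr: string };

// Resolve a ValueRef against the model's named parameters.
function resolveValueRef(ref: ValueRef, params: Record<string, number>): number {
  if (typeof ref === "number") return ref;
  if ("param" in ref) {
    const v = params[ref.param];
    if (v === undefined) throw new Error(`unknown parameter: ${ref.param}`);
    return v;
  }
  throw new Error("expression ValueRefs require the expression evaluator");
}

// Hours per unit of each supported rate base.
const HOURS: Record<string, number> = { "1/h": 1, "1/d": 24, "1/y": 8760 };

// A rate of x per day is x/24 per hour, etc.
function convertRate(value: number, from: string, to: string): number {
  return (value * HOURS[to]) / HOURS[from];
}
```

This matches the trace example later in the document: a failure rate of 0.024/d normalizes to 0.001/h before the solver runs.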
interface ModelIR {
version: string // schema version for reproducibility
type: DiagramType
unitConfig: UnitConfig // base units for this model
components: Component[]
events: Event[]
gates: Gate[] // FTA, Bow-tie
states: State[] // Markov
transitions: Transition[] // Markov, ETA
blocks: Block[] // RBD
barriers: Barrier[] // Bow-tie
dependencies: Dependency[]
parameters: Parameter[] // named values (lambda, mu, etc.)
distributions: Distribution[] // exponential, Weibull, etc.
initialCondition: InitialCondition // how the model starts
missionTime?: ValueRef
repairPolicy?: RepairPolicy
}
interface Component {
id: string
name: string
failureRate?: ValueRef // ValueRef, not number
repairRate?: ValueRef // ValueRef, not number
distribution?: DistributionRef
metadata: Record<string, unknown>
}
interface State {
id: string
label: string
type: 'operational' | 'degraded' | 'failed' | 'absorbing'
}
// Initial condition supports single state or distribution across states
type InitialCondition =
| { type: 'single'; stateId: string }
| { type: 'distribution'; probabilities: Record<string, number> }
interface Transition {
id: string
from: string
to: string
rate?: ValueRef // ValueRef, not number
probability?: ValueRef // ValueRef, not number
label?: string
condition?: string
}
interface Gate {
id: string
type: 'AND' | 'OR' | 'NOT' | 'K_OF_N' | 'XOR'
k?: number // for K_OF_N
inputs: string[] // child event/gate IDs
output: string // parent event ID
}
interface Parameter {
name: string // e.g., "lambda_1"
value: number
unit: string // e.g., "1/h" — required, not optional
description?: string
}
interface Distribution {
id: string
type: 'exponential' | 'weibull' | 'lognormal' | 'constant'
params: Record<string, ValueRef> // ValueRef, not number
}

Dependencies model correlations and shared-cause relationships between components. Critical for realistic FTA, RBD, and bow-tie analysis:
type Dependency =
| CommonCauseFailure
| FunctionalDependency
| ConditionalProbability
| InhibitCondition
// Components that share a common failure cause (Beta-factor, MGL, Alpha-factor)
interface CommonCauseFailure {
type: 'common_cause'
id: string
group: string[] // component IDs in the CCF group
model: 'beta_factor' | 'mgl' | 'alpha_factor'
params: Record<string, ValueRef> // e.g., { beta: 0.1 }
}
// Component B fails whenever component A fails
interface FunctionalDependency {
type: 'functional'
id: string
source: string // component that triggers
targets: string[] // components that depend on it
}
// Probability of B given A (for event trees, dependent branches)
interface ConditionalProbability {
type: 'conditional'
id: string
given: string // conditioning event/component
target: string // dependent event/component
probability: ValueRef
}
// Inhibit gate condition (FTA)
interface InhibitCondition {
type: 'inhibit'
id: string
gate: string // gate ID
condition: string // condition event ID
probability: ValueRef
}

Defines how repairs are modeled. Schema supports future complexity; v1 implements unlimited only:
interface RepairPolicy {
type: 'unlimited' | 'single_repairman' | 'priority_queue' | 'k_repairmen'
maxSimultaneousRepairs?: number // for k_repairmen
priorityOrder?: string[] // component IDs in repair priority
preemptive?: boolean // can a higher-priority repair interrupt?
}
// v1 default: { type: 'unlimited' }
// Schema allows future extension without breaking changes

interface AnalyzeRequest {
modelIR: ModelIR
method: AnalysisMethod
options: AnalysisOptions
executionTarget: 'client' | 'server' | 'auto'
}
type AnalysisMethod =
// Markov
| 'steady_state'
| 'transient'
| 'mttf'
| 'availability'
// Fault tree
| 'minimal_cut_sets'
| 'top_event_probability'
| 'importance_measures'
// Event tree
| 'outcome_probabilities'
| 'path_ranking'
// RBD
| 'system_reliability'
| 'system_availability'
// Bow-tie
| 'end_state_frequencies'
| 'barrier_effectiveness'
// FMEA
| 'rpn_calculation'
| 'criticality_analysis'
// General
| 'validate'
interface AnalysisOptions {
tolerance?: number // convergence tolerance
maxIterations?: number
truncationLimit?: number // cut set order limit
timePoints?: number[] // for transient analysis
missionTime?: number
confidenceLevel?: number
method?: string // solver algorithm selection
}
interface AnalyzeResponse {
status: 'success' | 'warning' | 'error'
solver: {
name: string
version: string
}
modelIRVersion: string // schema version used
contentHash: string // SHA-256 of ModelIR + options
metrics: Record<string, number | number[]>
contributions?: ContributionTable[]
cutSets?: CutSet[]
importanceMeasures?: ImportanceMeasure[]
// Structured numeric metadata — mandatory for every result
numericMetadata: {
method: string // e.g., "uniformization", "sparse_lu", "bdd"
tolerance: number // convergence tolerance used
iterations?: number // iterations to convergence
residualNorm?: number // final residual norm
truncation?: {
enabled: boolean
threshold: number // e.g., 1e-12
cutSetsDropped?: number // how many cut sets were truncated
}
stiffnessDetected?: boolean // for CTMC solvers
methodAutoSelected?: boolean // was the method chosen automatically?
}
trace: {
assumptions: string[] // e.g., "component independence assumed"
normalizations: string[] // e.g., "rates converted from 1/d to 1/h"
unitConversions: string[] // e.g., "lambda_1: 0.024/d → 0.001/h"
simplifications: string[] // e.g., "k-of-n gate expanded to OR/AND"
methodDetails: string // human-readable explanation of method
}
warnings: Warning[]
errorBounds?: {
lower: number
upper: number
description: string // e.g., "cut set truncation error bound"
}
computeTimeMs: number
timestamp: string // ISO 8601 — when this result was computed
}
interface Warning {
severity: 'info' | 'warning' | 'error'
code: string
message: string
affectedComponents: string[]
}

| Analysis | Method | Notes |
|---|---|---|
| Steady-state | Solve πQ = 0 with normalization | Sparse linear solvers for large models |
| Transient | Uniformization (default) | Krylov methods for large sparse systems |
| MTTF | Absorbing CTMC methods | Solve linear system for expected absorption time |
| Availability | Steady-state P(operational states) | Sum of operational + degraded state probabilities |
Robustness features:
- Automatic stiffness detection with method selection
- State-space explosion mitigation: basic lumping, symmetry reduction
- "Model size / confidence" panel predicting runtime and suggesting simplifications
| Analysis | Method | Notes |
|---|---|---|
| Minimal cut sets | BDD-based (exact) for moderate size | Approximate/truncated for large trees |
| Top event probability | Exact via BDD when possible | Rare-event approximation as labeled option |
| Importance measures | Birnbaum, Fussell-Vesely, RAW, RRW | Computed from cut set results |
Differentiator: Show error bounds when truncating cut sets, show which cut sets were dropped.
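A sketch of how truncation and its error bound could be reported, using the rare-event approximation (sum of cut set probabilities, an upper bound by Boole's inequality) together with Fussell-Vesely importance computed from cut sets. `CutSet.probability` is assumed precomputed from independent basic events:

```typescript
interface CutSet {
  events: string[];    // basic event IDs in the minimal cut set
  probability: number; // product of basic event probabilities
}

// Cut sets below the threshold are dropped; their summed probability is a
// conservative bound on what truncation removed.
function topEventRareApprox(cutSets: CutSet[], truncationThreshold = 0) {
  let probability = 0, dropped = 0, errorBound = 0;
  for (const cs of cutSets) {
    if (cs.probability < truncationThreshold) {
      dropped += 1;
      errorBound += cs.probability;
    } else {
      probability += cs.probability;
    }
  }
  return { probability, dropped, errorBound };
}

// Fussell-Vesely importance: fraction of the top-event probability carried
// by cut sets containing the given component.
function fussellVesely(cutSets: CutSet[], componentId: string): number {
  const top = cutSets.reduce((s, c) => s + c.probability, 0);
  const withC = cutSets
    .filter((c) => c.events.includes(componentId))
    .reduce((s, c) => s + c.probability, 0);
  return top === 0 ? 0 : withC / top;
}
```

The `dropped` and `errorBound` fields map directly onto `numericMetadata.truncation` and `errorBounds` in the AnalyzeResponse.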
| Analysis | Method | Notes |
|---|---|---|
| Outcome probabilities | Conditional probability propagation | Handle dependencies between branches |
| Path ranking | Ranked by probability | Top contributing paths |
Differentiator: "What-if branch probability slider" for fast recomputation.
| Analysis | Method | Notes |
|---|---|---|
| System reliability (non-repairable) | Combinatorial (series/parallel/k-of-n) | Support exponential + Weibull distributions |
| System availability (repairable) | CTMC or Markov approximation | Auto-convert to CTMC for repairable k-of-n |
Differentiator: Auto-convert RBD ↔ equivalent FTA with side-by-side comparison.
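The combinatorial path for non-repairable RBDs reduces to three compositions. A sketch assuming independent components with known mission-time reliabilities (the identical-component case for k-of-n):

```typescript
// All components must survive.
function seriesR(rs: number[]): number {
  return rs.reduce((p, r) => p * r, 1);
}

// At least one component survives.
function parallelR(rs: number[]): number {
  return 1 - rs.reduce((p, r) => p * (1 - r), 1);
}

// k-of-n with identical components: sum over i >= k of C(n,i) r^i (1-r)^(n-i).
function kOfN(k: number, n: number, r: number): number {
  const binom = (m: number, j: number): number =>
    j === 0 || j === m ? 1 : binom(m - 1, j - 1) + binom(m - 1, j);
  let sum = 0;
  for (let i = k; i <= n; i++) sum += binom(n, i) * r ** i * (1 - r) ** (n - i);
  return sum;
}
```

These compositions are also what makes the RBD ↔ FTA cross-check cheap: a series block is an OR of failures, a parallel block an AND.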
- Combined IR linking initiating event frequency, preventive barriers, and mitigative barriers
- Compute end-state frequencies directly with attribution
- Differentiator: Barrier management dashboard showing risk reduction per barrier
| Analysis | Method | Notes |
|---|---|---|
| RPN | Severity × Occurrence × Detection | Classic method, explicitly labeled |
| Criticality | MIL-STD-1629 style | Alternative to RPN |
| Custom scoring | Configurable weights | User-defined risk formula |
Differentiator: Connect FMEA items to model components, show how improvements change system-level risk.
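The classic RPN method is a one-liner per row. A sketch of scoring and ranking; the field names are illustrative, as the real column definitions live in fmea/columns.ts:

```typescript
interface FmeaRow {
  failureMode: string;
  severity: number;   // 1..10
  occurrence: number; // 1..10
  detection: number;  // 1..10
}

// RPN = Severity * Occurrence * Detection, ranked descending so the
// highest-risk failure modes surface first.
function rankByRpn(rows: FmeaRow[]): Array<FmeaRow & { rpn: number }> {
  return rows
    .map((r) => ({ ...r, rpn: r.severity * r.occurrence * r.detection }))
    .sort((a, b) => b.rpn - a.rpn);
}
```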
The API server (Fastify) is strictly an orchestrator. It accepts analysis requests, validates them, submits jobs, and returns results. All heavy math runs in separate processes.
- Runs in dedicated Web Worker to keep UI responsive
- Handles: validation, normalization, small/medium computations
- Threshold: models with < ~50 states (Markov) or < ~100 basic events (FTA)
- Used for: sensitivity sliders, instant recalc, FMEA RPN
- Same analyze(modelIR, options) → result interface as server-side
- Runs as a separate process or container, NOT inside Fastify
- Communicates via BullMQ (Redis-backed job queue)
- Handles: large CTMC, exact BDD-based FTA, Monte Carlo (future), batch runs
- Result caching via Redis: hash(ModelIR + options) → cached result
- Audit logging for reproducibility and compliance
Solver Worker boundary is designed for future migration:
- v1: Node.js worker process (same language, separate process)
- Future: Rust (performance), Python/scipy (numerical libraries), or WASM
- The interface (AnalyzeRequest → AnalyzeResponse) stays the same regardless of implementation language
Client submits analysis request
│
▼
Fastify API validates request
│
├── Small model? → return to client for Web Worker execution
│
└── Large model? → submit to BullMQ job queue
│
▼
Solver Worker picks up job
├── status: queued → running
├── reports progress (0..1)
├── on success: status → succeeded, result cached
└── on failure: status → failed, error logged
│
▼
Client polls or receives WebSocket notification
Result returned from cache
Job API endpoints:
- POST /api/analysis/jobs — submit analysis job
- GET /api/analysis/jobs/:id — poll status + progress
- POST /api/analysis/jobs/:id/cancel — cancel running/queued job
- GET /api/analysis/jobs/:id/result — retrieve cached result
executionTarget: 'auto' →
if (stateCount < 50 && !isStiff) → client (Web Worker)
if (stateCount < 500) → server (sync worker)
if (stateCount >= 500 || monteCarlo) → server (queued job)
User sees: "Running locally..." or "Running on server (estimated ~3s)..." or "Queued (position 2)..."
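The routing rules above as a function. The 50/500 state thresholds are the sketched defaults and would be tuned empirically:

```typescript
type ExecutionTarget = "client" | "server_sync" | "server_queued";

// executionTarget: 'auto' resolution. Monte Carlo always queues; stiff
// models skip the client even when small.
function routeExecution(model: {
  stateCount: number;
  isStiff: boolean;
  monteCarlo: boolean;
}): ExecutionTarget {
  if (model.monteCarlo || model.stateCount >= 500) return "server_queued";
  if (model.stateCount < 50 && !model.isStiff) return "client";
  return "server_sync";
}
```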
| Need | Library | Purpose |
|---|---|---|
| Matrix operations | mathjs or ml-matrix | Steady-state solving, transient analysis |
| BDD | Custom implementation | Exact fault tree cut set enumeration |
| Graph algorithms | graphology | Reachability, cycle detection, components |
| Distributions | jstat | Weibull, exponential, lognormal |
The engine is a shared package used by both the client Web Worker and the server Solver Worker. This ensures identical results regardless of where computation runs.
packages/engine/ // shared package (no framework dependencies)
├── src/
│ ├── ir/
│ │ ├── schema.ts // ModelIR TypeScript types + ValueRef
│ │ ├── validate.ts // IR validation (structural correctness)
│ │ ├── normalize.ts // Canonicalization: unit conversion,
│ │ │ // expand k-of-n, resolve ValueRefs,
│ │ │ // convert to base units
│ │ └── units.ts // Unit conversion utilities
│ ├── serializers/
│ │ ├── markov.ts // React Flow state → ModelIR
│ │ ├── fault-tree.ts
│ │ ├── event-tree.ts
│ │ ├── rbd.ts
│ │ ├── bow-tie.ts
│ │ └── fmea.ts
│ ├── solvers/
│ │ ├── interface.ts // Shared Solver interface:
│ │ │ // analyze(ModelIR, options) → AnalyzeResponse
│ │ ├── registry.ts // Solver registry (method → solver mapping)
│ │ ├── markov/
│ │ │ ├── steady-state.ts
│ │ │ ├── transient.ts
│ │ │ └── mttf.ts
│ │ ├── fta/
│ │ │ ├── cut-sets.ts
│ │ │ ├── probability.ts
│ │ │ └── importance.ts
│ │ ├── eta/
│ │ │ └── outcome-probability.ts
│ │ ├── rbd/
│ │ │ ├── reliability.ts
│ │ │ └── availability.ts
│ │ ├── bow-tie/
│ │ │ └── end-state-frequency.ts
│ │ └── fmea/
│ │ ├── rpn.ts
│ │ └── criticality.ts
│ └── test-harness/
│ ├── golden-models/ // Known models with verified outputs
│ └── cross-check.ts // Dual-method verification
│
├── package.json // standalone package, no React/Fastify deps
packages/frontend/
├── src/engine/
│ └── worker.ts // Web Worker: imports from @ramsey/engine
packages/backend/
├── src/worker/
│ ├── solver-worker.ts // BullMQ worker: imports from @ramsey/engine
│ └── job-queue.ts // BullMQ queue setup + job submission
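A sketch of the shared contract in `solvers/interface.ts` and `registry.ts` — field names are illustrative and the real schema will carry more detail (assumptions, audit fields, ValueRefs):

```typescript
// Illustrative ModelIR shape; the real one includes ValueRefs, dependencies,
// repair policies, and unit configuration.
export interface ModelIR {
  kind: "markov" | "fault-tree" | "event-tree" | "rbd" | "bow-tie" | "fmea";
  nodes: unknown[];
  edges: unknown[];
  parameters: Record<string, number>;
}

// Numeric metadata attached to every result for explainability.
export interface NumericMetadata {
  converged: boolean;
  residualNorm?: number;
  truncationError?: number;
  iterations?: number;
}

export interface AnalyzeResponse {
  method: string; // e.g. "markov.steady-state"
  results: Record<string, number>; // primary outputs
  assumptions: string[]; // what the solver assumed about the model
  warnings: string[]; // e.g. "state S3 unreachable"
  numeric: NumericMetadata;
}

export interface Solver {
  method: string;
  analyze(model: ModelIR, options?: Record<string, unknown>): AnalyzeResponse;
}

// Registry: maps method names to solvers, used by both the Web Worker
// and the server Solver Worker so dispatch logic is identical everywhere.
const registry = new Map<string, Solver>();

export function registerSolver(solver: Solver): void {
  registry.set(solver.method, solver);
}

export function getSolver(method: string): Solver {
  const solver = registry.get(method);
  if (!solver) throw new Error(`No solver registered for method "${method}"`);
  return solver;
}
```

Because both execution environments import this same package, "identical results regardless of where computation runs" falls out of the architecture rather than needing to be maintained by hand.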
- Golden models: Library of known small models with published/hand-verified outputs
- Cross-checking: Compute via two methods where possible (FTA cut sets vs BDD, RBD vs equivalent FTA)
- Property tests: Probabilities in [0,1], monotonicity under component improvement
- Numeric diagnostics: Convergence flags, residual norms, truncation error estimates
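To make the property tests concrete, here is a sketch using the closed-form steady state of a 2-state CTMC (failure rate λ, repair rate μ: π = [μ/(λ+μ), λ/(λ+μ)]). The tiny solver exists only to demonstrate the checks; a real harness would generate rate pairs with a property-testing library such as fast-check:

```typescript
// Closed-form steady state of a 2-state availability model:
// state 0 = up, state 1 = down, λ = failure rate, μ = repair rate.
export function steadyState2(lambda: number, mu: number): [number, number] {
  const total = lambda + mu;
  return [mu / total, lambda / total];
}

// The three properties from the list above, as plain assertions.
export function checkSteadyStateProperties(lambda: number, mu: number): void {
  const pi = steadyState2(lambda, mu);
  const sum = pi[0] + pi[1];
  if (Math.abs(sum - 1) > 1e-12) throw new Error(`probabilities sum to ${sum}, expected 1`);
  for (const p of pi) {
    if (p < 0 || p > 1) throw new Error(`probability ${p} outside [0,1]`);
  }
  // Monotonicity: improving the component (halving λ) must not lower availability π0.
  const improved = steadyState2(lambda / 2, mu);
  if (improved[0] < pi[0]) throw new Error("availability decreased under component improvement");
}
```

Properties like these catch entire classes of solver bugs (sign errors, normalization mistakes) without needing a golden result for every input.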
The AI chat is not just Q&A — it's an agent with tools to read, manipulate, and analyze diagrams.
| Capability | Description |
|---|---|
| Natural language → Diagram | "Create a Markov chain for a redundant pump system with repair" |
| Q&A about diagram | "Is this chain irreducible?" / "What's the MTTF?" |
| DSL → Diagram | Paste DSL in chat, AI parses and generates (when DSL module is built) |
| Validation & error checking | "Check my diagram for problems" — finds structural issues, suggests fixes |
The AI agent has access to these tools via Vercel AI SDK function calling:
// Diagram manipulation
add_node(type, label, position?, properties?)
add_edge(from, to, label?, properties?)
remove_node(id)
remove_edge(id)
update_node(id, changes)
update_edge(id, changes)
auto_layout()
select_nodes(ids[])
clear_diagram()
// Analysis (calls actual solver, no hallucinated math)
run_steady_state()
run_transient(time)
run_mttf()
run_availability()
run_cut_sets()
run_rpn()
validate_diagram()
// Context
get_diagram_state()
get_selected_nodes()
get_diagram_metadata()

Before each AI request, the diagram is serialized into a compact text format:
Current diagram (Markov Chain): "Redundant Pump System"
States: S0("Both OK", initial), S1("One failed"), S2("Both failed", absorbing)
Transitions: S0->S1(rate=2*lambda), S1->S2(rate=lambda), S2->S1(rate=mu), S1->S0(rate=mu)
Parameters: lambda=0.001, mu=0.05
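One possible serializer producing the context block above — the input shapes are assumptions; the real code would read from the React Flow/Yjs-backed store:

```typescript
// Illustrative minimal shapes for the Markov case.
interface MarkovState { id: string; label: string; initial?: boolean; absorbing?: boolean }
interface MarkovTransition { from: string; to: string; rate: string }

// Compact, token-efficient text context sent with every AI request.
export function serializeMarkovContext(
  title: string,
  states: MarkovState[],
  transitions: MarkovTransition[],
  params: Record<string, number>,
): string {
  const stateStr = states
    .map((s) => {
      const flags = [s.initial && "initial", s.absorbing && "absorbing"].filter(Boolean);
      return `${s.id}("${s.label}"${flags.length ? ", " + flags.join(", ") : ""})`;
    })
    .join(", ");
  const transStr = transitions.map((t) => `${t.from}->${t.to}(rate=${t.rate})`).join(", ");
  const paramStr = Object.entries(params).map(([k, v]) => `${k}=${v}`).join(", ");
  return [
    `Current diagram (Markov Chain): "${title}"`,
    `States: ${stateStr}`,
    `Transitions: ${transStr}`,
    `Parameters: ${paramStr}`,
  ].join("\n");
}
```

A text format like this is far cheaper in tokens than raw React Flow JSON, which matters when the context is resent on every chat turn.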
When the AI generates a diagram:
- AI text response streams in the chat panel
- Nodes and edges appear on canvas as each tool call resolves
- Final auto-layout once generation completes
The user watches the AI "draw" the diagram in real-time.
Client Server
────── ──────
Chat UI POST /api/ai/chat
│ │
│ message + diagram state ────▶ │ system prompt
│ + conversation history │ + diagram context
│ │ + tool definitions
│ │
│ streaming response + tool ◄──── │ streams from LLM
│ calls │ executes analysis tools
│ │ server-side
│ applies diagram changes │
│ via Yjs (synced to all │
│ collaborators) │
- Platform key: RAMSey provides AI features via a shared key, usage-limited per user/team
- BYO key: Users can configure their own Claude/OpenAI API key in settings for unlimited use
| Format | Method | Notes |
|---|---|---|
| SVG | React Flow toSVG() + style cleanup | Clean vector output, infinite scalability |
| PNG | html-to-image | Configurable DPI (1x, 2x, 4x) |
| JPEG | html-to-image | Configurable DPI, quality setting |
| Format | Method | Notes |
|---|---|---|
| PDF | Puppeteer (headless Chromium) | Exact rendering with vector graphics |
| LaTeX/TikZ | Custom serializer per diagram type | Publication-ready TikZ code |
Each diagram type has a dedicated TikZ serializer. Example Markov chain output:
\begin{tikzpicture}[->, >=stealth', auto, node distance=3cm,
thick, every state/.style={circle, draw, minimum size=1.2cm}]
\node[state] (S0) {$S_0$};
\node[state] (S1) [right of=S0] {$S_1$};
\node[state] (S2) [below of=S1] {$S_2$};
\path (S0) edge [bend left] node {$\lambda_1$} (S1)
(S1) edge [bend left] node {$\mu_1$} (S0)
(S1) edge node {$\lambda_2$} (S2)
(S2) edge [bend right] node {$\mu_2$} (S0);
\end{tikzpicture}

Users can paste directly into Overleaf or any LaTeX editor.
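A sketch of the Markov-chain TikZ serializer producing output like the above — input shapes and the positioning scheme (absolute `at (x, y)` coordinates taken from the canvas layout) are assumptions:

```typescript
interface TikzState { id: string; mathLabel: string; x: number; y: number }
interface TikzEdge { from: string; to: string; label: string; bend?: "left" | "right" }

// Emits a self-contained tikzpicture; labels are passed through as math mode,
// so rate symbols like \lambda render as proper LaTeX.
export function markovToTikz(states: TikzState[], edges: TikzEdge[]): string {
  const nodeLines = states
    .map((s) => `  \\node[state] (${s.id}) at (${s.x}, ${s.y}) {$${s.mathLabel}$};`)
    .join("\n");
  const pathLines = edges
    .map((e) => {
      const bend = e.bend ? ` [bend ${e.bend}]` : "";
      return `  \\path (${e.from}) edge${bend} node {$${e.label}$} (${e.to});`;
    })
    .join("\n");
  return [
    "\\begin{tikzpicture}[->, >=stealth, auto,",
    "    thick, every state/.style={circle, draw, minimum size=1.2cm}]",
    nodeLines,
    pathLines,
    "\\end{tikzpicture}",
  ].join("\n");
}
```

Using absolute coordinates from the canvas (rather than relative `right of=` placement) keeps the exported figure's layout identical to what the user arranged on screen.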
Runs headless Chromium in isolation. Receives diagram state, renders, exports. Keeps the main backend lean.
Export Diagram
─────────────────────────────
Format: [SVG] [PNG] [JPEG] [PDF] [LaTeX]
Scale: [1x] [2x] [4x] (PNG/JPEG only)
Background: [Transparent] [White] (PNG/SVG only)
Include: [x] Labels [x] Probabilities [ ] Grid
[x] Title [ ] Legend
─────────────────────────────
[Export]
- y-websocket handles WebSocket sync, awareness (cursors/presence), and reconnection
- Offline-first: Edits queue locally and merge on reconnect
- Sub-documents: Large diagrams can be split into sync-able chunks
- UndoManager: Built-in undo/redo per user session
- Cursor positions on canvas — see where collaborators are pointing
- User colors — each collaborator gets a distinct color
- Selection highlights — see what others have selected
- Presence indicators — user avatars in the toolbar showing who's online
- Yjs document state periodically persisted to PostgreSQL (`diagrams.yjs_state`)
- Named snapshots saved to `diagram_snapshots` for version history
- Auto-save interval: configurable (default: every 30 seconds of inactivity)
- Manual "Save version" button for named snapshots
For multiple backend instances:
- Redis pub/sub for cross-instance y-websocket message relay
- Stateless backend containers — all persistent state in PostgreSQL + Redis
- Minimalist and professional — diagrams must look publication-ready
- Thin lines, muted colors, no drop shadows or gradients
- Serif/math fonts for labels (consistent with LaTeX output)
- Clean white or transparent backgrounds
- Color palette: muted blues, grays, with red/amber for warnings/failures
┌─────────────────────────────────────────────────────────────┐
│ Toolbar: [File] [Edit] [View] [Diagram] [Analysis] [AI] │
│ ────────────────────────────────────────────────────────── │
│ ┌──────┐ ┌──────────────────────────┐ ┌────────────────┐ │
│ │ │ │ │ │ │ │
│ │ Side │ │ Diagram Canvas │ │ AI Chat / │ │
│ │ bar │ │ (React Flow) │ │ Properties / │ │
│ │ │ │ │ │ Analysis │ │
│ │ Node │ │ │ │ Panel │ │
│ │ palet│ │ │ │ │ │
│ │ te + │ │ │ │ (collapsible │ │
│ │ props│ │ │ │ tabs) │ │
│ │ │ │ │ │ │ │
│ └──────┘ └──────────────────────────┘ └────────────────┘ │
│ Status bar: [Collaborators] [Zoom] [Connection] [Autosave] │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ RAMSey [My Projects] [Teams] [Templates] [Profile] │
│ ────────────────────────────────────────────────────────── │
│ │
│ Recent Projects │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ thumbnail │ │ thumbnail │ │ + │ │
│ │ │ │ │ │ New │ │
│ │ Project A │ │ Project B │ │ Project │ │
│ │ 3 diagrams│ │ 1 diagram │ │ │ │
│ │ Team Alpha│ │ Personal │ │ │ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
/dashboard Personal overview
/project/:projectId Project view (list diagrams)
/project/:projectId/d/:diagramId Diagram editor
/team/:teamSlug Team dashboard
/team/:teamSlug/settings Team management
/invite/:token Share link entry point
/templates Browse templates
/login Auth pages
/register
services:
frontend:
# Vite dev server
build: ./packages/frontend
ports: ["5173:5173"]
volumes: ["./packages/frontend/src:/app/src"]
backend:
# Fastify API + y-websocket + Better Auth (orchestrator only, no heavy math)
build: ./packages/backend
ports: ["3000:3000"]
depends_on: [postgres, redis]
environment:
DATABASE_URL: postgresql://ramsey:ramsey@postgres:5432/ramsey
REDIS_URL: redis://redis:6379
solver-worker:
# Separate process for analysis computation
build: ./packages/backend
command: ["node", "dist/worker/solver-worker.js"]
depends_on: [postgres, redis]
environment:
DATABASE_URL: postgresql://ramsey:ramsey@postgres:5432/ramsey
REDIS_URL: redis://redis:6379
# Can be scaled: docker compose up --scale solver-worker=3
postgres:
image: postgres:16-alpine
ports: ["5432:5432"]
environment:
POSTGRES_USER: ramsey
POSTGRES_PASSWORD: ramsey
POSTGRES_DB: ramsey
volumes: ["pgdata:/var/lib/postgresql/data"]
redis:
image: redis:7-alpine
ports: ["6379:6379"]
export-service:
# Headless Chromium for PDF export
build: ./packages/export-service
ports: ["3001:3001"]
volumes:
  pgdata:

| Component | Host | Notes |
|---|---|---|
| Frontend (Vite SPA) | Vercel | Static files, global CDN, preview deploys |
| Backend (Fastify + WS) | Fly.io / Railway / VPS | Must support persistent WebSocket connections |
| Solver Worker | Same container host as backend | Separate container, scalable independently |
| PostgreSQL | Managed (Neon, Supabase, RDS) or self-hosted | Managed recommended for production |
| Redis | Managed (Upstash, ElastiCache) or self-hosted | Upstash has a generous free tier |
| Export service | Same container host as backend | Separate container, internal network |
Design for it now, deploy on it later:
- Stateless backend containers (sessions in Redis, state in PostgreSQL)
- Horizontal pod autoscaler on backend based on WebSocket connection count
- Horizontal pod autoscaler on solver workers based on job queue depth
- Separate deployments for API, y-websocket, solver worker, export service
On Pull Request:
├── Lint (ESLint + Prettier)
├── Type check (tsc --noEmit)
├── Unit tests (Vitest)
├── Build check (vite build)
├── Solver golden model tests
└── Vercel preview deploy (automatic)
On Merge to Main:
├── All PR checks
├── E2E tests (Playwright against preview)
├── Vercel production deploy (frontend)
└── Docker build + push → deploy backend
On Release Tag:
└── Versioned Docker image push to container registry
- Analysis engine: Every solver tested against golden models with known outputs
- ModelIR serializers: Diagram state → IR conversion correctness
- Validation rules: Per diagram type
- Permission resolver: Role resolution logic
- TikZ serializer: Output matches expected LaTeX
- API endpoints: Auth flows, project CRUD, sharing, analysis requests
- CRDT sync: Multi-client edit merging
- Core user flows: Register → create project → create diagram → add nodes → run analysis → export
- Collaboration: Two browser instances editing simultaneously
- Share link flow: Generate link → open in incognito → verify access level
- Library of known small models with published or hand-verified results
- Cross-checking via dual methods (FTA cut sets vs BDD, RBD vs equivalent FTA)
- Property-based tests: probabilities in [0,1], monotonicity, convergence
- Numerical diagnostics: residual norms, convergence rates
| Tool | Purpose |
|---|---|
| ESLint + Prettier | Code formatting and linting |
| Husky + lint-staged | Pre-commit hooks (lint, type check) |
| Sentry | Error tracking (frontend + backend) |
| Pino (structured logging) | JSON logs with requestId, userId, projectId, diagramId context. Fastify uses Pino natively — adopt from day one |
| GitHub branch protection | Require PR reviews + passing CI for main |
| Conventional Commits | Standardized commit messages |
| Dependabot / Renovate | Automated dependency updates |
- Monorepo setup (Turborepo or npm workspaces) with shared `@ramsey/engine` package
- Vite + React + TypeScript frontend scaffold
- Fastify + TypeScript backend scaffold with Pino structured logging
- Prisma + PostgreSQL schema + migrations (including audit_log, analysis_jobs tables)
- Docker Compose for local development (frontend, backend, solver-worker, postgres, redis)
- ESLint, Prettier, Husky, lint-staged
- GitHub Actions basic CI (lint + type check + build)
- Better Auth integration (Google + GitHub OAuth)
- React Flow canvas with custom Markov chain nodes/edges
- Node palette sidebar (drag to create)
- Edge creation (click source → click target)
- Node/edge property panel (labels, rates, probabilities)
- Basic validation (probability sums, unreachable states)
- Auto-layout with ELK.js
- Undo/redo (Yjs UndoManager)
- Save/load diagram to PostgreSQL
- Yjs + y-websocket integration
- Multi-user editing on same diagram
- Cursor/presence awareness (user colors, positions)
- Conflict-free concurrent edits
- Offline editing with sync on reconnect
- Diagram version history (snapshots)
- Dashboard (project list, thumbnails)
- Project CRUD (create, rename, delete)
- Multiple diagrams per project
- Team creation and member management
- Project sharing (direct + link)
- Permission enforcement across all endpoints
- Notifications
- Fault Tree (FTA) — gate nodes, basic events, top-down layout
- Event Tree (ETA) — horizontal branching
- Reliability Block Diagram (RBD) — series/parallel/k-of-n blocks
- Bow-Tie — combined FTA + ETA with central event
- FMEA — table-based editor with TanStack Table
- ModelIR schema with ValueRef, dependencies, repair policy, unit config
- IR validation + normalization (unit conversion, ValueRef resolution, canonicalization)
- Solver interface (`analyze(ModelIR, options) → AnalyzeResponse`) with numeric metadata
- Markov solvers: steady-state, transient, MTTF, availability
- FTA solvers: minimal cut sets, top event probability, importance measures
- ETA solver: outcome probabilities, path ranking
- RBD solver: system reliability, availability
- Bow-tie solver: end-state frequencies
- FMEA solver: RPN, criticality
- Web Worker for client-side computation (shared engine package)
- BullMQ job queue + Solver Worker (separate process)
- Job lifecycle: submit, poll, cancel, result retrieval
- Content-hash caching via Redis
- Audit logging for analysis runs
- Golden model test harness
- Results panel UI with explainability (assumptions, warnings, traces, numeric metadata)
- SVG export (React Flow native)
- PNG / JPEG export (html-to-image, configurable DPI)
- PDF export (Puppeteer server-side)
- LaTeX/TikZ export (per diagram type serializer)
- Export dialog UI
- AI chat panel UI
- Vercel AI SDK integration with Claude
- Diagram context serialization
- Tool definitions for diagram manipulation
- Natural language → diagram generation
- Diagram Q&A (context-aware)
- AI-powered validation and error checking
- Streaming UX (nodes appear as AI generates)
- BYO API key support
- Dark mode
- Keyboard shortcuts
- Diagram templates (built-in library)
- Comments/annotations on diagrams
- Sentry error tracking
- Performance optimization (large diagrams)
- Production deployment (Vercel + container host)
- Documentation / help system
- DSL (text → diagram) — dedicated syntax per diagram type
- PRISM model import
- Monte Carlo / rare-event simulation
- Scenario management (parameter sets with diffable results)
- Public REST API for programmatic access
- i18n (internationalization)
- Kubernetes deployment
- SAML/OIDC for enterprise SSO
| Item | Notes |
|---|---|
| DSL design | Deferred — will be added after diagram drawing works |
| PRISM import | Parse .prism files → ModelIR for migration |
| Monte Carlo simulation | Server-side, queued jobs, for rare-event analysis |
| Enterprise SSO | Generic OIDC / SAML via Better Auth |
| Public API | REST API for programmatic model creation and analysis |
| Mobile support | Desktop-first; responsive sidebar is sufficient |
| Kubernetes | When horizontal scaling is needed |
| Decision | Choice | Rationale |
|---|---|---|
| Frontend framework | Vite + React | Fast HMR, modern ESM, TypeScript-first |
| Diagram library | React Flow | Purpose-built for node/edge UIs |
| CRDT | Yjs + y-websocket | Most mature JS CRDT, offline-first, awareness protocol |
| Auth | Better Auth | Framework-agnostic, organization + RBAC plugins, active development |
| Backend framework | Fastify | Fast, TypeScript-native, plugin system |
| ORM | Prisma | Type-safe, auto-migrations, PostgreSQL support |
| State management | Zustand | Lightweight, React Flow recommended |
| Layout engine | ELK.js | Covers all needed layout algorithms |
| AI SDK | Vercel AI SDK | Provider-agnostic, streaming, tool-use support |
| ID strategy | UUID everywhere | No sequential ID exposure, safe for sharing |
| Team roles | admin / member | Simple, extensible via Better Auth |
| Project roles | owner / editor / viewer | Granular per-project access |
| Team admin ≠ project owner | Explicit policy | Team admins manage membership, not automatic project owners |
| Computation | Hybrid client/server | Snappy UX for small models, server power for large |
| Solver boundary | Separate worker process (BullMQ) | API never computes; solvers replaceable with Rust/Python |
| Job queue | BullMQ (Redis-backed) | Job lifecycle, cancellation, retries, scaling |
| ModelIR values | ValueRef (literal / param / expr) | Enables scenarios, parameter sweeps, shared parameters |
| Unit handling | Normalize to base units | Consistent solver input, display in original units |
| Structured logging | Pino | Native to Fastify, JSON logs with request context |
| Audit trail | audit_log table | Compliance, trust, who-did-what traceability |
| Frontend hosting | Vercel | Static SPA, CDN, preview deploys |
| Backend hosting | Docker on Fly.io / Railway / VPS | WebSocket support, persistent processes |
This document is the single source of truth for RAMSey's architecture and implementation plan. Update it as decisions evolve.