Get started with cascadeflow in TypeScript/JavaScript in 5 minutes. This guide walks you through the basics of intelligent model cascading.
- What is cascadeflow?
- Installation
- Your First Cascade
- How It Works
- Understanding Costs
- Configuration Options
- Best Practices
- Troubleshooting
- Next Steps
cascadeflow is an intelligent model router that saves you 40-60% on AI costs by automatically using cheaper models when possible and only escalating to expensive models when needed.
Using GPT-4o for everything is expensive:
10,000 queries/month × $0.005/query = $50/month
But using GPT-4o-mini for everything sacrifices quality.
cascadeflow tries the cheap model first, checks quality, and only uses the expensive model if needed:
Simple query → GPT-4o-mini ✅ (draft accepted) → Cost: $0.0004
Complex query → GPT-4o-mini ❌ (draft rejected) → GPT-4o ✅ → Cost: $0.006
Result: 40-60% savings while maintaining quality!
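The arithmetic above can be sketched in a few lines, using the per-query figures from this guide and an assumed 80% draft-acceptance rate (both numbers are illustrative, not exact API pricing):

```typescript
// Illustrative monthly cost comparison using the per-query figures above.
const QUERIES = 10_000;
const GPT4O_ONLY = 0.005; // $/query with GPT-4o for everything
const ACCEPTED = 0.0004;  // $/query when the cheap draft is accepted
const REJECTED = 0.006;   // $/query when the draft is rejected (both models called)

const acceptanceRate = 0.8; // assumed: 80% of drafts pass the quality check
const cascadePerQuery =
  acceptanceRate * ACCEPTED + (1 - acceptanceRate) * REJECTED;

console.log(`GPT-4o only: $${(QUERIES * GPT4O_ONLY).toFixed(2)}/month`);
console.log(`Cascade:     $${(QUERIES * cascadePerQuery).toFixed(2)}/month`);
console.log(`Savings:     ${((1 - cascadePerQuery / GPT4O_ONLY) * 100).toFixed(0)}%`);
```

With a simple-heavy query mix like this, the blended per-query cost drops well below the GPT-4o-only baseline; the exact savings depend on your acceptance rate, as discussed later in this guide.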
npm install @cascadeflow/core

Install the SDK for your chosen provider:
# OpenAI (most common)
npm install openai
# Anthropic
npm install @anthropic-ai/sdk
# Groq
npm install groq-sdk

# OpenAI
export OPENAI_API_KEY="sk-..."
# Or add to your .env file
echo "OPENAI_API_KEY=sk-..." >> .env

Verify the installation:

node -e "import('@cascadeflow/core').then(m => console.log('✅ Installed'))"

Create a file called my-first-cascade.ts:
import { CascadeAgent, ModelConfig } from '@cascadeflow/core';

async function main() {
  // Configure cascade with two tiers
  const agent = new CascadeAgent({
    models: [
      // Tier 1: Cheap model (tries first)
      {
        name: 'gpt-4o-mini',
        provider: 'openai',
        cost: 0.000375, // $0.375 per 1M tokens (blended)
      },
      // Tier 2: Expensive model (only if needed)
      {
        name: 'gpt-4o',
        provider: 'openai',
        cost: 0.00625, // $6.25 per 1M tokens (blended)
      },
    ],
  });

  // Quality validation uses the default cascade-optimized config (0.7 threshold)
  // See the "Configuration Options" section below to customize

  // Try a simple query
  const result = await agent.run('What color is the sky?');

  console.log(`Response: ${result.content}`);
  console.log(`Model used: ${result.modelUsed}`);
  console.log(`Cost: $${result.totalCost.toFixed(6)}`);
  console.log(`Draft accepted: ${result.draftAccepted}`);
}

main().catch(console.error);

Run it:
npx tsx my-first-cascade.ts

Expected output:
Response: The sky is typically blue during the day.
Model used: gpt-4o-mini
Cost: $0.000081
Draft accepted: true
What happened?
- Query sent to GPT-4o-mini (cheap)
- Response passed quality check
- GPT-4o was NOT called (saved money!)
┌─────────────────┐
│ Your Query │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Complexity │ ─────► Simple/Moderate/Complex
│ Detection │
└────────┬────────┘
│
▼
┌─────────────┐
│ Direct to │ ───► Very simple → GPT-4o-mini only
│ GPT-4o-mini?│ ───► Very complex → GPT-4o directly
└──────┬──────┘
│ Maybe cascade
▼
┌─────────────────┐
│ GPT-4o-mini │ ────► Generate response
│ Draft │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Quality Check │ ────► Confidence > threshold?
└────────┬────────┘
│
┌────┴────┐
│ │
▼ ▼
PASS FAIL
│ │
│ ┌────────────────┐
│ │ GPT-4o Verify │
│ └────────┬───────┘
│ │
└─────────────┘
│
▼
┌──────────────┐
│ Final │
│ Response │
└──────────────┘
Tier 1: Draft model (GPT-4o-mini)
- Purpose: Try to answer with the cheap model
- Cost: Low (~$0.000375 per 1K tokens)
- Speed: Fast
- Quality: Good for simple queries
Tier 2: Verifier model (GPT-4o)
- Purpose: Verify the draft or handle complex queries
- Cost: Higher (~$0.00625 per 1K tokens)
- Speed: Slower
- Quality: Best quality
Quality check
- Checks: Confidence score, alignment, coherence
- Threshold: Configurable (default: 0.7)
- Result: Pass → use draft, Fail → use verifier
Draft Accepted ✅
- Cheap model response is good enough
- Verifier is NOT called
- Cost = cheap model only
- This is where you save money!
Draft Rejected ❌
- Cheap model response not good enough
- BOTH models are called
- Cost = cheap + expensive
- Quality is ensured
cascadeflow uses actual token-based pricing, not flat rates:
// Your query
const query = 'What is TypeScript?'; // ~4 tokens

// Model's response
const response = 'TypeScript is a programming language...'; // ~50 tokens

// Total tokens: 4 (input) + 50 (output) = 54

// Cost calculation (GPT-4o-mini example)
const inputCost = (4 / 1000) * 0.00015;   // $0.0000006
const outputCost = (50 / 1000) * 0.0006;  // $0.000030
const totalCost = inputCost + outputCost; // $0.0000306

Query → GPT-4o-mini ✅ (accepted)
Costs:
GPT-4o-mini: $0.000031
GPT-4o: $0.000000 (not called)
─────────────────
Total: $0.000031
Savings: ~95% vs GPT-4o only
Query → GPT-4o-mini ❌ (rejected) → GPT-4o ✅
Costs:
GPT-4o-mini: $0.000031
GPT-4o: $0.000650
─────────────────
Total: $0.000681
Savings: -5% vs GPT-4o only (paid extra for GPT-4o-mini)
Query → GPT-4o directly (complex query)
Costs:
GPT-4o-mini: $0.000000 (not called)
GPT-4o: $0.000650
─────────────────
Total: $0.000650
Savings: 0% (same as GPT-4o only)
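The three scenarios reduce to a simple rule: an accepted draft is billed for the cheap model only, a rejected draft is billed for both models, and a query routed straight to the verifier is billed for the expensive model only. A minimal sketch of that rule, using the illustrative per-call costs from the scenarios above (not the library's internals):

```typescript
// Illustrative per-call costs taken from the scenarios above.
const DRAFT_CALL = 0.000031;   // GPT-4o-mini
const VERIFIER_CALL = 0.00065; // GPT-4o

type Scenario = 'accepted' | 'rejected' | 'direct';

// Per-query cost under each cascade outcome.
function queryCost(scenario: Scenario): number {
  switch (scenario) {
    case 'accepted': return DRAFT_CALL;                 // cheap model only
    case 'rejected': return DRAFT_CALL + VERIFIER_CALL; // both models billed
    case 'direct':   return VERIFIER_CALL;              // expensive model only
  }
}

console.log(queryCost('accepted')); // cheap only
console.log(queryCost('rejected')); // cheap + expensive (~$0.000681)
console.log(queryCost('direct'));   // expensive only
```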
Your savings depend on your query mix:
| Query Mix | Draft Acceptance Rate | Expected Savings |
|---|---|---|
| 80% simple, 20% complex | 80% | 60-70% |
| 50% simple, 50% complex | 50% | 40-50% |
| 20% simple, 80% complex | 20% | 10-20% |
Rule of thumb: The more simple queries, the more you save!
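The table can be roughly reproduced from the scenario costs above. A simplified estimator (it ignores the direct-routing shortcut for very complex queries, so treat the results as approximate):

```typescript
// Rough savings estimator using the illustrative per-query scenario costs above.
const ACCEPTED_COST = 0.000031; // draft accepted: cheap model only
const REJECTED_COST = 0.000681; // draft rejected: both models
const BASELINE = 0.00065;       // GPT-4o for every query

// Expected fractional savings given a draft acceptance rate (0.0-1.0).
function estimatedSavings(acceptanceRate: number): number {
  const perQuery =
    acceptanceRate * ACCEPTED_COST + (1 - acceptanceRate) * REJECTED_COST;
  return 1 - perQuery / BASELINE;
}

for (const rate of [0.8, 0.5, 0.2]) {
  console.log(`${rate * 100}% accepted → ~${(estimatedSavings(rate) * 100).toFixed(0)}% savings`);
}
```

The 50% and 20% rows land inside the table's ranges; the 80% row comes out slightly higher because real workloads pay some routing overhead.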
import { ModelConfig } from '@cascadeflow/core';

const modelConfig: ModelConfig = {
  name: 'gpt-4o-mini',                // Model name
  provider: 'openai',                 // Provider (openai, anthropic, groq, ollama)
  cost: 0.000375,                     // Cost per 1K tokens (blended estimate)
  apiKey: process.env.OPENAI_API_KEY, // Optional: override default API key
};

const agent = new CascadeAgent({
  models: [tier1, tier2], // List of models (ordered by cost)
  verbose: true,          // Enable logging
  enableCascade: true,    // Enable cascade system
});

Quality validation is controlled via the quality option on the agent:
import { CascadeAgent } from '@cascadeflow/core';

// Option 1: Use default (recommended for cascading)
const agent = new CascadeAgent({
  models: [...],
  // Default quality config automatically applied (0.7 threshold)
});

// Option 2: Customize quality settings
const agent = new CascadeAgent({
  models: [...],
  quality: {
    threshold: 0.7,           // Confidence threshold (0.0-1.0)
    requireMinimumTokens: 10, // Minimum response length
  },
});

// Option 3: Enable semantic validation with ML
const agent = new CascadeAgent({
  models: [...],
  quality: {
    threshold: 0.40,             // Traditional confidence threshold
    requireMinimumTokens: 5,
    useSemanticValidation: true, // Enable ML-based validation
    semanticThreshold: 0.5,      // 50% minimum similarity
  },
});

Quality Threshold Trade-offs:
- Higher threshold (0.8+) → Better quality, fewer drafts accepted, lower savings
- Medium threshold (0.7) → Balanced quality and savings (recommended)
- Lower threshold (0.6 or below) → More drafts accepted, higher savings, occasional quality issues
Good Combinations:
- GPT-4o-mini → GPT-4o (balanced, recommended)
- Claude Haiku → GPT-4o (cross-provider)
- Llama 3.1 8B (Groq) → GPT-4o (maximum savings)
Avoid:
- Similar-tier models (GPT-4o-mini → GPT-3.5 Turbo)
- Reverse ordering (GPT-4o → GPT-4o-mini)
Start with default (0.7) and adjust based on your needs:
// Track acceptance rates
const results: boolean[] = [];

for (const query of yourQueries) {
  const result = await agent.run(query);
  results.push(result.draftAccepted || false);
}

const acceptanceRate = results.filter(Boolean).length / results.length;
console.log(`Draft acceptance rate: ${(acceptanceRate * 100).toFixed(1)}%`);

If your acceptance rate is:
- < 30% → Lower threshold (0.6) or use better draft model
- 30-70% → Perfect! (balanced)
- > 70% → Can raise threshold (0.75) for better quality
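Those tuning rules can be encoded as a tiny helper (hypothetical — the chosen value is then passed to CascadeAgent via the quality option):

```typescript
// Hypothetical helper encoding the tuning guidance above.
function suggestThreshold(acceptanceRate: number): number {
  if (acceptanceRate < 0.3) return 0.6;  // too strict: accept more drafts
  if (acceptanceRate > 0.7) return 0.75; // headroom: tighten for quality
  return 0.7;                            // balanced: keep the default
}

console.log(suggestThreshold(0.2));  // 0.6
console.log(suggestThreshold(0.5));  // 0.7
console.log(suggestThreshold(0.85)); // 0.75
```

You would then recreate the agent with `quality: { threshold: suggestThreshold(acceptanceRate) }` and re-measure.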
// Track costs over time
let totalCost = 0;

for (const query of yourQueries) {
  const result = await agent.run(query);
  totalCost += result.totalCost;
}

console.log(`Total cost: $${totalCost.toFixed(6)}`);
console.log(`Average per query: $${(totalCost / yourQueries.length).toFixed(6)}`);

Always handle errors:

try {
  const result = await agent.run(query);
  console.log(result.content);
} catch (error) {
  console.error('Error:', error);
  // Fallback logic here
}

// Short responses (save cost)
const result = await agent.run(query, { maxTokens: 50 });

// Medium responses (balanced)
const result = await agent.run(query, { maxTokens: 150 });

// Long responses (quality)
const result = await agent.run(query, { maxTokens: 500 });

Problem: Drafts are always rejected

Symptoms:
- Draft acceptance rate < 10%
- Costs almost the same as GPT-4o only
Solutions:
- Lower quality threshold:
const agent = new CascadeAgent({ models: [...], quality: { threshold: 0.6 } });
- Use better draft model: Try GPT-4o-mini (already recommended)
- Check query complexity: Ensure you have simple queries in your mix
Problem: Drafts are accepted too often

Symptoms:
- Draft acceptance rate > 80%
- Responses are incorrect or low quality
Solutions:
- Raise quality threshold:
const agent = new CascadeAgent({ models: [...], quality: { threshold: 0.75 } });
- Use better verifier model: Try GPT-4o instead of GPT-4
- Enable verbose mode to see quality scores:
verbose: true
Problem: High latency

Symptoms:
- Responses take too long
- Users complaining about wait times
Solutions:
- Use faster models: Groq Llama for draft, GPT-4o-mini for verifier
- Reduce maxTokens:
maxTokens: 100
- Skip the cascade for time-critical queries
Problem: Savings lower than expected

Symptoms:
- Savings < 30%
- Many drafts rejected
Possible Causes:
- Query mix too complex (mostly hard queries)
- Quality threshold too high (rejecting good drafts)
- Token estimates inaccurate
Solutions:
- Analyze your query complexity distribution
- Lower quality threshold slightly
- Use cheaper draft model (Groq Llama, Ollama)
Problem: TypeScript type errors

Symptoms:
- Compilation errors about missing types
- IDE not showing autocomplete
Solutions:
- Ensure TypeScript version 4.5+:
npm install -D typescript@latest
- Check that tsconfig.json includes:
{ "compilerOptions": { "moduleResolution": "node16", "module": "ES2022" } }
cd packages/core/examples/nodejs
npx tsx basic-usage.ts

- Modify models
- Adjust thresholds
- Add your queries
- Tool Calling: tool-calling.ts
- Cost Tracking: cost-tracking.ts
- Multi-Provider: multi-provider.ts
- Reasoning Models: reasoning-models.ts
- Semantic Quality: semantic-quality.ts
- Production Patterns: production-patterns.ts
- Set up monitoring
- Configure logging
- Implement fallbacks
- Track costs
- ⭐ Star the GitHub repo
- 💬 Join Discussions
- 🐛 Report issues
- 🤝 Contribute examples
# Install
npm install @cascadeflow/core openai
# Run example
npx tsx my-cascade.ts
# Check types
npx tsc --noEmit
# Run with watch mode
npx tsx watch my-cascade.ts

Basic Usage:
import { CascadeAgent } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [
    { name: 'gpt-4o-mini', provider: 'openai', cost: 0.000375 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});

const result = await agent.run('Your query here');

Check Result:
console.log(`Response: ${result.content}`);
console.log(`Model: ${result.modelUsed}`);
console.log(`Cost: $${result.totalCost.toFixed(6)}`);
console.log(`Draft accepted: ${result.draftAccepted}`);Track Costs:
const total = results.reduce((sum, r) => sum + r.totalCost, 0);
console.log(`Total: $${total.toFixed(6)}`);With Tools:
const result = await agent.run(query, {
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather for a location',
        parameters: {
          type: 'object',
          properties: {
            location: { type: 'string' },
          },
        },
      },
    },
  ],
});

import { CascadeAgent, ModelConfig, CascadeResult } from '@cascadeflow/core';
const models: ModelConfig[] = [
  { name: 'gpt-4o-mini', provider: 'openai', cost: 0.000375 },
  { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
];

const agent = new CascadeAgent({ models });
const result: CascadeResult = await agent.run('Hello');

// IDE autocomplete for all properties
result.content;
result.modelUsed;
result.totalCost;
result.draftAccepted;
result.cascaded;

All operations are async - use await or .then():
// Using await
const result = await agent.run(query);

// Using .then()
agent.run(query).then(result => {
  console.log(result.content);
});

Works in Node.js, browser, and edge runtimes:
// Node.js
import { CascadeAgent } from '@cascadeflow/core';
// Browser/Edge
import { CascadeAgent } from '@cascadeflow/core';
// Same code works everywhere!

Need help?
- 💬 Ask in Discussions
- 🐛 Report a bug
- 📧 Email hello@lemony.ai
Happy Cascading! 🌊