diff --git a/CLAUDE.md b/CLAUDE.md index dfd77b179d..0901d35bb1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -59,6 +59,7 @@ For detailed browser and feature compatibility across different chatbot sites, s - `AudioModule.js` - Main audio coordination and state management - `OffscreenAudioBridge.js` - Communication bridge between content script and offscreen audio processing - `AudioInputMachine.ts`, `AudioOutputMachine.ts` - State machines for audio input/output flow + - **Dictation transcription**: Uses dual-phase approach (live streaming + refinement) - see [doc/DUAL_PHASE_TRANSCRIPTION.md](doc/DUAL_PHASE_TRANSCRIPTION.md) 3. **Voice Activity Detection** (`src/vad/`) - `OffscreenVADClient.ts` - Content script client for VAD communication diff --git a/doc/DUAL_PHASE_TRANSCRIPTION.md b/doc/DUAL_PHASE_TRANSCRIPTION.md new file mode 100644 index 0000000000..070e45510c --- /dev/null +++ b/doc/DUAL_PHASE_TRANSCRIPTION.md @@ -0,0 +1,298 @@ +# Dual-Phase Contextual Transcription for Dictation + +This document describes the two-phase transcription system used in dictation mode to balance real-time responsiveness with high accuracy. + +## Overview + +In dictation mode, each dictation target (form field or input element) receives transcriptions through two distinct phases: + +1. **Phase 1 (Live Streaming)**: Fast, incremental transcription of speech as it's captured +2. **Phase 2 (Refinement)**: High-accuracy re-transcription of accumulated audio with full context + +## Phase 1: Live Streaming + +### Purpose +Provide immediate visual feedback to the user as they speak, creating a responsive real-time experience. 
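Because live responses can arrive out of order, the displayed text is stitched together by sequence number. A minimal sketch of that feedback loop (names like `LiveSegment` and `renderLiveText` are illustrative, not the extension's actual API):

```typescript
// Hypothetical sketch of Phase 1 live streaming: each short audio segment is
// assigned a sequence number at capture time, and its transcript is rendered
// as soon as it arrives, stitched in sequence order.

interface LiveSegment {
  sequenceNumber: number; // positive integer, assigned in capture order
  text?: string;          // filled in when the transcription response arrives
}

// Responses may arrive out of order; render completed segments in sequence order.
function renderLiveText(segments: LiveSegment[]): string {
  return segments
    .filter((s) => s.text !== undefined)
    .sort((a, b) => a.sequenceNumber - b.sequenceNumber)
    .map((s) => s.text)
    .join(" ");
}

// Segment 2's response happens to arrive before segment 1's:
const segments: LiveSegment[] = [
  { sequenceNumber: 2, text: "world" },
  { sequenceNumber: 1, text: "Hello" },
  { sequenceNumber: 3 }, // still in flight — simply not rendered yet
];
console.log(renderLiveText(segments)); // "Hello world"
```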
+ +### Characteristics +- **Speed**: Low-latency transcription (typically < 1 second from speech to display) +- **Accuracy**: Lower accuracy due to limited audio and contextual information +- **Audio**: Short bursts (typically 1-3 seconds per segment) +- **Context Sent**: Each request includes: + - Text transcripts of preceding segments (for continuity) + - Target field's label and input type (for domain context) + - Sequence number (for ordering and intelligent merging) + +### Sequence Tracking +Each live segment is assigned an **incremental sequence number** (positive integers starting from 1). This allows: +- **Ordering**: Segments can be stitched together in the correct order even if responses arrive out-of-sequence +- **Merging**: The API server can merge consecutive segments intelligently using their sequence numbers +- **Target Mapping**: Each sequence number is associated with the specific input element it was dictated to + +### Implementation +See [DictationMachine.ts:1246-1303](../src/state-machines/DictationMachine.ts#L1246-L1303) for the `userSpeaking` state handling, and [TranscriptionModule.ts:203-309](../src/TranscriptionModule.ts#L203-L309) for the `uploadAudioWithRetry` function that sends live segments. + +--- + +## Phase 2: Refinement + +### Purpose +Re-transcribe accumulated audio with maximum context to achieve significantly higher accuracy. 
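Since Phase 2 re-transcribes the accumulated audio as a whole, the buffered per-segment PCM frames are combined into one contiguous buffer before upload. A sketch under that assumption (`concatFrames` is illustrative, not the extension's actual helper; the `AudioSegment` shape mirrors the structure documented in the Data Structures section):

```typescript
// Hypothetical sketch: join all unrefined Phase 1 segments for a target into
// a single PCM buffer, oldest first, for one refinement request.

interface AudioSegment {
  frames: Float32Array;   // raw PCM audio (kept for concatenation)
  duration: number;       // milliseconds
  sequenceNumber: number; // original Phase 1 sequence number
}

function concatFrames(segments: AudioSegment[]): Float32Array {
  const ordered = [...segments].sort((a, b) => a.sequenceNumber - b.sequenceNumber);
  const total = ordered.reduce((sum, s) => sum + s.frames.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const s of ordered) {
    out.set(s.frames, offset); // copy each segment's samples contiguously
    offset += s.frames.length;
  }
  return out;
}

const combined = concatFrames([
  { frames: new Float32Array([0.1, 0.2]), duration: 1000, sequenceNumber: 1 },
  { frames: new Float32Array([0.3]), duration: 500, sequenceNumber: 2 },
]);
console.log(combined.length); // 3
```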
+ +### Characteristics +- **Speed**: Higher latency (3-10 seconds depending on accumulated audio length) +- **Accuracy**: Significantly higher accuracy due to full audio context +- **Audio**: All unrefined audio captured for this target since last refinement +- **Context Sent**: + - **Complete audio only** (no text transcripts, no sequence numbers) + - This is a standalone transcription request to the stateless `/transcribe` API + - The audio itself contains all necessary context + +### Request Tracking +Refinement requests use **UUID-based tracking** (not sequence numbers): +- Each refinement gets a unique `requestId` (UUID v4) +- Tracked separately via `context.pendingRefinements` Map +- No global sequence counter involvement +- Responses are handled synchronously in Promise callbacks (not via event bus) + +### Refinement Triggers +A refinement request is sent when **ALL** of the following conditions are met: + +1. **Minimum segments**: Two or more unrefined live segments have accumulated in the target field's buffer, AND +2. **Endpoint detection**: One of these events occurs: + - **EOS (End-of-Speech)**: The app and transcription API implicitly determine the user has probably finished speaking + - **Field Switch**: User tabs or clicks to a different target field + - **Session End**: User ends the dictation session ("hang up") + +### Refinement Targets +The refinement response: +- **Replaces** all previously transcribed text from live segments in that target field +- **Preserves** any pre-existing text that was in the field before the dictation session started +- Only affects the specific target field that was active when the refined audio was captured + +### Multiple Refinement Passes +**Important**: A given dictation target may receive **multiple refinement passes** before field switch or session end. 
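The trigger conditions above can be sketched as a simple guard: at least two unrefined live segments must have accumulated, AND one of the endpoint events must have fired. The names below (`shouldRefine`, `EndpointEvent`) are illustrative, not the actual `DictationMachine` guard:

```typescript
// Hypothetical sketch of the refinement trigger check described above.

type EndpointEvent = "eos" | "fieldSwitch" | "sessionEnd";

function shouldRefine(
  unrefinedSegmentCount: number,
  event?: EndpointEvent
): boolean {
  // Both conditions must hold: enough accumulated segments AND an endpoint event.
  return unrefinedSegmentCount >= 2 && event !== undefined;
}

console.log(shouldRefine(3, "eos"));     // true  — enough segments, EOS fired
console.log(shouldRefine(1, "eos"));     // false — only one unrefined segment
console.log(shouldRefine(3, undefined)); // false — no endpoint event yet
```

Because the buffer persists across EOS events, the same target can pass this guard repeatedly, each time with more accumulated audio.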
+ +**Why?** Because EOS is an implicit prediction: +- If EOS is detected but the user resumes speaking (false positive), another EOS event will eventually occur +- Each EOS event triggers a refinement request (if ≥2 unrefined segments exist) +- Each successive refinement includes **more audio** than the previous one +- Each refinement still **replaces all prior live segment transcripts** (and may also replace a previous refinement) + +**Example Timeline:** +``` +User dictates → EOS detected → Refinement #1 (segments 1-3) +User resumes → EOS detected → Refinement #2 (segments 1-6, includes previous + new) +User switches field → End of refinements for this target +``` + +### Audio Buffering +- Audio segments are buffered per target in `context.audioSegmentsByTarget` +- Maximum buffer size: **120 seconds** (2 minutes) per target to prevent unbounded memory growth +- When limit is reached, oldest segments are automatically trimmed +- Buffers persist across multiple EOS events (enabling multiple refinement passes) +- Buffers are cleared when: + - User switches to a different target field + - Dictation session ends + - Manual edit is detected (triggers session termination) + +### Implementation +See: +- [DictationMachine.ts:1943-2063](../src/state-machines/DictationMachine.ts#L1943-L2063) for `performContextualRefinement` action +- [DictationMachine.ts:375-450](../src/state-machines/DictationMachine.ts#L375-L450) for `handleRefinementComplete` function +- [TranscriptionModule.ts:329-403](../src/TranscriptionModule.ts#L329-L403) for `uploadAudioForRefinement` function + +--- + +## Endpoint Detection + +### EOS (End-of-Speech) Detection +The system uses a **probability-based endpoint detection** mechanism: + +- After each transcription, the API returns `pFinishedSpeaking` (probability user finished speaking) and `tempo` (speech pace) +- A dynamic delay is calculated using these signals (see 
[DictationMachine.ts:2104-2146](../src/state-machines/DictationMachine.ts#L2104-L2146)) +- Maximum delay for dictation: **8 seconds** (`REFINEMENT_MAX_DELAY_MS`) + - Longer than prompt-based interactions (no AI waiting for input) + - Reduces premature refinement from brief pauses during continuous dictation + +### State Machine Integration +The refinement trigger is managed by XState: +- State: `listening.converting.accumulating` ([DictationMachine.ts:1346-1377](../src/state-machines/DictationMachine.ts#L1346-L1377)) +- After `refinementDelay` timeout, transitions to `refining` state if `refinementConditionsMet` guard passes +- Guard checks: `context.refinementPendingForTargets.size > 0 && !context.isTranscribing` + +--- + +## Data Structures + +### Context Fields + +```typescript +// Phase 1 (Live Streaming) - per sequence number +transcriptions: Record // Global transcriptions (all targets) +transcriptionsByTarget: Record> // Grouped by target ID +transcriptionTargets: Record // Maps sequence → target element +provisionalTranscriptionTarget?: { // Pre-upload target mapping + sequenceNumber: number; + element: HTMLElement; +} + +// Phase 2 (Refinement) - per target ID +audioSegmentsByTarget: Record // Audio buffers by target +refinementPendingForTargets: Set // Target IDs awaiting refinement +pendingRefinements: Map +``` + +### AudioSegment Structure +```typescript +interface AudioSegment { + blob: Blob; // WAV audio blob + frames: Float32Array; // Raw PCM audio data (for concatenation) + duration: number; // Milliseconds + sequenceNumber: number; // Original Phase 1 sequence number + captureTimestamp?: number; // When captured by VAD +} +``` + +--- + +## Key Distinctions + +| Aspect | Phase 1 (Live) | Phase 2 (Refinement) | +|--------|---------------|---------------------| +| **Purpose** | Real-time feedback | High accuracy | +| **Audio Length** | 1-3 seconds | Up to 120 seconds | +| **Context** | Preceding transcripts + field metadata | Audio only | +| 
**Tracking** | Sequence numbers (integers) | Request IDs (UUIDs) | +| **API Fields** | `sequenceNumber`, `messages`, `inputType`, `inputLabel` | `requestId` only | +| **Response Route** | Event bus → state machine | Promise callback → direct handler | +| **Frequency** | After each VAD segment | After EOS/field switch/session end | +| **Multiple Passes** | One per segment | Potentially multiple per target | + +--- + +## Error Handling + +### Phase 1 Failures +- Retry logic with exponential backoff (up to 3 attempts) +- On terminal failure, emit `saypi:transcribeFailed` event +- State machine transitions to error state, then returns to listening after 3 seconds + +### Phase 2 Failures +- Same retry logic (up to 3 attempts) +- On terminal failure: + - Emit `saypi:refinement:failed` event + - Clean up refinement metadata + - **Audio buffers are preserved** (may retry on next EOS) + - Phase 1 transcripts remain visible to user (graceful degradation) + +--- + +## Example Flow + +``` +1. User starts dictating into Field A + → [Phase 1] Segment 1 → "Hello" (seq 1) + → [Phase 1] Segment 2 → "world" (seq 2) + +2. Brief pause (EOS detected) + → [Phase 2] Refinement #1 (segments 1-2) → "Hello, world!" + → Replaces "Hello world" with "Hello, world!" + +3. User resumes dictating + → [Phase 1] Segment 3 → "how are" (seq 3) + → [Phase 1] Segment 4 → "you" (seq 4) + +4. Another pause (EOS detected) + → [Phase 2] Refinement #2 (segments 1-4) → "Hello, world! How are you?" + → Replaces entire field text + +5. 
User tabs to Field B + → Final refinement for Field A completes (if needed) + → Capture initial text for Field B + → Continue with new Phase 1 segments +``` + +--- + +## Related Files + +### Core Implementation +- [src/state-machines/DictationMachine.ts](../src/state-machines/DictationMachine.ts) - State machine orchestration +- [src/TranscriptionModule.ts](../src/TranscriptionModule.ts) - Upload logic for both phases +- [src/audio/AudioSegmentPersistence.ts](../src/audio/AudioSegmentPersistence.ts) - Audio segment storage utilities + +### Supporting Modules +- [src/TranscriptMergeService.ts](../src/TranscriptMergeService.ts) - Local transcript merging +- [src/text-insertion/TextInsertionManager.ts](../src/text-insertion/TextInsertionManager.ts) - DOM text insertion +- [src/TimerModule.ts](../src/TimerModule.ts) - Endpoint delay calculation + +--- + +## Configuration + +### Constants (DictationMachine.ts) +- `MAX_AUDIO_BUFFER_DURATION_MS = 120000` - Maximum audio buffer per target (2 minutes) +- `REFINEMENT_MAX_DELAY_MS = 8000` - Maximum delay for EOS detection (8 seconds) + +### User Preferences +- `transcriptionMode` - STT model preference (passed to both Phase 1 and Phase 2) +- `removeFillerWords` - Filter filler words (applied in both phases) +- `keepSegments` - Debug option to save audio files to disk + +--- + +## Testing Considerations + +When testing dual-phase transcription: + +1. **Phase 1 Accuracy**: Test with short phrases to verify live streaming responsiveness +2. **Phase 2 Accuracy**: Test with longer utterances and verify refinement improves accuracy +3. **Multiple Refinements**: Test false-positive EOS scenarios (brief pauses mid-sentence) +4. **Field Switching**: Verify refinements complete for previous field when switching +5. **Buffer Limits**: Test 120-second limit with extended dictation +6. **Error Recovery**: Test network failures during each phase +7. 
**Manual Edits**: Verify manual edits terminate dictation and clear buffers + +### Mock Requirements +- Mock Chrome extension APIs (`chrome.runtime.sendMessage`) +- Mock EventBus for Phase 1 events +- Mock TranscriptionModule functions for API responses +- Use JSDOM for DOM manipulation testing + +--- + +## Performance Notes + +### Memory Management +- Audio buffers automatically trim when exceeding 120s per target +- Refinement metadata cleaned up after completion/failure +- Phase 1 transcripts cleared when replaced by Phase 2 + +### Network Optimization +- Phase 1: Many small requests (optimized for latency) +- Phase 2: Fewer large requests (optimized for accuracy) +- No duplicate audio uploads (Phase 2 uses buffered segments) + +### User Experience +- Live streaming provides immediate feedback (no "dead air") +- Refinements improve accuracy without user intervention +- Multiple refinement passes handle natural speech pauses +- Pre-existing text preserved across refinements + +--- + +## Future Enhancements + +Potential improvements to the dual-phase system: + +1. **Incremental Refinement**: Only re-transcribe new segments since last refinement +2. **Adaptive Buffering**: Adjust 120s limit based on available memory +3. **Confidence Scoring**: Display visual indicators for Phase 1 vs Phase 2 text +4. **Smart EOS**: Improve endpoint detection using linguistic features +5. 
**Batch Refinement**: Refine multiple targets in a single request diff --git a/src/TranscriptionModule.ts b/src/TranscriptionModule.ts index 9cfbab5675..771f21d294 100644 --- a/src/TranscriptionModule.ts +++ b/src/TranscriptionModule.ts @@ -210,6 +210,7 @@ export async function uploadAudioWithRetry( clientReceiveTimestamp?: number, inputType?: string, inputLabel?: string, + onSequenceNumber?: (sequenceNumber: number) => void, ): Promise { let retryCount = 0; let delay = 1000; // initial delay of 1 second @@ -240,6 +241,16 @@ export async function uploadAudioWithRetry( while (retryCount < maxRetries) { try { usedSequenceNumber = transcriptionSent(); + if (onSequenceNumber) { + try { + onSequenceNumber(usedSequenceNumber); + } catch (callbackError) { + logger.error( + "[TranscriptionModule] onSequenceNumber callback threw an error", + callbackError + ); + } + } await uploadAudio( audioBlob, audioDurationMillis, @@ -297,6 +308,194 @@ export async function uploadAudioWithRetry( throw new Error("Max retries reached"); } +/** + * Upload audio for refinement (Phase 2). + * Uses UUID tracking instead of sequence numbers. No precedingTranscripts sent. 
+ */ +export async function uploadAudioForRefinement( + audioBlob: Blob, + audioDurationMillis: number, + requestId: string, + sessionId?: string, + maxRetries: number = 3 +): Promise { + let retryCount = 0; + let delay = 1000; // initial delay of 1 second + const transcriptionStartTimestamp = Date.now(); + + // Emit refinement started event (moved to outer function to avoid multiple emissions on retry) + EventBus.emit("saypi:refinement:started", { + requestId, + timestamp: transcriptionStartTimestamp, + audioDurationMs: audioDurationMillis, + audioBytes: audioBlob.size, + }); + + const sleep = (ms: number) => + new Promise((resolve) => setTimeout(resolve, ms)); + + while (retryCount < maxRetries) { + try { + const transcriptionText = await uploadAudioForRefinementInternal( + audioBlob, + audioDurationMillis, + requestId, + sessionId, + transcriptionStartTimestamp + ); + + // Emit refinement-specific completion event + EventBus.emit("saypi:refinement:completed", { + requestId, + text: transcriptionText, + }); + + return transcriptionText; + } catch (error) { + // check for timeout errors (30s on Heroku) + if ( + error instanceof TypeError && + knownNetworkErrorMessages.includes(error.message) + ) { + logger.info( + `[Refinement ${requestId}] Attempt ${retryCount + 1}/${maxRetries} failed. Retrying in ${ + delay / 1000 + } seconds...` + ); + await sleep(delay); + + // Exponential backoff + delay *= 2; + + retryCount++; + } else { + console.error(`[Refinement ${requestId}] Unexpected error:`, error); + // Emit refinement-specific failure event + EventBus.emit("saypi:refinement:failed", { + requestId, + error, + }); + throw error; // Re-throw non-network errors to exit the retry loop + } + } + } + + logger.error(`[Refinement ${requestId}] Max retries reached. 
Giving up.`); + EventBus.emit("saypi:refinement:failed", { + requestId, + error: new Error("Max retries reached"), + }); + throw new Error("Max retries reached"); +} + +/** + * Internal refinement upload (bare-bones request). + * No sequence numbers, precedingTranscripts, or acceptsMerge. + */ +async function uploadAudioForRefinementInternal( + audioBlob: Blob, + audioDurationMillis: number, + requestId: string, + sessionId?: string, + transcriptionStartTimestamp?: number +): Promise { + try { + const chatbot = await ChatbotService.getChatbot(); + + // Build minimal FormData (no sequence number, no messages, no acceptsMerge) + const formData = new FormData(); + let audioFilename = "audio.webm"; + if (audioBlob.type === "audio/mp4") { + audioFilename = "audio.mp4"; + } else if (audioBlob.type === "audio/wav") { + audioFilename = "audio.wav"; + } + + formData.append("audio", audioBlob, audioFilename); + formData.append("duration", (audioDurationMillis / 1000).toString()); + formData.append("requestId", requestId); // UUID for correlation + + if (sessionId) { + formData.append("sessionId", sessionId); + } + + // Add minimal usage metadata + try { + const usageMeta = await buildUsageMetadata(chatbot); + if (usageMeta.clientId) formData.append("clientId", usageMeta.clientId); + if (usageMeta.version) formData.append("version", usageMeta.version); + if (usageMeta.app) formData.append("app", usageMeta.app); + if (usageMeta.language) formData.append("language", usageMeta.language); + } catch (error) { + logger.warn(`[Refinement ${requestId}] Failed to add usage metadata:`, error); + } + + // Get user preferences for transcription + const preference = userPreferences.getCachedTranscriptionMode(); + if (preference) { + formData.append("prefer", preference); + } + + // Remove filler words if enabled + const removeFiller = userPreferences.getCachedRemoveFillerWords(); + if (removeFiller) { + formData.append("removeFillerWords", "true"); + } + + logger.debug( + `[Refinement 
${requestId}] Uploading ${(audioBlob.size / 1024).toFixed(2)}kb of audio` + ); + + const controller = new AbortController(); + const { signal } = controller; + setTimeout(() => controller.abort(), TIMEOUT_MS); + + const startTime = Date.now(); + + // Build URL params + const usageMeta = await buildUsageMetadata(chatbot); + const params = new URLSearchParams(); + if (usageMeta.app) params.set("app", usageMeta.app); + if (usageMeta.language) params.set("language", usageMeta.language); + + const response = await callApi( + `${config.apiServerUrl}/transcribe${params.toString() ? `?${params.toString()}` : ""}`, + { + method: "POST", + body: formData, + signal, + } + ); + + if (!response.ok) { + throw new Error(`HTTP ${response.status}: ${response.statusText}`); + } + + const responseJson = await response.json(); + const endTime = Date.now(); + const transcriptionDurationMillis = endTime - startTime; + const transcript = responseJson.text; + const wc = transcript.split(" ").length; + + logger.debug( + `[Refinement ${requestId}] Transcribed ${Math.round( + audioDurationMillis / 1000 + )}s of audio into ${wc} words in ${Math.round( + transcriptionDurationMillis / 1000 + )}s` + ); + + if (transcript.length === 0) { + logger.warn(`[Refinement ${requestId}] Received empty transcription`); + } + + return transcript; + } catch (error) { + logger.error(`[Refinement ${requestId}] Upload failed:`, error); + throw error; + } +} + async function uploadAudio( audioBlob: Blob, audioDurationMillis: number, diff --git a/src/UniversalDictationModule.ts b/src/UniversalDictationModule.ts index 60c684f90d..bb39e5c3e6 100644 --- a/src/UniversalDictationModule.ts +++ b/src/UniversalDictationModule.ts @@ -1,6 +1,12 @@ import { Observation } from "./dom/Observation"; import { addChild } from "./dom/DOMModule"; -import { createDictationMachine } from "./state-machines/DictationMachine"; +import { + createDictationMachine, + DictationTranscribedEvent, + DictationSpeechStoppedEvent, + 
DictationAudioConnectedEvent, + DictationSessionAssignedEvent, +} from "./state-machines/DictationMachine"; import { interpret } from "xstate"; import EventBus from "./events/EventBus.js"; import { IconModule } from "./icons/IconModule"; @@ -328,6 +334,19 @@ export class UniversalDictationModule { }; const hideButton = () => { + // Trigger refinement if dictation is active for this element + if (target.machine) { + const state = target.machine.getSnapshot(); + // Check if machine is in a state where refinement makes sense + if (state.matches("listening")) { + console.debug("[UniversalDictation] Field blur - triggering refinement for element:", element); + target.machine.send({ + type: "saypi:refineTranscription", + targetElement: element, + }); + } + } + // Use setTimeout to delay hiding so click event can fire first setTimeout(() => { if (button && !this.currentActiveTarget) { @@ -387,14 +406,14 @@ export class UniversalDictationModule { element.addEventListener("input", handleContentChange); // Listen for dictation updates to track dictated content - EventBus.on("dictation:contentUpdated", (data) => { + EventBus.on("dictation:contentUpdated", (data: { targetElement: HTMLElement }) => { if (data.targetElement === element) { markDictationUpdate(); } }); - + // Listen for dictation termination due to manual edit - EventBus.on("dictation:terminatedByManualEdit", (data) => { + EventBus.on("dictation:terminatedByManualEdit", (data: { targetElement: HTMLElement; reason: string }) => { if (data.targetElement === element && this.currentActiveTarget?.element === element) { console.debug("Dictation terminated due to manual edit on element:", element); // Clean up the active dictation state @@ -1026,7 +1045,7 @@ export class UniversalDictationModule { // Events with additional data [USER_STOPPED_SPEAKING, AUDIO_DEVICE_CONNECTED, SESSION_ASSIGNED].forEach((eventName) => { - EventBus.on(eventName, (detail) => { + EventBus.on(eventName, (detail: Omit | Omit | Omit) => { if 
(detail) { // sanitise the detail object to replace any `frames` property with `[REDACTED]` const sanitisedDetail = { ...detail }; @@ -1042,7 +1061,7 @@ export class UniversalDictationModule { }); // Listen for transcription events - EventBus.on("saypi:transcription:completed", (detail) => { + EventBus.on("saypi:transcription:completed", (detail: Omit) => { logger.debug(`[UniversalDictationModule] Forwarding transcription to dictation machine`, detail); dictationService.send({ type: "saypi:transcribed", ...detail }); }); @@ -1052,6 +1071,17 @@ export class UniversalDictationModule { dictationService.send("saypi:transcribeFailed"); }); + // Listen for refinement events (Phase 2 dual-phase transcription) + // Refinements are handled internally by DictationMachine via Promise callbacks + // These listeners are for telemetry/debugging only + EventBus.on("saypi:refinement:completed", (detail: {requestId: string, text: string}) => { + logger.debug(`[UniversalDictationModule] Refinement ${detail.requestId} completed: ${detail.text.substring(0, 50)}...`); + }); + + EventBus.on("saypi:refinement:failed", (detail: {requestId: string, error: any}) => { + logger.warn(`[UniversalDictationModule] Refinement ${detail.requestId} failed:`, detail.error); + }); + EventBus.on("saypi:transcribedEmpty", () => { logger.debug(`[UniversalDictationModule] Forwarding empty transcription to dictation machine`); dictationService.send("saypi:transcribedEmpty"); diff --git a/src/audio/AudioSegmentPersistence.ts b/src/audio/AudioSegmentPersistence.ts new file mode 100644 index 0000000000..e771798da8 --- /dev/null +++ b/src/audio/AudioSegmentPersistence.ts @@ -0,0 +1,49 @@ +/** + * Shared utility for persisting audio segments to disk for debugging. + * Used by AudioInputMachine (VAD segments) and DictationMachine (refinement chunks). + */ + +import { config } from "../ConfigModule"; + +/** + * Persists an audio blob to disk via the background script's downloads API. 
+ * Only saves if the `keepSegments` config is enabled. + * + * @param audioBlob - The audio blob to save + * @param captureTimestamp - When the audio was captured (or refinement started) + * @param duration - Duration of the audio in milliseconds + * @param prefix - Filename prefix (e.g., "saypi-segment" or "saypi-refinement") + */ +export function persistAudioSegment( + audioBlob: Blob, + captureTimestamp: number, + duration: number, + prefix: string = "saypi-segment" +): void { + try { + const keep = config.keepSegments === true || config.keepSegments === 'true'; + if (!keep || audioBlob.size === 0) { + return; + } + + // Create a unique filename with timestamps + const startedAt = captureTimestamp - Math.round(duration); + const startedIso = new Date(startedAt).toISOString().replace(/[:.]/g, "-"); + const endedIso = new Date(captureTimestamp).toISOString().replace(/[:.]/g, "-"); + const filename = `SayPiSegments/${prefix}_${startedIso}_to_${endedIso}_${Math.round(duration)}ms.wav`; + + const reader = new FileReader(); + reader.onloadend = () => { + const base64Data = (reader.result as string).split(",")[1]; + // Send to background to save via downloads API + chrome.runtime.sendMessage({ + type: "SAVE_SEGMENT_WAV", + filename, + base64: base64Data + }, () => void 0); + }; + reader.readAsDataURL(audioBlob); + } catch (e) { + console.warn(`Failed to persist ${prefix} locally:`, e); + } +} diff --git a/src/state-machines/AudioInputMachine.ts b/src/state-machines/AudioInputMachine.ts index 98c4afa7a2..c655a2d8f8 100644 --- a/src/state-machines/AudioInputMachine.ts +++ b/src/state-machines/AudioInputMachine.ts @@ -11,7 +11,7 @@ import { logger } from "../LoggingModule"; import { likelySupportsOffscreen, getBrowserInfo } from "../UserAgentModule"; import { VADPreset } from "../vad/VADConfigs"; import { ChatbotIdentifier } from "../chatbots/ChatbotIdentifier"; -import { config } from "../ConfigModule"; +import { persistAudioSegment } from 
"../audio/AudioSegmentPersistence"; setupInterceptors(); @@ -137,29 +137,7 @@ EventBus.on("saypi:userStoppedSpeaking", (data: { logger.debug(`Reconstructed Blob size: ${audioBlob.size} bytes`); // Optionally persist the segment if keepSegments is enabled - try { - const keep = String(config.keepSegments || '').toLowerCase() === 'true'; - if (keep && audioBlob.size > 0) { - // Create a unique filename with timestamps - const startedAt = data.captureTimestamp - Math.round(data.duration); - const startedIso = new Date(startedAt).toISOString().replace(/[:.]/g, "-"); - const endedIso = new Date(data.captureTimestamp).toISOString().replace(/[:.]/g, "-"); - const filename = `SayPiSegments/saypi-segment_${startedIso}_to_${endedIso}_${Math.round(data.duration)}ms.wav`; - const reader = new FileReader(); - reader.onloadend = () => { - const base64Data = (reader.result as string).split(",")[1]; - // Send to background to save via downloads API - chrome.runtime.sendMessage({ - type: "SAVE_SEGMENT_WAV", - filename, - base64: base64Data - }, () => void 0); - }; - reader.readAsDataURL(audioBlob); - } - } catch (e) { - console.warn("Failed to persist segment locally:", e); - } + persistAudioSegment(audioBlob, data.captureTimestamp, data.duration, "saypi-segment"); // Emit both blob and duration for transcription EventBus.emit("audio:dataavailable", { diff --git a/src/state-machines/ConversationMachine.ts b/src/state-machines/ConversationMachine.ts index e3e904dd63..58903d8108 100644 --- a/src/state-machines/ConversationMachine.ts +++ b/src/state-machines/ConversationMachine.ts @@ -1096,7 +1096,10 @@ const machine = createMachine; // Global transcriptions for backwards compatibility @@ -87,6 +117,7 @@ interface DictationContext { userIsSpeaking: boolean; timeUserStoppedSpeaking: number; timeUserStartedSpeaking: number; // Track when current speech started + timeLastTranscriptionReceived: number; // Track when last transcription was received (for endpoint timing) sessionId?: string; 
targetElement?: HTMLElement; // The input field being dictated to accumulatedText: string; // Text accumulated during this dictation session @@ -105,6 +136,16 @@ interface DictationContext { * that we always know which target the very first portion of audio belongs to. */ speechStartTarget?: HTMLElement; + + // Phase 2 (Refinement) - See doc/DUAL_PHASE_TRANSCRIPTION.md + audioSegmentsByTarget: Record; + refinementPendingForTargets: Set; + pendingRefinements: Map; } // Define the state schema @@ -130,6 +171,7 @@ type DictationStateSchema = { states: { transcribing: {}; accumulating: {}; + refining: {}; }; }; }; @@ -328,6 +370,84 @@ function getTranscriptionsForTarget(context: DictationContext, targetElement: HT return context.transcriptionsByTarget[targetId] || {}; } +/** + * Handle completion of a refinement request (Phase 2). + * Replaces Phase 1 transcriptions with the refined result. + * Note: Multiple passes may occur per target (false-positive EOS detection). + */ +function handleRefinementComplete( + context: DictationContext, + requestId: string, + transcription: string +): void { + const meta = context.pendingRefinements.get(requestId); + if (!meta) { + console.warn(`[DictationMachine] No metadata found for refinement ${requestId} - already cleaned up?`); + return; + } + + const { targetId, targetElement, segmentCount } = meta; + + console.debug( + `[DictationMachine] Received refinement transcription [${requestId}] for target ${targetId}: ${transcription}` + ); + + // Normalize the refined transcription + transcription = normalizeTranscriptionText(transcription); + + // Get Phase 1 sequences for this target + const oldTranscriptions = context.transcriptionsByTarget[targetId] || {}; + const phase1Sequences = Object.keys(oldTranscriptions) + .map(k => parseInt(k, 10)) + .filter(seq => seq > 0); // Only clear Phase 1 (positive sequences), not previous refinements (negative keys) + + // Clear Phase 1 transcriptions from global storage + 
phase1Sequences.forEach(seq => { + delete context.transcriptions[seq]; + delete context.transcriptionTargets[seq]; + }); + + console.debug( + `[DictationMachine] Cleared ${phase1Sequences.length} Phase 1 transcriptions: [${phase1Sequences.join(', ')}]` + ); + + // Store refinement result using negative timestamp as key (avoids collision with Phase 1 sequences) + const refinementKey = -(Date.now()); + context.transcriptionsByTarget[targetId] = { + [refinementKey]: transcription + }; + context.transcriptions[refinementKey] = transcription; + context.transcriptionTargets[refinementKey] = targetElement; + + // Calculate final text (initial + refinement) + const initialText = context.initialTextByTarget[targetId] || ""; + const finalText = smartJoinTwoTexts(initialText, transcription); + + setTextInTarget(finalText, targetElement, true); // Replace all content + + // Update accumulated text if this is the current target + if (targetElement === context.targetElement) { + context.accumulatedText = finalText; + } + + // Clean up refinement metadata + context.pendingRefinements.delete(requestId); + + // Emit refinement complete event + EventBus.emit("dictation:refined", { + targetElement, + targetId, + requestId, + refinedText: transcription, + finalText, + segmentCount + }); + + console.debug( + `[DictationMachine] Refinement ${requestId} complete for target ${targetId}. Final text: ${finalText}` + ); +} + function mapTargetForSequence( context: DictationContext, expectedSequenceNumber: number, @@ -357,6 +477,108 @@ function mapTargetForSequence( return finalTarget; } +// Maximum audio buffer per target (120s) - prevents unbounded memory growth +const MAX_AUDIO_BUFFER_DURATION_MS = 120000; + +// Maximum delay for refinement endpoint detection (8s) +// Longer than prompt-based interactions to reduce premature refinements during continuous dictation +const REFINEMENT_MAX_DELAY_MS = 8000; + +/** + * Store audio segment for later refinement (Phase 2). 
+ * Buffers accumulate up to MAX_AUDIO_BUFFER_DURATION_MS and persist across EOS events. + */ +function storeAudioSegment( + context: DictationContext, + targetElement: HTMLElement, + blob: Blob, + frames: Float32Array, + duration: number, + sequenceNumber: number, + captureTimestamp?: number +): void { + const targetId = getTargetElementId(targetElement); + + // Initialize array for this target if it doesn't exist + if (!context.audioSegmentsByTarget[targetId]) { + context.audioSegmentsByTarget[targetId] = []; + } + + const segments = context.audioSegmentsByTarget[targetId]; + + // Calculate total duration including the new segment + const currentTotalDuration = segments.reduce((sum, seg) => sum + seg.duration, 0); + const newTotalDuration = currentTotalDuration + duration; + + // If adding this segment would exceed the max buffer duration, trim old segments + if (newTotalDuration > MAX_AUDIO_BUFFER_DURATION_MS) { + let excessDuration = newTotalDuration - MAX_AUDIO_BUFFER_DURATION_MS; + let segmentsToRemove = 0; + + // Remove oldest segments until we're under the limit + for (let i = 0; i < segments.length && excessDuration > 0; i++) { + excessDuration -= segments[i].duration; + segmentsToRemove++; + } + + if (segmentsToRemove > 0) { + const removed = segments.splice(0, segmentsToRemove); + console.debug( + `Trimmed ${segmentsToRemove} old audio segments for target ${targetId} to stay under ${MAX_AUDIO_BUFFER_DURATION_MS}ms limit. ` + + `Removed ${removed.reduce((sum, seg) => sum + seg.duration, 0)}ms of audio.` + ); + } + } + + // Store the new segment + segments.push({ + blob, + frames, + duration, + sequenceNumber, + captureTimestamp, + }); + + // Mark this target as pending refinement + context.refinementPendingForTargets.add(targetId); + + const totalDuration = segments.reduce((sum, seg) => sum + seg.duration, 0); + console.debug( + `Stored audio segment ${sequenceNumber} for target ${targetId}. 
Total: ${segments.length} segments, ${(totalDuration / 1000).toFixed(1)}s of audio` + ); +} + +/** + * Clear audio buffers for a specific target element. + * @param context - The dictation context + * @param targetId - The target element ID to clear buffers for + */ +function clearAudioForTarget(context: DictationContext, targetId: string): void { + delete context.audioSegmentsByTarget[targetId]; + context.refinementPendingForTargets.delete(targetId); + + // Clear any pending refinements for this target (UUID-based tracking) + for (const [requestId, meta] of context.pendingRefinements.entries()) { + if (meta.targetId === targetId) { + context.pendingRefinements.delete(requestId); + console.debug(`Cleared pending refinement ${requestId} for target ${targetId}`); + } + } + + console.debug(`Cleared audio buffers for target ${targetId}`); +} + +/** + * Clear all audio buffers. + * @param context - The dictation context + */ +function clearAllAudioBuffers(context: DictationContext): void { + context.audioSegmentsByTarget = {}; + context.refinementPendingForTargets.clear(); + context.pendingRefinements.clear(); + console.debug('Cleared all audio buffers'); +} + /** * Common helper for preparing and uploading an audio segment. 
*/ @@ -368,7 +590,8 @@ function uploadAudioSegment( sessionId?: string, maxRetries: number = 3, captureTimestamp?: number, - clientReceiveTimestamp?: number + clientReceiveTimestamp?: number, + frames?: Float32Array ) { const expectedSequenceNumber = getCurrentSequenceNumber() + 1; const finalTarget = mapTargetForSequence( @@ -387,10 +610,23 @@ function uploadAudioSegment( )}` ); - // Extract input context for dictation mode + // Extract input context for dictation mode (Phase 1) const { inputType, inputLabel } = getInputContext(finalTarget); console.debug(`Input context for transcription: type="${inputType}", label="${inputLabel}"`); + // Store audio segment for Phase 2 refinement if frames are available + if (frames) { + storeAudioSegment( + context, + finalTarget, + audioBlob, + frames, + duration, + expectedSequenceNumber, + captureTimestamp + ); + } + uploadAudioWithRetry( audioBlob, duration, @@ -400,7 +636,14 @@ function uploadAudioSegment( captureTimestamp, clientReceiveTimestamp, inputType || undefined, - inputLabel || undefined + inputLabel || undefined, + (sequenceNum) => { + // Keep transcription target mapping in sync even if sequence numbers shift + if (sequenceNum !== expectedSequenceNumber) { + delete context.transcriptionTargets[expectedSequenceNumber]; + } + context.transcriptionTargets[sequenceNum] = finalTarget; + } ).then((sequenceNum) => { console.debug(`Sent transcription ${sequenceNum} to target`, finalTarget); if (sequenceNum !== expectedSequenceNumber) { @@ -860,11 +1103,15 @@ const machine = createMachine(), + pendingRefinements: new Map(), }, id: "dictation", initial: "idle", @@ -1086,6 +1333,13 @@ const machine = createMachine { + // Update the timestamp for endpoint detection + context.timeLastTranscriptionReceived = Date.now(); + let transcription = event.text; const sequenceNumber = event.sequenceNumber; const mergedSequences = event.merged || []; + + // NOTE: Refinement responses bypass event bus (handled in 
performContextualRefinement). + // This handler ONLY processes Phase 1 (live streaming) transcriptions. + // ---- NORMALISE ELLIPSES ---- // Convert any ellipsis—either the single Unicode "…" character or the // three-dot sequence "..." — into a single space so downstream merging // sees consistent whitespace. Then collapse *spaces or tabs* (but not // line breaks) and trim the string. const originalTranscription = transcription; - transcription = transcription - .replace(/\u2026/g, " ") // "…" → space - .replace(/\.{3}/g, " ") // "..." → space - .replace(/[ \t]{2,}/g, " ") // collapse runs of spaces/tabs but keep line-breaks - .trim(); + transcription = normalizeTranscriptionText(transcription); console.debug( `Dictation transcript [${sequenceNumber}]: ${transcription}` + @@ -1374,29 +1657,36 @@ const machine = createMachine undefined, accumulatedText: "", transcriptionTargets: () => ({}), provisionalTranscriptionTarget: () => undefined, targetSwitchesDuringSpeech: () => undefined, speechStartTarget: () => undefined, + audioSegmentsByTarget: () => ({}), + refinementPendingForTargets: () => new Set(), + pendingRefinements: () => new Map(), }), finalizeDictation: (context: DictationContext) => { // Generate final merged text from current target's transcriptions let finalText = context.accumulatedText; - + if (context.targetElement) { const targetTranscriptions = getTranscriptionsForTarget(context, context.targetElement); finalText = computeFinalText(targetTranscriptions, [], finalText, "", false); } - + + // Clear all audio buffers when dictation is finalized + clearAllAudioBuffers(context); + // Emit event that dictation is complete EventBus.emit("dictation:complete", { targetElement: context.targetElement, text: finalText, }); - + console.log("Dictation completed for target:", context.targetElement, "with text:", finalText); }, @@ -1504,7 +1794,8 @@ const machine = createMachine { + console.debug("[DictationMachine] performContextualRefinement triggered"); + + // 
Determine which target(s) to refine + let targetsToRefine: HTMLElement[] = []; + + if (event.type === "saypi:refineTranscription") { + // Explicit refinement request for a specific target + targetsToRefine = [(event as DictationRefineTranscriptionEvent).targetElement]; + } else { + // Endpoint-triggered: refine ALL pending targets (not just current one) + // This handles the case where user switched targets mid-dictation + for (const targetId of context.refinementPendingForTargets) { + // Find the target element by looking through transcription targets + const targetElement = Object.values(context.transcriptionTargets).find( + el => getTargetElementId(el) === targetId + ); + + if (targetElement) { + targetsToRefine.push(targetElement); + } else { + console.warn(`[DictationMachine] No element found for pending refinement target ${targetId}`); + context.refinementPendingForTargets.delete(targetId); + } + } + } + + if (targetsToRefine.length === 0) { + console.debug("[DictationMachine] No targets to refine"); + return; + } + + // Process each target + for (const targetElement of targetsToRefine) { + const targetId = getTargetElementId(targetElement); + const segments = context.audioSegmentsByTarget[targetId]; + + if (!segments || segments.length === 0) { + console.debug(`[DictationMachine] No audio segments to refine for target ${targetId}`); + // Clear the pending flag even if no segments (cleanup) + context.refinementPendingForTargets.delete(targetId); + continue; + } + + // Skip refinement if only 1 segment (no additional context for improvement) + if (segments.length === 1) { + console.debug(`[DictationMachine] Skipping refinement for target ${targetId} until more segments arrive`); + continue; // Keep buffering, don't clear + } + + // Remove pending flag now that refinement is in-flight to avoid duplicate submissions + context.refinementPendingForTargets.delete(targetId); + + // Always refine ALL segments for maximum context (full contextual refinement) + 
console.debug( + `[DictationMachine] Starting refinement for target ${targetId} with ${segments.length} segments (full context)` + ); + + // Concatenate ALL audio segments from session start + const totalDuration = segments.reduce((sum, seg) => sum + seg.duration, 0); + const totalFrames = segments.reduce((sum, seg) => sum + seg.frames.length, 0); + + // Combine all frames into a single Float32Array + const combinedFrames = new Float32Array(totalFrames); + let offset = 0; + for (const segment of segments) { + combinedFrames.set(segment.frames, offset); + offset += segment.frames.length; + } + + // Convert combined frames to WAV blob + const combinedBlob = convertToWavBlob(combinedFrames); + + console.debug( + `[DictationMachine] Concatenated ${segments.length} segments: ${totalDuration}ms, ${totalFrames} frames, ${combinedBlob.size} bytes` + ); + + // For logging, treat the capture timestamp as the time we initiate refinement. + // Refinement intentionally reuses the *current* timestamp so Telemetry treats the + // Phase 2 upload as new work rather than flagging the earlier capture delay. 
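The frame-concatenation step above is a plain pack-and-copy into one contiguous buffer. A self-contained sketch (the function name `concatFrames` is invented here for illustration; it is not part of the module):

```typescript
// Pack multiple buffered Float32Array segments into a single contiguous array,
// as done before WAV encoding: allocate once, then copy each segment at its
// running offset so sample order is preserved.
function concatFrames(segments: Float32Array[]): Float32Array {
  const total = segments.reduce((sum, s) => sum + s.length, 0);
  const combined = new Float32Array(total);
  let offset = 0;
  for (const s of segments) {
    combined.set(s, offset); // copy this segment's samples at the current offset
    offset += s.length;
  }
  return combined;
}
```

A single allocation plus `TypedArray.set` avoids repeated reallocation that an incremental append would incur.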
+ const refinementStartTimestamp = Date.now(); + + // Optionally persist the refinement chunk if keepSegments is enabled + persistAudioSegment(combinedBlob, refinementStartTimestamp, totalDuration, "saypi-refinement"); + + // Generate UUID for this refinement request (separate from Phase 1 sequence tracking) + const requestId = crypto.randomUUID(); + + // Track refinement metadata independently + context.pendingRefinements.set(requestId, { + targetId, + targetElement, + segmentCount: segments.length, + timestamp: refinementStartTimestamp + }); + + console.debug( + `[DictationMachine] Refinement ${requestId} initiated for target ${targetId}` + ); + + // Upload using bare-bones refinement function (no sequence number tracking) + uploadAudioForRefinement( + combinedBlob, + totalDuration, + requestId, + context.sessionId, + 3 // max retries + ).then((transcriptionText) => { + // Handle response inline (no event bus routing needed) + handleRefinementComplete(context, requestId, transcriptionText); + }).catch((error) => { + console.error(`[DictationMachine] Refinement ${requestId} failed:`, error); + // Clean up metadata on failure + context.pendingRefinements.delete(requestId); + // Note: We don't clear audio segments - they may be retried later + }); + } // end for loop over targetsToRefine + }, }, services: {}, guards: { @@ -1650,8 +2068,69 @@ const machine = createMachine { + // Check if we have pending refinements and not currently transcribing + return context.refinementPendingForTargets.size > 0 && !context.isTranscribing; + }, + hasSegmentsForRefinement: (context: DictationContext, event: DictationEvent) => { + if (event.type !== "saypi:refineTranscription") { + return false; + } + const targetElement = + (event as DictationRefineTranscriptionEvent).targetElement ?? 
context.targetElement; + if (!targetElement) { + return false; + } + const targetId = getTargetElementId(targetElement); + const segments = context.audioSegmentsByTarget[targetId]; + return Array.isArray(segments) && segments.length > 0; + }, + }, + delays: { + refinementDelay: (context: DictationContext, event: DictationEvent) => { + // Only calculate delay for transcription events + if (event.type !== "saypi:transcribed") { + return 0; + } + + const transcriptionEvent = event as DictationTranscribedEvent; + + // Use configured max delay for dictation endpoint detection + const maxDelay = REFINEMENT_MAX_DELAY_MS; + + // Use pFinishedSpeaking from API, default to 1 if not provided + let probabilityFinished = transcriptionEvent.pFinishedSpeaking ?? 1; + + // Use tempo from API, default to 0 if not provided (neutral) + let tempo = transcriptionEvent.tempo ?? 0; + // Clamp tempo to [0, 1] + tempo = Math.max(0, Math.min(1, tempo)); + + const scheduledAt = Date.now(); + const timeElapsed = scheduledAt - context.timeLastTranscriptionReceived; + const finalDelay = calculateDelay( + context.timeLastTranscriptionReceived, + probabilityFinished, + tempo, + maxDelay + ); + + console.debug( + "[DictationMachine] refinementDelay:", + JSON.stringify({ + seq: transcriptionEvent.sequenceNumber, + pFinished: probabilityFinished, + tempo, + maxDelay, + timeElapsed, + finalDelay, + scheduledAt + }) + ); + + return finalDelay; + }, }, - delays: {}, } ); @@ -1660,4 +2139,4 @@ export function createDictationMachine(targetElement?: HTMLElement) { return machine; } -export { machine as DictationMachine }; \ No newline at end of file +export { machine as DictationMachine }; diff --git a/test/state-machines/DictationMachine-Refinement.spec.ts b/test/state-machines/DictationMachine-Refinement.spec.ts new file mode 100644 index 0000000000..34dc6896a2 --- /dev/null +++ b/test/state-machines/DictationMachine-Refinement.spec.ts @@ -0,0 +1,1194 @@ +import { describe, it, expect, vi, beforeEach, 
afterEach, beforeAll } from 'vitest'; +import { interpret } from 'xstate'; +import EventBus from '../../src/events/EventBus.js'; + +// Mock dependencies +vi.mock('../../src/TranscriptionModule', () => ({ + uploadAudioWithRetry: vi.fn((...args: any[]) => { + const callback = args[9]; + if (typeof callback === 'function') { + callback(1); + } + return Promise.resolve(1); + }), + uploadAudioForRefinement: vi.fn((blob, duration, requestId) => { + // Mock refinement upload - returns full transcription of audio + return Promise.resolve('Refined transcription text'); + }), + isTranscriptionPending: vi.fn(() => false), + clearPendingTranscriptions: vi.fn(), + getCurrentSequenceNumber: vi.fn(() => 0), +})); + +vi.mock('../../src/ConfigModule', () => ({ + config: { + apiServerUrl: 'http://localhost:3000', + }, +})); + +vi.mock('../../src/prefs/PreferenceModule', () => ({ + UserPreferenceModule: { + getInstance: () => ({ + getLanguage: vi.fn(() => Promise.resolve('en')), + }), + }, +})); + +vi.mock('../../src/error-management/TranscriptionErrorManager', () => ({ + default: { + recordAttempt: vi.fn(), + }, +})); + +vi.mock('../../src/TranscriptMergeService', () => ({ + TranscriptMergeService: vi.fn().mockImplementation(() => ({ + mergeTranscriptsLocal: vi.fn((transcripts) => { + return Object.keys(transcripts) + .sort((a, b) => parseInt(a) - parseInt(b)) + .map(key => transcripts[key]) + .join(' '); + }), + })), +})); + +vi.mock('../../src/audio/AudioEncoder', () => ({ + convertToWavBlob: vi.fn((frames: Float32Array) => { + // Return a mock blob with size proportional to frame count + return new Blob([new ArrayBuffer(frames.length * 4)], { type: 'audio/wav' }); + }), +})); + +vi.mock('../../src/TimerModule', () => ({ + calculateDelay: vi.fn(() => 100), // Short delay for testing +})); + +// Mock EventBus +vi.spyOn(EventBus, 'emit'); + +// Import the machine after mocks are set up +import { createDictationMachine } from '../../src/state-machines/DictationMachine'; +import * as 
TranscriptionModule from '../../src/TranscriptionModule'; +import * as AudioEncoder from '../../src/audio/AudioEncoder'; +import * as TimerModule from '../../src/TimerModule'; + +const resolveUpload = (sequence: number) => (...args: any[]) => { + const callback = args[9] as ((seq: number) => void) | undefined; + if (typeof callback === 'function') { + callback(sequence); + } + return Promise.resolve(sequence); +}; + +/** + * NOTE: These tests were updated for UUID-based refinement tracking. + * + * Key changes from sequence-based approach: + * - Refinements no longer use sequence numbers or `saypi:transcribed` events + * - Refinements are uploaded via `uploadAudioForRefinement()` (not `uploadAudioWithRetry()`) + * - Refinement responses are handled via Promise callbacks (not event bus) + * - Refinement tracking uses `pendingRefinements` Map (requestId → metadata) + * - Refinement transcriptions use negative keys to avoid collision with Phase 1 sequences + * + * To test refinement completion: + * 1. Trigger refinement with `saypi:refineTranscription` event + * 2. Wait for `uploadAudioForRefinement` Promise to resolve + * 3. Check `pendingRefinements` Map is cleared + * 4. Check `transcriptionsByTarget` has negative key with refinement text + * 5. 
Check Phase 1 sequences (positive keys) are deleted + */ +describe('DictationMachine - Dual-Phase Refinement', () => { + let service: any; + let inputElement: HTMLInputElement; + + beforeAll(() => { + inputElement = document.createElement('input'); + inputElement.id = 'test-input'; + inputElement.name = 'testField'; + inputElement.placeholder = 'Test input'; + }); + + beforeEach(() => { + vi.clearAllMocks(); + + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockClear(); + vi.mocked(TranscriptionModule.uploadAudioForRefinement).mockClear(); + vi.mocked(TranscriptionModule.uploadAudioForRefinement).mockImplementation((blob, duration, requestId) => { + // Default: Return full refined transcription + return Promise.resolve('refined transcription'); + }); + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(0); + vi.mocked(EventBus.emit).mockClear(); + vi.mocked(AudioEncoder.convertToWavBlob).mockClear(); + vi.mocked(TimerModule.calculateDelay).mockReturnValue(100); + + inputElement.value = ''; + + const machine = createDictationMachine(); + service = interpret(machine); + }); + + afterEach(() => { + if (service) { + service.stop(); + } + }); + + describe('Audio Segment Buffering', () => { + it('should buffer audio segments when frames are provided', async () => { + service.start(); + + // Start dictation + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Speaking events with frames + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + 
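The key-space convention described in the note above (positive sequence numbers for live Phase 1 segments, a negative timestamp for the refinement) can be sketched in isolation. The names `Transcripts` and `applyRefinement` below are invented for illustration and do not appear in the module:

```typescript
// Minimal sketch of the refinement key scheme: Phase 1 entries use positive
// sequence numbers; the refined result is stored under a negative timestamp,
// so the two phases can never collide.
type Transcripts = Record<number, string>;

function applyRefinement(
  byTarget: Record<string, Transcripts>,
  allTranscripts: Transcripts,
  targetId: string,
  refinedText: string,
  now: () => number = Date.now
): number {
  // Remove only Phase 1 entries (positive keys) from the global map.
  for (const key of Object.keys(byTarget[targetId] ?? {}).map(Number)) {
    if (key > 0) delete allTranscripts[key];
  }
  const refinementKey = -now(); // negative => disjoint from sequence numbers
  byTarget[targetId] = { [refinementKey]: refinedText }; // supersede live entries
  allTranscripts[refinementKey] = refinedText;
  return refinementKey;
}
```

Injecting the clock (`now`) keeps the sketch deterministic under test, which is also why the spec below can assert on exact key values.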
vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + // Stop speaking - should buffer the audio + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + // Verify audio was uploaded for Phase 1 + expect(TranscriptionModule.uploadAudioWithRetry).toHaveBeenCalled(); + + // Check that the context has buffered audio + const state = service.getSnapshot(); + const targetId = `${inputElement.id || inputElement.name}`; + expect(state.context.audioSegmentsByTarget[targetId]).toBeDefined(); + expect(state.context.audioSegmentsByTarget[targetId].length).toBe(1); + }); + + it('should not buffer audio segments when frames are missing', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + // Stop speaking WITHOUT frames + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + // No frames parameter + }); + + // Check that no audio was buffered + const state = service.getSnapshot(); + const targetId = `${inputElement.id || inputElement.name}`; + expect(state.context.audioSegmentsByTarget[targetId]).toBeUndefined(); + }); + + it('should trim old segments when exceeding 120s buffer limit', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 
'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add 13 segments of 10 seconds each (130s total, exceeds 120s limit) + for (let i = 0; i < 13; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(10000); + const mockBlob = new Blob([new ArrayBuffer(40000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 10000, // 10 seconds + blob: mockBlob, + frames: mockFrames, + }); + + // Simulate transcription response + service.send({ + type: 'saypi:transcribed', + text: `segment ${i}`, + sequenceNumber: i + 2, + }); + } + + // Check that buffer was trimmed + const state = service.getSnapshot(); + const targetId = `${inputElement.id || inputElement.name}`; + const segments = state.context.audioSegmentsByTarget[targetId]; + + expect(segments).toBeDefined(); + // Should have trimmed the first segment (10s) to stay under 120s + expect(segments.length).toBeLessThanOrEqual(12); + + // Calculate total duration + const totalDuration = segments.reduce((sum: number, seg: any) => sum + seg.duration, 0); + expect(totalDuration).toBeLessThanOrEqual(120000); + }); + }); + + describe('Refinement Delay Calculation', () => { + it('should calculate refinement delay based on pFinishedSpeaking and tempo', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob 
= new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + // Send transcription with endpoint indicators + service.send({ + type: 'saypi:transcribed', + text: 'hello world', + sequenceNumber: 2, + pFinishedSpeaking: 0.9, + tempo: 0.5, + }); + + // Verify calculateDelay was called with correct parameters + expect(TimerModule.calculateDelay).toHaveBeenCalled(); + }); + + it('should not trigger refinement if refinementConditionsMet guard fails', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Don't add any audio segments + + // Try to trigger refinement manually + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Should NOT upload refinement because no segments exist + const uploadCalls = vi.mocked(TranscriptionModule.uploadAudioWithRetry).mock.calls; + const refinementCalls = uploadCalls.filter(call => { + // Refinement calls have empty precedingTranscripts + return Object.keys(call[2] || {}).length === 0; + }); + + expect(refinementCalls.length).toBe(0); + }); + }); + + describe('Audio Concatenation', () => { + it('should concatenate multiple audio segments correctly', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 
'saypi:session:assigned', session_id: 'test-session' }); + + // Add 3 segments + const segmentCount = 3; + for (let i = 0; i < segmentCount; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `segment ${i}`, + sequenceNumber: i * 2 + 2, + }); + } + + // Clear the mock to track refinement upload + vi.mocked(AudioEncoder.convertToWavBlob).mockClear(); + + // Trigger refinement manually + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Wait for async operations + await new Promise(resolve => setTimeout(resolve, 50)); + + // Verify convertToWavBlob was called with concatenated frames + expect(AudioEncoder.convertToWavBlob).toHaveBeenCalled(); + const concatenatedFrames = vi.mocked(AudioEncoder.convertToWavBlob).mock.calls[0][0]; + + // Should have combined 3 segments of 1000 frames each = 3000 frames + expect(concatenatedFrames.length).toBe(3000); + + // Verify refinement upload was called (UUID-based, no sequence numbers) + expect(TranscriptionModule.uploadAudioForRefinement).toHaveBeenCalled(); + const uploadCall = vi.mocked(TranscriptionModule.uploadAudioForRefinement).mock.calls[0]; + + // uploadAudioForRefinement has signature: (blob, duration, requestId, sessionId, maxRetries) + // Check duration parameter (index 1) + expect(uploadCall[1]).toBe(3000); // Total duration should be 3000ms + + // Check requestId is a UUID string (index 2) + expect(typeof uploadCall[2]).toBe('string'); + + // Check blob was passed (index 0) + 
expect(uploadCall[0]).toBeInstanceOf(Blob); + }); + }); + + describe('Refinement Response Handling', () => { + it('should replace Phase 1 transcriptions with refined result (full contextual refinement)', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add 2 Phase 1 segments + for (let i = 0; i < 2; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `phase1 ${i}`, + sequenceNumber: i * 2 + 2, + }); + } + + // Trigger refinement (UUID-based, no sequence number) + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Wait for refinement Promise to resolve + await new Promise(resolve => setTimeout(resolve, 50)); + + // Check that Phase 1 transcriptions were REPLACED (not appended) + const state = service.getSnapshot(); + const targetId = `${inputElement.id || inputElement.name}`; + + // Refinement should be stored with negative key (not sequence number) + const transcriptionKeys = Object.keys(state.context.transcriptionsByTarget[targetId] || {}).map(k => parseInt(k, 10)); + + // Should have exactly 1 transcription (refinement with negative key) + expect(transcriptionKeys.length).toBe(1); + expect(transcriptionKeys[0]).toBeLessThan(0); // Negative timestamp key + + // Phase 1 
transcriptions should be deleted from global storage + expect(state.context.transcriptions[2]).toBeUndefined(); + expect(state.context.transcriptions[4]).toBeUndefined(); + + // Refinement should remain in global storage + const refinementKey = transcriptionKeys[0]; + expect(state.context.transcriptions[refinementKey]).toBe('refined transcription'); + + // Pending refinement metadata should be cleared + expect(state.context.pendingRefinements.size).toBe(0); + }); + + it('should emit dictation:refined event on successful refinement', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add 2 segments (need >=2 for refinement) + for (let i = 0; i < 2; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `phase1 segment ${i}`, + sequenceNumber: i * 2 + 2, + }); + } + + // Clear EventBus mock to track refinement event + vi.mocked(EventBus.emit).mockClear(); + + // Trigger refinement + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Wait for refinement Promise to resolve + await new Promise(resolve => setTimeout(resolve, 50)); + + // Verify dictation:refined event was emitted + const refinedEvents = vi.mocked(EventBus.emit).mock.calls.filter( + call => call[0] === 
'dictation:refined' + ); + + expect(refinedEvents.length).toBeGreaterThan(0); + expect(refinedEvents[0][1]).toMatchObject({ + targetElement: inputElement, + refinedText: 'refined transcription', // From default mock in beforeEach + }); + }); + }); + + describe('State Transitions', () => { + it('should transition from accumulating to refining on explicit trigger', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'hello', + sequenceNumber: 2, + }); + + // Should be in accumulating state + expect(service.getSnapshot().matches({ listening: { converting: 'accumulating' } })).toBe(true); + + // Trigger refinement + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(100)); + + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Should transition to refining state + expect(service.getSnapshot().matches({ listening: { converting: 'refining' } })).toBe(true); + }); + + it('should return to accumulating after refinement response', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 
'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'hello', + sequenceNumber: 2, + }); + + // Trigger refinement + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(100)); + + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + await new Promise(resolve => setTimeout(resolve, 10)); + + // Should be in refining + expect(service.getSnapshot().matches({ listening: { converting: 'refining' } })).toBe(true); + + // Send refinement response + service.send({ + type: 'saypi:transcribed', + text: 'refined', + sequenceNumber: 100, + }); + + // Should return to accumulating + expect(service.getSnapshot().matches({ listening: { converting: 'accumulating' } })).toBe(true); + }); + }); + + describe('Buffer Cleanup', () => { + it('should clear audio buffers when target switches', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + 
vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + // Verify buffer exists + const targetId1 = `${inputElement.id || inputElement.name}`; + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId1]).toBeDefined(); + + // Create new input element + const inputElement2 = document.createElement('input'); + inputElement2.id = 'test-input-2'; + + // Switch target + service.send({ type: 'saypi:switchTarget', targetElement: inputElement2 }); + + // Original target's buffer should still exist (not cleared on switch) + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId1]).toBeDefined(); + }); + + it('should keep audio buffers after refinement (for incremental refinement)', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'hello', + sequenceNumber: 2, + }); + + const targetId = `${inputElement.id || inputElement.name}`; + + // Verify buffer exists + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId]).toBeDefined(); + 
expect(service.getSnapshot().context.audioSegmentsByTarget[targetId].length).toBe(1); + + // Trigger refinement + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(100)); + + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Wait for refinement Promise to resolve + await new Promise(resolve => setTimeout(resolve, 50)); + + // Audio buffer should still exist (kept for future refinements) + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId]).toBeDefined(); + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId].length).toBe(1); + }); + }); + + describe('Error Scenarios', () => { + it('should handle refinement failure gracefully', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add 2 segments (need >=2 for refinement to trigger) + for (let i = 0; i < 2; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `segment ${i}`, + sequenceNumber: i * 2 + 2, + }); + } + + // Clear the mock and set up rejection BEFORE triggering refinement + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockReset(); + + // Make refinement upload fail + 
vi.mocked(TranscriptionModule.uploadAudioForRefinement).mockRejectedValue(new Error('Upload failed')); + + // Manually trigger refinement + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + // Wait for async error handling + await new Promise(resolve => setTimeout(resolve, 100)); + + // With UUID-based tracking, audio buffers are NOT cleared on refinement failure + // (they may be retried later) + const targetId = `${inputElement.id || inputElement.name}`; + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId]).toBeDefined(); + + // But pending refinement metadata should be cleaned up + expect(service.getSnapshot().context.pendingRefinements.size).toBe(0); + }); + + it('should handle missing target element in refinement response', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'hello', + sequenceNumber: 2, + }); + + // Trigger refinement + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(100)); + + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + await new Promise(resolve => setTimeout(resolve, 10)); + + // Manually delete the target mapping to 
simulate the missing element case + const state = service.getSnapshot(); + delete state.context.transcriptionTargets[100]; + + // Send refinement response with missing target + service.send({ + type: 'saypi:transcribed', + text: 'refined', + sequenceNumber: 100, + }); + + // Should handle gracefully without crashing + // The refinement should be discarded + const targetId = `${inputElement.id || inputElement.name}`; + expect(service.getSnapshot().context.transcriptionsByTarget[targetId]).not.toEqual({ + 100: 'refined', + }); + }); + }); + + describe('Incremental Refinement', () => { + it('should skip refinement for single segment until more arrive', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'segment 0', + sequenceNumber: 2, + }); + + // Clear upload mock to track refinement attempts + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockClear(); + + // Try to trigger refinement with only 1 segment + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + await new Promise(resolve => setTimeout(resolve, 10)); + + // Should NOT upload refinement (only 1 segment) + expect(TranscriptionModule.uploadAudioWithRetry).not.toHaveBeenCalled(); + + // Buffer 
should still exist + const targetId = `${inputElement.id || inputElement.name}`; + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId]).toBeDefined(); + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId].length).toBe(1); + }); + + it('should refine ALL segments with full context in subsequent passes', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add 2 segments + for (let i = 0; i < 2; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `segment ${i}`, + sequenceNumber: i * 2 + 2, + }); + } + + // First refinement pass + vi.mocked(AudioEncoder.convertToWavBlob).mockClear(); + + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + await new Promise(resolve => setTimeout(resolve, 50)); // Wait for refinement Promise + + // Should have concatenated 2 segments (2000 frames) + expect(AudioEncoder.convertToWavBlob).toHaveBeenCalled(); + let concatenatedFrames = vi.mocked(AudioEncoder.convertToWavBlob).mock.calls[0][0]; + expect(concatenatedFrames.length).toBe(2000); + + // Add one more segment + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames3 = new Float32Array(1000); + const mockBlob3 = new Blob([new ArrayBuffer(4000)]); + + 
vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(5); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(6)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob3, + frames: mockFrames3, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'segment 2', + sequenceNumber: 6, + }); + + // Second refinement pass - should refine ALL 3 segments (full context) + vi.mocked(AudioEncoder.convertToWavBlob).mockClear(); + + service.send({ + type: 'saypi:refineTranscription', + targetElement: inputElement, + }); + + await new Promise(resolve => setTimeout(resolve, 50)); // Wait for refinement Promise + + // Should have concatenated ALL 3 segments (3000 frames) for full context + expect(AudioEncoder.convertToWavBlob).toHaveBeenCalled(); + concatenatedFrames = vi.mocked(AudioEncoder.convertToWavBlob).mock.calls[0][0]; + expect(concatenatedFrames.length).toBe(3000); + + // Check final state after second refinement completes + const state = service.getSnapshot(); + const targetId = `${inputElement.id || inputElement.name}`; + + // Should have exactly 1 transcription (latest refinement with negative key) + const transcriptionKeys = Object.keys(state.context.transcriptionsByTarget[targetId] || {}).map(k => parseInt(k, 10)); + expect(transcriptionKeys.length).toBe(1); + expect(transcriptionKeys[0]).toBeLessThan(0); // Negative timestamp key + + // Latest refinement text should be stored (from mock default) + const refinementKey = transcriptionKeys[0]; + expect(state.context.transcriptions[refinementKey]).toBe('refined transcription'); + }); + }); + + describe('Concurrent Refinements', () => { + it('should handle concurrent refinements for multiple targets', async () => { + service.start(); + + const inputElement2 = document.createElement('input'); + inputElement2.id = 'test-input-2'; + + // Start dictation on first target + service.send({ type: 'saypi:startDictation', 
targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add segment to first target + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames1 = new Float32Array(1000); + const mockBlob1 = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob1, + frames: mockFrames1, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'target1 text', + sequenceNumber: 2, + }); + + // Switch to second target + service.send({ type: 'saypi:switchTarget', targetElement: inputElement2 }); + + // Add segment to second target + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames2 = new Float32Array(1000); + const mockBlob2 = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(3); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(4)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob2, + frames: mockFrames2, + }); + + service.send({ + type: 'saypi:transcribed', + text: 'target2 text', + sequenceNumber: 4, + }); + + // Both targets should have buffers + const targetId1 = `${inputElement.id || inputElement.name}`; + const targetId2 = `${inputElement2.id || inputElement2.name}`; + + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId1]).toBeDefined(); + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId2]).toBeDefined(); + + // Both should be pending refinement + 
expect(service.getSnapshot().context.refinementPendingForTargets.has(targetId1)).toBe(true); + expect(service.getSnapshot().context.refinementPendingForTargets.has(targetId2)).toBe(true); + }); + + it('should refine all pending targets even after target switch (Codex bug)', async () => { + service.start(); + + const inputElement2 = document.createElement('input'); + inputElement2.id = 'test-input-2'; + inputElement2.name = 'testField2'; + + // Start dictation on first target + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add TWO segments to first target (need >=2 for refinement) + for (let i = 0; i < 2; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `target1 segment ${i}`, + sequenceNumber: i * 2 + 2, + pFinishedSpeaking: 0.9, // High probability - will trigger endpoint delay + tempo: 0.5, + }); + } + + const targetId1 = `${inputElement.id || inputElement.name}`; + + // Verify target1 is pending refinement + expect(service.getSnapshot().context.refinementPendingForTargets.has(targetId1)).toBe(true); + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId1]).toBeDefined(); + + // NOW SWITCH TO SECOND TARGET (this is the bug scenario) + service.send({ type: 'saypi:switchTarget', targetElement: inputElement2 }); + + // Verify current 
target is now target2 + expect(service.getSnapshot().context.targetElement).toBe(inputElement2); + + // Mock refinement upload for target1 + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockClear(); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(100)); + + // Wait for endpoint delay to trigger refinement + await new Promise(resolve => setTimeout(resolve, 150)); + + // The bug was: refinement would check context.targetElement (now target2) + // and find no segments, leaving target1 unrefined + // The fix: iterate over refinementPendingForTargets instead + + // Verify refinement was triggered for target1 (not current target) using UUID-based approach + expect(TranscriptionModule.uploadAudioForRefinement).toHaveBeenCalled(); + + const uploadCall = vi.mocked(TranscriptionModule.uploadAudioForRefinement).mock.calls[0]; + + // uploadAudioForRefinement has signature: (blob, duration, requestId, sessionId, maxRetries) + expect(uploadCall[0]).toBeInstanceOf(Blob); // blob + expect(uploadCall[1]).toBeGreaterThan(0); // duration + expect(typeof uploadCall[2]).toBe('string'); // requestId (UUID) + expect(uploadCall[3]).toBe('test-session'); // sessionId + + // Wait for refinement Promise to resolve (handled internally) + await new Promise(resolve => setTimeout(resolve, 50)); + + // Verify target1 was refined even though current target is target2 + const state = service.getSnapshot(); + + // Refinement should be stored with negative key + const transcriptionKeys = Object.keys(state.context.transcriptionsByTarget[targetId1] || {}).map(k => parseInt(k, 10)); + expect(transcriptionKeys.length).toBe(1); + expect(transcriptionKeys[0]).toBeLessThan(0); // Negative timestamp key + + // Audio buffer for target1 should still exist (kept for future refinements) + expect(state.context.audioSegmentsByTarget[targetId1]).toBeDefined(); + + // Refinement pending flag should be cleared + 
expect(state.context.refinementPendingForTargets.has(targetId1)).toBe(false); + + // Pending refinement metadata should be cleared + expect(state.context.pendingRefinements.size).toBe(0); + }); + }); + + describe('Manual Edit Cleanup', () => { + it('should clear audio buffers and refinement state on manual edit (Codex bug)', async () => { + service.start(); + + service.send({ type: 'saypi:startDictation', targetElement: inputElement }); + service.send({ type: 'saypi:callReady' }); + service.send({ type: 'saypi:audio:connected', deviceId: 'test', deviceLabel: 'Test Mic' }); + service.send({ type: 'saypi:session:assigned', session_id: 'test-session' }); + + // Add multiple segments to build up buffer + for (let i = 0; i < 3; i++) { + service.send({ type: 'saypi:userSpeaking' }); + + const mockFrames = new Float32Array(1000); + const mockBlob = new Blob([new ArrayBuffer(4000)]); + + vi.mocked(TranscriptionModule.getCurrentSequenceNumber).mockReturnValue(i * 2 + 1); + vi.mocked(TranscriptionModule.uploadAudioWithRetry).mockImplementationOnce(resolveUpload(i * 2 + 2)); + + service.send({ + type: 'saypi:userStoppedSpeaking', + duration: 1000, + blob: mockBlob, + frames: mockFrames, + }); + + service.send({ + type: 'saypi:transcribed', + text: `segment ${i}`, + sequenceNumber: i * 2 + 2, + }); + } + + const targetId = `${inputElement.id || inputElement.name}`; + + // Verify we have buffered audio and pending refinement + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId]).toBeDefined(); + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId].length).toBe(3); + expect(service.getSnapshot().context.refinementPendingForTargets.has(targetId)).toBe(true); + + // User manually edits the field + service.send({ + type: 'saypi:manualEdit', + targetElement: inputElement, + newContent: 'user typed this', + oldContent: 'segment 0 segment 1 segment 2', + }); + + // The bug was: audio buffers and refinement state were not cleared + // This could lead to 
stale audio (up to 120s) being refined later + + // Verify audio buffers are cleared + expect(service.getSnapshot().context.audioSegmentsByTarget[targetId]).toBeUndefined(); + + // Verify refinement pending flag is cleared + expect(service.getSnapshot().context.refinementPendingForTargets.has(targetId)).toBe(false); + + // Verify pending refinements are cleared (UUID-based tracking) + expect(service.getSnapshot().context.pendingRefinements.size).toBe(0); + + // Verify transcription state is also cleared (existing behavior) + expect(service.getSnapshot().context.transcriptionsByTarget[targetId]).toBeUndefined(); + + // Should transition to idle + expect(service.getSnapshot().matches('idle')).toBe(true); + }); + }); +}); diff --git a/test/state-machines/DictationMachine-TargetSwitchBreak.spec.ts b/test/state-machines/DictationMachine-TargetSwitchBreak.spec.ts index e040059e17..899844991c 100644 --- a/test/state-machines/DictationMachine-TargetSwitchBreak.spec.ts +++ b/test/state-machines/DictationMachine-TargetSwitchBreak.spec.ts @@ -187,7 +187,8 @@ describe('DictationMachine - Target Switch Audio Breaking', () => { expect.any(Number), expect.any(Number), "text", // inputType from HTML input element - "Enter your name" // inputLabel from placeholder attribute + "Enter your name", // inputLabel from placeholder attribute + expect.any(Function) ); // Verify that only one upload was called (normal processing) @@ -274,4 +275,4 @@ describe('DictationMachine - Target Switch Audio Breaking', () => { expect(service.state.context.targetSwitchesDuringSpeech).toBeUndefined(); }); }); -}); \ No newline at end of file +}); diff --git a/test/state-machines/DictationMachine.spec.ts b/test/state-machines/DictationMachine.spec.ts index 92768624a6..848244d2ab 100644 --- a/test/state-machines/DictationMachine.spec.ts +++ b/test/state-machines/DictationMachine.spec.ts @@ -506,7 +506,8 @@ describe('DictationMachine', () => { expect.any(Number), expect.any(Number), "text", // inputType 
from HTML input element - "Enter your name" // inputLabel from placeholder attribute + "Enter your name", // inputLabel from placeholder attribute + expect.any(Function) ); }); }); @@ -928,4 +929,4 @@ describe('DictationMachine', () => { }); }); }); -}); \ No newline at end of file +});