Feature: Beat Detection — Phrase-Level Stress Shape Analysis

## Overview

**Beat** in prose is the perceived rhythmic pulse when reading aloud — the pattern of stressed and unstressed syllables across a phrase or sentence. This is distinct from aggregate stress statistics (which we already have) and refers specifically to **sequential stress shape** across parallel units.

This is a standalone prosodic feature and also a hard blocker for tricolon validation (#73).

---

## What We Already Have

`prosody/rhythm_prosody.py` contains substantial foundation:

| Function | What it does |
|---|---|
| `_get_stress_pattern(word)` | Returns a list of per-syllable stress values (each 0, 1, or 2) from the CMU dict |
| `_compute_stress_entropy(words)` | Shannon entropy of stress patterns across full text |
| `_compute_metrical_feet(words)` | Iambic / trochaic / dactylic / anapestic ratios across full text |
| `_compute_rhythmic_regularity(syllable_counts)` | CV-based regularity score |
| `word_stress_patterns` (metadata) | Per-word stress sequences stored at document level |

The CMU wrapper in `prosody/pronouncing.py` gives us `phones_for_word()` returning ARPAbet strings with stress digits (0 = unstressed, 1 = primary, 2 = secondary). All the raw materials exist.
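For illustration, here is a minimal sketch of how stress digits fall out of an ARPAbet string. It assumes `phones_for_word()` yields strings shaped like `"HH AH0 L OW1"`; the helper name `stress_from_phones` is hypothetical and is not part of `prosody/pronouncing.py`.

```python
def stress_from_phones(phones: str) -> list[int]:
    """Extract per-syllable stress digits from an ARPAbet pronunciation string.

    Vowel phones end in a stress digit (0 = unstressed, 1 = primary,
    2 = secondary); consonant phones carry no digit and are skipped.
    """
    return [int(p[-1]) for p in phones.split() if p[-1].isdigit()]
```

Each vowel phone contributes exactly one syllable, so the length of the returned list is also the syllable count for that pronunciation.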

**The gap**: everything above produces **aggregate statistics** across a full document. Beat detection requires **sequential stress analysis at the phrase or member level** — not "what is the overall iambic ratio" but "given this specific sequence of words or phrases, what is its stress shape, and how does it compare to adjacent members."

---

## What Needs to Be Built

A `compute_beat(units: list[str]) -> BeatResult` function that takes a list of text units (words, phrases, or sentences) and returns:

### Per-unit analysis
- Stress sequence: `[1, 0, 1, 0]` for each unit
- Syllabic weight: total primary-stressed syllables per unit
- Beat string: human-readable form — `"DUM-da"`, `"da-DUM"`, `"da-da-DUM"`
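The per-unit fields above could be derived from a stress sequence roughly as follows. This is a sketch, not the final implementation: the function names are hypothetical, and rendering secondary stress (2) as `da` is an assumption the issue does not settle.

```python
def beat_string(stress_sequence: list[int]) -> str:
    """Render stress digits in DUM/da notation.

    Assumption: only primary stress (1) becomes "DUM"; unstressed (0)
    and secondary (2) syllables both become "da".
    """
    return "-".join("DUM" if s == 1 else "da" for s in stress_sequence)


def syllabic_weight(stress_sequence: list[int]) -> int:
    """Count primary-stressed syllables in a unit."""
    return sum(1 for s in stress_sequence if s == 1)
```

For example, a unit with stress sequence `[0, 1, 0]` has weight 1 and renders as `da-DUM-da`.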

### Cross-unit analysis (the key new capability)
- Weight sequence across units: `[2, 2, 4]`
- Coefficient of variation (CV) of weights plus the direction of change: low CV indicates **isocolon**, a rising sequence is **climactic**, and a falling sequence is **anti-climactic**
- Beat shape classification:

| Shape | Definition | Example |
|---|---|---|
| **Isocolon** | All units roughly equal weight (CV < 0.2) | `"da-da \| da-da \| da-da"` |
| **Climactic** | Third unit heavier than first two | `"da \| da \| da-da-DUM"` |
| **Anti-climactic** | Third unit lighter than the first two | Rare, stylistically unusual |
| **Irregular** | No clear pattern | High CV, no monotonic trend |
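One way the table could translate into a classifier is sketched below. The function name, the use of population standard deviation for the CV, and the treatment of an all-zero weight sequence as isocolon are assumptions made for illustration; only the 0.2 threshold comes from the table.

```python
from statistics import mean, pstdev


def classify_beat_shape(weights: list[int], iso_cv: float = 0.2) -> tuple[float, str]:
    """Return (cv, shape) for a sequence of unit weights.

    CV is population standard deviation divided by the mean. The final
    unit is compared against all earlier units, which for a tricolon
    means "third versus first two".
    """
    if len(weights) < 2:
        return 0.0, "irregular"  # nothing to compare against
    mu = mean(weights)
    cv = pstdev(weights) / mu if mu > 0 else 0.0  # all-zero weights: cv = 0
    if cv < iso_cv:
        return cv, "isocolon"
    last, earlier = weights[-1], weights[:-1]
    if last > max(earlier):
        return cv, "climactic"
    if last < min(earlier):
        return cv, "anti-climactic"
    return cv, "irregular"
```

With the weight sequence `[2, 2, 4]`, the CV is about 0.35 and the last unit is the heaviest, so the sketch reports `climactic`.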

### Why this matters for AI detection
LLMs default to **isocolon**: they generate parallel units of roughly equal syllabic weight because it is the safest and most frequently learned pattern. Human writers, especially skilled ones, tend toward **climactic** structure, where the third member carries more weight as the rhetorical culmination. This asymmetry is a discriminating signal *within* a detected tricolon, not just in whether a tricolon is present.

---

## Implementation Path

1. **Reuse `_get_stress_pattern(word)`** from `rhythm_prosody.py` — already does the CMU lookup
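2. **Add phrase-level aggregation**: concatenate the stress sequences of all words in a unit, then count primary-stressed syllables (stress value 1) to get the unit weight. Summing raw stress digits would overweight secondary stress, which CMU encodes as 2.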
3. **Add shape classifier**: compare weights across units, classify as isocolon / climactic / anti-climactic / irregular
4. **Add beat string renderer**: translate stress sequences to `DUM/da` notation for human inspection

No new dependencies required. All infrastructure is in place.
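The phrase-level aggregation step might look like the sketch below. It takes the word lookup as a parameter rather than calling `_get_stress_pattern` directly, because the issue does not say what that function returns for out-of-dictionary words; here the lookup is assumed to return `None` in that case. The name `unit_stress` and the punctuation handling are hypothetical.

```python
from typing import Callable, Optional


def unit_stress(
    unit: str,
    lookup: Callable[[str], Optional[list[int]]],
) -> tuple[list[int], int, int]:
    """Concatenate per-word stress patterns for one text unit.

    Returns (stress_sequence, words_found, words_total); the two counts
    can feed a coverage ratio such as cmu_coverage.
    """
    sequence: list[int] = []
    found = total = 0
    for raw in unit.split():
        word = raw.strip(".,;:!?\"'()").lower()  # naive punctuation stripping
        if not word:
            continue
        total += 1
        pattern = lookup(word)
        if pattern is None:
            continue  # out-of-dictionary word: skipped, but still counted
        found += 1
        sequence.extend(pattern)
    return sequence, found, total
```

Passing the lookup in also makes the aggregation testable with a small dictionary, without loading the CMU data.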

---

## Output Type

```
BeatResult:
  units: list[BeatUnit]        # per-unit analysis
    - text: str
    - stress_sequence: list[int]
    - syllabic_weight: int
    - beat_string: str          # e.g. "da-DUM-da"
  weight_sequence: list[int]   # across units
  weight_cv: float             # isocolon indicator
  beat_shape: str              # "isocolon" | "climactic" | "anti-climactic" | "irregular"
  cmu_coverage: float          # fraction of words found in CMU dict
```
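As one possible concrete form of this schema, the two types could be written as dataclasses. The field names follow the outline above; the default values are assumptions added so that an empty result can be constructed.

```python
from dataclasses import dataclass, field


@dataclass
class BeatUnit:
    text: str
    stress_sequence: list[int]
    syllabic_weight: int
    beat_string: str  # e.g. "da-DUM-da"


@dataclass
class BeatResult:
    units: list[BeatUnit] = field(default_factory=list)
    weight_sequence: list[int] = field(default_factory=list)
    weight_cv: float = 0.0
    beat_shape: str = "irregular"  # "isocolon" | "climactic" | "anti-climactic" | "irregular"
    cmu_coverage: float = 0.0      # fraction of words found in CMU dict
```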

---

## Blockers

- Blocks **#73** (Tricolon Detector) — tricolon candidate validation requires beat shape to distinguish strong / weak / rejected candidates and to detect isocolon vs. climactic patterns as an AI-tell discriminator
- No other blockers — can be built now against existing infrastructure

## Related

- #69 — AI Stylistic Tell Detection
- #73 — Tricolon Detector (downstream consumer)
- `prosody/rhythm_prosody.py` — existing stress infrastructure
- `prosody/pronouncing.py` — CMU dict wrapper
