Feature: Beat Detection — Phrase-Level Stress Shape Analysis #76

Description

@craigtrim

Overview

Beat in prose is the perceived rhythmic pulse when reading aloud — the pattern of stressed and unstressed syllables across a phrase or sentence. This is distinct from aggregate stress statistics (which we already have) and refers specifically to sequential stress shape across parallel units.

This is a standalone prosodic feature and also a hard blocker for tricolon validation (#73).


What We Already Have

prosody/rhythm_prosody.py already contains a substantial foundation:

Function                                        What it does
_get_stress_pattern(word)                       Returns per-syllable stress values (each 0, 1, or 2) from the CMU dict
_compute_stress_entropy(words)                  Shannon entropy of stress patterns across full text
_compute_metrical_feet(words)                   Iambic / trochaic / dactylic / anapestic ratios across full text
_compute_rhythmic_regularity(syllable_counts)   CV-based regularity score
word_stress_patterns (metadata)                 Per-word stress sequences stored at document level

The CMU wrapper in prosody/pronouncing.py gives us phones_for_word() returning ARPAbet strings with stress digits (0 = unstressed, 1 = primary, 2 = secondary). All the raw materials exist.
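
As a minimal sketch of how the stress digits can be read off one ARPAbet string: the helper name stress_digits is hypothetical and is not the existing _get_stress_pattern, and the input is assumed to be a single space-separated pronunciation of the kind phones_for_word() returns.

```python
import re


def stress_digits(arpabet: str) -> list[int]:
    """Extract per-syllable stress digits from one ARPAbet pronunciation.

    Hypothetical helper for illustration only. Vowel phones carry a
    trailing digit (0, 1, or 2); consonants carry none, so each digit
    found corresponds to exactly one syllable.
    """
    return [int(digit) for digit in re.findall(r"[A-Z]+([012])", arpabet)]


print(stress_digits("HH AH0 L OW1"))  # [0, 1]
```

Because only vowels are marked, the length of the returned list doubles as the syllable count for the word.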

The gap: everything above produces aggregate statistics across a full document. Beat detection requires sequential stress analysis at the phrase or member level — not "what is the overall iambic ratio" but "given this specific sequence of words or phrases, what is its stress shape, and how does it compare to adjacent members."


What Needs to Be Built

A compute_beat(units: list[str]) -> BeatResult function that takes a list of text units (words, phrases, or sentences) and returns:

Per-unit analysis

  • Stress sequence: one value per syllable for each unit, e.g. [1, 0, 1, 0]
  • Syllabic weight: count of primary-stressed syllables (stress value 1) per unit
  • Beat string: human-readable form, e.g. "DUM-da", "da-DUM", "da-da-DUM"
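
The per-unit fields above might be derived from a stress sequence roughly as follows. This is a sketch under assumptions: the function names are illustrative, and rendering secondary stress (2) as lowercase "dum" is a guess, since the issue only shows "DUM" and "da".

```python
# Assumed rendering; the issue does not specify how secondary stress is shown.
SYLLABLE_LABELS = {0: "da", 1: "DUM", 2: "dum"}


def syllabic_weight(stress_sequence: list[int]) -> int:
    """Count primary-stressed syllables (stress value 1) in a unit."""
    return sum(1 for stress in stress_sequence if stress == 1)


def beat_string(stress_sequence: list[int]) -> str:
    """Render a stress sequence in DUM/da notation for human inspection."""
    return "-".join(SYLLABLE_LABELS[stress] for stress in stress_sequence)


sequence = [0, 0, 1]
print(beat_string(sequence), syllabic_weight(sequence))  # da-da-DUM 1
```

Counting primary stresses, rather than summing raw stress values, keeps a secondary stress (2) from outweighing a primary one (1).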

Cross-unit analysis (the key new capability)

  • Weight sequence across units: [2, 2, 4]
  • CV of weights plus trend direction: low CV indicates isocolon; at higher CV, a rising sequence is climactic and a falling sequence is anti-climactic
  • Beat shape classification:
    Shape            Definition                                   Example
    Isocolon         All units roughly equal weight (CV < 0.2)    "da-da | da-da | da-da"
    Climactic        Third unit heavier than first two            "da | da | da-da-DUM"
    Anti-climactic   Third unit lighter                           Rare, stylistically unusual
    Irregular        No clear pattern                             High CV, no monotonic trend
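
One possible reading of the table above, as a sketch rather than a settled design. Several choices here are assumptions: the CV uses the population standard deviation, an all-zero weight sequence is treated as CV 0, and "heavier" / "lighter" is generalised from the third unit to the last unit compared against all earlier ones.

```python
from statistics import mean, pstdev


def weight_cv(weights: list[int]) -> float:
    """Coefficient of variation: population std dev divided by the mean."""
    average = mean(weights)
    return pstdev(weights) / average if average else 0.0


def classify_shape(weights: list[int], iso_threshold: float = 0.2) -> str:
    """Classify a weight sequence using the shapes in the table above."""
    if len(weights) < 2:
        return "irregular"  # nothing to compare against
    if weight_cv(weights) < iso_threshold:
        return "isocolon"
    *earlier, last = weights
    if last > max(earlier):
        return "climactic"
    if last < min(earlier):
        return "anti-climactic"
    return "irregular"


print(round(weight_cv([2, 2, 4]), 3), classify_shape([2, 2, 4]))  # 0.354 climactic
```

For the weight sequence [2, 2, 4] the mean is 8/3 and the population standard deviation is about 0.943, giving a CV of about 0.354, which is above the 0.2 isocolon threshold; the final unit then decides the shape.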

Why this matters for AI detection

LLMs appear to default to isocolon: they generate parallel units of roughly equal syllabic weight because it is the safest and most frequently learned pattern. Human writers, especially skilled ones, tend toward climactic structure, where the third member carries more weight as the rhetorical culmination. Beat shape is therefore a discriminating signal inside a detected tricolon, beyond the mere presence of one.


Implementation Path

  1. Reuse _get_stress_pattern(word) from rhythm_prosody.py — already does the CMU lookup
  2. Add phrase-level aggregation: concatenate the stress sequences of all words in a unit, then count the primary-stressed syllables (stress value 1) to get unit weight
  3. Add shape classifier: compare weights across units, classify as isocolon / climactic / anti-climactic / irregular
  4. Add beat string renderer: translate stress sequences to DUM/da notation for human inspection

No new dependencies required. All infrastructure is in place.


Output Type

BeatResult:
  units: list[BeatUnit]        # per-unit analysis
    - text: str
    - stress_sequence: list[int]
    - syllabic_weight: int
    - beat_string: str          # e.g. "da-DUM-da"
  weight_sequence: list[int]   # across units
  weight_cv: float             # isocolon indicator
  beat_shape: str              # "isocolon" | "climactic" | "anti-climactic" | "irregular"
  cmu_coverage: float          # fraction of words found in CMU dict
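
The output type above could be expressed as dataclasses along these lines. This is only a sketch: the extra lookup parameter is an assumption that stands in for _get_stress_pattern so the example runs without the CMU dictionary, the toy lexicon is hand-written, and the punctuation stripping and "dum" label for secondary stress are illustrative choices.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Optional


@dataclass
class BeatUnit:
    text: str
    stress_sequence: list[int]
    syllabic_weight: int
    beat_string: str


@dataclass
class BeatResult:
    units: list[BeatUnit]
    weight_sequence: list[int]
    weight_cv: float
    beat_shape: str
    cmu_coverage: float


# Assumed lookup contract: a stress list for known words, None otherwise.
StressLookup = Callable[[str], Optional[list[int]]]
LABELS = {0: "da", 1: "DUM", 2: "dum"}  # "dum" for secondary stress is a guess


def _shape(weights: list[int], cv: float, iso_threshold: float = 0.2) -> str:
    if len(weights) < 2:
        return "irregular"
    if cv < iso_threshold:
        return "isocolon"
    *earlier, last = weights
    if last > max(earlier):
        return "climactic"
    if last < min(earlier):
        return "anti-climactic"
    return "irregular"


def compute_beat(units: list[str], lookup: StressLookup) -> BeatResult:
    analysed, found, total = [], 0, 0
    for text in units:
        sequence: list[int] = []
        for word in text.lower().split():
            total += 1
            pattern = lookup(word.strip(".,;:!?\"'"))
            if pattern is not None:  # words missing from the lexicon are skipped
                found += 1
                sequence.extend(pattern)
        beat = "-".join(LABELS[s] for s in sequence)
        analysed.append(BeatUnit(text, sequence, sequence.count(1), beat))
    weights = [unit.syllabic_weight for unit in analysed]
    average = mean(weights) if weights else 0
    cv = pstdev(weights) / average if average else 0.0
    coverage = found / total if total else 0.0
    return BeatResult(analysed, weights, cv, _shape(weights, cv), coverage)


# Hand-written toy lexicon standing in for the CMU dict.
TOY = {"i": [1], "came": [1], "saw": [1], "conquered": [1, 0]}
result = compute_beat(["I came,", "I saw,", "I conquered."], TOY.get)
print(result.weight_sequence, result.beat_shape)  # [2, 2, 2] isocolon
```

Reporting cmu_coverage alongside the shape lets a caller discount results where many words were missing from the dictionary and therefore contributed no syllables.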

Blockers

  • Blocks #73 (Tricolon Detector): tricolon candidate validation requires beat shape to distinguish strong, weak, and rejected candidates, and to detect isocolon vs. climactic patterns as an AI-tell discriminator
  • No other blockers — can be built now against existing infrastructure
