Overview
Beat in prose is the perceived rhythmic pulse when reading aloud: the pattern of stressed and unstressed syllables across a phrase or sentence. This is distinct from the aggregate stress statistics we already have; it refers specifically to the sequential stress shape across parallel units.
This is a standalone prosodic feature and also a hard blocker for tricolon validation (#73).
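What We Already Have
prosody/rhythm_prosody.py contains a substantial foundation:

| Component | What it provides |
| --- | --- |
| `_get_stress_pattern(word)` | Returns per-syllable stress values [0, 1, 2] from CMU dict |
| `_compute_stress_entropy(words)` | Shannon entropy of stress patterns across full text |
| `_compute_metrical_feet(words)` | Iambic / trochaic / dactylic / anapestic ratios across full text |
| `_compute_rhythmic_regularity(syllable_counts)` | CV-based (coefficient of variation) regularity score |
| `word_stress_patterns` (metadata) | Per-word stress sequences stored at document level |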
The CMU wrapper in prosody/pronouncing.py gives us phones_for_word() returning ARPAbet strings with stress digits (0 = unstressed, 1 = primary, 2 = secondary). All the raw materials exist.
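As a minimal sketch of what "stress digits" means in practice, the snippet below pulls one stress value per syllable out of an ARPAbet string of the kind phones_for_word() is described as returning. The helper name `stress_digits` is hypothetical and is not part of the existing wrapper; the two sample strings are the CMU-style transcriptions of "hello" and "alabama".

```python
import re

# In ARPAbet, only vowel phones carry a trailing stress digit:
# 0 = unstressed, 1 = primary, 2 = secondary.
_VOWEL_WITH_STRESS = re.compile(r"[A-Z]+([012])$")


def stress_digits(phones: str) -> list[int]:
    """Extract one stress value per syllable from a space-separated ARPAbet string.

    Hypothetical helper for illustration; not the repository's API.
    """
    digits = []
    for phone in phones.split():
        match = _VOWEL_WITH_STRESS.match(phone)
        if match:  # consonants have no digit and are skipped
            digits.append(int(match.group(1)))
    return digits


print(stress_digits("HH AH0 L OW1"))           # [0, 1]
print(stress_digits("AE2 L AH0 B AE1 M AH0"))  # [2, 0, 1, 0]
```

Each vowel phone corresponds to one syllable, so the length of the returned list is also the syllable count.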
The gap: everything above produces aggregate statistics across a full document. Beat detection requires sequential stress analysis at the phrase or member level — not "what is the overall iambic ratio" but "given this specific sequence of words or phrases, what is its stress shape, and how does it compare to adjacent members."
What Needs to Be Built
A compute_beat(units: list[str]) -> BeatResult function that takes a list of text units (words, phrases, or sentences) and returns:
Per-unit analysis
- Stress sequence: [1, 0, 1, 0] for each unit
- Syllabic weight: number of primary-stressed syllables per unit
- Beat string: human-readable form, e.g. "DUM-da", "da-DUM", "da-da-DUM"
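The per-unit fields above can be derived from a stress sequence roughly as follows. This is a sketch, not the final implementation: the function names are hypothetical, and rendering secondary stress (2) as "da" rather than giving it its own symbol is an assumption the issue does not settle.

```python
def syllabic_weight(stress_sequence: list[int]) -> int:
    """Count primary-stressed syllables (stress value 1) in a unit."""
    return sum(1 for stress in stress_sequence if stress == 1)


def beat_string(stress_sequence: list[int]) -> str:
    """Render a stress sequence in DUM/da notation.

    Assumption: only primary stress (1) becomes "DUM"; unstressed (0)
    and secondary (2) syllables both become "da".
    """
    return "-".join("DUM" if stress == 1 else "da" for stress in stress_sequence)


sequence = [1, 0, 1, 0]
print(syllabic_weight(sequence))  # 2
print(beat_string(sequence))      # DUM-da-DUM-da
```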
Cross-unit analysis (the key new capability)
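- Weight sequence across units: [2, 2, 4]
- CV (coefficient of variation) of weights, read together with the direction of the trend: a low CV indicates isocolon; a higher CV indicates climactic structure when the weights rise and anti-climactic structure when they fall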
Beat shape classification:
| Shape | Definition | Example |
| --- | --- | --- |
| Isocolon | All units roughly equal weight (CV < 0.2) | "da-da \| da-da \| da-da" |
| Climactic | Third unit heavier than first two | "da \| da \| da-da-DUM" |
| Anti-climactic | Third unit lighter | Rare, stylistically unusual |
| Irregular | No clear pattern | High CV, no monotonic trend |
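One way the shape definitions above could be turned into a classifier is sketched below. The 0.2 threshold comes from the isocolon definition; everything else is an assumption made for illustration: the function names, the use of population standard deviation for the CV, checking isocolon before the trend, and comparing only the final unit against the earlier ones.

```python
from statistics import mean, pstdev

ISOCOLON_CV = 0.2  # threshold from the isocolon definition


def weight_cv(weights: list[int]) -> float:
    """Coefficient of variation: population std dev divided by the mean."""
    avg = mean(weights)
    return pstdev(weights) / avg if avg else 0.0  # all-zero weights count as CV 0


def classify_shape(weights: list[int]) -> str:
    """Classify a weight sequence as one of the four beat shapes."""
    if len(weights) < 2:
        return "irregular"  # assumption: a single unit has no cross-unit shape
    if weight_cv(weights) < ISOCOLON_CV:
        return "isocolon"
    *earlier, last = weights
    if last > max(earlier):
        return "climactic"
    if last < min(earlier):
        return "anti-climactic"
    return "irregular"


print(round(weight_cv([2, 2, 4]), 3))  # 0.354
print(classify_shape([2, 2, 4]))       # climactic
print(classify_shape([2, 2, 2]))       # isocolon
```

For the weight sequence [2, 2, 4] the mean is 8/3 and the population standard deviation is about 0.943, so the CV of about 0.354 is above the isocolon threshold and the heavier final unit makes it climactic.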
Why this matters for AI detection
LLMs default to isocolon — they generate parallel units of roughly equal syllabic weight because it's the safest/most-learned pattern. Human writers, especially skilled ones, tend toward climactic structure — the third member carries more weight as the rhetorical culmination. This asymmetry is a discriminating signal within tricolon, not just in its detection.
Implementation Path
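1. Reuse _get_stress_pattern(word) from rhythm_prosody.py, which already does the CMU lookup
2. Add phrase-level aggregation: concatenate the stress values of all words in a unit and count the primary-stressed syllables (stress value 1) to get the unit weight. Summing the raw digits would not work, because secondary stress (2) would then outweigh primary stress (1)
3. Add shape classifier: compare weights across units, classify as isocolon / climactic / anti-climactic / irregular
4. Add beat string renderer: translate stress sequences to DUM/da notation for human inspection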
No new dependencies required. All infrastructure is in place.
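The lookup and aggregation steps above might look roughly like this. It is a sketch under stated assumptions: the lookup is passed in as a callable that returns None for unknown words (the real behaviour of _get_stress_pattern for out-of-dictionary words is not specified here), the tokenizer is a simple regex, and `TOY_LEXICON` is a hand-written stand-in for the CMU dict so the example runs on its own.

```python
import re
from typing import Callable, Optional

# Hypothetical lookup signature: word -> stress values, or None if unknown.
StressLookup = Callable[[str], Optional[list[int]]]


def unit_stress_sequence(unit: str, lookup: StressLookup) -> tuple[list[int], int, int]:
    """Return (stress sequence, words found, words seen) for one text unit."""
    sequence: list[int] = []
    found = seen = 0
    for word in re.findall(r"[a-z']+", unit.lower()):
        seen += 1
        pattern = lookup(word)
        if pattern is not None:
            found += 1
            sequence.extend(pattern)  # keep syllables in reading order
    return sequence, found, seen


# Stand-in for the CMU-backed lookup; "zzyzx" is deliberately missing.
TOY_LEXICON = {"i": [1], "came": [1], "saw": [1], "conquered": [1, 0]}

units = ["I came", "I saw", "I conquered Zzyzx"]
results = [unit_stress_sequence(unit, TOY_LEXICON.get) for unit in units]

# Unit weight = number of primary-stressed syllables, per the definition above.
weight_sequence = [sum(1 for s in seq if s == 1) for seq, _, _ in results]
found = sum(f for _, f, _ in results)
seen = sum(n for _, _, n in results)
cmu_coverage = found / seen if seen else 0.0

print(weight_sequence)         # [2, 2, 2]
print(round(cmu_coverage, 3))  # 0.857
```

Tracking found and seen per unit keeps cmu_coverage honest: a unit whose words are missing from the dictionary will look lighter than it is, so low coverage is a reason to distrust the resulting shape.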
Output Type
BeatResult:
    units: list[BeatUnit]        # per-unit analysis
        - text: str
        - stress_sequence: list[int]
        - syllabic_weight: int
        - beat_string: str       # e.g. "da-DUM-da"
    weight_sequence: list[int]   # across units
    weight_cv: float             # isocolon indicator
    beat_shape: str              # "isocolon" | "climactic" | "anti-climactic" | "irregular"
    cmu_coverage: float          # fraction of words found in CMU dict
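The output type could be expressed as a pair of dataclasses along these lines. The field names and types follow the outline above; using frozen dataclasses and a `Literal` alias (`BeatShape`) for the shape label are assumptions, not decisions the issue has made.

```python
from dataclasses import dataclass
from typing import Literal

# Assumed alias; the outline above types beat_shape as a plain str.
BeatShape = Literal["isocolon", "climactic", "anti-climactic", "irregular"]


@dataclass(frozen=True)
class BeatUnit:
    text: str
    stress_sequence: list[int]
    syllabic_weight: int
    beat_string: str  # e.g. "da-DUM-da"


@dataclass(frozen=True)
class BeatResult:
    units: list[BeatUnit]       # per-unit analysis
    weight_sequence: list[int]  # across units
    weight_cv: float            # isocolon indicator
    beat_shape: BeatShape
    cmu_coverage: float         # fraction of words found in CMU dict


# Hand-built values for illustration only; not computed output.
units = [
    BeatUnit("came", [1], 1, "DUM"),
    BeatUnit("saw", [1], 1, "DUM"),
    BeatUnit("conquered", [1, 0], 1, "DUM-da"),
]
result = BeatResult(
    units=units,
    weight_sequence=[u.syllabic_weight for u in units],
    weight_cv=0.0,
    beat_shape="isocolon",
    cmu_coverage=1.0,
)
print(result.weight_sequence)  # [1, 1, 1]
```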
Blockers
Blocks #73 (Feature: Tricolon Detector): tricolon candidate validation requires beat shape to distinguish strong / weak / rejected candidates and to detect isocolon vs. climactic patterns as an AI-tell discriminator
No other blockers — can be built now against existing infrastructure
Related
- prosody/rhythm_prosody.py: existing stress infrastructure
- prosody/pronouncing.py: CMU dict wrapper