audio alignment anchors #208

@matyaskopp

problem

Currently, the audio alignment follows this structure:

<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w1" lemma="vážený" pos="ADJ" msd="UPosTag=ADJ|Animacy=Anim|Case=Voc|Degree=Pos|Gender=Masc|Number=Sing|Polarity=Pos|VerbForm=Part|Voice=Pass" ana="pdt:AAFS5----1A----">Vážení</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ae"/> 

<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w2.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w2" lemma="paní" pos="NOUN" msd="UPosTag=NOUN|Case=Voc|Gender=Fem|Number=Sing|Polarity=Pos" ana="pdt:NNFS5-----A----">paní</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w2.ae"/> 

<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w3.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w3" lemma="poslankyně" pos="NOUN" msd="UPosTag=NOUN|Case=Voc|Gender=Fem|Number=Sing|Polarity=Pos" ana="pdt:NNFS5-----A----" join="right">poslankyně</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w3.ae"/>

<pc xml:id="ps2013-001-01-000-999.u1.p1.s1.w4" lemma="," pos="PUNCT" msd="UPosTag=PUNCT" ana="pdt:Z:-------------">,</pc> 

<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w5.ab"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w5" lemma="vážený" pos="ADJ" msd="UPosTag=ADJ|Animacy=Anim|Case=Nom|Degree=Pos|Gender=Masc|Number=Plur|Polarity=Pos|VerbForm=Part|Voice=Pass" ana="pdt:AAMP5----1A----">vážení</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w5.ae"/> 

<!-- ... -->

Every aligned token is wrapped in two anchors, which are currently located like this:

  • w/preceding-sibling::anchor[1][ends-with(@synch,'b')]
  • w/following-sibling::anchor[1][ends-with(@synch,'e')]

This is fragile because it relies on specific suffixes in @synch and on the anchors being placed directly adjacent to the token.
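
To make the fragility concrete, here is a minimal Python sketch of the position-based lookup. It uses only the standard library; the shortened IDs, the `<s>` wrapper, the `<name>` wrapper around the second token, and the missing TEI namespace are all assumptions made for illustration, not taken from the corpus.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, shortened sample (no TEI namespace). The second token is
# wrapped in <name>, so its anchors are no longer its siblings.
SAMPLE = """<s>
<anchor synch="#w1.ab"/><w xml:id="w1">Vážení</w><anchor synch="#w1.ae"/>
<anchor synch="#w2.ab"/><name><w xml:id="w2">paní</w></name><anchor synch="#w2.ae"/>
</s>"""

def sibling_anchors(parent, index):
    """Mimic preceding-sibling::anchor[1] and following-sibling::anchor[1],
    each filtered by the expected @synch suffix."""
    children = list(parent)
    before = [c for c in children[:index] if c.tag == "anchor"]
    after = [c for c in children[index + 1:] if c.tag == "anchor"]
    begin = end = None
    if before and before[-1].get("synch", "").endswith("b"):
        begin = before[-1].get("synch")
    if after and after[0].get("synch", "").endswith("e"):
        end = after[0].get("synch")
    return begin, end

def align_by_position(root):
    """Map each token id to the (begin, end) anchors found via sibling axes."""
    found = {}
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == "w":
                found[child.get(XML_ID)] = sibling_anchors(parent, index)
    return found

position_result = align_by_position(ET.fromstring(SAMPLE))
```

In this sketch the first token gets both anchors, while the wrapped token gets none, even though its anchors are present in the document.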

solution

The proposal is to add a @corresp attribute to each anchor, pointing to the token it belongs to:

<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ab" corresp="#ps2013-001-01-000-999.u1.p1.s1.w1"/>
<w xml:id="ps2013-001-01-000-999.u1.p1.s1.w1" lemma="vážený" pos="ADJ" msd="UPosTag=ADJ|Animacy=Anim|Case=Voc|Degree=Pos|Gender=Masc|Number=Sing|Polarity=Pos|VerbForm=Part|Voice=Pass" ana="pdt:AAFS5----1A----">Vážení</w>
<anchor synch="#ps2013-001-01-000-999.u1.p1.s1.w1.ae" corresp="#ps2013-001-01-000-999.u1.p1.s1.w1"/>
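
With @corresp in place, a consumer can find a token's anchors without looking at their position. The following Python sketch illustrates this under stated assumptions: the IDs are shortened, the `<s>` and `<name>` wrappers are hypothetical, the TEI namespace is omitted, and the function name `anchors_by_corresp` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample (no TEI namespace); anchors carry @corresp.
SAMPLE = """<s>
<anchor synch="#w1.ab" corresp="w1"/><w xml:id="w1">Vážení</w><anchor synch="#w1.ae" corresp="w1"/>
<anchor synch="#w2.ab" corresp="#w2"/><name><w xml:id="w2">paní</w></name><anchor synch="#w2.ae" corresp="#w2"/>
</s>"""

def anchors_by_corresp(root):
    """Group @synch values by the token id given in @corresp, in document order."""
    grouped = {}
    for anchor in root.iter("anchor"):
        target = anchor.get("corresp")
        if target is None:
            continue  # anchor is not linked to any token
        # Accept both a bare id and a '#id' pointer form.
        grouped.setdefault(target.lstrip("#"), []).append(anchor.get("synch"))
    return grouped

corresp_result = anchors_by_corresp(ET.fromstring(SAMPLE))
```

Here both tokens resolve to their two anchors regardless of nesting. Note that @corresp alone does not say which anchor marks the beginning and which the end; this sketch assumes document order (begin first), and a consumer might still need the @synch target or the timeline to tell them apart.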

Notes:

  • TEITOK conversion should be fixed accordingly
