Single-word lines do not form events #177

Description

@kuchenrolle

What you were trying to do

Turn a one-sentence-per-line corpus into events, where some lines are only one word (headings).

What actually happened

The single-word lines do not form events.

How to reproduce

The problem is in the generation of occurrences. This is what happens there:

occurrences = list()
words = ["test"]
before = 2
after = 1
for ii, word in enumerate(words):
    # words before the word to a maximum of before
    cues = words[max(0, ii - before):ii]
    # words after the word to a maximum of after
    cues.extend(words[(ii + 1):min(len(words), ii + 1 + after)])
    # append (cues, outcomes)
    occurrences.append(("_".join(cues), word))

The loop has only one iteration, in which the word is the outcome without any cues, so it is dropped.
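
To make the effect visible, the snippet above can be wrapped in a function and run on a single-word line. This is only a sketch of the occurrence-building step as quoted in this issue; the function name `build_occurrences` is made up for illustration and is not part of the library.

```python
def build_occurrences(words, before=2, after=1):
    """Rebuild the (cues, outcome) pairs as in the snippet quoted above.

    `build_occurrences` is a hypothetical helper name used only here.
    """
    occurrences = []
    for ii, word in enumerate(words):
        # words before the word to a maximum of `before`
        cues = words[max(0, ii - before):ii]
        # words after the word to a maximum of `after`
        cues.extend(words[ii + 1:min(len(words), ii + 1 + after)])
        # append (cues, outcome)
        occurrences.append(("_".join(cues), word))
    return occurrences


# A single-word line yields one occurrence whose cue string is empty.
print(build_occurrences(["test"]))
# [('', 'test')]

# A longer line yields a non-empty cue string for every word.
print(build_occurrences(["a", "b", "c"]))
# [('b', 'a'), ('a_c', 'b'), ('a_b', 'c')]
```

The empty cue string in the first result is what leaves the single-word line without a usable event.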

This is not necessarily a bug, but it was at least surprising to me. I would expect (or at least want) the word to be treated as a cue that predicts no outcome. I only noticed this by chance, when a cue was dropped because it occurred exclusively in single-word lines. I haven't checked, but I assume this holds for context structures other than "line" as well.
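
The behaviour I would want could look roughly like the following. It is a sketch under assumptions: the name `build_occurrences_keep_single` is hypothetical, and I have not checked whether the later processing steps would keep an occurrence with an empty outcome.

```python
def build_occurrences_keep_single(words, before=2, after=1):
    """Hypothetical variant: a word without context becomes a cue with no outcome."""
    occurrences = []
    for ii, word in enumerate(words):
        cues = words[max(0, ii - before):ii]
        cues.extend(words[ii + 1:ii + 1 + after])
        if cues:
            # usual case: surrounding words are the cues, the word is the outcome
            occurrences.append(("_".join(cues), word))
        else:
            # no surrounding words: keep the word as a cue, with an empty outcome
            occurrences.append((word, ""))
    return occurrences


print(build_occurrences_keep_single(["test"]))
# [('test', '')]
```

Lines with more than one word are unaffected by this change as long as `before` or `after` is at least 1, since every word then has at least one neighbour.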
