Why are existing candidates dropped in find_coref? #13

@peblair

Hello,

I am trying to understand the with_coref and find_coref functions in the dataset loader. Roughly speaking, it appears that the goal of find_coref is to do the following (in pseudo-code):

find_coref(cur_m) :=
  for each mention m in the same document as cur_m:
    if m's mention text starts or ends with the same text as cur_m BUT not equal to cur_m:
      add all of m's candidates to the result list (removing duplicates)
  return the collected candidates

The results of find_coref are then used to overwrite cur_m's candidate list. This confuses me, because the "BUT not equal to cur_m" condition above means that the candidates originally in cur_m's candidate list are lost (or at least potentially lost). Is this intentional? If so, can you explain what with_coref is intended to accomplish?
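
To make my reading concrete, here is a minimal Python sketch of the behavior as I understand it. This is not the repository's code: the `Mention` class, the `_sketch` function names, and the toy document are hypothetical, and overwriting only when at least one coreferent mention is found is my assumption. The candidate names and priors are borrowed from the error output further down.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    # Hypothetical stand-in for however the loader represents a mention.
    text: str
    candidates: list = field(default_factory=list)  # (entity, prior) pairs


def find_coref_sketch(cur_m, doc_mentions):
    """Collect candidates of other mentions whose text starts or ends with cur_m's text."""
    seen, result = set(), []
    for m in doc_mentions:
        if m.text == cur_m.text:
            continue  # the "BUT not equal to cur_m" condition
        if m.text.startswith(cur_m.text) or m.text.endswith(cur_m.text):
            for entity, prior in m.candidates:
                if entity not in seen:  # remove duplicates
                    seen.add(entity)
                    result.append((entity, prior))
    return result


def with_coref_sketch(doc_mentions):
    """Overwrite each mention's candidates with its coref candidates (assumed: only if any exist)."""
    for cur_m in doc_mentions:
        coref = find_coref_sketch(cur_m, doc_mentions)
        if coref:
            cur_m.candidates = coref  # the original candidates are discarded here


# Toy document: a long mention and a shorter one that it ends with.
doc = [
    Mention("Mother Teresa", [("Mother_Teresa", 1.0), ("Mother_Teresa_High_School", 0.001)]),
    Mention("Teresa", [("Teresa", 0.364), ("Mother_Teresa", 0.026)]),
]
with_coref_sketch(doc)
print([entity for entity, _ in doc[1].candidates])
```

In this sketch, the mention "Teresa" ends up with only the candidates of "Mother Teresa", so an entity named `Teresa` is no longer in its list. That is the pattern I am asking about.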

For example, on a local modification of this repository, I found that the gold entity (Teresa) is dropped from the list of candidates (I've verified in the AIDA train CSV [line 2426] that this is indeed the correct gold entity for this mention):

RuntimeError: Failed to find gold_key 'Teresa' in list: [(0, ('Mother_Teresa', 1.0)), (1, ('Mother_Teresa_High_School', 0.001)), (2, ('The_Missionary_Position', 0.001)), (3, ('Blessed_Mother_Teresa_Catholic_Secondary_School', 0.0))]
orig list: [['Teresa', 0.364], ['Teresa_(Barbie)', 0.138], ['Teresa,_Rizal', 0.115], ['Teresa_Nielsen_Hayden', 0.103], ['Teresa_of_Ávila', 0.092], ['Teresa_Heinz', 0.038], ['Teresa,_Castellón', 0.031], ['Teresa,_Greater_Poland_Voivodeship', 0.029], ['Mother_Teresa', 0.026], ['Teresa_Scanlan', 0.021], ['Teresa_Teng', 0.018], ['Theresa,_Countess_of_Portugal', 0.018], ['George_McGovern', 0.015], ['Teresa_Crippen', 0.013], ['Teresa_Palmer', 0.012], ['Teresa_Cristina_of_the_Two_Sicilies', 0.01], ['Teresa_Earnhardt', 0.01], ['Teresa_Wynn_Roseborough', 0.009], ['Teresa_(2010_telenovela)', 0.009], ['The_Real_Housewives_of_New_Jersey', 0.008], ['Teresa_(film)', 0.008], ['Teresa_Jungman', 0.008], ['Teresa_Bagioli_Sickles', 0.007], ['Teresa_Fernández_de_Traba', 0.007], ['Teresa_Bryant', 0.007], ['Teresa,_Contessa_Guiccioli', 0.007], ['Teresa_Strasser', 0.006], ['Teresa_Vaill', 0.006], ['Teresa_Mak', 0.006], ['Teresa_Murphy', 0.006], ['Teresa_Cheung_(actress)', 0.006], ['Teresa_Rivera', 0.006], ['Teresa_Nzola_Meso_Ba', 0.006], ['Tracy_Bond', 0.006], ['Teresa_Medina', 0.006], ['Infanta_Maria_Teresa_of_Spain', 0.006], ['Teresa_Seiblitz', 0.006], ['Teresa_Forcier', 0.006], ['Teresa_Taylor', 0.006], ['Teresa_Motos', 0.006], ['Teresa_Piotrowska', 0.006], ['Teresa_Ferster_Glazier', 0.006], ['Teresa_Fedor', 0.006], ['Teresa_Ganzel', 0.006], ['Teresa_Portela_(Portuguese_canoeist)', 0.006], ['Teresa_de_la_Parra', 0.006], ['Teresa_Piccini', 0.006], ['Teresa_Borawska', 0.006], ['Princess_Maria_Teresa_of_Savoy', 0.006], ['Teresa_Roncon', 0.006], ['Teresa_Wentzler', 0.006], ['Teresa_Machado', 0.006], ['Teresa_Magbanua', 0.006], ['Teresa_del_Po', 0.006], ['Teresa_Sapieha', 0.006], ['Teresa_Edwards', 0.006], ['Teresa_A._Dolan', 0.006], ['Teresa_Hurtado_de_Ory', 0.006], ['Teresa_De_Sio', 0.006], ['Teresa_Hsu_Chih', 0.006], ['Lady_Teresa_Waugh', 0.006], ['Teresa_Lourenco', 0.006], ['Teresa_Lubomirska', 0.006], ['Teresio_Maria_Languasco', 0.006], ['Teresa_Woo-Paw', 0.006], 
['Teresa_de_Cartagena', 0.006], ['Teresa_Bernabe', 0.006], ['Teresa_Amabile', 0.006], ['Maria_Teresa,_Princess_of_Beira', 0.006], ['Teresa_Korwin_Gosiewska', 0.006], ['Teresa_Bright', 0.006], ['Teresa_Daly', 0.006], ['Teresa_Villaverde', 0.006], ['Teresa_Stich-Randall', 0.006], ['Teresa_Polias', 0.006], ['Teresa_Wong', 0.006], ['Teresa_Pavlinek', 0.006], ['Teresa_Ruiz_(politician)', 0.006], ['Teresa_Cooper', 0.006], ['Teresa_Carr_Deni', 0.006], ['Teresa_P._Pica', 0.006], ['Teresa_S._Polley', 0.006], ['Teresa_Stratas', 0.006], ['Teresa_Lipowska', 0.006], ['Teresa_Carpio', 0.006], ['Teresa_Stolz', 0.006], ['Teresa_Wilson', 0.006], ['Teresa_Lalor', 0.006], ['Teresa_Hannigan', 0.006], ['Teresa_Chodkiewicz', 0.006], ['Teresa_Lisbon', 0.006], ['Teresa_Forn', 0.006], ['Teresa_Gutierrez', 0.006], ['Teresa_Maxwell-Conover', 0.006], ['Teresa_Ann_Savoy', 0.006], ['Teresa_Trull', 0.006], ['Teresa_Forcades', 0.006], ['Teresa_Lynch', 0.006], ['Teresa_Furtado', 0.006], ['Teresa_Southwick', 0.006]]

Any help on understanding this would be very useful. Thanks!
