Cross-contamination between GitHub and GitLab columns in contributors table #3469

@MoralCode

I'm seeing lots of gh_*-prefixed columns being used for GitLab data too...

Originally posted by @MoralCode in #3435 (comment)

While looking into this migration issue, I noticed duplicate values in our database for the gh_login and cntrb_login fields. Looking at a full row for one duplicated user, it's clear that the data came from two different forges (one GitHub, one GitLab).

There seem to be several issues here:

  • cntrb_login and gh_login may both be populated by both the GitHub events task and the GitLab issue comments task.
  • For the GitLab user, gl_username is NULL.
  • gh_avatar_url has a non-null value for the GitLab user (which seems incorrect given the column name), while gl_avatar_url is NULL.
  • The same applies to gh_url.
  • gh_user_id has a value for the GitLab user, while gl_id is NULL.
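
For anyone who wants to check their own instance, here is a rough sketch of how rows like this could be spotted. It runs against an in-memory SQLite table that is only a simplified, hypothetical stand-in for the real contributors table (it uses just the columns named above), and the sample rows are made up. It also assumes that a GitLab-sourced row can be recognized by "gitlab" appearing in gh_url, which may not hold for every row.

```python
import sqlite3

# Hypothetical, simplified stand-in for the contributors table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contributors (
        cntrb_id      INTEGER PRIMARY KEY,
        cntrb_login   TEXT,
        gh_login      TEXT,
        gh_user_id    INTEGER,
        gh_url        TEXT,
        gh_avatar_url TEXT,
        gl_username   TEXT,
        gl_id         INTEGER,
        gl_avatar_url TEXT
    )
""")

# Made-up sample rows: row 2 is a GitLab user stored in gh_* columns.
rows = [
    (1, "alice", "alice", 101, "https://api.github.com/users/alice",
     "https://avatars.example/gh/alice", None, None, None),
    (2, "alice", "alice", 202, "https://gitlab.com/alice",
     "https://avatars.example/gl/alice", None, None, None),
    (3, "bob", "bob", 303, "https://api.github.com/users/bob",
     "https://avatars.example/gh/bob", None, None, None),
]
conn.executemany("INSERT INTO contributors VALUES (?,?,?,?,?,?,?,?,?)", rows)

# Logins that appear on more than one row.
duplicates = conn.execute("""
    SELECT gh_login, COUNT(*)
    FROM contributors
    WHERE gh_login IS NOT NULL
    GROUP BY gh_login
    HAVING COUNT(*) > 1
    ORDER BY gh_login
""").fetchall()

# Rows whose gh_* columns look like GitLab data while gl_* is empty
# (assumption: the forge can be inferred from the URL in gh_url).
suspects = [r[0] for r in conn.execute("""
    SELECT cntrb_id
    FROM contributors
    WHERE gh_url LIKE '%gitlab%'
      AND gl_username IS NULL
      AND gl_id IS NULL
    ORDER BY cntrb_id
""")]

print(duplicates, suspects)
```

The same two queries should translate to the production database with little change, but the URL heuristic would need checking against real data first.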

Further, it seems that the GitLab username and other contributor data are largely being stored in the gh_*-prefixed columns. This means that our database models, which appear to be written to enforce username uniqueness for GitHub and GitLab usernames separately, cannot actually do so (and the constraints may not even be enforced at present).
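
To illustrate the uniqueness point, here is a small sketch of what separate per-forge constraints would do if they were enforced. The two-column table and the `insert` helper are hypothetical and are not Augur's actual models. The sketch only shows that the same username on both forges is fine when each lands in its own column, and that it collides once the GitLab user is written to gh_login.

```python
import sqlite3

# Hypothetical minimal table with one unique column per forge.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contributors (
        cntrb_id    INTEGER PRIMARY KEY,
        gh_login    TEXT UNIQUE,
        gl_username TEXT UNIQUE
    )
""")

def insert(gh_login=None, gl_username=None):
    """Try to insert a contributor; return False on a uniqueness violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO contributors (gh_login, gl_username) VALUES (?, ?)",
                (gh_login, gl_username),
            )
        return True
    except sqlite3.IntegrityError:
        return False

github_ok = insert(gh_login="alice")     # GitHub user in the GitHub column
gitlab_ok = insert(gl_username="alice")  # GitLab user in the GitLab column
misfiled_ok = insert(gh_login="alice")   # GitLab user written to gh_login

print(github_ok, gitlab_ok, misfiled_ok)
```

Since the duplicates described above exist in the real table, a constraint like this is presumably either missing or not covering these columns there.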

Metadata

Labels: database (Related to Augur's unified data model)
