Repositories should be identified and retrieved via URL rather than name #950

@kriswest

Describe the bug
This issue relates to/overlaps with two past unresolved issues:

Throughout the codebase, repositories are assumed to have a unique name property. However, repository names are not globally unique, and there are two common cases where they will be duplicated:

  1. Multiple forks of the same project (e.g. within GitHub, bearing two different users' names as the 'organisation'; see issue "Provide facility to manage multiple forks of the same upstream repository" #66)
  2. Different projects that have the same name, hosted in different platforms.

What we can rely on being unique is the repository URL being pushed to.

In PR #545, where @msagi was looking at this, it was suggested that we move to a platform/org/reponame structure. However, this requires additional processing of the URL and assumes that the platform/org can be extracted from the URL - the org is a GitHub/GitLab concept which doesn't apply to other hosts/base git repos.

On the other hand, a URL is globally unique and already exists within the data model. Switching to it to identify repositories is easy enough - except where it impacts the URLs used to contribute to those repos through git proxy, e.g.:
git clone http://localhost:8000/finos/git-proxy.git
would need to become:
git clone http://localhost:8000/https://github.com/finos/git-proxy.git

This URL is uglier but easier to understand - the git proxy url has simply been prepended to it.
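To illustrate the mapping described above, here is a minimal sketch of how the upstream URL might be recovered from an incoming request path. The function name `extractUpstreamUrl` is hypothetical and not taken from the git proxy codebase. Tolerating a single slash after the scheme is my own assumption, added in case something between client and proxy normalises `//` in paths.

```typescript
// Hypothetical helper for illustration; not part of git-proxy's actual API.
function extractUpstreamUrl(requestPath: string): string | null {
  // Drop the query string, e.g. "?service=git-upload-pack".
  const path = requestPath.split("?")[0];
  // Match an embedded http(s) URL after the leading slash. One or two
  // slashes are accepted after the scheme in case an intermediary
  // collapsed "//" into "/" (an assumption, not observed behaviour).
  const match = /^\/(https?):\/{1,2}(.+)$/.exec(path);
  if (match === null) {
    return null; // legacy name-based path such as /finos/git-proxy.git
  }
  // Strip the smart-HTTP endpoints a git client appends to the repo URL.
  const repoPart = match[2].replace(
    /\/(info\/refs|git-upload-pack|git-receive-pack)$/,
    ""
  );
  return match[1] + "://" + repoPart;
}
```

A `null` result signals that the path did not contain a full URL, which is the point at which a fallback or migration message would come into play.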

The challenge is that you can't automatically migrate existing users whose checkouts are configured with the existing URL as origin (at least AFAIK - GitHub manages repo URL migrations, so there may be a way in the git protocol, although I suspect they just maintain some redirects). Hence, to solve this without a breaking change, there may need to be some fallback code to handle older checkouts and/or some form of migration. I'd like to figure out the preferred route forward on that before undertaking work to resolve this (as I can see two others took this on and didn't bring it to ground). Options:

  1. Fallback to existing approach: Switch to repo URL for retrieval and if repo not found/URL didn't contain the full url (so no 'http' in the string) attempt lookup by name.
  2. Advise users to migrate: Create a table of the existing repo names at the time of migration. Switch to repo URL for retrieval and if repo not found, check the table and respond with a message advising user to migrate to the new URL.
  3. Issue a breaking change and advise firms updating to inform all users that their repos need to be migrated to the new origins.
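
To make options 1 and 2 easier to compare, here is a rough sketch of what each lookup could look like. The `Repo` shape, `findRepo`, `migrationHint` and the `legacyNames` table are all hypothetical names for illustration, not the actual git proxy data model or API.

```typescript
// Hypothetical types and helpers for illustration only.
interface Repo {
  name: string;
  url: string;
}

function isFullUrl(id: string): boolean {
  return /^https?:\/\//.test(id);
}

// Option 1: look up by URL; if the identifier is not a full URL
// (no 'http' prefix), fall back to the existing lookup by name.
function findRepo(repos: Repo[], id: string): Repo | null {
  for (const repo of repos) {
    if (isFullUrl(id) ? repo.url === id : repo.name === id) {
      return repo;
    }
  }
  return null;
}

// Option 2: look up by URL only. For a legacy name recorded in a table
// captured at migration time, return a message advising the new origin.
function migrationHint(
  legacyNames: { [name: string]: string }, // legacy repo name -> upstream URL
  proxyBase: string,
  id: string
): string | null {
  if (!Object.prototype.hasOwnProperty.call(legacyNames, id)) {
    return null;
  }
  const newOrigin = proxyBase + "/" + legacyNames[id];
  return "Repository '" + id + "' is now addressed by URL; run: " +
    "git remote set-url origin " + newOrigin;
}
```

Option 1 keeps old checkouts working silently but retains the ambiguity for duplicated names, whereas option 2 rejects the push and tells the user how to update their origin.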

Expected behaviour

  • Multiple forks of a project can be onboarded, managed and contributed to separately
  • Multiple repos with the same name, but owned by different orgs or hosted on different platforms, can be onboarded, managed and contributed to separately.

Additional context
I may be up for implementing a solution to this issue - perhaps building on @msagi's previous attempt in #545. However, I want to agree the approach before taking it on.

We ran across this in initial testing with devs using forks of the same repo for testing. It will come up less often in production use - but it will come up again someday... As the solution is likely to be disruptive or breaking, I'd suggest this is sorted sooner rather than later, when it will affect more organisations and users... and potentially will need to be addressed in more places as git proxy itself grows.
