kriswest
Contributor

@kriswest kriswest commented Jun 3, 2025

resolves #950
resolves #511
resolves #66
resolves #1107
resolves #1028

Refactor (api, proxy & UI) to remove the assumption of GitHub as the git repository host and the use of the repository name field as the id of the repository (as this prevents git-proxy instances from supporting multiple forks of a project or projects from multiple hosts with the same name).

This PR:

  • Replaces the use of the repo name field in the API with the _id field generated by the database adaptors,
    • Using the repository URL as a key does not work well with express routing, but _id does in both mongo and neDb
    • allows names to be repeated (multiple forks or clashing names from different organisations/repository hosts)
    • UI and CLI were updated accordingly
  • Replaces the use of organisation/repoName.git in the proxy URLs with the repository url
  • Disables GitHub-specific functionality in the UI if the host is not GitHub
  • Completes application of TypeScript to the database classes
    • Duplicated code reduced
    • A number of minor differences in behaviour (particularly return types) between the DB adaptors were resolved
    • Does NOT refactor all usages of the DB client to use TypeScript (still many requires to eliminate)
  • Deprecates and ignores the config property proxyUrl as the proxied host(s) are now determined from the configured repositories
  • Expands the tests for proxy routes and the Repo route of the API
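
The routing point in the list above can be illustrated with a minimal sketch. The `/api/v1/repo/...` route shape, the id values and the GitLab fork URL are assumptions made for illustration and are not taken from the PR.

```typescript
// Hypothetical repo record; only _id, name and url are used here.
interface Repo {
  _id: string; // generated by the database adaptor
  name: string;
  url: string;
}

// Two repositories that share a name but live on different hosts (illustrative values).
const repos: Repo[] = [
  { _id: 'a1b2c3', name: 'git-proxy', url: 'https://github.com/finos/git-proxy.git' },
  { _id: 'd4e5f6', name: 'git-proxy', url: 'https://gitlab.com/example/git-proxy.git' },
];

// Assumed route shape, for illustration only.
const routeById = (repo: Repo): string => `/api/v1/repo/${repo._id}`;
// A URL must be percent-encoded to fit into a single path segment.
const routeByUrl = (repo: Repo): string => `/api/v1/repo/${encodeURIComponent(repo.url)}`;

// Looking up by name is ambiguous; looking up by _id is not.
const byName = (name: string): Repo[] => repos.filter((r) => r.name === name);
const byId = (id: string): Repo | undefined => repos.find((r) => r._id === id);

console.log(routeById(repos[0]));
console.log(routeByUrl(repos[0]));
```

The id-based route stays a single clean path segment, while the URL-based one only works if every client encodes it and the router decodes it consistently.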

To Do:

  • Annotate PR for review
  • Check test coverage
  • Implement additional tests for the proxy and fallback
    • implement tests for new proxy URLs for github.com
    • implement tests for fallback with legacy proxy urls for github.com
    • implement tests for gitlab.com
    • implement tests for non-github/non-gitlab repo
    • implement tests for multiple forks
  • Add support for GitLab API where repo is hosted at GitLab

(contributed as part of a GitLab CoCreate collaboration with help from @StingRayZA)

netlify bot commented Jun 3, 2025

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit cb75f82
🔍 Latest deploy log https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/68a56a1a0534f800089a85da

codecov bot commented Jun 3, 2025

Codecov Report

❌ Patch coverage is 83.72093% with 98 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.81%. Comparing base (de10c80) to head (cb75f82).
⚠️ Report is 99 commits behind head on main.

Files with missing lines Patch % Lines
src/db/mongo/repo.ts 37.14% 22 Missing ⚠️
src/service/routes/repo.js 85.04% 16 Missing ⚠️
src/db/mongo/users.ts 37.50% 15 Missing ⚠️
src/db/file/users.ts 65.21% 8 Missing ⚠️
src/proxy/index.ts 80.95% 8 Missing ⚠️
src/db/file/repo.ts 86.36% 3 Missing and 3 partials ⚠️
src/db/index.ts 93.40% 4 Missing and 2 partials ⚠️
src/proxy/routes/index.ts 92.94% 6 Missing ⚠️
src/db/mongo/pushes.ts 60.00% 4 Missing ⚠️
src/proxy/routes/helper.ts 94.33% 3 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1043      +/-   ##
==========================================
- Coverage   83.29%   82.81%   -0.49%     
==========================================
  Files          59       66       +7     
  Lines        2449     2787     +338     
  Branches      280      335      +55     
==========================================
+ Hits         2040     2308     +268     
- Misses        365      432      +67     
- Partials       44       47       +3     

Contributor

@sam-holmes2 sam-holmes2 left a comment

LGTM after an initial scan through :) thanks for your contribution!

@kriswest
Contributor Author

kriswest commented Jun 5, 2025

Picked up a couple of test failures after merging main - will resolve (and start working on the additional tests needed).

Contributor

@jescalada jescalada left a comment

I went through the approval/rejection flows with a pre-existing repo, and things work well!

There is an issue with backwards compatibility with older, invalid databases from previous versions of GitProxy (unique URL enforcement with repos). This may also cause issues with the other files (pushes, users).

I also tested the Add Repo flow which caused my server to crash, maybe because of something wrong on my end (invalid input maybe?).

@kriswest

This comment was marked as resolved.

@jescalada
Contributor

I can release the unique constraint on the index to avoid this - however, I put it in to catch invalid data as, like the use of the repository name as an ID, it will result in the selection of the wrong repo project at times (although in this case the multiple records would have to be the same repo) and I thought it better for tests to fail etc. if data wasn't cleaned up from a previous run. Where do you think we should go with this @jescalada - an automated migration seems difficult as you'd just have to delete or programmatically edit one of the duplicate records...

I think catching and displaying a simple error message with the invalid entry/entries could be enough - so that the GitProxy administrator can quickly identify the issue and fix it manually. Thankfully, the error seems to occur on backend (db) startup, so end users wouldn't really have the app suddenly blowing up.
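
As a rough sketch of the idea above, a startup check could group repository records by URL and report any duplicates to the administrator before the unique index is applied. The record shape and the function names are hypothetical, not GitProxy's actual API.

```typescript
// Hypothetical minimal record shape; real records carry more fields.
interface RepoRecord {
  _id: string;
  url: string;
}

// Group record ids by URL and keep only the URLs that occur more than once.
function findDuplicateUrls(records: RepoRecord[]): Map<string, string[]> {
  const idsByUrl = new Map<string, string[]>();
  records.forEach((record) => {
    const ids = idsByUrl.get(record.url) ?? [];
    ids.push(record._id);
    idsByUrl.set(record.url, ids);
  });
  const duplicates = new Map<string, string[]>();
  idsByUrl.forEach((ids, url) => {
    if (ids.length > 1) duplicates.set(url, ids);
  });
  return duplicates;
}

// Build a message an administrator can act on; an empty string means no duplicates.
function describeDuplicates(records: RepoRecord[]): string {
  const lines: string[] = [];
  findDuplicateUrls(records).forEach((ids, url) => {
    lines.push(`Duplicate repository URL '${url}' in records: ${ids.join(', ')}`);
  });
  return lines.join('\n');
}
```

Logging the output of `describeDuplicates` at database startup would name the offending entries without attempting an automated migration.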

kriswest and others added 20 commits July 3, 2025 10:17
Typescript wasn't working on the DB classes due to their dependency imports with require.
@kriswest
Contributor Author

kriswest commented Aug 7, 2025

Conflicts resolved and ready for another look. I haven't had a chance to test it yet, but the tests are all passing.

@kriswest
Contributor Author

kriswest commented Aug 7, 2025

I'm aware I haven't done anything in the documentation regarding this PR. That should probably be reviewed and work to add to the docs undertaken under a new issue.

Contributor

@jescalada jescalada left a comment

Looks good to me! I have a few comments - hope we can tackle these and get this PR ready to merge soon. 🚀

A few more things I was wondering:

What exactly are the "breaking" changes, and what are the steps an organization must follow to upgrade GitProxy to v2? I have a feeling that some of the issues I encountered might have been due to "bad data" - something that could be updated with a script to avoid errors in v2.

So two important action points for the v2 release:

  • Documenting the breaking changes for both #973 and this PR
  • Ideally automating the migration process for v1 -> v2 databases so GitProxy administrators don't need to do anything (and potentially mess up the upgrade process)
    • If automating is not plausible, we should document likely problems and their solutions (for example: normalizing .git ending on repo URLs to prevent frontend display bugs)
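
The `.git` normalisation mentioned above could look roughly like the following. Whether GitProxy expects stored URLs with or without the suffix is an assumption here (the sketch adds it), and `normalizeRepoUrl` is a hypothetical helper name.

```typescript
// Trim whitespace and trailing slashes, then make sure the URL ends in '.git'.
// Assumption: the canonical stored form includes the '.git' suffix.
function normalizeRepoUrl(url: string): string {
  const trimmed = url.trim().replace(/\/+$/, '');
  return trimmed.endsWith('.git') ? trimmed : `${trimmed}.git`;
}

console.log(normalizeRepoUrl('https://github.com/finos/git-proxy/'));
```

A migration script could apply this to every stored repo URL, then look for records that collapse onto the same value.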

@jescalada
Contributor

Looks good! Just a few comments on the failing tests due to pre-existing GitLab origin (as well as other origins). It'd be fantastic if we could fix those to be agnostic of the data in the database.

@kriswest
Contributor Author

Making the tests agnostic of the data would be good, but I haven't figured out how yet - the requests that go out are affected by the response from the other end, so we'd probably need to mock something in the proxy, perhaps the URL it would forward a request on to, and then check it had been called and returned the right value?

@jescalada
Contributor

@kriswest I've taken another look at the tests, and since the failing ones are "end to end" if anything, it'd be harder than I thought to mock out the database dependency (and it only really makes sense for unit/function tests).

As long as the reason for failure is obvious for contributors, that should be enough for now so we can speed up the release. We can make the tests more robust in another issue (#978 and #1143 are related to this).

@kyet
kyet commented Aug 14, 2025

Hi, @kriswest. I found the culprit of the "No body found" problem!

I dumped the packets and found that my git client was actually sending the contents.

The pattern match in the code below fails, so req.body is never populated.

proxy/routes/index.ts

const isPackPost = (req: Request) =>
  req.method === 'POST' &&
  // eslint-disable-next-line no-useless-escape
  /^\/[^\/]+\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$/.test(req.url);

const teeAndValidate = async (req: Request, res: Response, next: NextFunction) => {
  if (!isPackPost(req)) return next();
  ..
  try {
    ..
    (req as any).body = buf;

In my case, req.url is /git.mygitlab.com/my.name/git-proxy.git/git-receive-pack (sanitized real name).
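
The mismatch can be reproduced in isolation. This sketch copies the pattern quoted above into a standalone constant (the constant name is made up) and tests it against the two URL shapes mentioned in this comment.

```typescript
// The quoted pattern: exactly two path segments before the service name.
const twoSegmentPackPost = /^\/[^\/]+\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$/;

// Legacy form without a host segment.
const legacyUrl = '/kyet/git-proxy.git/git-receive-pack';
// Host-prefixed form, as seen in the failing request.
const hostPrefixedUrl = '/git.mygitlab.com/my.name/git-proxy.git/git-receive-pack';

console.log(twoSegmentPackPost.test(legacyUrl)); // true
console.log(twoSegmentPackPost.test(hostPrefixedUrl)); // false
```

The host-prefixed URL has three segments before the service name, so the second `[^\/]+` lands on `my.name`, which does not end in `.git`.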

Upon reviewing the regex, I noticed that there is a change history in the commit below.

29e7d2a

You may need to revert or modify the regex changes.

I would like to share one more thing related to this issue. As I mentioned earlier, our company has a firewall issue that prevents us from pushing to github.com, so I tested it at home. The test environment is different, but when I set up a github.com proxy like this, proxy fails, but proxy2 succeeds.

[remote "proxy"]
        url = http://localhost:8000/github.com/kyet/git-proxy.git
        fetch = +refs/heads/*:refs/remotes/proxy/*
[remote "proxy2"]
        url = http://localhost:8000/kyet/git-proxy.git
        fetch = +refs/heads/*:refs/remotes/proxy2/*

If I think about it in relation to the above issue, when falling back to the default proxy, I assume that req.url probably doesn't contain github.com (e.g., /kyet/git-proxy.git/git-receive-pack)

@kriswest
Contributor Author

@kyet many thanks for this - I think you're right. The regex is inflexible on the number of path components. Replacing it with:
^(?:\/[^\/]+)*\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$ should resolve that and be much more flexible - it will be needed to support multiple levels of groups and sub-groups in GitLab.
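
A quick way to sanity-check a relaxed pattern of this kind is to run it against paths of different depths. JavaScript's RegExp supports non-capturing groups `(?:...)` but not atomic groups, so the sketch uses the former. The helper name and the sub-group path are illustrative assumptions.

```typescript
// Any number of leading segments, then '<name>.git', then one of the two pack services.
const anyDepthPackPost = /^(?:\/[^\/]+)*\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$/;

// Hypothetical helper; the real isPackPost also checks the HTTP method.
const isPackPostUrl = (url: string): boolean => anyDepthPackPost.test(url);

const samples = [
  '/kyet/git-proxy.git/git-receive-pack', // legacy two-segment form
  '/git.mygitlab.com/my.name/git-proxy.git/git-receive-pack', // host-prefixed form
  '/gitlab.com/group/subgroup/project.git/git-upload-pack', // nested GitLab groups (made-up path)
  '/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack', // ref discovery, not a pack POST
];

samples.forEach((url) => console.log(isPackPostUrl(url), url));
```

The first three paths match and the last does not, because the service name must be the final path segment directly after the `.git` segment.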

@fabiovincenzi I think I wrote the original regex, but it's your commit adding it. Do you concur on updating? Was there another reason for the current regex on isPackPost that I'm missing? I'm happy to fix in an update to this PR.

@jescalada
Contributor

@kriswest Is this ready to merge in? If we need to modify anything else, we can perhaps do it in a separate PR so that we can unblock the other PRs that are waiting to solve merge conflicts.

@fabiovincenzi If you think this is a one-liner, then feel free to fix it directly, but I'm keen on merging this ASAP to get the rc.2 prerelease out and dust off our old PRs!

@fabiovincenzi
Contributor

@kriswest you're absolutely right - the regex needs to handle variable path depths. Your solution will properly support GitLab's multi-level group structure instead of being locked to exactly 2 segments.

@kriswest
Contributor Author

kriswest commented Aug 19, 2025

@jescalada almost - apologies for the delay! I've just resolved some conflicts and a bad merge with the fuzzing changes - and raised an issue for another of those tests that can fail randomly (it occasionally produces a near-valid email address).

I've fixed the isPackPost issue just now - it did need fixing, that was introduced by merging main and would have prevented working with GitLab dedicated instances.

I think there may be one more test to look at - I don't like seeing the fallback URLs process repos they shouldn't and make requests to GitHub.

processing request URL: '/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack'
proxy keys registered:  ["/github.com/"]
        using fallback
Action processed: Allowed
    Request URL: /gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack
    Host:        127.0.0.1:61485
    User-Agent:  git/2.42.0
Request resolved to https://github.com/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack

I believe we're passing the test as GitHub returns a 404 itself. We could ignore that, merge and fix under a different issue - but I'm not happy about it TBH
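
The resolution seen in the log above can be modelled with a small pure function. This is a simplified sketch and not the proxy's code: it assumes a registered key maps to `https://` plus the request path, and that the fallback simply prepends `https://github.com`. The function name is hypothetical.

```typescript
// Registered origins, mirroring the log above where only github.com is configured.
const proxyKeys = ['/github.com/'];

// Pick the upstream URL for a request path: a registered key wins, otherwise
// the github.com fallback is used and the whole path is appended to it.
function resolveUpstream(reqUrl: string, keys: string[]): string {
  for (const key of keys) {
    if (reqUrl.startsWith(key)) return 'https://' + reqUrl.slice(1);
  }
  return 'https://github.com' + reqUrl;
}

console.log(
  resolveUpstream('/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack', proxyKeys),
);
```

With no `/gitlab.com/` key registered, the GitLab path is treated as a GitHub path, which reproduces the `https://github.com/gitlab.com/...` URL in the log.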

@kriswest
Contributor Author

@jescalada regarding debugging the above, I've determined that CheckRepoInAuthList is not running on that request, but it should, as it's the only item in the default pull action chain. Frankly, I think it should always be run, whereas we have a test that confirms nothing is run for made-up action types such as foo...

I've not managed to complete debugging this today, but will have a look again tomorrow. Would appreciate your opinion on it as well.

I think we are otherwise ready to go.

@jescalada
Contributor

@kriswest From what I see in the getRouter function, deleting the gitlab.com host (therefore not available when calling getAllProxiedHosts) results in the logic here automatically resolving the host to github.com instead.

I suppose we could refactor the getRequestPathResolver so that it factors in the URL's domain even if it's not registered? Rather than always prepending the GitHub domain.

const getRouter = async () => {
  // eslint-disable-next-line new-cap
  const router = Router();
  router.use(teeAndValidate);

  const originsToProxy = await getAllProxiedHosts();
  const proxyKeys: string[] = [];
  const proxies: RequestHandler[] = [];

  console.log(`Initializing proxy router for origins: '${JSON.stringify(originsToProxy)}'`);

  // we need to wrap multiple proxy middlewares in a custom middleware as middlewares
  // with path are processed in descending path order (/ then /github.com etc.) and
  // we want the fallback proxy to go last.
  originsToProxy.forEach((origin) => {
    console.log(`\tsetting up origin: '${origin}'`);

    proxyKeys.push(`/${origin}/`);
    proxies.push(
      proxy('https://' + origin, {
        parseReqBody: false,
        preserveHostHdr: false,
        filter: proxyFilter,
        proxyReqPathResolver: getRequestPathResolver('https://'), // no need to add host as it's in the URL
        proxyReqOptDecorator: proxyReqOptDecorator,
        proxyReqBodyDecorator: proxyReqBodyDecorator,
        proxyErrorHandler: proxyErrorHandler,
      }),
    );
  });

  console.log('\tsetting up catch-all route (github.com) for backwards compatibility');
  const fallbackProxy: RequestHandler = proxy('https://github.com', {
    parseReqBody: false,
    preserveHostHdr: false,
    filter: proxyFilter,
    proxyReqPathResolver: getRequestPathResolver('https://github.com'),
    proxyReqOptDecorator: proxyReqOptDecorator,
    proxyReqBodyDecorator: proxyReqBodyDecorator,
    proxyErrorHandler: proxyErrorHandler,
  });

  console.log('proxy keys registered: ', JSON.stringify(proxyKeys));

  router.use('/', (req, res, next) => {
    console.log(`processing request URL: '${req.url}'`);
    console.log('proxy keys registered: ', JSON.stringify(proxyKeys));

    for (let i = 0; i < proxyKeys.length; i++) {
      if (req.url.startsWith(proxyKeys[i])) {
        console.log(`\tusing proxy ${proxyKeys[i]}`);
        return proxies[i](req, res, next);
      }
    }
    // fallback
    console.log(`\tusing fallback`);
    return fallbackProxy(req, res, next);
  });
  return router;
};
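
One possible shape for the refactor suggested before the code above, as a hedged sketch: derive the host from the first path segment when it looks like a hostname, and only fall back to github.com for legacy paths. The dot-based heuristic and both function names are assumptions, not existing GitProxy behaviour.

```typescript
// Heuristic (an assumption): treat the first path segment as a host when it
// contains a dot, e.g. 'gitlab.com'. Legacy paths such as '/kyet/git-proxy.git'
// start with an owner name, which has no dot.
function splitHost(reqUrl: string): { host: string | null; path: string } {
  const match = /^\/([^\/]+)(\/.*)$/.exec(reqUrl);
  if (match && match[1].includes('.')) {
    return { host: match[1], path: match[2] };
  }
  return { host: null, path: reqUrl };
}

// Resolve to the host named in the path, or to the fallback host for legacy paths.
function resolveWithHost(reqUrl: string, fallbackHost = 'github.com'): string {
  const { host, path } = splitHost(reqUrl);
  return `https://${host ?? fallbackHost}${path}`;
}

console.log(resolveWithHost('/gitlab.com/gitlab-community/meta.git/info/refs?service=git-upload-pack'));
console.log(resolveWithHost('/kyet/git-proxy.git/info/refs'));
```

This only changes where an unregistered request would be sent; whether it should be forwarded at all would still need to be decided by the filter and the action chain.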

@jescalada jescalada merged commit a475fee into finos:main Aug 20, 2025
14 checks passed
@kriswest
Contributor Author

@jescalada yes it'll fall back to the fallback... Which isn't much different from the original setup that assumed GitHub. However, the filter function should still apply and check that the repository actually exists in our data, which is how I was expecting this to be controlled. I don't think that it was this PR that introduced an issue with the pull chain (although I didn't confirm on main before this merged). Regardless, it's a problem for us as it means any repo at GitHub could be fetched or pulled through the proxy, without the auth check applying.

Clearly we need a test that involves a repo that is not in the data but does exist in GitHub. I'll knock something up this morning.
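
The behaviour such a test would pin down can be sketched as a pure check, independent of the real filter and action chain. Everything here is hypothetical: the helper names, the assumption that legacy paths refer to github.com, and the use of `torvalds/linux` as an example of a repository that exists on GitHub but is not configured.

```typescript
// Derive the upstream repository URL from a proxy request path.
// Assumption: paths without a host segment are legacy github.com paths.
function repoUrlFromPath(reqUrl: string): string | null {
  const match = /^\/(.+?\.git)(?:\/|$)/.exec(reqUrl);
  if (!match) return null;
  const repoPath = match[1];
  const firstSegment = repoPath.split('/')[0];
  return firstSegment.includes('.') ? `https://${repoPath}` : `https://github.com/${repoPath}`;
}

// A request is allowed only if its repository is among the configured URLs.
function isRepoAuthorised(reqUrl: string, configuredUrls: string[]): boolean {
  const repoUrl = repoUrlFromPath(reqUrl);
  return repoUrl !== null && configuredUrls.indexOf(repoUrl) !== -1;
}

const configured = ['https://github.com/finos/git-proxy.git'];

// Configured repo via the new-style path: allowed.
console.log(isRepoAuthorised('/github.com/finos/git-proxy.git/info/refs?service=git-upload-pack', configured));
// Exists on GitHub but is not in the data: should be rejected even on the fallback route.
console.log(isRepoAuthorised('/torvalds/linux.git/info/refs?service=git-upload-pack', configured));
```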
