-
Notifications
You must be signed in to change notification settings - Fork 140
feat(key on repo url): support git hosts other than GitHub + multiple forks #1043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for endearing-brigadeiros-63f9d0 canceled.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1043 +/- ##
==========================================
- Coverage 83.29% 82.81% -0.49%
==========================================
Files 59 66 +7
Lines 2449 2787 +338
Branches 280 335 +55
==========================================
+ Hits 2040 2308 +268
- Misses 365 432 +67
- Partials 44 47 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after an initial scan through :) thanks for your contribution!
Picked up a couple of test failures after merging main - will resolve (and start working on the additional tests needed). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the approval/rejection flows with a pre-existing repo, and things work well!
There is an issue with backwards compatibility with older, invalid databases from previous versions of GitProxy (unique URL enforcement with repos). This may also cause issues with the other files (pushes, users).
I also tested the Add Repo flow which caused my server to crash, maybe because of something wrong on my end (invalid input maybe?).
This comment was marked as resolved.
This comment was marked as resolved.
I think catching and displaying a simple error message with the invalid entry/entries could be enough - so that the GitProxy administrator can quickly identify the issue and fix it manually. Thankfully, the error seems to occur on backend (db) startup, so end users wouldn't really have the app suddenly blowing up. |
Signed-off-by: Raimund Hook <[email protected]>
…que as its our new primary key
Signed-off-by: Raimund Hook <[email protected]>
…te non-specific db fns
Signed-off-by: Raimund Hook <[email protected]>
Typescript wasn't working on the DB classes due to their dependency imports with require.
…making github fns optional
…file-based DB implementation
…file-based DB implementation
Conflicts resolved and ready for a another look. I haven't had a chance to test it yet, but the tests are all passing. |
I'm aware I haven't done anything in the documentation regarding this PR. That should probably be reviewed and work to add to the docs undertaken under a new issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I have a few comments - hope we can tackle these and get this PR ready to merge soon. 🚀
A few more things I was wondering:
What exactly are the "breaking" changes, and what are the steps an organization must follow to upgrade GitProxy to v2? I have a feeling that some of the issues I encountered might have been due to "bad data" - something that could be updated with a script to avoid errors in v2.
So two important action points for the v2 release:
- Documenting the breaking changes for both #973 and this PR
- Ideally automating the migration process for v1 -> v2 databases so GitProxy administrators don't need to do anything (and potentially mess up the upgrade process)
- If automating is not plausible, we should document likely problems and their solutions (for example: normalizing .git ending on repo URLs to prevent frontend display bugs)
Looks good! Just a few comments on the failing tests due to pre-existing GitLab origin (as well as other origins). It'd be fantastic if we could fix those to be agnostic of the data in the database. |
Making the tests agnostic of the data would be good, but I haven't figured out how yet - the requests that go out are affected the response from the other end, so we'd probably need to mock something in the proxy, perhaps the URL it would forward a request on to, and then check it had been called and returned the right value? |
@kriswest I've taken another look at the tests, and since the failing ones are "end to end" if anything, it'd be harder than I thought to mock out the database dependency (and it only really makes sense for unit/function tests). As long as the reason for failure is obvious for contributors, that should be enough for now so we can speed up the release. We can make the tests more robust in another issue (#978 and #1143 are related to this). |
Hi, @kriswest. I found culprit of the I dumped the packets, I found that my git client was actually sending the contents. The pattern matching fails in the code below, and therefore the
const isPackPost = (req: Request) =>
req.method === 'POST' &&
// eslint-disable-next-line no-useless-escape
/^\/[^\/]+\/[^\/]+\.git\/(?:git-upload-pack|git-receive-pack)$/.test(req.url);
const teeAndValidate = async (req: Request, res: Response, next: NextFunction) => {
if (!isPackPost(req)) return next();
..
try {
..
(req as any).body = buf; In my case, Upon reviewing the regex, I noticed that there is a change history in the commit below. You may need to revert or modify the regex changes. I would like to share one more thing related to this issue. As I mentioned earlier, our company has a firewall issue that prevents us from pushing to github.com, so I tested it at home. The test environment is different, but when I set up a github.com proxy like this, [remote "proxy"]
url = http://localhost:8000/github.com/kyet/git-proxy.git
fetch = +refs/heads/*:refs/remotes/proxy/*
[remote "proxy2"]
url = http://localhost:8000/kyet/git-proxy.git
fetch = +refs/heads/*:refs/remotes/proxy2/* If I think about it in relation to the above issue, when falling back to the default proxy, I assume that |
@kyet many thanks for this - I think you're right. The regex is inflexible on the number of path components. Replacing it with: @fabiovincenzi I think I wrote the original regex, but its your commit adding it. Do you concur on updating? Was there another reason for the current regex on |
@kriswest Is this ready to merge in? If we need to modify anything else, we can perhaps do it in a separate PR so that we can unblock the other PRs that are waiting to solve merge conflicts. @fabiovincenzi If you think this is a one-liner, then feel free to fix it directly, but I'm keen on merging this ASAP to get the |
@kriswest you're absolutely right - the regex needs to handle variable path depths. Your solution will properly support GitLab's multi-level group structure instead of being locked to exactly 2 segments. |
Signed-off-by: Kris West <[email protected]>
…org/repo.git paths
@jescalada almost - apologies for the delay! I've just resolved some conflicts and bad merge with the fuzzing changes - and raised an issue for another of those tests that can fail randomly (occasionally randomly produces a near valid email address). I've fixed the I think there may be one more test to look at - I don't like seeing the fallback URLs process repos they shouldn't and make requests onto Github.
I believe we're passing the test as github returns a 404 itself. We could ignore that, merge and fix under a different issue - but I'm not happy about it TBH |
@jescalada regarding debugging the above, I've determined that CheckRepoInAuthList is not running on that request but should as its the only item in the default pull action chain - and frankly I think that should always be run - where we have a test that confirms nothing is run for made up action types such as I've not managed to complete debugging this today, but will have a look again tomorrow. Would appreciate your opinion on it as well. I think we are otherwise ready to go. |
@kriswest From what I see in the I suppose we could refactor the const getRouter = async () => {
// eslint-disable-next-line new-cap
const router = Router();
router.use(teeAndValidate);
const originsToProxy = await getAllProxiedHosts();
const proxyKeys: string[] = [];
const proxies: RequestHandler[] = [];
console.log(`Initializing proxy router for origins: '${JSON.stringify(originsToProxy)}'`);
// we need to wrap multiple proxy middlewares in a custom middleware as middlewares
// with path are processed in descending path order (/ then /github.com etc.) and
// we want the fallback proxy to go last.
originsToProxy.forEach((origin) => {
console.log(`\tsetting up origin: '${origin}'`);
proxyKeys.push(`/${origin}/`);
proxies.push(
proxy('https://' + origin, {
parseReqBody: false,
preserveHostHdr: false,
filter: proxyFilter,
proxyReqPathResolver: getRequestPathResolver('https://'), // no need to add host as it's in the URL
proxyReqOptDecorator: proxyReqOptDecorator,
proxyReqBodyDecorator: proxyReqBodyDecorator,
proxyErrorHandler: proxyErrorHandler,
}),
);
});
console.log('\tsetting up catch-all route (github.com) for backwards compatibility');
const fallbackProxy: RequestHandler = proxy('https://github.com', {
parseReqBody: false,
preserveHostHdr: false,
filter: proxyFilter,
proxyReqPathResolver: getRequestPathResolver('https://github.com'),
proxyReqOptDecorator: proxyReqOptDecorator,
proxyReqBodyDecorator: proxyReqBodyDecorator,
proxyErrorHandler: proxyErrorHandler,
});
console.log('proxy keys registered: ', JSON.stringify(proxyKeys));
router.use('/', (req, res, next) => {
console.log(`processing request URL: '${req.url}'`);
console.log('proxy keys registered: ', JSON.stringify(proxyKeys));
for (let i = 0; i < proxyKeys.length; i++) {
if (req.url.startsWith(proxyKeys[i])) {
console.log(`\tusing proxy ${proxyKeys[i]}`);
return proxies[i](req, res, next);
}
}
// fallback
console.log(`\tusing fallback`);
return fallbackProxy(req, res, next);
});
return router;
}; |
@jescalada yes it'll fall back to the fallback... Which isn't much different from the original setup that assumed GitHub. However, the filter function should still apply and check that the repository actually exists in our data, which is how I was expecting this to be controlled.. I don't think that it was this PR that introduced an issue with the pull chain (although I didn't confirm on main before this merged). Regardless, it's a problem for us as it means any repo at GitHub could be fetched or pulled through the proxy, without the auth check applying. Clearly we need a test that involves a repo that is not in the data but does exist in GitHub. I'll knock something up this morning. |
resolves #950
resolves #511
resolves #66
resolves #1107
resolves #1028
Refactor (api, proxy & UI) to remove the assumption of GitHub as the git repository host and the use of the repository
name
field as the id of the repository (as this prevents git-proxy instances from supporting multiple forks of a project or projects from multiple hosts with the same name).This PR:
name
field in the API with the _id field generated by the database adaptors,names
to be repeated (multiple forks or clashing names from different organisations/repository hosts)organisation/repoName.git
in the proxy URLs with the repository urlbecomes
https://myGitProxyInstance.com:8443/github.com/finos/git-proxy.git
To Do:
(contributed as part of a GitLab CoCreate collaboration with help from @StingRayZA)