@nuwang
Member

@nuwang nuwang commented Nov 2, 2025

This PR:

  1. Removes code that relies on defunct custos services (custos auth provider and the custos vault)
  2. Enhances the generic PSA oidc provider to support PKCE
  3. Reimplements keycloak as an extension of this generic PSA provider
  4. Reimplements cilogon as an extension of this generic PSA provider
  5. Adds additional tests
  6. Adds migration scripts (with tests) for the CustosAuthnzToken model so that users do not need to re-associate their accounts

supersedes: #21090
closes: #20789
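
For readers unfamiliar with PKCE (item 2 above), here is a minimal sketch of the S256 verifier/challenge pair defined in RFC 7636. It is not the code from this PR; the function names `make_code_verifier` and `make_code_challenge` are hypothetical and use only the Python standard library.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random URL-safe code verifier (43 characters for 32 bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
# The client sends `challenge` (with code_challenge_method=S256) in the
# authorization request and `verifier` in the later token request, so the
# provider can check that both requests come from the same client.
```

The verifier never leaves the client until the token request, which is what protects the authorization code against interception.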

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@nuwang
Member Author

nuwang commented Nov 3, 2025

@jdavcs I made a commit which adds the migration scripts + tests but I'm not sure I got the alembic migration process right - would be great if you could check: 7f1a254

(I ran python scripts/run_alembic.py revision -m "migrate custos to psa tokens" --head=gxy@head)

@jdavcs jdavcs self-requested a review November 3, 2025 17:35
@nuwang nuwang force-pushed the combine_psa_keycloak branch from fcf60d1 to fcf9637 Compare November 3, 2025 18:19
@nuwang
Member Author

nuwang commented Nov 3, 2025

@jdavcs I changed the down_revision and rebased on dev, which seems to have solved the migration problem, but please let me know whether that's ok

@nuwang nuwang force-pushed the combine_psa_keycloak branch from fcf9637 to 870ad98 Compare November 3, 2025 18:50
@jdavcs
Member

jdavcs commented Nov 4, 2025

@nuwang Sorry for the delayed reply!

In my opinion, the migration should be a standalone script. We've called a data migration script from a db revision (aka db schema migration) a couple of times in the past. I didn't think that was the right approach in either case. However, in those cases the data migration was tightly coupled with the schema changes, and in the end we agreed it was the safer/more straightforward thing to do. But here, we have just the data migration with no change to the db schema. The Alembic (or SQLAlchemy) documentation does not offer a definitive best practice on how to do data migrations, but there are strong arguments against doing it as a migration (in the docs https://alembic.sqlalchemy.org/en/latest/cookbook.html#data-migrations-general-techniques and in this discussion: sqlalchemy/alembic#972 (reply in thread)).

A standalone script, on the other hand, is straightforward. Here's one example: #18079. (Also, here's an example of a test for such a scenario: https://github.com/galaxyproject/galaxy/pull/18079/files#diff-8bd6ad37d9a5dd34e50119ed4467325c4c5a145629d41926c484c4c1efe9dc27 )

@nuwang
Member Author

nuwang commented Nov 5, 2025

Thanks for the feedback @jdavcs. There is a schema change here - the CustosAuthnzToken table has been dropped. I just didn't do it in the migration itself in order to be as non-destructive as possible. Do you think that changes anything or should we go ahead with a separate data migration script? I'm happy to do it either way, but without an automatic migration, all oidc/cilogon accounts will become dissociated, and users will need to reassociate their accounts. How would that situation be handled with a data migration script? Would that be part of the release notes or something?

@jdavcs
Member

jdavcs commented Nov 5, 2025

> There is a schema change here - the CustosAuthnzToken table has been dropped. I just didn't do it in the migration itself in order to be as non-destructive as possible.

Definitely should be in the migration. Extra tables in the database that are not in the model will cause problems (here's a summary: #20809, and here's one example of the problems that this may cause: #20614). There are Galaxy utility scripts for that (create_table, drop_table). Also, you'd need to change the model definition: dropping the table must go together with dropping the model (CustosAuthnzToken) and, of course, all references to that model (there are tests that will fail if the model does not mirror the db schema state as per migrations).

> How would that situation be handled with a data migration script? Would that be part of the release notes or something?

Sorry, I was going to mention it in my previous comment. If we had a separate data migration script, we'd definitely mention it (very prominently) in the admin notes section of the release notes.

That said, since a table is dropped, having the data migration logic referenced from the migration script makes sense here (otherwise, to preserve the data, admins would be required to run the script before the database upgrade - which, of course, is a recipe for disaster). We try not to put the actual script in the migration (to keep it clean and testable); instead you can place it in the data_fixes directory. Here's an example.

I hope I'm not missing anything (like any requirement that makes this a special case, etc.)

@nuwang
Member Author

nuwang commented Nov 6, 2025

Thanks @jdavcs. I've moved the migration logic to data_fixes as you suggested, dropped the CustosAuthnzToken table from the model and adjusted the upgrade/downgrade script to drop/restore the table and migrate/restore the data.
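
The arrangement described above (data fix kept in its own function, called from the upgrade step before the table is dropped) can be sketched roughly as follows. This is an illustration using the standard-library `sqlite3` module, not the code in this PR: the table names `custos_token` and `psa_token`, their columns, and the function names are simplified, hypothetical stand-ins.

```python
import json
import sqlite3

# Hypothetical, simplified schemas; the real Galaxy tables differ.
SCHEMA = """
CREATE TABLE custos_token (id INTEGER PRIMARY KEY, user_id INTEGER, external_user_id TEXT,
                           provider TEXT, access_token TEXT, refresh_token TEXT);
CREATE TABLE psa_token (id INTEGER PRIMARY KEY, user_id INTEGER, uid TEXT,
                        provider TEXT, extra_data TEXT);
"""


def migrate_custos_to_psa(conn):
    """Data fix: copy every row of the old table into the PSA table."""
    rows = conn.execute(
        "SELECT user_id, external_user_id, provider, access_token, refresh_token FROM custos_token"
    ).fetchall()
    for user_id, uid, provider, access, refresh in rows:
        extra = json.dumps({"access_token": access, "refresh_token": refresh})
        conn.execute(
            "INSERT INTO psa_token (user_id, uid, provider, extra_data) VALUES (?, ?, ?, ?)",
            (user_id, uid, provider, extra),
        )
    return len(rows)


def upgrade(conn):
    """Schema migration: run the data fix first, then drop the old table."""
    migrated = migrate_custos_to_psa(conn)
    conn.execute("DROP TABLE custos_token")
    return migrated


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO custos_token (user_id, external_user_id, provider, access_token, refresh_token) "
    "VALUES (1, 'abc-123', 'keycloak', 'at', 'rt')"
)
migrated = upgrade(conn)
remaining_tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
psa_rows = conn.execute("SELECT user_id, uid, provider FROM psa_token").fetchall()
```

Keeping the copy logic in a separate function is what makes it testable on its own, while calling it from `upgrade` guarantees the data is preserved before the table disappears.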

@jdavcs
Member

jdavcs commented Nov 6, 2025

@nuwang thank you for addressing all the comments! I'll run some more tests tomorrow (I was able to break this once - so now I want to make sure my edge cases were reasonable). I suppose one potential concern is this: let's say a token is corrupted: how should the script behave? Currently, the error prevents the migration from proceeding. We could leave it as is (but maybe handle it more gracefully and print a helpful message), or we could skip the problem record. I think that depends, in part, on whom we want to accommodate more - the admin or the users.

EDIT: No, my edge case was not reasonable: statement execution broke due to invalid JSON, which is not going to happen with these field types, since they are managed by SQLAlchemy.
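
Purely as an illustration of the trade-off raised in this comment (abort with a helpful message versus skip the problem record), here is a small sketch. The function `migrate_tokens`, its `skip_invalid` flag, and the record shape are hypothetical and do not come from the PR.

```python
import json


def migrate_tokens(records, skip_invalid=False):
    """Parse raw token records, either aborting or skipping on a corrupted one.

    `records` is a list of (record_id, raw_json) pairs (hypothetical shape).
    Returns (migrated, skipped_ids).
    """
    migrated, skipped = [], []
    for record_id, raw in records:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as exc:
            if skip_invalid:
                # User-friendly for the admin: the upgrade proceeds, but the
                # affected user will have to re-associate their account.
                skipped.append(record_id)
                continue
            # Strict mode: stop the migration with an actionable message.
            raise ValueError(
                f"Token record {record_id} is corrupted; fix or remove it and rerun"
            ) from exc
        migrated.append((record_id, payload))
    return migrated, skipped


records = [(1, '{"access_token": "a"}'), (2, "{not json")]
migrated, skipped = migrate_tokens(records, skip_invalid=True)
```

Strict mode favours users (nobody silently loses an association), while skip mode favours the admin (the upgrade always completes); returning the skipped ids lets the caller report them either way.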

@bgruening
Member

@nuwang is it possible with this PR to have multiple keycloak providers?

@nuwang
Member Author

nuwang commented Nov 7, 2025

> @nuwang is it possible with this PR to have multiple keycloak providers?

No, I tried to avoid making additional changes, just so the PR doesn't balloon too much, but that item is definitely on the radar. Maybe better done in a follow-up PR?

@nuwang nuwang force-pushed the combine_psa_keycloak branch from 32dc1c6 to a16d1d0 Compare November 12, 2025 10:18
@nuwang
Member Author

nuwang commented Nov 12, 2025

I've tested upgrades and downgrades as follows:

  1. Use the latest dev branch
  2. Set up a PSA idp for Google auth
  3. Set up a Custos idp for Keycloak running locally
  4. Verify that login via Google and Keycloak works
  5. Switch to the combine_psa_keycloak branch
  6. Run sh manage_db.sh upgrade
  7. Verify that table content is migrated correctly by inspecting the db (custos table dropped, and data migrated to psa)
  8. Run Galaxy and verify that Google and Keycloak continue to work (that is, logging in again with a keycloak account does not reassociate the account, and instead reuses the existing account)

@nuwang nuwang force-pushed the combine_psa_keycloak branch from 8028c54 to 9fba397 Compare November 12, 2025 17:59
@nuwang nuwang force-pushed the combine_psa_keycloak branch from 9fba397 to 2eef9d4 Compare November 12, 2025 18:07
Development

Successfully merging this pull request may close these issues.

Migrate Keycloak integration to PSA

4 participants