-
Notifications
You must be signed in to change notification settings - Fork 1.8k
chore(backend): migrate GORM v1 to v2 #12013
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(backend): migrate GORM v1 to v2 #12013
Conversation
Hi @kaikaila. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
🚫 This command cannot be processed. Only organization members or owners can use the commands. |
Hi @HumairAK |
@kaikaila: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
🚫 This command cannot be processed. Only organization members or owners can use the commands. |
16600ee
to
4287d53
Compare
fc90ec9
to
fe390b9
Compare
676f98f
to
58897c4
Compare
UUID string `gorm:"column:UUID; not null; primaryKey;"` | ||
CreatedAtInSec int64 `gorm:"column:CreatedAtInSec; not null;"` | ||
Name string `gorm:"column:Name; not null; unique_index:namespace_name;"` // Index improves performance of the List ang Get queries | ||
Name string `gorm:"column:Name; not null; uniqueIndex:namespace_name; type:varchar(191);"` // Index improves performance of the List ang Get queries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks unaddressed
a5f7ae7
to
e7e7f7a
Compare
@kaikaila can you also rebase your pr such that I'm not a co-oauthor, this is all your work so you should be the only author listed! |
cac2afd
to
148090c
Compare
Hi @HumairAK |
scope.Raw( | ||
"ALTER TABLE " + quotedTableName + " ADD COLUMN DisplayName VARCHAR(255) NULL;", | ||
).Exec() | ||
scope.Raw("UPDATE " + quotedTableName + " SET DisplayName = Name").Exec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need this update code so that users upgrading from say 2.4 to this version will have the DisplayName column filled in since it's a required field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mprahl, thanks for pointing that out. I didn't realize the function also backfills the DisplayName column. I’ve restored the addDisplayName function.
Q1: Is it worth adding a unit test for it using sqlmock?
Q2: In GORM v1, addDisplayName first checked whether the user had already added a DisplayName column.
If the user had already customized this column, we allowed it to be nullable — which differs from the GORM tag that requires DisplayName to be NOT NULL.
Could you confirm whether we should enforce NOT NULL in this case?
If the user already has a DisplayName column, should we fill any null values with the Name value and then set the column to NOT NULL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the context is that only Name
existed. DisplayName
got added as a required column but to do so, you first need to create the DisplayName
column as nullable, copy the values from Name
to existing rows, and then make DisplayName
not nullable.
So we just need to keep that flow. No need to add additional test coverage unless it's easy to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I’ve restored the flow to: ADD as NULL → UPDATE from Name → enforce NOT NULL.
One follow-up on edge cases the v1 logic didn’t cover:
today we only run addDisplayName if the column is missing. If an installation already has a user-created DisplayName column (possibly nullable, with some NULLs), the legacy code does nothing — no backfill and no NOT NULL enforcement.
Question: what’s our policy for that case?
• Do we want to enforce consistency with the current model (i.e., still run UPDATE … WHERE DisplayName IS NULL and then set the column to NOT NULL, even if the column already exists)?
• Or do we leave user-created columns as-is and only enforce non-null on new writes at the API layer?
Right now I can implement the first option safely (backfill NULLs then set NOT NULL). Please advise which direction we prefer.
Separately, note that DropAllConstraintsAndIndexes drops all FKs/UNIQUE/non-primary indexes. That will also remove user-created indexes and AutoMigrate will only recreate the GORM-tagged ones. Is that acceptable, or should we limit drops to KFP-managed objects only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaikaila we can assume the user didn't manually add a column and if they did, the migration should fail. 😄
return fmt.Errorf("failed to backfill experiment UUID in run_details table: %s", err) | ||
} | ||
|
||
if err := db.Migrator().AlterColumn(&model.Pipeline{}, "Description"); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment explaining why this is not handled in AutoMigrate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re right, it is unnecessary. I removed the AlterColumn.
} | ||
|
||
// Step 3: drop all indexes and constraints except primary key which blocks shrinking columns | ||
if err := DropAllConstraintsAndIndexes(db, dialect.Name); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you considered dropping only the constraints required for us to migrate without error instead of dropping all of them?
If this is feasible, that would be ideal - the concern is that if a user has a large number of runs it may take a while to rebuild the indexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion — I implemented dropLegacyIndexes
function which only drops the specific indexes that block the migration.
Also, I have a follow-up question: since KFP has never officially supported pgx before, is it reasonable to handle pgx only in the fresh install path, and not cover the legacy upgrade path for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think that's fine, maybe we can just do the legacy check anyways but if the driverName is pgx we throw a meaningful error instead of proceeding with migration
e5418bb
to
350781c
Compare
Here’s the SQL I used to simulate a very old schema for testing the legacy upgrade workflow (might be handy for your verification) // switch to master branch (gorm v1)
// launch api server in master branch
USE mlpipeline;
// setup to test backfilling pipeline_version
DROP TABLE pipeline_versions;
// setup to test dropLegacyIndexes()
CREATE UNIQUE INDEX Name ON experiments (Name);
CREATE UNIQUE INDEX Name ON pipelines (Name);
// setup to test addDisplayNameColumn()
ALTER TABLE pipelines DROP COLUMN DisplayName;
// insert dummy data to set up for test initPipelineVersionsFromPipelines()
INSERT INTO pipelines (UUID, CreatedAtInSec, Name, Description, Parameters, Status, DefaultVersionId, Namespace)
VALUES
('pipe-uuid-1', UNIX_TIMESTAMP(), 'pipeline1', 'Dummy pipeline 1', NULL, 'READY', NULL, 'default'),
('pipe-uuid-2', UNIX_TIMESTAMP(), 'pipeline2', 'Dummy pipeline 2', NULL, 'READY', NULL, 'default');
// switch to chore/gorm-v2-migration branch
// launch api server again
// there should be 2 rows in pipeline_versions |
Since KFP hasn’t officially supported pgx before, would it be acceptable for InitDBClient to handle pgx only for fresh installs and skip it in the legacy upgrade path? |
350781c
to
6d15489
Compare
/retest |
@kaikaila could you please squash your commits? Then I'll lgtm it! |
Key changes: - Enforce stricter string length limits (API + DB schema) to ensure MySQL indexability. - Add preflight scan to block upgrade if legacy data violates limits. - Cleanup/normalize legacy MySQL indexes, drop/rename duplicates. - Split InitDBClient into legacy upgrade vs non-legacy autoMigrate paths. - Centralize length specs for both API validation and DDL shrink. - Replace GORM v1 APIs with v2 Migrator and struct tags. Signed-off-by: kaikaila <[email protected]>
6d15489
to
8e41c88
Compare
Squashed! 🎉 Thanks @mprahl — glad we’re almost there. |
/retest |
/lgtm great work! |
Tested and verified. /lgtm Amazing work @kaikaila 🥳 🥇 🎉 !!! |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: HumairAK The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
This PR migrates Kubeflow Pipelines backend from GORM v1 (github.com/jinzhu/gorm) to GORM v2 (gorm.io/gorm). It covers all ORM models and the execution cache subsystem.
Breaking Changes
Stricter Field Length Constraints
backend/src/apiserver/validation/length.go
.Upgrade Guardrails for Legacy Data
(Name, Namespace) Unique Index Deduplication (pipelines)
namespace_name
(from tag) andname_namespace_index
(manual).namespace_name
only. If both exist, the legacy one is removed (or renamed to namespace_name when safe).Migration/Upgrade Behavior
Internal Refactors
validation/length.go
drive both API guards and DDL shrink to prevent drift.dialect.go
Unit Tests