chore(backend): migrate GORM v1 to v2 #12013

kaikaila · 2025-06-25T07:37:07Z

Summary

This PR migrates the Kubeflow Pipelines backend from GORM v1 (github.com/jinzhu/gorm) to GORM v2 (gorm.io/gorm). It covers all ORM models and the execution cache subsystem.

Breaking Changes

Stricter Field Length Constraints
- Certain string fields now have tighter length limits to ensure indexability across MySQL (e.g., utf8mb4 key length). See the complete list of constrained fields in backend/src/apiserver/validation/length.go.
- API layer enforces these limits: overlong inputs are rejected with HTTP 400 and clear messages.
Upgrade Guardrails for Legacy Data
- During upgrade, a preflight scan aborts migration if existing rows violate the new limits. Users must shorten those values before retrying.
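A minimal sketch of how one table of limits could back both the API check and the preflight scan described above. The names `fieldLimits`, `CheckLength`, and `Preflight`, and the numeric limits, are hypothetical; the real specs live in backend/src/apiserver/validation/length.go and are not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// fieldLimits is a hypothetical stand-in for the centralized length specs.
// The numbers are illustrative only.
var fieldLimits = map[string]int{
	"Name":      128,
	"Namespace": 63,
}

// CheckLength returns an error if value exceeds the limit registered for field.
// It counts characters (runes), not bytes.
func CheckLength(field, value string) error {
	limit, ok := fieldLimits[field]
	if !ok {
		return nil // no limit registered for this field
	}
	if n := utf8.RuneCountInString(value); n > limit {
		return fmt.Errorf("%s is %d characters long; the maximum is %d", field, n, limit)
	}
	return nil
}

// Preflight applies CheckLength to existing values and collects every
// violation, so an operator sees the full list before retrying the upgrade.
func Preflight(field string, values []string) []error {
	var violations []error
	for _, v := range values {
		if err := CheckLength(field, v); err != nil {
			violations = append(violations, err)
		}
	}
	return violations
}

func main() {
	fmt.Println(CheckLength("Namespace", "kubeflow"))
	fmt.Println(len(Preflight("Namespace", []string{"a", strings.Repeat("x", 64)})))
}
```

In this sketch the API layer would map a non-nil error from `CheckLength` to an HTTP 400, while the upgrade path would abort when `Preflight` returns any violations.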
(Name, Namespace) Unique Index Deduplication (pipelines)
- Historically there were two equivalent unique indexes on (Name, Namespace): namespace_name (from tag) and name_namespace_index (manual).
- We now keep namespace_name only. If both exist, the legacy one is removed (or renamed to namespace_name when safe).
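The keep/rename/drop decision for the two (Name, Namespace) indexes can be sketched as a pure function over the index names found on the pipelines table. The function and type names are hypothetical, and the sketch leaves out the "when safe" check for renames, which the description does not spell out.

```go
package main

import "fmt"

const (
	canonicalIndex = "namespace_name"
	legacyIndex    = "name_namespace_index"
)

// Action describes what to do with the legacy index.
type Action string

const (
	NoOp   Action = "no-op"
	Drop   Action = "drop legacy index"
	Rename Action = "rename legacy index to namespace_name"
)

// planIndexNormalization decides how to reach a state in which only the
// canonical index remains on the table.
func planIndexNormalization(existing []string) Action {
	has := make(map[string]bool, len(existing))
	for _, name := range existing {
		has[name] = true
	}
	switch {
	case has[legacyIndex] && has[canonicalIndex]:
		return Drop
	case has[legacyIndex]:
		return Rename
	default:
		// Already normalized, or neither index exists and AutoMigrate
		// creates the canonical one from the struct tags.
		return NoOp
	}
}

func main() {
	fmt.Println(planIndexNormalization([]string{"namespace_name", "name_namespace_index"}))
	fmt.Println(planIndexNormalization([]string{"name_namespace_index"}))
	fmt.Println(planIndexNormalization([]string{"namespace_name"}))
}
```

Because an already-normalized table maps to `NoOp`, running the plan twice is harmless, which is the idempotency property the unit tests below check.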

Migration/Upgrade Behavior

Legacy schema (pre-2.15) detected → run legacy upgrade flow:

1. Run a preflight check to verify that existing rows comply with the new length limits.
2. Drop foreign key constraints only (the minimal DDL needed to shrink indexed columns).
3. Targeted legacy index cleanup (MySQL only):
   - Drop single-column indexes on experiments and pipelines (legacy residues). Reference [here](https://github.com/kubeflow/pipelines/blob/cdc85ce90db7b821ad25cfee925b21dc2b22bbbe/backend/src/apiserver/client_manager/client_manager.go#L420C1-L428C3).
   - Drop composite unique index idx_pipeline_version_uuid_name on pipeline_versions (historical). Reference [here](https://github.com/kubeflow/pipelines/blob/cdc85ce90db7b821ad25cfee925b21dc2b22bbbe/backend/src/apiserver/client_manager/client_manager.go#L518).
   - Normalize the (Name, Namespace) unique index on pipelines to namespace_name (keep/rename/drop as needed).
4. Shrink columns per the new limits; AutoMigrate then re-applies constraints and indexes from tags.
5. Backfill DisplayName for pipelines / pipeline_versions where needed, and ExperimentUUID in run_details.

Non-legacy schema (KFP >=2.15): run autoMigrate for both first-time installs and upgrades between >=2.15 versions.
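The ordering guarantee of the legacy flow (nothing destructive runs unless the preflight check passes) can be sketched with a small step runner. The `step` type, `runInOrder`, and the step bodies are hypothetical placeholders and do not reflect the actual structure of InitDBClient.

```go
package main

import (
	"errors"
	"fmt"
)

// errViolations stands in for a failed preflight scan.
var errViolations = errors.New("existing rows exceed the new length limits")

// step is one named stage of the upgrade.
type step struct {
	name string
	run  func() error
}

// runInOrder executes steps sequentially and stops at the first failure.
// It returns the names of the steps that completed.
func runInOrder(steps []step) ([]string, error) {
	var completed []string
	for _, s := range steps {
		if err := s.run(); err != nil {
			return completed, fmt.Errorf("step %q failed: %w", s.name, err)
		}
		completed = append(completed, s.name)
	}
	return completed, nil
}

func main() {
	ok := func() error { return nil }
	steps := []step{
		{"preflight length check", func() error { return errViolations }},
		{"drop foreign keys", ok},
		{"clean up legacy indexes", ok},
		{"shrink columns and AutoMigrate", ok},
		{"backfill DisplayName and ExperimentUUID", ok},
	}
	done, err := runInOrder(steps)
	fmt.Println(len(done), err)
}
```

With a failing preflight step, no later step runs, so the schema is left untouched until the offending values are shortened.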

Internal Refactors

Migrated to GORM v2 Migrator:
Replaced v1 APIs (AddIndex, RemoveIndex, AddForeignKey, AddUniqueIndex, ModifyColumn) with v2 equivalents and struct tags.
Index/constraint creation now lives in tags; legacy hand-crafted index DDL in InitDBClient is removed or minimized.
InitDBClient flow split: Clear separation of legacy upgrade paths and non-legacy schema for readability and safety.
Unified validation source of truth: Centralized length specs in validation/length.go drive both the API guards and the DDL shrink, which prevents drift.
Dialect abstraction: Moved dialect-specific syntax out of InitDBClient into dialect.go.
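To illustrate what "index creation lives in tags" means, the sketch below declares a struct with v2-style tags (`primaryKey`, `uniqueIndex:<name>`) and reads them back with the standard library. The UUID and Name tags mirror a diff quoted later in this thread; the Namespace tag and the `columnsInUniqueIndex` helper are assumptions, and the helper is a simplified reader, not GORM's own tag parser.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Pipeline is a trimmed-down, illustrative model. The Namespace tag is an
// assumption made for this example.
type Pipeline struct {
	UUID      string `gorm:"column:UUID; not null; primaryKey;"`
	Name      string `gorm:"column:Name; not null; uniqueIndex:namespace_name; type:varchar(191);"`
	Namespace string `gorm:"column:Namespace; not null; uniqueIndex:namespace_name; type:varchar(63);"`
}

// columnsInUniqueIndex returns the columns whose gorm tag declares
// uniqueIndex:<index>. model must be a struct value, not a pointer.
func columnsInUniqueIndex(model interface{}, index string) []string {
	var cols []string
	t := reflect.TypeOf(model)
	for i := 0; i < t.NumField(); i++ {
		column := ""
		member := false
		for _, part := range strings.Split(t.Field(i).Tag.Get("gorm"), ";") {
			part = strings.TrimSpace(part)
			switch {
			case strings.HasPrefix(part, "column:"):
				column = strings.TrimPrefix(part, "column:")
			case part == "uniqueIndex:"+index:
				member = true
			}
		}
		if member && column != "" {
			cols = append(cols, column)
		}
	}
	return cols
}

func main() {
	fmt.Println(columnsInUniqueIndex(Pipeline{}, "namespace_name"))
}
```

Two fields sharing the same `uniqueIndex` name form one composite unique index, which is why a separate hand-written index on the same columns becomes redundant.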

Unit Tests

API-level length validation (pass/fail).
Preflight length scan (blocks on violations).
Idempotent legacy index cleanup (no-op when already normalized).

google-oss-prow · 2025-06-25T07:37:17Z

Hi @kaikaila. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

github-actions · 2025-06-25T07:37:29Z

🚫 This command cannot be processed. Only organization members or owners can use the commands.

kaikaila · 2025-06-25T07:38:28Z

Hi @HumairAK
I want to bring this to your attention: in backend/src/apiserver/model/task.go, line 26, the field name (RunId) differs from the column name (RunUUID). This may cause confusion when using the foreignKey: tag in GORM v2. Happy to refactor if we want to align them.

google-oss-prow · 2025-06-25T08:02:53Z

@kaikaila: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

github-actions · 2025-06-25T08:03:03Z

🚫 This command cannot be processed. Only organization members or owners can use the commands.

HumairAK · 2025-07-07T16:15:12Z

backend/src/apiserver/model/pipeline.go

+	UUID           string `gorm:"column:UUID; not null; primaryKey;"`
 	CreatedAtInSec int64  `gorm:"column:CreatedAtInSec; not null;"`
-	Name           string `gorm:"column:Name; not null; unique_index:namespace_name;"` // Index improves performance of the List ang Get queries
+	Name           string `gorm:"column:Name; not null; uniqueIndex:namespace_name; type:varchar(191);"` // Index improves performance of the List ang Get queries


this looks unaddressed

HumairAK · 2025-08-07T21:13:54Z

@kaikaila can you also rebase your PR so that I'm not listed as a co-author? This is all your work, so you should be the only author listed!

kaikaila · 2025-08-07T23:13:17Z

Hi @HumairAK
Thanks for your comments. I’ve addressed all of them and reviewed the schema diff to confirm it matches the expected changes. For now, I’ve roughly split the changes into 3 commits for my own convenience in case we need to make further updates. If Matt still needs to review, I’ll keep them as 3 commits; if not, I'm happy to squash them into a single commit.

mprahl · 2025-08-08T15:02:41Z

backend/src/apiserver/client_manager/client_manager.go

-		scope.Raw(
-			"ALTER TABLE " + quotedTableName + " ADD COLUMN DisplayName VARCHAR(255) NULL;",
-		).Exec()
-		scope.Raw("UPDATE " + quotedTableName + " SET DisplayName = Name").Exec()


We still need this update code so that users upgrading from, say, 2.4 to this version will have the DisplayName column filled in, since it's a required field.

Hi @mprahl, thanks for pointing that out. I didn't realize the function also backfills the DisplayName column. I’ve restored the addDisplayName function.

Q1: Is it worth adding a unit test for it using sqlmock?

Q2: In GORM v1, addDisplayName first checked whether the user had already added a DisplayName column.
If the user had already customized this column, we allowed it to be nullable — which differs from the GORM tag that requires DisplayName to be NOT NULL.

Could you confirm whether we should enforce NOT NULL in this case?
If the user already has a DisplayName column, should we fill any null values with the Name value and then set the column to NOT NULL?

So the context is that only Name existed. DisplayName got added as a required column but to do so, you first need to create the DisplayName column as nullable, copy the values from Name to existing rows, and then make DisplayName not nullable.

So we just need to keep that flow. No need to add additional test coverage unless it's easy to do.

Thanks, I’ve restored the flow to: ADD as NULL → UPDATE from Name → enforce NOT NULL.
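The three-stage flow can be sketched as an ordered list of MySQL statements. The first two statements follow the removed v1 code quoted at the top of this thread; the final `MODIFY COLUMN` statement and the `displayNameBackfillSQL` helper are assumptions for illustration, not the code in this PR.

```go
package main

import "fmt"

// displayNameBackfillSQL returns the statements in the order they must run:
// add the column as nullable, copy Name into it, then forbid NULLs.
// table must be a trusted identifier; it is quoted but not escaped here.
func displayNameBackfillSQL(table string) []string {
	quoted := "`" + table + "`"
	return []string{
		"ALTER TABLE " + quoted + " ADD COLUMN DisplayName VARCHAR(255) NULL",
		"UPDATE " + quoted + " SET DisplayName = Name",
		// Assumed MySQL syntax for the final NOT NULL enforcement.
		"ALTER TABLE " + quoted + " MODIFY COLUMN DisplayName VARCHAR(255) NOT NULL",
	}
}

func main() {
	for _, stmt := range displayNameBackfillSQL("pipelines") {
		fmt.Println(stmt)
	}
}
```

The order matters: adding the column as NOT NULL up front would fail or require a default on tables that already contain rows, so the constraint is applied only after every row has a value.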

One follow-up on edge cases the v1 logic didn’t cover:
today we only run addDisplayName if the column is missing. If an installation already has a user-created DisplayName column (possibly nullable, with some NULLs), the legacy code does nothing — no backfill and no NOT NULL enforcement.

Question: what’s our policy for that case?
• Do we want to enforce consistency with the current model (i.e., still run UPDATE … WHERE DisplayName IS NULL and then set the column to NOT NULL, even if the column already exists)?
• Or do we leave user-created columns as-is and only enforce non-null on new writes at the API layer?

Right now I can implement the first option safely (backfill NULLs then set NOT NULL). Please advise which direction we prefer.

Separately, note that DropAllConstraintsAndIndexes drops all FKs/UNIQUE/non-primary indexes. That will also remove user-created indexes and AutoMigrate will only recreate the GORM-tagged ones. Is that acceptable, or should we limit drops to KFP-managed objects only?

@kaikaila we can assume the user didn't manually add a column and if they did, the migration should fail. 😄

mprahl · 2025-08-08T15:56:42Z

backend/src/apiserver/client_manager/client_manager.go

+		return fmt.Errorf("failed to backfill experiment UUID in run_details table: %s", err)
+	}
+
+	if err := db.Migrator().AlterColumn(&model.Pipeline{}, "Description"); err != nil {


Could you add a comment explaining why this is not handled in AutoMigrate?

You’re right, it is unnecessary. I removed the AlterColumn.

HumairAK · 2025-08-08T18:12:51Z

backend/src/apiserver/client_manager/client_manager.go

+	}
+
+	// Step 3: drop all indexes and constraints except primary key which blocks shrinking columns
+	if err := DropAllConstraintsAndIndexes(db, dialect.Name); err != nil {


have you considered dropping only the constraints required for us to migrate without error instead of dropping all of them?

If this is feasible, that would be ideal - the concern is that if a user has a large number of runs it may take a while to rebuild the indexes

Thanks for the suggestion — I implemented dropLegacyIndexes function which only drops the specific indexes that block the migration.
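A rough sketch of the targeted approach: keep an explicit list of legacy index names per table and drop only those that are actually present, so any other index is left alone and does not need rebuilding. The map contents and the `indexesToDrop` helper are assumptions based on the PR description and the test SQL in this thread, not the real dropLegacyIndexes implementation.

```go
package main

import "fmt"

// legacyIndexes lists, per table, the only index names the cleanup may drop.
// The names are assumptions for this example.
var legacyIndexes = map[string][]string{
	"experiments":       {"Name"},
	"pipelines":         {"Name"},
	"pipeline_versions": {"idx_pipeline_version_uuid_name"},
}

// indexesToDrop returns the targeted indexes that exist on the table.
// Indexes not listed in legacyIndexes are never returned.
func indexesToDrop(table string, existing []string) []string {
	present := make(map[string]bool, len(existing))
	for _, name := range existing {
		present[name] = true
	}
	var drop []string
	for _, name := range legacyIndexes[table] {
		if present[name] {
			drop = append(drop, name)
		}
	}
	return drop
}

func main() {
	fmt.Println(indexesToDrop("pipelines", []string{"PRIMARY", "Name", "namespace_name", "my_custom_idx"}))
	fmt.Println(indexesToDrop("pipelines", []string{"PRIMARY", "namespace_name"}))
}
```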

Also, I have a follow-up question: since KFP has never officially supported pgx before, is it reasonable to handle pgx only in the fresh install path, and not cover the legacy upgrade path for it?

Yeah I think that's fine, maybe we can just do the legacy check anyways but if the driverName is pgx we throw a meaningful error instead of proceeding with migration
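The suggested guard could look roughly like this: the legacy check still runs for every driver, but a legacy schema combined with pgx stops with an explicit error instead of entering the upgrade flow. The function name, error text, and the way the legacy flag is obtained are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errLegacyPgxUnsupported is returned when a legacy schema is found on pgx.
var errLegacyPgxUnsupported = errors.New(
	"legacy schema detected, but the legacy upgrade path is not supported for the pgx driver")

// checkUpgradePath returns an error only for the unsupported combination.
// legacySchema is assumed to come from an earlier schema detection step.
func checkUpgradePath(driverName string, legacySchema bool) error {
	if legacySchema && driverName == "pgx" {
		return errLegacyPgxUnsupported
	}
	return nil
}

func main() {
	fmt.Println(checkUpgradePath("mysql", true))
	fmt.Println(checkUpgradePath("pgx", false))
	fmt.Println(checkUpgradePath("pgx", true))
}
```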

kaikaila · 2025-08-11T09:01:04Z

Here’s the SQL I used to simulate a very old schema for testing the legacy upgrade workflow (might be handy for your verification)

-- switch to master branch (gorm v1)
-- launch api server in master branch

USE mlpipeline;

-- setup to test backfilling pipeline_versions
DROP TABLE pipeline_versions;

-- setup to test dropLegacyIndexes()
CREATE UNIQUE INDEX Name ON experiments (Name);
CREATE UNIQUE INDEX Name ON pipelines (Name);

-- setup to test addDisplayNameColumn()
ALTER TABLE pipelines DROP COLUMN DisplayName;

-- insert dummy data to set up the test for initPipelineVersionsFromPipelines()
INSERT INTO pipelines (UUID, CreatedAtInSec, Name, Description, Parameters, Status, DefaultVersionId, Namespace)
VALUES
('pipe-uuid-1', UNIX_TIMESTAMP(), 'pipeline1', 'Dummy pipeline 1', NULL, 'READY', NULL, 'default'),
('pipe-uuid-2', UNIX_TIMESTAMP(), 'pipeline2', 'Dummy pipeline 2', NULL, 'READY', NULL, 'default');

-- switch to chore/gorm-v2-migration branch
-- launch api server again
-- there should be 2 rows in pipeline_versions

kaikaila · 2025-08-11T09:01:47Z

Since KFP hasn’t officially supported pgx before, would it be acceptable for InitDBClient to handle pgx only for fresh installs and skip it in the legacy upgrade path?

backend/src/apiserver/client_manager/client_manager.go

kaikaila · 2025-08-13T21:53:38Z

/retest

mprahl · 2025-08-14T15:02:49Z

@kaikaila could you please squash your commits? Then I'll lgtm it!

Key changes:
- Enforce stricter string length limits (API + DB schema) to ensure MySQL indexability.
- Add preflight scan to block upgrade if legacy data violates limits.
- Cleanup/normalize legacy MySQL indexes, drop/rename duplicates.
- Split InitDBClient into legacy upgrade vs non-legacy autoMigrate paths.
- Centralize length specs for both API validation and DDL shrink.
- Replace GORM v1 APIs with v2 Migrator and struct tags.

Signed-off-by: kaikaila <[email protected]>

kaikaila · 2025-08-14T19:15:37Z

Squashed! 🎉 Thanks @mprahl — glad we’re almost there.

kaikaila · 2025-08-14T20:33:45Z

/retest

mprahl · 2025-08-15T13:07:57Z

/lgtm great work!

HumairAK · 2025-08-15T15:24:08Z

Tested and verified.

/lgtm
/approve

Amazing work @kaikaila 🥳 🥇 🎉 !!!

google-oss-prow · 2025-08-15T15:24:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HumairAK

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [HumairAK]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot requested review from HumairAK, droctothorpe and hbelmiro June 25, 2025 07:37

google-oss-prow bot added needs-ok-to-test size/XL labels Jun 25, 2025

kaikaila mentioned this pull request Jun 25, 2025

chore(backend): migrate GORM v1 to v2 #11929

Closed

HumairAK added ok-to-test and removed needs-ok-to-test labels Jun 25, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch 2 times, most recently from 16600ee to 4287d53 Compare June 25, 2025 23:29

HumairAK requested changes Jun 27, 2025

google-oss-prow bot assigned HumairAK Jun 27, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch 3 times, most recently from fc90ec9 to fe390b9 Compare July 2, 2025 08:50

google-oss-prow bot added size/XXL and removed size/XL labels Jul 5, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch from 676f98f to 58897c4 Compare July 5, 2025 00:04

google-oss-prow bot added size/XL and removed size/XXL labels Jul 5, 2025

HumairAK requested changes Jul 7, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch 4 times, most recently from a5f7ae7 to e7e7f7a Compare July 10, 2025 01:30

HumairAK changed the title ~~[wip]chore(backend): migrate GORM v1 to v2~~ chore(backend): migrate GORM v1 to v2 Aug 7, 2025

google-oss-prow bot removed the do-not-merge/work-in-progress label Aug 7, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch from cac2afd to 148090c Compare August 7, 2025 23:01

mprahl reviewed Aug 8, 2025

HumairAK reviewed Aug 8, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch 3 times, most recently from e5418bb to 350781c Compare August 11, 2025 08:44

HumairAK reviewed Aug 13, 2025

mprahl reviewed Aug 13, 2025

mprahl reviewed Aug 13, 2025

kaikaila force-pushed the chore/gorm-v2-migration branch from 350781c to 6d15489 Compare August 13, 2025 20:11

kaikaila force-pushed the chore/gorm-v2-migration branch from 6d15489 to 8e41c88 Compare August 14, 2025 19:13

google-oss-prow bot added the lgtm label Aug 15, 2025

google-oss-prow bot added the approved label Aug 15, 2025

google-oss-prow bot merged commit 2af42c3 into kubeflow:master Aug 15, 2025
82 of 86 checks passed

kaikaila deleted the chore/gorm-v2-migration branch August 15, 2025 21:29
