Upsert primary key #666
Walkthrough
Updates SQL adapter upsert path: deterministically orders columns, constructs a new $columns list, assigns _id from the old document’s sequence (default null), preserves tenant propagation via _tenant, rebuilds batch keys and bind values accordingly, and invokes upsert with the updated parameters. No public API changes.
Sequence Diagram(s)
sequenceDiagram
participant Caller as Database Adapter Caller
participant SQL as SQL Adapter
participant DB as SQL Database
Caller->>SQL: upsert(changes)
SQL->>SQL: sort attributes (ksort)
SQL->>SQL: build $columns, $batchKeys, $bindValues
SQL->>SQL: set _id from old sequence (or null)
SQL->>SQL: include _tenant when sharedTables
SQL->>DB: EXECUTE UPSERT ($columns, $batchKeys, $bindValues)
DB-->>SQL: result
SQL-->>Caller: upsert result
Estimated code review effort: 3 (Moderate), ~20 minutes
Actionable comments posted: 1
🔭 Outside diff range comments (2)
src/Database/Adapter/SQL.php (2)
1972-2028: Batch upsert builds columns per-row (last one wins), which leads to invalid SQL when rows differ; build a single, deterministic column set for the whole batch.
$columns is recomputed for each change and only the last one is used for the prepared statement while $batchKeys collects per-row placeholders. If the attribute set differs between rows, VALUES tuples won’t match the final $columns list, causing SQL errors or misaligned bindings. Deterministic ordering per row (ksort) isn’t sufficient; the column list must be globally consistent across the batch.
Refactor to:
- Pre-scan changes, compute a union of attribute keys (including internal keys), sort them deterministically once, and build $columns exactly once.
- For each row, bind values in that single global order, defaulting missing attributes to null.
- Only include _id if safe (see next comment) and pass the final attribute name list to getUpsertStatement, not the last row’s $attributes.
Proposed diff:
    -        $attributes = [];
             $bindIndex = 0;
             $batchKeys = [];
             $bindValues = [];
    -        $columns = [];
    +        // Build a deterministic, global column list across the whole batch
    +        $includeId = true;
    +        $unionKeys = [];
    +        foreach ($changes as $i => $c) {
    +            $d = $c->getNew();
    +            $row = $d->getAttributes();
    +            $row['_uid'] = $d->getId();
    +            $row['_createdAt'] = $d->getCreatedAt();
    +            $row['_updatedAt'] = $d->getUpdatedAt();
    +            $row['_permissions'] = \json_encode($d->getPermissions());
    +            if ($this->sharedTables) {
    +                $row['_tenant'] = $d->getTenant();
    +            }
    +            $oldId = $c->getOld()->getSequence();
    +            if (!empty($oldId)) {
    +                $row['_id'] = $oldId;
    +            } else {
    +                $includeId = false;
    +            }
    +            foreach (\array_keys($row) as $k) {
    +                $unionKeys[$k] = true;
    +            }
    +        }
    +        if (!$includeId) {
    +            unset($unionKeys['_id']);
    +        }
    +        $attributeNames = \array_keys($unionKeys);
    +        \sort($attributeNames, \SORT_STRING);
    +        $columns = '(' . \implode(', ', \array_map(fn ($attr) => $this->quote($this->filter($attr)), $attributeNames)) . ')';

             foreach ($changes as $change) {
    -            $document = $change->getNew();
    -            $attributes = $document->getAttributes();
    -            $attributes['_uid'] = $document->getId();
    -            $attributes['_createdAt'] = $document->getCreatedAt();
    -            $attributes['_updatedAt'] = $document->getUpdatedAt();
    -            $attributes['_permissions'] = \json_encode($document->getPermissions());
    -
    -            $attributes['_id'] = null;
    -
    -            if (!empty($change->getOld()->getSequence())) {
    -                $attributes['_id'] = $change->getOld()->getSequence();
    -            }
    -
    -//            if (!empty($document->getSequence())) {
    -//                $attributes['_id'] = $document->getSequence();
    -//            }
    -
    -            if ($this->sharedTables) {
    -                $attributes['_tenant'] = $document->getTenant();
    -            }
    -
    -            \ksort($attributes);
    -
    -            $columns = [];
    -            foreach (\array_keys($attributes) as $key => $attr) {
    -                /**
    -                 * @var string $attr
    -                 */
    -                $columns[$key] = "{$this->quote($this->filter($attr))}";
    -            }
    -            $columns = '(' . \implode(', ', $columns) . ')';
    -
    +            $document = $change->getNew();
    +            $row = $document->getAttributes();
    +            // Internal attributes
    +            $row['_uid'] = $document->getId();
    +            $row['_createdAt'] = $document->getCreatedAt();
    +            $row['_updatedAt'] = $document->getUpdatedAt();
    +            $row['_permissions'] = \json_encode($document->getPermissions());
    +            if ($this->sharedTables) {
    +                $row['_tenant'] = $document->getTenant();
    +            }
    +            if ($includeId) {
    +                // All rows have an old sequence by construction
    +                $row['_id'] = $change->getOld()->getSequence();
    +            }

                 $bindKeys = [];
    -
    -            foreach ($attributes as $attrValue) {
    -                if (\is_array($attrValue)) {
    -                    $attrValue = \json_encode($attrValue);
    +            // Bind values in the same deterministic order for every row
    +            foreach ($attributeNames as $attr) {
    +                $value = $row[$attr] ?? null;
    +                if (\is_array($value)) {
    +                    $value = \json_encode($value);
                     }
    -                $attrValue = (\is_bool($attrValue)) ? (int)$attrValue : $attrValue;
    +                $value = (\is_bool($value)) ? (int)$value : $value;
                     $bindKey = 'key_' . $bindIndex;
                     $bindKeys[] = ':' . $bindKey;
    -                $bindValues[$bindKey] = $attrValue;
    +                $bindValues[$bindKey] = $value;
                     $bindIndex++;
                 }

                 $batchKeys[] = '(' . \implode(', ', $bindKeys) . ')';
             }

    -        $stmt = $this->getUpsertStatement($name, $columns, $batchKeys, $attributes, $bindValues, $attribute);
    +        $stmt = $this->getUpsertStatement($name, $columns, $batchKeys, $attributeNames, $bindValues, $attribute);

This makes the insert list and each VALUES tuple consistent, preventing runtime SQL errors and aligning with the stated goal of deterministic column ordering.
I can adapt the same approach for any driver-specific getUpsertStatement implementations if needed.
2000-2009: Per-row ksort does not guarantee batch-wide column alignment.
Sorting keys per row is not enough if rows have different keys. Use a single sorted union (see first comment) to generate one $columns list for all rows.
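To make the alignment problem concrete, here is a minimal standalone sketch, assuming rows are plain associative arrays. The helpers `unionColumns` and `alignRows` are hypothetical names for illustration only and are not part of the adapter.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: compute one sorted column list for a whole batch.
 *
 * @param array<int, array<string, mixed>> $rows
 * @return array<int, string>
 */
function unionColumns(array $rows): array
{
    $union = [];
    foreach ($rows as $row) {
        foreach (\array_keys($row) as $key) {
            $union[$key] = true;
        }
    }
    // array_keys may return ints for numeric-looking keys; normalise to strings.
    $names = \array_map('strval', \array_keys($union));
    \sort($names, \SORT_STRING);

    return $names;
}

/**
 * Hypothetical helper: project every row onto the shared column list,
 * filling attributes a row does not have with null.
 *
 * @param array<int, array<string, mixed>> $rows
 * @param array<int, string> $columns
 * @return array<int, array<int, mixed>>
 */
function alignRows(array $rows, array $columns): array
{
    $aligned = [];
    foreach ($rows as $row) {
        $tuple = [];
        foreach ($columns as $column) {
            $tuple[] = $row[$column] ?? null;
        }
        $aligned[] = $tuple;
    }

    return $aligned;
}
```

With per-row ksort, a row holding `_uid` and `name` and a row holding `_uid` and `age` each produce a two-column list, so the second tuple would bind `age` under the `name` column if only one list is kept. With the union, both rows yield three values in the same order.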
🧹 Nitpick comments (1)
src/Database/Adapter/SQL.php (1)
1968-1976: Add unit tests to cover mixed old-sequence/no-sequence upsert batches.
Tests should assert:
- All rows with old sequences: _id preserved.
- No rows with old sequences: _id omitted and auto-generated by DB.
- Mixed rows: either split into two batches or verify behavior per driver.
I can scaffold tests once we confirm driver semantics for DEFAULT vs omission.
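One way to handle the mixed case is to split the batch before building SQL. The sketch below assumes a simplified change shape (an array with an `oldSequence` key) instead of the library's Change and Document objects; `partitionBySequence` is a hypothetical name, and whether splitting is preferable to relying on driver defaults is still an open question in this review.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: split changes into those whose old document already
 * has a sequence (so _id can be bound) and those that need a generated one.
 *
 * @param array<int, array{oldSequence: ?string, new: array<string, mixed>}> $changes
 * @return array{withId: array<int, array>, withoutId: array<int, array>}
 */
function partitionBySequence(array $changes): array
{
    $withId = [];
    $withoutId = [];
    foreach ($changes as $change) {
        // Mirrors the !empty() check used in the PR.
        if (!empty($change['oldSequence'])) {
            $withId[] = $change;
        } else {
            $withoutId[] = $change;
        }
    }

    return ['withId' => $withId, 'withoutId' => $withoutId];
}
```

Each partition can then be upserted with its own column list: one that includes `_id` and one that omits it.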
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/Database/Adapter/SQL.php (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/Database/Adapter/SQL.php (2)
src/Database/Change.php (1)
getOld (13-16)
src/Database/Document.php (1)
getSequence (71-80)
🪛 GitHub Actions: Linter
src/Database/Adapter/SQL.php
[error] 1-1: PSR-12 lint error: statement_indentation.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Setup & Build Docker Image
🔇 Additional comments (2)
src/Database/Adapter/SQL.php (2)
2027-2028: Pass the attribute name list used to build $columns to getUpsertStatement.
Derived implementations typically need the column names to construct the ON DUPLICATE/ON CONFLICT update clauses. Passing the last row’s $attributes can desynchronize update projections. Prefer passing the global, deterministic attribute name list.
See proposed change in the main diff to pass $attributeNames instead of $attributes.
I can update driver implementations if they expect a different shape for the $attributes argument (names vs map).
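As an illustration of why the name list matters, here is a minimal sketch of deriving an update clause from it. It uses the MySQL/MariaDB `ON DUPLICATE KEY UPDATE ... VALUES(col)` form; `buildUpdateClause`, its skip list, and the backtick quoting are assumptions for this example and do not reflect the actual driver implementations.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: build an ON DUPLICATE KEY UPDATE clause from the
 * same attribute name list that produced the INSERT column list.
 *
 * @param array<int, string> $attributeNames
 * @param array<int, string> $skip Columns never overwritten on conflict (assumed set).
 */
function buildUpdateClause(
    array $attributeNames,
    array $skip = ['_uid', '_id', '_createdAt', '_tenant']
): string {
    $assignments = [];
    foreach ($attributeNames as $name) {
        if (\in_array($name, $skip, true)) {
            continue;
        }
        // Backtick quoting with escaping, as used by MySQL/MariaDB.
        $quoted = '`' . \str_replace('`', '``', $name) . '`';
        $assignments[] = "{$quoted} = VALUES({$quoted})";
    }

    if ($assignments === []) {
        throw new \InvalidArgumentException('No updatable columns in attribute list.');
    }

    return 'ON DUPLICATE KEY UPDATE ' . \implode(', ', $assignments);
}
```

Because the clause is derived from the batch-wide list, it stays in sync with the INSERT columns even when the last row lacks some attributes.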
1992-1995: Outdated comment: no legacy getSequence code remains
I searched src/Database/Adapter/SQL.php for any occurrences of getSequence (commented or not) and found none. The suggested diff to remove those lines no longer applies. Please re-run the PSR-12 linter to pinpoint the actual indentation failure.
Likely an incorrect or invalid review comment.
    if (!empty($document->getSequence())) {
    $attributes['_id'] = $document->getSequence();
    $attributes['_id'] = null;
Autoincrement will still work when _id is set to null, so this way we always have the key.
Summary by CodeRabbit