
fogelito commented Aug 17, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Corrects record ID handling during updates to preserve existing histories and references.
    • Ensures tenant association is reliably retained during upserts in shared tables, preventing cross-tenant data mix-ups.
  • Refactor
    • Standardizes database write ordering for more deterministic, reliable upserts.
    • Improves consistency of bulk updates/imports by aligning column mapping and value binding.


coderabbitai bot commented Aug 17, 2025

Walkthrough

Updates the SQL adapter's upsert path: it orders columns deterministically, builds a new $columns list, assigns _id from the old document's sequence (defaulting to null), preserves tenant propagation via _tenant, rebuilds batch keys and bind values accordingly, and invokes the upsert with the updated parameters. No public API changes.

Changes

Cohort: SQL upsert path refactor
File: src/Database/Adapter/SQL.php
Summary of changes:
- Introduce local $columns for insert list used by upsert
- Assign _id from old document sequence; remove reliance on new sequence
- Deterministic column ordering via ksort on attributes
- Rebuild column projection, batch placeholders, $batchKeys, and $bindValues
- Maintain sharedTables _tenant propagation
- Upsert call updated to use new ordering and values
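The per-row steps listed above can be sketched in plain PHP. This is a minimal illustration under assumptions, not the adapter's code: buildUpsertRow() and columnList() are hypothetical helpers, the sequence is treated as a nullable string, and the backtick quoting assumes a MySQL/MariaDB dialect.

```php
<?php
/**
 * Hypothetical helper (illustration only, not the adapter's code): prepares one
 * row's attribute map the way the walkthrough describes.
 *
 * @param array<string, mixed> $attributes new document attributes
 * @return array<string, mixed> attributes plus _id/_tenant, sorted by key
 */
function buildUpsertRow(array $attributes, ?string $oldSequence, bool $sharedTables, ?int $tenant): array
{
    // _id comes from the OLD document's sequence; null is left for AUTO_INCREMENT
    $attributes['_id'] = empty($oldSequence) ? null : $oldSequence;

    if ($sharedTables) {
        // Keep the row bound to its tenant in shared tables
        $attributes['_tenant'] = $tenant;
    }

    // Deterministic column order within this row
    \ksort($attributes);

    return $attributes;
}

/** Quote each key MySQL-style and join them into an INSERT column list. */
function columnList(array $row): string
{
    $quoted = \array_map(fn ($name): string => '`' . $name . '`', \array_keys($row));

    return '(' . \implode(', ', $quoted) . ')';
}

$row = buildUpsertRow(['title' => 'Dune', '_uid' => 'doc1'], null, true, 7);
echo columnList($row), PHP_EOL; // (`_id`, `_tenant`, `_uid`, `title`)
```

Note that ksort only fixes the order within one row; whether every row in a batch ends up with the same keys is a separate question, raised in the review below.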

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Database Adapter Caller
  participant SQL as SQL Adapter
  participant DB as SQL Database

  Caller->>SQL: upsert(changes)
  SQL->>SQL: sort attributes (ksort)
  SQL->>SQL: build $columns, $batchKeys, $bindValues
  SQL->>SQL: set _id from old sequence (or null)
  SQL->>SQL: include _tenant when sharedTables
  SQL->>DB: EXECUTE UPSERT ($columns, $batchKeys, $bindValues)
  DB-->>SQL: result
  SQL-->>Caller: upsert result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • abnegate

Poem

I sorted the fields with a hop and a grin,
Old IDs in paw, I tucked them back in.
Columns in order, placeholders neat—
Upserts now march with a predictable beat.
Tenant tracks clear on the burrowed trail—
Carrots compiled, no queries fail. 🥕✨


coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (2)
src/Database/Adapter/SQL.php (2)

1972-2028: Batch upsert builds the column list per row (last one wins), which leads to invalid SQL when rows differ; build a single, deterministic column set for the whole batch.

$columns is recomputed for each change and only the last one is used for the prepared statement, while $batchKeys collects per-row placeholders. If the attribute set differs between rows, the VALUES tuples won't match the final $columns list, causing SQL errors or misaligned bindings. Deterministic ordering per row (ksort) isn't sufficient; the column list must be globally consistent across the batch.
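The failure mode can be reproduced without a database. This sketch uses a hypothetical perRowColumns() function and plain arrays standing in for documents; it only mimics the loop under review, where the column list is overwritten on every iteration while each row still contributes a tuple sized to its own attributes.

```php
<?php
/**
 * Illustration only: mimics the loop under review with plain arrays.
 * The column list is overwritten on every iteration (last row wins), while
 * each row still adds a VALUES tuple sized to its own attributes.
 *
 * @param list<array<string, mixed>> $rows
 * @return array{columns: list<int|string>, tupleSizes: list<int>}
 */
function perRowColumns(array $rows): array
{
    $columns = [];
    $tupleSizes = [];
    foreach ($rows as $row) {
        \ksort($row);
        $columns = \array_keys($row);   // overwritten for every row
        $tupleSizes[] = \count($row);   // one placeholder per value in this row
    }

    return ['columns' => $columns, 'tupleSizes' => $tupleSizes];
}

$result = perRowColumns([
    ['_uid' => 'a', 'title' => 'A', 'year' => 2020],
    ['_uid' => 'b', 'title' => 'B'], // this row has no "year" attribute
]);
// $result['columns'] lists 2 columns, but the first tuple carries 3 values
```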

Refactor to:

  • Pre-scan changes, compute a union of attribute keys (including internal keys), sort them deterministically once, and build $columns exactly once.
  • For each row, bind values in that single global order, defaulting missing attributes to null.
  • Only include _id if safe (see next comment) and pass the final attribute name list to getUpsertStatement, not the last row’s $attributes.

Proposed diff:

-            $attributes = [];
             $bindIndex = 0;
             $batchKeys = [];
             $bindValues = [];
-            $columns = [];
+            // Build a deterministic, global column list across the whole batch
+            $includeId = true;
+            $unionKeys = [];
+            foreach ($changes as $i => $c) {
+                $d = $c->getNew();
+                $row = $d->getAttributes();
+                $row['_uid'] = $d->getId();
+                $row['_createdAt'] = $d->getCreatedAt();
+                $row['_updatedAt'] = $d->getUpdatedAt();
+                $row['_permissions'] = \json_encode($d->getPermissions());
+                if ($this->sharedTables) {
+                    $row['_tenant'] = $d->getTenant();
+                }
+                $oldId = $c->getOld()->getSequence();
+                if (!empty($oldId)) {
+                    $row['_id'] = $oldId;
+                } else {
+                    $includeId = false;
+                }
+                foreach (\array_keys($row) as $k) {
+                    $unionKeys[$k] = true;
+                }
+            }
+            if (!$includeId) {
+                unset($unionKeys['_id']);
+            }
+            $attributeNames = \array_keys($unionKeys);
+            \sort($attributeNames, \SORT_STRING);
+            $columns = '(' . \implode(', ', \array_map(fn ($attr) => $this->quote($this->filter($attr)), $attributeNames)) . ')';
 
             foreach ($changes as $change) {
-                $document = $change->getNew();
-                $attributes = $document->getAttributes();
-                $attributes['_uid'] = $document->getId();
-                $attributes['_createdAt'] = $document->getCreatedAt();
-                $attributes['_updatedAt'] = $document->getUpdatedAt();
-                $attributes['_permissions'] = \json_encode($document->getPermissions());
-
-                $attributes['_id'] = null;
-
-                if (!empty($change->getOld()->getSequence())) {
-                    $attributes['_id'] = $change->getOld()->getSequence();
-                }
-
-//                if (!empty($document->getSequence())) {
-//                    $attributes['_id'] = $document->getSequence();
-//                }
-
-                if ($this->sharedTables) {
-                    $attributes['_tenant'] = $document->getTenant();
-                }
-
-                \ksort($attributes);
-
-                $columns = [];
-                foreach (\array_keys($attributes) as $key => $attr) {
-                    /**
-                     * @var string $attr
-                     */
-                    $columns[$key] = "{$this->quote($this->filter($attr))}";
-                }
-                $columns = '(' . \implode(', ', $columns) . ')';
-
+                $document = $change->getNew();
+                $row = $document->getAttributes();
+                // Internal attributes
+                $row['_uid'] = $document->getId();
+                $row['_createdAt'] = $document->getCreatedAt();
+                $row['_updatedAt'] = $document->getUpdatedAt();
+                $row['_permissions'] = \json_encode($document->getPermissions());
+                if ($this->sharedTables) {
+                    $row['_tenant'] = $document->getTenant();
+                }
+                if ($includeId) {
+                    // All rows have an old sequence by construction
+                    $row['_id'] = $change->getOld()->getSequence();
+                }
                 $bindKeys = [];
-
-                foreach ($attributes as $attrValue) {
-                    if (\is_array($attrValue)) {
-                        $attrValue = \json_encode($attrValue);
+                // Bind values in the same deterministic order for every row
+                foreach ($attributeNames as $attr) {
+                    $value = $row[$attr] ?? null;
+                    if (\is_array($value)) {
+                        $value = \json_encode($value);
                     }
-                    $attrValue = (\is_bool($attrValue)) ? (int)$attrValue : $attrValue;
+                    $value = (\is_bool($value)) ? (int)$value : $value;
                     $bindKey = 'key_' . $bindIndex;
                     $bindKeys[] = ':' . $bindKey;
-                    $bindValues[$bindKey] = $attrValue;
+                    $bindValues[$bindKey] = $value;
                     $bindIndex++;
                 }
 
                 $batchKeys[] = '(' . \implode(', ', $bindKeys) . ')';
             }
 
-            $stmt = $this->getUpsertStatement($name, $columns, $batchKeys, $attributes, $bindValues, $attribute);
+            $stmt = $this->getUpsertStatement($name, $columns, $batchKeys, $attributeNames, $bindValues, $attribute);

This makes the insert list and each VALUES tuple consistent, preventing runtime SQL errors and aligning with the stated goal of deterministic column ordering.

I can adapt the same approach for any driver-specific getUpsertStatement implementations if needed.
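Stripped of the adapter plumbing, the union-of-keys approach can be sketched as a standalone function. buildBatch() is hypothetical; name quoting and filtering are omitted, and the bool-to-int and array-to-JSON conversions mirror the ones in the proposed diff.

```php
<?php
/**
 * Illustration only: one sorted union of keys for the whole batch, and every
 * row bound in that order, with null for attributes a row does not have.
 *
 * @param list<array<string, mixed>> $rows
 * @return array{columns: string, batchKeys: list<string>, bindValues: array<string, mixed>}
 */
function buildBatch(array $rows): array
{
    // Union of keys across all rows, sorted once
    $union = [];
    foreach ($rows as $row) {
        foreach (\array_keys($row) as $key) {
            $union[$key] = true;
        }
    }
    $names = \array_map('strval', \array_keys($union));
    \sort($names, \SORT_STRING);

    $bindIndex = 0;
    $batchKeys = [];
    $bindValues = [];
    foreach ($rows as $row) {
        $placeholders = [];
        foreach ($names as $name) {
            $value = $row[$name] ?? null;                      // missing attribute -> NULL
            if (\is_array($value)) {
                $value = \json_encode($value);                 // arrays stored as JSON
            }
            $value = \is_bool($value) ? (int) $value : $value; // booleans as 0/1
            $key = 'key_' . $bindIndex++;
            $placeholders[] = ':' . $key;
            $bindValues[$key] = $value;
        }
        $batchKeys[] = '(' . \implode(', ', $placeholders) . ')';
    }

    return [
        'columns' => '(' . \implode(', ', $names) . ')',
        'batchKeys' => $batchKeys,
        'bindValues' => $bindValues,
    ];
}

$batch = buildBatch([
    ['_uid' => 'a', 'year' => 2020, 'tags' => ['x']],
    ['_uid' => 'b', 'active' => true],
]);
// $batch['columns'] is "(_uid, active, tags, year)"; both tuples have 4 placeholders
```

One trade-off worth checking before adopting this: padding a missing attribute with NULL means that, if the upsert's update clause rewrites every listed column, an existing value for that attribute would be overwritten with NULL on the update path.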


2000-2009: Per-row ksort does not guarantee batch-wide column alignment.

Sorting keys per row is not enough if rows have different keys. Use a single sorted union (see first comment) to generate one $columns list for all rows.
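A quick illustration of why per-row sorting is not enough: two rows with the same number of attributes but different keys sort into equal-length, misaligned lists. SQL would not raise a column-count error in that case; values would simply land in the wrong columns. The row contents here are made up for the example.

```php
<?php
// Illustration only: two made-up rows with the same attribute count but different keys.
$first = ['title' => 'A', 'year' => 2020, '_uid' => 'a'];
$second = ['title' => 'B', 'genre' => 'sci-fi', '_uid' => 'b'];

// Each row sorted independently, as a per-row ksort does
\ksort($first);
\ksort($second);

$firstKeys = \array_keys($first);   // ['_uid', 'title', 'year']
$secondKeys = \array_keys($second); // ['_uid', 'genre', 'title']

// Equal lengths, so the tuples pass a column-count check, but position 1
// holds "title" in one row and "genre" in the other: values would be misplaced.
$aligned = $firstKeys === $secondKeys; // false
```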

🧹 Nitpick comments (1)
src/Database/Adapter/SQL.php (1)

1968-1976: Add unit tests to cover mixed old-sequence/no-sequence upsert batches.

Tests should assert:

  • All rows with old sequences: _id preserved.
  • No rows with old sequences: _id omitted and auto-generated by DB.
  • Mixed rows: either split into two batches or verify behavior per driver.

I can scaffold tests once we confirm driver semantics for DEFAULT vs omission.
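The "split into two batches" option for mixed rows could look roughly like this. It is a sketch under assumptions: each change is reduced to a hypothetical shape holding the old sequence ('old') and the new row's attributes ('row'), and partitionBySequence() is an illustrative helper, not part of the adapter. Rows with an old sequence keep _id; the others omit it so the database assigns one.

```php
<?php
/**
 * Hypothetical helper: splits an upsert batch so each sub-batch has a uniform
 * column set with respect to _id.
 *
 * @param list<array{old: ?string, row: array<string, mixed>}> $changes
 * @return array{withId: list<array<string, mixed>>, withoutId: list<array<string, mixed>>}
 */
function partitionBySequence(array $changes): array
{
    $withId = [];
    $withoutId = [];
    foreach ($changes as $change) {
        if (!empty($change['old'])) {
            // Existing document: preserve its sequence as _id
            $withId[] = ['_id' => $change['old']] + $change['row'];
        } else {
            // New document: _id omitted, left to the database
            $withoutId[] = $change['row'];
        }
    }

    return ['withId' => $withId, 'withoutId' => $withoutId];
}
```

Each sub-batch could then be fed to the batch builder separately, which also gives the tests one fixture per case (all with sequence, none with sequence).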

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between b9ba929 and 31e8b4a.

📒 Files selected for processing (1)
  • src/Database/Adapter/SQL.php (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/Database/Adapter/SQL.php (2)
src/Database/Change.php (1)
  • getOld (13-16)
src/Database/Document.php (1)
  • getSequence (71-80)
🪛 GitHub Actions: Linter
src/Database/Adapter/SQL.php

[error] 1-1: PSR-12 lint error: statement_indentation.

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Setup & Build Docker Image
🔇 Additional comments (2)
src/Database/Adapter/SQL.php (2)

2027-2028: Pass the attribute name list used to build $columns to getUpsertStatement.

Derived implementations typically need the column names to build the ON DUPLICATE KEY UPDATE / ON CONFLICT update clauses. Passing the last row's $attributes can desynchronize the update projection from the insert columns. Prefer passing the global, deterministic attribute-name list.

See proposed change in the main diff to pass $attributeNames instead of $attributes.

I can update driver implementations if they expect a different shape for the $attributes argument (names vs map).
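To illustrate why the update clause should be derived from the same name list as the insert columns, here is a hypothetical builder for a MySQL/MariaDB ON DUPLICATE KEY UPDATE clause. The function name and the default list of columns left untouched on update are assumptions, not the adapters' actual behavior; real getUpsertStatement() implementations may build this differently.

```php
<?php
/**
 * Hypothetical sketch: builds the update clause from the insert column names.
 *
 * @param list<string> $names column names in insert order
 * @param list<string> $skip  columns not overwritten on update (assumed default)
 */
function onDuplicateUpdate(array $names, array $skip = ['_uid', '_id', '_createdAt']): string
{
    $sets = [];
    foreach ($names as $name) {
        if (\in_array($name, $skip, true)) {
            continue; // keep identity and creation metadata as stored
        }
        $sets[] = "`{$name}` = VALUES(`{$name}`)";
    }

    return 'ON DUPLICATE KEY UPDATE ' . \implode(', ', $sets);
}
```

On PostgreSQL the analogous form is ON CONFLICT (...) DO UPDATE SET col = EXCLUDED.col. In MySQL 8.0.20 and later, VALUES() in this position is deprecated in favor of a row alias but is still accepted.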


1992-1995: Outdated comment – no legacy getSequence code remains

I searched src/Database/Adapter/SQL.php for any occurrences of getSequence (commented or not) and found none. The suggested diff to remove those lines no longer applies. Please re-run the PSR-12 linter to pinpoint the actual indentation failure.

Likely an incorrect or invalid review comment.


if (!empty($document->getSequence())) {
$attributes['_id'] = $document->getSequence();
$attributes['_id'] = null;
fogelito (author) replied:

Auto-increment still works when _id is set to null, and this way the _id key is always present.
