Clarify uncertainty error keys and document randomized anchor key tuning #22814

Open: rmloveland wants to merge 2 commits into main from 20260223-DOC-15172-DOC-14735-randomized-anchor-keys

@rmloveland (Contributor) commented Feb 23, 2026

Clarify uncertainty error keys and document randomized anchor key tuning

Fixes:

  • DOC-14735
  • DOC-15172

Summary of changes:

  • v25.4 and below, and v26.1 and later:

    • Update ReadWithinUncertaintyIntervalError docs to:
      • Extend the example to include a larger meta={key=/Table/...} fragment
    • Add randomized anchor key tuning guidance to the shared performance/reduce-contention.md
      include:
      • Describe when to consider transaction.randomized_anchor_key.enabled for workloads with
        large concurrent UPDATE/INSERT batches that create transaction record (anchor) hotspots
      • Emphasize that this setting randomizes anchor placement (not user data) to spread txn
        records across ranges
      • In v26.1, tie this guidance to the improved observability: contention events
        and logs already report the actual contention key, so anchor randomization is a
        secondary knob once true conflict locations are understood
  • v25.4 and below only:

    • Update ReadWithinUncertaintyIntervalError docs to:
      • Add an "Interpreting log messages" callout explaining that the logged key is the
        transaction record (anchor) key, not necessarily the actual conflict key
      • Note that SERIALIZATION_CONFLICT contention events recorded when
        sql.contention.record_serialization_conflicts.enabled is true also use this anchor key
  • v26.1 and later only:

    • Update ReadWithinUncertaintyIntervalError docs to:
      • Add an "Interpreting log messages" callout clarifying that in 26.1+ the logged key
        represents the actual contention key, and that earlier versions reported the anchor key
      • Document that contention events now use this contention key when the
        sql.contention.record_serialization_conflicts.enabled setting is enabled
    • Update 'Troubleshoot lock contention' page to include a new section on using randomized
      anchor keys

@rmloveland marked this pull request as draft on February 23, 2026 20:55
netlify bot commented Feb 23, 2026

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | ee10154 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/699e08b3a68d6c0008462822 |

netlify bot commented Feb 23, 2026

Deploy Preview for cockroachdb-api-docs canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | ee10154 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-api-docs/deploys/699e08b3c60ca50008ddd1d6 |

netlify bot commented Feb 23, 2026

Netlify Preview

| Name | Link |
|---|---|
| 🔨 Latest commit | ee10154 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/cockroachdb-docs/deploys/699e08b38830070008822bba |
| 😎 Deploy Preview | https://deploy-preview-22814--cockroachdb-docs.netlify.app |

@rmloveland force-pushed the 20260223-DOC-15172-DOC-14735-randomized-anchor-keys branch 2 times, most recently from c2829cd to 1ada635, on February 24, 2026 16:20
@rmloveland marked this pull request as ready for review on February 24, 2026 16:20
@rmloveland force-pushed the 20260223-DOC-15172-DOC-14735-randomized-anchor-keys branch from 1ada635 to 8988c8a on February 24, 2026 16:35
@rmloveland force-pushed the 20260223-DOC-15172-DOC-14735-randomized-anchor-keys branch from 8988c8a to db794d9 on February 24, 2026 16:38

1. `ReadWithinUncertaintyIntervalError` errors are only returned in rare cases that can be avoided by adjusting the [result buffer size](#result-buffer-size).

**Interpreting log messages:**
@rmloveland (Contributor, Author) commented Feb 24, 2026

@angles-n-daemons this describes the pre-v26.1 behavior which you are changing in cockroachdb/cockroach#164157 - please take a look

This is close, but I believe this comment (and the following) are subtly incorrect. Each transaction error carries a `TxnMeta`, which holds a key. This key is always the anchor key, both before and after v26.1. Therefore, anywhere you see a message logged with a key belonging to some meta (`meta={... key=...}`), the key will always be an anchor key. The conflict key is only ever used in contention events, not in these log messages.

A second point is that `ReadWithinUncertaintyIntervalError` will never have a conflict key. Only the error types `TransactionRetryError`, `WriteTooOld`, and `ExclusionViolationError` can add a conflicting key to a contention event.
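
To make the distinction above concrete, here is a minimal sketch of looking at recorded contention events instead of log messages. It assumes the `crdb_internal.transaction_contention_events` virtual table and its `collection_ts`, `contention_type`, `contending_pretty_key`, `table_name`, and `index_name` columns, none of which are named in this thread, so verify them against the docs for your version before relying on this.

```sql
-- Assumption: serialization conflicts are only recorded as contention events
-- when this setting (named in the PR description) is enabled.
SET CLUSTER SETTING sql.contention.record_serialization_conflicts.enabled = true;

-- Assumption: the table and column names below come from the
-- crdb_internal.transaction_contention_events virtual table, not from this PR.
SELECT collection_ts,
       contention_type,
       contending_pretty_key,
       table_name,
       index_name
FROM crdb_internal.transaction_contention_events
WHERE contention_type = 'SERIALIZATION_CONFLICT'
ORDER BY collection_ts DESC
LIMIT 10;
```

Per the discussion in this thread, the key shown for these events should be read as the anchor key in v25.4 and below and as the conflict key in v26.1 and later, while the `meta={... key=...}` key in log messages is the anchor key in both.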

@rmloveland (Contributor, Author) commented

thanks for this feedback! I've updated both pages as follows:

  1. moved this info to a separate heading that is more generic, since it's not really about the uncertainty error
  2. updated both pages to say the meta=... key is the anchor key
  3. re: the conflict key, updated v25.4 to say it's still the anchor key, and v26.1 doc to say it's the conflict key (which is the new thing)
  4. also updated both re: the list of error types that may add a conflicting key to a contention event

PTAL!


**Interpreting log messages:**

In CockroachDB {{ page.version.version }}, the `meta={... key=/Table/...}` field in log output for `ReadWithinUncertaintyIntervalError` and related serialization conflicts identifies the **actual contention key** (the key where the conflicting read or write occurred). Earlier versions could instead report the transaction's [anchor key]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#transaction-records), which made it harder to locate the true point of conflict. Contention events that are recorded when [`sql.contention.record_serialization_conflicts.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-contention-record-serialization-conflicts-enabled) is `true` use this contention key when populating the recorded conflict.
@rmloveland (Contributor, Author) commented Feb 24, 2026

@angles-n-daemons this is meant to describe the new v26.1+ behavior going forward, please let me know what you think

- Historical queries operate below [closed timestamps]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) and therefore have perfect concurrency characteristics - they never wait on anything and never block anything.
- Historical queries have the best possible performance, since they are served by the nearest [replica]({% link {{ page.version.version }}/architecture/glossary.md %}#replica).

### Randomize transaction anchor keys for large batched updates or inserts
@rmloveland (Contributor, Author) commented

@arulajmani can you please review this section on using randomized anchor keys?


- If applicable to your workload, assign [column families]({% link {{ page.version.version }}/column-families.md %}#default-behavior) and separate columns that are frequently read and written into separate column families. Transactions will operate on disjoint column families and reduce the likelihood of conflicts.

- For workloads where large [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`INSERT`]({% link {{ page.version.version }}/insert.md %}) transactions run concurrently over similar key ranges, watch for [transaction record]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#transaction-records) anchor hotspots (for example, many concurrent transactions with [records]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#transaction-records) on the same [range]({% link {{ page.version.version }}/architecture/glossary.md %}#range)). In these cases, consider enabling the [`transaction.randomized_anchor_key.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-transaction-randomized-anchor-key-enabled) cluster setting to randomize the location of transaction anchor keys. This can spread transaction records across ranges and reduce hotspotting. Only use this setting after confirming anchor hotspots via contention and range-level observability.
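
As a rough illustration of the guidance above, the setting could be inspected, enabled, and reverted as sketched below. One assumption to flag: the prose writes the setting as `transaction.randomized_anchor_key.enabled`, but the link anchor (`#setting-kv-transaction-randomized-anchor-key-enabled`) suggests the full name carries a `kv.` prefix, which is what this sketch uses. Confirm the exact name on the cluster settings page before running it.

```sql
-- Assumption: the full setting name includes the kv. prefix implied by the
-- link anchor in the paragraph above.

-- Check the current value before changing anything.
SHOW CLUSTER SETTING kv.transaction.randomized_anchor_key.enabled;

-- Enable randomized anchor keys once anchor hotspots have been confirmed.
SET CLUSTER SETTING kv.transaction.randomized_anchor_key.enabled = true;

-- Revert to the default if the change does not help.
RESET CLUSTER SETTING kv.transaction.randomized_anchor_key.enabled;
```

Because this is a cluster-wide setting, it affects every workload on the cluster, which is one reason to confirm the hotspot first as the paragraph recommends.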
@rmloveland (Contributor, Author) commented

@arulajmani can you please review this advice re: using the randomized anchor key setting (here and in the v25.4 include above)?

@rmloveland (Contributor, Author) commented

@arulajmani pinged you for review since this came out of your KV on-call a while ago (details in https://cockroachlabs.atlassian.net/browse/DOC-15172)

@angles-n-daemons pinging you since we discussed your changes to the txn record key in v26.1 and later in Slack (docs issue is https://cockroachlabs.atlassian.net/browse/DOC-14735)

cc @michae2: I already asked @angles-n-daemons to review the txn record key stuff, but FYI this closes out DOC-14735, which was prompted by you answering a question in Slack a long time ago :-)

@angles-n-daemons left a comment

thanks for the changes, looks good on my side :)
