Clarify uncertainty error keys and document randomized anchor key tuning #22814
rmloveland wants to merge 2 commits into main
Fixes:

- DOC-14735
- DOC-15172

Summary of changes:

- v25.4 and below, and v26.1 and later:
  - Update ReadWithinUncertaintyIntervalError docs to:
    - Extend the example to include a larger `meta={key=/Table/...}` fragment
  - Add randomized anchor key tuning guidance to the shared `performance/reduce-contention.md` include:
    - Describe when to consider `transaction.randomized_anchor_key.enabled` for workloads with large concurrent UPDATE/INSERT batches that create transaction record (anchor) hotspots
    - Emphasize that this setting randomizes anchor placement (not user data) to spread txn records across ranges
    - In v26.1, tie this guidance to the improved observability: contention events and logs already report the actual contention key, so anchor randomization is a secondary knob once true conflict locations are understood
- v25.4 and below only:
  - Update ReadWithinUncertaintyIntervalError docs to:
    - Add an "Interpreting log messages" callout explaining that the logged key is the transaction record (anchor) key, not necessarily the actual conflict key
    - Note that SERIALIZATION_CONFLICT contention events recorded when `sql.contention.record_serialization_conflicts.enabled` is true also use this anchor key
- v26.1 and later only:
  - Update ReadWithinUncertaintyIntervalError docs to:
    - Add an "Interpreting log messages" callout clarifying that in 26.1+ the logged key represents the actual contention key, and that earlier versions reported the anchor key
    - Document that contention events now use this contention key when the `sql.contention.record_serialization_conflicts.enabled` setting is enabled
  - Update 'Troubleshoot lock contention' page to include a new section on using randomized anchor keys
| 1. `ReadWithinUncertaintyIntervalError` errors are only returned in rare cases that can be avoided by adjusting the [result buffer size](#result-buffer-size). | ||
| **Interpreting log messages:** |
@angles-n-daemons this describes the pre-v26.1 behavior which you are changing in cockroachdb/cockroach#164157 - please take a look
This is close, but I believe this comment (and the following one) is subtly incorrect. Each transaction error carries a `TxnMeta`, which holds a key. This key is always the anchor key, both before and after 26.1. Therefore, anywhere you see a message logged with a key belonging to some meta (`meta={... key=...}`), the key will always be an anchor key. The conflict key is only ever used in contention events, not in these log messages.
A second point is that `ReadWithinUncertaintyIntervalError` will never have a conflict key. Only the error types `TransactionRetryError`, `WriteTooOld`, and `ExclusionViolationError` can add a conflicting key to a contention event.
thanks for this feedback! I've updated both pages as follows:
- moved this info to a separate heading that is more generic, since it's not really about the uncertainty error
- updated both pages to say the `meta=...` key is the anchor key
- re: the conflict key, updated v25.4 to say it's still the anchor key, and v26.1 doc to say it's the conflict key (which is the new thing)
- also updated both re: the list of error types that may add a conflicting key to a contention event
PTAL!
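To make the point in this thread concrete, here is a minimal Python sketch that pulls the `key=` field out of a `meta={...}` log fragment. Per the review comment above, that key is the transaction's anchor key, not a conflict key. The function name, the regular expression, and the sample log line are all hypothetical illustrations rather than anything CockroachDB ships, and the sketch assumes the key contains no whitespace or closing brace.

```python
import re
from typing import Optional

# Matches the key=... field inside a meta={...} fragment.
# Assumption: the key contains no whitespace and no closing brace.
META_KEY_RE = re.compile(r"meta=\{[^}]*?\bkey=([^\s}]+)")


def extract_anchor_key(log_line: str) -> Optional[str]:
    """Return the key from the meta={...} fragment of a log line, or None."""
    match = META_KEY_RE.search(log_line)
    return match.group(1) if match else None


# Hypothetical, shortened log line; the field values are made up.
sample = (
    'ReadWithinUncertaintyIntervalError: ... "sql txn" '
    "meta={id=1b3d5d8a key=/Table/53/1/42 pri=0.005 epo=1}"
)
print(extract_anchor_key(sample))  # prints /Table/53/1/42
```

A key extracted this way tells you where the transaction record lives, which is useful for spotting anchor hotspots but should not be read as the location of the conflicting read or write.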
| **Interpreting log messages:** | ||
| In CockroachDB {{ page.version.version }}, the `meta={... key=/Table/...}` field in log output for `ReadWithinUncertaintyIntervalError` and related serialization conflicts identifies the **actual contention key** (the key where the conflicting read or write occurred). Earlier versions could instead report the transaction's [anchor key]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#transaction-records), which made it harder to locate the true point of conflict. Contention events that are recorded when [`sql.contention.record_serialization_conflicts.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-sql-contention-record-serialization-conflicts-enabled) is `true` use this contention key when populating the recorded conflict. |
@angles-n-daemons this is meant to describe the new v26.1+ behavior going forward, please let me know what you think
| - Historical queries operate below [closed timestamps]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) and therefore have perfect concurrency characteristics - they never wait on anything and never block anything. | ||
| - Historical queries have the best possible performance, since they are served by the nearest [replica]({% link {{ page.version.version }}/architecture/glossary.md %}#replica). | ||
| ### Randomize transaction anchor keys for large batched updates or inserts |
@arulajmani can you please review this section re: using the randomized anchor keys section?
| - If applicable to your workload, assign [column families]({% link {{ page.version.version }}/column-families.md %}#default-behavior) and separate columns that are frequently read and written into separate column families. Transactions will operate on disjoint column families and reduce the likelihood of conflicts. | ||
| - For workloads where large [`UPDATE`]({% link {{ page.version.version }}/update.md %}) or [`INSERT`]({% link {{ page.version.version }}/insert.md %}) transactions run concurrently over similar key ranges, watch for [transaction record]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#transaction-records) anchor hotspots (for example, many concurrent transactions with [records]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#transaction-records) on the same [range]({% link {{ page.version.version }}/architecture/glossary.md %}#range)). In these cases, consider enabling the [`transaction.randomized_anchor_key.enabled`]({% link {{ page.version.version }}/cluster-settings.md %}#setting-kv-transaction-randomized-anchor-key-enabled) cluster setting to randomize the location of transaction anchor keys. This can spread transaction records across ranges and reduce hotspotting. Only use this setting after confirming anchor hotspots via contention and range-level observability. |
@arulajmani can you please review this advice re: using the randomized anchor key setting (here and in the v25.4 include above)?
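For readers following the thread, the setting under review would be toggled with ordinary `SET CLUSTER SETTING` and `SHOW CLUSTER SETTING` statements. The Python sketch below only builds those statement strings and does not connect to a cluster. The setting name is taken from the text above, and the helper functions are hypothetical.

```python
# Setting name as written in the doc text above; the helpers are hypothetical.
ANCHOR_KEY_SETTING = "transaction.randomized_anchor_key.enabled"


def show_cluster_setting(name: str) -> str:
    """Build a statement that reads the current value of a cluster setting."""
    return f"SHOW CLUSTER SETTING {name};"


def set_bool_cluster_setting(name: str, enabled: bool) -> str:
    """Build a statement that sets a boolean cluster setting."""
    value = "true" if enabled else "false"
    return f"SET CLUSTER SETTING {name} = {value};"


# Check the current value first, then enable the setting.
print(show_cluster_setting(ANCHOR_KEY_SETTING))
print(set_bool_cluster_setting(ANCHOR_KEY_SETTING, True))
```

As the proposed doc text says, this is a knob to reach for only after anchor hotspots have been confirmed through contention and range-level observability.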
@arulajmani pinged you for review since this came out of your KV on-call a while ago (details in https://cockroachlabs.atlassian.net/browse/DOC-15172)

@angles-n-daemons pinging you since we discussed your changes to the txn record key in v26.1 and later in Slack (docs issue is https://cockroachlabs.atlassian.net/browse/DOC-14735)

cc @michae2, I already asked @angles-n-daemons to review the txn record key stuff, but FYI this closes out DOC-14735, which was prompted by you answering a question in Slack a long time ago :-)
angles-n-daemons left a comment
thanks for the changes, looks good on my side :)