HIVE-28578: Concurrency issue in updateTableColumnStatistics #6159

dengzhhu653 · 2025-10-30T08:02:50Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Tested on Postgres 17.2, MariaDB 10.3.39-MariaDB-1, MySQL 9.1.0-1.el9 and 5.7.44 and Oracle 23

deniskuzZ · 2025-11-12T17:39:14Z

It is still using pessimistic locking. How about:

Transaction A: UPDATE version = version + 1 (starts at v=5)
Transaction B: UPDATE version = version + 1 (starts at v=5)

Database MVCC:
├─ Transaction A gets version 5, increments to 6, commits
├─ Transaction B sees old version 5 (MVCC snapshot)
├─ When B tries to commit:
│ ├─ Detects conflict (row changed by A)
│ ├─ updCount = 0 (WHERE clause fails - version is now 6, not 5)
│ └─ Returns null to signal conflict

 // ✅ OPTIMISTIC LOCKING: Read current version, increment, and prepare for atomic check
String currentVersionStr = table.getParameters().get(versionParamKey);
long currentVersion = (currentVersionStr != null ? Long.parseLong(currentVersionStr) : 0L);
long newVersion = currentVersion + 1;
newParams.put(versionParamKey, String.valueOf(newVersion));
        
oldt.setParameters(newParams);
        
 // ✅ Atomically increment version with conflict detection
// This UPDATE will fail if another transaction changed the version
int updCount = incrementTableVersionAtomic(mTable.getId(), versionParamKey, currentVersion, newVersion);
        
if (updCount == 0) {
   // Concurrent modification detected - retry
   LOG.debug("Table {}.{} was modified by another transaction (version {} changed), retrying...", dbname, name, currentVersion);
   throw new RetryingExecutor.RetryException(
              new MetaException("Optimistic lock failure - table version changed"));
}
        
LOG.debug("Successfully updated table {}.{} version: {} -> {}", dbname, name, currentVersion, newVersion);

.............

  private int incrementTableVersionAtomic(long tblId, String versionParamKey, 
      long expectedVersion, long newVersion) throws MetaException {
    
    try {
      // First, try to UPDATE with optimistic lock check
      String updateSQL = "UPDATE \"TABLE_PARAMS\" " +
          "SET \"PARAM_VALUE\" = '" + newVersion + "' " +
          "WHERE \"TBL_ID\" = " + tblId + 
          " AND \"PARAM_KEY\" = '" + versionParamKey + "' " +
          " AND \"PARAM_VALUE\" = '" + expectedVersion + "'";
      
      int updCount = executePlainSQLUpdate(updateSQL);
      
      if (updCount == 1) {
        // Success - version was incremented
        return 1;
      }

dengzhhu653 · 2025-11-13T00:56:49Z

Thank you @deniskuzZ for the comment.

String updateSQL = "UPDATE \"TABLE_PARAMS\" " + "SET \"PARAM_VALUE\" = '" + newVersion + "' " +
          "WHERE \"TBL_ID\" = " + tblId + " AND \"PARAM_KEY\" = '" + versionParamKey + "' " +
          "AND \"PARAM_VALUE\" = '" + expectedVersion + "'";  // ✅ CHECK SNAPSHOT!

The result of this query matters for the example. Say TABLE_PARAMS has a row (TBL_ID, PARAM_KEY, PARAM_VALUE) = (1, hive.metastore.table.version, 1). If transactions A and B execute the update (set hive.metastore.table.version = 2) at the same time and A takes the row lock, then B must wait for A to commit or roll back and release the row before B is allowed to update it. If A committed, B then re-evaluates the WHERE condition, sees that no row matches, and returns 0.

If more transactions try to update this row, they queue up for a chance to take over the row lock. In my opinion, this is similar to the S4U approach I proposed in the old PR.
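
To make the "second writer gets 0 affected rows" outcome concrete, here is a minimal in-memory sketch. It assumes a `ConcurrentHashMap` can stand in for the TABLE_PARAMS row; the class and method names are hypothetical and not part of Hive. It only models the final result of the conditional update, not the row-lock wait described above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for one TABLE_PARAMS row; hypothetical, not Hive code. */
final class VersionClaimSketch {
  private final ConcurrentMap<String, String> params = new ConcurrentHashMap<>();

  VersionClaimSketch(String key, String initialValue) {
    params.put(key, initialValue);
  }

  /**
   * Mimics "UPDATE ... SET PARAM_VALUE = newValue
   *         WHERE PARAM_KEY = key AND PARAM_VALUE = expected".
   * Returns the number of "rows" updated: 1 on success, 0 if the value changed.
   */
  int updateWithExpectedValue(String key, String expected, String newValue) {
    return params.replace(key, expected, newValue) ? 1 : 0;
  }

  String get(String key) {
    return params.get(key);
  }

  public static void main(String[] args) {
    String key = "hive.metastore.table.version";
    VersionClaimSketch row = new VersionClaimSketch(key, "1");
    // A and B both read version 1 and both try to write version 2.
    int a = row.updateWithExpectedValue(key, "1", "2");
    int b = row.updateWithExpectedValue(key, "1", "2");
    System.out.println("A updated " + a + ", B updated " + b); // A updated 1, B updated 0
  }
}
```

In a real database the second UPDATE would first block on A's row lock and only then see the changed value, which is the point made in the comment above.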

dengzhhu653 · 2025-11-13T01:13:11Z

I tried a similar update on MySQL; the black transaction waits until "Lock wait timeout exceeded".

deniskuzZ · 2025-11-13T10:01:21Z

I tried a similar update on MySQL; the black transaction waits until "Lock wait timeout exceeded".

that is 100% true, however, MVCC is better because:

Faster claim (1ms vs 10ms)
Parallel claim attempts (database resolves conflicts)
Simpler code (no savepoints)
Database uses MVCC to serialize at commit time
No locks held during work phase

PS: we already use MVCC in ObjectStore: updateParameterWithExpectedValue()

updated patch:

Subject: [PATCH] DRAFT
---
Index: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
--- a/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java	(revision c729ea19807c0c0ca6f1df4870fff49660e95a85)
+++ b/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java	(date 1763028067727)
@@ -9171,21 +9171,39 @@
       int maxRetries = MetastoreConf.getIntVar(conf, ConfVars.METASTORE_S4U_NOWAIT_MAX_RETRIES);
       long sleepInterval = MetastoreConf.getTimeVar(conf,
           ConfVars.METASTORE_S4U_NOWAIT_RETRY_SLEEP_INTERVAL, TimeUnit.MILLISECONDS);
+      
+      final String versionParamKey = "hive.metastore.table.version";
+      
       Map<String, String> result = new RetryingExecutor<>(maxRetries, () -> {
-        Ref<Exception> exceptionRef = new Ref<>();
-        String savePoint = "uts_" + ThreadLocalRandom.current().nextInt(10000) + "_" + System.nanoTime();
-        setTransactionSavePoint(savePoint);
-        executePlainSQL(
-            sqlGenerator.addForUpdateNoWait("SELECT \"TBL_ID\" FROM \"TBLS\" WHERE \"TBL_ID\" = " + mTable.getId()),
-            exception -> {
-              rollbackTransactionToSavePoint(savePoint);
-              exceptionRef.t = exception;
-            });
-        if (exceptionRef.t != null) {
-          throw new RetryingExecutor.RetryException(exceptionRef.t);
-        }
         pm.refresh(mTable);
         Table table = convertToTable(mTable);
+        String dbname = table.getDbName();
+        String name = table.getTableName();
+        
+        // ✅ STEP 1: Read current version snapshot from TABLE_PARAMS
+        String expectedVersionStr = table.getParameters().get(versionParamKey);
+        if (expectedVersionStr == null) {
+          expectedVersionStr = "0";
+        }
+        long newVersion = Long.parseLong(expectedVersionStr) + 1;
+        String newVersionStr = String.valueOf(newVersion);
+        
+        // ✅ STEP 2: Atomically claim the version using existing MVCC API
+        // This uses UPDATE with WHERE clause to check snapshot hasn't changed
+        long affectedRows = updateParameterWithExpectedValue(table, versionParamKey, expectedVersionStr, newVersionStr);
+        
+        if (affectedRows != 1) {
+          // Version conflict - PARAM_VALUE changed since we read it (concurrent modification)
+          LOG.debug("Table {}.{} version conflict (expected={}), retrying...", dbname, name, expectedVersionStr);
+          throw new RetryingExecutor.RetryException(
+              new MetaException("The table has been modified. The parameter value for key '" + 
+                  versionParamKey + "' is different"));
+        }
+        
+        // ✅ STEP 3: Successfully claimed version - now do the work
+        LOG.debug("Claimed table {}.{} version {} -> {}, proceeding with stats update", 
+            dbname, name, expectedVersionStr, newVersion);
+        
         List<String> colNames = new ArrayList<>();
         for (ColumnStatisticsObj statsObj : statsObjs) {
           colNames.add(statsObj.getColName());
@@ -9201,17 +9219,14 @@
           MTableColumnStatistics mStatsObj = StatObjectConverter.convertToMTableColumnStatistics(mTable, statsDesc,
               statsObj, colStats.getEngine());
           writeMTableColumnStatistics(table, mStatsObj, oldStats.get(statsObj.getColName()));
-          // There is no need to add colname again, otherwise we will get duplicate colNames.
         }
 
         // TODO: (HIVE-20109) ideally the col stats stats should be in colstats, not in the table!
         // Set the table properties
-        // No need to check again if it exists.
-        String dbname = table.getDbName();
-        String name = table.getTableName();
         MTable oldt = mTable;
         Map<String, String> newParams = new HashMap<>(table.getParameters());
         StatsSetupConst.setColumnStatsState(newParams, colNames);
+        
         boolean isTxn = TxnUtils.isTransactionalTable(oldt.getParameters());
         if (isTxn) {
           if (!areTxnStatsSupported) {
@@ -9230,7 +9245,11 @@
             oldt.setWriteId(writeId);
           }
         }
+        
+        // ✅ STEP 4: Add the new version to params (already updated in DB via directSql)
+        newParams.put(versionParamKey, newVersionStr);
         oldt.setParameters(newParams);
+
         return newParams;
       }).onRetry(e -> e instanceof RetryingExecutor.RetryException)
         .commandName("updateTableColumnStatistics").sleepInterval(sleepInterval, interval ->

deniskuzZ · 2025-11-13T10:07:35Z

in any case, solution in this PR is acceptable and OK to merge, but please consider using MVCC (i.e. updateParameterWithExpectedValue) since it's already used (please see comment above).

Another question: since we’re adding RetryingExecutor here, what’s the role of RetryingMetaStoreClient? Doesn’t it already handle the same functionality?

dengzhhu653 · 2025-11-13T10:50:28Z

in any case, solution in this PR is acceptable and OK to merge, but please consider using MVCC (i.e. updateParameterWithExpectedValue) since it's already used (please see comment above).

Another question: since we’re adding RetryingExecutor here, what’s the role of RetryingMetaStoreClient? Doesn’t it already handle the same functionality?

RetryingMetaStoreClient basically retries the call on Thrift-layer exceptions, such as a service shutdown (connection refused or timeout) or an incompatible protocol (connecting to an old HMS where the Thrift method hasn't been introduced).

RetryingExecutor doesn't need a TCP round trip, so it's more efficient than RetryingMetaStoreClient: there is no need to serialize or deserialize the Thrift message and send it over the connection.
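
As a rough illustration of what an in-process retry looks like (as opposed to a client-side retry over Thrift), here is a minimal sketch. It is not Hive's RetryingExecutor; the class name, the `RetryableException` marker, and the jittered sleep are assumptions made for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

/** Hypothetical in-process retry helper; not Hive's RetryingExecutor. */
final class LocalRetrySketch {

  /** Marker exception: the attempt lost a conflict and may be retried. */
  static final class RetryableException extends RuntimeException {
    RetryableException(String message) {
      super(message);
    }
  }

  /** Runs the action, retrying up to maxRetries times on RetryableException. */
  static <T> T run(int maxRetries, long minSleepMs, Supplier<T> action) {
    for (int attempt = 0; ; attempt++) {
      try {
        return action.get();
      } catch (RetryableException e) {
        if (attempt >= maxRetries) {
          throw e; // retry budget exhausted, surface the conflict to the caller
        }
        // Sleep between minSleepMs and 2 * minSleepMs so competing callers spread out.
        sleepQuietly(minSleepMs + ThreadLocalRandom.current().nextLong(minSleepMs + 1));
      }
    }
  }

  private static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting to retry", ie);
    }
  }
}
```

The whole loop runs inside one server call, so a lost conflict costs a short sleep and another database statement rather than a new Thrift request.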

IMO, with MVCC the transaction still holds the row lock when affected rows = 1. It might be rolled back in the middle of the transaction, so other updates have to wait for this transaction to commit or roll back before they can get the right version of this row (to avoid a dirty read).

deniskuzZ · 2025-11-13T12:33:43Z

with MVCC the transaction still holds the row lock when affected rows = 1. It might be rolled back in the middle of the transaction, so other updates have to wait for this transaction to commit or roll back before they can get the right version of this row (to avoid a dirty read).

Performance Comparison (Updates Only)

| Aspect | MVCC + Retry | S4U NOWAIT + Retry |
| --- | --- | --- |
| Locking behavior | Row-level lock acquired at commit; version checks allow other transactions to proceed | Row-level lock attempted immediately; fails if locked |
| Conflict detection | At commit; may do speculative work before detecting conflict | Immediately at lock acquisition; no speculative work |
| Retries | Only if affected row changed; may retry heavier work | On immediate lock failure; retries are lightweight |
| Throughput under low contention | High | High |
| Throughput under high contention | Medium (retries on conflicts; speculative work adds overhead) | Lower (more frequent immediate failures, but less wasted work) |
| CPU overhead | Higher under high contention due to speculative work and version checks | Can be lower per transaction, but frequent retries increase CPU |
| Latency | Slightly variable; may spike if many conflicts | Immediate failures add retry latency |

deniskuzZ · 2025-11-13T12:49:23Z

...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java

  private boolean areTxnStatsSupported = false;
  private PropertyStore propertyStore;

-  private static Striped<Lock> tablelocks;


Why remove this optimization? It can save some resources when concurrency happens at the HMS process level.

Now we can depend solely on a shared outer lock. If we worry about the CPU overhead this brings to the database, I can add it back.

Another point: the retry interval is random but at least 30ms, and the row is located by the primary key, so I guess the CPU overhead might not be as high as we think.

probably you are right

deniskuzZ · 2025-11-13T13:20:29Z

...tastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DatabaseProduct.java

+          return selectStatement + " for update NOWAIT";
+        } else {
+          int selectLength = "select".length();
+          return selectStatement.trim().substring(0, selectLength) + " /*+ MAX_EXECUTION_TIME(300) */ " +


how reliable is that? it won't take row-level lock

It takes the row-level lock as well, and the maximum wait time for the lock is 300ms; otherwise it throws:

Caused by: java.sql.SQLException: Query execution was interrupted, maximum statement execution time exceeded

Checked MySQL 5.7, using mysql-connector-j-8.0.32.jar and mysql-connector-java-5.1.49.jar
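
For readers following the truncated diff above, here is a minimal sketch of the idea as described in this thread: append `for update NOWAIT` where the database supports it, and otherwise bound the lock wait with MySQL's `MAX_EXECUTION_TIME` optimizer hint. The class name, the `supportsNoWait` flag, the `maxWaitMs` parameter, and everything after the hint (the diff is cut off there) are assumptions, not the actual DatabaseProduct code.

```java
/** Hypothetical sketch of the statement rewrite; not the real DatabaseProduct method. */
final class NoWaitSqlSketch {

  /**
   * Adds a non-blocking (or time-bounded) row lock to a SELECT statement.
   * supportsNoWait: whether the target database accepts "FOR UPDATE NOWAIT" (assumed flag).
   * maxWaitMs: upper bound for the hinted statement, 300 in the discussion above.
   */
  static String addForUpdateNoWait(String selectStatement, boolean supportsNoWait, int maxWaitMs) {
    String sql = selectStatement.trim();
    if (supportsNoWait) {
      return sql + " for update NOWAIT";
    }
    int selectLength = "select".length();
    if (!sql.regionMatches(true, 0, "select", 0, selectLength)) {
      throw new IllegalArgumentException("Expected a SELECT statement: " + sql);
    }
    // Insert the optimizer hint right after the SELECT keyword, then lock the row.
    return sql.substring(0, selectLength)
        + " /*+ MAX_EXECUTION_TIME(" + maxWaitMs + ") */"
        + sql.substring(selectLength)
        + " for update";
  }

  public static void main(String[] args) {
    String select = "SELECT \"TBL_ID\" FROM \"TBLS\" WHERE \"TBL_ID\" = 1";
    System.out.println(addForUpdateNoWait(select, true, 300));
    System.out.println(addForUpdateNoWait(select, false, 300));
  }
}
```

Whether the hint interrupts a `for update` lock wait on a given MySQL version should be checked against that version; the thread above reports it working on MySQL 5.7.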

deniskuzZ

+1, pending tests

deniskuzZ · 2025-11-13T14:18:15Z

Since the retries are handled by server-side via RetryingExecutor, do we actually need S4U NOWAIT? I initially assumed that the client would be responsible for retrying.
Now #5929 makes sense.

dengzhhu653 · 2025-11-13T14:39:10Z

Since the retries are handled by server-side via RetryingExecutor, do we actually need S4U NOWAIT? I initially assumed that the client would be responsible for retrying. Now #5929 makes sense.

This is a more optimistic approach than S4U. NOWAIT helps prevent long hangs while waiting for the lock and reduces hot contention; personally, I like this way.

sonarqubecloud · 2025-11-14T00:16:35Z

Quality Gate passed

Issues
11 New issues
0 Accepted issues

Measures
2 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

dengzhhu653 · 2025-11-14T06:25:16Z

Filed another Jira, HIVE-29316, to fix the similar issue in updatePartitionColumnStatistics.

dengzhhu653 · 2025-11-14T06:26:19Z

Thank you @deniskuzZ for the review!

asf-ci-hive added the tests pending label Oct 30, 2025

dengzhhu653 mentioned this pull request Oct 30, 2025

HIVE-28578: Concurrency issue in updateTableColumnStatistics #5929

Closed

asf-ci-hive added tests unstable tests pending and removed tests pending tests unstable labels Oct 30, 2025

dengzhhu653 force-pushed the HIVE-28578-optimistic branch from c26894e to 65ac4a0 Compare October 31, 2025 07:02

dengzhhu653 requested a review from deniskuzZ October 31, 2025 07:05

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending tests passed labels Oct 31, 2025

dengzhhu653 requested review from nrg4878 and saihemanth-cloudera November 4, 2025 14:34

asf-ci-hive added tests pending tests unstable and removed tests passed tests pending tests unstable labels Nov 12, 2025

dengzhhu653 force-pushed the HIVE-28578-optimistic branch from 7472b31 to c729ea1 Compare November 12, 2025 13:35

asf-ci-hive removed the tests unstable label Nov 12, 2025

asf-ci-hive added tests unstable and removed tests pending labels Nov 12, 2025

deniskuzZ reviewed Nov 13, 2025

deniskuzZ reviewed Nov 13, 2025

deniskuzZ approved these changes Nov 13, 2025

deniskuzZ reviewed Nov 13, 2025

dengzhhu653 added 4 commits November 14, 2025 07:01

HIVE-28578: Concurrency issue in updateTableColumnStatistics

fc21f62

fix mariadb if using mysql driver

1739aca

retry policy

ff2e707

debug log and increase the retry

4911fd7

dengzhhu653 force-pushed the HIVE-28578-optimistic branch from c729ea1 to 4911fd7 Compare November 13, 2025 23:01

asf-ci-hive added tests pending and removed tests unstable labels Nov 13, 2025

asf-ci-hive added tests passed and removed tests pending labels Nov 14, 2025

dengzhhu653 merged commit a947925 into apache:master Nov 14, 2025
3 of 4 checks passed

dengzhhu653 deleted the HIVE-28578-optimistic branch November 14, 2025 06:26
