@dr-m dr-m commented Aug 27, 2025

  • The Jira issue number for this PR is: MDEV-37482

Description

The number of hash table cells in the InnoDB adaptive hash index depended on the initial innodb_buffer_pool_size and was insufficient for some workloads, leading to excessively long hash bucket chains.

Furthermore, btr_sea::partition::insert() and btr_sea::partition::erase() operations will be optimized to prefer a combination of a shared latch and a page_hash_latch that is pushed down to the hash table. In this way, these operations can run concurrently with each other as well as with searches on other parts of the hash table of the same btr_sea::partition.

Release Notes

We introduce the parameter innodb_adaptive_hash_index_cells that can be configured with SET GLOBAL. The specified value will be effectively multiplied by innodb_adaptive_hash_index_parts, because each partition will contain its own hash table.

How can this PR be tested?

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m dr-m requested a review from vlad-lesin August 27, 2025 11:49
@dr-m dr-m self-assigned this Aug 27, 2025

Comment on lines 3769 to 3773
static MYSQL_SYSVAR_UINT(adaptive_hash_index_cells, btr_search.n_cells,
PLUGIN_VAR_RQCMDARG,
"Number of adaptive hash table cells",
nullptr, innodb_adaptive_hash_index_cells_update, 0, 0, UINT_MAX, 0);
#endif /* BTR_CUR_HASH_ADAPT */
Contributor Author

@iMineLink pointed out that this forms an artificial limitation on really large buffer pools. This would have to be size_t instead of uint.

Furthermore, I’d like to see if we could repurpose innodb_adaptive_hash_index_parts. That is, hard-wire the partitions to 1, and make that parameter control the hash table size. Possibly, it could be feasible to replace the btr_sea::partition::latch with latches that are embedded in the hash array, as we do for buf_pool.page_hash and lock_sys.rec_hash.

Contributor

If the number of partitions becomes hardwired to 1, I believe that having a size_t variable for the number of cells may be less interesting, because CRC-32C does not provide enough bits of entropy to distribute the values over more than 2^32-1 cells, which makes uint wide enough. If a wider hash were implemented (xxHash could be an interesting candidate), it would make more sense to switch to size_t.
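The entropy argument can be illustrated with a small sketch. The helpers cell_index() and reachable_cells() are hypothetical names made up for this illustration; they are not InnoDB functions, and the modulo mapping is an assumption about how a hash value would be turned into a cell number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not InnoDB code): map a 32-bit hash value,
// such as a CRC-32C, to one of n_cells hash table cells.
inline std::size_t cell_index(std::uint32_t hash, std::size_t n_cells)
{
  assert(n_cells > 0);
  return static_cast<std::size_t>(hash) % n_cells;
}

// Upper bound on the number of distinct cells that a 32-bit hash can
// ever select: there are only 2^32 distinct hash values, no matter
// how many cells the table has.
inline std::uint64_t reachable_cells(std::uint64_t n_cells)
{
  constexpr std::uint64_t hash_values = std::uint64_t{1} << 32;
  return n_cells < hash_values ? n_cells : hash_values;
}
```

With a table of 2^40 cells, reachable_cells() still reports 2^32, which is why a cell count wider than 32 bits would only pay off together with a wider hash function.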

Contributor Author

Right, I did not realize that the uint actually corresponds to the available entropy of CRC-32C. Anything larger than that definitely is useless.

We actually have an implementation of xxHash in the rocksdb submodule. For some reason, its AVX-512 implementation failed to compile in my environment, and I had commented it out:

diff --git a/util/xxhash.h b/util/xxhash.h
index 9846861b7..d38c83ef5 100644
--- a/util/xxhash.h
+++ b/util/xxhash.h
@@ -2745,7 +2745,7 @@ enum XXH_VECTOR_TYPE /* fake enum */ {
 #endif
 
 #ifndef XXH_VECTOR    /* can be defined on command line */
-#  if defined(__AVX512F__)
+#  if 0&&defined(__AVX512F__)
 #    define XXH_VECTOR XXH_AVX512
 #  elif defined(__AVX2__)
 #    define XXH_VECTOR XXH_AVX2

It would be interesting to test the relative performance of this, compared to CRC-32C, on a few different processor implementations (one with AVX512 for both, and another comparing the AVX2 implementation to crc32c_3way()). While doing this, we must keep in mind that dtuple_fold() is currently computing a hash piecewise, one field at a time. It looks like XXH64_update() in the above code would support appending any number of bytes to the hash. It looks like we may have more comprehensive cross-ISA SIMD coverage for our crc32 implementations than for xxHash.

Contributor Author

The following would avoid the compilation error without disabling the AVX512 implementation:

diff --git a/util/xxhash.h b/util/xxhash.h
index 9846861b7..a5742a157 100644
--- a/util/xxhash.h
+++ b/util/xxhash.h
@@ -3658,7 +3658,7 @@ XXH3_scrambleAcc_avx512(void* XXH_RESTRICT acc, const void* XXH_RESTRICT secret)
     }
 }
 
-XXH_FORCE_INLINE XXH_TARGET_AVX512 void
+XXH_TARGET_AVX512 void
 XXH3_initCustomSecret_avx512(void* XXH_RESTRICT customSecret, xxh_u64 seed64)
 {
     XXH_STATIC_ASSERT((XXH_SECRET_DEFAULT_SIZE & 63) == 0);

This patch is apparently only needed with CMAKE_BUILD_TYPE=Debug, not with CMAKE_BUILD_TYPE=RelWithDebInfo.
The upstream source code https://github.com/Cyan4973/xxHash is more up to date and includes a benchmark:

make -C tests/bench clean benchHash_avx512
tests/bench/benchHash_avx512

Compared to the plain benchHash, the benchHash_avx512 appears to speed up xxh3 and XXH128 only. I understood that there is a 64-bit and a 128-bit variant of XXH3, which the benchmark is referring to with the names xxh3 and XXH128.

The interface to the XXH32 function seems to be compatible with crc32 functions. I added the crc32_avx512.cc from https://github.com/dr-m/crc32_simd to the directory and patched the code to make use of it:

diff --git a/tests/bench/Makefile b/tests/bench/Makefile
index ec5d56f..974a2f7 100644
--- a/tests/bench/Makefile
+++ b/tests/bench/Makefile
@@ -36,7 +36,7 @@ CXXFLAGS ?= -O3
 LDFLAGS  += $(MOREFLAGS)
 
 
-OBJ_LIST  = main.o bhDisplay.o benchHash.o benchfn.o timefn.o
+OBJ_LIST  = main.o bhDisplay.o benchHash.o benchfn.o timefn.o crc32_avx512.o
 
 
 default: benchHash
diff --git a/tests/bench/hashes.h b/tests/bench/hashes.h
index 39f6e0f..d4f3c8f 100644
--- a/tests/bench/hashes.h
+++ b/tests/bench/hashes.h
@@ -66,10 +66,12 @@
 #define XXH_INLINE_ALL
 #include "xxhash.h"
 
+unsigned crc32c_refl_vpclmulqdq(unsigned crc, const void *buf, size_t size);
+
 size_t XXH32_wrapper(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload)
 {
     (void)dst; (void)dstCapacity; (void)customPayload;
-    return (size_t) XXH32(src, srcSize, 0);
+    return (size_t) crc32c_refl_vpclmulqdq(0, src, srcSize);
 }
 

I also worked around a warning from GCC 15.2.0 elsewhere:

diff --git a/tests/test_alias.c b/tests/test_alias.c
index af1065f..0ed4925 100644
--- a/tests/test_alias.c
+++ b/tests/test_alias.c
@@ -7,7 +7,7 @@
 int main() {
 	// it seems this has to be exactly 24 bytes.
 	union {
-		char x[24];
+		char x[24] __attribute__((nonstring));
 		// force 8-byte alignment without making
 		// aliasable with uint64_t.
 		void *y[3];

The numbers that are output by this single-threaded test seem to be proportional to the execution time. Below is the output from the above patched program for an AMD Ryzen AI 9 HX 370:

 ===  benchmarking 4 hash functions  === 
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27) 
xxh3   , 74182, 84666, 83672, 89621, 92826, 94608, 95333, 95214, 95421, 95539, 95428, 92181, 89884, 91424, 93322, 74003, 61454, 55054, 54935
XXH32  , 50964, 59345, 66989, 74916, 76056, 76814, 77486, 77244, 77221, 75709, 78064, 77477, 76820, 76725, 77119, 64710, 65586, 67875, 61154
XXH64  , 23776, 25692, 26763, 27317, 27243, 27885, 28019, 27526, 27998, 27915, 26876, 27732, 27498, 27587, 27589, 26237, 26248, 26817, 27648
XXH128 , 47036, 67880, 77207, 84151, 90032, 86994, 90539, 91385, 90459, 93035, 93205, 86592, 85137, 86459, 86074, 65815, 62117, 58866, 52468
Throughput small inputs of fixed size (from 1 to 30 bytes): 
xxh3   , 684966002, 660267932, 686242542, 591918279, 593796881, 653788552, 691280352, 693917070, 796122351, 799086322, 606649579, 607106262, 743145838, 770496269, 773394061, 777579358, 538555002, 532533910, 552009941, 548027883, 543334100, 540782406, 531313570, 531370730, 536021827, 535807626, 533730883, 507836101, 529489272, 527372639
XXH32  , 391192227, 394170674, 392500119, 320799639, 325310066, 300092059, 325021742, 325166174, 319339892, 324440142, 320935576, 319710875, 318605694, 320033480, 318768959, 321140626, 266827038, 269546291, 270451691, 271340836, 271642933, 268264601, 270806888, 263607977, 267720899, 268822868, 266933926, 253647371, 269115020, 269411779
XXH64  , 531855115, 500845328, 445383908, 560522481, 466170811, 401734154, 353106295, 465769876, 387646727, 348885765, 313642848, 373218102, 330846547, 292677619, 263810140, 379560978, 331417038, 284102612, 262724111, 314605881, 275802854, 247115565, 227006968, 313150520, 271040908, 250206062, 235322131, 264888019, 239329223, 218043430
XXH128 , 600416903, 598992972, 597492462, 597645528, 600791325, 598841509, 600595595, 602564178, 543290108, 551507908, 559672652, 567830015, 572610884, 573292836, 461124110, 568812453, 441289653, 443379326, 449714793, 445896308, 443586459, 441661817, 452889973, 451724824, 355037442, 454808260, 453913239, 455387883, 445214930, 434933402
benchmarking random size inputs [1-N] : 
xxh3   , 536254141, 537572868, 538169107, 530058591, 554132343, 567626107, 574962298, 574064510, 575911003, 512575947, 733187750, 730542641, 717504120, 745609269, 744037097, 743641148, 700682822, 680405902, 658648507, 654970094, 643835364, 633237371, 650883337, 651155217, 645069308, 643558465, 641581970, 633572643, 632410158, 635136736
XXH32  , 357112193, 380404113, 393745929, 279481611, 328882531, 327669975, 326555116, 323169043, 304338634, 296146461, 284067481, 285193023, 261837568, 261916119, 271470126, 286578410, 297919080, 310922451, 317438909, 308153526, 310695050, 237882917, 306093451, 292329079, 304391960, 292037006, 301689552, 294104782, 297568491, 288948386
XXH64  , 586983878, 540011057, 505108609, 515782962, 506867672, 482203869, 458474345, 460857782, 454959882, 444087636, 436830029, 426223145, 414820438, 404221782, 389029293, 392782728, 383694624, 373786265, 372277845, 369071752, 358965895, 355644767, 343765355, 345120575, 332244312, 330058777, 327488924, 324763345, 328342498, 316925302
XXH128 , 600494980, 601024972, 600815707, 709364018, 721656467, 717393849, 728587806, 727638953, 714389847, 640170770, 704909059, 691095633, 677935985, 671790219, 664878820, 661493496, 634991187, 617963404, 614205735, 600351968, 591558571, 582670037, 572910083, 570938552, 561659226, 552558359, 551863000, 543461642, 544238249, 540090042
Latency for small inputs of fixed size : 
xxh3   , 232758925, 231240688, 235893537, 218715797, 218702972, 217121209, 215070339, 216124609, 228496964, 227321150, 220489768, 216466240, 225549489, 222822477, 218329570, 228193366, 199722428, 202750981, 207964315, 205000677, 211305233, 204392876, 203269104, 208302695, 207763940, 205726991, 203989265, 192116024, 202467348, 205083400
XXH32  , 141043491, 140301324, 140486628,  98556109,  98245408,  98513733,  98804565,  98542229,  96771917,  96485345,  97056348,  94577797,  99446605,  99849828,  98193147, 105409603,  85836960,  86520583,  85472248,  85250394,  86339994,  85767916,  86837601,  86207302,  86573186,  86054076,  84703667,  85890112,  85643206,  85543224
XXH64  , 192012704, 162931329, 140704020, 186484149, 155708976, 135887600, 119828833, 161083460, 136889778, 118022610, 108682607, 134063193, 119996471, 107955849,  95983286, 134099504, 119074090,  93820245,  96247846, 116644599, 104156252,  95846157,  86348288, 114379316, 105948087,  95571192,  87200480, 101708869,  92938617,  85751257
XXH128 , 236321318, 236073631, 236102127, 208280715, 208310530, 210537461, 207243248, 210688513, 194382849, 193552942, 191795246, 192578612, 193381769, 194508669, 192642081, 194463399, 203166631, 203207549, 194373178, 195655373, 192597790, 200018834, 198876272, 202741660, 200240055, 193788601, 199949428, 196226837, 193538519, 202656164
Latency for small inputs of random size [1-N] : 
xxh3   , 235249462, 235093356, 235020196, 230137534, 226608695, 225257538, 223406822, 222481072, 220464520, 220366994, 220493278, 220447660, 221115749, 221203118, 220894785, 222059533, 220315846, 219357725, 218859180, 219266622, 218436838, 217102632, 217388236, 214020849, 216201591, 215447083, 214364303, 215011443, 214583615, 213633500
XXH32  , 138574687, 138200659, 138254096, 127152154, 118652610, 116125087, 113038971, 111430621, 109049625, 107047720, 106949323, 105718789, 104509137, 104077495, 103943496, 103957514, 102501662, 101947719, 101586218, 100532803,  99238931,  98585253,  97934961,  96996536,  96221852,  95664977,  95513748,  95093612,  95705998,  94256848
XXH64  , 191493620, 173300634, 163516791, 167001254, 166128454, 158375539, 152604393, 153758844, 151630496, 147949428, 144272111, 141345000, 138796754, 136163707, 132040948, 133526774, 131023326, 128485376, 128065667, 127330599, 126536183, 124805157, 121684005, 122473701, 119697993, 117956051, 117532869, 116771538, 117506207, 114484195
XXH128 , 235632251, 236305541, 235991633, 228243862, 223842736, 222420190, 219975757, 218645414, 214840004, 210478170, 210057345, 207610543, 205595771, 204385614, 203421740, 202734423, 201489640, 201351797, 202091043, 201638052, 201245985, 201648995, 200984775, 200011769, 199330525, 199704794, 198743103, 198247503, 199099688, 198201541

For the first output, the CRC-32C-disguised-as-XXH32 is reporting 3 to 6 times the numbers that the genuine XXH32 would report. While this easily beats the unaccelerated (?) XXH64, the 64-bit xxh3 seems to be a clear winner.

That is, it could be worth the effort to implement the 64-bit xxh3() as an alternative for rec_fold() and dtuple_fold() in a separate piece of work. However, we must keep in mind that the above benchmark does not at all cover cases where a hash is computed piecewise, as is the case in dtuple_fold(), where we invoke my_crc32c() on each index field separately. In fact, it is dtuple_fold() that is being executed most of the time. The fact that rec_fold() tail-calls my_crc32c() on a contiguous buffer only benefits AHI modifications, not key lookups.
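To make the piecewise computation concrete, here is a minimal sketch. It uses a slow bitwise CRC-32C written for this example (crc32c_sw() is not my_crc32c(), and fold_fields() is not dtuple_fold()); the struct field and the way the running value is chained are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise (slow) CRC-32C using the reflected Castagnoli polynomial.
// Passing the previous return value as crc continues the checksum.
inline std::uint32_t crc32c_sw(std::uint32_t crc, const void *buf,
                               std::size_t size)
{
  const unsigned char *p = static_cast<const unsigned char*>(buf);
  crc = ~crc;
  while (size--)
  {
    crc ^= *p++;
    for (int bit = 0; bit < 8; bit++)
      crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1U)));
  }
  return ~crc;
}

// Hypothetical stand-in for one index field of a search tuple.
struct field
{
  const void *data;
  std::size_t len;
};

// Piecewise hashing: one call per field, feeding the running value
// into the next call, instead of one call on a contiguous buffer.
inline std::uint32_t fold_fields(const field *fields, std::size_t n_fields)
{
  std::uint32_t crc = 0;
  for (std::size_t i = 0; i < n_fields; i++)
    crc = crc32c_sw(crc, fields[i].data, fields[i].len);
  return crc;
}
```

The piecewise result equals the one-shot result over the concatenated bytes, but each field costs a separate call, and that per-call overhead is what a benchmark on contiguous buffers does not measure.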

@dr-m dr-m marked this pull request as draft September 9, 2025 13:58

dr-m commented Sep 9, 2025

5ff7fec is a start of improving this further. It needs to be combined with the previous changes, that is, introducing innodb_adaptive_hash_index_cells. I realize that we cannot easily replace btr_sea::partition::latch. What we can do is to make the hash tables much like lock_sys.rec_hash, that is, use a combination of a shared btr_sea::partition::latch and an individual rw-lock inside the btr_sea::partition::table array. This should allow us to use mostly a shared btr_sea::partition::latch in places where we currently hold an exclusive one.
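The locking scheme described above can be sketched as follows. This is a simplified model written for illustration, assuming C++17: partition_sketch is a hypothetical class, std::shared_mutex stands in for btr_sea::partition::latch, and a std::mutex per cell stands in for the rw-lock embedded in the hash array. It is not the InnoDB implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <forward_list>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Simplified model, not the InnoDB data structures.
class partition_sketch
{
  struct cell
  {
    std::mutex latch;                        // per-cell latch
    std::forward_list<std::uint32_t> chain;  // hash bucket chain
  };

  std::shared_mutex latch_;  // partition-wide latch
  std::vector<cell> cells_;

  cell &cell_for(std::uint32_t fold) { return cells_[fold % cells_.size()]; }

public:
  explicit partition_sketch(std::size_t n_cells) : cells_(n_cells)
  {
    assert(n_cells > 0);
  }

  // Shared partition latch + cell latch: inserts into different cells
  // do not block each other.
  void insert(std::uint32_t fold)
  {
    std::shared_lock<std::shared_mutex> part(latch_);
    cell &c = cell_for(fold);
    std::lock_guard<std::mutex> guard(c.latch);
    c.chain.push_front(fold);
  }

  bool erase(std::uint32_t fold)
  {
    std::shared_lock<std::shared_mutex> part(latch_);
    cell &c = cell_for(fold);
    std::lock_guard<std::mutex> guard(c.latch);
    for (auto prev = c.chain.before_begin(), it = c.chain.begin();
         it != c.chain.end(); prev = it++)
      if (*it == fold)
      {
        c.chain.erase_after(prev);
        return true;
      }
    return false;
  }

  bool contains(std::uint32_t fold)
  {
    std::shared_lock<std::shared_mutex> part(latch_);
    cell &c = cell_for(fold);
    std::lock_guard<std::mutex> guard(c.latch);
    for (std::uint32_t f : c.chain)
      if (f == fold)
        return true;
    return false;
  }

  // Operations on the whole partition take the exclusive latch,
  // which excludes every holder of the shared latch.
  void clear()
  {
    std::unique_lock<std::shared_mutex> part(latch_);
    for (cell &c : cells_)
      c.chain.clear();
  }
};
```

The design choice is that the common operations only contend when they hit the same cell, while the rare whole-partition operations still have a single latch that excludes everything else.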

To reduce contention between insert, erase and search, let us mimic
commit b08448d (MDEV-20612).
That is, btr_sea::partition::insert() and btr_sea::partition::erase()
will use a combination of a shared btr_sea::partition::latch and a
tiny page_hash_latch that is pushed down to the btr_sea::hash_table::array.

An exclusive btr_sea::partition::latch will be used in the final part of
btr_search_drop_page_hash_index(), where we must guarantee that all
entries will be removed, as well as in operations that affect an entire
adaptive hash index partition.

btr_sea::hash_chain: Chain of ahi_node hash buckets.

btr_sea::hash_table: A hash table that includes page_hash_latch
interleaved with hash_chain.

page_hash_latch::try_lock(): Attempt to acquire an exclusive latch
without waiting.

btr_search_guess_on_hash(): Acquire also the page_hash_latch in order
to prevent a concurrent modification of the hash bucket chain that our
lookup is traversing.

btr_sea::partition::erase(): Add template<bool ex> for indicating whether
an exclusive or a shared btr_sea::partition::latch is being held.
If the ex=false operation fails to free the memory,
btr_search_update_hash_on_delete() will retry with ex=true.

btr_sea::partition::cleanup_after_erase(): Add an overload for the case
where instead of holding an exclusive latch, we hold a shared latch
along with a page_hash_latch. When not holding an exclusive latch,
we may fail to free the memory, and the caller has to retry with an
exclusive latch.
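The retry pattern for erase can be sketched as below, assuming C++17. Everything here is hypothetical: toy_partition, its contended flag (which simply forces the shared attempt to fail) and erase_with_retry() are made up for illustration, and the real conditions under which the memory cannot be freed are not modelled.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Toy model, not InnoDB code.
struct toy_partition
{
  std::shared_mutex latch;   // stands in for btr_sea::partition::latch
  bool contended = false;    // hypothetical: forces the shared attempt to fail
  std::atomic<int> shared_attempts{0};
  std::atomic<int> exclusive_attempts{0};

  // ex=true: the caller holds latch exclusively and the erase always
  // completes. ex=false: the caller holds latch in shared mode and
  // the erase may report failure.
  template<bool ex> bool erase()
  {
    if constexpr (ex)
    {
      exclusive_attempts++;
      return true;
    }
    else
    {
      shared_attempts++;
      return !contended;
    }
  }
};

// Try the cheap path first; fall back to the exclusive latch only
// when the shared attempt could not complete.
inline void erase_with_retry(toy_partition &part)
{
  {
    std::shared_lock<std::shared_mutex> shared(part.latch);
    if (part.erase<false>())
      return;
  }
  std::unique_lock<std::shared_mutex> exclusive(part.latch);
  const bool done = part.erase<true>();
  assert(done);
  (void) done;
}
```

Note that the shared latch is released before the exclusive one is requested, so the state may change in between; a real implementation has to revalidate what it is erasing after reacquiring the latch.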

btr_sea::partition::cleanup_after_erase_start(),
btr_sea::partition::cleanup_after_erase_finish(): Split from
cleanup_after_erase() to reduce the amount of code duplication.

btr_sea::partition::block_mutex: Protect only the linked list of blocks.
The spare block will exclusively be updated via Atomic_relaxed::exchange().

btr_sea::partition::rollback_insert(): Free the spare block in the unlikely
event that the adaptive hash index has been disabled after our invocation
of btr_sea::partition::prepare_insert().
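A spare block that is only ever updated by an atomic exchange could look like the sketch below. The names block and spare_slot are hypothetical. The commit message names Atomic_relaxed::exchange(); this standalone sketch uses std::atomic with acquire-release ordering instead, so that the example is safe without any other synchronization.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a buffer block.
struct block
{
  int id;
};

// A single-slot cache whose pointer is only ever swapped atomically,
// so no mutex is needed to hand a block over between threads.
class spare_slot
{
  std::atomic<block*> spare{nullptr};

public:
  // Take ownership of the spare block, if any. The slot is left empty.
  block *take() noexcept
  {
    return spare.exchange(nullptr, std::memory_order_acq_rel);
  }

  // Store b as the new spare and return the block that was there
  // before (possibly nullptr); the caller owns the returned block.
  block *put(block *b) noexcept
  {
    return spare.exchange(b, std::memory_order_acq_rel);
  }
};
```

Because every access is an exchange, exactly one thread receives any given pointer, which is what allows a rollback path to free the spare block without racing against a concurrent insert.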

ha_remove_all_nodes_to_page(): Merged to the only caller
btr_search_drop_page_hash_index().

SET GLOBAL innodb_adaptive_hash_index_cells may be executed
while the server is running. This parameter will be effectively
multiplied by innodb_adaptive_hash_index_parts, because each partition will
contain its own hash table.

Previously, the number of hash table cells in the InnoDB adaptive hash index
depended on the initial innodb_buffer_pool_size and was insufficient
for some workloads, leading to excessively long hash bucket chains.

@dr-m dr-m changed the title from "MDEV-37482: Introduce innodb_adaptive_hash_index_cells" to "MDEV-37482: Contention on btr_sea::partition::latch; introduce innodb_adaptive_hash_index_cells" Sep 17, 2025
@dr-m dr-m marked this pull request as ready for review September 17, 2025 07:56

ha_insert_for_fold(): Remove. Invoke btr_sea::partition::insert() directly.
xtrabackup_backup_func(): Invoke btr_search_sys_create(), because
innodb_shutdown() assumes that it will have been called.

srv_boot(): Invoke btr_search_sys_create(). This fixes assertion failures
in the test innodb.temporary_table.

btr_sea::create(): Do not invoke enable().

buf_pool_t::create(): Instead of invoking btr_sea::create(),
invoke btr_sea::enable() when needed.

btr_search_build_page_hash_index(): Downgrade to shared part.latch
before starting to insert records into the adaptive hash index.
In multi-batch operation, preserve the last fr[] value in order to
ensure the correct operation when buf_block_t::LEFT_SIDE is not set.

btr_search_drop_page_hash_index(): Avoid a redundant condition in case
we are holding exclusive part.latch.

ssux_lock_impl::wr_rd_downgrade(): Downgrade an X latch to S.

srw_lock_debug::wr_rd_downgrade(), srw_lock_impl::wr_rd_downgrade():
Downgrade from exclusive to shared. This operation is unavailable
if _WIN32 or SUX_LOCK_GENERIC is defined.
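
Downgrading an exclusive latch to a shared one without releasing it in between can be sketched with a single atomic word. The class rw_word_sketch is a hypothetical, non-blocking toy made up for this illustration; it is not srw_lock_impl or ssux_lock_impl, which additionally handle waiting and the update (U) mode.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy lock word: either the WRITER flag or a count of readers.
class rw_word_sketch
{
  static constexpr std::uint32_t WRITER = 1U << 31;
  std::atomic<std::uint32_t> word{0};

public:
  bool wr_lock_try() noexcept
  {
    std::uint32_t expected = 0;
    return word.compare_exchange_strong(expected, WRITER,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
  }

  bool rd_lock_try() noexcept
  {
    std::uint32_t w = word.load(std::memory_order_relaxed);
    while (!(w & WRITER))
      if (word.compare_exchange_weak(w, w + 1,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed))
        return true;
    return false;
  }

  // The caller holds the exclusive latch and becomes the only reader.
  // No other thread can have changed the word in the meantime, because
  // both try functions fail while WRITER is set.
  void wr_rd_downgrade() noexcept
  {
    assert(word.load(std::memory_order_relaxed) == WRITER);
    word.store(1, std::memory_order_release);
  }

  void rd_unlock() noexcept { word.fetch_sub(1, std::memory_order_release); }
  void wr_unlock() noexcept { word.store(0, std::memory_order_release); }
};
```

The point of a downgrade is that no other writer can slip in between giving up exclusive access and starting to read, which releasing and reacquiring the latch could not guarantee.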
Let the user specify innodb_adaptive_hash_index_cells directly,
without invoking the inaccurate function ut_find_prime().

This (mostly) reverts commit 656daca.
The root cause of the observed crash was likely fixed by
commit 8aafa20.

# if defined _WIN32 || defined SUX_LOCK_GENERIC
# else
void srw_lock_debug::wr_rd_downgrade

Please check whether this deadlock issue is related to this branch.
commit id: origin/MDEV-37482 e0a4a1dc3cfa262266f38871869027c8359e9e4f

# 2025-09-23T08:33:35 [965079] Thread 4 (Thread 0x7f9abfdf2640 (LWP 997119)):
# 2025-09-23T08:33:35 [965079] #0  0x00007f9aecd0fc9b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
# 2025-09-23T08:33:35 [965079] #1  0x0000556f151d4ee2 in __gthread_yield () at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:693
# 2025-09-23T08:33:35 [965079] #2  std::this_thread::yield () at /usr/include/c++/11/bits/std_thread.h:329
# 2025-09-23T08:33:35 [965079] #3  purge_sys_t::wait_FTS (this=this@entry=0x556f16a4f140 <purge_sys>, also_sys=also_sys@entry=false) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1080
# 2025-09-23T08:33:35 [965079] #4  0x0000556f151d7352 in purge_sys_t::close_and_reopen (this=this@entry=0x556f16a4f140 <purge_sys>, id=<optimized out>, thd=thd@entry=0x556f17edf958, mdl=mdl@entry=0x7f9abfdf19a8) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1188
# 2025-09-23T08:33:35 [965079] #5  0x0000556f151dad5e in trx_purge_attach_undo_recs (thd=thd@entry=0x556f17edf958, n_work_items=n_work_items@entry=0x7f9abfdf1ac8) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1270
# 2025-09-23T08:33:35 [965079] #6  0x0000556f151db560 in trx_purge (n_tasks=<optimized out>, n_tasks@entry=4, history_size=375) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1388
# 2025-09-23T08:33:35 [965079] #7  0x0000556f151c47ee in purge_coordinator_state::do_purge (this=this@entry=0x556f16a4e3a0 <purge_state>) at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:1423
# 2025-09-23T08:33:35 [965079] #8  0x0000556f151c3e8c in purge_coordinator_callback () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:1507
# 2025-09-23T08:33:35 [965079] #9  0x0000556f153d26b8 in tpool::task_group::execute (this=0x556f16a4e1c0 <purge_coordinator_task_group>, t=t@entry=0x556f16a4e120 <purge_coordinator_task>) at /data/Server/MDEV-37482A/tpool/task_group.cc:73
# 2025-09-23T08:33:35 [965079] #10 0x0000556f153d2a8b in tpool::task::execute (this=0x556f16a4e120 <purge_coordinator_task>) at /data/Server/MDEV-37482A/tpool/task.cc:32
# 2025-09-23T08:33:35 [965079] #11 0x0000556f153cefbd in tpool::thread_pool_generic::worker_main (this=0x556f17b236b0, thread_var=0x556f17b23b20) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:529
# 2025-09-23T08:33:35 [965079] #12 0x0000556f153cf215 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:74
# 2025-09-23T08:33:35 [965079] #13 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
# 2025-09-23T08:33:35 [965079] #14 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
# 2025-09-23T08:33:35 [965079] #15 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
# 2025-09-23T08:33:35 [965079] #16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
# 2025-09-23T08:33:35 [965079] #17 0x00007f9aed015253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
# 2025-09-23T08:33:35 [965079] #18 0x00007f9aecc9bac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
# 2025-09-23T08:33:35 [965079] #19 0x00007f9aecd2d850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Core dump is present on pluto:-
/data/results/1758612370/001355

ut_ad(lk < WRITER);
u_unlock();
}
void wr_rd_downgrade() noexcept { wr_u_downgrade(); u_rd_downgrade(); }

Please check if this crash is related:-

# 2025-09-23T07:53:22 [3655283] | #0  0x0000000070000002 in syscall_traced ()
# 2025-09-23T07:53:22 [3655283] | #1  0x00007f9cc4608525 in _raw_syscall () at /home/ubuntu/rr/src/preload/raw_syscall.S:120
# 2025-09-23T07:53:22 [3655283] | #2  0x00007f9cc4601949 in traced_raw_syscall (call=0x7f9c8ca13fa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:350
# 2025-09-23T07:53:22 [3655283] | #3  0x00007f9cc46059f7 in sys_futex (call=<optimized out>) at /home/ubuntu/rr/src/preload/syscallbuf.c:2040
# 2025-09-23T07:53:22 [3655283] | #4  syscall_hook_internal (call=0x7f9c8ca13fa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4097
# 2025-09-23T07:53:22 [3655283] | #5  syscall_hook (call=0x7f9c8ca13fa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4274
# 2025-09-23T07:53:22 [3655283] | #6  0x00007f9cc4601353 in _syscall_hook_trampoline () at /home/ubuntu/rr/src/preload/syscall_hook.S:308
# 2025-09-23T07:53:22 [3655283] | #7  0x00007f9cc46013bd in __morestack () at /home/ubuntu/rr/src/preload/syscall_hook.S:443
# 2025-09-23T07:53:22 [3655283] | #8  0x00007f9cc46013c4 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/ubuntu/rr/src/preload/syscall_hook.S:457
# 2025-09-23T07:53:22 [3655283] | #9  0x00007f9cc4188893 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
# 2025-09-23T07:53:22 [3655283] | #10 0x000055be6fc3168d in srw_mutex_impl<false>::wait (lk=37, this=0x55be70b93690 <dict_sys+80>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:252
# 2025-09-23T07:53:22 [3655283] | #11 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x55be70b93690 <dict_sys+80>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:313
# 2025-09-23T07:53:22 [3655283] | #12 0x000055be6fc2f618 in srw_mutex_impl<false>::wr_lock (this=0x55be70b93690 <dict_sys+80>) at /data/Server/MDEV-37482A/storage/innobase/include/srw_lock.h:162
# 2025-09-23T07:53:22 [3655283] | #13 srw_lock_debug::have_rd (this=this@entry=0x55be70b93680 <dict_sys+64>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:732
# 2025-09-23T07:53:22 [3655283] | #14 0x000055be6fc2fb23 in srw_lock_debug::have_any (this=this@entry=0x55be70b93680 <dict_sys+64>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:755
# 2025-09-23T07:53:22 [3655283] | #15 0x000055be6fc31473 in srw_lock_debug::rd_lock (this=this@entry=0x55be70b93680 <dict_sys+64>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:708
# 2025-09-23T07:53:22 [3655283] | #16 0x000055be6fd2d391 in dict_sys_t::freeze (this=0x55be70b93640 <dict_sys>) at /data/Server/MDEV-37482A/storage/innobase/include/dict0dict.h:1460
# 2025-09-23T07:53:22 [3655283] | #17 dict_table_open_on_name (table_name=table_name@entry=0x7f9cb80a1ea0 "test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", dict_locked=dict_locked@entry=false, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /data/Server/MDEV-37482A/storage/innobase/dict/dict0dict.cc:1032
# 2025-09-23T07:53:22 [3655283] | #18 0x000055be6f9ce3e0 in ha_innobase::open_dict_table (norm_name=norm_name@entry=0x7f9cb80a1ea0 "test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", is_partition=is_partition@entry=true, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /data/Server/MDEV-37482A/storage/innobase/handler/ha_innodb.cc:6191
# 2025-09-23T07:53:22 [3655283] | #19 0x000055be6f9eef99 in ha_innobase::open (this=0x7f9c39be7660, name=0x7f9cb80a21c0 "./test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1") at /data/Server/MDEV-37482A/storage/innobase/handler/ha_innodb.cc:5896
# 2025-09-23T07:53:22 [3655283] | #20 0x000055be6f660042 in handler::ha_open (this=0x7f9c39be7660, table_arg=<optimized out>, name=name@entry=0x7f9cb80a21c0 "./test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", mode=2, test_if_locked=1042, mem_root=mem_root@entry=0x0, partitions_to_open=0x0) at /data/Server/MDEV-37482A/sql/handler.cc:3673
# 2025-09-23T07:53:22 [3655283] | #21 0x000055be6f95e7b3 in ha_partition::open_read_partitions (this=this@entry=0x7f9c39be6510, name_buff=name_buff@entry=0x7f9cb80a21c0 "./test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", name_buff_size=name_buff_size@entry=513) at /data/Server/MDEV-37482A/sql/ha_partition.cc:8964
# 2025-09-23T07:53:22 [3655283] | #22 0x000055be6f95f1e4 in ha_partition::open (this=0x7f9c39be6510, name=0x7f9c6c2cecf8 "./test/table10_innodb_key_pk_parts_2_int_autoinc", mode=<optimized out>, test_if_locked=18) at /data/Server/MDEV-37482A/sql/ha_partition.cc:3940
# 2025-09-23T07:53:22 [3655283] | #23 0x000055be6f660042 in handler::ha_open (this=0x7f9c39be6510, table_arg=table_arg@entry=0x7f9c6bd27fd8, name=0x7f9c6c2cecf8 "./test/table10_innodb_key_pk_parts_2_int_autoinc", mode=2, test_if_locked=test_if_locked@entry=18, mem_root=mem_root@entry=0x0, partitions_to_open=0x0) at /data/Server/MDEV-37482A/sql/handler.cc:3673
# 2025-09-23T07:53:22 [3655283] | #24 0x000055be6f444c23 in open_table_from_share (thd=thd@entry=0x7f9c7c001448, share=share@entry=0x7f9c6c2ce630, alias=alias@entry=0x7f9c456838b8, db_stat=db_stat@entry=33, prgflag=prgflag@entry=8, ha_open_flags=18, outparam=<optimized out>, is_create_table=<optimized out>, partitions_to_open=<optimized out>) at /data/Server/MDEV-37482A/sql/table.cc:4683
# 2025-09-23T07:53:22 [3655283] | #25 0x000055be6f2706a3 in open_table (thd=thd@entry=0x7f9c7c001448, table_list=table_list@entry=0x7f9c45683870, ot_ctx=ot_ctx@entry=0x7f9cb80a2880) at /data/Server/MDEV-37482A/sql/sql_base.cc:2310
# 2025-09-23T07:53:22 [3655283] | #26 0x000055be6f27189f in open_and_process_table (thd=thd@entry=0x7f9c7c001448, tables=tables@entry=0x7f9c45683870, counter=counter@entry=0x7f9cb80a291c, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f9cb80a2ac0, has_prelocking_list=has_prelocking_list@entry=false, ot_ctx=0x7f9cb80a2880) at /data/Server/MDEV-37482A/sql/sql_base.cc:4210
# 2025-09-23T07:53:22 [3655283] | #27 0x000055be6f272a52 in open_tables (thd=thd@entry=0x7f9c7c001448, options=..., start=start@entry=0x7f9cb80a2908, counter=counter@entry=0x7f9cb80a291c, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f9cb80a2ac0) at /data/Server/MDEV-37482A/sql/sql_base.cc:4731
# 2025-09-23T07:53:22 [3655283] | #28 0x000055be6f27315f in open_and_lock_tables (thd=thd@entry=0x7f9c7c001448, options=..., tables=<optimized out>, tables@entry=0x7f9c7c018448, derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f9cb80a2ac0) at /data/Server/MDEV-37482A/sql/sql_base.cc:5718
# 2025-09-23T07:53:22 [3655283] | #29 0x000055be6f2b7121 in open_and_lock_tables (flags=0, derived=true, tables=0x7f9c7c018448, thd=0x7f9c7c001448) at /data/Server/MDEV-37482A/sql/sql_base.h:537
# 2025-09-23T07:53:22 [3655283] | #30 mysql_insert (thd=thd@entry=0x7f9c7c001448, table_list=0x7f9c7c018448, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /data/Server/MDEV-37482A/sql/sql_insert.cc:787
# 2025-09-23T07:53:22 [3655283] | #31 0x000055be6f2fd614 in mysql_execute_command (thd=thd@entry=0x7f9c7c001448, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/MDEV-37482A/sql/sql_parse.cc:4480
# 2025-09-23T07:53:22 [3655283] | #32 0x000055be6f302957 in mysql_parse (thd=thd@entry=0x7f9c7c001448, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7f9cb80a3330) at /data/Server/MDEV-37482A/sql/sql_parse.cc:7905
# 2025-09-23T07:53:22 [3655283] | #33 0x000055be6f304eec in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f9c7c001448, packet=packet@entry=0x7f9c7c00b809 " INSERT INTO `table1_innodb` ( `col_decimal_key` ) VALUES ( 'good' )  /* E_R Thread1 QNO 383 CON_ID 15 */ ", packet_length=packet_length@entry=106, blocking=blocking@entry=true) at /data/Server/MDEV-37482A/sql/sql_parse.cc:1903
# 2025-09-23T07:53:22 [3655283] | #34 0x000055be6f306ec3 in do_command (thd=thd@entry=0x7f9c7c001448, blocking=blocking@entry=true) at /data/Server/MDEV-37482A/sql/sql_parse.cc:1416
# 2025-09-23T07:53:22 [3655283] | #35 0x000055be6f48baed in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55be73855f08, put_in_cache=put_in_cache@entry=true) at /data/Server/MDEV-37482A/sql/sql_connect.cc:1415
# 2025-09-23T07:53:22 [3655283] | #36 0x000055be6f48bd2d in handle_one_connection (arg=0x55be73855f08) at /data/Server/MDEV-37482A/sql/sql_connect.cc:1327
# 2025-09-23T07:53:22 [3655283] | #37 0x00007f9cc40feac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
# 2025-09-23T07:53:22 [3655283] | #38 0x00007f9cc418fa04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

An rr trace is available on pluto:
/data/results/1758612370/000425

buf_dblwr.init();
srv_thread_pool_init();
trx_pool_init();
btr_search_sys_create();


Please check whether this shutdown failure is related.
Commit: origin/MDEV-37482 e0a4a1dc3cfa262266f38871869027c8359e9e4f

#0  syscall_traced ()
#1  _raw_syscall () at /home/ubuntu/rr/src/preload/raw_syscall.S:120
#2  traced_raw_syscall (call=) at /home/ubuntu/rr/src/preload/syscallbuf.c:350
#3  sys_futex (call=<optimized out>) at /home/ubuntu/rr/src/preload/syscallbuf.c:2012
#4  syscall_hook_internal (call=) at /home/ubuntu/rr/src/preload/syscallbuf.c:4097
#5  syscall_hook (call=) at /home/ubuntu/rr/src/preload/syscallbuf.c:4274
#6  _syscall_hook_trampoline () at /home/ubuntu/rr/src/preload/syscall_hook.S:308
#7  __morestack () at /home/ubuntu/rr/src/preload/syscall_hook.S:443
#8  _syscall_hook_trampoline_48_3d_00_f0_ff_ff () at /home/ubuntu/rr/src/preload/syscall_hook.S:462
#9  futex_wait (private=0, expected=2, futex_word=<LOCK_timer+40>) at ../sysdeps/nptl/futex-internal.h:146
#10 __GI___lll_lock_wait (futex=futex@entry=<LOCK_timer+40>, private=0) at ./nptl/lowlevellock.c:49
#11 ___pthread_mutex_lock (mutex=<LOCK_timer+40>) at ./nptl/pthread_mutex_lock.c:145
#12 safe_mutex_lock (mp=mp@entry=<LOCK_timer>, my_flags=my_flags@entry=0, file=file@entry="/data/Server/MDEV-37482A/mysys/thr_timer.c", line=line@entry=228) at /data/Server/MDEV-37482A/mysys/thr_mutex.c:286
#13 inline_mysql_mutex_lock (src_line=228, src_file="/data/Server/MDEV-37482A/mysys/thr_timer.c", that=<LOCK_timer>) at /data/Server/MDEV-37482A/include/mysql/psi/mysql_thread.h:750
#14 thr_timer_end (timer_data=) at /data/Server/MDEV-37482A/mysys/thr_timer.c:228
#15 tpool::thread_pool_generic::timer_generic::disarm (this=) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:372
#16 tpool::thread_pool_generic::~thread_pool_generic (this=, __in_chrg=<optimized out>) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:911
#17 tpool::thread_pool_generic::~thread_pool_generic (this=, __in_chrg=<optimized out>) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:928
#18 srv_thread_pool_end () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:554
#19 srv_free () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:592
#20 innodb_shutdown () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0start.cc:2130
#21 innobase_end () at /data/Server/MDEV-37482A/storage/innobase/handler/ha_innodb.cc:4386
#22 ha_finalize_handlerton (plugin_=) at /data/Server/MDEV-37482A/sql/handler.cc:601
#23 plugin_deinitialize (plugin=, ref_check=ref_check@entry=true) at /data/Server/MDEV-37482A/sql/sql_plugin.cc:1274
#24 reap_plugins () at /data/Server/MDEV-37482A/sql/sql_plugin.cc:1345
#25 plugin_shutdown () at /data/Server/MDEV-37482A/sql/sql_plugin.cc:2086
#26 clean_up (print_message=print_message@entry=true) at /data/Server/MDEV-37482A/sql/mysqld.cc:2012
#27 mysqld_main (argc=<optimized out>, argv=<optimized out>) at /data/Server/MDEV-37482A/sql/mysqld.cc:6186
#28 main (argc=<optimized out>, argv=<optimized out>) at /data/Server/MDEV-37482A/sql/main.cc:34


An rr trace is available on pluto: /data/results/1758612370/002733
