MDEV-37482: Contention on btr_sea::partition::latch; introduce innodb_adaptive_hash_index_cells #4264
static MYSQL_SYSVAR_UINT(adaptive_hash_index_cells, btr_search.n_cells,
  PLUGIN_VAR_RQCMDARG,
  "Number of adaptive hash table cells",
  nullptr, innodb_adaptive_hash_index_cells_update, 0, 0, UINT_MAX, 0);
#endif /* BTR_CUR_HASH_ADAPT */
@iMineLink pointed out that this forms an artificial limitation on really large buffer pools. This would have to be size_t instead of uint.
Furthermore, I’d like to see if we could repurpose innodb_adaptive_hash_index_parts. That is, hard-wire the partitions to 1, and make that parameter control the hash table size. Possibly, it could be feasible to replace the btr_sea::partition::latch with latches that are embedded in the hash array, like we do for buf_pool.page_hash and lock_sys.rec_hash.
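To make the "latches embedded in the hash array" idea concrete, here is a minimal sketch. It assumes std::shared_mutex as a stand-in for the much smaller page_hash_latch, and a hypothetical layout of one latch per 7 chain heads. The names Node, Stripe and StripedHash are illustrative only and are not taken from the InnoDB source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical node type; the real adaptive hash index uses ahi_node.
struct Node {
  uint32_t fold;    // hash value computed for the record
  const void *rec;  // pointer to the record
  Node *next;       // next node in the same bucket chain
};

// One latch followed by the chain heads that it protects.
// std::shared_mutex stands in for page_hash_latch (an assumption).
struct Stripe {
  static constexpr size_t N = 7;
  std::shared_mutex latch;
  std::array<Node*, N> chain{};
};

class StripedHash {
  std::vector<Stripe> stripes_;

public:
  // n_stripes must be nonzero.
  explicit StripedHash(size_t n_stripes) : stripes_(n_stripes) {}

  size_t n_cells() const { return stripes_.size() * Stripe::N; }

  // Insert under an exclusive latch that covers only one stripe.
  void insert(Node *node) {
    const size_t cell = node->fold % n_cells();
    Stripe &s = stripes_[cell / Stripe::N];
    std::unique_lock<std::shared_mutex> g(s.latch);
    Node *&head = s.chain[cell % Stripe::N];
    node->next = head;
    head = node;
  }

  // Look up under a shared latch; other stripes are not blocked at all.
  const void *find(uint32_t fold) {
    const size_t cell = fold % n_cells();
    Stripe &s = stripes_[cell / Stripe::N];
    std::shared_lock<std::shared_mutex> g(s.latch);
    for (const Node *n = s.chain[cell % Stripe::N]; n; n = n->next)
      if (n->fold == fold)
        return n->rec;
    return nullptr;
  }
};
```

With this layout, two threads contend only when their hash values map to the same stripe, rather than to the same partition.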
If the number of partitions becomes hardwired to 1, I believe that having a size_t variable for the number of cells may be less interesting, as CRC32C does not provide enough bits of entropy to distribute the values over more than 2^32-1 cells, making uint wide enough. If a wider hash were implemented (xxHash could be an interesting candidate), it would make more sense to switch to size_t.
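The entropy argument can be illustrated with a small sketch. It assumes that a cell is selected as the hash value modulo the number of cells; the helper names cell_of and reachable_cells are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Cell selection with a 32-bit hash value, as produced by CRC-32C.
// Assumption for this sketch: cell = fold modulo n_cells.
inline uint64_t cell_of(uint32_t fold, uint64_t n_cells) {
  return fold % n_cells;
}

// Upper bound on the number of distinct cells that can ever be selected:
// a 32-bit value can take at most 2^32 distinct values.
inline uint64_t reachable_cells(uint64_t n_cells) {
  return std::min<uint64_t>(n_cells, uint64_t{1} << 32);
}
```

For example, with 2^40 cells the largest index that a 32-bit hash can ever select is 2^32-1, so all cells beyond that would stay empty.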
Right, I did not realize that the uint actually corresponds to the available entropy of CRC-32C. Anything larger than that is definitely useless.
We actually have an implementation of xxHash in the rocksdb submodule. For some reason, its AVX-512 implementation failed to compile in my environment, and I had commented it out:
diff --git a/util/xxhash.h b/util/xxhash.h
index 9846861b7..d38c83ef5 100644
--- a/util/xxhash.h
+++ b/util/xxhash.h
@@ -2745,7 +2745,7 @@ enum XXH_VECTOR_TYPE /* fake enum */ {
#endif
#ifndef XXH_VECTOR /* can be defined on command line */
-# if defined(__AVX512F__)
+# if 0&&defined(__AVX512F__)
# define XXH_VECTOR XXH_AVX512
# elif defined(__AVX2__)
# define XXH_VECTOR XXH_AVX2
It would be interesting to test the relative performance of this, compared to CRC-32C, on a few different processor implementations (one with AVX512 for both, and another comparing the AVX2 implementation to crc32c_3way()). While doing this, we must keep in mind that dtuple_fold() is currently computing a hash piecewise, one field at a time. It looks like XXH64_update() in the above code would support appending any number of bytes to the hash. It looks like we may have more comprehensive cross-ISA SIMD coverage for our crc32 implementations than for xxHash.
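Piecewise hashing, as mentioned above for dtuple_fold(), can be sketched as follows. This is a slow bitwise CRC-32C written only for illustration; the server uses hardware-accelerated implementations. The Field type and the names crc32c_sw and fold_fields are hypothetical, and the real dtuple_fold() handles details (such as NULL fields) that are omitted here.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78).
// The previous CRC is passed in, so that a buffer can be hashed in pieces.
inline uint32_t crc32c_sw(uint32_t crc, const void *buf, size_t size) {
  const unsigned char *p = static_cast<const unsigned char*>(buf);
  crc = ~crc;
  while (size--) {
    crc ^= *p++;
    for (int i = 0; i < 8; i++)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Hypothetical field descriptor for this sketch.
struct Field {
  const void *data;
  size_t len;
};

// Fold a tuple one field at a time, feeding the running CRC into each call.
inline uint32_t fold_fields(const Field *fields, size_t n) {
  uint32_t crc = 0;
  for (size_t i = 0; i < n; i++)
    crc = crc32c_sw(crc, fields[i].data, fields[i].len);
  return crc;
}
```

Hashing the pieces "1234" and "56789" in sequence gives the same value as hashing "123456789" in one call. A streaming interface such as XXH64_update() would be needed to get the same property from xxHash, and the per-call overhead of piecewise hashing is not captured by a benchmark over contiguous buffers.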
The following would avoid the compilation error without disabling the AVX512 implementation:
diff --git a/util/xxhash.h b/util/xxhash.h
index 9846861b7..a5742a157 100644
--- a/util/xxhash.h
+++ b/util/xxhash.h
@@ -3658,7 +3658,7 @@ XXH3_scrambleAcc_avx512(void* XXH_RESTRICT acc, const void* XXH_RESTRICT secret)
}
}
-XXH_FORCE_INLINE XXH_TARGET_AVX512 void
+XXH_TARGET_AVX512 void
XXH3_initCustomSecret_avx512(void* XXH_RESTRICT customSecret, xxh_u64 seed64)
{
XXH_STATIC_ASSERT((XXH_SECRET_DEFAULT_SIZE & 63) == 0);
This patch is apparently only needed with CMAKE_BUILD_TYPE=Debug, not with CMAKE_BUILD_TYPE=RelWithDebInfo.
The upstream source code https://github.com/Cyan4973/xxHash is more up to date and includes a benchmark:
make -C tests/bench clean benchHash_avx512
tests/bench/benchHash_avx512
Compared to the plain benchHash, the benchHash_avx512 appears to speed up xxh3 and XXH128 only. I understood that there is a 64-bit and a 128-bit variant of XXH3, which the benchmark is referring to with the names xxh3 and XXH128.
The interface to the XXH32 function seems to be compatible with crc32 functions. I added the crc32_avx512.cc from https://github.com/dr-m/crc32_simd to the directory and patched the code to make use of it:
diff --git a/tests/bench/Makefile b/tests/bench/Makefile
index ec5d56f..974a2f7 100644
--- a/tests/bench/Makefile
+++ b/tests/bench/Makefile
@@ -36,7 +36,7 @@ CXXFLAGS ?= -O3
LDFLAGS += $(MOREFLAGS)
-OBJ_LIST = main.o bhDisplay.o benchHash.o benchfn.o timefn.o
+OBJ_LIST = main.o bhDisplay.o benchHash.o benchfn.o timefn.o crc32_avx512.o
default: benchHash
diff --git a/tests/bench/hashes.h b/tests/bench/hashes.h
index 39f6e0f..d4f3c8f 100644
--- a/tests/bench/hashes.h
+++ b/tests/bench/hashes.h
@@ -66,10 +66,12 @@
#define XXH_INLINE_ALL
#include "xxhash.h"
+unsigned crc32c_refl_vpclmulqdq(unsigned crc, const void *buf, size_t size);
+
size_t XXH32_wrapper(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload)
{
(void)dst; (void)dstCapacity; (void)customPayload;
- return (size_t) XXH32(src, srcSize, 0);
+ return (size_t) crc32c_refl_vpclmulqdq(0, src, srcSize);
}
I also worked around a warning from GCC 15.2.0 elsewhere:
diff --git a/tests/test_alias.c b/tests/test_alias.c
index af1065f..0ed4925 100644
--- a/tests/test_alias.c
+++ b/tests/test_alias.c
@@ -7,7 +7,7 @@
int main() {
// it seems this has to be exactly 24 bytes.
union {
- char x[24];
+ char x[24] __attribute__((nonstring));
// force 8-byte alignment without making
// aliasable with uint64_t.
void *y[3];
The numbers that are output by this single-threaded test seem to be proportional to the execution time. Below is the output from the above patched program for an AMD Ryzen AI 9 HX 370:
=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 74182, 84666, 83672, 89621, 92826, 94608, 95333, 95214, 95421, 95539, 95428, 92181, 89884, 91424, 93322, 74003, 61454, 55054, 54935
XXH32 , 50964, 59345, 66989, 74916, 76056, 76814, 77486, 77244, 77221, 75709, 78064, 77477, 76820, 76725, 77119, 64710, 65586, 67875, 61154
XXH64 , 23776, 25692, 26763, 27317, 27243, 27885, 28019, 27526, 27998, 27915, 26876, 27732, 27498, 27587, 27589, 26237, 26248, 26817, 27648
XXH128 , 47036, 67880, 77207, 84151, 90032, 86994, 90539, 91385, 90459, 93035, 93205, 86592, 85137, 86459, 86074, 65815, 62117, 58866, 52468
Throughput small inputs of fixed size (from 1 to 30 bytes):
xxh3 , 684966002, 660267932, 686242542, 591918279, 593796881, 653788552, 691280352, 693917070, 796122351, 799086322, 606649579, 607106262, 743145838, 770496269, 773394061, 777579358, 538555002, 532533910, 552009941, 548027883, 543334100, 540782406, 531313570, 531370730, 536021827, 535807626, 533730883, 507836101, 529489272, 527372639
XXH32 , 391192227, 394170674, 392500119, 320799639, 325310066, 300092059, 325021742, 325166174, 319339892, 324440142, 320935576, 319710875, 318605694, 320033480, 318768959, 321140626, 266827038, 269546291, 270451691, 271340836, 271642933, 268264601, 270806888, 263607977, 267720899, 268822868, 266933926, 253647371, 269115020, 269411779
XXH64 , 531855115, 500845328, 445383908, 560522481, 466170811, 401734154, 353106295, 465769876, 387646727, 348885765, 313642848, 373218102, 330846547, 292677619, 263810140, 379560978, 331417038, 284102612, 262724111, 314605881, 275802854, 247115565, 227006968, 313150520, 271040908, 250206062, 235322131, 264888019, 239329223, 218043430
XXH128 , 600416903, 598992972, 597492462, 597645528, 600791325, 598841509, 600595595, 602564178, 543290108, 551507908, 559672652, 567830015, 572610884, 573292836, 461124110, 568812453, 441289653, 443379326, 449714793, 445896308, 443586459, 441661817, 452889973, 451724824, 355037442, 454808260, 453913239, 455387883, 445214930, 434933402
benchmarking random size inputs [1-N] :
xxh3 , 536254141, 537572868, 538169107, 530058591, 554132343, 567626107, 574962298, 574064510, 575911003, 512575947, 733187750, 730542641, 717504120, 745609269, 744037097, 743641148, 700682822, 680405902, 658648507, 654970094, 643835364, 633237371, 650883337, 651155217, 645069308, 643558465, 641581970, 633572643, 632410158, 635136736
XXH32 , 357112193, 380404113, 393745929, 279481611, 328882531, 327669975, 326555116, 323169043, 304338634, 296146461, 284067481, 285193023, 261837568, 261916119, 271470126, 286578410, 297919080, 310922451, 317438909, 308153526, 310695050, 237882917, 306093451, 292329079, 304391960, 292037006, 301689552, 294104782, 297568491, 288948386
XXH64 , 586983878, 540011057, 505108609, 515782962, 506867672, 482203869, 458474345, 460857782, 454959882, 444087636, 436830029, 426223145, 414820438, 404221782, 389029293, 392782728, 383694624, 373786265, 372277845, 369071752, 358965895, 355644767, 343765355, 345120575, 332244312, 330058777, 327488924, 324763345, 328342498, 316925302
XXH128 , 600494980, 601024972, 600815707, 709364018, 721656467, 717393849, 728587806, 727638953, 714389847, 640170770, 704909059, 691095633, 677935985, 671790219, 664878820, 661493496, 634991187, 617963404, 614205735, 600351968, 591558571, 582670037, 572910083, 570938552, 561659226, 552558359, 551863000, 543461642, 544238249, 540090042
Latency for small inputs of fixed size :
xxh3 , 232758925, 231240688, 235893537, 218715797, 218702972, 217121209, 215070339, 216124609, 228496964, 227321150, 220489768, 216466240, 225549489, 222822477, 218329570, 228193366, 199722428, 202750981, 207964315, 205000677, 211305233, 204392876, 203269104, 208302695, 207763940, 205726991, 203989265, 192116024, 202467348, 205083400
XXH32 , 141043491, 140301324, 140486628, 98556109, 98245408, 98513733, 98804565, 98542229, 96771917, 96485345, 97056348, 94577797, 99446605, 99849828, 98193147, 105409603, 85836960, 86520583, 85472248, 85250394, 86339994, 85767916, 86837601, 86207302, 86573186, 86054076, 84703667, 85890112, 85643206, 85543224
XXH64 , 192012704, 162931329, 140704020, 186484149, 155708976, 135887600, 119828833, 161083460, 136889778, 118022610, 108682607, 134063193, 119996471, 107955849, 95983286, 134099504, 119074090, 93820245, 96247846, 116644599, 104156252, 95846157, 86348288, 114379316, 105948087, 95571192, 87200480, 101708869, 92938617, 85751257
XXH128 , 236321318, 236073631, 236102127, 208280715, 208310530, 210537461, 207243248, 210688513, 194382849, 193552942, 191795246, 192578612, 193381769, 194508669, 192642081, 194463399, 203166631, 203207549, 194373178, 195655373, 192597790, 200018834, 198876272, 202741660, 200240055, 193788601, 199949428, 196226837, 193538519, 202656164
Latency for small inputs of random size [1-N] :
xxh3 , 235249462, 235093356, 235020196, 230137534, 226608695, 225257538, 223406822, 222481072, 220464520, 220366994, 220493278, 220447660, 221115749, 221203118, 220894785, 222059533, 220315846, 219357725, 218859180, 219266622, 218436838, 217102632, 217388236, 214020849, 216201591, 215447083, 214364303, 215011443, 214583615, 213633500
XXH32 , 138574687, 138200659, 138254096, 127152154, 118652610, 116125087, 113038971, 111430621, 109049625, 107047720, 106949323, 105718789, 104509137, 104077495, 103943496, 103957514, 102501662, 101947719, 101586218, 100532803, 99238931, 98585253, 97934961, 96996536, 96221852, 95664977, 95513748, 95093612, 95705998, 94256848
XXH64 , 191493620, 173300634, 163516791, 167001254, 166128454, 158375539, 152604393, 153758844, 151630496, 147949428, 144272111, 141345000, 138796754, 136163707, 132040948, 133526774, 131023326, 128485376, 128065667, 127330599, 126536183, 124805157, 121684005, 122473701, 119697993, 117956051, 117532869, 116771538, 117506207, 114484195
XXH128 , 235632251, 236305541, 235991633, 228243862, 223842736, 222420190, 219975757, 218645414, 214840004, 210478170, 210057345, 207610543, 205595771, 204385614, 203421740, 202734423, 201489640, 201351797, 202091043, 201638052, 201245985, 201648995, 200984775, 200011769, 199330525, 199704794, 198743103, 198247503, 199099688, 198201541
For the first output, the CRC-32C-disguised-as-XXH32 is reporting 3 to 6 times the numbers that the genuine XXH32 would report. While this easily beats the unaccelerated (?) XXH64, the 64-bit xxh3 seems to be a clear winner.
That is, it could be worth the effort to implement the 64-bit xxh3() as an alternative for rec_fold() and dtuple_fold() in a separate piece of work. However, we must keep in mind that the above benchmark does not at all cover cases where a hash is being computed piecewise, as is the case in dtuple_fold(), where we invoke my_crc32c() on each index field separately. In fact, it is dtuple_fold() that is being executed most of the time. That rec_fold() is tail-calling my_crc32c() on a contiguous buffer only benefits AHI modifications, not key lookups.
5ff7fec is a start of improving this further. It needs to be combined with the previous changes, that is, introducing |
To reduce contention between insert, erase and search, let us mimic commit b08448d (MDEV-20612). That is, btr_sea::partition::insert() and btr_sea::partition::erase() will use a combination of a shared btr_sea::partition::latch and a tiny page_hash_latch that is pushed down to the btr_sea::hash_table::array. An exclusive btr_sea::partition::latch will be used in the final part of btr_search_drop_page_hash_index(), where we must guarantee that all entries will be removed, as well as in operations that affect an entire adaptive hash index partition.
btr_sea::hash_chain: Chain of ahi_node hash buckets.
btr_sea::hash_table: A hash table that includes page_hash_latch interleaved with hash_chain.
page_hash_latch::try_lock(): Attempt to acquire an exclusive latch without waiting.
btr_search_guess_on_hash(): Acquire also the page_hash_latch in order to prevent a concurrent modification of the hash bucket chain that our lookup is traversing.
btr_sea::partition::erase(): Add template<bool ex> for indicating whether an exclusive or a shared btr_sea::partition::latch is being held. If the ex=false operation fails to free the memory, btr_search_update_hash_on_delete() will retry with ex=true.
btr_sea::partition::cleanup_after_erase(): Add an overload for the case where instead of holding an exclusive latch, we hold a shared latch along with a page_hash_latch. When not holding an exclusive latch, we may fail to free the memory, and the caller has to retry with an exclusive latch.
btr_sea::partition::cleanup_after_erase_start(), btr_sea::partition::cleanup_after_erase_finish(): Split from cleanup_after_erase() to reduce the amount of code duplication.
btr_sea::partition::block_mutex: Protect only the linked list of blocks. The spare block will exclusively be updated via Atomic_relaxed::exchange().
btr_sea::partition::rollback_insert(): Free the spare block in the unlikely event that the adaptive hash index has been disabled after our invocation of btr_sea::partition::prepare_insert().
ha_remove_all_nodes_to_page(): Merged to the only caller btr_search_drop_page_hash_index().
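The general pattern behind erase() with template<bool ex> (try under a shared latch first, retry under an exclusive latch on failure) can be sketched as follows. This is not the actual btr_sea::partition code: the Partition class, its members, and the rule for when the shared attempt fails are all hypothetical, and std::shared_mutex and std::mutex stand in for the InnoDB latch types.

```cpp
#include <mutex>
#include <shared_mutex>
#include <unordered_set>

class Partition {
  std::shared_mutex latch_;  // stands in for btr_sea::partition::latch
  std::mutex bucket_latch_;  // stands in for one page_hash_latch
  std::unordered_set<int> keys_;

  // Hypothetical rule for this sketch: removing the last entry would
  // free memory, which only an exclusive latch holder is allowed to do.
  template<bool ex> bool erase_low(int key) {
    if (!ex && keys_.size() == 1 && keys_.count(key))
      return false;  // nothing was changed; caller must retry with ex=true
    keys_.erase(key);
    return true;
  }

public:
  void insert(int key) {
    std::unique_lock<std::shared_mutex> g(latch_);
    keys_.insert(key);
  }

  // ex=true: exclusive latch. ex=false: shared latch plus bucket latch.
  template<bool ex> bool erase(int key) {
    if (ex) {
      std::unique_lock<std::shared_mutex> g(latch_);
      return erase_low<true>(key);
    }
    std::shared_lock<std::shared_mutex> g(latch_);
    std::lock_guard<std::mutex> b(bucket_latch_);
    return erase_low<false>(key);
  }

  // Optimistic attempt first; fall back to the exclusive latch on failure.
  void erase_with_retry(int key) {
    if (!erase<false>(key))
      erase<true>(key);
  }

  bool contains(int key) {
    std::shared_lock<std::shared_mutex> g(latch_);
    std::lock_guard<std::mutex> b(bucket_latch_);
    return keys_.count(key) != 0;
  }
};
```

The point of the design is that the common case holds only a shared latch, so that it does not block operations on other buckets; the exclusive latch is taken only in the rare case where the cheap attempt cannot complete.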
SET GLOBAL innodb_adaptive_hash_index_cells may be executed while the server is running. This parameter will be effectively multiplied by innodb_adaptive_hash_index_parts, because each partition will contain its own hash table. Previously, the number of hash table cells in the InnoDB adaptive hash index depended on the initial innodb_buffer_pool_size and was insufficient for some workloads, leading to excessively long hash bucket chains.
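As a rough illustration of how the two parameters interact, the following sketch computes the total number of cells and the average hash bucket chain length. It assumes that records are spread uniformly over all cells; the function names are hypothetical.

```cpp
#include <cstdint>

// Total number of hash cells: each partition has its own hash table.
inline uint64_t total_cells(uint64_t cells_per_part, uint64_t parts) {
  return cells_per_part * parts;
}

// Average chain length for n_records indexed records,
// assuming a uniform distribution over all cells.
inline double avg_chain_length(uint64_t n_records,
                               uint64_t cells_per_part, uint64_t parts) {
  return double(n_records) / double(total_cells(cells_per_part, parts));
}
```

For instance, 16000 indexed records with 1000 cells in each of 8 partitions would give an average chain length of 2; doubling the number of cells would halve it.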
xtrabackup_backup_func(): Invoke btr_search_sys_create(), because innodb_shutdown() assumes that it will have been called.
srv_boot(): Invoke btr_search_sys_create(). This fixes assertion failures in the test innodb.temporary_table.
btr_sea::create(): Do not invoke enable().
buf_pool_t::create(): Instead of invoking btr_sea::create(), invoke btr_sea::enable() when needed.
btr_search_build_page_hash_index(): Downgrade to shared part.latch before starting to insert records into the adaptive hash index. In multi-batch operation, preserve the last fr[] value in order to ensure the correct operation when buf_block_t::LEFT_SIDE is not set.
btr_search_drop_page_hash_index(): Avoid a redundant condition in case we are holding exclusive part.latch.
ssux_lock_impl::wr_rd_downgrade(): Downgrade an X latch to S.
srw_lock_debug::wr_rd_downgrade(), srw_lock_impl::wr_rd_downgrade(): Downgrade from exclusive to shared. This operation is unavailable if _WIN32 or SUX_LOCK_GENERIC is defined.
# if defined _WIN32 || defined SUX_LOCK_GENERIC
# else
void srw_lock_debug::wr_rd_downgrade
Please take a look if this deadlock issue is related to this branch.
commit id: origin/MDEV-37482 e0a4a1dc3cfa262266f38871869027c8359e9e4f
# 2025-09-23T08:33:35 [965079] Thread 4 (Thread 0x7f9abfdf2640 (LWP 997119)):
# 2025-09-23T08:33:35 [965079] #0 0x00007f9aecd0fc9b in sched_yield () at ../sysdeps/unix/syscall-template.S:120
# 2025-09-23T08:33:35 [965079] #1 0x0000556f151d4ee2 in __gthread_yield () at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:693
# 2025-09-23T08:33:35 [965079] #2 std::this_thread::yield () at /usr/include/c++/11/bits/std_thread.h:329
# 2025-09-23T08:33:35 [965079] #3 purge_sys_t::wait_FTS (this=this@entry=0x556f16a4f140 <purge_sys>, also_sys=also_sys@entry=false) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1080
# 2025-09-23T08:33:35 [965079] #4 0x0000556f151d7352 in purge_sys_t::close_and_reopen (this=this@entry=0x556f16a4f140 <purge_sys>, id=<optimized out>, thd=thd@entry=0x556f17edf958, mdl=mdl@entry=0x7f9abfdf19a8) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1188
# 2025-09-23T08:33:35 [965079] #5 0x0000556f151dad5e in trx_purge_attach_undo_recs (thd=thd@entry=0x556f17edf958, n_work_items=n_work_items@entry=0x7f9abfdf1ac8) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1270
# 2025-09-23T08:33:35 [965079] #6 0x0000556f151db560 in trx_purge (n_tasks=<optimized out>, n_tasks@entry=4, history_size=375) at /data/Server/MDEV-37482A/storage/innobase/trx/trx0purge.cc:1388
# 2025-09-23T08:33:35 [965079] #7 0x0000556f151c47ee in purge_coordinator_state::do_purge (this=this@entry=0x556f16a4e3a0 <purge_state>) at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:1423
# 2025-09-23T08:33:35 [965079] #8 0x0000556f151c3e8c in purge_coordinator_callback () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:1507
# 2025-09-23T08:33:35 [965079] #9 0x0000556f153d26b8 in tpool::task_group::execute (this=0x556f16a4e1c0 <purge_coordinator_task_group>, t=t@entry=0x556f16a4e120 <purge_coordinator_task>) at /data/Server/MDEV-37482A/tpool/task_group.cc:73
# 2025-09-23T08:33:35 [965079] #10 0x0000556f153d2a8b in tpool::task::execute (this=0x556f16a4e120 <purge_coordinator_task>) at /data/Server/MDEV-37482A/tpool/task.cc:32
# 2025-09-23T08:33:35 [965079] #11 0x0000556f153cefbd in tpool::thread_pool_generic::worker_main (this=0x556f17b236b0, thread_var=0x556f17b23b20) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:529
# 2025-09-23T08:33:35 [965079] #12 0x0000556f153cf215 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:74
# 2025-09-23T08:33:35 [965079] #13 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
# 2025-09-23T08:33:35 [965079] #14 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
# 2025-09-23T08:33:35 [965079] #15 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
# 2025-09-23T08:33:35 [965079] #16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
# 2025-09-23T08:33:35 [965079] #17 0x00007f9aed015253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
# 2025-09-23T08:33:35 [965079] #18 0x00007f9aecc9bac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
# 2025-09-23T08:33:35 [965079] #19 0x00007f9aecd2d850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Core dump is present on pluto: /data/results/1758612370/001355
ut_ad(lk < WRITER);
u_unlock();
}
void wr_rd_downgrade() noexcept { wr_u_downgrade(); u_rd_downgrade(); }
Please check if this crash is related:
# 2025-09-23T07:53:22 [3655283] | #0 0x0000000070000002 in syscall_traced ()
# 2025-09-23T07:53:22 [3655283] | #1 0x00007f9cc4608525 in _raw_syscall () at /home/ubuntu/rr/src/preload/raw_syscall.S:120
# 2025-09-23T07:53:22 [3655283] | #2 0x00007f9cc4601949 in traced_raw_syscall (call=0x7f9c8ca13fa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:350
# 2025-09-23T07:53:22 [3655283] | #3 0x00007f9cc46059f7 in sys_futex (call=<optimized out>) at /home/ubuntu/rr/src/preload/syscallbuf.c:2040
# 2025-09-23T07:53:22 [3655283] | #4 syscall_hook_internal (call=0x7f9c8ca13fa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4097
# 2025-09-23T07:53:22 [3655283] | #5 syscall_hook (call=0x7f9c8ca13fa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4274
# 2025-09-23T07:53:22 [3655283] | #6 0x00007f9cc4601353 in _syscall_hook_trampoline () at /home/ubuntu/rr/src/preload/syscall_hook.S:308
# 2025-09-23T07:53:22 [3655283] | #7 0x00007f9cc46013bd in __morestack () at /home/ubuntu/rr/src/preload/syscall_hook.S:443
# 2025-09-23T07:53:22 [3655283] | #8 0x00007f9cc46013c4 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/ubuntu/rr/src/preload/syscall_hook.S:457
# 2025-09-23T07:53:22 [3655283] | #9 0x00007f9cc4188893 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
# 2025-09-23T07:53:22 [3655283] | #10 0x000055be6fc3168d in srw_mutex_impl<false>::wait (lk=37, this=0x55be70b93690 <dict_sys+80>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:252
# 2025-09-23T07:53:22 [3655283] | #11 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x55be70b93690 <dict_sys+80>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:313
# 2025-09-23T07:53:22 [3655283] | #12 0x000055be6fc2f618 in srw_mutex_impl<false>::wr_lock (this=0x55be70b93690 <dict_sys+80>) at /data/Server/MDEV-37482A/storage/innobase/include/srw_lock.h:162
# 2025-09-23T07:53:22 [3655283] | #13 srw_lock_debug::have_rd (this=this@entry=0x55be70b93680 <dict_sys+64>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:732
# 2025-09-23T07:53:22 [3655283] | #14 0x000055be6fc2fb23 in srw_lock_debug::have_any (this=this@entry=0x55be70b93680 <dict_sys+64>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:755
# 2025-09-23T07:53:22 [3655283] | #15 0x000055be6fc31473 in srw_lock_debug::rd_lock (this=this@entry=0x55be70b93680 <dict_sys+64>) at /data/Server/MDEV-37482A/storage/innobase/sync/srw_lock.cc:708
# 2025-09-23T07:53:22 [3655283] | #16 0x000055be6fd2d391 in dict_sys_t::freeze (this=0x55be70b93640 <dict_sys>) at /data/Server/MDEV-37482A/storage/innobase/include/dict0dict.h:1460
# 2025-09-23T07:53:22 [3655283] | #17 dict_table_open_on_name (table_name=table_name@entry=0x7f9cb80a1ea0 "test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", dict_locked=dict_locked@entry=false, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /data/Server/MDEV-37482A/storage/innobase/dict/dict0dict.cc:1032
# 2025-09-23T07:53:22 [3655283] | #18 0x000055be6f9ce3e0 in ha_innobase::open_dict_table (norm_name=norm_name@entry=0x7f9cb80a1ea0 "test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", is_partition=is_partition@entry=true, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_FK_NOKEY) at /data/Server/MDEV-37482A/storage/innobase/handler/ha_innodb.cc:6191
# 2025-09-23T07:53:22 [3655283] | #19 0x000055be6f9eef99 in ha_innobase::open (this=0x7f9c39be7660, name=0x7f9cb80a21c0 "./test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1") at /data/Server/MDEV-37482A/storage/innobase/handler/ha_innodb.cc:5896
# 2025-09-23T07:53:22 [3655283] | #20 0x000055be6f660042 in handler::ha_open (this=0x7f9c39be7660, table_arg=<optimized out>, name=name@entry=0x7f9cb80a21c0 "./test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", mode=2, test_if_locked=1042, mem_root=mem_root@entry=0x0, partitions_to_open=0x0) at /data/Server/MDEV-37482A/sql/handler.cc:3673
# 2025-09-23T07:53:22 [3655283] | #21 0x000055be6f95e7b3 in ha_partition::open_read_partitions (this=this@entry=0x7f9c39be6510, name_buff=name_buff@entry=0x7f9cb80a21c0 "./test/table10_innodb_key_pk_parts_2_int_autoinc#P#p1", name_buff_size=name_buff_size@entry=513) at /data/Server/MDEV-37482A/sql/ha_partition.cc:8964
# 2025-09-23T07:53:22 [3655283] | #22 0x000055be6f95f1e4 in ha_partition::open (this=0x7f9c39be6510, name=0x7f9c6c2cecf8 "./test/table10_innodb_key_pk_parts_2_int_autoinc", mode=<optimized out>, test_if_locked=18) at /data/Server/MDEV-37482A/sql/ha_partition.cc:3940
# 2025-09-23T07:53:22 [3655283] | #23 0x000055be6f660042 in handler::ha_open (this=0x7f9c39be6510, table_arg=table_arg@entry=0x7f9c6bd27fd8, name=0x7f9c6c2cecf8 "./test/table10_innodb_key_pk_parts_2_int_autoinc", mode=2, test_if_locked=test_if_locked@entry=18, mem_root=mem_root@entry=0x0, partitions_to_open=0x0) at /data/Server/MDEV-37482A/sql/handler.cc:3673
# 2025-09-23T07:53:22 [3655283] | #24 0x000055be6f444c23 in open_table_from_share (thd=thd@entry=0x7f9c7c001448, share=share@entry=0x7f9c6c2ce630, alias=alias@entry=0x7f9c456838b8, db_stat=db_stat@entry=33, prgflag=prgflag@entry=8, ha_open_flags=18, outparam=<optimized out>, is_create_table=<optimized out>, partitions_to_open=<optimized out>) at /data/Server/MDEV-37482A/sql/table.cc:4683
# 2025-09-23T07:53:22 [3655283] | #25 0x000055be6f2706a3 in open_table (thd=thd@entry=0x7f9c7c001448, table_list=table_list@entry=0x7f9c45683870, ot_ctx=ot_ctx@entry=0x7f9cb80a2880) at /data/Server/MDEV-37482A/sql/sql_base.cc:2310
# 2025-09-23T07:53:22 [3655283] | #26 0x000055be6f27189f in open_and_process_table (thd=thd@entry=0x7f9c7c001448, tables=tables@entry=0x7f9c45683870, counter=counter@entry=0x7f9cb80a291c, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f9cb80a2ac0, has_prelocking_list=has_prelocking_list@entry=false, ot_ctx=0x7f9cb80a2880) at /data/Server/MDEV-37482A/sql/sql_base.cc:4210
# 2025-09-23T07:53:22 [3655283] | #27 0x000055be6f272a52 in open_tables (thd=thd@entry=0x7f9c7c001448, options=..., start=start@entry=0x7f9cb80a2908, counter=counter@entry=0x7f9cb80a291c, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f9cb80a2ac0) at /data/Server/MDEV-37482A/sql/sql_base.cc:4731
# 2025-09-23T07:53:22 [3655283] | #28 0x000055be6f27315f in open_and_lock_tables (thd=thd@entry=0x7f9c7c001448, options=..., tables=<optimized out>, tables@entry=0x7f9c7c018448, derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f9cb80a2ac0) at /data/Server/MDEV-37482A/sql/sql_base.cc:5718
# 2025-09-23T07:53:22 [3655283] | #29 0x000055be6f2b7121 in open_and_lock_tables (flags=0, derived=true, tables=0x7f9c7c018448, thd=0x7f9c7c001448) at /data/Server/MDEV-37482A/sql/sql_base.h:537
# 2025-09-23T07:53:22 [3655283] | #30 mysql_insert (thd=thd@entry=0x7f9c7c001448, table_list=0x7f9c7c018448, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /data/Server/MDEV-37482A/sql/sql_insert.cc:787
# 2025-09-23T07:53:22 [3655283] | #31 0x000055be6f2fd614 in mysql_execute_command (thd=thd@entry=0x7f9c7c001448, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/MDEV-37482A/sql/sql_parse.cc:4480
# 2025-09-23T07:53:22 [3655283] | #32 0x000055be6f302957 in mysql_parse (thd=thd@entry=0x7f9c7c001448, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7f9cb80a3330) at /data/Server/MDEV-37482A/sql/sql_parse.cc:7905
# 2025-09-23T07:53:22 [3655283] | #33 0x000055be6f304eec in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f9c7c001448, packet=packet@entry=0x7f9c7c00b809 " INSERT INTO `table1_innodb` ( `col_decimal_key` ) VALUES ( 'good' ) /* E_R Thread1 QNO 383 CON_ID 15 */ ", packet_length=packet_length@entry=106, blocking=blocking@entry=true) at /data/Server/MDEV-37482A/sql/sql_parse.cc:1903
# 2025-09-23T07:53:22 [3655283] | #34 0x000055be6f306ec3 in do_command (thd=thd@entry=0x7f9c7c001448, blocking=blocking@entry=true) at /data/Server/MDEV-37482A/sql/sql_parse.cc:1416
# 2025-09-23T07:53:22 [3655283] | #35 0x000055be6f48baed in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55be73855f08, put_in_cache=put_in_cache@entry=true) at /data/Server/MDEV-37482A/sql/sql_connect.cc:1415
# 2025-09-23T07:53:22 [3655283] | #36 0x000055be6f48bd2d in handle_one_connection (arg=0x55be73855f08) at /data/Server/MDEV-37482A/sql/sql_connect.cc:1327
# 2025-09-23T07:53:22 [3655283] | #37 0x00007f9cc40feac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
# 2025-09-23T07:53:22 [3655283] | #38 0x00007f9cc418fa04 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
RR trace is present on pluto: /data/results/1758612370/000425
buf_dblwr.init();
srv_thread_pool_init();
trx_pool_init();
btr_search_sys_create();
Please check if this shutdown failure is related:
commit: origin/MDEV-37482 e0a4a1dc3cfa262266f38871869027c8359e9e4f
#0 syscall_traced ()
#1 _raw_syscall () at /home/ubuntu/rr/src/preload/raw_syscall.S:120
#2 traced_raw_syscall (call=) at /home/ubuntu/rr/src/preload/syscallbuf.c:350
#3 sys_futex (call=<optimized out>) at /home/ubuntu/rr/src/preload/syscallbuf.c:2012
#4 syscall_hook_internal (call=) at /home/ubuntu/rr/src/preload/syscallbuf.c:4097
#5 syscall_hook (call=) at /home/ubuntu/rr/src/preload/syscallbuf.c:4274
#6 _syscall_hook_trampoline () at /home/ubuntu/rr/src/preload/syscall_hook.S:308
#7 __morestack () at /home/ubuntu/rr/src/preload/syscall_hook.S:443
#8 _syscall_hook_trampoline_48_3d_00_f0_ff_ff () at /home/ubuntu/rr/src/preload/syscall_hook.S:462
#9 futex_wait (private=0, expected=2, futex_word=<LOCK_timer+40>) at ../sysdeps/nptl/futex-internal.h:146
#10 __GI___lll_lock_wait (futex=futex@entry=<LOCK_timer+40>, private=0) at ./nptl/lowlevellock.c:49
#11 ___pthread_mutex_lock (mutex=<LOCK_timer+40>) at ./nptl/pthread_mutex_lock.c:145
#12 safe_mutex_lock (mp=mp@entry=<LOCK_timer>, my_flags=my_flags@entry=0, file=file@entry="/data/Server/MDEV-37482A/mysys/thr_timer.c", line=line@entry=228) at /data/Server/MDEV-37482A/mysys/thr_mutex.c:286
#13 inline_mysql_mutex_lock (src_line=228, src_file="/data/Server/MDEV-37482A/mysys/thr_timer.c", that=<LOCK_timer>) at /data/Server/MDEV-37482A/include/mysql/psi/mysql_thread.h:750
#14 thr_timer_end (timer_data=) at /data/Server/MDEV-37482A/mysys/thr_timer.c:228
#15 tpool::thread_pool_generic::timer_generic::disarm (this=) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:372
#16 tpool::thread_pool_generic::~thread_pool_generic (this=, __in_chrg=<optimized out>) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:911
#17 tpool::thread_pool_generic::~thread_pool_generic (this=, __in_chrg=<optimized out>) at /data/Server/MDEV-37482A/tpool/tpool_generic.cc:928
#18 srv_thread_pool_end () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:554
#19 srv_free () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0srv.cc:592
#20 innodb_shutdown () at /data/Server/MDEV-37482A/storage/innobase/srv/srv0start.cc:2130
#21 innobase_end () at /data/Server/MDEV-37482A/storage/innobase/handler/ha_innodb.cc:4386
#22 ha_finalize_handlerton (plugin_=) at /data/Server/MDEV-37482A/sql/handler.cc:601
#23 plugin_deinitialize (plugin=, ref_check=ref_check@entry=true) at /data/Server/MDEV-37482A/sql/sql_plugin.cc:1274
#24 reap_plugins () at /data/Server/MDEV-37482A/sql/sql_plugin.cc:1345
#25 plugin_shutdown () at /data/Server/MDEV-37482A/sql/sql_plugin.cc:2086
#26 clean_up (print_message=print_message@entry=true) at /data/Server/MDEV-37482A/sql/mysqld.cc:2012
#27 mysqld_main (argc=<optimized out>, argv=<optimized out>) at /data/Server/MDEV-37482A/sql/mysqld.cc:6186
#28 main (argc=<optimized out>, argv=<optimized out>) at /data/Server/MDEV-37482A/sql/main.cc:34
RR trace is present on pluto: /data/results/1758612370/002733
Description
The number of hash table cells in the InnoDB adaptive hash index was fixed based on the initial innodb_buffer_pool_size and was insufficient for some workloads, leading to excessively long hash bucket chains.
Furthermore, btr_sea::partition::insert() and btr_sea::partition::erase() operations will be optimized to prefer a combination of a shared latch and a page_hash_lock that is pushed down to the hash table. In this way, these operations can run concurrently with each other as well as with searches on other parts of the hash table of the same btr_sea::partition.
Release Notes
We introduce the parameter innodb_adaptive_hash_index_cells that can be configured with SET GLOBAL. The specified value will be effectively multiplied by innodb_adaptive_hash_index_parts, because each partition will contain its own hash table.
How can this PR be tested?
Basing the PR against the correct MariaDB version
main branch.
PR quality check