I have 3 Andrenidae nodes in the 'Hymenoptera (recent)' collection in a database named naturkundemuseum_test_2026_04_27 (available here).
@bronwyncombs did some research on this after there were multiple "Lock wait timeout exceeded" messages received:
Andrenidae (0,0) | ID 341182 | created 2015-08-13 | source catalog of life
49 child genus records (all with children)
query family > name, family>ID family>ts created and modified = 3042 results/children
Andrenidae (0,8602) | ID 887847 | created 2026-04-24 | source none
36 child genus records (23 with children)
same query structure = 370 results/children
Andrenidae (1398) | ID 887983 | created 2026-04-22 | source local_db
0 child nodes
same query structure = 1 result
merge 341182 into 887847 = fail 504 timeout
Merge 887847 into 341182 = fail 504 timeout
merge 341182 into 887983 = fail (fast; 500 operational error try starting transaction)
merge 887847 into 887983 = fail (slower; same operational message)
merge 887983 into 887847 = SUCCESS
merge 887847 into 341182? = fail
merge 341182 into 887847? = fail
Error:
{
"type": "invalidResponseCode",
"statusText": [
"Invalid response code 500.",
"\n This error may indicate a misconfiguration or a bug in Specify. Please\n double check your configuration and report this issue.\n ",
"Response:"
],
"responseText": "{\"exception\": \"OperationalError\", \"message\": \"1205\", \"data\": \"Lock wait timeout exceeded; try restarting transaction\", \"traceback\": \"Traceback (most recent call last):\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 89, in _execute\\n return self.cursor.execute(sql, params)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/mysql/base.py\\\", line 75, in execute\\n return self.cursor.execute(query, args)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 206, in execute\\n res = self._query(query)\\n ^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 319, in _query\\n db.query(q)\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/connections.py\\\", line 254, in query\\n _mysql.connection.query(self, query)\\nMySQLdb.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')\\n\\nThe above exception was the direct cause of the following exception:\\n\\nTraceback (most recent call last):\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/core/handlers/base.py\\\", line 197, in _get_response\\n response = wrapped_callback(request, *callback_args, **callback_kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/specifyweb/specify/views.py\\\", line 32, in wrapped\\n return view(request, *args, **kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/views/decorators/http.py\\\", line 43, in inner\\n return func(request, *args, **kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/usr/lib/python3.12/contextlib.py\\\", line 81, in inner\\n return func(*args, **kwds)\\n ^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/specifyweb/backend/trees/views.py\\\", line 63, in wrapper\\n mutation(*args, **kwargs)\\n File \\\"/opt/specify7/specifyweb/backend/trees/views.py\\\", line 292, in merge\\n extras.merge(node, target, request.specify_user_agent)\\n File \\\"/opt/specify7/specifyweb/backend/trees/extras.py\\\", line 335, in merge\\n target = model.objects.select_for_update().get(id=into.id)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 635, in get\\n num = len(clone)\\n ^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 382, in __len__\\n self._fetch_all()\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 1886, in _fetch_all\\n self._result_cache = list(self._iterable_class(self))\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 93, in __iter__\\n results = compiler.execute_sql(\\n ^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/sql/compiler.py\\\", line 1562, in execute_sql\\n cursor.execute(sql, params)\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 67, in execute\\n return self._execute_with_wrappers(\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 80, in _execute_with_wrappers\\n return executor(sql, params, many, context)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 84, in _execute\\n with self.db.wrap_database_errors:\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/utils.py\\\", line 91, in __exit__\\n raise dj_exc_value.with_traceback(traceback) from exc_value\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 89, in _execute\\n return self.cursor.execute(sql, params)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/mysql/base.py\\\", line 75, in execute\\n return self.cursor.execute(query, args)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 206, in execute\\n res = self._query(query)\\n ^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 319, in _query\\n db.query(q)\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/connections.py\\\", line 254, in query\\n _mysql.connection.query(self, query)\\ndjango.db.utils.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')\\n\"}"
}
After adding some logging in a one-off branch (tree-logging), I found the following when merging :
The merge processed 10 children (ids 360761-360769) in ~280 seconds total before the worker was killed. The key finding:
[TIMER] merge: recursive merge child id=341763 -> matched id=887848 took 280.361 seconds
Per-child cost breakdown
Each child takes ~6-7 seconds to reparent, broken down as:
| Step |
Time |
What it does |
select_for_update |
0.001s |
Lock the child row |
open_interval: shift right |
1.4-2.1s |
UPDATE taxon SET nodenumber=nodenumber+1 WHERE nn>198321 — shifts all nodes right to make room |
open_interval: expand containing |
0.4-0.7s |
Update parent's highest child node number |
move_interval |
0.4-0.7s |
Move the child's interval to the new position |
close_interval: reduce containing |
0.4-0.5s |
Update old parent's highest child node number |
close_interval: shift left |
1.4-1.7s |
UPDATE taxon SET nodenumber=nodenumber-1 WHERE nn>199024 — shifts all nodes left to close gap |
set_fullnames |
0.4-0.5s |
17-level self-join UPDATE to rebuild full names |
validate_tree_numbering |
included |
Full table scan to validate tree integrity |
| Total |
~6-7s |
|
The set_fullnames SQL is massive
The SQL query is a 17-level self-join on taxon with 17 joins to taxontreedefitem, building the full name using deeply nested concat_ws and if expressions. It runs once per child.
Root cause: O(N²) complexity
For a merge with N children, the merge does:
- 5 × N bulk UPDATEs on the
taxon table (open_interval ×2, move_interval, close_interval ×2)
- N
set_fullnames queries (each a 17-level self-join)
- N
validate_tree_numbering queries (full table scan)
With 10 children taking 280 seconds, merging a node with 100 children would take ~2800 seconds (~47 minutes).
The actual timeout
The worker was killed by gunicorn's SIGKILL with message Perhaps out of memory? — the merge was consuming too much memory, likely from holding all the select_for_update locks and the large result sets from the 17-level self-joins.
Immediate workarounds
- Increase gunicorn timeout — already
-t 300 in Dockerfile, but in this case the worker was killed by OOM, not timeout
- Increase MySQL
innodb_lock_wait_timeout — SET GLOBAL innodb_lock_wait_timeout = 120;
- Increase nginx proxy_read_timeout — maybe add
proxy_read_timeout 300s; to nginx config?
Long-term fix needed
The merge algorithm needs to be refactored to:
- Batch all child reparenting — change all children's
parentid in a single UPDATE instead of one at a time
- Run interval operations once — shift intervals once for the batch, not per child
- Run
set_fullnames once — update all affected nodes in a single query
- Run validation once — at the end, not after each child
This would reduce the complexity from O(N) to O(1) for the bulk operations.
I have 3
Andrenidaenodes in the 'Hymenoptera (recent)' collection in a database namednaturkundemuseum_test_2026_04_27(available here).@bronwyncombs did some research on this after there were multiple "Lock wait timeout exceeded" messages received:
Error:
{ "type": "invalidResponseCode", "statusText": [ "Invalid response code 500.", "\n This error may indicate a misconfiguration or a bug in Specify. Please\n double check your configuration and report this issue.\n ", "Response:" ], "responseText": "{\"exception\": \"OperationalError\", \"message\": \"1205\", \"data\": \"Lock wait timeout exceeded; try restarting transaction\", \"traceback\": \"Traceback (most recent call last):\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 89, in _execute\\n return self.cursor.execute(sql, params)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/mysql/base.py\\\", line 75, in execute\\n return self.cursor.execute(query, args)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 206, in execute\\n res = self._query(query)\\n ^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 319, in _query\\n db.query(q)\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/connections.py\\\", line 254, in query\\n _mysql.connection.query(self, query)\\nMySQLdb.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')\\n\\nThe above exception was the direct cause of the following exception:\\n\\nTraceback (most recent call last):\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/core/handlers/base.py\\\", line 197, in _get_response\\n response = wrapped_callback(request, *callback_args, **callback_kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/specifyweb/specify/views.py\\\", line 32, in wrapped\\n return view(request, *args, **kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/views/decorators/http.py\\\", line 43, in inner\\n return func(request, *args, **kwargs)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/usr/lib/python3.12/contextlib.py\\\", line 81, in inner\\n return func(*args, **kwds)\\n ^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/specifyweb/backend/trees/views.py\\\", line 63, in wrapper\\n mutation(*args, **kwargs)\\n File \\\"/opt/specify7/specifyweb/backend/trees/views.py\\\", line 292, in merge\\n extras.merge(node, target, request.specify_user_agent)\\n File \\\"/opt/specify7/specifyweb/backend/trees/extras.py\\\", line 335, in merge\\n target = model.objects.select_for_update().get(id=into.id)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 635, in get\\n num = len(clone)\\n ^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 382, in __len__\\n self._fetch_all()\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 1886, in _fetch_all\\n self._result_cache = list(self._iterable_class(self))\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/query.py\\\", line 93, in __iter__\\n results = compiler.execute_sql(\\n ^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/models/sql/compiler.py\\\", line 1562, in execute_sql\\n cursor.execute(sql, params)\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 67, in execute\\n return self._execute_with_wrappers(\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 80, in _execute_with_wrappers\\n return executor(sql, params, many, context)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 84, in _execute\\n with self.db.wrap_database_errors:\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/utils.py\\\", line 91, in __exit__\\n raise dj_exc_value.with_traceback(traceback) from exc_value\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/utils.py\\\", line 89, in _execute\\n return self.cursor.execute(sql, params)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/django/db/backends/mysql/base.py\\\", line 75, in execute\\n return self.cursor.execute(query, args)\\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 206, in execute\\n res = self._query(query)\\n ^^^^^^^^^^^^^^^^^^\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/cursors.py\\\", line 319, in _query\\n db.query(q)\\n File \\\"/opt/specify7/ve/lib/python3.12/site-packages/MySQLdb/connections.py\\\", line 254, in query\\n _mysql.connection.query(self, query)\\ndjango.db.utils.OperationalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')\\n\"}" }After adding some logging in a one-off branch (tree-logging), I found the following when merging :
The merge processed 10 children (ids 360761-360769) in ~280 seconds total before the worker was killed. The key finding:
Per-child cost breakdown
Each child takes ~6-7 seconds to reparent, broken down as:
select_for_updateopen_interval: shift rightUPDATE taxon SET nodenumber=nodenumber+1 WHERE nn>198321— shifts all nodes right to make roomopen_interval: expand containingmove_intervalclose_interval: reduce containingclose_interval: shift leftUPDATE taxon SET nodenumber=nodenumber-1 WHERE nn>199024— shifts all nodes left to close gapset_fullnamesvalidate_tree_numberingThe
set_fullnamesSQL is massiveThe SQL query is a 17-level self-join on
taxonwith 17 joins totaxontreedefitem, building the full name using deeply nestedconcat_wsandifexpressions. It runs once per child.Root cause: O(N²) complexity
For a merge with N children, the merge does:
taxontable (open_interval ×2, move_interval, close_interval ×2)set_fullnamesqueries (each a 17-level self-join)validate_tree_numberingqueries (full table scan)With 10 children taking 280 seconds, merging a node with 100 children would take ~2800 seconds (~47 minutes).
The actual timeout
The worker was killed by gunicorn's
SIGKILLwith messagePerhaps out of memory?— the merge was consuming too much memory, likely from holding all theselect_for_updatelocks and the large result sets from the 17-level self-joins.Immediate workarounds
-t 300in Dockerfile, but in this case the worker was killed by OOM, not timeoutinnodb_lock_wait_timeout—SET GLOBAL innodb_lock_wait_timeout = 120;proxy_read_timeout 300s;to nginx config?Long-term fix needed
The merge algorithm needs to be refactored to:
parentidin a single UPDATE instead of one at a timeset_fullnamesonce — update all affected nodes in a single queryThis would reduce the complexity from O(N) to O(1) for the bulk operations.