-
Notifications
You must be signed in to change notification settings - Fork 6
Open
Labels
Description
Rarely, a dataset will seemingly finish processing and save to postgres (according to the logs) but it's row in postgres is not actually updated. I just saw an example of this checking a dataset for ESS-DIVE. Here is some log output:
20251020-21:13:07: [DEBUG]: Completed running quality suite. [edu.ucsb.nceas.mdqengine.Worker:222]
20251020-21:13:07: [DEBUG]: Searching for sequence id for pid: ess-dive-4c1829de1b8a2ec-20251020T211201461 [edu.ucsb.nceas.mdqengine.Worker:277]
20251020-21:13:07: [DEBUG]: Getting all runs (backward) for suiteId: ess-dive.data.center.suite-1.3.0, metadataId: ess-dive-4c1829de1b8a2ec-20251020T211201461, minDate: 2025-10-01T21:12:01.992Z, 2025-10-31T21:12:01.992Z [edu.ucsb.nceas.mdqengine.model.Run:227]
20251020-21:13:07: [DEBUG]: Getting next run: metadataId: ess-dive-4c1829de1b8a2ec-20251020T211201461, suiteId: ess-dive.data.center.suite-1.3.0 [edu.ucsb.nceas.mdqengine.model.Run:74]
20251020-21:13:07: [DEBUG]: Recursion level: 1 [edu.ucsb.nceas.mdqengine.model.Run:75]
20251020-21:13:07: [DEBUG]: Checking for next backward pid (obsoletes) [edu.ucsb.nceas.mdqengine.model.Run:149]
20251020-21:13:07: [DEBUG]: traversing backward to obsoletes: ess-dive-e9a1984660cb28a-20251020T205156799 [edu.ucsb.nceas.mdqengine.model.Run:158]
...unrelated statements for the version chain recursion...
20251020-21:13:07: [DEBUG]: Checking for next forward pid (obsoletedBy) [edu.ucsb.nceas.mdqengine.model.Run:134]
20251020-21:13:07: [DEBUG]: Reached end of forward (obsoletedBy) chain at pid: ess-dive-4c1829de1b8a2ec-20251020T211201461 [edu.ucsb.nceas.mdqengine.model.Run:145]
20251020-21:13:07: [DEBUG]: Shutting down store [edu.ucsb.nceas.mdqengine.model.Run:246]
20251020-21:13:07: [DEBUG]: Done getting all runs (in DataONE obsolescence chain) for : metadata PID: ess-dive-4c1829de1b8a2ec-20251020T211201461, suite id: ess-dive.data.center.suite-1.3.0 [edu.ucsb.nceas.mdqengine.model.Run:247]
20251020-21:13:07: [DEBUG]: Using found sequenceId: ess-dive-8c3d37b0a3679ee-20251020T182116464 [edu.ucsb.nceas.mdqengine.Worker:302]
20251020-21:13:07: [DEBUG]: Saving to persistent storage: metadata PID: ess-dive-4c1829de1b8a2ec-20251020T211201461, suite id: ess-dive.data.center.suite-1.3.0 [edu.ucsb.nceas.mdqengine.model.Run:272]
20251020-21:13:07: [DEBUG]: Done saving to persistent storage: metadata PID: ess-dive-4c1829de1b8a2ec-20251020T211201461, suite id: ess-dive.data.center.suite-1.3.0 [edu.ucsb.nceas.mdqengine.model.Run:277]
20251020-21:13:07: [DEBUG]: Updating sequenceId to ess-dive-8c3d37b0a3679ee-20251020T182116464 [edu.ucsb.nceas.mdqengine.Worker:313]
20251020-21:13:07: [DEBUG]: Updating runs with new sequence id: ess-dive-8c3d37b0a3679ee-20251020T182116464 [edu.ucsb.nceas.mdqengine.model.Run:269]
20251020-21:13:07: [DEBUG]: Updating modified runs to datastore. [edu.ucsb.nceas.mdqengine.model.Run:510]
based on this, the run should have been updated with the status of success and the run result. However, the db still shows:
metadig=> select results, suite_id, timestamp, status from runs where suite_id = 'ess-dive.data.center.suite-1.3.0' and metadata_id='ess-dive-4c1829de1b8a2ec-20251020T211201461';
results | suite_id | timestamp | status
-------------------------------------------------------------------------------------+----------------------------------+-------------------------------+------------
<?xml version="1.0" encoding="UTF-8"?> +| ess-dive.data.center.suite-1.3.0 | 2025-10-20 21:13:08.037554+00 | processing
<ns0:run xmlns:ns0="https://nceas.ucsb.edu/mdqe/v1"> +| | |
<timestamp>2025-10-20T21:13:08.037Z</timestamp> +| | |
<objectIdentifier>ess-dive-4c1829de1b8a2ec-20251020T211201461</objectIdentifier>+| | |
<suiteId>ess-dive.data.center.suite-1.3.0</suiteId> +| | |
<nodeId>urn:node:ESS_DIVE</nodeId> +| | |
<runStatus>processing</runStatus> +| | |
<errorDescription></errorDescription> +| | |
<sequenceId>ess-dive-8c3d37b0a3679ee-20251020T182116464</sequenceId> +| | |
<runCount>1</runCount> +| | |
</ns0:run> +| | |
| | |
(1 row)
So far I haven't been able to see any rhyme or reason to when this happens. Updating the dataset usually resolves the issue because the newer version runs as expected.