Closed
Ticket Update: December 19, 2025
Issue: Refresh of APC data for September to November 2025
Resolution: Load the dataset onto the databox
Description:
A data file containing 141 new occurrence records was received from the data provider; it encountered multiple issues during ingestion. For details of the previous issues, see Issue #1284.
Steps:
- A DwCA was downloaded from `dwca-imports` (databox), containing a merged DwCA for old and new records. However, the UUID column was empty for new records, indicating that UUIDs were not generated.
- To address this, the UUID column was removed, and `occurrence.csv` was tidied.
- The cleaned file was uploaded to `collectory-test` and pre-ingestion was run.
- Both preingestion and load_small_dataset passed successfully. The UUID count was then checked:
INFO [2025-12-19 01:07:02,659+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: newUuids: 263.0, preservedUuids: 8114.0, orphanedUniqueKeys: 108.0
INFO [2025-12-19 01:07:02,659+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Percentage UUID change: 3, allowed percentage: 50, override percentage check: false
- The new UUID count confirmed successful ingestion of the new records.
- SOLR dataset indexing was run to verify the change.
- New records were confirmed in Databox: dr8128 on collections-test
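
The UUID-column removal in the steps above could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for this ticket: it assumes a comma-delimited file and a header literally named `uuid` (the ticket does not give the exact column name), and the sample rows are made up for illustration.

```python
import csv
import io


def drop_column(src, dst, column):
    """Copy CSV rows from src to dst without `column`.

    Returns the number of data rows in which `column` was empty,
    which is the symptom described in the ticket for new records.
    """
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        raise KeyError(f"column {column!r} not found in header {fieldnames}")

    kept = [name for name in fieldnames if name != column]
    # extrasaction="ignore" silently skips the dropped column when writing.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()

    empty = 0
    for row in reader:
        if not (row[column] or "").strip():
            empty += 1
        writer.writerow(row)
    return empty


# Hypothetical sample: the column name "uuid" and all values are assumptions.
sample = "occurrenceID,uuid,scientificName\nA1,1f0c,Acacia\nA2,,Banksia\n"
out = io.StringIO()
empty_count = drop_column(io.StringIO(sample), out, "uuid")
print(empty_count)
print(out.getvalue())
```

For a real file, the two `StringIO` objects would be replaced by `open("occurrence.csv", newline="")` and an output file opened with `newline=""`.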
Next step:
Ingest the same occurrence.csv on production to update the records.
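
Before repeating the load on production, the UUID minting counters can be pulled out of the pipeline log with a short script. The sketch below parses the two `ALAUUIDMintingPipeline` log lines quoted in this ticket; the regular expressions are written against that exact wording and are an assumption about the log format in general. The final comparison (`change <= allowed`) is also an assumption, since the ticket does not state how the pipeline applies the threshold.

```python
import re

# Patterns match the wording of the log lines quoted in this ticket.
COUNTER = re.compile(
    r"(newUuids|preservedUuids|orphanedUniqueKeys): (\d+(?:\.\d+)?)"
)
CHECK = re.compile(
    r"Percentage UUID change: (\d+), allowed percentage: (\d+)"
)


def summarise(log_text):
    """Return (counters, change, allowed) parsed from the minting log."""
    counters = {name: int(float(value))
                for name, value in COUNTER.findall(log_text)}
    match = CHECK.search(log_text)
    if match is None:
        raise ValueError("no 'Percentage UUID change' line found")
    return counters, int(match.group(1)), int(match.group(2))


log_text = (
    "INFO [2025-12-19 01:07:02,659+0000] [main] "
    "au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: "
    "newUuids: 263.0, preservedUuids: 8114.0, orphanedUniqueKeys: 108.0\n"
    "INFO [2025-12-19 01:07:02,659+0000] [main] "
    "au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: "
    "Percentage UUID change: 3, allowed percentage: 50, "
    "override percentage check: false\n"
)

counters, change, allowed = summarise(log_text)
# Assumed interpretation: the run is acceptable when change <= allowed.
within_limit = change <= allowed
print(counters, change, allowed, within_limit)
```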