Merge Data Load : APC data for September to November 2025 #1285

@cha801p

Ticket Update: December 19, 2025

Issue: Refresh of APC data for September to November 2025

Resolution: Load the dataset onto the databox

Description:
A data file containing 141 new occurrence records was received from the data provider, and its ingestion ran into multiple issues. For details of the earlier issues, see Issue #1284.

Steps:

  1. A merged DwCA containing both old and new records was downloaded from dwca-imports (databox). However, the UUID column was empty for the new records, indicating that UUIDs had not been generated for them.
  2. To address this, the UUID column was removed, and occurrence.csv was tidied.
  3. The cleaned file was uploaded to collectory-test and pre-ingestion was run.
  4. Both preingestion and load_small_dataset passed successfully. The UUID count was then checked:
INFO  [2025-12-19 01:07:02,659+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: newUuids: 263.0, preservedUuids: 8114.0, orphanedUniqueKeys: 108.0
INFO  [2025-12-19 01:07:02,659+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Percentage UUID change: 3, allowed percentage: 50, override percentage check: false
  5. The new UUID count confirmed successful ingestion of the new records.
  6. SOLR dataset indexing was run to verify the change.
  7. New records were confirmed in Databox: dr8128 on collections-test.
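
The UUID count check in the steps above can also be scripted. The following is a minimal sketch that pulls the three counters out of the `ALAUUIDMintingPipeline` log line quoted in this ticket. The function name `parse_uuid_counts` and the regular expression are illustrative assumptions based only on that quoted line, not part of the pipeline itself.

```python
import re

# Pattern written against the log line quoted in this ticket (an assumption:
# other pipeline versions may format the counters differently).
COUNTS_RE = re.compile(
    r"newUuids: (?P<new>[\d.]+), "
    r"preservedUuids: (?P<preserved>[\d.]+), "
    r"orphanedUniqueKeys: (?P<orphaned>[\d.]+)"
)


def parse_uuid_counts(log_text):
    """Return the minting counters from the first matching line, or None."""
    match = COUNTS_RE.search(log_text)
    if match is None:
        return None
    # The log prints counters as floats ("263.0"), so convert via float first.
    return {name: int(float(value)) for name, value in match.groupdict().items()}


log_line = (
    "INFO  [2025-12-19 01:07:02,659+0000] [main] "
    "au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: "
    "newUuids: 263.0, preservedUuids: 8114.0, orphanedUniqueKeys: 108.0"
)
counts = parse_uuid_counts(log_line)
print(counts)  # {'new': 263, 'preserved': 8114, 'orphaned': 108}
```

A non-zero `new` value is what indicates that UUIDs were minted on this run.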

Next step:
Ingest the same occurrence.csv on production to update the records.
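
Before the production run, it may be worth re-checking that the cleaned file still has no UUID column. The sketch below is one possible way to do that with Python's standard `csv` module. The column name `uuid`, the helper name `summarize_occurrence_csv`, and the sample rows are all hypothetical and should be adjusted to the real occurrence.csv header.

```python
import csv
import io


def summarize_occurrence_csv(handle, uuid_column="uuid"):
    """Report whether a UUID column is present and how many data rows follow the header.

    `uuid_column` is an assumed header name; change it to match the real file.
    """
    reader = csv.reader(handle)
    header = next(reader, [])
    row_count = sum(1 for _ in reader)
    has_uuid = any(name.strip().lower() == uuid_column.lower() for name in header)
    return {"has_uuid_column": has_uuid, "rows": row_count}


# Hypothetical sample standing in for the cleaned occurrence.csv.
sample = io.StringIO(
    "occurrenceID,scientificName\n"
    "occ-1,Acacia dealbata\n"
    "occ-2,Eucalyptus regnans\n"
)
summary = summarize_occurrence_csv(sample)
print(summary)  # {'has_uuid_column': False, 'rows': 2}
```

For the real file, the same function can be called on `open("occurrence.csv", newline="")`.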
