
Conversation

@brycekbargar
Collaborator

Using COPY FROM is an order of magnitude faster than row-by-row inserts for bulk loading into postgres. This is the last piece of low-hanging fruit for download performance until async/await is supported and optimized for in a future release. DuckDB has partial support for bulk operations, but it isn't a priority right now, especially with sqlite still supported in this release.
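As a rough sketch of the COPY FROM approach (table and column names here are illustrative, not from this PR), the work splits into building a COPY payload and streaming it through psycopg. The payload builder below uses the text format for simplicity and skips full escaping of tabs/backslashes inside values:

```python
from typing import Iterable, Sequence, Optional

def to_copy_text(rows: Iterable[Sequence[Optional[object]]]) -> bytes:
    """Build a text-format payload for COPY ... FROM STDIN.

    NULLs become \\N; values are tab-separated, one row per line.
    (Real code must also escape tabs, newlines, and backslashes in values.)
    """
    lines = []
    for row in rows:
        lines.append("\t".join("\\N" if v is None else str(v) for v in row))
    return ("\n".join(lines) + "\n").encode()

# With psycopg 3 the payload would be streamed roughly like this
# (hypothetical table name, requires a live connection):
#
# with conn.cursor() as cur:
#     with cur.copy("COPY records (id, data) FROM STDIN") as copy:
#         copy.write(to_copy_text(rows))
```

A single COPY round trip avoids the per-statement overhead that makes executemany-style bulk inserts slow.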

Getting bytes to go directly into the database was a bit of a struggle because postgres expects a "version" byte at the beginning of each record. By doing the byte munging ourselves we skip a whole lot of conversion in ldlite and conversion/logic in psycopg, which would have added points of failure and slowness. I'm going to test that it works across 5C FOLIO data before releasing the change.
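A minimal sketch of the byte munging, assuming the jsonb binary wire format (postgres prefixes each binary jsonb value with a version byte, currently 1, and each field in a binary COPY row is length-prefixed with a big-endian int32); the function name is illustrative:

```python
import json
import struct

JSONB_VERSION = b"\x01"  # leading byte postgres expects on binary jsonb values

def jsonb_field(obj: object) -> bytes:
    """Encode one jsonb field for a binary COPY row.

    Prepends the jsonb version byte to the serialized JSON, then adds the
    int32 length prefix used by the binary COPY format.
    """
    payload = JSONB_VERSION + json.dumps(obj, separators=(",", ":")).encode()
    return struct.pack(">i", len(payload)) + payload
```

Building these bytes directly means the JSON never has to be parsed into Python objects and re-adapted by the driver on the way into the database.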

@brycekbargar brycekbargar merged commit 63e9c88 into library-data-platform:release-v3.2.0 Sep 17, 2025
5 checks passed
brycekbargar added a commit that referenced this pull request Sep 18, 2025
I wasn't super happy with the weird type shenanigans in the last MR #43, but I thought that doing a loads/dumps on all the source records to keep things consistent would be too slow. Then I discovered orjson.Fragment, which means we can just use dumps and treat both srs and non-srs records as bytes. This simplified the signatures and type handling quite a bit.
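The PR itself uses orjson.Fragment; the stdlib sketch below (function names are illustrative) shows the idea it enables: splicing already-serialized record bytes into an output document without the loads/dumps round trip, so both paths can hand around plain bytes:

```python
import json

def embed_roundtrip(record_bytes: bytes, wrapper: dict) -> bytes:
    # The slow approach: parse the record, then re-serialize everything.
    merged = dict(wrapper, record=json.loads(record_bytes))
    return json.dumps(merged, separators=(",", ":")).encode()

def embed_fragment(record_bytes: bytes, wrapper: dict) -> bytes:
    # The fragment-style approach: splice the pre-serialized bytes directly
    # into the wrapper's serialization without ever decoding them.
    head = json.dumps(wrapper, separators=(",", ":")).encode()[:-1]  # drop '}'
    sep = b"," if wrapper else b""
    return head + sep + b'"record":' + record_bytes + b"}"
```

With orjson this is simply `orjson.dumps({..., "record": orjson.Fragment(record_bytes)})`, which keeps one dumps call for both srs and non-srs data.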

Streaming support for SRS was rushed in because non-streaming became even more unstable under Ramsons. There was a (probably small) chance that someone loading source-storage endpoints other than source-records would have broken. This adds more consistency around which endpoints we support with streaming and fixes any accidental breaks.
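One way the "which endpoints support streaming" consistency can be expressed is an explicit allow-list check; the endpoint set and function name below are hypothetical, not ldlite's actual API:

```python
# Hypothetical allow-list: only endpoints known to stream safely qualify.
STREAMING_ENDPOINTS = frozenset({"/source-storage/source-records"})

def supports_streaming(endpoint: str) -> bool:
    """Return True only for endpoints explicitly known to support streaming."""
    return endpoint.rstrip("/") in STREAMING_ENDPOINTS
```

An explicit set makes the supported surface visible in one place, so other source-storage endpoints fall back to the non-streaming path instead of breaking by accident.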
@brycekbargar brycekbargar deleted the pg-copyto branch December 9, 2025 14:14