- BUG FIX: `multipleOf` validation
  - FIX LINK
  - Due to floating point errors in Python and JSONSchema, `multipleOf` validation has been failing.
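A minimal sketch of the underlying issue and one common remedy, using Python's `decimal` module (illustrative only, not necessarily the fix that was shipped):

```python
from decimal import Decimal

def is_multiple_of(value: float, multiple: float) -> bool:
    # Converting through str() yields Decimal('0.1') rather than the
    # nearest binary float, so the remainder test is exact.
    return Decimal(str(value)) % Decimal(str(multiple)) == 0

# A naive float check fails on values that are not exactly
# representable in binary floating point:
print(0.3 % 0.1 == 0)            # False: the remainder is ~0.0999999...
print(is_multiple_of(0.3, 0.1))  # True
```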
- FEATURES:
  - Streamed JSONSchema: `anyOf` Support
    - Streamed JSONSchemas which include `anyOf` combinations should now be fully supported
    - This allows for full support of Stitch/Singer's `date-time` string fallbacks.
  - Streamed JSONSchema: `allOf` Support
    - Streamed JSONSchemas which include `allOf` combinations should now be fully supported
    - Columns are persisted as normal.
    - This is perceived to be most useful for merging objects and putting in place constraints like `maxLength`, etc.
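For illustration, a Stitch/Singer-style `date-time` fallback typically shows up in a streamed schema as an `anyOf` over a `date-time`-formatted string and a plain string (the property name and exact shape here are an example, not taken from the release itself):

```json
{
  "properties": {
    "created_at": {
      "anyOf": [
        {"type": ["string", "null"], "format": "date-time"},
        {"type": ["string", "null"]}
      ]
    }
  }
}
```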
- BUG FIX: Buffer Flushing at frequent intervals/with small batches
  - FIX LINK
  - Buffer size calculations relied upon some "sophisticated" logic for determining the "size" in memory of a Python object
  - The method used by Singer libraries is to simply use the size of the streamed JSON blob
  - Performance improvement seen due to batches now being far larger and interactions with the remote being far fewer.
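A sketch of the simpler sizing approach, assuming records arrive as parsed JSON (the function name here is illustrative, not the target's actual helper):

```python
import json

def record_size(record: dict) -> int:
    # Size a buffered record by the length of its serialized JSON
    # blob, as Singer libraries do, instead of trying to estimate
    # the in-memory footprint of the Python object graph.
    return len(json.dumps(record))

print(record_size({"id": 1, "name": "example"}))  # 28
```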
- BUG FIX: `NULLABLE` not being implied when a field is missing from the streamed JSONSchema
  - FIX LINK
  - If a field was persisted in remote, but then left out of a subsequent streamed JSONSchema, we would fail
  - In this instance, the field is implied to be `NULL`, but additionally, if values are present for it in the streamed data, we should persist it.
- FEATURES:
  - Performance improvement for upserting data
    - Saw long running queries for some `SELECT COUNT(1) ...` queries
      - Resulting in full table scans
    - These queries are only being used for `is_table_empty`, therefore we can use a more efficient `SELECT EXISTS(...)` query which only needs a single row to be fetched
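The shape of the change can be sketched as a hypothetical helper over a DB-API cursor (demonstrated against SQLite purely because it is self-contained; the SQL shape is the same one PostgreSQL accepts):

```python
import sqlite3

def is_table_empty(cursor, table_name: str) -> bool:
    # SELECT EXISTS only needs to fetch a single row; a
    # SELECT COUNT(1) would force a full scan of large tables.
    cursor.execute('SELECT EXISTS (SELECT 1 FROM "{}")'.format(table_name))
    return not cursor.fetchone()[0]

# Demo against an in-memory database:
cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE users (id INTEGER)")
print(is_table_empty(cur, "users"))  # True
cur.execute("INSERT INTO users VALUES (1)")
print(is_table_empty(cur, "users"))  # False
```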
- FEATURES:
  - Performance improvement for upserting data
    - For large or even reasonably sized tables, trying to upsert the data was prohibitively slow
    - To mitigate this, we now add indexes to allow these upserts to run efficiently
    - This change can be opted out of via the `add_upsert_indexes` config option
    - NOTE: This only affects installations post `0.2.1`, and will not upgrade/migrate existing installations
  - Support for latest PostgreSQL 12.0
    - PostgreSQL recently released 12.0, and we now have testing around it and can confirm that `target-postgres` should function correctly for it!
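Opting out of the upsert index creation mentioned above might look like this in the target's JSON config file (a minimal sketch showing only that option):

```json
{
  "add_upsert_indexes": false
}
```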
- BUG FIX: `STATE` messages being sent at the wrong time
  - FIX LINK
  - `STATE` messages were being output incorrectly for feeds which had many streams outputting at varying rates
- NOTE: The `minor` version bump is not expected to have much effect on folks. This was done to signal the output change from the below bug fix. It is our impression that not many are using this feature yet anyways. Since this was not a `patch` change, we decided to make this a `minor` instead of a `major` change to raise less concern. Thank you for your patience!
- FEATURES:
  - Performance improvement for creating `tmp` tables necessary for uploading data
    - PostgreSQL dialects allow for creating a table identical to a parent table in a single command: `CREATE TABLE <name> (LIKE <parent-name>);`
    - Previously we leveraged our `upsert` helpers to create new tables, which resulted in many calls to remote, of varying complexity.
- BUG FIX: No `STATE` Message Wrapper necessary
  - FIX LINK
  - `STATE` messages are formatted as `{"value": ...}`
  - `target-postgres` emitted the full message
  - The official `singer-target-template` doesn't write out that `value` "wrapper", and just writes the JSON blob contained in it
  - This fix makes `target-postgres` do the same
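In code, the change amounts to unwrapping the message before writing it; a minimal sketch (the function name is illustrative, not the target's actual API):

```python
import json

def emit_state(state_message: dict) -> None:
    # Write only the contents of the "value" wrapper, matching the
    # behaviour of the official singer-target-template.
    print(json.dumps(state_message["value"]))

emit_state({"type": "STATE", "value": {"bookmarks": {"users": 42}}})
# prints: {"bookmarks": {"users": 42}}
```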
- BUG FIX: `canonicalize_identifier` not called on all identifiers persisted to remote
  - FIX LINK
  - Presently, on column splits/name collisions, we add a suffix to an identifier
  - Previously, we did not canonicalize these suffixes
  - While this was not an issue for any targets currently in production, it was an issue for some up-and-coming targets.
  - This fix simply makes sure to call `canonicalize_identifier` before persisting an identifier to remote
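To illustrate the failure mode, suppose a target only supports lowercase identifiers; the suffixed name produced by a column split must itself pass through canonicalization. Both functions below are simplified stand-ins, not the target's actual implementation:

```python
import re

def canonicalize_identifier(name: str) -> str:
    # Example policy: lowercase, and replace unsupported characters
    # with underscores.
    return re.sub(r"[^a-z0-9_]", "_", name.lower())

def split_column_name(name: str, type_suffix: str) -> str:
    # The bug: appending a suffix after canonicalization could leave
    # names like 'created_at__DateTime' un-canonicalized. The fix is
    # to canonicalize the fully suffixed name.
    return canonicalize_identifier(name + "__" + type_suffix)

print(split_column_name("created_at", "DateTime"))  # created_at__datetime
```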
- FEATURES:
  - Root Table Name Canonicalization
    - The `stream` name is used for the value of the root table name in Postgres
    - `stream` names are controlled exclusively by the tap and do not have to meet many standards
    - Previously, only `stream` names which were lowercase, alphanumeric, etc. were supported
    - Now, the `target` can canonicalize the root table name, allowing the input `stream` name to be whatever the `tap` provides.
- Singer-Python: bumped to latest 5.6.1
- Psycopg2: bumped to latest 2.8.2
- FEATURES:
  - `STATE` Message support
    - Emits the message only when all records buffered before the `STATE` message have been persisted to remote.
  - SSL Support for Postgres
    - Added config options for enabling/supporting SSL.
- BUG FIX: `ACTIVATE_VERSION` messages did not flush buffer
  - FIX LINK
  - When we issued an activate version record, we did not flush the buffer after writing the batch. This resulted in more records being written to remote than needed to be.
  - This results in no functionality change, and should not alleviate any known bugs.
  - This should be purely performance related.
- Singer-Python: bumped to latest
- Minor housekeeping:
  - Updated container versions to latest
  - Updated README to reflect new versions of PostgreSQL Server
- BUG FIX: A bug was identified for de-nesting.
  - ISSUE LINK
  - FAILING TESTS LINK
  - FIX LINK
  - Subtables with subtables did not serialize column names correctly
    - The column names ended up having the table names (paths) prepended on them
    - Because the denested table schema and the denested records differed, no information showed up in remote.
  - This bug was ultimately tracked down to the core denesting logic.
  - This will fix failing uploads which had nullable columns in subtables but where no data was seen populating those columns.
    - The broken schema columns will still remain
  - Failing schemas which had non-null columns in subtables will still be broken
    - To fix will require dropping the associated tables, potentially resetting the entire db/schema
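De-nesting flattens nested objects into path-named columns on the parent row; a simplified sketch of that flattening (illustrative only, not the target's actual denesting code):

```python
def flatten(record: dict, parent: tuple = ()) -> dict:
    # Nested object fields become columns named by their full path,
    # joined with '__'.
    flat = {}
    for key, value in record.items():
        path = parent + (key,)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat["__".join(path)] = value
    return flat

print(flatten({"id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9}}}))
# {'id': 1, 'address__city': 'Oslo', 'address__geo__lat': 59.9}
```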
- BUG FIX: A bug was identified for path-to-column serialization.
  - LINK
  - A nullable property which had multiple JSONSchema types
    - ie, something like `[null, string, integer, ...]`
    - Failed to find an appropriate column in remote to persist `None` values to.
  - Found by usage of the Hubspot Tap
- FEATURES:
  - Added the `persist_empty_tables` config option which allows the Target to create empty tables in Remote.
- BUG FIX: A bug was identified in 0.1.3 with stream `key_properties` and canonicalization.
  - LINK
  - Discovered and fixed by @mirelagrigoras
  - If the `key_properties` for a stream changed due to canonicalization, the stream would fail to persist due to:
    - the `persist_csv_rows` `key_properties` values remaining un-canonicalized, and therefore causing issues once serialized into a SQL statement
    - the pre-checks for tables breaking, because no values could be pulled from the schema with the un-canonicalized fields pulled out of the `key_properties`
  - NOTE: the `key_properties` metadata is saved with raw field names.
- SCHEMA_VERSION: 1
  - LINK
  - Initialized a new field in remote table schemas: `schema_version`
  - A migration in `PostgresTarget` handles updating this
- BUG FIX: A bug was identified in 0.1.2 with column type splitting.
  - LINK
  - A schema with a field of type `string` is persisted to remote
    - Later, the same field is of type `date-time`
      - The values for this field will not be placed under a new column, but rather under the original `string` column
  - A schema with a field of type `date-time` is persisted to remote
    - Later, the same field is of type `string`
      - The original `date-time` column will be made `nullable`
      - The values for this field will fail to persist
- FEATURES:
  - Added the `logging_level` config option which uses standard Python Logger Levels to configure more detail about what Target-Postgres is doing
    - Query level logging and timing
    - Table schema changes logging and timing
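For example, enabling the most verbose output might look like this in the target's JSON config file (a minimal sketch showing only this option; `DEBUG` is one of the standard Python logging level names):

```json
{
  "logging_level": "DEBUG"
}
```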