[BUG] Missing data in targets due to key properties in events #34

@diegobatt

All the Events-based streams have the same problem: when dumped into a target (Postgres in my case), data is missing. I explored the issue, and it does not seem to be due to the data retrieval part, because all the data is in the state.json output by the tap. However, the schema is not properly reproduced in these streams. For instance, in feature_events, the table-keys are:

"visitor_id", "account_id", "server", "remote_ip"

But this is not how the stream really works: we can have more than one row for that combination of values (we usually do, actually). For example, the same user with the same IP on the same server could generate events in two different features, but feature_id is not a table-key. As a result, once one event for this combination is added to the table, it blocks the rest of them due to the primary key constraint, resulting in missing events.
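To make the collision concrete, here is a minimal sketch that simulates a target enforcing a primary key. The function `load_with_primary_key` and the sample values are hypothetical and are not taken from the tap or from any target; the sketch assumes the first row for a key wins and later rows are rejected, which matches the behaviour described above.

```python
from typing import Dict, List, Sequence, Tuple

# Key properties currently declared for feature_events (from this issue).
CURRENT_KEYS = ("visitor_id", "account_id", "server", "remote_ip")


def load_with_primary_key(
    records: List[Dict[str, str]], key_properties: Sequence[str]
) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a target table with a primary key.

    The first record seen for a given key is kept; later records with
    the same key are dropped, as a primary key constraint would do.
    """
    table: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for record in records:
        key = tuple(record[name] for name in key_properties)
        table.setdefault(key, record)
    return list(table.values())


# Two events from the same visitor, account, server and IP,
# but on two different features (made-up sample values).
events = [
    {"visitor_id": "v1", "account_id": "a1", "server": "s1",
     "remote_ip": "10.0.0.1", "feature_id": "feature-A"},
    {"visitor_id": "v1", "account_id": "a1", "server": "s1",
     "remote_ip": "10.0.0.1", "feature_id": "feature-B"},
]

kept_current = load_with_primary_key(events, CURRENT_KEYS)
kept_with_feature = load_with_primary_key(events, CURRENT_KEYS + ("feature_id",))

print(len(kept_current))       # one of the two events is lost
print(len(kept_with_feature))  # both events survive
```

With the current keys only one of the two events reaches the table. Once feature_id is part of the key, both are kept.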
Changing these table-keys to:

"visitor_id", "account_id", "server", "remote_ip", "day", "feature_id"

seems to solve the problem. From the code perspective, adding day might be a problem: the key properties are a class property, so at the time you define them you don't know what the period will be. I'm sure there is a workaround though, maybe adding both day and hour as key properties.
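One possible workaround, sketched below, is to compute the key properties per instance instead of declaring them at class level. Everything here is an assumption for illustration: the class name `FeatureEventsSketch`, the `period` argument, and its allowed values `"day"` and `"hour"` are hypothetical, and whether the tap's stream base class allows key properties to be defined this way has not been checked.

```python
from typing import List

# Keys that do not depend on the aggregation period.
BASE_KEYS = ["visitor_id", "account_id", "server", "remote_ip", "feature_id"]


class FeatureEventsSketch:
    """Hypothetical stand-in for the tap's feature_events stream class."""

    name = "feature_events"

    def __init__(self, period: str = "day") -> None:
        # Assumption: the period is known once the stream is instantiated
        # (for example, read from the tap's config).
        if period not in ("day", "hour"):
            raise ValueError(f"unsupported period: {period!r}")
        self.period = period

    @property
    def key_properties(self) -> List[str]:
        # Resolved per instance, so the time bucket field matches the period.
        return BASE_KEYS + [self.period]


daily = FeatureEventsSketch(period="day")
hourly = FeatureEventsSketch(period="hour")

print(daily.key_properties)
print(hourly.key_properties)
```

The alternative mentioned above, declaring both day and hour as key properties, keeps the class-level definition but leaves one of the two columns empty for every row. That may be an issue for a Postgres target, since primary key columns cannot be NULL.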
