@Birddude1230

Not using the bulk API for OpenSearch (and likely for Elasticsearch as well) hurts performance badly and easily causes event backups in moderately busy clusters. This commit updates the batch writer so that its naming and configuration are clearer, and updates its one existing user, BigQuery, so that it keeps its current interface. It also adds the ability to force the OpenSearch sink to write each event's most recent timestamp to a single, consistent field, which is very useful for data streams.

Specifically, this PR:

  • Updates the Dockerfile to pull dependencies first, speeding up local builds (when iterating).
  • Renames the batch writer (now BufferWriter) and its configuration struct to make usage and configuration more transparent and modular. Specifically, it is now reasonable to reference the BufferWriterConfig struct directly in sink-specific configuration.
  • Updates all the tests to use BatchIntervalSeconds instead of the previous BatchInterval, which the tests had set in milliseconds.
  • Updates the OpenSearch sink to use the new BufferWriter to send events to the _bulk API.
  • Updates the OpenSearch sink to support CombineTimestampTo, a key that is set to the later of LastTimestamp and EventTime. Data streams require a consistent timestamp field, and using the latest timestamp makes the most sense to me, especially given our use case of not using the id, so all events are new documents.
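
To illustrate the batching idea in the list above, here is a minimal sketch in Go. It is not the code from this PR: the bufferConfig struct, its BatchSize field, and the helper names are hypothetical, and only BatchIntervalSeconds is a name taken from the description. The sketch buffers events, hands back a batch once it is full, and renders that batch as the newline-delimited JSON body that the _bulk endpoint expects. It uses the create action because data streams only accept that operation.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// bufferConfig is a hypothetical stand-in for the PR's BufferWriterConfig.
// Only BatchIntervalSeconds is named in the PR description; BatchSize is assumed.
type bufferConfig struct {
	BatchSize            int // assumed: flush once this many events are buffered
	BatchIntervalSeconds int // flush at least this often (timer not shown here)
}

// buffer collects events until a batch is ready to send.
type buffer struct {
	cfg    bufferConfig
	events []map[string]any
}

// add appends an event and returns a full batch once BatchSize is reached,
// otherwise nil.
func (b *buffer) add(ev map[string]any) []map[string]any {
	b.events = append(b.events, ev)
	if len(b.events) >= b.cfg.BatchSize {
		return b.flush()
	}
	return nil
}

// flush returns everything buffered so far and resets the buffer. A timer
// firing every BatchIntervalSeconds would call this as well.
func (b *buffer) flush() []map[string]any {
	out := b.events
	b.events = nil
	return out
}

// buildBulkBody renders a batch as a _bulk request body: one action line and
// one document line per event, each terminated by a newline.
func buildBulkBody(index string, events []map[string]any) ([]byte, error) {
	var body bytes.Buffer
	enc := json.NewEncoder(&body) // Encode appends '\n' after every value
	for _, ev := range events {
		action := map[string]any{"create": map[string]any{"_index": index}}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(ev); err != nil {
			return nil, err
		}
	}
	return body.Bytes(), nil
}
```

With a BatchSize of 2, the second call to add returns both events, and buildBulkBody turns them into four lines that can be sent in a single POST to _bulk instead of two separate index requests.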

@Birddude1230
Author

CombineTimestampTo in particular addresses resmoio#62.

This buffer implementation for OpenSearch can likely be copied as-is to the Elasticsearch sink to address resmoio#53, but I do not have an Elasticsearch cluster to test against.
