fix: issue with timestamp comparison #248
Open
Problem:
The logstash-input-s3 plugin has a known issue regarding object timestamp logic, causing problems when using S3-compatible storage solutions other than AWS S3. This issue has been discussed in the following GitHub pull request and issue:
Pull Request: Fix object timestamp logic
Issue: sincedb file not created, files from bucket not deleted
Proposed Solution:
The linked pull request proposes a fix for both problems described below (the missing sincedb file and duplicate reads). It improves the timestamp handling logic so that the logstash-input-s3 plugin works with more S3-compatible backends.
Context:
It’s important to note that the logstash-input-s3 plugin was originally designed to work only with AWS S3 and does not officially support other S3-compatible storage solutions. However, implementing the proposed fix would make the plugin suitable for a significant number of alternative S3-compatible solutions, eliminating the need for unsupported forks.
Microseconds Comparison:
The core issue lies in comparing timestamps with microsecond precision, which causes two main problems: the sincedb file is not created, and files from the S3 bucket are read more than once. The underlying behavior is explained in the Railsware blog post “Time comparison in Ruby”, which discusses the pitfalls of comparing times in Ruby. (Link)
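The effect can be reproduced in plain Ruby. The sketch below is illustrative only and is not the code from the pull request: the helpers `round_trip` and `newer_by_second?` and the sample timestamp are hypothetical, and it assumes the stored timestamp loses its sub-second part when it is written as text and parsed back.

```ruby
require "time"

# Hypothetical helper: simulates a timestamp persisted as text and parsed back.
# Time#to_s omits fractional seconds, so the round trip truncates to whole seconds.
def round_trip(time)
  Time.parse(time.to_s)
end

# Hypothetical helper: compares two timestamps at whole-second resolution.
def newer_by_second?(candidate, reference)
  candidate.to_i > reference.to_i
end

# Illustrative value: a listing timestamp carrying 123456 microseconds.
listed = Time.utc(2023, 5, 1, 12, 0, 0, 123_456)
stored = round_trip(listed)

puts listed > stored                   # true: the same object still looks "newer"
puts newer_by_second?(listed, stored)  # false: equal once truncated to seconds
```

Comparing at whole-second resolution is only one way to remove the sub-second drift; it trades away the ability to tell apart objects modified within the same second.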
Root Cause Uncertainty:
It’s worth noting that the root cause of the microsecond difference between the timestamps in the bucket file listing and the last sincedb write is still uncertain. The issue does not occur when using the logstash-input-s3 plugin with AWS S3, only with other S3-compatible backends.
The issue is present in Cloudflare R2 and DigitalOcean Spaces, and the fix has been tested with both:
Cloudflare R2: An S3-compatible storage solution provided by Cloudflare. (Link)
DigitalOcean Spaces: An S3-compatible object storage service offered by DigitalOcean. (Link)