@klondikedragon

This package is amazing and hugely popular, and has been the best package for automatic date parsing in go for years! ⭐

Thanks @araddon for crafting this package with love over the years!!

I've been using this while developing a new cloud-based log aggregation/search/visualization product, and I've found that there are three major opportunities for improvement for my particular use case:

  • The package does not strictly validate its input, leading to many false positives. This is OK if you know for sure the input matches one of the known formats, but the result cannot be trusted if the input could be anything and you only want a date/time returned when it definitely matches a known format.
  • While still being far more efficient than the "shotgun" parsing approach, the package currently allocates a relatively large amount of memory (several times the average input size), which can add up when parsing megabytes of date strings per second in a high-throughput microservice. It can also be relatively slow when parsing a string that doesn't match a known format and can allocate even more memory in this case, due to custom error messages that include contextual details.
  • There are a lot of one-off community contributions that haven't been merged and need to be made coherent with each other.

This PR addresses all 3 opportunities:

  • Validation: it now comprehensively validates the input in the following ways: at each point in the state machine, it makes sure there are cases for all possibilities, and any invalid possibility will fail; additionally, it makes sure that the entire format string was specifically set (excluding any trailing punctuation, which can be safely ignored). False positives should be extremely rare now (hard to prove they don't exist).
  • Memory Efficiency: bytes allocated have been reduced by 90%, and parsing some formats is now zero-allocation. A SimpleErrorMessages option was added (off by default) that greatly speeds up the case where a string does not match a known format -- with the option on, this case is now 4x faster and produces almost no allocations.
  • Merge & Integrate Community Fixes: Fixes for nearly all of the pending issues from the community (including open pull requests) have been incorporated and adapted.
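
To make the validation and error-message ideas concrete, here is a toy sketch (not the package's actual state machine) for a single format, `YYYY-MM-DD`. Every state has a default case that rejects unexpected input, the whole string must be consumed apart from trailing punctuation, and a hypothetical `simpleErrors` flag switches between a preallocated sentinel error and a formatted message with context. All names here (`parseYMD`, `errUnknownFormat`, the `st*` states) are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownFormat is created once, so returning it builds no new message.
var errUnknownFormat = errors.New("unknown date format")

type state int

const (
	stYear state = iota
	stMonth
	stDay
	stTrailing
)

func isTrailingPunct(c byte) bool { return c == '.' || c == ',' || c == ';' }

// parseYMD strictly accepts "YYYY-MM-DD" plus optional trailing punctuation.
// It only range-checks month and day coarsely (1-12, 1-31).
func parseYMD(s string, simpleErrors bool) (year, month, day int, err error) {
	fail := func(i int) (int, int, int, error) {
		if simpleErrors {
			return 0, 0, 0, errUnknownFormat
		}
		// Detailed path: formats a fresh message containing the input.
		return 0, 0, 0, fmt.Errorf("unknown date format %q: unexpected input at offset %d", s, i)
	}
	st, digits := stYear, 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		isDigit := c >= '0' && c <= '9'
		switch st {
		case stYear:
			switch {
			case isDigit && digits < 4:
				year, digits = year*10+int(c-'0'), digits+1
			case c == '-' && digits == 4:
				st, digits = stMonth, 0
			default: // anything else is invalid in this state
				return fail(i)
			}
		case stMonth:
			switch {
			case isDigit && digits < 2:
				month, digits = month*10+int(c-'0'), digits+1
			case c == '-' && digits == 2:
				st, digits = stDay, 0
			default:
				return fail(i)
			}
		case stDay:
			switch {
			case isDigit && digits < 2:
				day, digits = day*10+int(c-'0'), digits+1
			case isTrailingPunct(c) && digits == 2:
				st = stTrailing
			default:
				return fail(i)
			}
		case stTrailing:
			if !isTrailingPunct(c) {
				return fail(i)
			}
		}
	}
	// The input must have ended with every component fully read.
	complete := st == stTrailing || (st == stDay && digits == 2)
	if !complete || month < 1 || month > 12 || day < 1 || day > 31 {
		return fail(len(s))
	}
	return year, month, day, nil
}

func main() {
	fmt.Println(parseYMD("2024-01-09.", true))        // accepted; trailing "." ignored
	fmt.Println(parseYMD("2024-01-09 garbage", true)) // rejected with the sentinel error
	fmt.Println(parseYMD("2024-13-09", false))        // rejected with a detailed message
}
```

The sentinel path is the reason a simple-errors mode can be cheap on non-matching input: nothing is formatted and the input is not copied into a message.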

In the process of going through the state machine comprehensively for validation, redundant code/states were merged, and support was added for certain edge cases (for example, some date formats did not support being followed by times).

The example and README.md were updated to incorporate all of the newly supported formats and edge cases. More details on how to properly interpret returned location information with respect to abbreviated timezones were added.

BREAKING -- the package now requires go >= 1.20 to support memory optimizations converting from []byte to string in key places.
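
As background on the version requirement: Go 1.20 added `unsafe.String` and `unsafe.SliceData`, which allow viewing a `[]byte` as a `string` without copying. The sketch below shows that general technique; it is an assumption that the package relies on these exact functions, and `bytesToString` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares b's memory instead of copying it.
// The caller must not modify b for as long as the returned string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	buf := []byte("2024-01-09T01:59:00Z")
	datePart := bytesToString(buf[:10]) // no copy, no allocation for the string data
	fmt.Println(datePart)               // prints "2024-01-09"
}
```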

A huge thanks to all who posted issues and contributed PRs -- while the PRs could not be merged directly because the validation changes were so major, the ideas from all these contributions and the associated test cases were incorporated. Here's credit for all of the issue fixes and contributions in this PR as well as a summary of additional fixes added:

Also adds tests to verify that the following stay fixed:

arran4 and others added 30 commits February 15, 2023 15:40
* Don't just assume we were given one of the valid formats.
* Also consolidate the parsing states that occur after timePeriod.
* Add subtests to make it easier to see what fails.
* Additional tests for 4-char timezone names.
* Fix araddon#117
* Fix araddon#150
* Fix araddon#157
* Fix araddon#145
* Fix araddon#108
* Fix araddon#137
* Fix araddon#130
* Fix araddon#123
* Fix araddon#109
* Fix araddon#98
* Addresses bug in araddon#100 (comment)

Adds test cases to verify the following are already fixed:
* araddon#94
Incorporates PR araddon#133 from https://github.com/mehanizm to fix araddon#129

Adds test cases to verify the following are already fixed:
* araddon#105
Fully support the format where a TZ name is in parentheses after the
time (and possibly after an offset). This fixes the broken case where a
4 character TZ name was in parentheses after a time.
@elliot40404

Great work @klondikedragon. How can I start using this?

@klondikedragon klondikedragon deleted the branch araddon:master January 9, 2024 01:59
@klondikedragon klondikedragon deleted the master branch January 9, 2024 01:59
@klondikedragon
Author

I'll go ahead and fork this package. I'm renaming the main branch as part of that.

@klondikedragon klondikedragon restored the master branch January 9, 2024 02:02
@klondikedragon klondikedragon reopened this Jan 9, 2024
Various other cleanup:
* Update README.md
* Update github workflows
* Add to copyright
* Add .gitignore
@klondikedragon
Author

The fork is complete and published as v0.1.0 -- again, a huge thanks to @araddon for authoring and maintaining this package for so many years!

The fork is available using go get github.com/itlightning/dateparse -- issues and PRs are welcome.

@elliot40404 @arran4 @jmdacruz -- see what you think and how this updated package works! If this looks good, then after incorporating feedback I think I'll publish a v1.0.0 at some point soon. I'm also curious to get feedback on my log management project, so check out the site/discord if you're interested. Thanks!

elliot40404 and others added 9 commits January 10, 2024 02:29
@klondikedragon klondikedragon deleted the branch araddon:master April 12, 2025 18:55
@klondikedragon klondikedragon deleted the master branch April 12, 2025 18:55
klondikedragon and others added 2 commits April 12, 2025 15:48
Many devices send dates that do not conform to the RFCs...

Also add support for the strange "TZ-0700" variant of the "UTC-0700"
offset.

Cover all the changes with new tests.
Support for RFC3164/RFC5424 syslog formats
@arran4

arran4 commented Apr 13, 2025

What happened?

@klondikedragon
Author

@arran4 sorry... I was cleaning up old branches when releasing a new version of https://github.com/itlightning/dateparse (part of work to add automated config-free syslog parsing/ingestion to https://sparklogs.com/ ), and I forgot that deleting the old master branch would close this PR (I'm using main branch actively now). I've recreated the branch so this PR can remain open 🙂

v0.2.1 of the github.com/itlightning/dateparse package is now available with support for additional syslog RFC3164/RFC5424 time formats (and their many variants). Dependencies were also updated. Use and development of the package continue to be active there, and feedback is welcome.
