shrija-sharma

Byte Size and Time Duration Parsers with Aggregation Support
This PR introduces comprehensive support for parsing and aggregating byte size and time duration values in Wrangler, making it easier to work with file sizes and time measurements in data pipelines.

Key Features Added

  1. Token Types and Parsing Classes
    Added BYTE_SIZE and TIME_DURATION token types to the TokenType enum
    Implemented ByteSize class to parse values like "10KB", "1.5MB", "4.2GB"
    Implemented TimeDuration class to parse values like "100ms", "2.5s", "1.5h"
  2. Unit Conversion Utilities
    Support converting between byte units (B, KB, MB, GB, TB, PB)
    Support converting between time units (ns, ms, s, m, h, d)
    Proper handling of fractional values (e.g., "1.5MB", "2.5s")
  3. Enhanced Directives
    Added aggregate-stats directive for statistical analysis of byte sizes and time durations
    Comprehensive validation and error handling for malformed inputs
    Case-insensitive unit parsing for better usability
  4. Documentation and Testing
    Comprehensive unit tests for all new functionality
    Updated documentation with examples and usage guidelines
    JavaDoc for all public methods and classes
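
For readers who want a concrete picture of the parsing and aggregation described above, here is a minimal sketch. It is not the code from this PR: the class name UnitParseSketch and its methods toBytes, toNanos, and byteStats are hypothetical, and it assumes binary multiples (1 KB = 1024 B), which the description does not specify. It only illustrates case-insensitive unit lookup, fractional values, rejection of malformed input, and a simple statistical summary.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the ByteSize/TimeDuration classes added by this PR. */
final class UnitParseSketch {
  // A non-negative number (optionally fractional) followed by a unit made of letters.
  private static final Pattern VALUE =
      Pattern.compile("\\s*(\\d+(?:\\.\\d+)?)\\s*([A-Za-z]+)\\s*");

  // Assumption: binary multiples, so 1 KB = 1024 B. Values are multipliers to bytes.
  private static final Map<String, Double> BYTE_UNITS = Map.of(
      "B", 1.0, "KB", 1024.0, "MB", Math.pow(1024, 2), "GB", Math.pow(1024, 3),
      "TB", Math.pow(1024, 4), "PB", Math.pow(1024, 5));

  // Multipliers to nanoseconds for the units listed in the description.
  private static final Map<String, Double> TIME_UNITS = Map.of(
      "NS", 1.0, "MS", 1e6, "S", 1e9, "M", 60e9, "H", 3600e9, "D", 86400e9);

  /** Converts text such as "1.5MB" to a byte count. */
  static double toBytes(String text) {
    return parse(text, BYTE_UNITS);
  }

  /** Converts text such as "2.5s" to nanoseconds. */
  static double toNanos(String text) {
    return parse(text, TIME_UNITS);
  }

  /** Summarizes (count, sum, min, max, average) a list of byte-size strings, in bytes. */
  static DoubleSummaryStatistics byteStats(List<String> values) {
    return values.stream().mapToDouble(UnitParseSketch::toBytes).summaryStatistics();
  }

  private static double parse(String text, Map<String, Double> units) {
    Matcher m = VALUE.matcher(text);
    if (!m.matches()) {
      throw new IllegalArgumentException("Malformed value: " + text);
    }
    // Upper-case the unit so that "kb", "Kb", and "KB" all resolve to the same entry.
    Double factor = units.get(m.group(2).toUpperCase(Locale.ROOT));
    if (factor == null) {
      throw new IllegalArgumentException("Unknown unit in: " + text);
    }
    return Double.parseDouble(m.group(1)) * factor;
  }
}
```

Matching the whole unit string before the lookup keeps "m" (minutes) and "ms" (milliseconds) distinct. Under the 1024-based assumption, toBytes("10KB") is 10240 bytes; a decimal (1000-based) convention would only change the multiplier table.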

google-cla bot commented Apr 16, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@shrija-sharma
Author

Assignment 1 completed.
