Contributor

@jonavellecuerdo jonavellecuerdo commented Jan 6, 2025

Purpose and background context

Transformed records are stored as serialized JSON strings in the transformed_record column of each row in a TIMDEXDataset. The new method, read_transformed_records_iter, resembles the read methods implemented via https://mitlibraries.atlassian.net/browse/TIMX-417, with the additional step of parsing each JSON string and yielding the transformed records as dictionaries.
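The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: parse_transformed_records, the in-memory rows list, and the record_id key are hypothetical stand-ins, and skipping rows with no transformed record is an assumption. Only the transformed_record column name comes from this PR.

```python
import json
from collections.abc import Iterable, Iterator


def parse_transformed_records(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield each row's 'transformed_record' JSON string as a parsed dict.

    Hypothetical helper for illustration; `rows` stands in for dataset rows.
    """
    for row in rows:
        serialized = row.get("transformed_record")
        if serialized is None:
            # Assumption: rows without a transformed record are skipped.
            continue
        # json.loads accepts both str and bytes input.
        yield json.loads(serialized)


# Hypothetical rows; 'record_id' is an illustrative key, not from the PR.
rows = [
    {"record_id": "abc:1", "transformed_record": '{"title": "First"}'},
    {"record_id": "abc:2", "transformed_record": None},
    {"record_id": "abc:3", "transformed_record": b'{"title": "Third"}'},
]
records = list(parse_transformed_records(rows))
```

Because the helper is a generator, records are parsed one at a time as the caller iterates, so the full set of parsed dictionaries never has to be held in memory unless the caller collects it.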

How can a reviewer manually see the effects of these changes?

Reviewing the new unit test should be sufficient for this PR.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

YES - These changes came from initial discussions on how TIM accesses transformed records from a TIMDEXDataset. Applications like TIM can use this method to retrieve parsed transformed records from the dataset.
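As a rough sketch of how a consuming application might work through such an iterator, the example below groups parsed records into fixed-size batches. Everything here is hypothetical: fake_transformed_records stands in for the iterator returned by the new method, and batched is an illustrative helper. This PR does not state how TIM actually consumes the records.

```python
import json
from collections.abc import Iterable, Iterator
from itertools import islice


def fake_transformed_records() -> Iterator[dict]:
    """Hypothetical stand-in for an iterator of parsed transformed records."""
    for n in range(5):
        yield json.loads(f'{{"id": {n}}}')


def batched(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group an iterator of records into lists of at most `size` items."""
    iterator = iter(records)
    # islice pulls the next `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


batch_sizes = [len(batch) for batch in batched(fake_transformed_records(), size=2)]
```

Batching on the consumer side keeps memory use bounded by the batch size, which is the main reason to expose the records as an iterator instead of a list.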

What are the relevant tickets?

  • https://mitlibraries.atlassian.net/browse/TIMX-453

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* Transformed records are stored as serialized JSON strings
under the 'transformed_record' column for each row in a TIMDEXDataset.
This method allows TIMDEX ETL applications to easily retrieve
parsed transformed records from a dataset via this library.

How this addresses that need:
* Add 'read_transformed_records_iter' method to TIMDEXDataset

Side effects of this change:
* Applications like TIM can now retrieve parsed transformed records
from a dataset

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-453
@jonavellecuerdo jonavellecuerdo self-assigned this Jan 6, 2025
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review January 6, 2025 18:53
Contributor

@ghukill ghukill left a comment

Short and sweet, looks great to me!

@jonavellecuerdo jonavellecuerdo merged commit 0aa8b92 into main Jan 7, 2025
2 checks passed
@jonavellecuerdo jonavellecuerdo deleted the TIMX-453-read-transformed-records-from-dataset branch January 7, 2025 17:01

4 participants