Skip to content

Conversation

@ghukill
Copy link
Collaborator

@ghukill ghukill commented Dec 16, 2025

Purpose and background context

When writing embeddings to the TIMDEX dataset, update the column timestamp to embedding_timestamp to match a schema change in the timdex-dataset-api library.

How can a reviewer manually see the effects of these changes?

Probably not worth the effort getting everything up and running. Suggested to analyze the unit test test_create_embeddings_writes_to_timdex_dataset which shows a write to the dataset.

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Why these changes are being introduced:

The embeddings schema column `timestamp` was changed to `embedding_timestamp`,
requiring a change in this application that writes to it.

Additionally, we had a unit test with a TODO to revisit after some read
methods were available in TDA.

How this addresses that need:
* Updates write schema to `embedding_timestamp`
* Updates test `test_create_embeddings_writes_to_timdex_dataset`

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-288
embedding_strategy=embedding.embedding_strategy,
embedding_vector=embedding.embedding_vector,
embedding_object=json.dumps(embedding.embedding_token_weights).encode(),
embedding_timestamp=embedding.embedding_timestamp.isoformat(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the most important change: setting the renamed column embedding_timestamp on write. Formerly we weren't setting a timestamp at all, relying on defaults in the TDA DatasetEmbedding class, but this is more explicit.



@pytest.fixture
def dataset_with_records(tmp_path) -> TIMDEXDataset:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new fixture gives us a real dataset, with real records, that allows for writing embeddings associated with real records and thereby supporting read methods.

@ghukill ghukill requested a review from a team December 16, 2025 20:14
@jonavellecuerdo jonavellecuerdo self-assigned this Dec 16, 2025
@ghukill ghukill merged commit 491e04b into main Dec 16, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants