@ghukill ghukill commented Aug 26, 2025

Purpose and background context

This PR turns the stubbed tag_export.py notebook into a functional notebook. This is 90% a port from the POC notebook in timdex-notebooks and 10% improvement on that. This first pass is establishing a baseline similar to the POC to continue testing in a deployed context.

There are currently no meaningful tests. I'm a little uncertain about what kind of tests would be valuable for such a notebook, and would like to return to that in a future PR. Having this notebook functional will unblock us to test it in a deployed context: configurations, permissions, memory/CPU resource tuning, etc.

How can a reviewer manually see the effects of these changes?

With those caveats above, it is functional as-is!

1- Set Dev1 AWS TimdexManagers credentials in terminal and set env vars:

TDA_LOG_LEVEL=DEBUG
WARNING_ONLY_LOGGERS=asyncio,botocore,urllib3,s3transfer,boto3,MARKDOWN
TIMDEX_DATASET_LOCATION=s3://timdex-extract-dev-222053980223/dataset_scratch/prod-clone
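If it helps when reviewing, here is a minimal sketch of a pre-flight check for the env vars above. The helper name `missing_env_vars` is hypothetical and not part of this PR, and treating only `TIMDEX_DATASET_LOCATION` as required is an assumption based on the commit message below.

```python
import os

# Assumption: only TIMDEX_DATASET_LOCATION is strictly required;
# TDA_LOG_LEVEL and WARNING_ONLY_LOGGERS are treated as optional here.
REQUIRED_ENV_VARS = ("TIMDEX_DATASET_LOCATION",)


def missing_env_vars(environ=None, required=REQUIRED_ENV_VARS):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Usage with an explicit mapping, so the example does not depend on your shell.
example_env = {"TDA_LOG_LEVEL": "DEBUG"}
print(missing_env_vars(example_env))  # prints ['TIMDEX_DATASET_LOCATION']
```

Calling `missing_env_vars()` with no arguments checks the real environment of the current process.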

2- Start the notebook with the Makefile target below, which opens it in edit mode:

make edit-notebook-tag-export

3- Switch to "app" mode, which is how users will see the notebook. Click this button in the lower-right:

[Screenshot: button in the lower-right that switches to "app" mode]

4- Experiment with MARC tags (e.g. try 650,655 or 918,985,900), limits, etc.! Just be aware that omitting the limit, or setting it to a large value, can take quite a while. I think the full set of Alma "current records" is about 3.9 million, and it takes about 10 minutes to download and parse tags.
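For reference, a rough sketch of how comma-separated tag input like the examples above could be normalized. `parse_marc_tags` is a hypothetical helper, not the notebook's actual code, and restricting tags to three digits is an assumption.

```python
import re


def parse_marc_tags(raw: str) -> list[str]:
    """Split comma-separated input such as '650,655' into a list of tags.

    Hypothetical helper for illustration. Assumes tags are exactly three
    digits; blank entries are skipped and duplicates are dropped.
    """
    tags: list[str] = []
    for part in raw.split(","):
        tag = part.strip()
        if not tag:
            continue  # tolerate trailing commas and stray whitespace
        if not re.fullmatch(r"[0-9]{3}", tag):
            raise ValueError(f"Not a three-digit MARC tag: {tag!r}")
        if tag not in tags:
            tags.append(tag)  # keep first-seen order
    return tags


print(parse_marc_tags("650,655"))  # prints ['650', '655']
```

Rejecting malformed input early keeps a typo from triggering a long download that matches nothing.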

As I type this... I'm realizing we may have an opportunity to cache records we've already downloaded; maybe for a future ticket 😎.

5- Perform an export. Try a small set like:

[Screenshot: example of a small set to export]

Then try to export it using the "Download" UI in the lower-right:

[Screenshot: the "Download" UI in the lower-right]

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Why these changes are being introduced:

This notebook repository is meant to port some POC notebooks from the repository
'timdex-notebooks' to notebooks we can launch as self-service for staff.  This
commit targets 'marimo_notebooks/marc_tag_values.py' from that repository
specifically as a port.

How this addresses that need:
* Completes the stubbed notebook 'tag_export.py'
* Notebook is functional, requiring only env var 'TIMDEX_DATASET_LOCATION'
and appropriate permissions

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1438
@ghukill ghukill marked this pull request as ready for review August 26, 2025 17:45
@ghukill ghukill requested a review from a team as a code owner August 26, 2025 17:45
@ehanson8 ehanson8 self-assigned this Aug 26, 2025
@ehanson8 ehanson8 left a comment

Looks good and works as expected! The unit test question for marimo notebooks will be interesting to discuss, I agree that it'll be easier to talk about after we get a better sense of the process. I'm also interested in exploring functional code outside of the notebook as we previously discussed, that could be unit tested more traditionally. But getting this out is the excellent and necessary first step!

ghukill commented Aug 26, 2025

> Looks good and works as expected! The unit test question for marimo notebooks will be interesting to discuss, I agree that it'll be easier to talk about after we get a better sense of the process. I'm also interested in exploring functional code outside of the notebook as we previously discussed, that could be unit tested more traditionally. But getting this out is the excellent and necessary first step!

100% agree @ehanson8.

In this scenario, 99% of the work we do is just TDA functionality and I wouldn't want to test that. BUT, I'd love to test the MARC parsing using the marcalyx library, and that would be nice external code (outside the notebook) that we could test.
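To make that concrete, here is a rough sketch of the kind of pure function that could live outside the notebook and be unit tested. It deliberately uses only the standard library rather than marcalyx, since I don't want to guess at that library's API here; `extract_tag_values` is a hypothetical name, the sample record is made up, and it assumes the source records are MARCXML in the standard MARC21 slim namespace.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace; assumes records are serialized this way.
MARC_NS = "http://www.loc.gov/MARC21/slim"


def extract_tag_values(marcxml: str, tags: list[str]) -> dict[str, list[str]]:
    """Collect joined subfield text for each requested datafield tag.

    Hypothetical helper for illustration; not the notebook's parsing code.
    """
    root = ET.fromstring(marcxml)
    values: dict[str, list[str]] = {tag: [] for tag in tags}
    for field in root.iter(f"{{{MARC_NS}}}datafield"):
        tag = field.get("tag")
        if tag in values:
            subfields = [
                sf.text.strip()
                for sf in field.findall(f"{{{MARC_NS}}}subfield")
                if sf.text
            ]
            values[tag].append(" ".join(subfields))
    return values


# Made-up sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">Example title</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Maps</subfield><subfield code="z">Ohio</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Rivers</subfield></datafield>
</record>"""

print(extract_tag_values(SAMPLE, ["650", "655"]))
# prints {'650': ['Maps Ohio', 'Rivers'], '655': []}
```

A function shaped like this takes a string and returns plain data, so it can be covered with ordinary pytest cases without starting marimo or touching S3.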

@ghukill ghukill merged commit 324c0d7 into main Aug 26, 2025
2 checks passed
@ghukill ghukill deleted the IN-1438-tag-export-notebook branch September 2, 2025 17:42