Skip to content

Assign urls to edx contentfiles when possible #2420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 8, 2025
Merged

Conversation

mbertrand
Copy link
Member

@mbertrand mbertrand commented Aug 7, 2025

What are the relevant tickets?

Related to https://github.com/mitodl/hq/issues/7989

Description (What does it do?)

  • Tries to assign urls to edx contentfiles when possible
  • Creates a mapping of transcript files to video contentfiles, so that the url for a transcript file can link to the parent video (if the parent can be determined from the video metadata - otherwise a direct link to the transcript file is generated).

How can this be tested?

Find a course from each of the following sources and ingest the contentfiles for it: mit_edx, mitxonline, xpro, oll:

./manage.py backpopulate_<etl_source>_files --resource-ids <course_id> --overwrite

It should complete without errors, and the majority of contentfiles should then have urls. Try out a bunch of links to make sure they work, including urls that contain asset and urls that contain block

Also run ./manage.py backpopulate_canvas_courses --canvas-ids <course_readable_id> --overwrite to make sure that still works normally.

For mit_edx, mitxonline and xpro courses, you may be redirected to a "You must be enrolled in this course" page for these links if you are not logged in, or logged in as someone other than a superuser.

@mbertrand mbertrand added Work in Progress Needs Review An open Pull Request that is ready for review and removed Work in Progress labels Aug 7, 2025
@shanbady shanbady self-requested a review August 8, 2025 15:05
Copy link
Contributor

@shanbady shanbady left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. I didnt run into any issues. just left a non-blocking question



# Base content URLs for different sources
CONTENT_BASE_URL_MITXONLINE = get_string(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will these values also work fine for rc? (do we need an associated PR in ol-infrastructure?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, the xpro, mitxonline, and oll buckets and their corresponding api urls in RC are all pointing to production. But for mit_edx, the urls point to production and the bucket is a qa bucket. Maybe there's no real difference between the buckets but this seems like a bug on the ol-infrastructure side, so I'll make a PR to fix that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shanbady shanbady added Waiting on author and removed Needs Review An open Pull Request that is ready for review labels Aug 8, 2025
@mbertrand mbertrand merged commit 1210dce into main Aug 8, 2025
13 checks passed
@mbertrand mbertrand deleted the mb/edx_cf_urls branch August 8, 2025 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants