Open
Description
The transcripts_count reported for a resource does not match the number of transcript files the API actually returns.
The collection ID used for this test is 1797, and the resource_id is 62203.
Running the command below:
python get_collection_resources.py 1797
The output, shown below, reports transcripts_count=2:
{
  "resource_id": 62203,
  "title": "title1$ test - DO NOT DELETE",
  "media_file_id": [
    143091,
    143092
  ],
  "media_files_count": 2,
  "transcripts_count": 2,
  "indexes_count": 5,
  "persistent_url": "https://ualberta.aviaryplatform.com/r/h41jh3dw0c",
  "direct_url": "https://ualberta.aviaryplatform.com/collections/1797/collection_resources/62203",
  "created_at": "2022-01-12 03:03:06 UTC",
  "updated_at": "2025-04-02 21:51:30 UTC"
},
However, querying these transcripts through the API returns only one:
python get_transcript_files.py 62203
{
  "data": {
    "id": 62203,
    "resource_file_id": 55344,
    "is_caption": false,
    "is_public": false,
    "title": "trint_mssa_hvt_1851_p1of2_transcript.vtt",
    "language": "en",
    "description": null,
    "is_downloadable": "No",
    "export": {
      "webvtt": {
        "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/webvtt",
        "file_name": "trint_mssa_hvt_1851_p1of2_transcript.vtt",
        "file_content_type": "text/vtt"
      },
      "txt": {
        "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/txt",
        "file_name": "trint_mssa_hvt_1851_p1of2_transcript.txt",
        "file_content_type": "text/plain"
      },
      "json": {
        "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/json",
        "file_name": "trint_mssa_hvt_1851_p1of2_transcript.json",
        "file_content_type": "text/json"
      }
    }
  },
  "success": true
}
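To make the mismatch easy to spot across a whole collection, a check along the following lines could compare the reported count with what was actually retrieved. This is only a sketch: `count_mismatch` is a hypothetical helper, not part of the existing scripts, and it assumes the resource record and the transcript payloads have already been parsed into Python dicts shaped like the outputs above.

```python
def count_mismatch(resource: dict, transcripts: list[dict]) -> dict:
    """Compare a resource's reported transcripts_count with the transcripts retrieved.

    `resource` is assumed to look like one record printed by
    get_collection_resources.py; `transcripts` is assumed to be the list of
    transcript payloads that were successfully fetched for that resource.
    """
    reported = resource.get("transcripts_count", 0)
    found = len(transcripts)
    return {
        "resource_id": resource.get("resource_id"),
        "reported": reported,
        "found": found,
        "missing": reported - found,  # positive when fewer were retrieved than reported
    }


# Values taken from the outputs in this issue.
resource = {"resource_id": 62203, "transcripts_count": 2}
transcripts = [{"data": {"id": 62203, "resource_file_id": 55344}, "success": True}]
print(count_mismatch(resource, transcripts))
```

For the resource in this issue, the helper reports one missing transcript.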
Clarification is needed on two points:
1) the discrepancy between transcripts_count and the number of transcripts returned by the API
2) for preservation, the transcript is available in several formats (json, txt, and webvtt): should we preserve all of them or just choose one?
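If the decision is to preserve every format, the export entries could be enumerated directly from the transcript payload rather than hard-coding format names. The sketch below assumes the payload has the `data.export` structure shown above; `export_targets` is a hypothetical helper and it does not perform any download.

```python
def export_targets(payload: dict) -> list[tuple[str, str, str]]:
    """List (format, url, file_name) for each export entry in a transcript payload.

    Assumes the payload follows the shape shown in this issue, with export
    entries under payload["data"]["export"]. Returns an empty list if that
    section is absent.
    """
    exports = payload.get("data", {}).get("export", {})
    return [
        (fmt, info.get("file", ""), info.get("file_name", ""))
        for fmt, info in exports.items()
    ]


# Trimmed copy of the payload shown in this issue.
payload = {
    "data": {
        "id": 62203,
        "export": {
            "webvtt": {
                "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/webvtt",
                "file_name": "trint_mssa_hvt_1851_p1of2_transcript.vtt",
            },
            "txt": {
                "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/txt",
                "file_name": "trint_mssa_hvt_1851_p1of2_transcript.txt",
            },
        },
    },
    "success": True,
}
for fmt, url, name in export_targets(payload):
    print(fmt, name, url)
```

Choosing a single format would then amount to filtering this list by the format key.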