transcripts_count

**Discrepancy between `transcripts_count` and the number of transcript files actually returned.**

The collection id used for this test is 1797, and the resource_id is 62203.

Running the command below:
```
python get_collection_resources.py 1797
```

The output for this resource is shown below, with `transcripts_count` set to 2:
```
    {
        "resource_id": 62203,
        "title": "title1$ test - DO NOT DELETE",
        "media_file_id": [
            143091,
            143092
        ],
        "media_files_count": 2,
        "transcripts_count": 2,
        "indexes_count": 5,
        "persistent_url": "https://ualberta.aviaryplatform.com/r/h41jh3dw0c",
        "direct_url": "https://ualberta.aviaryplatform.com/collections/1797/collection_resources/62203",
        "created_at": "2022-01-12 03:03:06 UTC",
        "updated_at": "2025-04-02 21:51:30 UTC"
    },

```

However, querying these transcripts through the API returns only one:

```
python get_transcript_files.py 62203
{
    "data": {
        "id": 62203,
        "resource_file_id": 55344,
        "is_caption": false,
        "is_public": false,
        "title": "trint_mssa_hvt_1851_p1of2_transcript.vtt",
        "language": "en",
        "description": null,
        "is_downloadable": "No",
        "export": {
            "webvtt": {
                "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/webvtt",
                "file_name": "trint_mssa_hvt_1851_p1of2_transcript.vtt",
                "file_content_type": "text/vtt"
            },
            "txt": {
                "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/txt",
                "file_name": "trint_mssa_hvt_1851_p1of2_transcript.txt",
                "file_content_type": "text/plain"
            },
            "json": {
                "file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/json",
                "file_name": "trint_mssa_hvt_1851_p1of2_transcript.json",
                "file_content_type": "text/json"
            }
        }
    },
    "success": true
}
```
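A minimal sketch of how the mismatch could be checked automatically, assuming the resource record and the transcript records have already been loaded as Python dictionaries. The helper name `count_mismatch` is hypothetical and is not part of either script above; the sample values are copied from the two outputs.

```python
def count_mismatch(resource, transcripts):
    """Return (reported, found) when the counts differ, otherwise None.

    `resource` is one record from the collection listing;
    `transcripts` is a list of transcript records fetched for it.
    Both shapes are assumptions based on the outputs shown above.
    """
    reported = resource.get("transcripts_count", 0)
    found = len(transcripts)
    return (reported, found) if reported != found else None


# Values taken from the outputs in this issue.
resource = {"resource_id": 62203, "transcripts_count": 2}
transcripts = [
    {"id": 62203, "title": "trint_mssa_hvt_1851_p1of2_transcript.vtt"},
]

print(count_mismatch(resource, transcripts))
```

Running the same check over every resource in the collection would show whether the discrepancy is specific to resource 62203 or more widespread.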

Clarification is needed on:
1) the discrepancy between `transcripts_count` and the number of transcripts returned by the API
2) preservation: the transcript is available in several formats (json, txt, and webvtt). Do we preserve all of them, or choose just one?
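For the second question, a small sketch of how the available export formats could be enumerated from a transcript response, assuming the response has the `data.export` shape shown above. The helper name `export_links` is hypothetical; it only reads the dictionary and does not download anything.

```python
def export_links(transcript_response):
    """Map each export format name to its download URL.

    Assumes the response layout shown in this issue:
    {"data": {"export": {"<format>": {"file": "<url>", ...}}}}
    """
    exports = transcript_response.get("data", {}).get("export", {})
    return {fmt: info["file"] for fmt, info in exports.items()}


# Trimmed copy of the response shown above.
response = {
    "data": {
        "id": 62203,
        "export": {
            "webvtt": {"file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/webvtt"},
            "txt": {"file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/txt"},
            "json": {"file": "https://ualberta.aviaryplatform.com/api/v1/transcripts/62203/export/json"},
        },
    },
    "success": True,
}

for fmt, url in export_links(response).items():
    print(fmt, url)
```

If all formats are to be preserved, the resulting mapping can be iterated in full; if only one is chosen, a single key can be selected from it.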



transcripts_count #6
