2. Azure Synchronization
The ultimate deliverables for our application will be a set of data files (xml, txt, csv) and media files delivered to Azure file storage. (Note that this is Azure file storage and not an Azure storage blob.) SkillRX will store and manage its media files in its own file storage. A process within SkillRX will manage synchronization with Azure.
The Azure storage includes folders for delivery to these devices:
- Mini computers (CSV and media)
- Raspberry Pi Devices (XML, TXT, and media)
- USB Storage (XML, TXT, and media)
The systems we are replacing do not deliver CSV files, so the launch of the mini computers in the first week of August 2025 is dependent on SkillRX going live. As of this writing, we are not planning to support USB storage devices.
Our Azure storage is arranged by device and language. The root directory contains:
- a directory for English-language Mini content (CMES-Mini)
- a directory for English-language Raspberry Pi content (CMES-Pi)
- a directory for archived Raspberry Pi content (CMES-Pi_Archive). This will contain archived media files regardless of language or device. NOTE THAT THIS IS A KEY REQUIREMENTS CHANGE. It is a late change but does not have to be fully implemented for launch: we need to be archiving files for launch, but the archival destination can either match this requirement or match the previous requirements, with separate archival locations based on language.
- a directory for Raspberry Pi content for every other language that has content (currently only Spanish). These have Language.code and an underscore prepended to their names. E.g. "SP_CMES-Pi_Archive"
- a directory for Spanish Mini content is not necessary at launch, but the Minis follow the same language requirements as the Pi devices. I.e. CMES-Mini will only receive files and data for topics in English.
For these requirements, we will refer to these as "core directories" or "core directory archives".
Each core directory contains subdirectories into which we will place the relevant data files and media. Media storage is consistent across core directories. In all cases, the uploaded media files go into "[core directory]/assets/content". Archived media files go into the root of the relevant core directory archive. E.g. an archived training material PDF from "SP_CMES-Pi/assets/content" will be moved into "SP_CMES-Pi_Archive".
The file names for media will follow this pattern: "[topic_id]_[filename_with_extension]". We are using the same naming convention when storing these files in S3 for SkillRX and we are maintaining the Topic ID values when we import the data from CMES-Pi.
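The naming pattern and the "[core directory]/assets/content" location can be sketched in a few lines of Ruby. The helper names below are hypothetical and are not part of SkillRX. Replacing spaces with underscores is an assumption: the requirements for training material uploads only say that file names contain no spaces.

```ruby
# Hypothetical helpers; the names are illustrative, not SkillRX's actual API.

# Builds "[topic_id]_[filename_with_extension]".
# Assumption: runs of whitespace are replaced with a single underscore.
def media_file_name(topic_id, filename)
  "#{topic_id}_#{filename.strip.gsub(/\s+/, "_")}"
end

# Builds "[core directory]/assets/content/[media file name]".
def media_upload_path(core_directory, topic_id, filename)
  "#{core_directory}/assets/content/#{media_file_name(topic_id, filename)}"
end

puts media_file_name(12345, "my uploaded file.pdf")
puts media_upload_path("SP_CMES-Pi", 12345, "my_uploaded_file.pdf")
```

Keeping the name builder separate from the path builder means the same file name can be reused for the S3 key and for every Azure core directory.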
The mini computers rely on a set of .csv files which they import to their local database. All text fields should be sanitized for anything that could disrupt a CSV import. (The example Tags.csv file currently on staging contains problematic tags.)
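One possible shape for that sanitization step is sketched below. What counts as "problematic" has not been specified, so the rules here (flatten line breaks and tabs, drop control characters, swap double quotes for single quotes) are assumptions, and `sanitize_csv_text` is a hypothetical name.

```ruby
# Hypothetical sanitizer for text fields written to the Mini CSV files.
# The exact rules are assumptions; confirm them against the Mini importer.
def sanitize_csv_text(value)
  value.to_s
       .gsub(/[\r\n\t]+/, " ")  # line breaks and tabs could split or shift a row
       .gsub(/[[:cntrl:]]/, "") # drop any remaining control characters
       .gsub('"', "'")          # avoid stray double quotes inside a field
       .gsub(/\s{2,}/, " ")     # collapse runs of whitespace
       .strip
end

puts sanitize_csv_text("heart \"failure\",\r\nacute ")
```

Commas are left in place in this sketch, so fields still need to be quoted by whatever writes the CSV rows.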
The mini computer directories follow the same language-based division as the Pi directories. At launch, there will be only English content, so there will only be one directory: CMES-Mini. Later, as languages are added, there will be directories for SP_CMES-Mini and potentially others.
CSV files and media delivery will be filtered by language. The English Mini directory, CMES-Mini, will only receive media and CSV files related to English-language topics.
The files are uploaded to CMES-Mini/assets/csv.
This file exists in the Azure repository but we are not supporting it. We can disregard this file.
I believe we can disregard this file. [need to check with stakeholders]
Information about training materials.
Fields:
- TopicID. The ID of the topic with which the training material media is associated.
- FileName. The full name of the file following the naming conventions described in our requirements for training material uploads: no spaces, prepend topic_id and an underscore to the file name. E.g. "12345_my_uploaded_file.pdf"
- FileType. 2 for MP3. 1 for PDF. [Check with stakeholders for other values.]
- FileSize. The size of the file.
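As an illustration, a row for this file could be assembled as follows. The helper name, the column order (taken from the field list above), and reporting FileSize in bytes are all assumptions. Only the two FileType codes given above are mapped, and anything else raises until the other values are confirmed.

```ruby
# FileType codes stated in the requirements; other values are still unconfirmed.
FILE_TYPE_CODES = { ".pdf" => 1, ".mp3" => 2 }.freeze

# Hypothetical helper. Assumption: FileSize is reported in bytes.
def training_material_row(topic_id, file_name, size_in_bytes)
  extension = File.extname(file_name).downcase
  file_type = FILE_TYPE_CODES.fetch(extension) do
    raise ArgumentError, "no FileType code known for #{extension.inspect}"
  end
  [topic_id, file_name, file_type, size_in_bytes]
end

p training_material_row(12345, "12345_my_uploaded_file.pdf", 204_800)
```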
Fields:
- TagID. The ID in SkillRX.
- Tag. The text of the tag.
Fields:
- TopicID. The ID in SkillRX.
- TopicName. Topic.title.
- TopicVolume. Topic.published_at.year.
- TopicIssue. [I believe we decided not to include this. Check requirements.]
- TopicYear. Topic.published_at.year.
- TopicMonth. Topic.published_at.month.
- ContentProvider. Topic.provider.name.
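The field mapping above could be expressed roughly like this. The `Topic` and `Provider` structs are minimal stand-ins for the SkillRX models, `topic_row` is a hypothetical name, and the column order simply follows the field list. TopicIssue is left out of the sketch because its inclusion is still undecided.

```ruby
# Minimal stand-ins for the SkillRX models (assumed attribute names).
Provider = Struct.new(:name)
Topic = Struct.new(:id, :title, :published_at, :provider)

# Assumed header order; TopicIssue is omitted pending a decision.
TOPIC_HEADERS = %w[TopicID TopicName TopicVolume TopicYear TopicMonth ContentProvider].freeze

def topic_row(topic)
  [
    topic.id,
    topic.title,
    topic.published_at.year,  # TopicVolume
    topic.published_at.year,  # TopicYear
    topic.published_at.month, # TopicMonth
    topic.provider.name
  ]
end

example = Topic.new(42, "Asthma Basics", Time.utc(2025, 8, 1), Provider.new("Example Provider"))
p TOPIC_HEADERS.zip(topic_row(example)).to_h
```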
We are not managing authors. Disregard this file.
The association between topics and tags.
Fields:
- TopicID. The ID in SkillRX of the topic with which the tag is associated.
- TagID. The ID in SkillRX of the tag.
The Raspberry Pi device data delivery is more complex than the mini data delivery. There are multiple file types, multiple files of some types, and even some redundancy due to differences in the configuration of different generations of client software on the Raspberry Pi devices currently in the field.
For instance, there are two versions of the XML files that contain topics: Provider and Legacy. These terms can be confusing, but the two files follow the exact same structure with only one difference: the "legacy" file is a single file containing the topics for all providers. The "provider" files each contain only the topics for one provider.
We will generate and deliver one set of files for each language, delivered to the root storage path for the environment plus the paths and filenames specified here:
- Legacy XML for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix]Server_XML.xml
- Provider XML for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix][provider name].xml for every provider
- New topics for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix]New_Uploads.xml
- Top topics for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/XML/[language.file_storage_prefix]Top_Topics.xml. We will not generate this file, as it depends on collecting stats from remote devices.
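The path patterns in this list could be generated with a small helper like the one below. `pi_xml_paths` is a hypothetical name, and `prefix` stands in for language.file_storage_prefix. Treating the prefix as an empty string for English and "SP_" for Spanish is an assumption based on the directory names used elsewhere in these requirements. Provider names are used verbatim, since no transformation is specified here.

```ruby
# Hypothetical helper: returns XML paths relative to the environment's root storage path.
# Top_Topics.xml is deliberately absent because it will not be generated.
def pi_xml_paths(prefix, provider_names)
  xml_dir = "#{prefix}CMES-Pi/assets/XML"
  {
    legacy: "#{xml_dir}/#{prefix}Server_XML.xml",
    new_uploads: "#{xml_dir}/#{prefix}New_Uploads.xml",
    # One file per provider, named after the provider as-is (assumption).
    providers: provider_names.map { |name| "#{xml_dir}/#{prefix}#{name}.xml" }
  }
end

pi_xml_paths("SP_", ["ExampleProvider"]).each { |kind, path| puts "#{kind}: #{path}" }
```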
These file names and locations follow the patterns described in the XML Generation application.
See the XML Generation application for details. You may need to ask us to request access from the stakeholders. Here are some excerpts from that documentation:
From the XML File Generation app documentation:
"Provider XML" structure from the XML Generation app documentation:
"Legacy XML" structure from the XML Generation app documentation:
In addition to the XML and the media files, the Raspberry Pi core directories will receive .txt files containing tag information.
We will generate and deliver one set of files for each language, delivered to the root storage path for the environment plus the paths and filenames specified here:
- Tag file for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/Tags/[language.file_storage_prefix]tags.txt
- Tags and title file for CMES-Pi: [language.file_storage_prefix]CMES-Pi/assets/Tags/[language.file_storage_prefix]tagsAndTitle.txt
Our Azure interface uses Azure File Shares, a gem written by Ruby for Good volunteer Dmitry Trager.
We have not found a viable way to set up Azure file storage for local development environments, so we are using a shared Azure file storage created by our stakeholders. We can provide the name of this environment and access details to developers who take on work related to Azure storage. Since this is a shared environment, local development environments will all be writing to the same storage, so developers will need to coordinate.
The Azure client will be used to:
- Authenticate
- Add individual media, csv, txt, and xml files to Azure.
- Archive files when a topic is archived. There is no API for moving files, so we will delete the file and upload it to the archive location.
- Delete files when a topic is deleted.
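Because archiving is a delete plus an upload rather than a move, the operation can be sketched against a stand-in client. `FakeFileStore` and its method names are hypothetical and are not the Azure File Shares gem's API. The sketch uploads to the archive before deleting the original, which is an ordering assumption made so that a failed upload does not lose the file, and it assumes the file content is supplied from SkillRX's own storage.

```ruby
# In-memory stand-in for the Azure client (hypothetical method names).
class FakeFileStore
  def initialize
    @files = {}
  end

  def upload(path, content)
    @files[path] = content
  end

  def delete(path)
    @files.delete(path)
  end

  def exist?(path)
    @files.key?(path)
  end
end

# Archives a media file: upload to the archive root first, then delete the original.
def archive_file(store, core_directory:, archive_directory:, file_name:, content:)
  source = "#{core_directory}/assets/content/#{file_name}"
  destination = "#{archive_directory}/#{file_name}"
  store.upload(destination, content)
  store.delete(source)
  destination
end

store = FakeFileStore.new
store.upload("SP_CMES-Pi/assets/content/12345_guide.pdf", "pdf-bytes")
puts archive_file(store, core_directory: "SP_CMES-Pi", archive_directory: "SP_CMES-Pi_Archive",
                         file_name: "12345_guide.pdf", content: "pdf-bytes")
```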
Synchronization between SkillRX and the Azure File Storage has two main aspects: training material media and generated files. The media can be synchronized in real time but file generation and synchronization will need to be handled asynchronously.
With the generated xml, csv, and txt files, each change within SkillRX can result in changes in multiple files. A series of changes during an editing session can result in a cascading set of updates. We will handle file generation as a scheduled task, executed at intervals (currently once per day). We will also provide a way for admins to trigger a file generation/sync through the user interface.
With training materials, we are maintaining two separate repositories with the same content but with different organizational structures: the Raspberry Pi directories (one for each language) and the mini computer directories. Files are added, replaced, or archived (when a topic is archived); in rare cases, and only by admins, they can be deleted. After any of those changes is made, we will update the appropriate locations in Azure storage following the storage requirements listed above.
For example: An editor creates a new topic in Spanish and uploads three training material files. Once the update is finished and the files are in S3, we will upload all three files to the directories for the Spanish language mini computers and for Spanish language Raspberry Pis.
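The fan-out in this example can be sketched as a list of upload destinations. `media_destinations` is a hypothetical helper, and the "SP_" prefix and the directory names are assumed from the naming described in the storage requirements.

```ruby
# Hypothetical helper: every training material file goes to both the Mini and
# the Pi core directory for the topic's language.
def media_destinations(prefix, file_names)
  core_directories = ["#{prefix}CMES-Mini", "#{prefix}CMES-Pi"]
  core_directories.product(file_names).map do |directory, file_name|
    "#{directory}/assets/content/#{file_name}"
  end
end

files = ["77_intro.pdf", "77_lecture.mp3", "77_summary.pdf"]
puts media_destinations("SP_", files)
```

Three uploaded files therefore produce six Azure uploads, three per core directory.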
This can be handled via background jobs after a given Topic is updated, or it can be handled alongside the sync of the generated files. This will be worked out during development.