To implement #179, a mechanism should be created that, for each transcript, looks for (new) triples of:
1. corrected treebank XML file
2. initial annotations SAF file
3. corrected annotations SAF file
In theory, files 2 and 3 will only exist if a user manually corrected the transcript. This is not guaranteed, however, because the same file could be re-uploaded. A quick check of the SafReader results should reveal whether the initial and corrected annotations actually differ.
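The discovery step and the difference check can be sketched as follows. This is a minimal sketch under several assumptions: the one-directory-per-transcript layout, the file names, the `already_harvested` set and the `read_annotations` callable are all hypothetical. In SASTA, `read_annotations` would wrap SafReader, whose actual interface is not reproduced here; it only needs to return hashable records (for example tuples) so that the two SAF files can be compared independently of row order.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path
from typing import AbstractSet, Callable, Hashable, Iterable, Iterator

# Hypothetical file names inside one directory per transcript.
TREEBANK_NAME = "treebank_corrected.xml"
INITIAL_SAF_NAME = "annotations_initial.saf"
CORRECTED_SAF_NAME = "annotations_corrected.saf"


@dataclass(frozen=True)
class TranscriptTriple:
    """The three files to harvest for a single transcript."""

    transcript: str
    treebank: Path
    initial_saf: Path
    corrected_saf: Path


def find_new_triples(
    root: Path, already_harvested: AbstractSet[str] = frozenset()
) -> Iterator[TranscriptTriple]:
    """Yield complete triples for transcripts under `root` not harvested before."""
    for transcript_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if transcript_dir.name in already_harvested:
            continue
        paths = [
            transcript_dir / name
            for name in (TREEBANK_NAME, INITIAL_SAF_NAME, CORRECTED_SAF_NAME)
        ]
        # Skip transcripts where any of the three files is missing.
        if all(p.is_file() for p in paths):
            yield TranscriptTriple(transcript_dir.name, *paths)


def is_real_correction(
    triple: TranscriptTriple,
    read_annotations: Callable[[Path], Iterable[Hashable]],
) -> bool:
    """Return True if the corrected SAF differs from the initial SAF once parsed.

    `read_annotations` is a stand-in for SafReader (hypothetical interface).
    Records are compared as multisets, so row order does not matter but
    duplicates do.
    """
    initial = Counter(read_annotations(triple.initial_saf))
    corrected = Counter(read_annotations(triple.corrected_saf))
    return initial != corrected
```

Comparing parsed records rather than raw bytes means a re-uploaded copy of the initial file is not counted as a correction, and formatting-only differences are ignored as long as the reader normalizes them away.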
Periodic harvesting could be performed using a Celery periodic task, with a simple bash script that adds the files to sasta-datasets in a structured format.