Goals:
- make the folder structure more like the ParlaMint one
- add samples and intermediate statuses
- add README files to folders in order to document their content
- use symlinks to link external tools like perlbrew, Saxon, ...
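For the symlink goal, here is a minimal Python sketch of an idempotent "link an external tool into the repository" step. The helper name link_external_tool is hypothetical, and the issue does not say where the links should live or where perlbrew or Saxon are installed, so both paths are left to the caller.

```python
from pathlib import Path


def link_external_tool(target, link) -> Path:
    """Make `link` a symlink to an external tool installed outside the repo.

    Hypothetical helper: the issue only says external tools (perlbrew,
    Saxon, ...) should be linked via symlinks; both paths are the caller's choice.
    """
    target = Path(target).expanduser().resolve()
    link = Path(link)
    if not target.exists():
        raise FileNotFoundError(f"external tool not found: {target}")
    link.parent.mkdir(parents=True, exist_ok=True)
    # Replace an existing (possibly stale) symlink, e.g. after upgrading Saxon,
    # but refuse to overwrite a real file or directory.
    if link.is_symlink():
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=target.is_dir())
    return link
```

Re-running the helper after a tool upgrade simply repoints the link. A real setup might instead keep the link targets in a small config file read by the Makefile.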
The structure:
- Scripts/
- StaticData/
  - Taxonomies/ - ParCzech and ParlaMint taxonomies (currently in src/metadater/taxonomies/)
  - Metadata/ - manually added metadata and annotations
    - org-coalition-opposition.tsv - currently in src/metadater/parczech_coal-opp.tsv (orientation has been bootstrapped from ParlaMint)
    - org-ana.tsv - annotation of organizations (src/psp-db/org-ana.unl)
  - Patches/ - string replacements, such as:
    - translations.tsv - maybe only one file is needed, with an extra column for context (for context-specific translations)
- Build/ - don't add to git; use the same structure as Samples/
- Samples/
  - .gitignore
  - Sources/
    - html/
      - {date}/ - used in the Build folder; the date when the source was downloaded
      - sample/ - static sample data
      - html.tsv - metainformation such as date, meeting, house, status, ...
        - there are also some extraordinary meetings: https://www.psp.cz/eknih/2021ps/stenprot/220615/s901001.htm
        - URL structure:
          - https://www.psp.cz/eknih/2013ps/stenprot/001schuz/s001001.htm
          - https://www.psp.cz/eknih/2013ps/stenprot/{MEETING}schuz/s{MEETING}{PAGE}.htm
        - columns:
          - downDate
          - hash
          - filePath
          - url
          - authorised
          - audioUrl
    - Audio/
      - audio.tsv - columns:
        - downDate
        - hash
        - filepath
        - url
        - urlSeen - leave it empty when the URL is guessed
        - status - records when the URL does not lead to a file
    - metadata/ - will contain multiple sources:
      - government web pages
      - SQL tables from PSP
      - metadata.tsv
  - Working/ (intermediate folders and files)
    - text-tei-like/ (preserve download date)
    - text-notes/
    - text-udpipe/
    - text-audio-in/
    - text-nametag/
    - meta-...... - ??? TODO specify
    - audio-...... - ??? TODO specify
    - speaker-person.tsv - linking file, speaker, person, role
  - Logs/
  - Results/ (or Dist/??? but that is a bit strange in the Samples folder) - contains the compiled corpora
    - ParCzech.TEI/
    - ParCzech.TEI.ana/
    - ParCzech.TEITOK/ - publish it on TEITOK-dev
    - AudioPSP/
    - ParlaMint.Source-TEI/ (this format goes to the ParlaMint pipeline)
    - ParlaMint-CZ.TEI/
    - ParlaMint-CZ.TEI.ana/
    - ParlaMint.Sample/ (the content of this folder can be pushed to the ParlaMint repository through a pull request)
    - ParlaSpeech/ - Source - ???
- Docs/
- Schema/
- Makefile
- licence
- contributing
- citing
- README.md
- .gitignore
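The html.tsv URL pattern and the html.tsv and audio.tsv column lists above can be made concrete with a small Python sketch. Everything here is an assumption layered on the issue: the helper names are hypothetical, the 3-digit zero padding is inferred from the s001001.htm example, and SHA-256, 0/1 for authorised, ISO dates, storing the seen URL in urlSeen, and the status values "ok"/"missing" are placeholders because the issue fixes none of them.

```python
import csv
import hashlib
from datetime import date
from typing import IO, Iterable, List, Optional

# Column orders follow the html.tsv and audio.tsv lists in this issue.
HTML_TSV_COLUMNS = ["downDate", "hash", "filePath", "url", "authorised", "audioUrl"]
AUDIO_TSV_COLUMNS = ["downDate", "hash", "filepath", "url", "urlSeen", "status"]


def stenprot_url(term: str, meeting: int, page: int) -> str:
    """URL of one stenographic-record page of a regular meeting.

    3-digit zero padding is inferred from .../001schuz/s001001.htm.
    Extraordinary meetings (e.g. .../2021ps/stenprot/220615/s901001.htm)
    use a different layout and are not covered here.
    """
    return (f"https://www.psp.cz/eknih/{term}/stenprot/"
            f"{meeting:03d}schuz/s{meeting:03d}{page:03d}.htm")


def sha256_hex(content: bytes) -> str:
    # Assumption: the issue does not say which hash function to use.
    return hashlib.sha256(content).hexdigest()


def html_tsv_row(content: bytes, file_path: str, url: str, authorised: bool,
                 audio_url: str = "", down_date: Optional[date] = None) -> dict:
    """One html.tsv row (hypothetical helper; 0/1 for authorised is assumed)."""
    return {
        "downDate": (down_date or date.today()).isoformat(),
        "hash": sha256_hex(content),
        "filePath": file_path,
        "url": url,
        "authorised": "1" if authorised else "0",
        "audioUrl": audio_url,
    }


def audio_tsv_row(url: str, guessed: bool, content: Optional[bytes],
                  file_path: str = "", down_date: Optional[date] = None) -> dict:
    """One audio.tsv row (hypothetical helper).

    urlSeen stays empty when the URL was guessed; otherwise it holds the URL
    (assumed). content=None means the URL did not lead to a file, which is
    marked with the hypothetical status "missing".
    """
    found = content is not None
    return {
        "downDate": (down_date or date.today()).isoformat(),
        "hash": sha256_hex(content) if found else "",
        "filepath": file_path if found else "",
        "url": url,
        "urlSeen": "" if guessed else url,
        "status": "ok" if found else "missing",
    }


def write_tsv(rows: Iterable[dict], columns: List[str], stream: IO[str]) -> None:
    """Write rows as tab-separated values with a header line."""
    writer = csv.DictWriter(stream, fieldnames=columns,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

For example, stenprot_url("2013ps", 1, 1) returns https://www.psp.cz/eknih/2013ps/stenprot/001schuz/s001001.htm, the first example URL in the issue. Keeping the column orders in one place means Build/ and Samples/ write identical headers.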