This page includes all the materials for the course KKLT0030 Automatic text processing (5 credits).
The course Moodle page has the private materials, such as recordings (if any) and announcements.
- Getting started
- Notebook 1
  - Commands
    - Getting data and printing stuff: wget, echo
    - Printing files: cat, head, tail
    - Copying, renaming, removing: cp, mv, rm
    - Others: wc -w, ls
- Notebook 2
  - Commands: egrep, sort, uniq
  - Options
    - egrep -v, -i, -w, -c, -B, -A
    - head -n, tail -n
    - wc -l, -w
    - uniq -c, sort -r, -n
  - Pipes, especially frequency counts
    - sort | uniq -c | sort -rn
- Notebook 3 exercises
- Notebook 4
  - git clone for cloning GitHub repositories
  - Gzipped files using gzip and zcat
  - Changing characters using tr
  - Combining tr with a frequency list pipeline
  - Using tr to normalize text
  - Regular expressions
- Notebook 5 exercises
- Notebook 6
  - Dependency syntax analysis pipeline
    - Sentence and token segmentation, lemmatisation, part-of-speech (POS) tagging, dependency parsing
  - CoNLL-U format
  - Universal Dependencies treebanks
  - Trankit parser
- Notebook 7
  - Recap
- Notebook 8
  - Working on the server
    - Directory structure, files and folders
- Notebook 8 cont'd
  - Scripts
    - stdin/stdout, command-line arguments
- Notebook 9
  - Recap of Notebook 8 subjects
- Notebook 9
  - Perl substitution
- Notebook 10
  - For loops
- More for loops
- Recap
- FULL RECAP
- Q&A
- EXTRA: Python with Bash
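Several of the topics above come together in a word frequency list: zcat for gzipped input, tr for lowercasing and tokenizing, and the sort | uniq -c | sort -rn pipeline followed by head -n. Below is a minimal sketch that assumes standard GNU/Linux command-line tools. The file name sample.txt and its text are hypothetical placeholders, not course data.

```shell
#!/usr/bin/env bash
# Sketch of a word frequency list pipeline (assumes GNU/Linux tools).
# sample.txt and its contents are hypothetical placeholders.

# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Create a tiny sample text and compress it. gzip -c writes to stdout,
# so the uncompressed original is kept as well.
printf 'The cat sat.\nThe dog sat, the cat ran!\n' > sample.txt
gzip -c sample.txt > sample.txt.gz

# 1. zcat decompresses the file to stdout
# 2. tr 'A-Z' 'a-z' lowercases the text (normalization)
# 3. tr -cs 'a-z' '\n' turns every run of non-letters into one newline,
#    giving one token per line
# 4. sort | uniq -c counts identical lines, sort -rn puts the most
#    frequent first, head -n 3 keeps the top three
top=$(zcat sample.txt.gz \
  | tr 'A-Z' 'a-z' \
  | tr -cs 'a-z' '\n' \
  | sort | uniq -c | sort -rn | head -n 3)

echo "$top"
```

With this sample text, the first output line is `3 the` (with leading spaces from uniq -c), followed by cat and sat with a count of 2 each. Because tr -cs 'a-z' '\n' treats every character outside a-z as a separator, letters such as ä and ö would split words, so text containing them would need a different character set.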