Paper: Developing a Machine Learning Algorithm for Wikipedia Vandalism Detection with Logistic Regression
Using the PAN-WVC-10 corpus: Potthast et al.
Many thanks to Professor Franceska Xhakaj for the guidance and support throughout.
# Just for my own reference:
python main.py >& results.txt &
tail -f results.txt
TODO:
- data from getdata.py contains multi-line comments, which throw an index-out-of-bounds error
- tune feature params
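
Possible approach for the multi-line comment issue, as a minimal sketch only. It assumes the data is CSV with the comment field quoted, which is a guess about what getdata.py produces. In that case Python's csv module keeps embedded newlines inside one field, where splitting the file on newlines would break a record apart. The column names, the sample rows, and the read_rows helper are hypothetical and not taken from this project.

```python
import csv
import io


def read_rows(handle, expected_fields):
    """Yield parsed rows, skipping any row with an unexpected field count."""
    for row in csv.reader(handle):
        if len(row) != expected_fields:
            # Malformed record: skip it instead of indexing past the end later.
            continue
        yield row


# Hypothetical sample: the second record has a comment spanning two lines.
sample = (
    "editid,comment,class\n"
    '1,"first line\nsecond line",vandalism\n'
    "2,ok,regular\n"
)

rows = list(read_rows(io.StringIO(sample), expected_fields=3))
for row in rows:
    print(row)
```

When reading from a real file, the csv documentation recommends opening it with newline="" so that embedded newlines inside quoted fields are preserved correctly.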