On Wikipedia, anyone can edit a page.
The history of a page contains a snapshot of the page for each edit.
On Wikipedia, anyone can also undo an edit and restore the previous version; this is called a revert.
All the important code is in src/main/.
The create folder contains all the files used to compute a dataset.
The analyze folder contains all the files used to analyze a dataset.
File names are easier to understand with an example:
c_admin_user_reverts_month_tsv.py
type_class_aggregation_name_month_format.py
type = create (c) or analyze (a)
class = the logical class the file belongs to; the current classes are admin, chains and generic
aggregation = whether the file aggregates by user or by page
name = the name of the metric being computed
month (optional) = present if the file aggregates by month
format = for type c, the format of the output file (tsv or json)
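The naming scheme above can be sketched as a small parser. This is only an illustration: the function name parse_script_name is made up here, and it assumes the format token is always tsv or json and that the name part may itself contain underscores.

```python
from pathlib import Path

TYPES = {"c": "create", "a": "analyze"}
FORMATS = {"tsv", "json"}  # assumption: the only output formats in use


def parse_script_name(filename):
    """Split a script file name into the parts of the naming scheme."""
    parts = Path(filename).stem.split("_")
    info = {
        "type": TYPES[parts[0]],
        "class": parts[1],
        "aggregation": parts[2],
        "month": False,
        "format": None,
    }
    rest = parts[3:]
    # The optional trailing tokens are read from the right: format, then month.
    if rest and rest[-1] in FORMATS:
        info["format"] = rest.pop()
    if rest and rest[-1] == "month":
        info["month"] = True
        rest.pop()
    info["name"] = "_".join(rest)  # whatever is left is the metric name
    return info


print(parse_script_name("c_admin_user_reverts_month_tsv.py"))
```

For the example file this gives type create, class admin, aggregation user, name reverts, month set, and format tsv.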
The Wikimedia dataset is a TSV file in which each line is an event. There are different types of events:
- revision: edit
- page: create, move, restore, etc. of a page
- user: create, delete, change group (e.g. become an admin)
 
This is the official information page.
This is my information page about the fields I use.
This is the Google Doc that contains information about the metrics.
From the Wikimedia dataset I computed several other datasets.
sorted_by_pages.tsv: same as the Wikimedia dataset, but containing only revision events and sorted by page name
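A minimal sketch of the filter-then-sort step, not the actual sort_dataset.py. The column positions are passed in as parameters (entity_col and title_col are hypothetical names) because the real column layout is not listed here, and everything is loaded in memory, so this only works on small samples.

```python
def sort_revisions_by_page(in_path, out_path, entity_col, title_col):
    """Keep only revision events and write them sorted by page title.

    entity_col and title_col are 0-based column indexes chosen by the caller;
    they are assumptions of this sketch, not values taken from the dataset.
    """
    with open(in_path, encoding="utf-8") as src:
        rows = [line.rstrip("\n").split("\t") for line in src]
    # Filtering: drop page and user events.
    revisions = [row for row in rows if row[entity_col] == "revision"]
    # Sorting: list.sort is stable, so the original order is kept within a page.
    revisions.sort(key=lambda row: row[title_col])
    with open(out_path, "w", encoding="utf-8") as dst:
        for row in revisions:
            dst.write("\t".join(row) + "\n")
    return len(revisions)
```

On the full dataset an external sort (for example splitting into chunks and merging) would be needed instead of an in-memory list.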
year      edits reverts/reverted
2001          6       0
2002       1472       2
2003      23907     207
2004     431129   13592
2005    1657387   88938
2006    4894301  314799
2007    8109283  636507
2008    8818419  867780
2009    8803086  884355
2010    9242952  988517
2011    9394522 1175935
2012    9764942 1173113
2013    8937768 1064283
2014    7376637  825818
2015    8543088 1088518
2016    8242733 1028889
2017    9307301 1019460
2018    9752861 1045824
2019    9186886 1067841
2020    9130259 1041494
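To read the table as a ratio, the share of reverts over edits can be computed per year. This is a small sketch that assumes the last column counts revert edits; only a few rows are copied in, and revert_share is a name made up for the example.

```python
# (year, edits, reverts) rows copied from the table above
rows = [
    (2005, 1657387, 88938),
    (2010, 9242952, 988517),
    (2020, 9130259, 1041494),
]


def revert_share(edits, reverts):
    """Fraction of edits that are reverts (0.0 when there are no edits)."""
    return reverts / edits if edits else 0.0


for year, edits, reverts in rows:
    print(f"{year}: {revert_share(edits, reverts):.1%}")
```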
A chain occurs when the edit targeted by a revert is itself a revert (which may in turn belong to a chain).
For each page, every chain is saved together with some statistics about it.
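The chain definition can be sketched as follows. This is an illustration under assumptions, not the code in src/main/: the input is taken to be a chronological list of (revision_id, target_id) pairs for one page, and two reverts that target the same revert are simply put in the same chain.

```python
def find_chains(reverts):
    """Group the reverts of one page into chains.

    reverts: list of (revision_id, target_id) pairs in chronological order,
    where target_id is the revision being undone (assumed input shape).
    """
    seen_reverts = set()  # ids of the reverts met so far
    chain_of = {}         # revision id -> the chain (list) it belongs to
    chains = []
    for rev, target in reverts:
        if target in seen_reverts:
            # The target is itself a revert: extend its chain or start one.
            chain = chain_of.get(target)
            if chain is None:
                chain = [target]
                chain_of[target] = chain
                chains.append(chain)
            chain.append(rev)
            chain_of[rev] = chain
        seen_reverts.add(rev)
    return chains


example = [("r1", "e0"), ("r2", "r1"), ("r3", "r2"), ("r4", "e9")]
print(find_chains(example))
```

In the example r1 undoes an ordinary edit, r2 undoes r1 and r3 undoes r2, so the three reverts form one chain, while r4 stays outside of it.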
wars_json/pages
{
    "title": "Loligo_vulgaris", 
    "chains": 
    [{
        "revisions": ["113715375", "113715381", "113715393"], 
        "users": {"62.18.117.244": "", "Leo0428": "17181"}, 
        "len": 3, 
        "start": "2020-06-15 22:16:23.0", 
        "end": "2020-06-15 22:17:38.0"
    }], 
    "n_chains": 1, 
    "n_reverts_in_chains": 3, 
    "n_reverts": 38,
    "mean": 3.0, 
    "longest": 3, 
    "G": 0,
    "M": 0, 
    "lunghezze": {"3": 1}
}
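The summary fields of this record can be recomputed from the chains list. The sketch below assumes that n_reverts_in_chains is the sum of the chain lengths and that lunghezze (lengths) maps each chain length to the number of chains with that length, which is consistent with the sample above; summarize_chains is a name chosen for the example.

```python
from collections import Counter


def summarize_chains(chains):
    """Recompute the per-page chain statistics from a list of chain records."""
    lengths = [chain["len"] for chain in chains]
    return {
        "n_chains": len(lengths),
        "n_reverts_in_chains": sum(lengths),
        "mean": sum(lengths) / len(lengths) if lengths else 0.0,
        "longest": max(lengths, default=0),
        # JSON object keys are strings, so the lengths are stored as strings.
        "lunghezze": {str(n): count for n, count in Counter(lengths).items()},
    }


chains = [{
    "revisions": ["113715375", "113715381", "113715393"],
    "users": {"62.18.117.244": "", "Leo0428": "17181"},
    "len": 3,
    "start": "2020-06-15 22:16:23.0",
    "end": "2020-06-15 22:17:38.0",
}]
print(summarize_chains(chains))
```

G and M are left out because their formulas are not given here.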
Similarly, it is possible to see every chain a user was involved in.
wars_json/users
{
        
    "user": "80.181.45.118",
    "chains": [
        {
            "page": "Puppy_Dog_Pals",
            "revisions": [ "109421725", "109422928", "109422931","109465730"],
            "users": { "80.181.45.118": "",  "Moxmarco": "10204", "Sakretsu": "75109" },
            "len": 4,
            "start": "2019-12-14 13:34:12.0",
            "end": "2019-12-16 23:08:09.0"
        }
    ],
    "n_chains": 1,
    "n_reverts": 4,
    "mean": 4,
    "longest": 4,
    "G": [ 0, "{'87.19.234.101', 'ValeJappo', '80.181.45.118', 'Moxmarco', 'Sakretsu'}"],
    "lunghezze": { "3": 1 }
    
}
From this JSON I computed the metrics by month, adding the more_than and involved fields.
monthly pages
title    year_month    nchain   nrev    mean    longest     more_than5      more_than7      more_than9      G   involved
monthly users
user    year_month    nchain   nrev    mean    longest     more_than5      more_than7      more_than9      G    involved
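A rough sketch of the monthly aggregation for pages. It assumes that a chain belongs to the month of its start timestamp, that more_thanN counts the chains strictly longer than N, and that involved counts the distinct users in the chains of that month; none of these is confirmed here, and G is left out. monthly_rows is a name chosen for the example.

```python
from collections import defaultdict


def monthly_rows(title, chains):
    """Aggregate the chains of one page into one row per year_month."""
    by_month = defaultdict(list)
    for chain in chains:
        # "2020-06-15 22:16:23.0" -> "2020-06"
        by_month[chain["start"][:7]].append(chain)

    rows = []
    for month in sorted(by_month):
        group = by_month[month]
        lengths = [chain["len"] for chain in group]
        involved = set()
        for chain in group:
            involved.update(chain["users"])  # the keys are the user names
        rows.append({
            "title": title,
            "year_month": month,
            "nchain": len(group),
            "nrev": sum(lengths),
            "mean": sum(lengths) / len(lengths),
            "longest": max(lengths),
            "more_than5": sum(n > 5 for n in lengths),
            "more_than7": sum(n > 7 for n in lengths),
            "more_than9": sum(n > 9 for n in lengths),
            "involved": len(involved),
        })
    return rows
```

The same grouping would apply to users by replacing title with user.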
I also computed data about the group of the reverter and of the reverted user. A user can be:
- adm: sysop, administrator
- reg: registered but not admin
- not: anonymous user
 
adm_adm refers to the number of reverts an admin made to another admin; reg refers to the number of reverts made by registered users (admins included).
NB: the last two fields are not_reg and reg; in this case reg means registered users including admins.
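As an illustration, the reverter_reverted labels could be counted like this. The sketch assumes that an anonymous user has an empty user id (as in the chain JSON) and that admins are the users whose groups include sysop; the ids and group lists in the example are hypothetical.

```python
from collections import Counter


def user_group(user_id, groups):
    """Return 'not' for anonymous users, 'adm' for sysops, 'reg' otherwise."""
    if not user_id:  # assumption: anonymous users have an empty id
        return "not"
    return "adm" if "sysop" in groups else "reg"


def count_group_pairs(reverts):
    """Count reverts per label, e.g. adm_reg = an admin reverting a registered user.

    reverts: iterable of ((reverter_id, reverter_groups),
                          (reverted_id, reverted_groups)) pairs.
    """
    return Counter(
        f"{user_group(*reverter)}_{user_group(*reverted)}"
        for reverter, reverted in reverts
    )


# Hypothetical ids and group memberships, for illustration only.
example = [
    (("1", ["sysop"]), ("2", ["sysop"])),
    (("3", []), ("1", ["sysop"])),
    (("", []), ("3", [])),
]
print(count_group_pairs(example))
```

The aggregated reg column (registered users including admins) is not derived here, since how it combines the other columns is not spelled out.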
The data contains information about reverts and mutual reverts. A mutual revert happens when, on the same page, A reverts B and B then reverts A.
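The mutual revert definition can be sketched as a set lookup. The input shape, a list of (reverter, reverted) user pairs for a single page, is an assumption, and the order in which the two reverts happen is ignored in this simplified version.

```python
def mutual_pairs(reverts):
    """Return the unordered user pairs that reverted each other on one page.

    reverts: iterable of (reverter, reverted) user pairs for a single page.
    """
    seen = set(reverts)
    return {
        frozenset((a, b))
        for a, b in seen
        if a != b and (b, a) in seen  # skip self-reverts, require both directions
    }


example = [("A", "B"), ("B", "A"), ("A", "C")]
print(mutual_pairs(example))
```

Here A and B form a mutual pair, while A reverting C is one-directional and is not counted.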
M is the controversiality metric computed by Yasseri.
G is a metric similar to M that evaluates the chains in a page (or of a user): when users with a large edit count are involved in a chain, G is larger.
reverts by page
page_id     page_name   year_month   adm_adm    adm_reg     reg_adm     reg_reg     not_reg     reg
mutual reverts by page
page_id     page_name   year_month    adm_adm    adm_reg     reg_reg     not_reg     reg
reverts by user
user     group    year_month    tot_received     t_reg     t_not     t_adm     tot_done     d_reg     d_not     d_adm
mutual reverts by user
user    group   page_name   year_month  mutual_with_admin   mutual_with_reg  mutual_with_not
sort_dataset.py: 1h for filtering, 15min for sorting, 1h15m in total