Hi! Great dataset!
I am interested in experimenting with it for my own work, as well as comparing ML approaches on it. I would like to get the data as a table (amenable to pandas and the like) while keeping the "raw" data, i.e. raw text, labels, a column marking which source each row came from, dates, reviewer number as an ID column, etc.
I know the pipeline munges these features, but the output is too processed for my purposes. Where in the code should I look to get the intermediate outputs?
E.g. a CSV of all reviews and their texts, with the raw variables (each in its own column), across all the datasets, including the train/test splits?
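
To make the target shape concrete, here is a rough sketch of the layout I have in mind. The column names (`source`, `split`, `reviewer_id`, `date`, `label`, `text`), the nested-dict input, and the sample records are all hypothetical placeholders, not anything taken from the actual pipeline or data.

```python
import csv
import io

# Hypothetical column layout; the real field names depend on the datasets.
COLUMNS = ["source", "split", "reviewer_id", "date", "label", "text"]


def flatten(datasets):
    """Yield one row per review, tagged with its source dataset and split."""
    for source, splits in datasets.items():
        for split, reviews in splits.items():
            for review in reviews:
                yield {"source": source, "split": split, **review}


def to_csv(datasets):
    """Serialize all reviews into one CSV string, one column per raw variable."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(flatten(datasets))
    return buf.getvalue()


# Made-up placeholder records, only to illustrate the shape.
example = {
    "source_X": {
        "train": [
            {"reviewer_id": 1, "date": "2020-01-01", "label": "pos",
             "text": "Raw review text."},
        ],
        "test": [
            {"reviewer_id": 2, "date": "2020-01-02", "label": "neg",
             "text": "Another raw review."},
        ],
    },
}

csv_text = to_csv(example)
```

A file in this long format loads directly with `pandas.read_csv`, and filtering on the `source` or `split` column would recover each dataset or split.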
Thanks!