- Text analytics code for the Lima Andina project.
- This is a selection of analyses and visualizations performed on the Lima Andina datasets, which contain more than 6,500 El Comercio newspaper ads published by internal migrants' organizations in Lima between 1906 and 1933.
- We have made public two datasets, "Associations" and "Articles", through the Borealis repository.
- From the description in the Borealis repository:
"The first contains information on the associations, and the second contains information on advertisements published in El Comercio by those associations. These announcements typically concerned upcoming and past meetings, elections and other association activities. In aggregate, the dataset reveals patterns in association activity that in turn illuminate the history of internal migration in early-20th century Peru."
- Example of a TensorFlow Embedding Projector visualization of semantic relationships in the corpus.
- Change the visualization settings to Uniform Manifold Approximation and Projection (UMAP) for a more complete view of the embedding space.
- DISCLAIMER: The code in this repository is made available only as a way of documenting the Digital Humanities side of the project. You can download the Jupyter notebooks and experiment with them at your own risk. The dataset used in these notebooks differs from the dataset available in Borealis but contains the same textual information in terms of the newspaper ads. Here we use a somewhat untidy dataset to illustrate data cleaning techniques.
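
For readers who want to load their own vectors into the Embedding Projector, as in the visualization mentioned above, the sketch below shows one way to write the two tab-separated files the tool accepts: one row of numbers per vector, and one label per line with no header row when the metadata has a single column. This is not taken from the project notebooks; the function name `write_projector_files`, the toy labels, and the toy vectors are all hypothetical and stand in for whatever embeddings you have trained.

```python
import csv
import io


def write_projector_files(labels, vectors, vec_out, meta_out):
    """Write embeddings as two TSV streams: vectors and single-column metadata.

    Hypothetical helper for illustration; not part of the Lima Andina notebooks.
    """
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")

    # One vector per row, components separated by tabs.
    vec_writer = csv.writer(vec_out, delimiter="\t", lineterminator="\n")
    for vec in vectors:
        vec_writer.writerow([f"{x:.6f}" for x in vec])

    # Single-column metadata: one label per line, no header row.
    # Collapse internal whitespace so a stray tab or newline cannot
    # shift the labels out of step with the vectors.
    for label in labels:
        meta_out.write(" ".join(str(label).split()) + "\n")


# Toy data (invented for this example, not drawn from the corpus).
labels = ["asamblea", "elección", "sesión"]
vectors = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.25], [1.0, 0.0, -1.0]]

vec_buf, meta_buf = io.StringIO(), io.StringIO()
write_projector_files(labels, vectors, vec_buf, meta_buf)
```

In practice you would pass real file handles, for example `open("vectors.tsv", "w", encoding="utf-8", newline="")`, instead of the in-memory buffers, and then load both files through the projector's load dialog.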