lima-andina-text-analytics

Text analytics code for the Lima Andina project.
This repository contains a selection of the analyses and visualizations performed on the Lima Andina datasets, which comprise more than 6,500 El Comercio newspaper ads published by internal migrant organizations in Lima between 1906 and 1933.
We have made two datasets, "Associations" and "Articles", public through the Borealis repository.
The Borealis repository describes them as follows:

"The first contains information on the associations, and the second contains information on advertisements published in El Comercio by those associations. These announcements typically concerned upcoming and past meetings, elections and other association activities. In aggregate, the dataset reveals patterns in association activity that in turn illuminate the history of internal migration in early-20th century Peru."

Table of contents

Exploratory analysis and visualizations

Pre-processing

Named Entities

Topic modelling

Syntactic similarity

Embeddings

Example of a TensorFlow Embedding Projector visualization of semantic relationships in the corpus (external link).
- Change the visualization settings to Uniform Manifold Approximation and Projection (UMAP) for a more complete view of the embedding space.
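
The Embedding Projector loads embeddings from two tab-separated files: one with a vector per line and one with the matching label per line (a single-column label file has no header row). The files `articulos_w2v_sg_30_vecs.tsv` and `articulos_w2v_sg_30_words.tsv` in this repository presumably follow that layout. The sketch below shows one way such a pair of files could be written; the function name `write_projector_files` and the toy vectors are hypothetical and are not taken from the project notebooks.

```python
import os
import tempfile


def write_projector_files(embeddings, vecs_path, words_path):
    """Write word vectors as two parallel TSV files.

    embeddings: dict mapping each word to a sequence of floats.
    Line i of vecs_path holds the vector for the word on line i of words_path.
    Returns the number of vectors written.
    """
    # All vectors must share one dimensionality, or the rows will not line up.
    dims = {len(vec) for vec in embeddings.values()}
    if len(dims) > 1:
        raise ValueError("all vectors must have the same dimensionality")

    with open(vecs_path, "w", encoding="utf-8", newline="\n") as vecs_file, \
         open(words_path, "w", encoding="utf-8", newline="\n") as words_file:
        for word, vec in embeddings.items():
            # One vector per line, components separated by tabs.
            vecs_file.write("\t".join(f"{x:.6f}" for x in vec) + "\n")
            # One label per line, no header row (single-column metadata).
            words_file.write(word + "\n")
    return len(embeddings)


# Toy, made-up vectors for illustration only (not values from the corpus).
toy_embeddings = {
    "asamblea": [0.1, -0.2, 0.3],
    "elecciones": [0.0, 0.5, -0.1],
}

out_dir = tempfile.mkdtemp()
vecs_path = os.path.join(out_dir, "toy_vecs.tsv")
words_path = os.path.join(out_dir, "toy_words.tsv")
count = write_projector_files(toy_embeddings, vecs_path, words_path)
print(count, "vectors written to", out_dir)
```

With a trained model, the toy dictionary would be replaced by the model's own word-to-vector mapping. The resulting two files can then be uploaded through the Projector's load option.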
DISCLAIMER: The code in this repository is made available only to document the Digital Humanities side of the project. You can download the Jupyter notebooks and experiment with them at your own risk. The dataset used in these notebooks differs from the one available in Borealis but contains the same newspaper ad text. Here we use a somewhat untidy dataset to illustrate data cleaning techniques.

Repository files:
LICENSE
README.md
articulos_w2v_sg_30_vecs.tsv
articulos_w2v_sg_30_words.tsv
lima-andia-exploracion-datos.ipynb
lima-andia-preprocesamiento-texto.ipynb
lima-andina-entidades-nombradas.ipynb
lima-andina-modelado-de-temas.ipynb
lima-andina-ncrustaciones.ipynb
lima-andina-similitud.ipynb
settings.py
setup.py

parejar/lima-andina-text-analytics
