Code, models, and data for Compositionality of Complex Graphemes in the Undeciphered Proto-Elamite Script using Image and Text Embedding Models, published in Findings of ACL 2021.
To build all models from scratch and generate the results from the paper, run
make all from the root directory.
Alternatively, each model has its own directory with a run.sh file that trains all versions of that model used in the paper. You must generate the input files with make .data before training any models.
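If you want to drive the per-model training from Python rather than by hand, the order described above could be sketched roughly as follows. This is only an illustration: the helper names `find_run_scripts` and `train_all` are hypothetical, and the sketch assumes that the model directories sit directly under the repository root, that each run.sh can be started with bash from its own directory, and that the make target is spelled exactly as written above.

```python
import subprocess
from pathlib import Path


def find_run_scripts(root):
    """Return every run.sh located one level below root, sorted by path."""
    return sorted(Path(root).glob("*/run.sh"))


def train_all(root="."):
    """Generate the input files, then run each model's run.sh in turn."""
    root = Path(root)
    # The input files must exist before any model is trained.
    # The target name is copied verbatim from the README text.
    subprocess.run(["make", ".data"], cwd=root, check=True)
    for script in find_run_scripts(root):
        # Assumption: each run.sh expects its own directory as the working directory.
        # check=True aborts on the first script that exits with an error.
        subprocess.run(["bash", script.name], cwd=script.parent, check=True)
```

Calling `train_all()` from the root directory would then run the data step once and each discovered script afterwards; `find_run_scripts(".")` on its own only lists what would be run.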
Pretrained models are included in pretrained/models.
Embeddings from all models used in the paper are included in pretrained/embeddings.
All statistics and analysis scripts are located in ocs\_pcs.
python metrics.py && python stats.py will compute PCS for every sign in every model and summarize the resulting scores.
python analogy.py computes the number of compositional signs and analogies in each model and outputs the results cited in the paper. For each value of k, a CSV file is saved to ocs\_pcs/csvs listing which signs are compositional, to what degree, and in which models.
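To inspect the per-k CSV files afterwards, a small loader along these lines may be enough. It is a sketch under stated assumptions: the function name `load_csvs` is hypothetical, the files are assumed to be UTF-8 with a header row, and no column names are hard-coded because the column layout is not described here.

```python
import csv
from pathlib import Path


def load_csvs(csv_dir="ocs_pcs/csvs"):
    """Map each CSV file name (without extension) to its rows as dicts."""
    tables = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # DictReader takes the column names from the header row of each file,
            # so this works whatever columns the analysis script wrote.
            tables[path.stem] = list(csv.DictReader(handle))
    return tables
```

All values come back as strings, so any numeric columns would need to be converted before comparing scores.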
These scripts use the pretrained embeddings included with the repository, so they can be run without retraining the models from scratch.
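For convenience, the analysis steps could also be chained from Python. The sketch below is an assumption-laden illustration, not part of the repository: the names `ANALYSIS_SCRIPTS`, `analysis_commands`, and `run_analysis` are hypothetical, the scripts are assumed to be run from inside the ocs\_pcs directory, and running analogy.py last is simply the order in which the scripts are listed above.

```python
import subprocess
import sys
from pathlib import Path

# Script names as given in the README; the order is an assumption.
ANALYSIS_SCRIPTS = ["metrics.py", "stats.py", "analogy.py"]


def analysis_commands(python=sys.executable):
    """Build one command per analysis script, in the order they should run."""
    return [[python, name] for name in ANALYSIS_SCRIPTS]


def run_analysis(repo_root="."):
    """Run the analysis scripts one after another inside ocs_pcs."""
    workdir = Path(repo_root) / "ocs_pcs"
    for command in analysis_commands():
        # check=True raises on a non-zero exit code, so later scripts are
        # skipped if an earlier one fails, much like && in the shell.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_analysis()` from the root directory would then mirror the shell commands shown earlier.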