This project implements a simple natural language translation system using pre-trained word embeddings. It supports basic English-to-French and English-to-Spanish word and sentence translation by leveraging cosine similarity between word vectors.
ics_project/
├── include/
│   ├── common.h
│   ├── eval.h
│   ├── globals.h
│   ├── io.h
│   ├── translate.h
│   └── vector.h
├── src/
│   ├── *_eval.c
│   ├── *_globals.c
│   ├── *_io.c
│   ├── *_main.c
│   ├── *_translate.c
│   └── *_vector.c
├── obj/
├── makefile        # Build instructions
├── main            # Compiled binary (after build)
├── .gitignore
└── LICENSE
If the data has not been downloaded yet, download it from here.
Unzip it and add the contents to the /data folder.
Build the project using:

    make

If this doesn't work, use:

    gcc -Iinclude src/*.c -o main -lm

Run the compiled binary:

    ./main

- Load and process word embeddings from files
- Translate individual words using top-k cosine similarity
- Translate entire sentences word-by-word
- Evaluate similarity and semantic closeness of translations
- Modular code architecture with clear separation of concerns
This project uses FastText word embeddings developed by Facebook AI Research (FAIR). Unlike Word2Vec, which learns a single vector per whole word, FastText builds each word's vector from its character n-grams, so subword information lets it handle out-of-vocabulary words. We used pre-trained aligned word vectors trained on Wikipedia and Common Crawl, which place words from different languages (e.g., English, French, Spanish) in the same vector space. Because the space is shared, a word can be translated by taking its English vector and finding the target-language words with the highest cosine similarity to it.
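
As a rough illustration of reading such embeddings, the sketch below parses the text `.vec` layout in which fastText distributes its pre-trained vectors: a header line with the word count and dimension, then one line per word followed by its components. Whether this project's loader works this way is an assumption; `parse_vec_header` and `parse_vec_line` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the header line "<word_count> <dim>".
   Returns 0 on success, -1 if the line is malformed. */
int parse_vec_header(const char *line, size_t *count, size_t *dim)
{
    unsigned long n, d;
    if (sscanf(line, "%lu %lu", &n, &d) != 2 || d == 0)
        return -1;
    *count = (size_t)n;
    *dim = (size_t)d;
    return 0;
}

/* Parse one embedding line "word v1 v2 ... v_dim".
   Copies the word into word (capacity word_cap, including the
   terminating NUL) and the components into vec.
   Returns 0 on success, -1 if the line is malformed or too long. */
int parse_vec_line(const char *line, char *word, size_t word_cap,
                   float *vec, size_t dim)
{
    assert(line != NULL && word != NULL && vec != NULL);

    /* The word is everything before the first space. */
    const char *sp = strchr(line, ' ');
    if (sp == NULL)
        return -1;
    size_t len = (size_t)(sp - line);
    if (len == 0 || len >= word_cap)
        return -1;
    memcpy(word, line, len);
    word[len] = '\0';

    /* strtof skips leading whitespace and reports where it stopped. */
    const char *p = sp;
    for (size_t i = 0; i < dim; i++) {
        char *end;
        vec[i] = strtof(p, &end);
        if (end == p)   /* no number found: too few components */
            return -1;
        p = end;
    }
    return 0;
}
```

A loader built on this would read the file line by line (for example with `fgets`), call `parse_vec_header` once, and then call `parse_vec_line` for each remaining line. Because the aligned English, French, and Spanish files share one vector space, vectors loaded from different files can be compared directly.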