SiddhantGahankari/RosettaVec

Translation Model

This project implements a simple natural language translation system using pre-trained word embeddings. It supports basic English-to-French and English-to-Spanish word and sentence translation by leveraging cosine similarity between word vectors.

Project Structure

ics_project/
├── include/          # Header files
│   ├── common.h
│   ├── eval.h
│   ├── globals.h
│   ├── io.h
│   ├── translate.h
│   └── vector.h
├── src/              # Source files
│   ├── *_eval.c
│   ├── *_globals.c
│   ├── *_io.c
│   ├── *_main.c
│   ├── *_translate.c
│   └── *_vector.c
├── obj/              # Object files
├── makefile          # Build instructions
├── main              # Compiled binary (after build)
├── .gitignore
└── LICENSE

Build Instructions

If the data is not already downloaded, download it from here.

Unzip it and add it to the /data folder.

Build the project using:

make

If this doesn't work, use:

gcc -Iinclude src/*.c -o main -lm

Running Instructions

./main

Features included in the project:

  • Load and process word embeddings from files
  • Translate individual words using top-k cosine similarity
  • Translate entire sentences word-by-word
  • Evaluate similarity and semantic closeness of translations
  • Modular code architecture with clear separation of concerns

Sources

This project uses FastText word embeddings developed by Facebook AI Research (FAIR). Unlike Word2Vec, FastText represents each word as a bag of character n-grams, which lets it handle out-of-vocabulary words using subword information. We used pre-trained aligned word vectors trained on Wikipedia and Common Crawl, which place words from different languages (e.g., English, French, Spanish) in the same vector space. Because the spaces are aligned, a word can be translated by using cosine similarity to find the closest vectors in the target language.
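Pre-trained FastText vectors are commonly distributed as text `.vec` files: a header line with the vocabulary size and dimension, followed by one line per word containing the word and its space-separated values. Assuming the data used here follows that layout, parsing a single entry line might look like the sketch below. The function name `parse_vec_line` is hypothetical and does not come from the project's `io.h`.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parses one line of the assumed form "word v1 v2 ... v_dim".
 * Copies the token into word (capacity word_cap) and the values into vec.
 * Returns 0 on success, -1 if the word is empty or too long,
 * or if fewer than dim numeric values are present. */
int parse_vec_line(const char *line, char *word, size_t word_cap,
                   float *vec, size_t dim)
{
    const char *p = line;
    size_t len = strcspn(p, " \t\r\n");   /* length of the leading token */

    if (len == 0 || len >= word_cap)
        return -1;
    memcpy(word, p, len);
    word[len] = '\0';
    p += len;

    for (size_t i = 0; i < dim; i++) {
        char *end;
        float v = strtof(p, &end);        /* skips leading whitespace itself */
        if (end == p)
            return -1;                    /* no number where one was expected */
        vec[i] = v;
        p = end;
    }
    return 0;
}
```

A loader would typically read the header line first to learn the dimension, then call a function like this on each line returned by `fgets`.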
