(SparseTransX) FastKG - A sparse implementation of translational KG embedding models

This is the official implementation of SparseTransX, described in the paper "SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations", which was accepted for publication at MLSys 2025, the 8th Annual Conference on Machine Learning and Systems.

arXiv: https://arxiv.org/abs/2502.16949
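
To illustrate what a sparse formulation of a translational model can look like, the sketch below computes TransE scores ||h + r - t|| in two ways: with ordinary embedding lookups, and with a single sparse-dense matrix product against a sparse incidence matrix. It is a minimal PyTorch illustration of the general idea, not code taken from this repository; the function names (transe_scores_dense, build_incidence, transe_scores_sparse) are hypothetical.

```python
import torch


def transe_scores_dense(ent, rel, triples, p=2):
    """Baseline: gather h, r, t rows and compute ||h + r - t||_p per triple."""
    h, r, t = triples[:, 0], triples[:, 1], triples[:, 2]
    return torch.linalg.vector_norm(ent[h] + rel[r] - ent[t], ord=p, dim=1)


def build_incidence(triples, n_ent, n_rel):
    """Build a sparse (batch x (n_ent + n_rel)) matrix with +1 at the head column,
    +1 at the relation column (offset by n_ent), and -1 at the tail column."""
    b = triples.shape[0]
    rows = torch.arange(b).repeat(3)
    cols = torch.cat([triples[:, 0], n_ent + triples[:, 1], triples[:, 2]])
    vals = torch.cat([torch.ones(b), torch.ones(b), -torch.ones(b)])
    # coalesce() sums duplicate entries, so a triple with head == tail cancels to 0.
    return torch.sparse_coo_tensor(
        torch.stack([rows, cols]), vals, (b, n_ent + n_rel)
    ).coalesce()


def transe_scores_sparse(ent, rel, triples, p=2):
    """Same scores, computed as one sparse-dense product A @ [ent; rel]."""
    emb = torch.cat([ent, rel], dim=0)  # stacked entity and relation embeddings
    incidence = build_incidence(triples, ent.shape[0], rel.shape[0])
    return torch.linalg.vector_norm(torch.sparse.mm(incidence, emb), ord=p, dim=1)


if __name__ == "__main__":
    torch.manual_seed(0)
    ent, rel = torch.randn(5, 4), torch.randn(2, 4)
    triples = torch.tensor([[0, 1, 3], [4, 0, 2], [1, 1, 1]])  # (head, rel, tail)
    print(transe_scores_dense(ent, rel, triples))
    print(transe_scores_sparse(ent, rel, triples))
```

Both functions should return the same scores up to floating-point error; the sparse version replaces three gathers and two additions with one matrix product.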

Note: This repository currently provides TransE only. The other models are available as raw code in the artifact evaluation repository: https://github.com/OnixHoque/sptransx-mlsys2025-reproduce/. We plan to migrate them here soon.

Installation

git clone https://github.com/HipGraph/SpTransX.git
cd SpTransX
pip install -e .

CPU/GPU Testing

To run the TransE test on the FB15k dataset:

cd ./tests
python trans_e.py

If you need to convert from RDF, TTL, or Neo4j format, please use the utility functions available at ./fastkg/converter.py.

MultiGPU/MultiNode Testing

FastKG is compatible with the PyTorch DDP (DistributedDataParallel) and FSDP (FullyShardedDataParallel) wrappers, which can be used for multi-GPU and multi-node training.
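
As a rough sketch of what DDP training looks like, the script below wraps a model in DistributedDataParallel and runs one margin-ranking step. TinyTransE is a hypothetical stand-in written for this example, not a FastKG class; the assumption is that a FastKG model would be wrapped in the same place. The toy batch and hyperparameters are illustrative only.

```python
import os

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyTransE(nn.Module):
    """Hypothetical stand-in model; swap in the real model here."""

    def __init__(self, n_ent, n_rel, dim):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)

    def forward(self, triples):
        h = self.ent(triples[:, 0])
        r = self.rel(triples[:, 1])
        t = self.ent(triples[:, 2])
        return torch.linalg.vector_norm(h + r - t, dim=1)


def train_step(model, optimizer, pos, neg, margin=1.0):
    """One margin-ranking step. With a DDP-wrapped model, gradients are
    all-reduced across ranks during backward()."""
    optimizer.zero_grad()
    scores = model(torch.cat([pos, neg], dim=0))  # single forward pass
    pos_score, neg_score = scores[: len(pos)], scores[len(pos):]
    loss = torch.relu(margin + pos_score - neg_score).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR and MASTER_PORT.
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    model = TinyTransE(n_ent=100, n_rel=10, dim=16).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # Toy batch; a real run would give each rank its own shard of the data.
    pos = torch.randint(0, 10, (32, 3), device=device)
    neg = pos.clone()
    neg[:, 2] = torch.randint(0, 100, (32,), device=device)  # corrupt the tails
    loss = train_step(ddp_model, optimizer, pos, neg)
    print(f"rank {dist.get_rank()} loss {loss:.4f}")
    dist.destroy_process_group()


# Only start when launched by torchrun, which sets RANK for every process.
if __name__ == "__main__" and "RANK" in os.environ:
    main()
```

Saved as, say, ddp_sketch.py (a hypothetical filename), this could be launched on one machine with `torchrun --nproc_per_node=2 ddp_sketch.py`.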

Streaming Dataset and Model

FastKG can stream both the model and the dataset from disk when they are too large to fit in CPU memory. Streaming is also available for distributed training. Examples are given below.

CPU/GPU

See ./tests/trans_e_stream_dataset.py and ./tests/trans_e_stream_model.py for examples of streaming the dataset and the model on demand instead of loading them entirely into CPU memory.

cd ./tests/
python trans_e_stream_dataset.py
# or
python trans_e_stream_model.py

Streaming Dataset

Create a StreamingSparseKGDataset instead of a SparseKGDataset.

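# Convert the tab-separated training triples into an on-disk database file (fb15k.db).
convert_nt_to_db('../fastkg/datasets/fb15k/train.txt', batch_size=50000, num_lines=None, sep='\t', db_name='fb15k.db')
# Stream batches from the database; b_size is the batch size defined elsewhere in the script.
dataset_sparse = StreamingSparseKGDataset('fb15k.db', batch_size=b_size, shuffle=False, drop_last=False, calculate_rel_stat=True)
# For validation (df_test is the test split, loaded elsewhere in the script):
from fastkg.streaming import map_entity_and_rel
map_entity_and_rel(df_test, 'fb15k.db')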
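
The general idea of streaming batches from disk can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the on-disk format that FastKG actually uses is not described here, and the table schema and the function names write_triples and stream_batches are hypothetical.

```python
import sqlite3


def write_triples(db_path, triples):
    """Store integer (head, rel, tail) triples in a table (hypothetical schema)."""
    con = sqlite3.connect(db_path)
    with con:  # commits the transaction on success
        con.execute(
            "CREATE TABLE IF NOT EXISTS triples (h INTEGER, r INTEGER, t INTEGER)"
        )
        con.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    con.close()


def stream_batches(db_path, batch_size, drop_last=False):
    """Yield lists of triples in insertion order, fetching one batch at a time."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute("SELECT h, r, t FROM triples ORDER BY rowid")
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch or (drop_last and len(batch) < batch_size):
                break
            yield batch
    finally:
        con.close()


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "toy.db")
    write_triples(path, [(0, 0, 1), (1, 0, 2), (2, 1, 0), (3, 1, 4), (4, 0, 0)])
    for batch in stream_batches(path, batch_size=2):
        print(batch)
```

Because the generator fetches one batch per iteration, peak memory depends on batch_size rather than on the number of triples in the file.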

Streaming Model

Pass a filename in the storage argument when creating the SparseTransE model.

model_sparse = SparseTransE(dataset_sparse.n_ent, dataset_sparse.n_rel, emb_dim, storage='embedding_tensor.bin', initialize=True)
model_sparse.to(device)
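
The idea behind a file-backed embedding table can be sketched with numpy.memmap. This is an illustration only: how SparseTransE lays out embedding_tensor.bin is not documented here, and the helper names create_embedding_file and open_embedding_file are hypothetical.

```python
import numpy as np


def create_embedding_file(path, n_rows, dim, seed=0, chunk=1024):
    """Create a float32 table on disk and initialize it chunk by chunk,
    so the full table never has to be built in RAM at once."""
    rng = np.random.default_rng(seed)
    table = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_rows, dim))
    for start in range(0, n_rows, chunk):
        stop = min(start + chunk, n_rows)
        table[start:stop] = rng.uniform(-0.1, 0.1, size=(stop - start, dim))
    table.flush()  # push pending writes to the file
    return table


def open_embedding_file(path, n_rows, dim):
    """Reopen an existing table for reading and writing without loading it."""
    return np.memmap(path, dtype=np.float32, mode="r+", shape=(n_rows, dim))


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "emb.bin")
    table = create_embedding_file(path, n_rows=3000, dim=8)
    batch_rows = np.asarray(table[[0, 17, 2999]])  # only these rows are read
    print(batch_rows.shape)
```

Indexing the memory-mapped array reads only the pages that hold the requested rows, which is what makes a table larger than RAM usable.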

Note

Streaming the model relies on memory-mapped tensors, so the underlying file system must support mmap. Some systems do not, such as DVS on the NERSC supercomputers. On NERSC, use $PSCRATCH instead.
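
One way to check a directory before pointing the storage file at it is to attempt a small shared, writable mapping there. The sketch below uses only the Python standard library; supports_shared_mmap is a hypothetical helper written for this example, not part of FastKG.

```python
import mmap
import os
import tempfile


def supports_shared_mmap(directory):
    """Return True if a file in `directory` can be memory-mapped read-write."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, mmap.PAGESIZE)  # a mapping needs a non-empty file
        try:
            with mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_WRITE) as m:
                m[:4] = b"test"
                m.flush()
                return m[:4] == b"test"
        except (OSError, ValueError):
            return False
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    # Falls back to the current directory when $PSCRATCH is not set.
    target = os.environ.get("PSCRATCH", ".")
    print(target, supports_shared_mmap(target))
```

A False result suggests choosing a different location for the storage file.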

Contact

Please contact the following person if you have any questions: Md Saidul Hoque Anik ([email protected]).
