VectorInstitute/masksql

 
 

MaskSQL


MaskSQL is a privacy-preserving framework for LLM-based text-to-SQL that uses schema masking and progressive unmasking to protect sensitive database information while maintaining high query accuracy.

Installation and Setup Instructions

Prerequisites

  • Python 3.11
  • uv package manager

Setup

Install dependencies and activate the virtual environment:

uv sync --dev
source .venv/bin/activate

Download Dataset

Download and extract the dataset:

wget -O data.zip "https://www.dropbox.com/scl/fi/vtraf79vfi1x105veaflk/data.zip?rlkey=7yq6d46aer6h45pdihrc9rht1&st=zdac3rqx&dl=0"
unzip data.zip

Expected directory structure:

data/
├── databases/
├── 1_input.json
└── ...
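
Before running the pipeline, it can help to confirm that the archive unpacked where expected. The following is a minimal sketch, not part of MaskSQL: the helper name `missing_dataset_paths` is hypothetical, and it checks only the two entries named in the listing above, since the rest of the archive is not enumerated here.

```python
from pathlib import Path


def missing_dataset_paths(root="data"):
    """Return the expected dataset paths that are absent under `root`."""
    root = Path(root)
    # Only the two entries named in the directory listing are checked;
    # the archive contains further files that are not listed in this README.
    expected = [root / "databases", root / "1_input.json"]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print(f"missing: {missing}")
    else:
        print("dataset layout looks complete")
```

An empty list means both entries were found; anything else suggests `unzip` ran in a different directory or the download was incomplete.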

Configure Environment

Create a .env file from the template:

cp .env.example .env

Required:

Optional:

  • LIMIT: Number of dataset entries to process (e.g., LIMIT=10)
  • START: Starting index in the dataset (default: 0)
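
To illustrate how the two optional variables could interact, here is a minimal sketch. It assumes that `LIMIT` counts entries beginning at `START` and that an unset `LIMIT` means "process everything"; neither detail is confirmed by this README, and `select_entries` is a hypothetical helper, not a MaskSQL function.

```python
import os


def select_entries(entries, env=None):
    """Slice `entries` according to START and LIMIT (assumed semantics)."""
    env = os.environ if env is None else env
    start = int(env.get("START", "0"))  # documented default: 0
    limit = env.get("LIMIT")
    # Assumption: an unset or empty LIMIT means no upper bound.
    end = None if limit in (None, "") else start + int(limit)
    return entries[start:end]
```

Under these assumptions, `START=5` with `LIMIT=10` would select the entries at indices 5 through 14.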

Running MaskSQL

Configuration

MaskSQL reads its configuration from configs/conf.yaml by default. To use a different configuration file, pass its path with the --config option of the CLI.

1. Run RESDSQL (Schema Filtering)

MaskSQL requires RESDSQL for initial schema filtering. Follow the RESDSQL setup instructions to generate the required files.

2. Run the Pipeline

Execute the MaskSQL pipeline:

python3 main.py

Or, to clean previous outputs and rerun:

python3 main.py --clean
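
When launching runs from a script, the same invocations can be assembled programmatically. This is a minimal sketch: `build_command` is a hypothetical helper, and passing `--config` and `--clean` together in one invocation is an assumption, since this README only shows each option on its own.

```python
import sys


def build_command(config=None, clean=False):
    """Build the argument list for a MaskSQL run."""
    # sys.executable keeps the run inside the active virtual environment.
    cmd = [sys.executable, "main.py"]
    if config is not None:
        cmd += ["--config", str(config)]  # omitted: configs/conf.yaml is used
    if clean:
        cmd.append("--clean")  # clean previous outputs before rerunning
    return cmd
```

The resulting list can be passed to `subprocess.run(build_command(clean=True), check=True)` from the repository root.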

Documentation

Citation

If you use MaskSQL in your research, please cite our paper:

@article{abedini2025masksql,
  title={MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction},
  author={Abedini, Sepideh and Mohapatra, Shubhankar and Emerson, DB and Shafieinejad, Masoumeh and Cresswell, Jesse C and He, Xi},
  journal={arXiv preprint arXiv:2509.23459},
  year={2025}
}

Paper: https://arxiv.org/abs/2509.23459
