MaskSQL is a privacy-preserving framework for LLM-based text-to-SQL that uses schema masking and progressive unmasking to protect sensitive database information while maintaining high query accuracy.
- Python 3.11
- uv package manager
Install dependencies and activate the virtual environment:
uv sync --dev
source .venv/bin/activate
Download and extract the dataset:
wget -O data.zip "https://www.dropbox.com/scl/fi/vtraf79vfi1x105veaflk/data.zip?rlkey=7yq6d46aer6h45pdihrc9rht1&st=zdac3rqx&dl=0"
unzip data.zip
Expected directory structure:
data/
├── databases/
├── 1_input.json
└── ...
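A quick way to confirm that the archive extracted as expected is a small check like the one below. It is only a sketch: the helper name `missing_data_paths` is hypothetical and not part of MaskSQL, and it checks only the two entries named in the tree above, since the rest are elided there.

```python
from pathlib import Path


def missing_data_paths(root="data"):
    """Return the expected entries that are absent under `root`.

    Hypothetical helper, not part of MaskSQL. Only the two entries named in
    the directory tree are checked; the remaining files are elided there.
    """
    root = Path(root)
    expected = [root / "databases", root / "1_input.json"]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_data_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root after `unzip data.zip`; an empty result means both listed entries are present.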
Create a .env file from the template:
cp .env.example .env
Required:
- OPENAI_API_KEY: Your OpenRouter API key
Optional:
- LIMIT: Number of dataset entries to process (e.g., LIMIT=10)
- START: Starting index in the dataset (default: 0)
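One plausible reading of LIMIT and START is a simple slice over the dataset entries. The sketch below illustrates that reading only; `select_entries` is a hypothetical name, and treating an unset LIMIT as "process everything" is an assumption rather than documented MaskSQL behavior.

```python
import os


def select_entries(entries, env=None):
    """Slice `entries` using START and LIMIT (hypothetical helper).

    Assumptions: START defaults to 0, as documented; an unset LIMIT means
    "no limit", which the README does not confirm.
    """
    env = os.environ if env is None else env
    start = int(env.get("START", "0"))
    limit = env.get("LIMIT")
    end = None if limit is None else start + int(limit)
    return entries[start:end]


# With START=2 and LIMIT=3, the entries at indices 2, 3 and 4 are selected.
print(select_entries(list(range(10)), {"START": "2", "LIMIT": "3"}))  # [2, 3, 4]
```

Under this reading, setting only LIMIT=10 would process the first ten entries, which is convenient for a cheap smoke test before a full run.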
MaskSQL reads its configuration from the configs/conf.yaml file by default.
To use a different configuration file, pass its path with the --config
option of the CLI.
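For illustration, a `--config` option that defaults to `configs/conf.yaml` could be declared with `argparse` as shown below. This is a sketch of the interface described here, not MaskSQL's actual `main.py`; the function name `build_parser`, the help text, and the file name `configs/custom.yaml` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented --config option (sketch)."""
    parser = argparse.ArgumentParser(description="MaskSQL pipeline (illustrative)")
    parser.add_argument(
        "--config",
        default="configs/conf.yaml",  # default path named in this README
        help="Path to a YAML configuration file",
    )
    return parser


# configs/custom.yaml is a hypothetical file name used only for this example.
args = build_parser().parse_args(["--config", "configs/custom.yaml"])
print(args.config)  # configs/custom.yaml
```

When the option is omitted, `args.config` falls back to `configs/conf.yaml`, matching the default described above.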
MaskSQL requires RESDSQL for initial schema filtering. Follow the RESDSQL setup instructions to generate the required files.
Execute the MaskSQL pipeline:
python3 main.py
Or, to clean previous outputs and rerun:
python3 main.py --clean
- MaskSQL Framework - Overview of the framework architecture
- Pipeline Stages - Detailed explanation of each pipeline stage
If you use MaskSQL in your research, please cite our paper:
@article{abedini2025masksql,
title={MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction},
author={Abedini, Sepideh and Mohapatra, Shubhankar and Emerson, DB and Shafieinejad, Masoumeh and Cresswell, Jesse C and He, Xi},
journal={arXiv preprint arXiv:2509.23459},
year={2025}
}