SplineSketch: a new quantile sketch with uniform error guarantees and high accuracy in practice

This repository provides prototype implementations of SplineSketch in Python and Java, together with an experimental pipeline that evaluates its accuracy on synthetic and real-world datasets, as well as its update and query times, in comparison with t-digest, KLL, and MomentSketch.

Running experiments

Setup: Clone the repository and then run make to compile the Java wrappers that run the individual sketches.

There are four experimental pipelines; their parameters are adjusted directly in the corresponding Python source files:

Accuracy and running time experiments on synthetic datasets: run with python run_experiments_IID.py
Accuracy and running time experiments on real-world datasets: download the datasets as described below and then run with python run_experiments_datasets.py (optionally adjust the datasets in the load_<dataset>_data functions)
Update time experiment: run with python run_experiments_update_time.py
Query time experiment: run with python run_experiments_query_time.py

Each of these Python programs writes a set of plots with its results to the plots/ directory.
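To run several pipelines in one go, a small driver script along the following lines could be used. This is only a sketch: the script names are the ones listed above, while build_commands and run_all are hypothetical helpers that are not part of the repository. It assumes that make has already been run and, for the datasets pipeline, that the datasets have been downloaded.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in this README.
PIPELINES = [
    "run_experiments_IID.py",
    "run_experiments_datasets.py",
    "run_experiments_update_time.py",
    "run_experiments_query_time.py",
]


def build_commands(scripts=PIPELINES, python=sys.executable):
    """Return one [interpreter, script] command per pipeline (hypothetical helper)."""
    return [[python, script] for script in scripts]


def run_all(repo_dir=".", scripts=PIPELINES):
    """Run each pipeline in turn from the repository root (hypothetical helper)."""
    repo = Path(repo_dir)
    for cmd in build_commands(scripts):
        script = cmd[1]
        if not (repo / script).is_file():
            # Skip instead of failing when run outside the repository.
            print(f"skipping {script}: not found in {repo.resolve()}")
            continue
        # check=True stops at the first pipeline that exits with an error.
        subprocess.run(cmd, cwd=repo, check=True)


if __name__ == "__main__":
    run_all()
```

Passing a shorter list as scripts, for example only the update time and query time scripts, runs just those pipelines.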

Downloading real-world datasets

HEPMASS dataset from the UC Irvine ML Repository: download all_train.csv.gz and all_test.csv.gz and decompress both files into datasets/hepmass/
Power dataset from the UC Irvine ML Repository: download and save as datasets/household_power_consumption/household_power_consumption.txt
Books dataset from SOSD (a benchmark for learned indexes): download using download_books_dataset.sh.
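The HEPMASS decompression step can also be done from Python. The following is a minimal sketch, assuming the two .gz archives have already been downloaded into the current directory; decompress_gz is a hypothetical helper, not part of the repository, and the output file names (all_train.csv, all_test.csv) are assumed to be the archive names without the .gz suffix.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(archive, target_dir):
    """Decompress a .gz archive into target_dir and return the output path."""
    archive = Path(archive)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Path.stem drops only the last suffix: all_train.csv.gz -> all_train.csv
    out_path = target_dir / archive.stem
    with gzip.open(archive, "rb") as src, open(out_path, "wb") as dst:
        # Stream the data so large files are not loaded into memory at once.
        shutil.copyfileobj(src, dst)
    return out_path


if __name__ == "__main__":
    for name in ("all_train.csv.gz", "all_test.csv.gz"):
        if Path(name).is_file():
            print("wrote", decompress_gz(name, "datasets/hepmass"))
        else:
            print("missing", name, "(download it first)")
```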


smat-dev/SplineSketch-experiments

Folders and files

Latest commit

History

Repository files navigation

SplineSketch: a new quantile sketch with uniform error guarantees and high accuracy in practice

Running experiments

Downloading real-world datasets

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages