My bachelor thesis: Exploring Performance of B-Tree Configurations Using Machine Learning Techniques
This project is organized into several main directories:
- code: This directory contains all the source code for the project, including the data-generation files. It is further divided into subdirectories, each of which corresponds to a different part of the project:
  - 01_preprocess: Contains notebooks for preprocessing data. See preprocess.ipynb and preprocess_ps.ipynb for more details.
  - 02_analyze: Contains code for analyzing the preprocessed data. Correlation analysis and other exploration is done here.
  - 03_CNN: Unused. Initial experimentation with training a CNN using TensorFlow. In the end it was not used for the thesis.
  - 04_classifiers: Contains various classifiers and the XAI algorithms run on selected models.
  - 05_regressors: Contains various regressors and the XAI algorithms run on selected models.
  - 06_XAI: Unused. Initial experimentation with XAI algorithms was done here.
  - 07_leaf_size: Unused. Data exploration of the old pattern recognition dataset was done here.
  - 08_pattern_recognition: Contains the pattern recognition algorithms. The only relevant file is apriori_small.ipynb.
- output: This directory is used to store output files generated by code/generate_data.py.
- latex: Contains the LaTeX code for the thesis. See the README.md in this directory for more details and references to the original author.
- Bachelor_Thesis_final.pdf: The final thesis in PDF format.
Make sure Python and Poetry are installed, then run:
poetry install
poetry run python code/generate_data.py
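The two commands above can also be wrapped in a small helper that first checks whether the required tools are on the PATH. This is only a minimal sketch and not part of the repository: the names `SETUP_COMMANDS`, `missing_tools`, and `run_setup` are hypothetical, and it assumes the interpreter is reachable as `python` (on some systems it is only available as `python3`).

```python
import shutil
import subprocess

# The two setup commands from this README, split into argument lists.
SETUP_COMMANDS = [
    ["poetry", "install"],
    ["poetry", "run", "python", "code/generate_data.py"],
]


def missing_tools(tools=("python", "poetry")):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def run_setup(commands=SETUP_COMMANDS, dry_run=False):
    """Run the setup commands in order and return them as strings.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    lines = [" ".join(cmd) for cmd in commands]
    if dry_run:
        return lines

    missing = missing_tools()
    if missing:
        raise RuntimeError("Missing required tools: " + ", ".join(missing))

    for cmd in commands:
        # check=True raises CalledProcessError if a command fails,
        # so data generation is never started after a failed install.
        subprocess.run(cmd, check=True)
    return lines
```

Calling `run_setup(dry_run=True)` only lists the commands, while `run_setup()` executes them from the repository root.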
I use VS Code to work with the Jupyter notebooks.
The btree-binaries, the urls-data and the outputs are not included in this repository, since they are too large.
Some images are not published outside the PDF due to copyright concerns.