3 Hyperparameter optimization

Before comparing the performance of the algorithms, it is crucial to determine the optimal hyperparameters for each of them. These hyperparameters include learning rates, forgetting factors, and other parameters that influence an algorithm's performance. Comprehensive reviews of hyperparameter optimization in the literature cover techniques such as grid search, random search, Bayesian optimization, the tree-structured Parzen estimator (TPE), genetic algorithms, and more.

For our experiment, we selected the tree-structured Parzen estimator (TPE) as the hyperparameter tuning approach. TPE is a sequential model-based optimization (SMBO) method: it builds models that approximate the performance of hyperparameter configurations and uses them to select the next configuration to evaluate. Although TPE provides a good initial hyperparameter approximation, its stochastic nature can lead to deviations and random-walk behavior in the hyperparameter space. To refine the TPE result, we employed the Nelder-Mead method, as suggested in the literature. With a small initial simplex, this method performs a local refinement of the previously obtained hyperparameters instead of an extensive search.
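The following is a minimal sketch of this two-stage scheme, assuming Optuna's `TPESampler` for stage one and SciPy's Nelder-Mead implementation for stage two. The gradient-descent "algorithm" being tuned, its learning-rate range, and the starting point are illustrative stand-ins, not the actual experiment code.

```python
import numpy as np
import optuna
from scipy.optimize import minimize

def rosenbrock(x, b=100.0):
    return (1 - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x, b=100.0):
    return np.array([
        -2 * (1 - x[0]) - 4 * b * x[0] * (x[1] - x[0] ** 2),
        2 * b * (x[1] - x[0] ** 2),
    ])

def final_loss(lr, b=100.0, n_steps=100, x0=(-1.0, 1.0)):
    """Loss after a fixed number of plain gradient-descent steps."""
    x = np.array(x0)
    for _ in range(n_steps):
        x = np.clip(x - lr * rosenbrock_grad(x, b), -1e6, 1e6)  # guard against overflow
    return rosenbrock(x, b)

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    return final_loss(lr)

# Stage 1: TPE gives a coarse approximation of the best learning rate.
study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=100)
lr0 = study.best_params["lr"]

# Stage 2: Nelder-Mead refines locally. A small initial simplex around the
# TPE result encourages local refinement instead of an extensive search.
result = minimize(lambda p: final_loss(p[0]), [lr0], method="Nelder-Mead",
                  options={"initial_simplex": [[lr0], [lr0 * 1.05]]})
print("TPE estimate:", lr0, "refined:", result.x[0])
```

The key design choice is the `initial_simplex` passed to Nelder-Mead: placing all vertices within a few percent of the TPE estimate keeps the simplex from wandering back into a global search.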

To evaluate the performance of the algorithms, we planned to compare the value of the loss function after a fixed number of iterations (e.g., 100). However, some algorithms are sensitive to initial conditions, so it was necessary to randomize the initial conditions within certain limits and average the loss function over them. The number of samples used for averaging must be chosen carefully: a large number significantly slows down hyperparameter optimization, while a small number increases the variance of the resulting hyperparameters. This trade-off aims to achieve an accurate hyperparameter approximation without sacrificing the speed of the optimization process.
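A sketch of such an averaged objective, reusing `rosenbrock` and `rosenbrock_grad` from the block above; the sampling range for the initial conditions and the default sample count are illustrative assumptions:

```python
def averaged_loss(lr, b=100.0, n_samples=20, n_steps=100, seed=0):
    """Average the final loss over randomized initial conditions.

    n_samples embodies the trade-off described above: more samples give a
    smoother, more accurate objective but multiply the cost of every
    hyperparameter evaluation.
    """
    rng = np.random.default_rng(seed)  # fixed seed for repeatability
    losses = []
    for _ in range(n_samples):
        x = rng.uniform(-2.0, 2.0, size=2)  # randomized initial condition
        for _ in range(n_steps):
            x = np.clip(x - lr * rosenbrock_grad(x, b), -1e6, 1e6)
        losses.append(rosenbrock(x, b))
    return float(np.mean(losses))
```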

Examining the structure of the experiment reveals a sequence of nested cycles that makes the parameter sweep computationally demanding. For each of the ten values of the parameter $b$, hyperparameter optimization ran for 100 trials. Each trial averaged over twenty random initial-condition samples, and each sample performed 100 optimization steps on the Rosenbrock function. Through these nested cycles, a seemingly straightforward iteration over $b$ becomes a formidable computational task. To keep it tractable, we deliberately constrained the number of iterations in each nested loop. This simplification may introduce minor random deviations in the algorithms' performance graphs, but it allows the calculations to finish within a reasonable timeframe, striking a balance between precision and efficiency. Since some processes in the experiment are stochastic in nature, seed control is required for the repeatability of the experiment. In the figure, the seed icon indicates such control.
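A sketch of how these nested cycles compose, reusing `averaged_loss` from above. The range of $b$ values and the exact seeding scheme are illustrative assumptions; seeding both the TPE sampler and the initial-condition generator is one way to implement the seed control mentioned above.

```python
import numpy as np
import optuna

B_VALUES = np.linspace(10.0, 100.0, 10)  # ten values of b (illustrative range)
N_TRIALS = 100                           # TPE attempts per value of b
N_SAMPLES = 20                           # random initial conditions per trial
N_STEPS = 100                            # optimization steps per sample

optuna.logging.set_verbosity(optuna.logging.WARNING)  # silence per-trial logs
best_lr = {}
for i, b in enumerate(B_VALUES):
    def objective(trial, b=b, i=i):
        lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
        return averaged_loss(lr, b=b, n_samples=N_SAMPLES,
                             n_steps=N_STEPS, seed=i)
    # Fixed seeds make the stochastic TPE search and the random initial
    # conditions repeatable from run to run.
    study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=i))
    study.optimize(objective, n_trials=N_TRIALS)
    best_lr[float(b)] = study.best_params["lr"]
```

Even with these capped loop counts, the sweep performs 10 × 100 × 20 × 100 = 2,000,000 gradient steps, which illustrates why constraining each nested loop was necessary.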
