
Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation
[Supplemental Material]

OUTLINE

This supplemental material is organized as follows.

S1. More background for evaluating configuration tuning systems.

S2. Details about intra-algorithms and the evaluated workloads.

S3. More details and results about the experiment.

S4. Experimental environment and reproduction instructions.

Figure S1. Detailed Intra-algorithm Designs of Configuration Tuning Systems (The Full Picture): Black boxes denote the algorithms adopted by existing database tuning systems (indicated by colored paths), and grey boxes denote the algorithms in the HPO field. SA denotes sensitivity analysis based on Gini score and GA denotes Genetic algorithm.

Appendix S1 More background for evaluating configuration tuning systems

Previous evaluations of database configuration tuning are limited to a subset of existing systems, and the analysis and evaluation of intra-algorithm components are ignored. Instead, we identify three key modules of configuration tuning systems and conduct a thorough analysis and experimental evaluation from a micro perspective (i.e., evaluating every fine-grained algorithm). Figure S1 presents the fine-grained algorithms adopted by existing database tuning systems or drawn from the HPO field. When conducting database configuration tuning in practice, we have to choose a solution “path” across the three modules shown in the figure: (1) knob selection, (2) configuration optimization, and (3) knowledge transfer. Each knob selection algorithm determines a unique configuration space and can be “linked” to any of the configuration optimization algorithms (i.e., optimizers). Among the optimizers, all the BO-based optimizers assuming a Gaussian model (SMAC, vanilla BO, mixed-kernel BO, TuRBO) can be “linked” to the workload mapping or RGPE transfer frameworks, and the DDPG algorithm is “linked” to the fine-tuning framework. Existing systems cover only a part of the possible solutions, and it remains unclear which “path” is the best for database configuration tuning. We evaluate all the fine-grained algorithms listed in Figure S1 and carefully decompose the search (evaluation) space to identify the best “path” in various scenarios.

Appendix S2 Details about intra-algorithms

In this section, we present details about the intra-algorithms that we describe at a high level in the paper due to space constraints.

S2.1. Ablation Analysis

Ablation analysis (DBLP:conf/aaai/BiedenkappLEHFH17) selects the features whose changes contribute the most to improving the performance of configurations. We now describe how ablation analysis quantifies the performance change due to a certain feature’s change. Given a default configuration $\bm{\theta}_{default}$ and a target configuration $\bm{\theta}_{target}$ (usually a better one), ablation analysis first computes the feature differences $\Delta(\bm{\theta}_{default},\bm{\theta}_{target})$ between the default and target configurations. Next, an ablation path $\bm{\theta}_{default},\bm{\theta}_{1},\bm{\theta}_{2},\dots,\bm{\theta}_{target}$ is constructed iteratively. In each iteration $i$, starting from the previous ablation-path configuration $\bm{\theta}_{i-1}$, we consider all remaining feature changes $\delta\in\Delta(\bm{\theta}_{default},\bm{\theta}_{target})$, apply each change to the previous ablation-path configuration, and obtain the candidate $\bm{\theta}_{i-1}[\delta]$. Each feature change $\delta$ is a modification of one feature from its value in $\bm{\theta}_{i-1}$ to its value in $\bm{\theta}_{target}$, along with any other feature modifications that may be necessary due to conditional constraints in $\bm{\Theta}$. The next configuration on the ablation path, $\bm{\theta}_{i}$, is the candidate $\bm{\theta}_{i-1}[\delta]$ with the best objective performance $f$; the performance evaluation is approximated via a surrogate model for efficiency. The order in which features are changed along the path yields their importance ranks. Given a set of observations, we conduct the ablation analysis between the default configuration and every configuration in the observation set that performs better than the default (i.e., the target configurations). For each feature, we use the average rank over all ablation paths as its final importance ranking.
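To make the procedure concrete, below is a minimal sketch of the ablation-path construction, assuming a fitted surrogate exposing a predict(config) method that returns the estimated objective (higher is better); the names are illustrative and not the authors’ implementation.

```python
def ablation_path(default, target, surrogate):
    """Greedily build an ablation path from `default` to `target`.

    `default` and `target` are dicts mapping knob -> value; `surrogate`
    is assumed to expose predict(config) -> estimated objective value
    (higher is better).  Returns the knobs in the order they are changed.
    """
    # Feature differences between the default and target configurations.
    remaining = [k for k in default if default[k] != target[k]]
    current = dict(default)
    order = []
    while remaining:
        # Apply each remaining change to the current path configuration.
        scored = []
        for knob in remaining:
            candidate = dict(current)
            candidate[knob] = target[knob]
            scored.append((surrogate.predict(candidate), knob, candidate))
        # Keep the change with the best predicted performance.
        _, best_knob, best_candidate = max(scored, key=lambda s: s[0])
        order.append(best_knob)
        remaining.remove(best_knob)
        current = best_candidate
    return order
```

Averaging each knob’s position over the paths built for all target configurations then yields the final importance ranking.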

S2.2. SHAP

SHAP (DBLP:conf/nips/LundbergL17) (SHapley Additive exPlanation) uses Shapley values of a conditional expectation function of the original model. SHAP values attribute to each feature the change in the expected model prediction when conditioning on that feature. They explain how to get from the base value that would be predicted if we did not know any features to the current output. When the model is non-linear or the input features are not independent, the order in which features are added to the expectation matters, and the SHAP values arise from averaging the contributions across all possible orderings (DBLP:conf/nips/LundbergL17). The exact computation of SHAP values is challenging; they can be estimated by the Shapley sampling values method (DBLP:journals/kais/StrumbeljK14) or the Kernel SHAP method (DBLP:conf/nips/LundbergL17).
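As an illustration, the snippet below sketches how Kernel SHAP (via the shap package) could be used to rank knob importance from sampled observations; the surrogate model choice and the load_observations helper are assumptions made for the example, not part of the authors’ pipeline.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# X: sampled knob configurations (n_samples x n_knobs), y: measured performance.
X, y = load_observations()                       # hypothetical helper
model = RandomForestRegressor(n_estimators=100).fit(X, y)

# Kernel SHAP is model-agnostic; a small background sample keeps it tractable.
background = shap.sample(X, 100)
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X, nsamples=200)

# Global importance: mean absolute SHAP value per knob, sorted descending.
importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(-importance)
```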

S2.3. SMAC

SMAC (DBLP:conf/lion/HutterHL11) constructs a random forest as a set of regression trees, each of which is built on $n$ data points sampled with replacement from the entire training data set. For a new configuration $\bm{\theta}$, it computes the random forest’s predictive mean $\hat{\mu}(\bm{\theta})$ and variance $\hat{\sigma}^{2}(\bm{\theta})$ as the empirical mean and variance of the individual trees’ predictions, which define a Gaussian predictive distribution. SMAC uses the random forest model to select a list of promising parameter configurations. To quantify how promising a configuration $\bm{\theta}$ is, it uses the model’s predictive distribution for $\bm{\theta}$ to compute its expected positive improvement $EI(\bm{\theta})$ (DBLP:journals/jgo/JonesSW98) over the best configuration seen so far. $EI(\bm{\theta})$ is large for configurations $\bm{\theta}$ with high predicted performance and for those with high predicted uncertainty; thereby, it offers an automatic trade-off between exploitation (focusing on known good parts of the space) and exploration (gathering more information in unknown parts of the space). To gather a set of promising configurations with low computational overhead, SMAC performs a simple multi-start local search and considers all resulting configurations with locally maximal $EI$.
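As an illustrative sketch (not SMAC’s actual implementation), the following computes the empirical mean and variance over a random forest’s trees and the resulting EI acquisition value for a maximization objective, using scikit-learn’s random forest as a stand-in for SMAC’s model.

```python
import numpy as np
from scipy.stats import norm

def rf_mean_var(forest, theta):
    """Empirical mean and variance of the per-tree predictions for theta."""
    preds = np.array([tree.predict(theta.reshape(1, -1))[0]
                      for tree in forest.estimators_])
    return preds.mean(), preds.var()

def expected_improvement(forest, theta, f_best):
    """EI of configuration theta over the incumbent performance f_best."""
    mu, var = rf_mean_var(forest, theta)
    sigma = np.sqrt(var)
    if sigma == 0.0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)
```

SMAC would then run a multi-start local search and keep the configurations with locally maximal EI; that step is omitted here.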

S2.4. RGPE

RGPE (feurer2018scalable) is a scalable meta-learning framework for accelerating BO-based optimizers. First, for each previous tuning task $T_{i}$, it trains a base Gaussian process (GP) model $M_{i}$ on the corresponding observations $H_{i}$. Then it builds a meta surrogate model $M_{meta}$ that combines the base GP models, instead of relying on the original surrogate $M_{T}$ fitted only on the observations $H_{T}$ of the target task. The prediction of $M_{meta}$ at a point $\bm{\theta}$ is given by:

(1) $y\sim N\big(\sum_{i}{w_{i}\mu_{i}(\bm{\theta})},\ \sum_{i}{w_{i}\sigma^{2}_{i}(\bm{\theta})}\big),$

where $w_{i}$ is the weight of base surrogate $M_{i}$, and $\mu_{i}$ and $\sigma^{2}_{i}$ are the predictive mean and variance of base surrogate $M_{i}$. The weight $w_{i}$ reflects the similarity between the previous task and the current task. Therefore, $M_{meta}$ utilizes the knowledge from previous tuning tasks, which can greatly accelerate the convergence of tuning on the target task. We then use the following ranking loss function $L$, i.e., the number of misranked pairs, to measure the similarity between a previous task and the target task:

(2) $L(M_{i},H_{T})=\sum_{j=1}^{n_{t}}\sum_{k=1}^{n_{t}}\mathbbm{1}\big((M_{i}(\bm{\theta}_{j})\leq M_{i}(\bm{\theta}_{k}))\oplus(y_{j}\leq y_{k})\big),$

where $\oplus$ is the exclusive-or operator, $n_{t}$ denotes the number of observations in $H_{T}$, and $M_{i}(\bm{\theta}_{j})$ denotes the prediction of $M_{i}$ on configuration $\bm{\theta}_{j}$. Based on the ranking loss function, the weight $w_{i}$ is set to the probability that $M_{i}$ has the smallest ranking loss on $H_{T}$, that is, $w_{i}=\mathbbm{P}(i=\mathop{\arg\min}_{j}L(M_{j},H_{T}))$. This probability can be estimated using MCMC sampling (DBLP:conf/aistats/MartensTY19).
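The following sketch shows how Equations (1) and (2) could be implemented, assuming each base surrogate exposes predict(theta) -> (mean, variance); the MCMC-based estimation of the weights is omitted.

```python
import numpy as np

def rgpe_predict(base_models, weights, theta):
    """Meta-surrogate prediction (Eq. (1)): weighted sums of the base
    models' predictive means and variances at configuration theta."""
    means, variances = zip(*[m.predict(theta) for m in base_models])
    return np.dot(weights, means), np.dot(weights, variances)

def ranking_loss(model, thetas, ys):
    """Ranking loss (Eq. (2)): number of misranked pairs of the target
    observations (thetas, ys) under the base model's predictions."""
    preds = np.array([model.predict(t)[0] for t in thetas])
    loss = 0
    for j in range(len(ys)):
        for k in range(len(ys)):
            loss += int((preds[j] <= preds[k]) != (ys[j] <= ys[k]))
    return loss
```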

S2.5. More Details about Workloads

While we have presented general profile information for the workloads in Table LABEL:wkl in the paper, we detail their implementation and the reasons we selected them in this section.

The Reasons for Workload Selection. When answering Q1 and Q2, we analyze the tuning performance in OLTP and OLAP scenarios. We use an OLAP workload, JOB, and an OLTP workload, SYSBENCH. These two workloads are often adopted for evaluating database configuration tuning methods and cover online transaction and analytical processing scenarios. For example, JOB is adopted by QTune (DBLP:journals/pvldb/LiZLG19), and SYSBENCH is adopted by CDBTune (DBLP:conf/sigmod/ZhangLZLXCXWCLR19), QTune (DBLP:journals/pvldb/LiZLG19), and ResTune (DBLP:conf/sigmod/ZhangWCJT0Z021). When conducting knowledge transfer experiments (Q3), we focus on OLTP scenarios since there are few OLAP workloads other than JOB and TPC-H suitable for constructing the source workloads of tuning tasks. We choose three OLTP workloads – SYSBENCH, TPC-C, and Twitter – as the target tuning workloads, all of which have been adopted in previous studies. For example, OtterTune (DBLP:conf/sigmod/AkenPGZ17), CDBTune (DBLP:conf/sigmod/ZhangLZLXCXWCLR19), and ResTune (DBLP:conf/sigmod/ZhangWCJT0Z021) have adopted TPC-C for evaluation, and ResTune has also adopted Twitter. We use five additional OLTP workloads (i.e., SEATS, Smallbank, TATP, Voter, and SIBench) as source workloads and configure them with various sizes and read-write ratios, as shown in Table LABEL:wkl. SIBench is a microbenchmark designed to explore snapshot isolation in DBMSs (DBLP:journals/pvldb/JungHFR11). Based on our observations, the tuning opportunity for SIBench is limited; we add SIBench to the source workloads to increase their diversity.

Implementation of the Workloads. For JOB, we use the same setup illustrated in (DBLP:journals/pvldb/LeisGMBK015). For SYSBENCH, we load 150 tables, each of which contains 800,000 rows, and adopt the read-write mode. For workloads from OLTP-Bench, we use the scale factor to determine the data size (e.g., the number of warehouses in TPC-C), as shown in Table S1. In addition, the parameter terminals is set to 64 for each workload. We keep the other parameters at the default values provided by OLTP-Bench, including the isolation level and the weights of transactions.

Table S1. Scale factors of workloads from OLTP-Bench
Workload TPCC Twitter Smallbank SIBench Voter Seats TATP
Scale Factor 200 1500 10 1000 10000 50 100

Appendix S3 More details and results about the experiment

In this section, we present additional experimental details.

S3.1. Knob Selection

Table S2. The Top-20 important knobs selected by SHAP for OLTP workloads
Knob Type Dynamic Module Description
innodb_thread_concurrency Integer Yes Concurrency The maximum number of threads permitted inside of InnoDB.
innodb_log_file_size Integer No Logging The size in bytes of each log file in a log group.
max_allowed_packet Integer Yes Replication The upper limit on the size of any single message between the MySQL server and clients.
innodb_io_capacity_max Integer Yes IO The maximum number of IOPS performed by InnoDB background tasks.
tmp_table_size Integer Yes Memory The maximum size of internal in-memory temporary tables.
query_prealloc_size Integer Yes Memory The size in bytes of the persistent buffer used for statement parsing and execution.
max_heap_table_size Integer Yes Memory The maximum size to which user-created memory tables are permitted to grow.
innodb_doublewrite Categorical No Memory Whether the doublewrite buffer is enabled.
transaction_alloc_block_size Integer Yes Memory The amount in bytes by which to increase a per-transaction memory pool which needs memory.
join_buffer_size Integer Yes Memory The minimum size of the buffer that is used for joins.
innodb_flush_log_at_trx_commit Categorical Yes Logging Controls the balance between ACID compliance for commit operations and performance.
innodb_max_dirty_pages_pct_lwm Integer Yes Logging The percentage of dirty pages at which preflushing is enabled to control the dirty page ratio.
innodb_log_files_in_group Integer No Logging The number of log files in the log group.
innodb_buffer_pool_size Integer Yes Memory The size in bytes of the buffer pool.
innodb_online_alter_log_max_size Integer Yes Logging An upper limit on the size of the log files used during online DDL operations for InnoDB tables.
key_cache_age_threshold Integer Yes Memory The demotion of buffers from the hot sublist of a key cache to the warm sublist.
binlog_cache_size Integer Yes Memory The size of the cache to hold changes to the binary log during a transaction.
innodb_purge_rseg_truncate_frequency Integer Yes Logging The frequency with which the purge system frees rollback segments.
query_cache_limit Integer Yes Memory The maximum size of individual query results that can be cached.
innodb_sort_buffer_size Integer No Memory The sort buffer size for online DDL operations that create or rebuild secondary indexes.
Table S3. Hardware configurations for more instances.
Instance A B C D
CPU 4 cores 8 cores 16 cores 32 cores
RAM 8GB 16GB 32GB 64GB

Top impacting knobs with high tunability for OLTP workloads. We further conduct an experiment using SHAP to generate a ranking of the most impactful knobs across OLTP workloads and hardware instances, and we use this ranking to evaluate the knowledge transfer component across OLTP workloads in LABEL:sec:exp-transfer in the paper. We use the seven OLTP workloads listed in Table LABEL:wkl in the paper and perform LHS to collect 1250 samples for each workload on the four hardware instances listed in Table LABEL:tab:hardward. We then adopt SHAP to generate an importance ranking for each workload-hardware combination and count the number of times each knob appears in the top 20 of all the rankings to measure its overall importance. Table S2 shows the top-20 important knobs for OLTP workloads and their brief descriptions. We believe this ranking could provide database practitioners with guidance on choosing the knobs to tune. Dynamic variables can be changed at runtime using the SET statement, while the others can only be set at server startup using options on the command line or in an option file. We find that configuring the maximum number of concurrent threads and the size of the log file contributes the most to the performance gain, which is aligned with previous analyses (DBLP:conf/sigmod/AkenPGZ17; DBLP:conf/sigmod/ZhangWCJT0Z021; DBLP:journals/pvldb/AkenYBFZBP21). In addition, 11 of the top-20 knobs are related to memory allocation, which indicates that the default memory allocation settings in MySQL may not be appropriate across workloads and hardware instances. We leave the important-knob ranking for OLAP workloads as future work, as there are few OLAP workloads other than JOB and TPC-H suitable for database tuning tasks.
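A minimal sketch of the aggregation step is shown below, assuming per_task_rankings holds one SHAP-based knob ranking per workload-hardware combination (the variable names are illustrative):

```python
from collections import Counter

# per_task_rankings: list of knob-name lists, each sorted by SHAP importance
# for one (workload, hardware) combination.
counts = Counter()
for ranking in per_task_rankings:
    counts.update(ranking[:20])      # knobs that reach a top-20 position

# Overall ranking: knobs ordered by how often they appear in a top 20.
overall_ranking = [knob for knob, _ in counts.most_common()]
```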

S3.2. Configuration Optimization

Table S4. Average ranking of optimizers in terms of the best configuration they found. Bold values are the best.
Optimizer Vanilla BO One-Hot BO Mixed-Kernel BO SMAC TPE TuRBO DDPG GA
Small Configuration Space JOB 5.00 2.67 2.33 3.33 6.33 3.67 5.00 7.67
SYSBENCH 5.67 5.33 2.00 3.33 5.33 4.00 5.00 5.33
Average 5.33 4.00 2.17 3.33 5.83 3.83 5.00 6.50
Medium Configuration Space JOB 5.33 3.33 3.00 1.00 7.67 3.67 5.00 7.00
SYSBENCH 5.00 4.33 1.67 1.67 6.67 4.33 6.00 6.33
Average 5.17 3.83 2.33 1.33 7.17 4.00 5.50 6.67
Large Configuration Space JOB 7.00 7.00 7.00 1.00 7.00 7.00 2.00 7.00
SYSBENCH 7.67 6.00 3.33 1.00 6.00 3.00 4.33 4.67
Average 7.33 6.50 5.17 1.00 6.50 5.00 3.17 5.83
Overall 5.94 4.78 3.22 1.89 6.50 4.28 4.56 6.33

Average ranking of optimizers in terms of the best configuration they found. While we have presented the average rankings of optimizers in Table 7 in the paper, Table S4 details the rankings for each workload and configuration space. In addition, we describe how we calculate the average ranking of the optimizers. For each workload and configuration space, we run three tuning sessions per optimizer and sort each optimizer’s three sessions by the best performance found within 200 iterations. We then rank the optimizers based on their best sessions, then on their second-best sessions, and lastly on their worst sessions. Finally, we average the three ranks of each optimizer; each workload and configuration space corresponds to one row in Table S4.
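The sketch below mirrors this procedure for one workload and configuration space, assuming a higher-is-better metric (for latency the sign would be flipped); the names are illustrative rather than taken from our scripts.

```python
import numpy as np
from scipy.stats import rankdata

def average_ranks(best_perf):
    """best_perf: dict optimizer -> three best-found performances, one per
    tuning session (higher is better).  Returns optimizer -> average rank."""
    names = list(best_perf)
    # Sort each optimizer's sessions from best to worst.
    sessions = np.sort(np.array([best_perf[n] for n in names]), axis=1)[:, ::-1]
    ranks = np.zeros_like(sessions, dtype=float)
    # Rank the optimizers on their best, second-best and worst sessions.
    for s in range(sessions.shape[1]):
        ranks[:, s] = rankdata(-sessions[:, s], method="average")
    return dict(zip(names, ranks.mean(axis=1)))
```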

S3.3. Knowledge Transfer

Figure S2. The absolute performance over iterations for each combination of transfer framework and base learner.

We have presented the average performance enhancement (i.e., PE), speedup, and absolute performance ranking (i.e., APR) in Table 8 in the paper and omitted the performance plots there due to space constraints. Figure S2 plots the absolute performance over iterations for each baseline (i.e., each combination of transfer framework and base learner). On TPCC, both RGPE (Mixed-kernel BO) and RGPE (SMAC) find approximately the best performance within 200 iterations, while RGPE (SMAC) has a better speedup. On SYSBENCH, RGPE (SMAC) finds the best performance, though it takes a few more steps. On Twitter, RGPE (Mixed-kernel BO) finds the best performance, and does so quickly. In general, the combinations of RGPE and the base learners achieve the best absolute performance as well as the best speedup.

Appendix S4 Experimental environment and reproduction instructions

We conduct all experiments on Aliyun ECS. Each experiment uses two instances. The first instance hosts the tuning server and is deployed on ecs.s6-c1m2.xlarge. The second instance hosts the target DBMS and uses one of four hardware configurations: ecs.s6-c1m2.xlarge, ecs.s6-c1m2.2xlarge, ecs.s6-c1m2.4xlarge, and ecs.n4.8xlarge. The detailed physical memory and CPU information is listed in Table S5. The operating system of each ECS instance is Linux 4.9. The Python version used is 3.7, and the detailed package requirements for our experiments are listed in our GitHub repository; please check requirements.txt in the root directory.

To reproduce the results of our experiments, please download and install the workloads we use following the instructions in README.md, and run train.py under the directory script/ with the specified arguments, for example:

python train.py --method=VBO --knobs_num=5 --y_variable=lat
--workload=job  --dbname=imdbload --lhs_log=JOB5_VBO.res
--knobs_config=../experiment/gen_knobs/job_shap.json

More reproduction details are provided in our online repository.

Table S5. Hardware configurations for database instances.
Type CPU RAM
ecs.s6-c1m2.xlarge 4 cores 8GB
ecs.s6-c1m2.2xlarge 8 cores 16GB
ecs.s6-c1m2.4xlarge 16 cores 32GB
ecs.n4.8xlarge 32 cores 64GB