Chapter 1 Introduction

This document is based on a single dataset available at ./data/statscomp.csv. With this dataset we ask different research questions that motivate the statistical models developed in the paper. Explanations of the models and of how the data was obtained are available in the paper.

1.0.1 Exploring the dataset

This dataset follows the principle of tidy data as described in https://r4ds.had.co.nz/tidy-data.html. The key idea is that every variable has its own column and every observation has its own row. Throughout this document, to facilitate our modeling approach, we will modify this dataset in different ways, often resulting in non-tidy data. However, every model will start from the same base tidy dataset.

We hope this approach makes it easier for the reader to see where each model starts and to adopt similar strategies in their own models. Additionally, if the reader has the opportunity to influence the data collection process, we recommend collecting tidy data. Tidy data is often ideal for exploratory analysis and plotting, is the basis for most models, and is easy to transform for use in different models.
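As a minimal sketch of the kind of transformation mentioned above, the example below reshapes a small tidy table into a wide, non-tidy layout with one column per algorithm. The four values are the EuclideanDistance entries for NelderMead and PSO taken from the dataset preview later in this section. The use of tidyr::pivot_wider is our assumption for illustration and is not necessarily how the models in the paper reshape the data.

```r
# Assumes the tidyr package is installed.
# Tidy layout: one row per observation (algorithm x repetition).
tidy <- data.frame(
  Algorithm = c("NelderMead", "PSO", "NelderMead", "PSO"),
  CostFunction = "BentCigarN6",
  simNumber = c(0, 0, 1, 1),
  EuclideanDistance = c(8.9123708, 0.5605997, 7.9123493, 1.0818350)
)

# Wide (non-tidy) layout: one row per repetition, one column per algorithm.
wide <- tidyr::pivot_wider(
  tidy,
  names_from = Algorithm,
  values_from = EuclideanDistance
)

print(wide)
```

The wide table has one row per simNumber and the columns NelderMead and PSO, which is convenient for paired comparisons but no longer has one observation per row.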

d <- readr::read_csv('./data/statscomp.csv')

Here we exclude a few columns to simplify the view:

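library(magrittr)  # provides the %>% pipe
# Show the first 10 rows, without the columns listed below
knitr::kable(head(dplyr::select(d,
                                -BestArm, -Continuous, -Differentiability, -Separability, -Scalability, -Modality, -BBOB, -BaseClass, -MaxFeval, -FevalPerDimensions),
                  n = 10)) %>%
  kableExtra::scroll_box(width = "100%")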

Algorithm	CostFunction	NumberFunctionEval	EuclideanDistance	TrueRewardDifference	CumulativeRegret	TimeToComplete	Ndimensions	OptimizationSuccessful	MaxFevalPerDimensions	SolveAt1	SolveAt1e-1	SolveAt1e-3	SolveAt1e-6	SolveEarlierAt1	SolveEarlierAt1e-1	SolveEarlierAt1e-3	SolveEarlierAt1e-6	simNumber
NelderMead	BentCigarN6	600	8.9123708	7.364249e+07	4.926652e+10	0.0300207	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
PSO	BentCigarN6	600	0.5605997	1.559497e+05	7.938082e+10	0.0394440	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
SimulatedAnnealing	BentCigarN6	600	9.7499527	1.086834e+08	1.208830e+12	0.0424774	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
CuckooSearch	BentCigarN6	600	8.0025211	1.200314e+07	1.017438e+13	0.0317579	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
DifferentialEvolution	BentCigarN6	600	5.3888603	4.634518e+06	2.399718e+12	0.1168543	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
RandomSearch1	BentCigarN6	600	1.5702536	1.919896e+06	1.983989e+13	0.0399160	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
RandomSearch2	BentCigarN6	599	1.5702536	1.655484e+06	1.929796e+13	0.0356977	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	0
CMAES	BentCigarN6	604	0.5744144	3.357865e-01	2.589247e+08	0.1810286	6	TRUE	100	TRUE	FALSE	FALSE	FALSE	543	NA	NA	NA	0
NelderMead	BentCigarN6	600	7.9123493	2.152945e+07	2.041479e+11	0.0423668	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	1
PSO	BentCigarN6	600	1.0818350	1.853063e+05	7.224343e+10	0.0397996	6	TRUE	100	FALSE	FALSE	FALSE	FALSE	NA	NA	NA	NA	1
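Some column names in the preview, such as SolveAt1e-1, are not syntactic R names, so they need backticks or quoted strings when accessed. The sketch below illustrates this on three rows copied from the preview. The small data frame and the variable names are ours, and base R is used here only to keep the example self-contained.

```r
# Three rows copied from the preview above (only the columns needed here).
# check.names = FALSE keeps the hyphenated column name unchanged.
preview <- data.frame(
  Algorithm = c("NelderMead", "PSO", "CMAES"),
  SolveAt1 = c(FALSE, FALSE, TRUE),
  `SolveAt1e-1` = c(FALSE, FALSE, FALSE),
  SolveEarlierAt1 = c(NA, NA, 543),
  check.names = FALSE
)

# Non-syntactic names need backticks with `$` ...
n_solved_1e1 <- sum(preview$`SolveAt1e-1`)

# ... or a quoted string with `[[`.
n_solved_1e1_alt <- sum(preview[["SolveAt1e-1"]])

# In these rows SolveEarlierAt1 is NA when SolveAt1 is FALSE,
# so missing values are dropped explicitly before averaging.
mean_iter_solved <- mean(preview$SolveEarlierAt1, na.rm = TRUE)

print(n_solved_1e1)
print(mean_iter_solved)
```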

1.0.2 Column definitions of the dataset

Algorithm: string algorithm used in the optimization
CostFunction: string specific cost function used. If the cost function can be instantiated in more than one dimension, this name also includes the number of dimensions, e.g. SphereN10 has the base class Sphere and N=10 dimensions.
BestArm: string represents the $x_{algo}$ obtained at the end of the optimization
NumberFunctionEval: numeric number of times the function was evaluated in total
EuclideanDistance: numeric $\lVert x_{algo} - x_{optimal} \rVert_2$
TrueRewardDifference: numeric $f_{algo} - f_{optimal}$
CumulativeRegret: numeric total regret
TimeToComplete: numeric time taken to complete the optimization
Continuous: string function properties from the Jamil and Yang survey 2013
Differentiability: string function properties from the Jamil and Yang survey 2013
Separability: string function properties from the Jamil and Yang survey 2013
Scalability: string function properties from the Jamil and Yang survey 2013
Modality: string function properties from the Jamil and Yang survey 2013
Ndimensions: numeric number of dimensions
OptimizationSuccessful: boolean
BBOB: boolean is part of the BBOB 2009 functions?
BaseClass: string the benchmark function used. E.g. SphereN10 has the base class Sphere
SD: numeric Gaussian noise added to the benchmark function
MaxFeval: numeric maximum number of function evaluations in total
MaxFevalPerDimensions: numeric maximum number of function evaluations allowed per dimension
FevalPerDimensions: numeric number of times the benchmark function was evaluated per dimension (some algorithms might evaluate a bit less than the maximum)
SolveAt1: boolean was the problem solved at precision 1
SolveAt1e-1: boolean was the problem solved at precision 1e-1
SolveAt1e-3: boolean was the problem solved at precision 1e-3
SolveAt1e-6: boolean was the problem solved at precision 1e-6
SolveEarlierAt1: numeric iteration number at which the algorithm converged to the result at precision 1
SolveEarlierAt1e-1: numeric iteration number at which the algorithm converged to the result at precision 1e-1
SolveEarlierAt1e-3: numeric iteration number at which the algorithm converged to the result at precision 1e-3
SolveEarlierAt1e-6: numeric iteration number at which the algorithm converged to the result at precision 1e-6
simNumber: numeric index of the repeated measure. In the dataset, every algorithm was evaluated 10 times on each benchmark function in each condition, so this number goes from 0 to 9
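To make the definitions of EuclideanDistance, TrueRewardDifference and the SolveAt columns concrete, the sketch below computes them for a hypothetical two-dimensional Sphere function. The point x_algo and all variable names are invented for illustration. We also assume that "solved at precision p" means that TrueRewardDifference is at most p. This reading is consistent with the CMAES row in the preview (difference 3.357865e-01, solved at precision 1 but not at 1e-1), but it is not stated explicitly in this section.

```r
# Hypothetical 2-D Sphere benchmark: f(x) = sum(x^2), minimum at the origin.
sphere <- function(x) sum(x^2)

x_optimal <- c(0, 0)      # known optimum of the Sphere function
x_algo    <- c(0.3, 0.4)  # invented point returned by an optimizer (cf. BestArm)

# EuclideanDistance: L2 norm of the difference between the two points.
euclidean_distance <- sqrt(sum((x_algo - x_optimal)^2))

# TrueRewardDifference: gap between the achieved and the optimal function value.
true_reward_difference <- sphere(x_algo) - sphere(x_optimal)

# Assumption: "solved at precision p" means the gap is at most p.
precisions <- c(1, 1e-1, 1e-3, 1e-6)
solved <- true_reward_difference <= precisions
names(solved) <- paste0("SolveAt", c("1", "1e-1", "1e-3", "1e-6"))

print(euclidean_distance)      # distance from x_algo to the optimum
print(true_reward_difference)  # gap in function value
print(solved)                  # one flag per precision level
```

For this point the distance is 0.5 and the gap is 0.25, so under the stated assumption only the SolveAt1 flag would be TRUE.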