Chapter 1 Introduction

This document is based on a single dataset available at ./data/statscomp.csv. With this dataset we ask different research questions that motivates the statistical models developed on the paper. Explanations about the models and how the data was obtained are available in the paper.

1.0.1 Exploring the dataset

This dataset follows the principle of tidy data as described in https://r4ds.had.co.nz/tidy-data.html. The key idea is that every variables has its own column, and every observation has its own unique row. Throughout this document, to facilitate our modeling approach, we will modify this dataset in different ways, often resulting in non-tidy data. However every model will start from the same base tidy dataset.

This approach will hopefully make it easier for the reader to understand from where we are starting and adopt similar strategies in their own models. Additionally, we recommend, if the reader has the opportunity to influence the data collection process, the choice of tidy data. It is often ideal for exploratory analysis, plotting, is the basis for most models, and easy to transform to be used in different models.

d <- readr::read_csv('./data/statscomp.csv')

Here we are excluding a few columns to simplify our view

kable(head(dplyr::select(d,
                         -BestArm, -Continuous, -Differentiability, -Separability, -Scalability, -Modality,-BBOB,-BaseClass, -MaxFeval, -FevalPerDimensions),
           n=10)) %>% 
  kableExtra::scroll_box(width = "100%")
Algorithm CostFunction NumberFunctionEval EuclideanDistance TrueRewardDifference CumulativeRegret TimeToComplete Ndimensions OptimizationSuccessful SD MaxFevalPerDimensions SolveAt1 SolveAt1e-1 SolveAt1e-3 SolveAt1e-6 SolveEarlierAt1 SolveEarlierAt1e-1 SolveEarlierAt1e-3 SolveEarlierAt1e-6 simNumber
NelderMead BentCigarN6 600 8.9123708 7.364249e+07 4.926652e+10 0.0300207 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
PSO BentCigarN6 600 0.5605997 1.559497e+05 7.938082e+10 0.0394440 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
SimulatedAnnealing BentCigarN6 600 9.7499527 1.086834e+08 1.208830e+12 0.0424774 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
CuckooSearch BentCigarN6 600 8.0025211 1.200314e+07 1.017438e+13 0.0317579 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
DifferentialEvolution BentCigarN6 600 5.3888603 4.634518e+06 2.399718e+12 0.1168543 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
RandomSearch1 BentCigarN6 600 1.5702536 1.919896e+06 1.983989e+13 0.0399160 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
RandomSearch2 BentCigarN6 599 1.5702536 1.655484e+06 1.929796e+13 0.0356977 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 0
CMAES BentCigarN6 604 0.5744144 3.357865e-01 2.589247e+08 0.1810286 6 TRUE 0 100 TRUE FALSE FALSE FALSE 543 NA NA NA 0
NelderMead BentCigarN6 600 7.9123493 2.152945e+07 2.041479e+11 0.0423668 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 1
PSO BentCigarN6 600 1.0818350 1.853063e+05 7.224343e+10 0.0397996 6 TRUE 0 100 FALSE FALSE FALSE FALSE NA NA NA NA 1

1.0.2 Column definitions of the dataset

  • Algorithms: string Algorithm used in the optimization
  • CostFunction: string Specific cost function used. If the cost function can be instantiated in more than one dimension this name also includes the number of dimensions, e.g. SphereN10 is has the base class Sphere and the N=10 for dimensions.
  • BestArm: string represents the xalgo obtained at the end of the optimization
  • NumberFunctionEval: numeric number of times the functon was evaluated in total
  • EuclideanDistance: numeric ||xalgo - xoptimal||2
  • TrueRewardDifference: numeric falgo - foptimal
  • CumulativeRegret: numeric total regret
  • TimeToComplete: numeric time taken to complete the optimization
  • Continuous: string function properties from the Jamil and Yang survey 2013
  • Differentiability: string function properties from the Jamil and Yang survey 2013
  • Separability: string function properties from the Jamil and Yang survey 2013
  • Scalability: string function properties from the Jamil and Yang survey 2013
  • Modality: string function properties from the Jamil and Yang survey 2013
  • Ndimension: numeric number of dimensions
  • OptimizationSuccessful"
  • BBOB: boolean is part of the BBOB 2009 functions?
  • BaseClass: string the benchmark function used. E.g. SphereN10 has the base class Sphere
  • SD: numeric gaussian noise added to the benchmark function
  • MaxFeval: numeric maximum number of function evaluations in total
  • MaxFevalPerDimensions: numeric maximum number of function evaluations allowed per dimensions
  • FevalPerDimensions: numeric number of times the benchmark function was evaluated per dimensions (some algorithms might evaluate a bit less than the maximum)
  • SolveAt1: boolean was the problem solved at precision 1
  • SolveAt1e-1" boolean was the problem solved at precision 1e-1
  • SolveAt1e-3" boolean was the problem solved at precision 1e-3
  • SolveAt1e-6: boolean was the problem solved at precision 1e-6
  • SolveEarlierAt1: numeric iteration number where converged to the result at precision 1
  • SolveEarlierAt1e-1: numeric iteration number where converged to the result at precision 1e-1
  • SolveEarlierAt1e-3: numeric iteration number where converged to the result at precision 1e-3
  • SolveEarlierAt1e-6: numeric iteration number where converged to the result at precision 1e-6
  • simNumber: numeric number of the repeated measures, in the dataset, every algorithm was evaluated 10 times in each benchmark function in each condition, in this case the number goes from 0 to 9