Runs a complete benchmark experiment that compares several learning algorithms on one or more tasks with respect to a given resampling strategy. Experiments are paired: for each task, every learner is trained and evaluated on the same training / test sets. You can also pass “enhanced” learners built with wrappers; for example, a learner can be tuned automatically using makeTuneWrapper.
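
As a rough illustration of passing a wrapped learner, the sketch below benchmarks a plain rpart learner against a tuned one. The search space (three values of cp), the inner holdout split and the variable names are assumptions chosen for brevity, not recommendations.

```r
library(mlr)
set.seed(1)

# Hypothetical search space: three candidate values for rpart's complexity parameter.
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")

# The wrapper tunes cp on an inner holdout split each time it is trained.
tuned.rpart = makeTuneWrapper(makeLearner("classif.rpart"), resampling = inner,
  par.set = ps, control = ctrl, show.info = FALSE)

# Compare the untuned and the tuned learner on the same outer CV splits.
lrns = list(makeLearner("classif.rpart"), tuned.rpart)
outer = makeResampleDesc("CV", iters = 2L)
bmr = benchmark(lrns, list(iris.task), outer, measures = list(acc),
  show.info = FALSE)
print(getBMRAggrPerformances(bmr, as.df = TRUE))
```

Because tuning happens inside each outer training set, the outer test sets stay untouched by the tuning step.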

benchmark(learners, tasks, resamplings, measures, keep.pred = TRUE,
  models = TRUE, show.info = getMlrOption("show.info"))

Arguments

learners

(list of Learner | character)
Learning algorithms to compare; a single learner is also accepted. If you pass strings, the learners are created via makeLearner.
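
A minimal sketch of the string form, assuming an mlr version that accepts learner ids as described above. The ids are wrapped in a list here; the learners are created with default hyperparameters.

```r
library(mlr)
set.seed(1)

rdesc = makeResampleDesc("CV", iters = 2L)

# Learner ids given as strings; each one is turned into a Learner internally.
bmr = benchmark(list("classif.lda", "classif.rpart"), list(iris.task), rdesc,
  show.info = FALSE)
print(getBMRLearnerIds(bmr))
```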

tasks

(list of Task)
Tasks that learners should be run on.

resamplings

(list of ResampleDesc | ResampleInstance)
Resampling strategy for each task. If only one is provided, it is replicated to match the number of tasks. If missing, 10-fold cross-validation is used.
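
The following sketch shows one way to supply a separate resampling per task, here as fixed ResampleInstance objects listed in the same order as the tasks. The particular strategies (2-fold CV for iris, an 80/20 holdout for Sonar) are arbitrary choices for illustration.

```r
library(mlr)
set.seed(1)

tasks = list(iris.task, sonar.task)

# One fixed instance per task, in the same order as `tasks`.
rins = list(
  makeResampleInstance(makeResampleDesc("CV", iters = 2L), task = iris.task),
  makeResampleInstance(makeResampleDesc("Holdout", split = 0.8), task = sonar.task)
)

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
bmr = benchmark(lrns, tasks, rins, measures = list(acc), show.info = FALSE)
print(getBMRAggrPerformances(bmr, as.df = TRUE))
```

Creating the instances up front also makes the exact splits reusable in later experiments.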

measures

(list of Measure)
Performance measures for all tasks. If missing, the default measure of the first task is used.
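
To see which measure would be used by default, and how to request several measures explicitly, a small sketch (the choice of acc and mmce is only for illustration):

```r
library(mlr)
set.seed(1)

# Default measure of the task, used when `measures` is missing.
print(getDefaultMeasure(iris.task)$id)

rdesc = makeResampleDesc("CV", iters = 2L)
bmr = benchmark(makeLearner("classif.rpart"), iris.task, rdesc,
  measures = list(acc, mmce), show.info = FALSE)

# One aggregated column per requested measure.
aggr = getBMRAggrPerformances(bmr, as.df = TRUE)
print(aggr)
```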

keep.pred

(logical(1))
Keep the prediction data in the pred slot of the result object? If you run many experiments, or work with larger data sets, these objects can increase object size and memory usage unnecessarily. Set this argument to FALSE if you do not need them. Default is TRUE.

models

(logical(1))
Should all fitted models be stored in the ResampleResult? Default is TRUE.
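
A rough way to gauge the effect of keep.pred and models is to compare object sizes with and without the stored predictions and models. This is only a sketch; actual sizes depend on the learners, the data and the mlr version.

```r
library(mlr)
set.seed(1)

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
rdesc = makeResampleDesc("CV", iters = 2L)

# Keep predictions and fitted models.
bmr.full = benchmark(lrns, list(iris.task), rdesc, keep.pred = TRUE,
  models = TRUE, show.info = FALSE)

# Drop both to keep the result object small.
bmr.lean = benchmark(lrns, list(iris.task), rdesc, keep.pred = FALSE,
  models = FALSE, show.info = FALSE)

print(object.size(bmr.full), units = "Kb")
print(object.size(bmr.lean), units = "Kb")
```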

show.info

(logical(1))
Print verbose output on console? Default is set via configureMlr.

Value

BenchmarkResult.
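
The returned object is usually inspected through the getBMR* accessor functions. A short sketch, with a small setup chosen only to keep it quick to run:

```r
library(mlr)
set.seed(1)

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
rdesc = makeResampleDesc("CV", iters = 2L)
bmr = benchmark(lrns, list(iris.task), rdesc, measures = list(acc),
  show.info = FALSE)

# Ids of the tasks and learners contained in the result.
print(getBMRTaskIds(bmr))
print(getBMRLearnerIds(bmr))

# Aggregated performance: one row per task / learner combination.
aggr = getBMRAggrPerformances(bmr, as.df = TRUE)
print(aggr)

# Per-iteration performance: one row per task / learner / resampling iteration.
perf = getBMRPerformances(bmr, as.df = TRUE)
print(perf)
```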

Examples

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas)
#> Task: iris-example, Learner: classif.lda
#> Resampling: cross-validation
#> Measures: acc ber
#> [Resample] iter 1: 1.0000000 0.0000000
#> [Resample] iter 2: 0.9600000 0.0394872
#>
#> Aggregated Result: acc.test.mean=0.9800000,ber.test.mean=0.0197436
#>
#> Task: Sonar-example, Learner: classif.lda
#> Resampling: cross-validation
#> Measures: acc ber
#> [Resample] iter 1: 0.6538462 0.3484848
#> [Resample] iter 2: 0.7115385 0.2881983
#>
#> Aggregated Result: acc.test.mean=0.6826923,ber.test.mean=0.3183416
#>
#> Task: iris-example, Learner: classif.rpart
#> Resampling: cross-validation
#> Measures: acc ber
#> [Resample] iter 1: 0.9600000 0.0411111
#> [Resample] iter 2: 0.9333333 0.0651282
#>
#> Aggregated Result: acc.test.mean=0.9466667,ber.test.mean=0.0531197
#>
#> Task: Sonar-example, Learner: classif.rpart
#> Resampling: cross-validation
#> Measures: acc ber
#> [Resample] iter 1: 0.7115385 0.2954545
#> [Resample] iter 2: 0.7307692 0.2711802
#>
#> Aggregated Result: acc.test.mean=0.7211538,ber.test.mean=0.2833174
#>
rmat = convertBMRToRankMatrix(bmr)
print(rmat)
#>               iris-example Sonar-example
#> classif.lda              1             2
#> classif.rpart            2             1
plotBMRBoxplots(bmr, ber, style = "violin")
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: no non-missing arguments to max; returning -Inf
plotBMRRanksAsBarChart(bmr, pos = "stack")
friedmanTestBMR(bmr)
#>
#>     Friedman rank sum test
#>
#> data:  acc.test.mean and learner.id and task.id
#> Friedman chi-squared = 0, df = 1, p-value = 1
#>
friedmanPostHocTestBMR(bmr, p.value = 0.05)
#> Loading required package: PMCMR
#> PMCMR is superseded by PMCMRplus and will be no longer maintained. You may wish to install PMCMRplus instead.
#> Warning: Cannot reject null hypothesis of overall Friedman test,
#> returning overall Friedman test.
#>
#>     Friedman rank sum test
#>
#> data:  acc.test.mean and learner.id and task.id
#> Friedman chi-squared = 0, df = 1, p-value = 1
#>