Benchmark experiment for multiple learners and tasks.

Runs a complete benchmark experiment that compares different learning algorithms across one or more tasks with respect to a given resampling strategy. Experiments are paired: the same training / test sets are used for all learners on a given task. You can also pass “enhanced” learners via wrappers; for example, a learner can be tuned automatically using makeTuneWrapper.
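
As a rough illustration of passing a wrapped learner, the sketch below benchmarks a plain rpart learner against a tuned one. The search space (three values of `cp`), the holdout inner resampling and the variable names are assumptions chosen for brevity, not recommendations.

```r
library(mlr)

# Hypothetical search space: three candidate values for rpart's complexity parameter.
ps = makeParamSet(makeDiscreteParam("cp", values = c(0.01, 0.05, 0.1)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Holdout")

# Wrap the base learner; tuning is repeated inside every outer training set.
tuned.rpart = makeTuneWrapper(makeLearner("classif.rpart"), resampling = inner,
  par.set = ps, control = ctrl, show.info = FALSE)

lrns = list(makeLearner("classif.rpart"), tuned.rpart)
outer = makeResampleDesc("CV", iters = 2L)

# keep.extract = TRUE so the tuning results survive in the result object.
bmr = benchmark(lrns, list(iris.task), outer, measures = list(acc),
  keep.extract = TRUE, show.info = FALSE)
tune.res = getBMRTuneResults(bmr)
```

Both learners see the same outer training / test splits, so any difference in `acc` can be attributed to the tuning step rather than to the split.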

benchmark(
  learners,
  tasks,
  resamplings,
  measures,
  keep.pred = TRUE,
  keep.extract = FALSE,
  models = FALSE,
  show.info = getMlrOption("show.info")
)

Arguments

learners	(list of Learner | character) Learning algorithms to compare; a single learner is also accepted. If you pass strings, the learners are created via makeLearner.
tasks	(list of Task) Tasks that the learners should be run on.
resamplings	(list of ResampleDesc | ResampleInstance) Resampling strategy for each task. If only one is provided, it is replicated to match the number of tasks. If missing, a 10-fold cross validation is used.
measures	(list of Measure) Performance measures for all tasks. If missing, the default measure of the first task is used.
keep.pred	(`logical(1)`) Keep the prediction data in the `pred` slot of the result object. If you run many experiments (on larger data sets), these objects can unnecessarily increase object size and memory usage when you do not need them. The default is `TRUE`.
keep.extract	(`logical(1)`) Keep the `extract` slot of the result object. When creating many benchmark results with extensive tuning, the resulting R objects can become very large. For this reason, the tuning results stored in the `extract` slot are removed by default (`keep.extract = FALSE`). Note that with `keep.extract = FALSE` you cannot analyze the tuning results afterwards.
models	(`logical(1)`) Should all fitted models be stored in the ResampleResult? Default is `FALSE`.
show.info	(`logical(1)`) Print verbose output on console? Default is set via configureMlr.
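
The arguments above can be combined as in the following sketch, which assumes one resampling description per task (in the same order as `tasks`) and drops the stored predictions to keep the result object small. The particular resampling choices are illustrative only.

```r
library(mlr)

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)

# One resampling description per task, matched to `tasks` by position.
rdescs = list(makeResampleDesc("CV", iters = 3L), makeResampleDesc("Holdout"))

# Do not store predictions or fitted models; suppress console output.
bmr = benchmark(lrns, tasks, rdescs, measures = list(acc),
  keep.pred = FALSE, models = FALSE, show.info = FALSE)

# Aggregated performance is still available without the predictions.
aggr = getBMRAggrPerformances(bmr, as.df = TRUE)
```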

Value

BenchmarkResult.
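
A minimal sketch of inspecting the returned object with mlr's `getBMR*` accessors; the learners, tasks and measures are the same as in the Examples section, and the variable names are arbitrary.

```r
library(mlr)

bmr = benchmark(
  list(makeLearner("classif.lda"), makeLearner("classif.rpart")),
  list(iris.task, sonar.task),
  makeResampleDesc("CV", iters = 2L),
  measures = list(acc, ber),
  show.info = FALSE
)

# Which tasks and learners does the result cover?
task.ids = getBMRTaskIds(bmr)
learner.ids = getBMRLearnerIds(bmr)

# One row per task / learner combination with aggregated measures.
aggr = getBMRAggrPerformances(bmr, as.df = TRUE)

# One row per task / learner / resampling iteration.
perf = getBMRPerformances(bmr, as.df = TRUE)
```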

Examples

lrns = list(makeLearner("classif.lda"), makeLearner("classif.rpart"))
tasks = list(iris.task, sonar.task)
rdesc = makeResampleDesc("CV", iters = 2L)
meas = list(acc, ber)
bmr = benchmark(lrns, tasks, rdesc, measures = meas)
#> Task: iris-example, Learner: classif.lda
#> Resampling: cross-validation
#> Measures:             acc       ber       
#> [Resample] iter 1:    1.0000000 0.0000000 
#> [Resample] iter 2:    0.9600000 0.0411111 
#> 
#> Aggregated Result: acc.test.mean=0.9800000,ber.test.mean=0.0205556
#> 
#> Task: Sonar-example, Learner: classif.lda
#> Resampling: cross-validation
#> Measures:             acc       ber       
#> [Resample] iter 1:    0.7019231 0.2980769 
#> [Resample] iter 2:    0.7788462 0.2133710 
#> 
#> Aggregated Result: acc.test.mean=0.7403846,ber.test.mean=0.2557240
#> 
#> Task: iris-example, Learner: classif.rpart
#> Resampling: cross-validation
#> Measures:             acc       ber       
#> [Resample] iter 1:    0.9600000 0.0394872 
#> [Resample] iter 2:    0.9333333 0.0672222 
#> 
#> Aggregated Result: acc.test.mean=0.9466667,ber.test.mean=0.0533547
#> 
#> Task: Sonar-example, Learner: classif.rpart
#> Resampling: cross-validation
#> Measures:             acc       ber       
#> [Resample] iter 1:    0.7019231 0.2980769 
#> [Resample] iter 2:    0.6923077 0.3054614 
#> 
#> Aggregated Result: acc.test.mean=0.6971154,ber.test.mean=0.3017692
#> 
rmat = convertBMRToRankMatrix(bmr)
print(rmat)
#>               Sonar-example iris-example
#> classif.lda               1            1
#> classif.rpart             2            2
plotBMRSummary(bmr)
plotBMRBoxplots(bmr, ber, style = "violin")
#> Warning: `fun.y` is deprecated. Use `fun` instead.
#> Warning: `fun.ymin` is deprecated. Use `fun.min` instead.
#> Warning: `fun.ymax` is deprecated. Use `fun.max` instead.
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: Computation failed in `stat_ydensity()`:
#> replacement has 1 row, data has 0
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: Computation failed in `stat_ydensity()`:
#> replacement has 1 row, data has 0
plotBMRRanksAsBarChart(bmr, pos = "stack")
friedmanTestBMR(bmr)
#> 
#> 	Friedman rank sum test
#> 
#> data:  acc.test.mean and learner.id and task.id
#> Friedman chi-squared = 2, df = 1, p-value = 0.1573
#> 
friedmanPostHocTestBMR(bmr, p.value = 0.05)
#> Loading required package: PMCMRplus
#> Warning: Cannot reject null hypothesis of overall Friedman test,
#>              returning overall Friedman test.
#> 
#> 	Friedman rank sum test
#> 
#> data:  acc.test.mean and learner.id and task.id
#> Friedman chi-squared = 2, df = 1, p-value = 0.1573
#>
