Wrappers can be employed to extend integrated learners (makeLearner()) with new functionality. The broad scope of operations and methods that are implemented as wrappers underlines the flexibility of the wrapping approach:

- Data preprocessing (makePreprocWrapper())
- Imputation (makeImputeWrapper())
- Bagging (makeBaggingWrapper())
- Tuning (makeTuneWrapper())
- Feature selection (makeFeatSelWrapper())
- Over- and undersampling (makeOversampleWrapper(), makeUndersampleWrapper()) for imbalanced classification problems
- Multiclass extension (makeMulticlassWrapper()) for binary-class learners
- Multilabel classification (makeMultilabelBinaryRelevanceWrapper())

All these operations and methods have a few things in common: First, they all wrap around mlr learners (makeLearner()) and return a new learner, so learners can be wrapped multiple times. Second, they are implemented using a train (pre-model hook) and a predict (post-model hook) method.
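Because every wrapper returns an ordinary mlr learner, wrappers compose freely. As a minimal sketch (the particular choice of imputation plus bagging here is just for illustration), a learner can be wrapped twice:

library(mlr)
# Start from a plain decision tree learner
lrn = makeLearner("classif.rpart")
# First wrap: impute missing numeric values by their median
lrn = makeImputeWrapper(lrn, classes = list(numeric = imputeMedian()))
# Second wrap: bag the imputing learner; the result is again a normal learner
lrn = makeBaggingWrapper(lrn, bw.iters = 10)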
In this section we describe, by way of example, the bagging wrapper, which we use to create a random forest that supports observation weights. To achieve this we combine several decision trees from the rpart package into our own custom random forest.
First, we create a weighted toy task.
data(iris)
# Use the integer-coded species as toy observation weights
task = makeClassifTask(data = iris, target = "Species", weights = as.integer(iris$Species))
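Printing the task lets us verify the setup; the task description should report, among other things, that observation weights are attached:

print(task)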
Next, we use makeBaggingWrapper() to create the base learner and the bagged learner. We choose to set equivalents of ntree (100 base learners) and mtry (the proportion of randomly selected features):
base.lrn = makeLearner("classif.rpart")
wrapped.lrn = makeBaggingWrapper(base.lrn, bw.iters = 100, bw.feats = 0.5)
print(wrapped.lrn)
## Learner classif.rpart.bagged from package rpart
## Type: classif
## Name: ; Short name:
## Class: BaggingWrapper
## Properties: twoclass,multiclass,missings,numerics,factors,ordered,prob,weights,featimp
## Predict-Type: response
## Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5
As we can see in the output, the wrapped learner inherited all properties of the base learner; in particular, the "weights" property is still present. We can use this newly constructed learner like any other learner, i.e. in train(), benchmark(), resample(), etc.
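As a quick sketch of train() and resample() with the wrapped learner (the 3-fold CV here is only for illustration); the benchmark below then compares it against the plain decision tree:

mod = train(wrapped.lrn, task)                                      # fit once on the full task
r = resample(wrapped.lrn, task, makeResampleDesc("CV", iters = 3))  # cross-validate it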
benchmark(tasks = task, learners = list(base.lrn, wrapped.lrn))
## Task: iris, Learner: classif.rpart
## Resampling: cross-validation
## Measures: mmce
## [Resample] iter 1: 0.0000000
## [Resample] iter 2: 0.2000000
## [Resample] iter 3: 0.2000000
## [Resample] iter 4: 0.0666667
## [Resample] iter 5: 0.0666667
## [Resample] iter 6: 0.0000000
## [Resample] iter 7: 0.0666667
## [Resample] iter 8: 0.0000000
## [Resample] iter 9: 0.1333333
## [Resample] iter 10: 0.0666667
##
## Aggregated Result: mmce.test.mean=0.0800000
##
## Task: iris, Learner: classif.rpart.bagged
## Resampling: cross-validation
## Measures: mmce
## [Resample] iter 1: 0.0000000
## [Resample] iter 2: 0.2000000
## [Resample] iter 3: 0.0666667
## [Resample] iter 4: 0.0666667
## [Resample] iter 5: 0.0000000
## [Resample] iter 6: 0.0000000
## [Resample] iter 7: 0.0000000
## [Resample] iter 8: 0.0000000
## [Resample] iter 9: 0.0666667
## [Resample] iter 10: 0.0666667
##
## Aggregated Result: mmce.test.mean=0.0466667
##
## task.id learner.id mmce.test.mean
## 1 iris classif.rpart 0.08000000
## 2 iris classif.rpart.bagged 0.04666667
So far we are quite happy with our new learner. But we hope for better performance by tuning some hyperparameters of both the decision trees and the bagging wrapper. Let's have a look at the available hyperparameters of the fused learner:
getParamSet(wrapped.lrn)
## Type len Def Constr Req Tunable Trafo
## bw.iters integer - 10 1 to Inf - TRUE -
## bw.replace logical - TRUE - - TRUE -
## bw.size numeric - - 0 to 1 - TRUE -
## bw.feats numeric - 0.667 0 to 1 - TRUE -
## minsplit integer - 20 1 to Inf - TRUE -
## minbucket integer - - 1 to Inf - TRUE -
## cp numeric - 0.01 0 to 1 - TRUE -
## maxcompete integer - 4 0 to Inf - TRUE -
## maxsurrogate integer - 5 0 to Inf - TRUE -
## usesurrogate discrete - 2 0,1,2 - TRUE -
## surrogatestyle discrete - 0 0,1 - TRUE -
## maxdepth integer - 30 1 to 30 - TRUE -
## xval integer - 10 0 to Inf - FALSE -
## parms untyped - - - - TRUE -
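The fused parameter set combines the bagging parameters (bw.*) with those of the underlying rpart learner, and all of them can be set directly on the wrapped learner. A small sketch using setHyperPars() (the values here are arbitrary, just to show that both layers are reachable):

wrapped.lrn2 = setHyperPars(wrapped.lrn, minsplit = 5, bw.feats = 0.75)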
We choose to tune the parameters minsplit and bw.feats to minimize the mmce, using random search (makeTuneControlRandom()) in a 3-fold CV:
ctrl = makeTuneControlRandom(maxit = 10)
rdesc = makeResampleDesc("CV", iters = 3)
par.set = makeParamSet(
makeIntegerParam("minsplit", lower = 1, upper = 10),
makeNumericParam("bw.feats", lower = 0.25, upper = 1)
)
tuned.lrn = makeTuneWrapper(wrapped.lrn, rdesc, mmce, par.set, ctrl)
print(tuned.lrn)
## Learner classif.rpart.bagged.tuned from package rpart
## Type: classif
## Name: ; Short name:
## Class: TuneWrapper
## Properties: numerics,factors,ordered,missings,weights,prob,twoclass,multiclass,featimp
## Predict-Type: response
## Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5
Calling the train method of the newly constructed learner performs the following steps:

1. The tuning wrapper sets hyperparameters for the underlying model in slot $next.learner and calls its train method.
2. The next learner is the bagging wrapper. The passed-down argument bw.feats is used in the bagging wrapper's training function, while minsplit gets passed further down to $next.learner. The bagging wrapper calls the base learner bw.iters times and stores the resulting models.
3. The bagged models are evaluated (here via the mean mmce), new hyperparameters are proposed by the tuning method, and the process repeats until the tuner terminates.
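This delegation chain can be inspected directly, since each wrapper stores its inner learner in the $next.learner slot mentioned above; a small sketch:

class(tuned.lrn)                            # outermost layer: the TuneWrapper
class(tuned.lrn$next.learner)               # next layer: the BaggingWrapper
class(tuned.lrn$next.learner$next.learner)  # innermost layer: the plain rpart learner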
lrn = train(tuned.lrn, task = task)
## [Tune] Started tuning learner classif.rpart.bagged for parameter set:
## Type len Def Constr Req Tunable Trafo
## minsplit integer - - 1 to 10 - TRUE -
## bw.feats numeric - - 0.25 to 1 - TRUE -
## With control class: TuneControlRandom
## Imputation value: 1
## [Tune-x] 1: minsplit=5; bw.feats=0.533
## [Tune-y] 1: mmce.test.mean=0.0466667; time: 0.0 min
## [Tune-x] 2: minsplit=3; bw.feats=0.377
## [Tune-y] 2: mmce.test.mean=0.0600000; time: 0.0 min
## [Tune-x] 3: minsplit=8; bw.feats=0.29
## [Tune-y] 3: mmce.test.mean=0.0800000; time: 0.0 min
## [Tune-x] 4: minsplit=6; bw.feats=0.555
## [Tune-y] 4: mmce.test.mean=0.0533333; time: 0.0 min
## [Tune-x] 5: minsplit=9; bw.feats=0.699
## [Tune-y] 5: mmce.test.mean=0.0466667; time: 0.0 min
## [Tune-x] 6: minsplit=6; bw.feats=0.985
## [Tune-y] 6: mmce.test.mean=0.0466667; time: 0.0 min
## [Tune-x] 7: minsplit=10; bw.feats=0.632
## [Tune-y] 7: mmce.test.mean=0.0400000; time: 0.0 min
## [Tune-x] 8: minsplit=1; bw.feats=0.943
## [Tune-y] 8: mmce.test.mean=0.0600000; time: 0.0 min
## [Tune-x] 9: minsplit=9; bw.feats=0.715
## [Tune-y] 9: mmce.test.mean=0.0533333; time: 0.0 min
## [Tune-x] 10: minsplit=1; bw.feats=0.503
## [Tune-y] 10: mmce.test.mean=0.0466667; time: 0.0 min
## [Tune] Result: minsplit=10; bw.feats=0.632 : mmce.test.mean=0.0400000
print(lrn)
## Model for learner.id=classif.rpart.bagged.tuned; learner.class=TuneWrapper
## Trained on: task.id = iris; obs = 150; features = 4
## Hyperparameters: xval=0,bw.iters=100,bw.feats=0.5
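The trained model can be used for prediction like any other, and because the outermost layer is a TuneWrapper, the hyperparameters chosen during tuning can be extracted with getTuneResult(). A brief sketch (predicting on the training task only for illustration):

pred = predict(lrn, task = task)   # predictions of the tuned, bagged ensemble
performance(pred)                  # defaults to mmce for classification
getTuneResult(lrn)                 # the selected minsplit / bw.feats and achieved mmce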