A stacked learner uses predictions of several base learners and fits a super learner using these predictions as features in order to predict the outcome. The following stacking methods are available:
- average
Averaging of base learner predictions without weights.
- stack.nocv
Fits the super learner, where in-sample predictions of the base learners are used.
- stack.cv
Fits the super learner, where the base learner predictions are computed by cross-validated predictions (the resampling strategy can be set via the resampling argument).
- hill.climb
Selects a subset of base learner predictions by a hill climbing algorithm.
- compress
Trains a neural network to compress the model from a collection of base learners.
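The difference between stack.nocv and stack.cv lies in which base learner predictions the super learner is fitted on. The following base-R sketch illustrates the idea only; it does not call mlr and is not mlr's internal code. The mtcars data, the two lm() formulas standing in for base learners, the 5 folds, and all object names are assumptions made for illustration.

```r
# Sketch: in-sample vs. cross-validated (out-of-fold) base learner predictions.
set.seed(1)
n = nrow(mtcars)

# Two hypothetical base learners, given as model formulas for lm().
base.formulas = list(mpg ~ wt, mpg ~ hp + qsec)

# "stack.nocv" idea: each base learner predicts the rows it was trained on.
insample = sapply(base.formulas, function(f) {
  predict(lm(f, data = mtcars), newdata = mtcars)
})

# "stack.cv" idea: each row is predicted by a model that did not see it.
k = 5
fold = sample(rep(seq_len(k), length.out = n))
oof = sapply(base.formulas, function(f) {
  p = numeric(n)
  for (i in seq_len(k)) {
    test = fold == i
    fit = lm(f, data = mtcars[!test, ])
    p[test] = predict(fit, newdata = mtcars[test, ])
  }
  p
})

# The super learner is fitted with the base learner predictions as features.
make.level1 = function(preds) {
  data.frame(p1 = preds[, 1], p2 = preds[, 2], mpg = mtcars$mpg)
}
super.nocv = lm(mpg ~ p1 + p2, data = make.level1(insample))
super.cv = lm(mpg ~ p1 + p2, data = make.level1(oof))
```

Because the out-of-fold predictions come from models that never saw the predicted rows, the super learner in the second variant is less likely to reward a base learner merely for overfitting the training data.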
Usage
makeStackedLearner(
base.learners,
super.learner = NULL,
predict.type = NULL,
method = "stack.nocv",
use.feat = FALSE,
resampling = NULL,
parset = list()
)
Arguments
- base.learners
((list of) Learner)
A list of learners created with makeLearner.
- super.learner
(Learner | character(1))
The super learner that makes the final prediction based on the base learners. If you pass a string, the super learner will be created via makeLearner. Not used for method = 'average'. Default is NULL.
- predict.type
(character(1))
Sets the type of the final prediction for method = 'average'. For other methods, the predict type should be set within super.learner. If the type of the base learner prediction, which is set up within base.learners, is
  - "prob"
    then predict.type = 'prob' will use the average of all base learner predictions and predict.type = 'response' will use the class with highest probability as final prediction.
  - "response"
    then, for classification tasks with predict.type = 'prob', the final prediction will be the relative frequency based on the predicted base learner classes and classification tasks with predict.type = 'response' will use majority vote of the base learner predictions to determine the final prediction. For regression tasks, the final prediction will be the average of the base learner predictions.
- method
(character(1))
"average" for averaging the predictions of the base learners, "stack.nocv" for building a super learner using the predictions of the base learners, "stack.cv" for building a super learner using cross-validated predictions of the base learners, "hill.climb" for averaging the predictions of the base learners with weights learned by a hill climbing algorithm, and "compress" for compressing the model to mimic the predictions of a collection of base learners while speeding up the predictions and reducing the size of the model. Default is "stack.nocv".
- use.feat
(logical(1))
Whether the original features should also be passed to the super learner. Not used for method = 'average'. Default is FALSE.
- resampling
(ResampleDesc)
Resampling strategy for method = 'stack.cv'. Currently only CV is allowed for resampling. The default NULL uses 5-fold CV.
- parset
The parameters for the hill.climb method, including
  - replace
    Whether a base learner can be selected more than once.
  - init
    Number of best models being included before the selection algorithm.
  - bagprob
    The proportion of models being considered in one round of selection.
  - bagtime
    The number of rounds of the bagging selection.
  - metric
    The result evaluation metric function taking two parameters pred and true; the smaller the score the better.
The parameters for the compress method, including
  - k
    The size multiplier of the generated data.
  - prob
    The probability to exchange values.
  - s
    The standard deviation of each numerical feature.
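The hill.climb options replace, init and metric can be pictured with a small greedy selection loop. The base-R sketch below is an illustration under assumptions, not mlr's implementation: the function hill.climb.sketch, its max.steps argument, and the toy predictions are all hypothetical, and the bagging options bagprob and bagtime are left out for brevity. The metric follows the convention stated above: it takes pred and true, and smaller is better.

```r
# Sketch of greedy ensemble selection ("hill climbing") over base learner predictions.
# hill.climb.sketch and max.steps are hypothetical names, not part of mlr.
hill.climb.sketch = function(preds, true, metric, replace = TRUE, init = 0,
  max.steps = 10) {
  # preds: list of numeric prediction vectors, one per base learner
  scores = sapply(preds, metric, true = true)
  # init: start from the 'init' individually best models
  selected = if (init > 0) order(scores)[seq_len(init)] else integer(0)
  current = if (length(selected) > 0) {
    metric(Reduce(`+`, preds[selected]) / length(selected), true)
  } else {
    Inf
  }
  for (step in seq_len(max.steps)) {
    # replace: whether an already selected learner may be picked again
    candidates = if (replace) seq_along(preds) else setdiff(seq_along(preds), selected)
    if (length(candidates) == 0) break
    # Score the averaged ensemble after adding each candidate
    cand.scores = sapply(candidates, function(j) {
      idx = c(selected, j)
      metric(Reduce(`+`, preds[idx]) / length(idx), true)
    })
    best = which.min(cand.scores)
    if (cand.scores[best] >= current) break  # no improvement, stop
    selected = c(selected, candidates[best])
    current = cand.scores[best]
  }
  # Final weights are the selection frequencies of the base learners
  weights = tabulate(selected, nbins = length(preds)) / length(selected)
  list(selected = selected, weights = weights, score = current)
}

# Metric with the (pred, true) signature; smaller is better.
mse = function(pred, true) mean((pred - true)^2)

true = c(1, 2, 3, 4, 5)
preds = list(
  a = true + 1,  # biased upward by 1
  b = true - 1,  # biased downward by 1
  c = true + 3   # biased upward by 3
)
res = hill.climb.sketch(preds, true, metric = mse)
res$weights
```

In this toy case the two learners with opposite bias cancel each other out, so the loop selects both and gives the third learner a weight of zero.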
Examples
# Classification
library(mlr)
data(iris)
tsk = makeClassifTask(data = iris, target = "Species")
# Base learners, each set to predict class probabilities
base = c("classif.rpart", "classif.lda", "classif.svm")
lrns = lapply(base, makeLearner)
lrns = lapply(lrns, setPredictType, "prob")
# Average the base learner predictions with weights found by hill climbing
m = makeStackedLearner(base.learners = lrns,
  predict.type = "prob", method = "hill.climb")
tmp = train(m, tsk)
#> Error in x[0, , drop = FALSE]: incorrect number of dimensions
res = predict(tmp, tsk)
#> Error in predict(tmp, tsk): object 'tmp' not found
# Regression
data(BostonHousing, package = "mlbench")
tsk = makeRegrTask(data = BostonHousing, target = "medv")
# Base learners with their default predict type
base = c("regr.rpart", "regr.svm")
lrns = lapply(base, makeLearner)
# Compress the collection of base learners into a single model
m = makeStackedLearner(base.learners = lrns,
  predict.type = "response", method = "compress")
tmp = train(m, tsk)
#> Error in x[0, , drop = FALSE]: incorrect number of dimensions
res = predict(tmp, tsk)
#> Error in predict(tmp, tsk): object 'tmp' not found