This page lists the learning methods already integrated in mlr.
Columns Num., Fac., Ord., NAs, and Weights indicate if a method can cope with numerical, factor, and ordered factor predictors, if it can deal with missing values in a meaningful way (other than simply removing observations with missing values) and if observation weights are supported.
Column Props shows further properties of the learning methods specific to the type of learning task. See also RLearner() for details.
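The same information can be queried from within R. The following is a minimal sketch, assuming the mlr and rpart packages are installed; the property names used here ("prob", "missings") are the internal names that correspond to the Props and NAs columns.

```r
library(mlr)

# Classification learners that can predict probabilities and cope with
# missing values. check.packages = FALSE also lists learners whose
# underlying packages are not installed.
lrns <- listLearners("classif", properties = c("prob", "missings"),
                     check.packages = FALSE, warn.missing.packages = FALSE)
head(lrns[, c("class", "package")])

# Properties of a single learner, returned as a character vector.
props <- getLearnerProperties("classif.rpart")
props
```

Filtering by property this way is usually quicker than scanning the tables below by hand.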
For classification the following additional learner properties are relevant and shown in column Props:
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
classif.ada ada ada Boosting |
ada rpart |
X | X | prob twoclass |
xval has been set to 0 by default for speed. |
|||
classif.adaboostm1 adaboostm1 ada Boosting M1 |
RWeka | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
|||
classif.bartMachine bartmachine Bayesian Additive Regression Trees |
bartMachine | X | X | X | prob twoclass |
use_missing_data has been set to TRUE by default to allow missing data support. |
||
classif.binomial binomial Binomial Regression |
stats | X | X | X | prob twoclass |
Delegates to glm with freely choosable binomial link function via learner parameter link . We set ‘model’ to FALSE by default to save memory. |
||
classif.boosting adabag Adabag Boosting |
adabag rpart |
X | X | X | prob twoclass multiclass featimp |
xval has been set to 0 by default for speed. |
||
classif.bst bst Gradient Boosting |
bst rpart |
X | twoclass | Renamed parameter learner to Learner due to nameclash with setHyperPars . Default changes: Learner = "ls" , xval = 0 , and maxdepth = 1 . |
||||
classif.C50 C50 C50 |
C50 | X | X | X | X | prob twoclass multiclass |
||
classif.cforest cforest Random forest based on conditional inference trees |
party | X | X | X | X | X | prob twoclass multiclass featimp |
See ?ctree_control for possible breakage for nominal features with missingness. |
classif.clusterSVM clusterSVM Clustered Support Vector Machines |
SwarmSVM LiblineaR |
X | twoclass |
centers set to 2 by default. |
||||
classif.ctree ctree Conditional Inference Trees |
party | X | X | X | X | X | prob twoclass multiclass |
See ?ctree_control for possible breakage for nominal features with missingness. |
classif.cvglmnet cvglmnet GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda) |
glmnet | X | X | X | prob twoclass multiclass |
The family parameter is set to binomial for two-class problems and to multinomial otherwise. Factors automatically get converted to dummy columns, ordered factors to integer. glmnet uses a global control object for its parameters. mlr resets all control parameters to their defaults before setting the specified parameters and after training. If you are setting glmnet.control parameters through glmnet.control, you need to save and re-set them after running the glmnet learner. |
||
classif.dbnDNN dbn.dnn Deep neural network with weights initialized by DBN |
deepnet | X | prob twoclass multiclass |
output set to "softmax" by default. |
||||
classif.dcSVM dcSVM Divided-Conquer Support Vector Machines |
SwarmSVM e1071 |
X | twoclass | |||||
classif.earth fda Flexible Discriminant Analysis |
earth stats |
X | X | X | prob twoclass multiclass |
This learner performs flexible discriminant analysis using the earth algorithm. na.action is set to na.fail and only this is supported. | ||
classif.evtree evtree Evolutionary learning of globally optimal trees |
evtree | X | X | X | X | prob twoclass multiclass |
pmutatemajor , pmutateminor , pcrossover , psplit , and pprune , are scaled internally to sum to 100. |
|
classif.extraTrees extraTrees Extremely Randomized Trees |
extraTrees | X | X | prob twoclass multiclass |
||||
classif.fdausc.glm fdausc.glm Generalized Linear Models classification on FDA |
fda.usc | prob twoclass multiclass functionals |
model$C[[1]] is set to quote(classif.glm) | |||||
classif.fdausc.kernel fdausc.kernel Kernel classification on FDA |
fda.usc | prob twoclass multiclass single.functional |
Argument draw=FALSE is used as default. | |||||
classif.fdausc.knn fdausc.knn fdausc.knn |
fda.usc | X | prob twoclass multiclass single.functional |
Argument draw=FALSE is used as default. | ||||
classif.fdausc.np fdausc.np Nonparametric classification on FDA |
fda.usc | prob twoclass multiclass single.functional |
Argument draw=FALSE is used as default. Additionally, mod$C[[1]] is set to quote(classif.np) | |||||
classif.FDboost FDboost Functional linear array classification boosting |
FDboost mboost |
X | prob twoclass functionals |
Uses only one base learner per functional or scalar covariate. Uses the same hyperparameters for every baselearner. Currently does not support interaction between scalar covariates. Default for family has been set to ‘Binomial’, as ‘Gaussian’ is not applicable. | ||||
classif.featureless featureless Featureless classifier |
mlr | X | X | X | X | prob twoclass multiclass functionals |
||
classif.fgam FGAM functional general additive model |
refund | prob twoclass functionals single.functional |
||||||
classif.fnn fnn Fast k-Nearest Neighbour |
FNN | X | twoclass multiclass |
|||||
classif.gamboost gamboost Gradient boosting with smooth components |
mboost | X | X | X | prob twoclass |
family has been set to Binomial() by default. For ‘family’ ‘AUC’ and ‘AdaExp’ probabilities cannot be predicted. |
||
classif.gaterSVM gaterSVM Mixture of SVMs with Neural Network Gater Function |
SwarmSVM | X | twoclass |
m set to 3 and max.iter set to 1 by default. |
||||
classif.gausspr gausspr Gaussian Processes |
kernlab | X | X | prob twoclass multiclass |
Kernel parameters have to be passed directly and not by using the kpar list in gausspr . Note that fit has been set to FALSE by default for speed. |
|||
classif.gbm gbm Gradient Boosting Machine |
gbm | X | X | X | X | prob twoclass multiclass featimp |
keep.data is set to FALSE to reduce memory requirements. Param ‘n.cores’ has been set to ‘1’ by default to suppress parallelization by the package. |
|
classif.geoDA geoda Geometric Predictive Discriminant Analysis |
DiscriMiner | X | twoclass multiclass |
|||||
classif.glmboost glmboost Boosting for GLMs |
mboost | X | X | X | prob twoclass |
family has been set to Binomial by default. For ‘family’ ‘AUC’ and ‘AdaExp’ probabilities cannot be predicted. |
||
classif.glmnet glmnet GLM with Lasso or Elasticnet Regularization |
glmnet | X | X | X | prob twoclass multiclass |
The family parameter is set to binomial for two-class problems and to multinomial otherwise. Factors automatically get converted to dummy columns, ordered factors to integer. Parameter s (value of the regularization parameter used for predictions) is set to 0.01 by default, but needs to be tuned by the user. glmnet uses a global control object for its parameters. mlr resets all control parameters to their defaults before setting the specified parameters and after training. If you are setting glmnet.control parameters through glmnet.control, you need to save and re-set them after running the glmnet learner. |
||
classif.h2o.deeplearning h2o.dl h2o.deeplearning |
h2o | X | X | X | X | prob twoclass multiclass featimp |
The default value of missing_values_handling is "MeanImputation" , so missing values are automatically mean-imputed. |
|
classif.h2o.gbm h2o.gbm h2o.gbm |
h2o | X | X | X | prob twoclass multiclass featimp |
‘distribution’ is set automatically to ‘gaussian’. | ||
classif.h2o.glm h2o.glm h2o.glm |
h2o | X | X | X | X | prob twoclass featimp |
family is always set to "binomial" to get a binary classifier. The default value of missing_values_handling is "MeanImputation" , so missing values are automatically mean-imputed. |
|
classif.h2o.randomForest h2o.rf h2o.randomForest |
h2o | X | X | X | prob twoclass multiclass featimp |
|||
classif.IBk ibk k-Nearest Neighbours |
RWeka | X | X | prob twoclass multiclass |
||||
classif.J48 j48 J48 Decision Trees |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.JRip jrip Propositional Rule Learner |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.kknn kknn k-Nearest Neighbor |
kknn | X | X | prob twoclass multiclass |
||||
classif.knn knn k-Nearest Neighbor |
class | X | twoclass multiclass |
|||||
classif.ksvm ksvm Support Vector Machines |
kernlab | X | X | prob twoclass multiclass class.weights |
Kernel parameters have to be passed directly and not by using the kpar list in ksvm . Note that fit has been set to FALSE by default for speed. |
|||
classif.lda lda Linear Discriminant Analysis |
MASS | X | X | prob twoclass multiclass |
Learner parameter predict.method maps to method in predict.lda . |
|||
classif.LiblineaRL1L2SVC liblinl1l2svc L1-Regularized L2-Loss Support Vector Classification |
LiblineaR | X | twoclass multiclass class.weights |
|||||
classif.LiblineaRL1LogReg liblinl1logreg L1-Regularized Logistic Regression |
LiblineaR | X | prob twoclass multiclass class.weights |
|||||
classif.LiblineaRL2L1SVC liblinl2l1svc L2-Regularized L1-Loss Support Vector Classification |
LiblineaR | X | twoclass multiclass class.weights |
|||||
classif.LiblineaRL2LogReg liblinl2logreg L2-Regularized Logistic Regression |
LiblineaR | X | prob twoclass multiclass class.weights |
type = 0 (the default) is primal and type = 7 is dual problem. |
||||
classif.LiblineaRL2SVC liblinl2svc L2-Regularized L2-Loss Support Vector Classification |
LiblineaR | X | twoclass multiclass class.weights |
type = 2 (the default) is primal and type = 1 is dual problem. |
||||
classif.LiblineaRMultiClassSVC liblinmulticlasssvc Support Vector Classification by Crammer and Singer |
LiblineaR | X | twoclass multiclass class.weights |
|||||
classif.linDA linda Linear Discriminant Analysis |
DiscriMiner | X | twoclass multiclass |
Set validation = NULL by default to disable internal test set validation. |
||||
classif.logreg logreg Logistic Regression |
stats | X | X | X | prob twoclass |
Delegates to glm with family = binomial(link = 'logit') . We set ‘model’ to FALSE by default to save memory. |
||
classif.lssvm lssvm Least Squares Support Vector Machine |
kernlab | X | X | twoclass multiclass |
fitted has been set to FALSE by default for speed. |
|||
classif.lvq1 lvq1 Learning Vector Quantization |
class | X | twoclass multiclass |
|||||
classif.mda mda Mixture Discriminant Analysis |
mda | X | X | prob twoclass multiclass |
keep.fitted has been set to FALSE by default for speed and we use start.method = "lvq" for more robust behavior / less technical crashes. |
|||
classif.mlp mlp Multi-Layer Perceptron |
RSNNS | X | prob twoclass multiclass |
|||||
classif.multinom multinom Multinomial Regression |
nnet | X | X | X | prob twoclass multiclass |
|||
classif.naiveBayes nbayes Naive Bayes |
e1071 | X | X | X | prob twoclass multiclass |
|||
classif.neuralnet neuralnet Neural Network from neuralnet |
neuralnet | X | prob twoclass |
err.fct has been set to ce and linear.output to FALSE to do classification. |
||||
classif.nnet nnet Neural Network |
nnet | X | X | X | prob twoclass multiclass |
linout=TRUE is hardcoded for regression. size has been set to 3 by default. |
||
classif.nnTrain nn.train Training Neural Network by Backpropagation |
deepnet | X | prob twoclass multiclass |
output set to softmax by default. max.number.of.layers can be set to control and tune the maximal number of layers specified via hidden . |
||||
classif.nodeHarvest nodeHarvest Node Harvest |
nodeHarvest | X | X | prob twoclass |
||||
classif.OneR oner 1-R Classifier |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.pamr pamr Nearest shrunken centroid |
pamr | X | prob twoclass |
Threshold for prediction (threshold.predict ) has been set to 1 by default. |
||||
classif.PART part PART Decision Lists |
RWeka | X | X | X | prob twoclass multiclass |
NAs are directly passed to WEKA with na.action = na.pass . |
||
classif.penalized penalized Penalized Logistic Regression |
penalized | X | X | X | prob twoclass |
trace=FALSE was set by default to disable logging output. | ||
classif.plr plr Logistic Regression with a L2 Penalty |
stepPlr | X | X | X | prob twoclass |
AIC and BIC penalty types can be selected via the new parameter cp.type . |
||
classif.plsdaCaret plsdacaret Partial Least Squares (PLS) Discriminant Analysis |
caret pls |
X | prob twoclass multiclass |
|||||
classif.probit probit Probit Regression |
stats | X | X | X | prob twoclass |
Delegates to glm with family = binomial(link = 'probit') . We set ‘model’ to FALSE by default to save memory. |
||
classif.qda qda Quadratic Discriminant Analysis |
MASS | X | X | prob twoclass multiclass |
Learner parameter predict.method maps to method in predict.qda . |
|||
classif.quaDA quada Quadratic Discriminant Analysis |
DiscriMiner | X | twoclass multiclass |
|||||
classif.randomForest rf Random Forest |
randomForest | X | X | X | prob twoclass multiclass class.weights featimp oobpreds |
Note that the rf can freeze the R process if trained on a task with 1 feature which is constant. This can happen in feature forward selection, also due to resampling, and you need to remove such features with removeConstantFeatures. | ||
classif.randomForestSRC rfsrc Random Forest |
randomForestSRC | X | X | X | X | X | prob twoclass multiclass featimp oobpreds |
na.action has been set to "na.impute" by default to allow missing data support. |
classif.ranger ranger Random Forests |
ranger | X | X | X | X | prob twoclass multiclass featimp oobpreds |
By default, internal parallelization is switched off (num.threads = 1 ), verbose output is disabled, respect.unordered.factors is set to order for all splitrules. If predict.type=‘prob’ we set ‘probability=TRUE’ in ranger. |
|
classif.rda rda Regularized Discriminant Analysis |
klaR | X | X | prob twoclass multiclass |
estimate.error has been set to FALSE by default for speed. |
|||
classif.rFerns rFerns Random ferns |
rFerns | X | X | X | twoclass multiclass oobpreds |
|||
classif.rknn rknn Random k-Nearest-Neighbors |
rknn | X | X | twoclass multiclass |
k restricted to < 99 as the code allocates arrays of static size | |||
classif.rotationForest rotationForest Rotation Forest |
rotationForest | X | X | X | prob twoclass |
|||
classif.rpart rpart Decision Tree |
rpart | X | X | X | X | X | prob twoclass multiclass featimp |
xval has been set to 0 by default for speed. |
classif.RRF RRF Regularized Random Forests |
RRF | X | X | prob twoclass multiclass featimp |
||||
classif.rrlda rrlda Robust Regularized Linear Discriminant Analysis |
rrlda | X | twoclass multiclass |
|||||
classif.saeDNN sae.dnn Deep neural network with weights initialized by Stacked AutoEncoder |
deepnet | X | prob twoclass multiclass |
output set to "softmax" by default. |
||||
classif.sda sda Shrinkage Discriminant Analysis |
sda | X | prob twoclass multiclass |
|||||
classif.sparseLDA sparseLDA Sparse Discriminant Analysis |
sparseLDA MASS elasticnet |
X | prob twoclass multiclass |
Arguments Q and stop are not yet provided as they depend on the task. |
||||
classif.svm svm Support Vector Machines (libsvm) |
e1071 | X | X | prob twoclass multiclass class.weights |
||||
classif.xgboost xgboost eXtreme Gradient Boosting |
xgboost | X | X | X | prob twoclass multiclass featimp |
All settings are passed directly, rather than through xgboost ’s params argument. nrounds has been set to 1 and verbose to 0 by default. num_class is set internally, so do not set this manually. |
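Many notes in the table above describe defaults that mlr changes relative to the underlying package. These defaults can be overridden like any other hyperparameter. A minimal sketch, assuming mlr and rpart are installed and using the iris.task example task that ships with mlr:

```r
library(mlr)

# mlr sets xval = 0 for classif.rpart; pass xval explicitly to re-enable
# rpart's internal cross-validation.
lrn <- makeLearner("classif.rpart", predict.type = "prob", xval = 10)
getHyperPars(lrn)

# Hyperparameters can also be changed after the learner has been created.
lrn <- setHyperPars(lrn, xval = 0, maxdepth = 3)

mod <- train(lrn, iris.task)
pred <- predict(mod, iris.task)
head(getPredictionProbabilities(pred))
```

predict.type = "prob" is only accepted for learners that list prob in the Props column.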
Additional learner properties:
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
regr.bartMachine bartmachine Bayesian Additive Regression Trees |
bartMachine | X | X | X |
use_missing_data has been set to TRUE by default to allow missing data support. |
|||
regr.bcart bcart Bayesian CART |
tgp | X | X | se | ||||
regr.bgp bgp Bayesian Gaussian Process |
tgp | X | se | |||||
regr.bgpllm bgpllm Bayesian Gaussian Process with jumps to the Limiting Linear Model |
tgp | X | se | |||||
regr.blm blm Bayesian Linear Model |
tgp | X | se | |||||
regr.brnn brnn Bayesian regularization for feed-forward neural networks |
brnn | X | X | |||||
regr.bst bst Gradient Boosting |
bst rpart |
X | Renamed parameter learner to Learner due to nameclash with setHyperPars . Default changes: Learner = "ls" , xval = 0 , and maxdepth = 1 . |
|||||
regr.btgp btgp Bayesian Treed Gaussian Process |
tgp | X | X | se | ||||
regr.btgpllm btgpllm Bayesian Treed Gaussian Process with jumps to the Limiting Linear Model |
tgp | X | X | se | ||||
regr.btlm btlm Bayesian Treed Linear Model |
tgp | X | X | se | ||||
regr.cforest cforest Random Forest Based on Conditional Inference Trees |
party | X | X | X | X | X | featimp | See ?ctree_control for possible breakage for nominal features with missingness. |
regr.crs crs Regression Splines |
crs | X | X | X | se | |||
regr.ctree ctree Conditional Inference Trees |
party | X | X | X | X | X | See ?ctree_control for possible breakage for nominal features with missingness. |
|
regr.cubist cubist Cubist |
Cubist | X | X | X | ||||
regr.cvglmnet cvglmnet GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda) |
glmnet | X | X | X | Factors automatically get converted to dummy columns, ordered factors to integer. glmnet uses a global control object for its parameters. mlr resets all control parameters to their defaults before setting the specified parameters and after training. If you are setting glmnet.control parameters through glmnet.control, you need to save and re-set them after running the glmnet learner. | |||
regr.earth earth Multivariate Adaptive Regression Splines |
earth | X | X | |||||
regr.evtree evtree Evolutionary learning of globally optimal trees |
evtree | X | X | X | X |
pmutatemajor , pmutateminor , pcrossover , psplit , and pprune , are scaled internally to sum to 100. |
||
regr.extraTrees extraTrees Extremely Randomized Trees |
extraTrees | X | X | |||||
regr.FDboost FDboost Functional linear array regression boosting |
FDboost mboost |
X | functionals | Uses only one base learner per functional covariate and one per scalar covariate, with the same parameters for all base learners. Interactions between scalar covariates are currently not supported. | ||||
regr.featureless featureless Featureless regression |
mlr | X | X | X | X | functionals | ||
regr.fgam FGAM functional general additive model |
refund | functionals single.functional |
||||||
regr.fnn fnn Fast k-Nearest Neighbor |
FNN | X | ||||||
regr.frbs frbs Fuzzy Rule-based Systems |
frbs | X | ||||||
regr.gamboost gamboost Gradient Boosting with Smooth Components |
mboost | X | X | X | ||||
regr.gausspr gausspr Gaussian Processes |
kernlab | X | X | se | Kernel parameters have to be passed directly and not by using the kpar list in gausspr . Note that fit has been set to FALSE by default for speed. |
|||
regr.gbm gbm Gradient Boosting Machine |
gbm | X | X | X | X | featimp |
keep.data is set to FALSE to reduce memory requirements; distribution has been set to "gaussian" by default. Param ‘n.cores’ has been set to ‘1’ by default to suppress parallelization by the package. |
|
regr.glm glm Generalized Linear Regression |
stats | X | X | X | se | ‘family’ must be a character and every family has its own link, i.e. family = ‘gaussian’, link.gaussian = ‘identity’, which is also the default. We set ‘model’ to FALSE by default to save memory. | ||
regr.glmboost glmboost Boosting for GLMs |
mboost | X | X | X | ||||
regr.glmnet glmnet GLM with Lasso or Elasticnet Regularization |
glmnet | X | X | X | X | Factors automatically get converted to dummy columns, ordered factors to integer. Parameter s (value of the regularization parameter used for predictions) is set to 0.01 by default, but needs to be tuned by the user. glmnet uses a global control object for its parameters. mlr resets all control parameters to their defaults before setting the specified parameters and after training. If you are setting glmnet.control parameters through glmnet.control, you need to save and re-set them after running the glmnet learner. |
||
regr.GPfit GPfit Gaussian Process |
GPfit | X | se | (1) As the optimization routine assumes that the inputs are scaled to the unit hypercube [0,1]^d, the input gets scaled for each variable by default. If this is not wanted, scale = FALSE has to be set. (2) We replace the GPfit parameter ‘corr = list(type = "exponential", power = 1.95)’ with the separate parameters ‘type’ and ‘power’. In the case of corr = list(type = "matern", nu = 0.5), the separate parameters are ‘type’ and ‘matern_nu_k = 0’, and nu is computed as ‘nu = (2 * matern_nu_k + 1) / 2 = 0.5’. | ||||
regr.h2o.deeplearning h2o.dl h2o.deeplearning |
h2o | X | X | X | X | The default value of missing_values_handling is "MeanImputation" , so missing values are automatically mean-imputed. |
||
regr.h2o.gbm h2o.gbm h2o.gbm |
h2o | X | X | X | ||||
regr.h2o.glm h2o.glm h2o.glm |
h2o | X | X | X | X |
family is always set to "gaussian" . The default value of missing_values_handling is "MeanImputation" , so missing values are automatically mean-imputed. |
||
regr.h2o.randomForest h2o.rf h2o.randomForest |
h2o | X | X | X | ||||
regr.IBk ibk K-Nearest Neighbours |
RWeka | X | X | |||||
regr.kknn kknn K-Nearest-Neighbor regression |
kknn | X | X | |||||
regr.km km Kriging |
DiceKriging | X | se | In predict, we currently always use type = "SK". The extra parameter jitter (default is FALSE) adds a very small jitter (order 1e-12) to the x-values before prediction, because predict.km reproduces the exact y-values of the training data points when you pass them in, even if the nugget effect is turned on. We further introduced nugget.stability, which sets the nugget to nugget.stability * var(y) before each training to improve numerical stability. We recommend a setting of 10^-8. |
||||
regr.ksvm ksvm Support Vector Machines |
kernlab | X | X | Kernel parameters have to be passed directly and not by using the kpar list in ksvm . Note that fit has been set to FALSE by default for speed. |
||||
regr.laGP laGP Local Approximate Gaussian Process |
laGP | X | se | |||||
regr.LiblineaRL2L1SVR liblinl2l1svr L2-Regularized L1-Loss Support Vector Regression |
LiblineaR | X | Parameter svr_eps has been set to 0.1 by default. |
|||||
regr.LiblineaRL2L2SVR liblinl2l2svr L2-Regularized L2-Loss Support Vector Regression |
LiblineaR | X |
type = 11 (the default) is primal and type = 12 is dual problem. Parameter svr_eps has been set to 0.1 by default. |
|||||
regr.lm lm Simple Linear Regression |
stats | X | X | X | se | |||
regr.mars mars Multivariate Adaptive Regression Splines |
mda | X | ||||||
regr.mob mob Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node |
party modeltools |
X | X | X | ||||
regr.nnet nnet Neural Network |
nnet | X | X | X |
size has been set to 3 by default. |
|||
regr.nodeHarvest nodeHarvest Node Harvest |
nodeHarvest | X | X | |||||
regr.pcr pcr Principal Component Regression |
pls | X | X | |||||
regr.penalized penalized Penalized Regression |
penalized | X | X | trace=FALSE was set by default to disable logging output. | ||||
regr.plsr plsr Partial Least Squares Regression |
pls | X | X | |||||
regr.randomForest rf Random Forest |
randomForest | X | X | X | featimp oobpreds se |
See the section about ‘regr.randomForest’ in ?makeLearner for information about se estimation. Note that the rf can freeze the R process if trained on a task with 1 feature which is constant. This can happen in feature forward selection, also due to resampling, and you need to remove such features with removeConstantFeatures. keep.inbag is NULL by default but if predict.type = ‘se’ and se.method = ‘jackknife’ (the default) then it is automatically set to TRUE. |
||
regr.randomForestSRC rfsrc Random Forest |
randomForestSRC | X | X | X | X | X | featimp oobpreds |
na.action has been set to "na.impute" by default to allow missing data support. |
regr.ranger ranger Random Forests |
ranger | X | X | X | X | featimp oobpreds se |
By default, internal parallelization is switched off (num.threads = 1 ), verbose output is disabled, respect.unordered.factors is set to order for all splitrules. All settings are changeable. mtry.perc sets mtry to mtry.perc*getTaskNFeats(.task) . Default for mtry is the floor of square root of number of features in task. SE estimation is mc bias-corrected jackknife after bootstrap, see the section about ‘regr.randomForest’ in ?makeLearner for more details. |
|
regr.rknn rknn Random k-Nearest-Neighbors |
rknn | X | X | |||||
regr.rpart rpart Decision Tree |
rpart | X | X | X | X | X | featimp |
xval has been set to 0 by default for speed. |
regr.RRF RRF Regularized Random Forests |
RRF | X | X | X | featimp | |||
regr.rsm rsm Response Surface Regression |
rsm | X | You select the order of the regression by using modelfun = "FO" (first order), "TWI" (two-way interactions, including the first-order terms) and "SO" (full second order). |
|||||
regr.rvm rvm Relevance Vector Machine |
kernlab | X | X | Kernel parameters have to be passed directly and not by using the kpar list in rvm . Note that fit has been set to FALSE by default for speed. |
||||
regr.svm svm Support Vector Machines (libsvm) |
e1071 | X | X | |||||
regr.xgboost xgboost eXtreme Gradient Boosting |
xgboost | X | X | X | featimp | All settings are passed directly, rather than through xgboost ’s params argument. nrounds has been set to 1 and verbose to 0 by default. |
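Learners marked se in the Props column of the table above can return standard errors alongside the point predictions. A minimal sketch, assuming mlr is installed and using regr.lm together with the bh.task example task that ships with mlr:

```r
library(mlr)

# Request standard errors; this only works for learners with the "se" property.
lrn <- makeLearner("regr.lm", predict.type = "se")

mod <- train(lrn, bh.task)
pred <- predict(mod, bh.task)

# One standard error per predicted observation.
se <- getPredictionSE(pred)
head(se)
```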
Additional learner properties:
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
surv.cforest crf Random Forest based on Conditional Inference Trees |
party survival |
X | X | X | X | X | featimp | See ?ctree_control for possible breakage for nominal features with missingness. |
surv.coxph coxph Cox Proportional Hazard Model |
survival | X | X | X | ||||
surv.cvglmnet cvglmnet GLM with Regularization (Cross Validated Lambda) |
glmnet | X | X | X | X | Factors automatically get converted to dummy columns, ordered factors to integer. | ||
surv.gamboost gamboost Gradient boosting with smooth components |
survival mboost |
X | X | X | X |
family has been set to CoxPH() by default. |
||
surv.gbm gbm Gradient Boosting Machine |
gbm | X | X | X | X | featimp |
keep.data is set to FALSE to reduce memory requirements. |
|
surv.glmboost glmboost Gradient Boosting with Componentwise Linear Models |
survival mboost |
X | X | X | X |
family has been set to CoxPH() by default. |
||
surv.glmnet glmnet GLM with Regularization |
glmnet | X | X | X | X | Factors automatically get converted to dummy columns, ordered factors to integer. Parameter s (value of the regularization parameter used for predictions) is set to 0.1 by default, but needs to be tuned by the user. glmnet uses a global control object for its parameters. mlr resets all control parameters to their defaults before setting the specified parameters and after training. If you are setting glmnet.control parameters through glmnet.control, you need to save and re-set them after running the glmnet learner. |
||
surv.randomForestSRC rfsrc Random Forest |
survival randomForestSRC |
X | X | X | X | X | featimp oobpreds |
na.action has been set to "na.impute" by default to allow missing data support. |
surv.ranger ranger Random Forests |
ranger | X | X | X | X | featimp | By default, internal parallelization is switched off (num.threads = 1 ), verbose output is disabled, respect.unordered.factors is set to order for all splitrules. All settings are changeable. |
|
surv.rpart rpart Survival Tree |
rpart | X | X | X | X | X | featimp |
xval has been set to 0 by default for speed. |
Additional learner properties:
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
cluster.cmeans cmeans Fuzzy C-Means Clustering |
e1071 clue |
X | prob | The predict method uses cl_predict from the clue package to compute the cluster memberships for new data. The default centers = 2 is added so the method runs without setting parameters, but in practice the user should choose this value. |
||||
cluster.Cobweb cobweb Cobweb Clustering Algorithm |
RWeka | X | ||||||
cluster.dbscan dbscan DBScan Clustering |
fpc | X | A cluster index of NA indicates noise points. Specify method = 'dist' if the data should be interpreted as dissimilarity matrix or object. Otherwise Euclidean distances will be used. |
|||||
cluster.EM em Expectation-Maximization Clustering |
RWeka | X | ||||||
cluster.FarthestFirst farthestfirst FarthestFirst Clustering Algorithm |
RWeka | X | ||||||
cluster.kkmeans kkmeans Kernel K-Means |
kernlab | X |
centers has been set to 2L by default. The nearest center in kernel distance determines cluster assignment of new data points. Kernel parameters have to be passed directly and not by using the kpar list in kkmeans. |
|||||
cluster.kmeans kmeans K-Means |
stats clue |
X | prob | The predict method uses cl_predict from the clue package to compute the cluster memberships for new data. The default centers = 2 is added so the method runs without setting parameters, but in practice the user should choose this value. |
||||
cluster.MiniBatchKmeans MBatchKmeans MiniBatchKmeans |
ClusterR | X | prob | Calls MiniBatchKmeans of package ClusterR. Argument clusters has default value of 2 if not provided by user. |
||||
cluster.SimpleKMeans simplekmeans K-Means Clustering |
RWeka | X | ||||||
cluster.XMeans xmeans XMeans (k-means with automatic determination of k) |
RWeka | X | You may have to install the XMeans Weka package: WPM('install-package', 'XMeans') . |
For ordinary misclassification costs you can use all the standard classification methods listed above.
For example-dependent costs there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. See section cost-sensitive classification and the documentation of makeCostSensClassifWrapper(), makeCostSensRegrWrapper() and makeCostSensWeightedPairsWrapper() for details.
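As an illustration of the wrapper approach, the sketch below turns an ordinary classification learner into a cost-sensitive one. It assumes mlr and rpart are installed; the cost matrix is random, hypothetical data created only for this example.

```r
library(mlr)

# Hypothetical example-dependent costs: one row per observation,
# one column per class (random values, for illustration only).
set.seed(1)
df <- iris[, 1:4]
costs <- matrix(runif(nrow(df) * 3, 0, 10), nrow = nrow(df), ncol = 3,
                dimnames = list(NULL, levels(iris$Species)))
task <- makeCostSensTask(data = df, costs = costs)

# Wrap an ordinary classification learner so that it can be trained
# on the cost-sensitive task.
lrn <- makeCostSensClassifWrapper(makeLearner("classif.rpart"))

mod <- train(lrn, task)
pred <- predict(mod, task)
head(pred$data)
```

makeCostSensRegrWrapper() is used the same way but expects a regression learner.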
Class / Short Name / Name | Packages | Num. | Fac. | Ord. | NAs | Weights | Props | Note |
---|---|---|---|---|---|---|---|---|
multilabel.cforest cforest Random forest based on conditional inference trees |
party | X | X | X | X | X | prob | |
multilabel.randomForestSRC rfsrc Random Forest |
randomForestSRC | X | X | X | X | prob |
na.action has been set to na.impute by default to allow missing data support. |
|
multilabel.rFerns rFerns Random ferns |
rFerns | X | X | X |
Moreover, you can use the binary relevance method to apply ordinary classification learners to the multilabel problem. See the documentation of function makeMultilabelBinaryRelevanceWrapper() and the tutorial section on multilabel classification for details.
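A minimal sketch of the binary relevance approach, assuming mlr and rpart are installed and using the yeast.task multilabel example task that ships with mlr:

```r
library(mlr)

# Any ordinary classification learner can serve as the base learner;
# one binary model is fitted per label.
base <- makeLearner("classif.rpart", predict.type = "prob")
lrn <- makeMultilabelBinaryRelevanceWrapper(base)

# Train on a subset to keep the example fast.
mod <- train(lrn, yeast.task, subset = 1:500)
pred <- predict(mod, yeast.task, subset = 501:600)
head(pred$data[, 1:4])
```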