fixed bug in resample when using predict = "train" (issue #1284)
update to irace 2.0 – there are algorithmic changes in irace that may affect performance
generateFilterValuesData: fixed a bug wrt feature ordering
imputeLearner: fixed a bug when data actually contained no NAs
print.Learner: if a learner hyperparameter was set to the value "NA", this was not displayed by the printer
makeLearner, setHyperPars: if you mistype a learner or hyperparameter name, mlr uses fuzzy matching to suggest the 3 closest names in the error message
tuneParams: tuning with irace is now also parallelized, i.e., different learner configs are evaluated in parallel.
benchmark: minor fix, the 'learners' argument now also accepts learner class strings (see the example after this list)
object printers: some mlr printers show head previews of data.frames; these now also report the total number of rows and columns and are less confusing
aggregations: have better properties now; they know whether they require evaluations on the training or the test set
the filter methods have better R docs
filter randomForestSRC.var.select: new arg "method"
filter mrmr: fixed some smaller bugs and updated properties
generateLearningCurveData: also accepts a single learner; a list is no longer required
plotThreshVsPerf: added "measures" arg
plotPartialDependence: can create tile plots with joint partial dependence on two features for multiclass classification by faceting across the classes
generatePartialDependenceData and generateFunctionalANOVAData: expanded the "fun" argument to allow for the calculation of weights
new "?mlrFamilies" manual page which lists all families and the functions belonging to them
we are converging on data.table as an internal standard; this should not change any API behavior on the outside, though
generateHyperParsEffectData and plotHyperParsEffect now support more than 2 hyperparameters
linear.correlation, rank.correlation, anova.test: now use Rfast instead of FSelector / a custom implementation; performance should be much better
use of our own colAUC function instead of the ROCR package for AUC calculation to improve performance
we output resample performance messages for every iteration now
performance improvements for the auc measure
createDummyFeatures supports vectors now
removed the pretty.names argument from plotHyperParsEffect; labels can be set through normal ggplot2 functions on the returned object
Fixed a bad bug in resample concerning the "runtime" slot of a ResampleResult: when the runtime was measured not in seconds but e.g. in minutes, R potentially reported minutes while mlr claimed the value was in seconds.
New "dummy" learners (that disregard features completely) can now be fitted for baseline comparisons, see "featureless" learners below.
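A minimal sketch of the featureless baseline described above, combined with benchmark() accepting learner class strings; the classif.rpart and classif.featureless learner ids, the iris.task example task, the CV setup and the mmce measure are illustrative assumptions:

    library(mlr)

    # Sketch only: compare a real learner against a featureless baseline.
    lrns <- list(
      "classif.rpart",                    # passed as a plain class string
      makeLearner("classif.featureless")  # baseline that ignores all features
    )
    rdesc <- makeResampleDesc("CV", iters = 3)
    bmr <- benchmark(lrns, iris.task, rdesc, measures = list(mmce))
    getBMRAggrPerformances(bmr, as.df = TRUE)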
functions - new
filter: randomForest.importance
generateFeatureImportanceData: permutation-based feature importance and local importance
getFeatureImportanceLearner: new Learner API function
getFeatureImportance: top level function to extract feature importance information
calculateROCMeasures: new confusion-matrix-like function that calculates and tabulates many receiver operating characteristic measures
calculateConfusionMatrix: calculates a confusion matrix for a classification prediction, with optional relative frequencies
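A hedged sketch of how these new functions fit together; the sonar.task example task, the classif.rpart learner and the shown arguments are assumptions:

    library(mlr)

    # Sketch only: train on a binary task, then inspect predictions and
    # feature importance with the new helpers.
    mod  <- train("classif.rpart", sonar.task)
    pred <- predict(mod, sonar.task)

    calculateConfusionMatrix(pred, relative = TRUE)  # absolute and relative frequencies
    calculateROCMeasures(pred)                       # TPR, FPR, PPV, ... in one table
    getFeatureImportance(mod)                        # needs a learner with feature-importance support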
learners - general
classif.ada: fixed a parameter problem with rpart.control params
classif.cforest, regr.cforest, surv.cforest: removed the parameters "minprob", "pvalue", "randomsplits", as these are set internally and cannot be changed by the user
regr.GPfit: added some more params for the correlation kernel
classif.xgboost, regr.xgboost: can now properly handle NAs (the property was missing, and there were other problems); added the "colsample_bylevel" parameter (see the sketch after this list)
adapted {classif,regr,surv}.ranger parameters for the new ranger version
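A brief, hedged sketch of the updated xgboost integration; the nrounds and colsample_bylevel values and the getLearnerProperties() check are illustrative assumptions:

    library(mlr)

    # Sketch only: classif.xgboost now declares missing-value support and
    # exposes colsample_bylevel; the values below are illustrative.
    lrn <- makeLearner(
      "classif.xgboost",
      nrounds = 50,
      colsample_bylevel = 0.8
    )
    getLearnerProperties(lrn)  # should now include "missings"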
learners - new
multilabel.cforest
surv.gbm
regr.cvglmnet
{classif,regr,surv}.gamboost
classif.earth
{classif,regr}.evtree
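As a quick sketch, the new learners above are constructed like any other mlr learner; the regr.cvglmnet choice and the bh.task example task are used purely for illustration:

    library(mlr)

    # Sketch only: build one of the new learners and fit it on an example task.
    lrn <- makeLearner("regr.cvglmnet")
    mod <- train(lrn, bh.task)   # bh.task: Boston housing regression example
    predict(mod, bh.task)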
learners - removed
classif.randomForestSRCSyn, regr.randomForestSRCSyn: due to continued stability issues