Version 2.10
mlr 2.10:
CRAN release: 2017-02-07
functions - general
- fixed bug in resample when using predict = “train” (issue #1284)
- update to irace 2.0 – there are algorithmic changes in irace that may affect performance
- generateFilterValuesData: fixed a bug wrt feature ordering
- imputeLearner: fixed a bug when data actually contained no NAs
- print.Learner: a learner hyperpar set to the value “NA” was not displayed by the printer; this is now fixed
- makeLearner, setHyperPars: if you mistype a learner or hyperpar name, mlr uses fuzzy matching to suggest the 3 closest names in the message
- tuneParams: tuning with irace is now also parallelized, i.e., different learner configs are evaluated in parallel.
- benchmark: mini fix, arg ‘learners’ now also accepts class strings
- object printers: some mlr printers show head previews of data.frames. These now also print the total number of rows and columns, which makes them less confusing
- aggregations: now have better properties; they know whether they require training or test set evaluations
- the filter methods have better R docs
- filter randomForestSRC.var.select: new arg “method”
- filter mrmr: fixed some smaller bugs and updated properties
- generateLearningCurveData: also accepts single learner, does not require a list
- plotThreshVsPerf: added “measures” arg
- plotPartialDependence: can create tile plots with joint partial dependence on two features for multiclass classification by facetting across the classes
- generatePartialDependenceData and generateFunctionalANOVAData: expanded “fun” argument to allow for calculation of weights
- new “?mlrFamilies” manual page which lists all families and the functions belonging to each
- we are converging on data.table as the internal standard; this should not change any externally visible API behavior
- generateHyperParsEffectData and plotHyperParsEffect now support more than 2 hyperparameters
- linear.correlation, rank.correlation, anova.test: now use Rfast instead of FSelector or a custom implementation; performance should be much better
- use of our own colAUC function instead of the ROCR package for AUC calculation to improve performance
- we output resample performance messages for every iteration now
- performance improvements for the auc measure
- createDummyFeatures supports vectors now
- removed the pretty.names argument from plotHyperParsEffect – labels can be set through normal ggplot2 functions on the returned object
- Fixed a bad bug in resample affecting the slot “runtime” of a ResampleResult: when R measured the runtime not in seconds but in e.g. minutes, mlr still claimed the value was in seconds.
- New “dummy” learners (that disregard features completely) can be fitted now for baseline comparisons, see “featureless” learners below.
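As a rough illustration of the resample-related items above (predict = “train”, aggregations that depend on training set evaluations, and the “runtime” slot), the following sketch assumes the built-in iris.task and the rpart package are available. The choice of learner, task and number of folds is purely illustrative.

```r
library(mlr)

# Request predictions on both the training and the test sets,
# so that a training-set aggregation such as train.mean can be computed.
rdesc <- makeResampleDesc("CV", iters = 3, predict = "both")

# mmce aggregated over test sets (default) and over training sets.
ms <- list(mmce, setAggregation(mmce, train.mean))

r <- resample("classif.rpart", iris.task, rdesc,
  measures = ms, show.info = FALSE)

r$aggr     # aggregated performance values, one per measure
r$runtime  # the "runtime" slot of the ResampleResult
```

With show.info = TRUE (the default), this should also print the per-iteration performance messages mentioned above.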
functions - new
- filter: randomForest.importance
- generateFeatureImportanceData: permutation-based feature importance and local importance
- getFeatureImportanceLearner: new Learner API function
- getFeatureImportance: top level function to extract feature importance information
- calculateROCMeasures
- calculateConfusionMatrix: new confusion-matrix like function that calculates and tables many receiver operator measures
- makeLearners: create multiple learners at once
- getLearnerId, getLearnerType, getLearnerPredictType, getLearnerPackages
- getLearnerParamSet, getLearnerParVals
- getRRPredictionList
- addRRMeasure
- plotResiduals
- getLearnerShortName
- mergeBenchmarkResults
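A minimal sketch of how some of these new functions might be combined. It assumes the built-in sonar.task (a binary classification task) and the rpart and MASS packages are installed; the learners chosen are only examples, and getFeatureImportance works only for learners that support feature importance.

```r
library(mlr)

# Create several learners in one call; the result is a list of learners.
lrns <- makeLearners(c("classif.rpart", "classif.lda"))

# Inspect a learner with the new getter functions.
getLearnerId(lrns[[1]])
getLearnerType(lrns[[1]])
getLearnerPackages(lrns[[1]])

# Train the first learner and predict on the same task (illustration only).
mod <- train(lrns[[1]], sonar.task)
pred <- predict(mod, task = sonar.task)

# ROC-related measures for a binary classification prediction.
calculateROCMeasures(pred)

# Feature importance extracted from the fitted model.
getFeatureImportance(mod)
```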
functions - renamed
- Renamed rf.importance filter (now deprecated) to randomForestSRC.var.rfsrc
- Renamed rf.min.depth filter (now deprecated) to randomForestSRC.var.select
- Renamed getConfMatrix (now deprecated) to calculateConfusionMatrix
- Renamed setId (now deprecated) to setLearnerId
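For existing code, the renames above amount to swapping function names. The sketch below shows the two non-filter renames, assuming the built-in iris.task and the rpart package; the id "my_tree" is a made-up example value.

```r
library(mlr)

lrn <- makeLearner("classif.rpart")

# Previously: setId(lrn, "my_tree")
lrn <- setLearnerId(lrn, "my_tree")
getLearnerId(lrn)

mod <- train(lrn, iris.task)
pred <- predict(mod, task = iris.task)

# Previously: getConfMatrix(pred)
cm <- calculateConfusionMatrix(pred)
print(cm)
```

The deprecated names are described above as still present, so the switch can be made gradually.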
learners - general
- classif.ada: fixed some param problem with rpart.control params
- classif.cforest, regr.cforest, surv.cforest: removed parameters “minprob”, “pvalue”, “randomsplits” as these are set internally and cannot be changed by the user
- regr.GPfit: some more params for correlation kernel
- classif.xgboost, regr.xgboost: can now properly handle NAs (property was missing and other problems), added “colsample_bylevel” parameter
- adapted {classif,regr,surv}.ranger parameters for new ranger version
learners - new
- multilabel.cforest
- surv.gbm
- regr.cvglmnet
- {classif,regr,surv}.gamboost
- classif.earth
- {classif,regr}.evtree