mlr 2.10: 2017-02-07

functions - general

  • fixed bug in resample when using predict = “train” (issue #1284)
  • update to irace 2.0 – there are algorithmic changes in irace that may affect performance
  • generateFilterValuesData: fixed a bug wrt feature ordering
  • imputeLearner: fixed a bug when data actually contained no NAs
  • print.Learner: if a learner hyperpar was set to value “NA” this was not displayed in printer
  • makeLearner, setHyperPars: if you mistype a learner or hyperpar name, mlr uses fuzzy matching to suggest the 3 closest names in the message
  • tuneParams: tuning with irace is now also parallelized, i.e., different learner configs are evaluated in parallel.
  • benchmark: mini fix, arg ‘learners’ now also accepts class strings
  • object printers: some mlr printers show head previews of data.frames. these now also print info on the total nr of rows and cols and are less confusing
  • aggregations: have better properties now, they know whether they require training or test set evals
  • the filter methods have better R docs
  • filter randomForestSRC.var.select: new arg “method”
  • filter mrmr: fixed some smaller bugs and updated properties
  • generateLearningCurveData: also accepts single learner, does not require a list
  • plotThreshVsPerf: added “measures” arg
  • plotPartialDependence: can create tile plots with joint partial dependence on two features for multiclass classification by facetting across the classes
  • generatePartialDependenceData and generateFunctionalANOVAData: expanded “fun” argument to allow for calculation of weights
  • new “?mlrFamilies” manual page which lists all families and the functions belonging to it
  • we are converging on data.table as a standard internally, this should not change any API behavior on the outside, though
  • generateHyperParsEffectData and plotHyperParsEffect now support more than 2 hyperparameters
  • linear.correlation, rank.correlation, anova.test: use Rfast instead of FSelector/custom implementation now, performance should be much better
  • use of our own colAUC function instead of the ROCR package for AUC calculation to improve performance
  • we output resample performance messages for every iteration now
  • performance improvements for the auc measure
  • createDummyFeatures supports vectors now
  • removed the pretty.names argument from plotHyperParsEffect – labels can be set though normal ggplot2 functions on the returned object
  • Fixed a bad bug in resample, the slot “runtime” or a ResampleResult, when the runtime was measured not in seconds but e.g. mins. R measures then potentially in mins, but mlr claimed it would be seconds.
  • New “dummy” learners (that disregard features completely) can be fitted now for baseline comparisons, see “featureless” learners below.

functions - new

  • filter: randomForest.importance
  • generateFeatureImportanceData: permutation-based feature importance and local importance
  • getFeatureImportanceLearner: new Learner API function
  • getFeatureImportance: top level function to extract feature importance information
  • calculateROCMeasures
  • calculateConfusionMatrix: new confusion-matrix like function that calculates and tables many receiver operator measures
  • makeLearners: create multiple learners at once
  • getLearnerId, getLearnerType, getLearnerPredictType, getLearnerPackages
  • getLearnerParamSet, getLearnerParVals
  • getRRPredictionList
  • addRRMeasure
  • plotResiduals
  • getLearnerShortName
  • mergeBenchmarkResults

functions - renamed

  • Renamed rf.importance filter (now deprecated) to randomForestSRC.var.rfsrc
  • Renamed rf.min.depth filter (now deprecated) to randomForestSRC.var.select
  • Renamed getConfMatrix (now deprecated) to calculateConfusionMatrix
  • Renamed setId (now deprecated) to setLearnerId

functions - removed

  • mergeBenchmarkResultLearner, mergeBenchmarkResultTask

learners - general

  • classif.ada: fixed some param problem with rpart.control params
  • classif.cforest, regr.cforest, surv.cforest: removed parameters “minprob”, “pvalue”, “randomsplits” as these are set internally and cannot be changed by the user
  • regr.GPfit: some more params for correlation kernel
  • classif.xgboost, regr.xgboost: can now properly handle NAs (property was missing and other problems), added “colsample_bylevel” parameter
  • adapted {classif,regr,surv}.ranger parameters for new ranger version

learners - new

  • multilabel.cforest
  • surv.gbm
  • regr.cvglmnet
  • {classif,regr,surv}.gamboost
  • classif.earth
  • {classif,regr}.evtree
  • {classif,regr}.evtree

learners - removed

  • classif.randomForestSRCSyn, regr.randomForestSRCSyn: due to continued stability issues

measures - new

  • ssr, qsr, lsr
  • rrse, rae, mape
  • kappa, wkappa
  • msle, rmsle