A classifier is “calibrated” when the predicted probability of a class matches the expected frequency of that class.
mlr can visualize this by plotting estimated class probabilities (which are discretized) against the observed frequency of said class in the data using
generateCalibrationData() takes as input
BenchmarkResult(), or a named list of
ResampleResult() objects on a classification (multiclass or binary) task with learner(s) that are capable of outputting probabiliites (i.e., learners must be constructed with
predict.type = "prob"). The result is an object of class
generateCalibrationData()) which has elements
proportion gives the proportion of observations labelled with a given class for each predicted probability bin (e.g., for observations which are predicted to have class “A” with probability \((0, 0.1]\), what is the proportion of said observations which have class “A”?).
lrn = makeLearner("classif.rpart", predict.type = "prob") mod = train(lrn, task = sonar.task) pred = predict(mod, task = sonar.task) cal = generateCalibrationData(pred) cal$proportion ## Learner bin Class Proportion ## 1 prediction (0.1,0.2] M 0.1060606 ## 2 prediction (0.7,0.8] M 0.7333333 ## 3 prediction [0,0.1] M 0.0000000 ## 4 prediction (0.9,1] M 0.9333333 ## 5 prediction (0.2,0.3] M 0.2727273 ## 6 prediction (0.4,0.5] M 0.4615385 ## 7 prediction (0.8,0.9] M 0.0000000 ## 8 prediction (0.5,0.6] M 0.0000000
The manner in which the predicted probabilities are discretized is controlled by two arguments:
groups. By default
breaks = "Sturges" which uses the Sturges algorithm in
graphics::hist(). This argument can specify other algorithms available in
graphics::hist(), it can be a numeric vector specifying breakpoints for
base::cut(), or a single integer specifying the number of bins to create (which are evenly spaced). Alternatively,
groups can be set to a positive integer value (by default
groups = NULL) in which case
Hmisc::cut2() is used to create bins with an approximately equal number of observations in each bin.
cal = generateCalibrationData(pred, groups = 3) cal$proportion ## Learner bin Class Proportion ## 1 prediction [0.000,0.267) M 0.08860759 ## 2 prediction [0.267,0.925) M 0.51282051 ## 3 prediction [0.925,1.000] M 0.93333333
generateCalibrationData() objects can be plotted using
plotCalibration() by default plots a reference line which shows perfect calibration and a “rag” plot, which is a rug plot on the top and bottom of the graph, where the top pertains to “positive” cases, where the predicted class matches the observed class, and the bottom pertains to “negative” cases, where the predicted class does not match the observed class. Perfect classifier performance would result in all the positive cases clustering in the top right (i.e., the correct classes are predicted with high probability) and the negative cases clustering in the bottom left.
Because of the discretization of the probabilities, sometimes it is advantageous to smooth the calibration plot. Though
smooth = FALSE by default, setting this option to
TRUE replaces the estimated proportions with a loess smoother.
cal = generateCalibrationData(pred) plotCalibration(cal, smooth = TRUE) ## `geom_smooth()` using formula 'y ~ x'
All of the above functionality works with multi-class classification as well.
lrns = list( makeLearner("classif.randomForest", predict.type = "prob"), makeLearner("classif.nnet", predict.type = "prob", trace = FALSE) ) mod = lapply(lrns, train, task = iris.task) pred = lapply(mod, predict, task = iris.task) names(pred) = c("randomForest", "nnet") cal = generateCalibrationData(pred, breaks = c(0, .3, .6, 1)) plotCalibration(cal)