`vignettes/tutorial/classifier_calibration.Rmd`

`classifier_calibration.Rmd`

A classifier is “calibrated” when the predicted probability of a class matches the expected frequency of that class. `mlr`

can visualize this by plotting estimated class probabilities (which are discretized) against the observed frequency of said class in the data using `generateCalibrationData()`

and `plotCalibration()`

.

`generateCalibrationData()`

takes as input `Prediction()`

, `ResampleResult()`

, `BenchmarkResult()`

, or a named list of `Prediction()`

or `ResampleResult()`

objects on a classification (multiclass or binary) task with learner(s) that are capable of outputting probabiliites (i.e., learners must be constructed with `predict.type = "prob"`

). The result is an object of class `CalibrationData`

(`generateCalibrationData()`

) which has elements `proportion`

, `data`

, and `task`

. `proportion`

gives the proportion of observations labelled with a given class for each predicted probability bin (e.g., for observations which are predicted to have class “A” with probability \((0, 0.1]\), what is the proportion of said observations which have class “A”?).

```
lrn = makeLearner("classif.rpart", predict.type = "prob")
mod = train(lrn, task = sonar.task)
pred = predict(mod, task = sonar.task)
cal = generateCalibrationData(pred)
cal$proportion
## Learner bin Class Proportion
## 1 prediction (0.1,0.2] M 0.1060606
## 2 prediction (0.7,0.8] M 0.7333333
## 3 prediction [0,0.1] M 0.0000000
## 4 prediction (0.9,1] M 0.9333333
## 5 prediction (0.2,0.3] M 0.2727273
## 6 prediction (0.4,0.5] M 0.4615385
## 7 prediction (0.8,0.9] M 0.0000000
## 8 prediction (0.5,0.6] M 0.0000000
```

The manner in which the predicted probabilities are discretized is controlled by two arguments: `breaks`

and `groups`

. By default `breaks = "Sturges"`

which uses the Sturges algorithm in `graphics::hist()`

. This argument can specify other algorithms available in `graphics::hist()`

, it can be a numeric vector specifying breakpoints for `base::cut()`

, or a single integer specifying the number of bins to create (which are evenly spaced). Alternatively, `groups`

can be set to a positive integer value (by default `groups = NULL`

) in which case `Hmisc::cut2()`

is used to create bins with an approximately equal number of observations in each bin.

```
cal = generateCalibrationData(pred, groups = 3)
cal$proportion
## Learner bin Class Proportion
## 1 prediction [0.000,0.267) M 0.08860759
## 2 prediction [0.267,0.925) M 0.51282051
## 3 prediction [0.925,1.000] M 0.93333333
```

`generateCalibrationData()`

objects can be plotted using `plotCalibration()`

. `plotCalibration()`

by default plots a reference line which shows perfect calibration and a “rag” plot, which is a rug plot on the top and bottom of the graph, where the top pertains to “positive” cases, where the predicted class matches the observed class, and the bottom pertains to “negative” cases, where the predicted class does not match the observed class. Perfect classifier performance would result in all the positive cases clustering in the top right (i.e., the correct classes are predicted with high probability) and the negative cases clustering in the bottom left.

`plotCalibration(cal)`

Because of the discretization of the probabilities, sometimes it is advantageous to smooth the calibration plot. Though `smooth = FALSE`

by default, setting this option to `TRUE`

replaces the estimated proportions with a loess smoother.

```
cal = generateCalibrationData(pred)
plotCalibration(cal, smooth = TRUE)
```

All of the above functionality works with multi-class classification as well.

```
lrns = list(
makeLearner("classif.randomForest", predict.type = "prob"),
makeLearner("classif.nnet", predict.type = "prob", trace = FALSE)
)
mod = lapply(lrns, train, task = iris.task)
pred = lapply(mod, predict, task = iris.task)
names(pred) = c("randomForest", "nnet")
cal = generateCalibrationData(pred, breaks = c(0, .3, .6, 1))
plotCalibration(cal)
```