Generate feature importance.

Estimate how important individual features or groups of features are by contrasting prediction performances. For method “permutation.importance”, compute the change in performance from permuting the values of a feature (or a group of features) and compare that to the predictions made on the unpermuted data.
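The idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not mlr's implementation: the linear model, the `rmse` helper, and the `permute_once` helper are hypothetical names chosen for this example, and the contrast is assumed to be permuted performance minus unpermuted performance.

```r
# Sketch of permutation importance in base R; not mlr's implementation.
set.seed(1)

# Fit a small linear model on the built-in mtcars data.
fit = lm(mpg ~ wt + qsec, data = mtcars)

# Performance measure (hypothetical helper): root mean squared error.
rmse = function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

# Baseline performance on the unpermuted data.
perf = rmse(fit, mtcars)

# One Monte-Carlo iteration: shuffle one feature, re-predict, contrast.
permute_once = function(feature) {
  permuted = mtcars
  permuted[[feature]] = sample(permuted[[feature]])
  rmse(fit, permuted) - perf  # assumed contrast: permuted minus unpermuted
}

# Aggregate the contrasts over 50 iterations with the mean.
importance = sapply(c("wt", "qsec"), function(f) {
  mean(replicate(50, permute_once(f)))
})
print(importance)
```

A larger value indicates that shuffling the feature degrades the predictions more, which is the sense in which the feature is "important" here.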

Usage

generateFeatureImportanceData(
  task,
  method = "permutation.importance",
  learner,
  features = getTaskFeatureNames(task),
  interaction = FALSE,
  measure,
  contrast = function(x, y) x - y,
  aggregation = mean,
  nmc = 50L,
  replace = TRUE,
  local = FALSE,
  show.info = FALSE
)

Arguments

task: (Task)
The task.
method: (character(1))
The method used to compute the feature importance. The only method available is “permutation.importance”. Default is “permutation.importance”.
learner: (Learner | character(1))
The learner. If you pass a string the learner will be created via makeLearner.
features: (character)
The features to compute the importance of. The default is all of the features contained in the Task.
interaction: (logical(1))
Whether to compute the importance of the features argument jointly. For method = "permutation.importance" this entails permuting the values of all features together and then contrasting the resulting performance with the performance obtained without permuting the features. The default is FALSE.
measure: (Measure)
Performance measure. Default is the first measure used in the benchmark experiment.
contrast: (function)
A difference function that takes two numeric vectors and returns a numeric vector of the same length. The default is the element-wise difference between the vectors.
aggregation: (function)
A function which aggregates the differences. This function must take a numeric vector and return a numeric vector of length 1. The default is mean.
nmc: (integer(1))
The number of Monte-Carlo iterations to use in computing the feature importance. If nmc == -1 and method = "permutation.importance" then all permutations of the features are used. The default is 50.
replace: (logical(1))
Whether to sample the feature values with replacement. The default is TRUE.
local: (logical(1))
Whether to compute the per-observation importance. The default is FALSE.
show.info: (logical(1))
Whether progress output (feature name, time elapsed) should be displayed. The default is FALSE.
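To make the replace and interaction arguments more concrete, the following base R sketch shows how row indices might be drawn and applied. It is an assumption about the general technique, not a copy of mlr's internal code; the variable names (`idx_perm`, `idx_boot`, `joint`) are hypothetical.

```r
# Sketch of index sampling for permutation; an illustration only,
# not mlr's internal code.
set.seed(42)
cols = c("Petal.Length", "Petal.Width")
data = head(iris[, cols], 6)
n = nrow(data)

# Without replacement: a true permutation, every row index appears once.
idx_perm = sample(n, replace = FALSE)

# With replacement: indices may repeat, so some values can be duplicated
# and others dropped.
idx_boot = sample(n, replace = TRUE)

# Joint permutation: apply the same indices to all selected features,
# so the pairing between the features within a row is preserved.
joint = data
joint[, cols] = data[idx_perm, cols]
print(joint)
```

Permuting the features jointly breaks their relationship with the target and the remaining features while keeping their relationship with each other intact.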

Value

(FeatureImportance). A named list which contains the computed feature importance and the input arguments.

Object members:

res: (data.frame)
Has columns for each feature or combination of features (colon separated) for which the importance is computed. A row corresponds to the importance of the feature specified in the column for the target.
interaction: (logical(1))
Whether or not the importance of the features was computed jointly rather than individually.
measure: (Measure)
The measure used to compute performance.
contrast: (function)
The function used to compare the performance of predictions.
aggregation: (function)
The function which is used to aggregate the contrast between the performance of predictions across Monte-Carlo iterations.
replace: (logical(1))
Whether or not, when method = "permutation.importance", the feature values are sampled with replacement.
nmc: (integer(1))
The number of Monte-Carlo iterations used to compute the feature importance. When nmc == -1 and method = "permutation.importance" all permutations are used.
local: (logical(1))
Whether observation-specific importance is computed for the features.
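As a hedged illustration of the members listed above, the following sketch computes a joint importance for two features and inspects the result. It assumes the mlr and rpart packages are installed; the choice of learner and features is arbitrary.

```r
library(mlr)
set.seed(1)

# Joint importance of two features, using 10 Monte-Carlo iterations.
lrn = makeLearner("classif.rpart")
imp = generateFeatureImportanceData(iris.task, "permutation.importance",
  lrn, features = c("Petal.Length", "Petal.Width"), interaction = TRUE,
  nmc = 10L)

# The computed importance, with a column for the feature combination.
print(imp$res)

# The input arguments are stored alongside the result.
print(imp$interaction)
print(imp$nmc)
```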

References

Jerome Friedman; Greedy Function Approximation: A Gradient Boosting Machine, Annals of Statistics, Vol. 29, No. 5 (Oct., 2001), pp. 1189-1232.

Examples


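library(mlr)
# Classification tree learner that predicts class probabilities
lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
# Per-observation permutation importance of Petal.Width, 10 Monte-Carlo iterations
imp = generateFeatureImportanceData(iris.task, "permutation.importance",
  lrn, "Petal.Width", nmc = 10L, local = TRUE)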
