Feature selection method used by selectFeatures.
The methods used here follow a wrapper approach, described in Kohavi and John (1997) (see references).

The following optimization algorithms are available:

FeatSelControlExhaustive

Exhaustive search. All feature sets (up to a certain number of features max.features) are searched.
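To give a feel for how quickly the search space grows, the following is a minimal base-R sketch of enumerating candidate feature sets up to a size limit. The helper `enumerate_feature_sets` and the feature names are hypothetical and are not part of the package; the sketch also assumes the empty set is skipped, which may differ from the actual implementation.

```r
# Hypothetical helper (not part of the package): list every non-empty
# feature subset with at most `max.features` elements.
enumerate_feature_sets <- function(features, max.features = length(features)) {
  sizes <- seq_len(min(max.features, length(features)))
  # For each subset size k, combn() returns all k-element combinations.
  sets <- lapply(sizes, function(k) combn(features, k, simplify = FALSE))
  # Flatten the per-size lists into one list of character vectors.
  unlist(sets, recursive = FALSE)
}

feature_sets <- enumerate_feature_sets(c("a", "b", "c", "d"), max.features = 2L)
length(feature_sets)  # 4 single features + 6 pairs = 10 candidate sets
```

Without a limit, p features yield 2^p - 1 non-empty sets, so setting max.features is usually advisable for anything but small problems.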

FeatSelControlRandom

Random search. Feature vectors are randomly drawn, up to a certain number of features max.features. A feature is included in the current set with probability prob. In other words, (0,1)-membership vectors are drawn in which each element is Bernoulli(prob) distributed.
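The sampling step can be sketched in base R as follows. The helper `draw_feature_vector` is hypothetical, and redrawing until the max.features limit is met is an assumption about how the limit might be enforced, not a statement about the package internals.

```r
set.seed(1)

# Hypothetical helper (not part of the package): draw one membership vector
# whose elements are independent Bernoulli(prob) variables.
draw_feature_vector <- function(features, prob = 0.5, max.features = NA_integer_) {
  repeat {
    bits <- rbinom(length(features), size = 1L, prob = prob)
    # Assumption: vectors with too many selected features are simply redrawn.
    if (is.na(max.features) || sum(bits) <= max.features) break
  }
  setNames(bits, features)
}

x <- draw_feature_vector(c("a", "b", "c", "d"), prob = 0.5, max.features = 2L)
names(x)[x == 1]  # names of the features selected in this draw
```

Note that with this rejection scheme a prob close to 1 combined with a small max.features would lead to many redraws.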

FeatSelControlSequential

Deterministic forward or backward search, i.e., extending (forward) or shrinking (backward) a feature set. The approach taken depends on the given method:
sfs: Sequential Forward Search. Starting from an empty model, in each step the feature that increases the performance measure the most is added to the model.
sbs: Sequential Backward Search. Starting from a model with all features, in each step the feature whose removal decreases the performance measure the least is removed from the model.
sffs: Sequential Floating Forward Search. Starting from an empty model, in each step the algorithm chooses the best model from all models with one additional feature and from all models with one feature less.
sfbs: Sequential Floating Backward Search. Similar to sffs, but starting with a full model.
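As an illustration of the forward variant, here is a minimal base-R sketch of sequential forward search with an alpha-style stopping threshold. Everything in it is an assumption made for illustration: the helper `sfs`, the toy `score_fun` (in-sample R squared of a linear model on the built-in `mtcars` data, to be maximized) and the chosen feature names are hypothetical, whereas the package evaluates feature sets by resampling a learner.

```r
# Toy performance measure (assumption): in-sample R^2 of a linear model for
# mpg on mtcars; an empty feature set is scored as 0.
score_fun <- function(feats) {
  if (length(feats) == 0L) return(0)
  summary(lm(reformulate(feats, response = "mpg"), data = mtcars))$r.squared
}

# Hypothetical helper (not part of the package): greedy forward search.
sfs <- function(features, score_fun, alpha = 0.01) {
  selected <- character(0)
  best <- score_fun(selected)
  repeat {
    candidates <- setdiff(features, selected)
    if (length(candidates) == 0L) break
    # Score every model that adds exactly one of the remaining features.
    scores <- vapply(candidates, function(f) score_fun(c(selected, f)), numeric(1))
    # Stop when the best addition improves the score by less than alpha.
    if (max(scores) - best < alpha) break
    selected <- c(selected, candidates[which.max(scores)])
    best <- max(scores)
  }
  list(x = selected, y = best)
}

res <- sfs(c("wt", "hp", "qsec", "drat"), score_fun, alpha = 0.01)
res$x  # features in the order they were added
```

A backward search would mirror this loop: start from all features, score every model with one feature removed, and stop once the best removal costs more than the tolerated decrease.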

FeatSelControlGA

Search via genetic algorithm. The GA is a simple (mu, lambda) or (mu + lambda) algorithm, depending on the comma setting. A comma strategy selects a new population of size mu out of the lambda > mu offspring. A plus strategy uses the joint pool of mu parents and lambda offspring for selecting mu new candidates. From those mu candidates, the lambda new offspring are generated by randomly choosing pairs of parents. Each pair is crossed over, where crossover.rate is the probability of choosing a feature from the first parent instead of the second parent. The resulting offspring is mutated, i.e., its bits are flipped with probability mutation.rate. If max.features is set, offspring are repeatedly generated until the setting is satisfied.
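The crossover and mutation steps described above can be sketched in base R as follows. The helpers `crossover`, `mutate` and `make_offspring` are hypothetical names introduced for illustration; they mirror the description, not necessarily the package's internal code.

```r
set.seed(42)

# Uniform crossover: each bit comes from the first parent with probability
# crossover.rate, otherwise from the second parent.
crossover <- function(p1, p2, crossover.rate = 0.5) {
  take_first <- runif(length(p1)) < crossover.rate
  ifelse(take_first, p1, p2)
}

# Mutation: each bit is flipped independently with probability mutation.rate.
mutate <- function(bits, mutation.rate = 0.05) {
  flip <- runif(length(bits)) < mutation.rate
  ifelse(flip, 1L - bits, bits)
}

# Generate one child; if max.features is set, regenerate until it is satisfied.
make_offspring <- function(p1, p2, crossover.rate = 0.5, mutation.rate = 0.05,
                           max.features = NA_integer_) {
  repeat {
    child <- mutate(crossover(p1, p2, crossover.rate), mutation.rate)
    if (is.na(max.features) || sum(child) <= max.features) return(child)
  }
}

parent1 <- c(1L, 1L, 0L, 0L, 1L)
parent2 <- c(0L, 1L, 1L, 0L, 0L)
child <- make_offspring(parent1, parent2, max.features = 3L)
child  # 0/1 membership vector of the new candidate
```

With crossover.rate = 0.5 both parents contribute equally on average; values closer to 1 make the child resemble the first parent.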

makeFeatSelControlExhaustive(
  same.resampling.instance = TRUE,
  maxit = NA_integer_,
  max.features = NA_integer_,
  tune.threshold = FALSE,
  tune.threshold.args = list(),
  log.fun = "default"
)

makeFeatSelControlGA(
  same.resampling.instance = TRUE,
  impute.val = NULL,
  maxit = NA_integer_,
  max.features = NA_integer_,
  comma = FALSE,
  mu = 10L,
  lambda,
  crossover.rate = 0.5,
  mutation.rate = 0.05,
  tune.threshold = FALSE,
  tune.threshold.args = list(),
  log.fun = "default"
)

makeFeatSelControlRandom(
  same.resampling.instance = TRUE,
  maxit = 100L,
  max.features = NA_integer_,
  prob = 0.5,
  tune.threshold = FALSE,
  tune.threshold.args = list(),
  log.fun = "default"
)

makeFeatSelControlSequential(
  same.resampling.instance = TRUE,
  impute.val = NULL,
  method,
  alpha = 0.01,
  beta = -0.001,
  maxit = NA_integer_,
  max.features = NA_integer_,
  tune.threshold = FALSE,
  tune.threshold.args = list(),
  log.fun = "default"
)

Arguments

same.resampling.instance

(logical(1))
Should the same resampling instance be used for all evaluations to reduce variance? Default is TRUE.

maxit

(integer(1))
Maximal number of iterations. Note that this is usually not equal to the number of function evaluations.

max.features

(integer(1))
Maximal number of features.

tune.threshold

(logical(1))
Should the threshold be tuned for the measure at hand, after each feature set evaluation, via tuneThreshold? Only works for classification if the predict type is “prob”. Default is FALSE.

tune.threshold.args

(list)
Further arguments for threshold tuning that are passed down to tuneThreshold. Default is an empty list.

log.fun

(function | character(1))
Function used for logging. If set to “default” (the default), the evaluated design points, the resulting performances, and the runtime will be reported. If set to “memory”, the memory usage for each evaluation will also be displayed, with a small increase in run time. Otherwise a function with arguments learner, resampling, measures, par.set, control, opt.path, dob, x, y, remove.nas, stage and prev.stage is expected. The default displays the performance measures, the time needed for evaluating, the currently used memory and the max memory ever used before (the latter two both taken from gc). See the implementation for details.

impute.val

(numeric)
If something goes wrong during optimization (e.g., the learner crashes), this value is fed back to the tuner so that the tuning algorithm does not abort. Imputation is only active if on.learner.error is configured not to stop in configureMlr. The value is not stored in the optimization path; an NA and a corresponding error message are logged instead. Note that this value is later multiplied by -1 for maximization measures internally, so you need to enter a larger positive value for maximization here as well. Default is the worst obtainable value of the performance measure you optimize for when you aggregate by mean value, or Inf instead. For multi-criteria optimization, pass a vector of imputation values, one for each of your measures, in the same order as your measures.

comma

(logical(1))
Parameter of the GA feature selection, indicating whether to use a (mu, lambda) or (mu + lambda) GA. Default is FALSE, i.e., the (mu + lambda) strategy.

mu

(integer(1))
Parameter of the GA feature selection. Size of the parent population.

lambda

(integer(1))
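Parameter of the GA feature selection. Size of the offspring population (should be smaller than or equal to mu; note, however, that a comma strategy selects the mu new candidates from the offspring alone and therefore needs lambda > mu).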

crossover.rate

(numeric(1))
Parameter of the GA feature selection. Probability of choosing a bit from the first parent during crossover.

mutation.rate

(numeric(1))
Parameter of the GA feature selection. Probability of flipping a feature bit, i.e., of switching a feature between selected and deselected.

prob

(numeric(1))
Parameter of the random feature selection. Probability of choosing a feature.

method

(character(1))
Parameter of the sequential feature selection. A character representing the method. Possible values are sfs (forward search), sbs (backward search), sffs (floating forward search) and sfbs (floating backward search).

alpha

(numeric(1))
Parameter of the sequential feature selection. Minimal improvement in the performance measure required for a forward (adding) step. Default is 0.01.

beta

(numeric(1))
Parameter of the sequential feature selection. Minimal improvement in the performance measure required for a backward (removing) step. Negative values mean that a slight decrease in performance is tolerated when a feature is removed. Default is -0.001.

Value

(FeatSelControl). The specific subclass is one of FeatSelControlExhaustive, FeatSelControlRandom, FeatSelControlSequential, FeatSelControlGA.

References

Ron Kohavi and George H. John, Wrappers for feature subset selection, Artificial Intelligence, Volume 97, 1997, pp. 273-324. http://ai.stanford.edu/~ronnyk/wrappersPrint.pdf

See also