Feature selection method used by selectFeatures.
The methods used here follow a wrapper approach, described in Kohavi and John (1997) (see references).
The following optimization algorithms are available:
Exhaustive search. All feature sets (up to a certain number of features,
max.features) are searched.
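As a rough illustration of what exhaustive search means, the following base-R sketch enumerates every non-empty feature set up to a size limit and keeps the best one. The function `exhaustive_search` and the scoring function `toy_score` are hypothetical names invented for this example; `toy_score` merely stands in for a resampled performance measure (assumed here to be "higher is better") and is not how the package evaluates feature sets.

```r
# Hypothetical score (higher is better); a stand-in for a resampled
# performance measure. It rewards features "a" and "c" and penalizes size.
toy_score <- function(features) {
  sum(features %in% c("a", "c")) - 0.1 * length(features)
}

# Enumerate every non-empty feature set with at most max_features elements
# and return the best-scoring one.
exhaustive_search <- function(features, score_fun,
                              max_features = length(features)) {
  best_set <- character(0)
  best_score <- -Inf
  for (k in seq_len(min(max_features, length(features)))) {
    # All subsets of size k, as a list of character vectors.
    candidates <- combn(features, k, simplify = FALSE)
    for (set in candidates) {
      s <- score_fun(set)
      if (s > best_score) {
        best_score <- s
        best_set <- set
      }
    }
  }
  list(features = best_set, score = best_score)
}

res <- exhaustive_search(c("a", "b", "c", "d"), toy_score, max_features = 2)
print(res$features)
```

The number of candidate sets grows combinatorially with the number of features, which is why a limit such as max.features is usually needed in practice.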
Random search. Feature vectors are randomly drawn,
up to a certain number of features max.features.
A feature is included in the current set with probability prob.
So we are basically drawing (0,1)-membership-vectors, where each element
is Bernoulli(prob) distributed.
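The drawing of membership vectors can be sketched in base R as below. `draw_feature_vectors` is a hypothetical helper written for this illustration, and redrawing a vector that exceeds the size limit is an assumption of the sketch, not a statement about the package's internals.

```r
# Draw `maxit` random (0,1)-membership vectors over `n_features` features.
# Each element is 1 (feature included) with probability `prob`.
# ASSUMPTION: a vector with more than `max_features` ones is simply redrawn.
draw_feature_vectors <- function(n_features, maxit, prob = 0.5,
                                 max_features = NA) {
  draws <- matrix(0L, nrow = maxit, ncol = n_features)
  for (i in seq_len(maxit)) {
    repeat {
      x <- rbinom(n_features, size = 1, prob = prob)
      if (is.na(max_features) || sum(x) <= max_features) break
    }
    draws[i, ] <- x
  }
  draws
}

set.seed(1)
draws <- draw_feature_vectors(n_features = 6, maxit = 5, prob = 0.5,
                              max_features = 3)
print(draws)  # one row per candidate feature set
```

Each row would then be evaluated like any other candidate feature set, and the best-performing row kept.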
Deterministic forward or backward search. That means extending
(forward) or shrinking (backward) a feature set.
Depending on the given method, different approaches are taken:
sfs Sequential Forward Search: Starting from an empty model, in each step the feature increasing
the performance measure the most is added to the model.
sbs Sequential Backward Search: Starting from a model with all features, in each step the feature
decreasing the performance measure the least is removed from the model.
sffs Sequential Floating Forward Search: Starting from an empty model, in each step the algorithm
chooses the best model from all models with one additional feature and from all models with one
feature less.
sfbs Sequential Floating Backward Search: Similar to
sffs but starting with a full model.
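The two non-floating variants described above can be sketched in base R as follows. `forward_search`, `backward_search` and `toy_score` are hypothetical names for this illustration. The score is assumed to be "higher is better", and the stopping rules (stop when no step helps) are a simplification that does not model the alpha and beta thresholds of the real constructor.

```r
# Hypothetical score (higher is better); a stand-in for a resampled
# performance measure. It rewards features "a" and "c" and penalizes size.
toy_score <- function(features) {
  sum(features %in% c("a", "c")) - 0.1 * length(features)
}

# sfs: start empty, repeatedly add the feature that improves the score most;
# stop when no single addition improves it.
forward_search <- function(features, score_fun) {
  current <- character(0)
  current_score <- score_fun(current)
  repeat {
    remaining <- setdiff(features, current)
    if (length(remaining) == 0) break
    scores <- vapply(remaining, function(f) score_fun(c(current, f)),
                     numeric(1))
    if (max(scores) <= current_score) break
    current <- c(current, remaining[which.max(scores)])
    current_score <- max(scores)
  }
  list(features = current, score = current_score)
}

# sbs: start with all features, repeatedly remove the feature whose removal
# hurts the score least; stop when every removal makes the score worse.
backward_search <- function(features, score_fun) {
  current <- features
  current_score <- score_fun(current)
  while (length(current) > 1) {
    scores <- vapply(current, function(f) score_fun(setdiff(current, f)),
                     numeric(1))
    if (max(scores) < current_score) break
    current <- setdiff(current, current[which.max(scores)])
    current_score <- max(scores)
  }
  list(features = current, score = current_score)
}

fwd <- forward_search(c("a", "b", "c", "d"), toy_score)
bwd <- backward_search(c("a", "b", "c", "d"), toy_score)
print(fwd$features)
print(bwd$features)
```

The floating variants differ in that each step considers both adding and removing a feature, so an earlier decision can be undone later.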
Search via genetic algorithm.
The GA is a simple (mu, lambda) or (mu + lambda) algorithm,
depending on the comma setting.
A comma strategy selects a new population of size
mu out of the lambda offspring.
A plus strategy uses the joint pool of
mu parents and lambda offspring for selecting
mu new candidates.
Out of those
mu feature sets, the new
lambda feature sets are generated
by randomly choosing pairs of parents. These are crossed over, and
crossover.rate
represents the probability of choosing a feature from the first parent instead of
the second parent.
The resulting offspring is mutated, i.e., its bits are flipped with
probability mutation.rate. If
max.features is set, offspring are
repeatedly generated until the setting is satisfied.
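One generation of such a GA might look like the base-R sketch below. The helpers `make_offspring` and `ga_generation`, the fitness function `toy_fitness`, and the "higher is better" convention are all assumptions made for illustration; the sketch also assumes at least two parents and at least two features, and it is not the package's implementation.

```r
# Create one child from two parent bit vectors (1 = feature included).
make_offspring <- function(p1, p2, crossover_rate, mutation_rate,
                           max_features = NA) {
  repeat {
    # Crossover: take each bit from parent 1 with probability crossover_rate.
    from_first <- runif(length(p1)) < crossover_rate
    child <- ifelse(from_first, p1, p2)
    # Mutation: flip each bit independently with probability mutation_rate.
    flip <- runif(length(child)) < mutation_rate
    child[flip] <- 1L - child[flip]
    # Regenerate until the max_features constraint (if any) is satisfied.
    if (is.na(max_features) || sum(child) <= max_features) return(child)
  }
}

# `population` is a mu x p integer 0/1 matrix, one row per feature set.
ga_generation <- function(population, score_fun, lambda, comma = FALSE,
                          crossover_rate = 0.5, mutation_rate = 0.05,
                          max_features = NA) {
  mu <- nrow(population)
  # A comma strategy can only pick mu survivors if there are enough offspring.
  stopifnot(!comma || lambda >= mu)
  offspring <- matrix(0L, nrow = lambda, ncol = ncol(population))
  for (i in seq_len(lambda)) {
    parents <- sample(mu, 2)  # a random pair of distinct parents
    offspring[i, ] <- make_offspring(population[parents[1], ],
                                     population[parents[2], ],
                                     crossover_rate, mutation_rate,
                                     max_features)
  }
  # Comma: survivors come from the offspring only.
  # Plus: survivors come from the joint pool of parents and offspring.
  pool <- if (comma) offspring else rbind(population, offspring)
  fitness <- apply(pool, 1, score_fun)
  pool[order(fitness, decreasing = TRUE)[seq_len(mu)], , drop = FALSE]
}

set.seed(42)
population <- matrix(rbinom(4 * 6, size = 1, prob = 0.5), nrow = 4)
toy_fitness <- function(bits) sum(bits[1:2]) - 0.1 * sum(bits)
next_plus <- ga_generation(population, toy_fitness, lambda = 8, comma = FALSE)
next_comma <- ga_generation(population, toy_fitness, lambda = 8, comma = TRUE,
                            max_features = 4)
```

With the plus strategy the best parent can survive unchanged, so the best fitness never decreases between generations; the comma strategy gives up that guarantee in exchange for discarding all parents.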
makeFeatSelControlExhaustive(same.resampling.instance = TRUE,
  maxit = NA_integer_, max.features = NA_integer_, tune.threshold = FALSE,
  tune.threshold.args = list(), log.fun = "default")

makeFeatSelControlGA(same.resampling.instance = TRUE, impute.val = NULL,
  maxit = NA_integer_, max.features = NA_integer_, comma = FALSE, mu = 10L,
  lambda, crossover.rate = 0.5, mutation.rate = 0.05, tune.threshold = FALSE,
  tune.threshold.args = list(), log.fun = "default")

makeFeatSelControlRandom(same.resampling.instance = TRUE, maxit = 100L,
  max.features = NA_integer_, prob = 0.5, tune.threshold = FALSE,
  tune.threshold.args = list(), log.fun = "default")

makeFeatSelControlSequential(same.resampling.instance = TRUE,
  impute.val = NULL, method, alpha = 0.01, beta = -0.001,
  maxit = NA_integer_, max.features = NA_integer_, tune.threshold = FALSE,
  tune.threshold.args = list(), log.fun = "default")
Returns a control object (FeatSelControl). The specific subclass is one of FeatSelControlExhaustive, FeatSelControlRandom, FeatSelControlSequential, FeatSelControlGA.
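A short usage sketch for the constructors listed above, assuming they come from the mlr package. The argument values are arbitrary examples, and the code is guarded so that it does nothing when mlr is not installed.

```r
# Runs only if the mlr package is installed; otherwise nothing is executed.
if (requireNamespace("mlr", quietly = TRUE)) {
  # Exhaustive search over all feature sets with at most 3 features.
  ctrl_exhaustive <- mlr::makeFeatSelControlExhaustive(max.features = 3L)

  # 20 random draws, each feature included with probability 0.3.
  ctrl_random <- mlr::makeFeatSelControlRandom(maxit = 20L, prob = 0.3)

  # Sequential forward search.
  ctrl_sfs <- mlr::makeFeatSelControlSequential(method = "sfs", alpha = 0.01)

  # Plus-strategy GA: 10 parents, 5 offspring per generation, 10 generations.
  ctrl_ga <- mlr::makeFeatSelControlGA(maxit = 10L, mu = 10L, lambda = 5L,
                                       comma = FALSE)

  print(class(ctrl_random))
}
```

The resulting control object is what gets passed on to selectFeatures, together with a learner, a task and a resampling strategy.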
Ron Kohavi and George H. John,
Wrappers for feature subset selection, Artificial Intelligence, Volume 97, 1997, pp. 273-324.