First, calls generateFilterValuesData.
Features are then selected via select and val.
Usage
filterFeatures(
  task,
  method = "FSelectorRcpp_information.gain",
  fval = NULL,
  perc = NULL,
  abs = NULL,
  threshold = NULL,
  fun = NULL,
  fun.args = NULL,
  mandatory.feat = NULL,
  select.method = NULL,
  base.methods = NULL,
  cache = FALSE,
  ...
)
Arguments
- task
(Task)
The task.
- method
(character(1))
See listFilterMethods. Default is "FSelectorRcpp_information.gain".
- fval
(FilterValues)
Result of generateFilterValuesData. If you pass this, the filter values in the object are used for feature filtering; method and ... are then ignored. Default is NULL, in which case it is not used.
- perc
(numeric(1))
If set, select the perc*100 percent top scoring features. perc = 1 means to select all features. Mutually exclusive with arguments abs, threshold and fun.
- abs
(numeric(1))
If set, select the abs top scoring features. Mutually exclusive with arguments perc, threshold and fun.
- threshold
(numeric(1))
If set, select features whose score exceeds threshold. Mutually exclusive with arguments perc, abs and fun.
- fun
(function)
If set, select features via a custom thresholding function, which must return the number of top scoring features to select. Mutually exclusive with arguments perc, abs and threshold.
- fun.args
(any)
Arguments passed to the custom thresholding function.
- mandatory.feat
(character)
Mandatory features which are always included regardless of their scores.
- select.method
If multiple methods are supplied in argument method, specify the method that is used for the final subsetting.
- base.methods
If method is an ensemble filter, specify the base filter methods which the ensemble method will use.
- cache
(character(1) | logical)
Whether to use caching during filter value creation. See the Caching section.
- ...
(any)
Passed down to selected filter method.
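The selection arguments described above can be compared side by side. The following is a minimal sketch, assuming the mlr and FSelectorRcpp packages are installed; the variable names (top_half, top_three, fv, from_fval) are illustrative and not part of the package.

```r
# Sketch only: assumes mlr and FSelectorRcpp are installed.
library(mlr)

method <- "FSelectorRcpp_information.gain"

# perc: keep the top 50 percent of the four iris features.
top_half <- filterFeatures(iris.task, method = method, perc = 0.5)

# abs: keep exactly the three top scoring features.
top_three <- filterFeatures(iris.task, method = method, abs = 3)

# fval: compute the filter values once and reuse them for filtering.
# When fval is supplied, method and ... are ignored.
fv <- generateFilterValuesData(iris.task, method = method)
from_fval <- filterFeatures(iris.task, fval = fv, abs = 2)

# Inspect which features survived each selection.
getTaskFeatureNames(top_half)
getTaskFeatureNames(top_three)
getTaskFeatureNames(from_fval)
```

Only one of perc, abs, threshold and fun should be set per call, since they are mutually exclusive.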
Value
Task.
Caching
If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is operating system dependent and can be checked with getCacheDir().
The default cache can be cleared with deleteCacheDir().
Alternatively, a custom directory can be passed to store the cache.
Note that caching is not thread safe. It will work for parallel computation on many systems, but there is no guarantee.
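As a rough illustration of the caching options, the sketch below passes a custom cache directory. It assumes mlr and FSelectorRcpp are installed; caching may also need further packages (for example memoise), which is an assumption here. The directory name mlr-filter-cache is hypothetical.

```r
# Sketch only: assumes mlr and FSelectorRcpp are installed; caching may
# require additional packages (an assumption, e.g. memoise).
library(mlr)

# Location of the default cache used when cache = TRUE (OS dependent).
getCacheDir()

# Custom cache directory: a hypothetical folder under the session temp dir.
cache_dir <- file.path(tempdir(), "mlr-filter-cache")
dir.create(cache_dir, showWarnings = FALSE)

# Filter values are cached in cache_dir during their creation.
cached <- filterFeatures(iris.task,
  method = "FSelectorRcpp_information.gain",
  abs = 2, cache = cache_dir)
getTaskFeatureNames(cached)

# deleteCacheDir() would clear the default cache directory, not cache_dir.
```

A temporary directory keeps the example from touching the default cache; for results that should persist across sessions, a permanent path would be the more natural choice.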
Simple and ensemble filters
Besides passing (multiple) simple filter methods, you can also pass an ensemble filter method (in a list). The ensemble method will use the simple methods to calculate its ranking. See listFilterEnsembleMethods() for available ensemble methods.
See also
Other filter: generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilterEnsemble(), makeFilterWrapper(), makeFilter(), plotFilterValues()
Examples
# simple filter
filterFeatures(iris.task, method = "FSelectorRcpp_gain.ratio", abs = 2)
#> Supervised task: iris-example
#> Type: classif
#> Target: Species
#> Observations: 150
#> Features:
#>    numerics     factors     ordered functionals
#>           2           0           0           0
#> Missings: FALSE
#> Has weights: FALSE
#> Has blocking: FALSE
#> Has coordinates: FALSE
#> Classes: 3
#>     setosa versicolor  virginica
#>         50         50         50
#> Positive class: NA
# ensemble filter
filterFeatures(iris.task, method = "E-min",
  base.methods = c("FSelectorRcpp_gain.ratio",
    "FSelectorRcpp_information.gain"), abs = 2)
#> Supervised task: iris-example
#> Type: classif
#> Target: Species
#> Observations: 150
#> Features:
#>    numerics     factors     ordered functionals
#>           2           0           0           0
#> Missings: FALSE
#> Has weights: FALSE
#> Has blocking: FALSE
#> Has coordinates: FALSE
#> Classes: 3
#>     setosa versicolor  virginica
#>         50         50         50
#> Positive class: NA