Allows imputation of missing feature values through various techniques. Note that you can re-impute a data set in the same way the imputation was performed during training. This is especially handy during resampling, when you want to apply the same imputation to the test set as to the training set.
The function impute performs the imputation on a data set and returns, along with the imputed data set, an “ImputationDesc” object which can contain “learned” coefficients and helpful data. This object can then be passed, together with a new data set, to reimpute.
The imputation techniques can be specified for certain features or for feature classes; see the function arguments.
You can either provide an arbitrary object, use a built-in imputation method listed under imputations, or create one yourself using makeImputeMethod.
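The train/test workflow described above can be sketched as follows. This is a minimal illustration, not part of the original reference: the toy data frames train and test are made up, and the choice of imputeMedian() for numerics and imputeMode() for factors is just one possible configuration.

```r
library(mlr)

# Hypothetical toy data: one numeric and one factor feature,
# each with a missing value.
train = data.frame(
  x = c(1, 2, NA, 4),
  f = factor(c("a", "b", "a", NA))
)
test = data.frame(
  x = c(NA, 10),
  f = factor(c(NA, "b"), levels = c("a", "b"))
)

# Learn the imputation on the training data:
# median for numeric columns, mode for factor columns.
imp = impute(train, classes = list(numeric = imputeMedian(), factor = imputeMode()))

# Apply the stored imputation (values learned on train) to the test data.
test.imp = reimpute(test, imp$desc)

print(imp$data)
print(test.imp)
```

The values filled into test come from the description object learned on train, so the test set never influences the imputation.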
Arguments
- obj
(data.frame | Task)
Input data.
- target
(character)
Name of the column(s) specifying the response. Default is character(0).
- classes
(named list)
Named list containing imputation techniques for classes of columns, e.g. list(numeric = imputeMedian()).
- cols
(named list)
Named list containing names of imputation methods to impute missing values in the data column referenced by the list element's name. Overrules imputation set via classes.
- dummy.classes
(character)
Classes of columns to create dummy columns for. Default is character(0).
- dummy.cols
(character)
Column names to create dummy columns (containing binary missing indicator) for. Default is character(0).
- dummy.type
(character(1))
How dummy columns are encoded. Either as 0/1 with type “numeric” or as “factor”. Default is “factor”.
- force.dummies
(logical(1))
Force dummy creation even if the respective data column does not contain any NAs. Note that (a) most learners will complain about constant columns created this way but (b) your feature set might be stochastic if you turn this off. Default is FALSE.
- impute.new.levels
(logical(1))
If new, unencountered factor levels occur during reimputation, should these be handled as NAs and then be imputed the same way? Default is TRUE.
- recode.factor.levels
(logical(1))
Recode factor levels after reimputation, so that they match the respective element of lvls (in the description object) and therefore match the levels of the feature factor in the training data after imputation? Default is TRUE.
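How classes, cols and the dummy arguments interact can be sketched on a small example. The data frame df and the constant -1 are made up for illustration; imputeConstant() is one of the built-in methods listed under imputations.

```r
library(mlr)

# Hypothetical toy data with missing values in both numeric columns.
df = data.frame(
  x = c(1, NA, 3),
  y = c(NA, 5, 7)
)

imp = impute(df,
  classes = list(numeric = imputeMedian()), # default rule for all numeric columns
  cols = list(y = imputeConstant(-1)),      # overrules the class rule for column y
  dummy.cols = "x",                         # add a missing indicator for column x
  dummy.type = "numeric"                    # encode the indicator as 0/1
)

print(imp$data)
```

Here x falls under the class rule (median), y is filled with the constant because cols overrules classes, and one extra indicator column is added for x.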
Details
The description object contains these slots:
- target (character): See argument
- features (character): Feature names (column names of data)
- classes (character): Feature classes (storage type of data)
- lvls (named list): Mapping of column names of factor features to their levels, including newly created ones during imputation
- impute (named list): Mapping of column names to imputation functions
- dummies (named list): Mapping of column names to imputation functions
- impute.new.levels (logical(1)): See argument
- recode.factor.levels (logical(1)): See argument
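The slots can be inspected on a returned description object. The sketch below assumes the “ImputationDesc” object is list-like, so that slots are reachable with $; that access pattern is an assumption and not stated in this reference.

```r
library(mlr)

df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
imp = impute(df, cols = list(x = 99, y = imputeMode()))

desc = imp$desc

# Assumption: slots are accessible via `$` on the description object.
print(desc$features) # feature names seen during imputation
print(desc$classes)  # storage types of those features
print(desc$lvls)     # factor levels recorded for factor features
```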
See also
Other impute: imputations, makeImputeMethod(), makeImputeWrapper(), reimpute()
Examples
df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
imputed = impute(df, target = character(0), cols = list(x = 99, y = imputeMode()))
print(imputed$data)
#>    x y z
#> 1  1 a 1
#> 2  1 a 2
#> 3 99 b 3
reimpute(data.frame(x = NA_real_), imputed$desc)
#>    x y  z
#> 1 99 a NA