
Summarizes a data.frame, somewhat differently from R's standard summary function. The function is mainly useful as a basic EDA tool on data.frames before they are converted to tasks, but it can be used on tasks as well.

Columns can be of type numeric, integer, logical, factor, or character. Character and logical columns are treated as factors.
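The coercion described above can be approximated in base R. The following is only a sketch of the idea, not the package's internal code: the data frame `df` and the helper vector `to_factor` are made up for illustration.

```r
# Toy data frame with a character and a logical column (hypothetical data).
df <- data.frame(
  id    = 1:4,
  group = c("a", "b", "a", "c"),
  flag  = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Flag the columns that would be treated as categorical after coercion.
to_factor <- vapply(df, function(x) is.character(x) || is.logical(x), logical(1))

# Convert only those columns to factors; numeric and integer columns are untouched.
df[to_factor] <- lapply(df[to_factor], as.factor)

str(df)
```

After this step, `group` and `flag` are factors, so per-category counts and the number of levels are well defined for them.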

Usage

summarizeColumns(obj)

Arguments

obj

(data.frame | Task)
Input data.

Value

(data.frame). With columns:

name

Name of column.

type

Data type of column.

na

Number of NAs in column.

disp

Measure of dispersion: the standard deviation (sd) for numeric and integer columns, and the qualitative variation for categorical columns.

mean

Mean value of column, NA for categorical columns.

median

Median value of column, NA for categorical columns.

mad

MAD of column, NA for categorical columns.

min

Minimum value of the column; for categorical columns, the size of the smallest category.

max

Maximum value of the column; for categorical columns, the size of the largest category.

nlevs

For categorical columns, the number of factor levels; for other columns, 0 (as shown in the example output).
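To make the column definitions concrete, here is a rough base R sketch that computes the same kinds of statistics for a single column. The helpers `summarize_numeric` and `summarize_categorical` are hypothetical names, not part of the package. Two points are assumptions: that missing values are dropped before computing the statistics, and that "qualitative variation" means the variation ratio (one minus the relative frequency of the most common category). The latter is consistent with the value reported for `Species` in the iris example, but may not be the exact definition used internally.

```r
# Hypothetical helper: statistics for one numeric or integer column.
# Assumption: NAs are removed before computing each statistic.
summarize_numeric <- function(x) {
  c(
    na     = sum(is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    disp   = sd(x, na.rm = TRUE),      # dispersion = standard deviation
    median = median(x, na.rm = TRUE),
    mad    = mad(x, na.rm = TRUE),     # scaled median absolute deviation
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE)
  )
}

# Hypothetical helper: statistics for one categorical column.
summarize_categorical <- function(x) {
  x <- as.factor(x)
  counts <- table(x)                   # table() drops NAs by default
  c(
    na    = sum(is.na(x)),
    disp  = 1 - max(counts) / sum(counts),  # assumed: variation ratio
    min   = min(counts),               # size of the smallest category
    max   = max(counts),               # size of the largest category
    nlevs = nlevels(x)
  )
}

num_stats <- summarize_numeric(iris$Sepal.Length)
cat_stats <- summarize_categorical(iris$Species)

print(num_stats)
print(cat_stats)
```

For `iris$Sepal.Length` and `iris$Species`, these helpers should agree with the corresponding rows of the example output; for factors with unused levels, `min` would be 0 because `table()` counts empty levels.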

Examples

summarizeColumns(iris)
#>           name    type na     mean      disp median     mad  min  max nlevs
#> 1 Sepal.Length numeric  0 5.843333 0.8280661   5.80 1.03782  4.3  7.9     0
#> 2  Sepal.Width numeric  0 3.057333 0.4358663   3.00 0.44478  2.0  4.4     0
#> 3 Petal.Length numeric  0 3.758000 1.7652982   4.35 1.85325  1.0  6.9     0
#> 4  Petal.Width numeric  0 1.199333 0.7622377   1.30 1.03782  0.1  2.5     0
#> 5      Species  factor  0       NA 0.6666667     NA      NA 50.0 50.0     3