
Preprocess data so they can be used as input for train_frm().

Usage

preprocess_data(
  data,
  degree_polynomial = 1,
  interaction_terms = FALSE,
  verbose = 1,
  nw = 1,
  rm_near_zero_var = TRUE,
  rm_na = TRUE
)

Arguments

data

A data frame with the columns RT, NAME and SMILES.

degree_polynomial

The highest degree of polynomial terms to add. For example, if set to 3, quadratic and cubic terms are added.

interaction_terms

If TRUE, all interaction terms are added to the data set.

verbose

Verbosity level: 0 = no output, 1 = show progress, 2 = show progress and warnings.

nw

The number of workers to use for parallel processing.

rm_near_zero_var

A logical value indicating whether to remove near zero variance predictors. Setting this to TRUE can cause the CV results to be overoptimistic, as the variance filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection.

rm_na

A logical value indicating whether to remove NA values. Setting this to TRUE can cause the CV results to be overoptimistic, as the filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection.
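
The caveat attached to rm_near_zero_var and rm_na can be made concrete with a small base R sketch. It is not the package's implementation: the helper low_var_cols(), the toy columns a, b and c, and the plain variance threshold are all assumptions chosen for illustration. It contrasts a filter decided on the whole data set with one decided on the training rows of each fold.

```r
# Hypothetical helper: names of columns whose variance is at or below `tol`.
# A simplified stand-in for a near-zero-variance filter.
low_var_cols <- function(X, tol = 1e-8) {
  v <- vapply(X, stats::var, numeric(1), na.rm = TRUE)
  names(v)[is.na(v) | v <= tol]
}

set.seed(1)
# Toy predictors: column b is zero everywhere except in the last row.
X <- data.frame(a = rnorm(20), b = c(rep(0, 19), 1), c = rnorm(20))
folds <- rep(1:4, each = 5)  # fold 4 holds the only nonzero value of b

# Filter decided on the whole data set: the test rows influence the decision.
keep_all <- setdiff(names(X), low_var_cols(X))

# Filter decided per fold, using the training rows only.
keep_by_fold <- lapply(1:4, function(k) {
  train <- X[folds != k, , drop = FALSE]
  setdiff(names(X), low_var_cols(train))
})
```

Here keep_all retains b, while the training rows of fold 4 contain only zeros in b, so a fold-wise filter would drop it for that fold. Filtering inside each fold keeps held-out rows from shaping the predictor set.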

Value

A dataframe with the preprocessed data

Examples

data <- head(RP, 3) # Only use first three rows to speed up example runtime
pre <- preprocess_data(data, verbose = 0)
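
To illustrate what polynomial and interaction terms are in general, the following base R sketch builds them by hand for a toy data frame. The column names x1 and x2, the naming scheme of the new columns, and the choice of pairwise products are assumptions; the columns actually produced by preprocess_data() may be named and constructed differently.

```r
# Toy predictors; the column names are hypothetical.
X <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))

# Polynomial terms up to degree 3: squared and cubed copies of each column.
degree <- 3
poly_terms <- lapply(seq_len(degree)[-1], function(d) {
  P <- X^d
  names(P) <- paste0(names(X), "_pow", d)
  P
})
X_poly <- do.call(cbind, c(list(X), poly_terms))

# Pairwise interaction terms: the product of every pair of original columns.
pairs <- combn(names(X), 2)
inter <- apply(pairs, 2, function(p) X[[p[1]]] * X[[p[2]]])
colnames(inter) <- apply(pairs, 2, paste, collapse = "_")
X_full <- cbind(X_poly, inter)
```

With two predictors and degree 3, this yields the two original columns, four polynomial columns and one interaction column. The number of interaction columns grows quadratically with the number of predictors, which is worth keeping in mind before enabling both options on a wide data set.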