Preprocess data so they can be used as input for train_frm().
Usage
preprocess_data(
data,
degree_polynomial = 1,
interaction_terms = FALSE,
verbose = 1,
nw = 1,
rm_near_zero_var = TRUE,
rm_na = TRUE
)
Arguments
- data
Data frame with columns RT, NAME, and SMILES.
- degree_polynomial
Highest degree of the polynomial terms added to the data set. For example, with a value of 3, quadratic and cubic terms are added.
- interaction_terms
If TRUE, all interaction terms are added to the data set.
- verbose
Verbosity level: 0 = no output, 1 = show progress, 2 = show progress and warnings.
- nw
Number of workers to use for parallel processing.
- rm_near_zero_var
A logical value indicating whether to remove near-zero-variance predictors. Setting this to TRUE can make cross-validation (CV) results overoptimistic, because the variance filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection.
- rm_na
A logical value indicating whether to remove NA values. Setting this to TRUE can cause the CV results to be overoptimistic, as the filtering is done on the whole dataset, i.e. information from the test folds is used for feature selection.
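As a rough illustration of what degree_polynomial and interaction_terms describe, the following base R sketch builds polynomial and pairwise interaction columns for a toy table. The helpers add_polynomials() and add_interactions(), the column names x1 and x2, and the naming scheme of the new columns are all hypothetical; the actual implementation, the column naming, and the order in which terms are added by preprocess_data() may differ.

```r
# Toy predictor table; x1 and x2 stand in for numeric predictor columns (hypothetical).
X <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))

# Hypothetical helper, not part of the package:
# append the powers 2..degree of every original column.
add_polynomials <- function(X, degree) {
  out <- X
  for (d in seq_len(degree)[-1]) {  # 2, 3, ..., degree; empty when degree is 1
    P <- X^d
    colnames(P) <- paste0(colnames(X), "_p", d)
    out <- cbind(out, P)
  }
  out
}

# Hypothetical helper, not part of the package:
# append the product of every pair of original columns.
add_interactions <- function(X) {
  out <- X
  for (p in combn(colnames(X), 2, simplify = FALSE)) {
    out[[paste(p, collapse = "_")]] <- X[[p[1]]] * X[[p[2]]]
  }
  out
}

poly3 <- add_polynomials(X, 3)  # columns: x1, x2, x1_p2, x2_p2, x1_p3, x2_p3
inter <- add_interactions(X)    # columns: x1, x2, x1_x2
```

With degree 1 the sketch returns the table unchanged, which matches the reading that only terms of degree 2 and higher count as added polynomial terms.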
Examples
data <- head(RP, 3) # Only use first three rows to speed up example runtime
pre <- preprocess_data(data, verbose = 0)
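The caveat noted for rm_near_zero_var can be made concrete with a small base R sketch. It does not call preprocess_data(); keep_cols() is a hypothetical, simplified stand-in for a variance filter (it only drops columns whose variance is essentially zero), and the toy data are invented for illustration. The point is only that the set of retained columns can depend on whether held-out rows are visible to the filter.

```r
# Toy data: column b is constant everywhere except in the last row.
set.seed(1)
X <- data.frame(a = rnorm(6), b = c(0, 0, 0, 0, 0, 1))

test_idx <- 6             # pretend row 6 is the held-out test fold
train <- X[-test_idx, ]   # rows available for training

# Hypothetical stand-in for a variance filter:
# return the names of columns whose variance exceeds a small tolerance.
keep_cols <- function(df, tol = 1e-8) {
  names(df)[sapply(df, var) > tol]
}

kept_all   <- keep_cols(X)      # filter sees the test row: keeps "a" and "b"
kept_train <- keep_cols(train)  # filter sees training rows only: keeps "a"
```

Filtering on the whole table retains b only because the test row is included, which is the kind of information leak that can make CV estimates overoptimistic. Filtering inside each training fold avoids it, at the cost of possibly different predictor sets per fold.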