Changelog
FastRet 1.4.0
Chemical-descriptor cache:
- The chemical-descriptor cache is now a pure on-disk SQLite database (CDs.sqlite) instead of an in-memory R option backed by CDs.rds. On first use, the shipped database is copied into the per-user cache directory (tools::R_user_dir("FastRet", "cache")), and all reads and writes go there. Because the cache lives on disk, descriptors computed in one process are immediately visible to every other process and worker, so parallel training and the GUI’s background workers now share a single cache. Concurrent access is made safe via SQLite WAL mode and INSERT OR IGNORE.
- getCDs(), preprocess_data(), train_frm(), predict.frm() and adjust_frm() gained a cache argument. With cache = TRUE (default), descriptors are read from and written to the on-disk cache; with cache = FALSE, the cache is bypassed and every descriptor is recomputed via rCDK (useful for benchmarking the uncached runtime).
- Removed the internal FastRet.cachedCDs option and the N_SMI_CACHED constant.
- CDK / rcdklibs >= 2.9 is required (already enforced at GUI startup by check_cdk_version()), as it provides all descriptors in CDFeatures.
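The concurrency pattern named above (a shared SQLite file in WAL mode, written with INSERT OR IGNORE) can be sketched as follows. This is a minimal, language-neutral illustration in Python, not FastRet's R implementation: the table name cds, its two-column schema, the JSON encoding and the functions open_cache() and get_descriptors() are all hypothetical, and two connections stand in for two worker processes.

```python
import json
import os
import sqlite3
import tempfile


def open_cache(path):
    """Open (or create) a descriptor cache shared by several processes."""
    # timeout: wait up to 30 s if another process holds the write lock
    con = sqlite3.connect(path, timeout=30)
    # WAL journal: readers and a single writer can work concurrently
    con.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one JSON-encoded descriptor record per SMILES
    con.execute(
        "CREATE TABLE IF NOT EXISTS cds (smiles TEXT PRIMARY KEY, descriptors TEXT NOT NULL)"
    )
    return con


def get_descriptors(con, smiles, compute, cache=True):
    """Return {smiles: descriptors}; with cache=False every entry is recomputed."""
    result = {}
    for smi in dict.fromkeys(smiles):  # de-duplicate, keep input order
        if cache:
            row = con.execute(
                "SELECT descriptors FROM cds WHERE smiles = ?", (smi,)
            ).fetchone()
            if row is not None:
                result[smi] = json.loads(row[0])
                continue
        desc = compute(smi)
        result[smi] = desc
        if cache:
            # If another worker inserted the same SMILES first, keep its row
            with con:  # commits on exit, so other connections see the row
                con.execute(
                    "INSERT OR IGNORE INTO cds VALUES (?, ?)", (smi, json.dumps(desc))
                )
    return result


# Usage: two connections to the same file stand in for two workers
calls = []


def fake_compute(smi):
    calls.append(smi)
    return {"nChar": len(smi)}  # placeholder for a real CDK descriptor calculation


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "CDs.sqlite")
    worker1, worker2 = open_cache(path), open_cache(path)
    first = get_descriptors(worker1, ["CCO", "CCO", "c1ccccc1"], fake_compute)
    second = get_descriptors(worker2, ["CCO"], fake_compute)  # served from disk
    uncached = get_descriptors(worker2, ["CCO"], fake_compute, cache=False)
    mode = worker1.execute("PRAGMA journal_mode").fetchone()[0]
    worker1.close()
    worker2.close()
```

INSERT OR IGNORE makes a race between two workers computing the same SMILES harmless: whichever row lands first wins, and both hold the same descriptors anyway.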
GUI:
- Reworked the model-adjustment controls in the Adjust tab. The old “Components of linear model” checkbox group (which let users toggle the RT, RT^2, RT^3, log(RT), exp(RT), and sqrt(RT) predictors of a linear adjustment model) has been replaced by an “Adjustment method” radio button that lets users pick the model family used for the correction: Lasso (recommended), Linear model, or XGBoost. Lasso and XGBoost adjust using the base retention time together with the molecular descriptors and can therefore capture compound-specific shifts, whereas the linear model fits a simple straight-line correction from the base retention time only. The GUI now passes adj_type to adjust_frm() accordingly; the deprecated RT-transformation predictors are no longer exposed.
- Promoted the four workflows to dedicated navbar tabs (Train, Select, Adjust, Predict) instead of selecting them from a single drop-down. Each tab shows a short label with the full mode description on hover and carries only its own controls and results. The Privacy Policy, Contact, and About pages remain separate tabs.
- Simplified the Train sidebar to the essentials: data upload, method (XGBoost/Lasso), seed, and the console log. The “Show advanced settings” toggle and the “Preprocessing Options” checkbox group were removed; GUI training now always uses the package defaults of train_frm().
- GUI-trained models are now reproducible by default. The training seed previously defaulted to the current system time and was never actually passed to train_frm(); it now defaults to a fixed value (42) and is applied, so training the same data twice yields identical models. The seed remains user-editable.
- The console log is now always visible in every mode (the per-mode “Show console logs” checkboxes were removed), including a new console-log box for Selective Measuring.
- The Selective Measuring tab now exposes the selection variant (named as in the paper: SMmax, SM1, SM0, SMinf, with an info button explaining each) and the random seed, so users can pick the variant and reproduce a selection from the GUI.
- Reviewed the GUI help texts for clarity, tone and formatting, and reworded the prediction console output so it no longer implies that descriptors are recomputed when they are served from the cache.
- Each tab’s sidebar now starts with a heading (the mode name) and a help button that explains, in a few sentences, what the mode is for and when to use it.
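The straight-line correction behind the “Linear model” option can be sketched as an ordinary least-squares fit of the new retention time on the base retention time. This Python sketch assumes exactly that (intercept plus slope, no descriptors); fit_straight_line() and the data are hypothetical. It also shows the limitation mentioned above: every compound with the same base RT receives the same correction, so compound-specific shifts cannot be captured.

```python
from statistics import mean


def fit_straight_line(base_rt, new_rt):
    """Ordinary least squares fit of new_rt = intercept + slope * base_rt."""
    mx, my = mean(base_rt), mean(new_rt)
    sxx = sum((x - mx) ** 2 for x in base_rt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(base_rt, new_rt))
    slope = sxy / sxx
    return my - slope * mx, slope


base_rt = [1.0, 2.0, 3.0, 4.0]  # RTs on the original setup (hypothetical data)
new_rt = [2.5, 4.5, 6.5, 8.5]   # the same compounds measured on the new setup
intercept, slope = fit_straight_line(base_rt, new_rt)
adjusted = intercept + slope * 5.0  # a base prediction of 5.0 mapped to the new setup
```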
Testing:
- Added shinytest2 end-to-end GUI tests (tests/testthat/test-gui-e2e.R) that drive the real app via headless Chrome and exercise every mode and model type (Train: Lasso + XGBoost; Selective Measuring; Predict; Adjust: Lasso + Linear model + XGBoost). They are skipped on CRAN and when Chrome/Java are unavailable.
Documentation:
- Rewrote the GUI Usage vignette to match the new navbar-tab layout and the simplified sidebars, with up-to-date field/button names and a fresh screenshot of every mode (Train/Select/Adjust/Predict). The screenshots are regenerated reproducibly via misc/scripts/make-gui-screenshots.R.
FastRet 1.3.7
Bugfix:
- Removed a leftover toscutil::stub() development call from the cross-validation path of train_frm(). The call had no effect on results (it only assigned predict.frm argument defaults into a function environment that was never read again), but it made toscutil, a Suggests package, a hard runtime dependency. As a result, model training failed with “there is no package called ‘toscutil’” whenever Suggests packages were not installed (e.g. a plain CRAN install).
Dependencies:
- Now requires future (>= 1.40.0). start_gui() uses with(future::plan(...), local = TRUE), which relies on the with.FutureStrategyList method introduced in future 1.40.0.
FastRet 1.3.6
Documentation:
- Improved the GUI documentation: corrected the description of the Selective Measuring algorithm (PAM / k-medoids clustering, not k-means), added step-by-step instructions and a Java/CDK troubleshooting note to the GUI-Usage vignette, and fixed minor wording in the GUI help texts.
FastRet 1.3.5
Bugfix:
- predict.frm() now imputes missing/non-finite values not only for base model predictors but also for adjustment model predictors. This prevents NA predictions if the adjustment model depends on predictors that are missing/non-finite in the new data.
Internal Improvements:
- Consolidated the imputation logic in predict.frm() into a shared internal helper.
FastRet 1.3.4
Internal Improvements:
- start_gui() and start_gui_in_devmode() now rely on with(future::plan(...), local = TRUE) so that temporary future plans are restored automatically without manual bookkeeping.
FastRet 1.3.3
API Improvements:
- selective_measuring() accepts "max" and "inf" as additional values for its rt_coef parameter:
  - "max" is an alias for the existing "max_ridge_coef".
  - "inf" sets all chemical descriptor features to zero before clustering so that RT alone drives the distance metric (i.e., it is “infinitely” more important than the chemical descriptors).
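How the rt_coef values shape the clustering input can be sketched as below. This is a hedged Python illustration under stated assumptions, not the package's code: it assumes that descriptors and RT are z-scored, that each descriptor is weighted by its ridge coefficient (aligned by name, as the 1.2.2 entry suggests), and that RT is multiplied by a factor derived from rt_coef. The function clustering_features() and its inputs are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def clustering_features(cds, rt, coefs, rt_coef):
    """Build the feature rows that would be passed to PAM (hypothetical).

    cds:   {descriptor name: list of values}
    rt:    list of retention times
    coefs: {descriptor name: ridge coefficient}, aligned by name
    """
    names = sorted(cds)
    z = {n: zscore(cds[n]) for n in names}
    rt_z = zscore(rt)
    if rt_coef == "inf":
        weights = {n: 0.0 for n in names}  # descriptors drop out entirely
        rt_weight = 1.0                    # any positive value gives the same clustering
    else:
        weights = {n: coefs.get(n, 0.0) for n in names}
        if rt_coef in ("max", "max_ridge_coef"):
            rt_weight = max(abs(w) for w in weights.values())
        else:
            rt_weight = float(rt_coef)     # 0 ignores RT, 1 keeps the plain z-score
    rows = [[z[n][i] * weights[n] for n in names] + [rt_z[i] * rt_weight]
            for i in range(len(rt))]
    return names + ["RT"], rows


cds = {"logP": [1.0, 2.0, 3.0], "MW": [100.0, 250.0, 300.0]}
rt = [1.0, 2.0, 6.0]
coefs = {"logP": 0.5, "MW": -2.0}

cols, rows_max = clustering_features(cds, rt, coefs, "max")
_, rows_inf = clustering_features(cds, rt, coefs, "inf")
_, rows_zero = clustering_features(cds, rt, coefs, 0)
rt_z = zscore(rt)
```

With "max", RT gets the weight of the descriptor with the largest absolute coefficient (here 2.0); with "inf", only the RT column carries information.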
FastRet 1.3.2
Bugfix:
- adjust_frm() automatically switched to “lm” adjustment if predictors contained only one predictor, regardless of whether add_cds was TRUE or FALSE. This has been fixed: “lasso”, “ridge” and “gbtree” adjustment is now possible if add_cds is TRUE, add_cds is NULL, or predictors contains more than one predictor.
FastRet 1.3.1
API Improvements:
- Added arguments match_rts and match_keys to adjust_frm():
  - If match_rts = TRUE (default), RTs are obtained by matching rows in new_data to rows in frm$df based on match_keys.
  - If match_rts = FALSE, RTs are obtained by applying the base model to new_data.
  - match_keys can be any combination of “INCHIKEY”, “SMILES” and “NAME”. If left at the default NULL, SMILES+INCHIKEY is used if both columns are present in the adjusted and the original training data; otherwise, SMILES+NAME is used.
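The default key choice and the key-based RT matching (including the averaging over duplicate training keys described in the 1.3.0 adjust_frm() entry) can be sketched as follows. This Python version is illustrative only: rows are plain dictionaries, and default_match_keys() and map_base_rts() are hypothetical names, not FastRet functions.

```python
from collections import defaultdict
from statistics import mean


def default_match_keys(new_cols, train_cols):
    """SMILES+INCHIKEY if both tables have both columns, else SMILES+NAME."""
    wanted = {"SMILES", "INCHIKEY"}
    if wanted <= set(new_cols) and wanted <= set(train_cols):
        return ("SMILES", "INCHIKEY")
    return ("SMILES", "NAME")


def map_base_rts(new_rows, train_rows, match_keys=None):
    """Map each new entry to the mean training RT of all entries sharing its key."""
    if match_keys is None:
        match_keys = default_match_keys(new_rows[0], train_rows[0])
    rts_by_key = defaultdict(list)
    for row in train_rows:
        rts_by_key[tuple(row[k] for k in match_keys)].append(row["RT"])
    mapped = []
    for row in new_rows:
        key = tuple(row[k] for k in match_keys)
        if key not in rts_by_key:
            raise KeyError(f"no training entry matches {key}")  # unmappable -> error
        mapped.append(mean(rts_by_key[key]))
    return mapped


train = [
    {"SMILES": "CCO", "NAME": "ethanol", "RT": 1.0},
    {"SMILES": "CCO", "NAME": "ethanol", "RT": 2.0},
    {"SMILES": "CCO", "NAME": "ethanol", "RT": 3.0},
    {"SMILES": "CCC", "NAME": "propane", "RT": 5.0},
]
new = [
    {"SMILES": "CCO", "NAME": "ethanol", "RT": 1.5},
    {"SMILES": "CCO", "NAME": "ethanol", "RT": 1.7},
    {"SMILES": "CCC", "NAME": "propane", "RT": 5.4},
]
base_rts = map_base_rts(new, train)  # no INCHIKEY column, so SMILES+NAME is used

try:
    map_base_rts([{"SMILES": "O", "NAME": "water", "RT": 0.4}], train)
    unmappable_raises = False
except KeyError:
    unmappable_raises = True
```

Both ethanol entries in the new data map to the mean of the three training RTs (2.0), matching the example given in the changelog.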
FastRet 1.3.0
CRAN release: 2025-12-17
API Improvements:
- getCDs():
  - Calculation of CDs is skipped if all CDs are already present in the input dataframe.
  - In addition to a dataframe with column “SMILES”, a plain character vector of SMILES strings can now be provided as input.
  - Improved progress output.
  - More SMILES are now pre-cached internally to speed up retrieval of CDs for large datasets.
- plot_frm():
  - Now available as an exported function (the function existed before but was only exposed via the graphical user interface), so users can call it directly from R scripts.
  - Added a semi-transparent background to legends for better readability.
  - Changed the point character from unfilled circles with colored borders to filled circles with black borders for better visibility.
  - Constant predictions (causing the correlation to be NA) no longer cause a warning.
- preprocess_data():
  - Added argument add_cds to control whether chemical descriptors should be added to the input data using getCDs().
  - Added argument rm_ucs to control whether unsupported columns (i.e. columns that are neither mandatory nor optional) should be removed from the input data.
  - Added argument rt_terms to control whether transformations of the RT column (square, cube, log, exp, sqrt) should be added to the input data.
  - In case of missing mandatory columns (SMILES, RT, NAME), an error is now raised.
  - INCHIKEY and chemical descriptors listed in CDFeatures are allowed as optional columns.
  - Columns that are neither mandatory nor optional are automatically removed.
  - Improved runtime for generation of polynomial features and/or interaction terms.
  - Removal of NA and/or near-zero-variance predictors is now done after adding polynomial features and/or interaction terms.
  - Improved progress output.
- train_frm():
  - Added argument do_cv to control whether cross-validation should be performed for performance estimation. Default is TRUE.
  - Removal of near-zero-variance predictors and/or NA values is now done as part of the internal model training, i.e. it happens separately for each fold during cross-validation. This prevents data leakage from the training set to the validation set. The corresponding hint about “overoptimistic cross-validation results” has consequently been removed from the documentation.
  - Argument method now accepts two values for training models with an xgbtree base: “gbtreeDefault” (train xgboost with default parameters) and “gbtreeRP” (train xgboost with parameters optimized for the RP dataset). The old value “gbtree” still works and is now an alias for “gbtreeDefault”.
  - Improved documentation of the return value (i.e. frm objects are now fully specified).
  - Added type checking for each user input.
  - Performance estimation via cross-validation now uses the new clipping mechanism provided by clip_predictions(). The clipping is always based on the RT range of the training folds, not of the whole original training data.
  - Calculating the proportion of variance explained (R²) no longer throws a warning for constant predictions (causing the correlation to be NA). Instead, R² is set to 0 in such cases.
- predict.frm():
  - Data transformations applied to the training data (adding polynomial features and/or interaction terms) are now automatically applied to new data as well. Previously this was not the case, leading to errors when trying to predict RTs using models trained with degree_polynomial > 1 and/or interaction_terms = TRUE, unless the transformations were manually applied to the new data beforehand.
  - Added argument clip to allow clipping of predictions to the RT range of the training data. Works for both adjusted and unadjusted models.
  - Predictions are now clipped to a sensible range by default. To produce unclipped predictions, set clip = FALSE. See clip_predictions() for details.
  - If a chemical descriptor is NA in the new data but required by the model, the NA value is now replaced by the mean of that descriptor in the training data. Previously, the prediction for such entries was NA. The old behavior can be restored by setting impute = FALSE in predict.frm().
- selective_measuring():
  - Added argument rt_coef, allowing users to control the influence of RT on the clustering. A value of 0 means that RT is ignored, a value of “max_ridge_coefficient” means that RT has the same weight as the most important chemical descriptor, and a value of 1 means no scaling at all (apart from the standardization to z-scores, which is applied to the whole dataset before the ridge regression is trained).
- adjust_frm():
  - Added argument seed to allow reproducible results.
  - Added argument do_cv to control whether cross-validation should be performed for performance estimation. Default is TRUE.
  - Added argument adj_type to control which model should be trained for adjustment: supported options are “lm”, “lasso”, “ridge”, or “gbtree”. Previously, only “lm” was supported. To stay backwards compatible, the default is “lm”.
  - Added argument add_cds to control whether chemical descriptors should be added to the input data using getCDs(). Only recommended for adj_type other than “lm”.
  - Added support for mapping by SMILES+INCHIKEY in addition to SMILES+NAME. SMILES+INCHIKEY is used by default if both columns are present in the adjusted and the original training data. Otherwise, SMILES+NAME is used as before.
  - Improved error handling. Previously, unmappable entries in the new data were silently ignored. Now, an error is raised in such cases.
  - Function arguments are now stored in the returned frm object for better reproducibility.
  - Mapping is now performed by matching each new entry to the average RT of all original training entries with the same key (SMILES+INCHIKEY or SMILES+NAME). Example: if the new dataset contains a key twice and the original training data contains the key three times, both new entries are mapped to the average RT of the three original entries.
  - Performance estimation via cross-validation now uses the new clipping mechanism provided by clip_predictions(). The clipping is always based on the RT range of the training folds, not of the whole original training data.
  - Calculating the proportion of variance explained (R²) no longer throws a warning for constant predictions (causing the correlation to be NA). Instead, R² is set to 0 in such cases.
- print.frm():
  - frm objects can now be printed directly to the console in a user-friendly format.
- clip_predictions():
  - New utility function for clipping predicted RTs to a sensible range. Used internally by train_frm(), predict.frm() and adjust_frm().
- get_predictors():
  - Added arguments base and adjust to control whether predictors for the base model, the adjustment model, or both should be returned.
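Three numeric behaviors from the entries above can be sketched together: clipping predictions to the training RT range, replacing missing or non-finite predictor values by the training mean, and returning R² = 0 instead of NA for constant predictions. The Python below is a hedged illustration, not FastRet code. It assumes clipping to [min, max] of the training RTs (the exact “sensible range” of clip_predictions() is not specified here) and assumes R² is the squared Pearson correlation; all function names are hypothetical.

```python
import math
from statistics import mean


def is_finite(x):
    return x is not None and math.isfinite(x)


def impute_with_training_means(new_rows, train_rows, predictors):
    """Replace missing/non-finite predictor values by the training mean."""
    means = {p: mean(r[p] for r in train_rows if is_finite(r[p])) for p in predictors}
    imputed = []
    for row in new_rows:
        fixed = dict(row)
        for p in predictors:
            if not is_finite(fixed[p]):
                fixed[p] = means[p]
        imputed.append(fixed)
    return imputed


def clip_to_training_range(preds, train_rt):
    """Clamp each prediction to [min(train_rt), max(train_rt)]."""
    lo, hi = min(train_rt), max(train_rt)
    return [min(max(p, lo), hi) for p in preds]


def r_squared(y, pred):
    """Squared Pearson correlation; 0 instead of NaN for constant input."""
    my, mp = mean(y), mean(pred)
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, pred))
    sxx = sum((a - my) ** 2 for a in y)
    spp = sum((b - mp) ** 2 for b in pred)
    if sxx == 0 or spp == 0:
        return 0.0
    return sxy ** 2 / (sxx * spp)


train = [{"logP": 1.0, "RT": 2.0}, {"logP": 3.0, "RT": 9.0}]
new = [{"logP": None}, {"logP": float("nan")}, {"logP": 5.0}]
imputed = impute_with_training_means(new, train, ["logP"])
clipped = clip_to_training_range([0.5, 4.0, 12.0], [r["RT"] for r in train])
r2_const = r_squared([1.0, 2.0, 3.0], [4.0, 4.0, 4.0])
r2_perfect = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

During cross-validation, the same clipping would be computed from the training fold only, as the train_frm() and adjust_frm() entries state.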
Bugfixes:
- Interaction terms generated by preprocess_data() are now generated correctly as the product of the involved features instead of their quotient. This follows common practice in regression modeling and avoids division-by-zero issues. Passing older models, trained with division-based interaction terms, to downstream functions like predict.frm() or adjust_frm() will now lead to an error. (This is not a breaking change, as predict.frm() and friends have in fact never been able to handle such models.)
- plot_frm() with type “scatter.cv.adj” or “scatter.train.adj” now correctly shows retention times from the new data (used for model adjustment) as x-axis values instead of the original training retention times.
- catf() now only emits escape codes (i.e. colored output) if the output is directed to a terminal. If the output is redirected to a file or a pipe, no escape codes are emitted anymore. Since catf() is used throughout the package for logging, this fixes the output for the whole package.
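The difference between product-based and quotient-based interaction terms can be sketched as follows. In this Python illustration, add_interaction_terms() and the “a:b” column naming are hypothetical; it only shows that pairwise products stay defined when a feature is zero, whereas a quotient does not (Python raises ZeroDivisionError; in R, x / 0 yields Inf or NaN instead).

```python
from itertools import combinations


def add_interaction_terms(row, features):
    """Append one product column per pair of features (column naming is hypothetical)."""
    out = dict(row)
    for a, b in combinations(features, 2):
        out[f"{a}:{b}"] = row[a] * row[b]
    return out


row = {"x1": 2.0, "x2": 0.0, "x3": 3.0}
expanded = add_interaction_terms(row, ["x1", "x2", "x3"])

# A quotient-based term breaks as soon as the denominator feature is zero
try:
    row["x1"] / row["x2"]
    quotient_fails = False
except ZeroDivisionError:
    quotient_fails = True
```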
Internal Improvements:
- Added or improved unit tests for: adjust_frm(), fit_gbtree(), fit_glmnet(), get_param_grid(), get_predictors(), getCDs(), plot_frm(), predict_frm(), preprocess_data(), selective_measuring(), train_frm(), validate_inputdata()
- Removed caret dependency by adding custom implementations for createFolds() and nearZeroVar().
- Extracted the mapping and merging part of adjust_frm() into a private function merge_dfs().
- Replaced fit_glmnet(), fit_lasso() and fit_ridge() with a single function fit_glmnet() that takes the method (“lasso” or “ridge”) as a parameter. Instead of a dataframe df that has to contain only predictors plus the RT column (as response), the function now takes a matrix of predictors X and a vector of responses y. This makes the function more flexible and easier to test.
- Replaced fit_gbtree_grid() with a much simpler function find_params_best(). Instead of allowing the specification of every grid parameter, the new function accepts a keyword searchspace for selecting one of several predefined grids.
- Improved fit_gbtree by exposing many hardcoded internal xgboost parameters as function parameters with sensible defaults. In particular, the user can now set xpar to “default”, “rpopt” or a predefined grid size to train the model with different hyperparameter settings. Furthermore, the function now works with both version 1.7.9.1 and the new version 3.1.2.1 published on 2025/12/03 (yes, version 2.x was skipped completely).
- Added helper function get_param_grid() for returning predefined hyperparameter grids for xgboost model training based on keywords like “tiny”, “small” or “large”.
- Added function benchmark_find_params() to benchmark the runtime of find_params_best() for different numbers of cores and/or threads. As it turns out, choosing a higher number of cores is usually more efficient (at the cost of worse progress output).
- Added utility functions named(), as_str(), is_valid_smiles() and as_canonical().
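The two caret replacements can be sketched as below. This is an assumption-laden Python illustration, not the package's R code: create_folds() is a plain seeded, unstratified k-fold split (caret's createFolds() stratifies numeric outcomes, and FastRet's version may too), and near_zero_var() follows caret's documented default criteria (frequency ratio above 95/5 combined with at most 10 % unique values, or zero variance). Per the train_frm() entry, such a filter would be evaluated on each training fold rather than on the full dataset.

```python
import random
from collections import Counter


def create_folds(n, k=5, seed=42):
    """Seeded, unstratified split of row indices 0..n-1 into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]


def near_zero_var(values, freq_cut=95 / 5, unique_cut=10):
    """caret-style near-zero-variance test for a single column."""
    counts = Counter(values).most_common()
    if len(counts) == 1:
        return True  # zero variance
    freq_ratio = counts[0][1] / counts[1][1]  # most common vs. second most common
    percent_unique = 100 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


folds = create_folds(10, k=3, seed=1)
nzv_rare = near_zero_var([0] * 99 + [1])   # one rare value among 100 rows
nzv_distinct = near_zero_var(list(range(100)))
nzv_const = near_zero_var([7] * 20)
```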
FastRet 1.2.2
- Improved selective_measuring() by aligning glmnet coefficients to columns by name (more stable) and by including RT, scaled by max(abs(coefs)), in PAM clustering.
- Added libwebp-dev as dependency to Dockerfile.
FastRet 1.2.1
- Added updated measurements Measurements_v8.xlsx to inst/extdata/. The new list contains corrections to the old RP dataset plus 1660 new measurements measured on a total of 18 different chromatographic environments.
- Reintroduced RAM caching (although hugely simplified).
FastRet 1.2.0
- Added seed parameter to selective_measuring() function for reproducible clustering results
- Enhanced documentation for train_frm() function
- Removed digest and shinybusy dependencies
- Major refactoring of caching system and related functions
- Removed mock files from inst/mockdata/
- Removed objects: getCDsFor1Molecule(), get_cache_dir(), ram_cache (these were exported, but declared as internal)
- Added private function parLapply2
- Added comprehensive GitHub Copilot instructions file
- Improved code organization and documentation across multiple R files
FastRet 1.1.5
- Improved read_retip_hilic_data(): the dataset is now only downloaded from GitHub if the package is not installed. If it is installed, the dataset is loaded directly.
- Internal Changes:
  - Removed TODOS.md
  - Bumped version to 1.1.5
  - Moved all data-related functions from util.R to data.R
  - Added a README to misc/datasets
  - Added functions load_all() and document() to util.R
  - Replaced xlsx and readxl packages with openxlsx
FastRet 1.1.4
CRAN release: 2025-02-10
- Added a cache cleanup handler that gets registered via reg.finalizer() upon package loading to ensure that the cache directory is removed if it doesn’t contain any files that should persist between R sessions.
- Added an article about installation details, including a troubleshooting section
- Improved function docs
- Improved examples by removing donttest blocks
- Improved examples & tests by using smaller example datasets to reduce runtime
FastRet 1.1.3
CRAN release: 2024-06-25
- Moved patch.R from the R folder to misc/scripts, which is excluded from the package build using .Rbuildignore. The file is conditionally sourced by the private function start_gui_in_devmode() if available, allowing its use during development without including it in the package.
- Added \value tags to the mentioned .Rd files describing the functions’ return values.
- Added Bonini et al. (2020) doi:10.1021/acs.analchem.9b05765 as reference to the description part of the DESCRIPTION file, listing it as related work. This reference is used in the documentation for read_retip_hilic_data() and ram_cache. No additional references are used in the package documentation.
- Added Fadi Fadil as a contributor. Fadi measured the example datasets shipped with FastRet.
- Added ORCID IDs for contributors as described in CRAN’s checklist for submissions.
FastRet 1.1.2
- Wrapped examples of read_rp_xlsx() and read_rpadj_xlsx() into donttest to prevent the note “Examples with CPU time > 2.5 times elapsed time: …”. By now I’m pretty sure the culprit is the xlsx package, which uses a Java process for reading the file. Maybe we should switch to openxlsx or readxl in the future.
FastRet 1.1.1
- Improved examples of preprocess_data() to prevent the note “Examples with CPU time > 2.5 times elapsed time: preprocess_data (CPU=2.772, elapsed=0.788)”.
FastRet 1.1.0
- Added RAM caching to getCDs()
FastRet 1.0.3
- Added examples to start_gui(), fastret_app(), getCDsFor1Molecule(), analyzeCDNames(), check_lm_suitability(), plot_lm_suitability(), extendedTask(), selective_measuring(), train_frm(), adjust_frm(), get_predictors()
- Improved lots of existing examples
- Added additional logging messages at various places
- Submitted to CRAN, but rejected because the following examples caused at least one of the following notes on the CRAN testing machines: (1) “CPU time > 5s”, (2) “CPU time > 2.5 times elapsed time”. In this context, “CPU time” is calculated as the sum of the measured “user” and “system” times.

  | Function | user (s) | system (s) | elapsed (s) | ratio ((user + system) / elapsed) |
  |---|---:|---:|---:|---:|
  | check_lm_suitability | 5.667 | 0.248 | 2.211 | 2.675 |
  | predict.frm | 2.477 | 0.112 | 0.763 | 3.393 |
  | getCDs | 2.745 | 0.089 | 0.961 | 2.949 |
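How the two CRAN notes follow from the measured timings can be checked with a short calculation. The Python below just re-derives “CPU time” (user + system) and the ratio CPU / elapsed from the timings listed above and applies the two thresholds quoted in the entry; it is a worked check, not part of the package.

```python
# (user, system, elapsed) in seconds, as listed for the rejected CRAN submission
timings = {
    "check_lm_suitability": (5.667, 0.248, 2.211),
    "predict.frm": (2.477, 0.112, 0.763),
    "getCDs": (2.745, 0.089, 0.961),
}

results = {}
for name, (user, system, elapsed) in timings.items():
    cpu = user + system  # "CPU time" as defined by the CRAN check
    ratio = cpu / elapsed
    notes = []
    if cpu > 5:
        notes.append("CPU time > 5s")
    if ratio > 2.5:
        notes.append("CPU time > 2.5 times elapsed time")
    results[name] = (round(cpu, 3), round(ratio, 3), notes)
    print(f"{name}: CPU={cpu:.3f}s ratio={ratio:.3f} {notes}")
```

Only check_lm_suitability exceeds the 5 s limit; all three examples exceed the 2.5 ratio, which typically indicates multithreaded work inside the example.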
FastRet 1.0.0
Completely refactored source code, e.g.:
- Added a test suite covering all important functions
- The UI now uses Extended Tasks for background processing, allowing GUI usage by multiple users at the same time
- The clustering now uses Partitioning Around Medoids (PAM) instead of k-means, which is faster and much better suited for our use case
- The training of the Lasso and/or XGBoost models is no longer done using caret but using glmnet and xgboost directly. The new implementation is much faster and allows full control over the number of workers started.
- Function getCDs now caches the results on disk, making the retrieval of chemical descriptors much faster
- The GUI now has a console element, showing the progress of background tasks like clustering and model training
- The GUI has a cleaner interface, because many of the options are now hidden in the “Advanced” tab by default and are only displayed upon user request
FastRet 0.99.3
- Reduced required R version in DESCRIPTION from 4.2 to 4.1
- Added Dockerfile
- Fixed R CMD check warnings
- Fixed R CMD check action
FastRet 0.99.2
- Added documentation website at: https://spang-lab.github.io/FastRet/
FastRet 0.99.1
- Initial version: a copy of commit cd243aa82a56df405df8060b84535633cf06b692 of Christian Amesöder’s repository. (Christian wrote this initial version of FastRet as part of his master’s thesis at the Institute of Functional Genomics, University of Regensburg.)