The metabodecon repository contains a selection of example datasets. This article describes each of these datasets in detail, namely:
- which and how many samples are included
- how they were measured
- how you can access the dataset
For package users, the canonical way to access file-based example
datasets is download_example_datasets().
The Blood dataset
The blood dataset contains 16 one-dimensional CPMG NMR spectra of human blood plasma in Bruker format. It can be found in the folder misc/datasets/blood in the metabodecon repository.
The Urine dataset
The urine dataset contains two one-dimensional NOESY NMR spectra of urine, available in both Bruker and JCAMP-DX format. They can be found in the folder misc/datasets/urine in the metabodecon repository.
The Sim dataset
There are scenarios where it is useful to work with simulated datasets instead of real data, such as:
- When you need to know the underlying distribution of the data to check whether a function works as expected.
- To speed up test cases and examples where a few data points are sufficient to test a function.
For such cases, metabodecon includes a simulated dataset
called sim, which was generated by applying the following steps
to each spectrum of the blood dataset:
- Deconvolute the spectrum using generate_lorentz_curves() with default parameters
- Extract the Lorentz curve parameters for all peaks between 3.52 and 3.37 ppm
- Generate 2048 equidistant chemical shift values between 3.59 and 3.28 ppm¹
- Calculate the signal intensity at each chemical shift as superposition of Lorentz curves
- Add random noise to the simulated spectrum²
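The last three steps can be illustrated with a minimal base R sketch. It is not the code used to build the sim dataset: the peak parameters are hypothetical stand-ins for values extracted from a deconvoluted spectrum, the Lorentz curve parameterization is one common choice assumed here, and the Gaussian noise with a fixed standard deviation is likewise an assumption.

```r
# Lorentz curve in one common parameterization (an assumption here):
# center x0, half width at half maximum lambda, peak height A / lambda.
lorentz <- function(x, x0, A, lambda) {
  A * abs(lambda) / (lambda^2 + (x - x0)^2)
}

# Hypothetical peak parameters standing in for values extracted from a
# deconvoluted spectrum (peak centers between 3.52 and 3.37 ppm).
peaks <- data.frame(
  x0     = c(3.50, 3.45, 3.40),
  A      = c(0.002, 0.004, 0.003),
  lambda = c(0.002, 0.003, 0.002)
)

# 2048 equidistant chemical shift values from 3.59 down to 3.28 ppm.
cs <- seq(3.59, 3.28, length.out = 2048)

# Superposition: evaluate every Lorentz curve on the grid (one column per
# peak) and sum across peaks.
curves <- mapply(
  function(x0, A, lambda) lorentz(cs, x0, A, lambda),
  peaks$x0, peaks$A, peaks$lambda
)
si_clean <- rowSums(curves)

# Add random noise. Gaussian noise with sd = 0.01 is an assumption.
set.seed(1)
si <- si_clean + rnorm(length(cs), mean = 0, sd = 0.01)
```

Because the true peak parameters are known, a deconvolution function applied to `cs` and `si` can be checked against `peaks` directly, which is the main benefit of working with simulated spectra.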
The first two of the 16 simulated spectra are plotted below. For further details about the simulation process, see the source code of function simulate_spectrum().
The AKI dataset
The aki dataset contains 106 one-dimensional urine NMR
spectra in Bruker format from the AKI study by Zacharias et al. (2012).
The measured samples come from a clinical cohort collected 24 hours
after surgery and were analyzed to compare patients who developed acute
kidney injury with those who did not. In this dataset, 72 spectra are
labeled as controls (Biopsy kidney normal) and 34 spectra
as AKI cases (Acute Kidney Injury).
Within misc/example_datasets/bruker/aki, the full
phenotypic metadata is stored in aki/s_MTBLS24.txt, and the
corresponding spectra are stored in the remaining sample directories of
the aki folder.
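As a rough illustration of how such a phenotype table could be loaded and summarized, the following base R sketch reads a tab-delimited file and counts spectra per group. The file written here is a toy stand-in: its column names and rows are hypothetical and are not copied from s_MTBLS24.txt, so the column name must be adapted to the real table.

```r
# Toy stand-in for a tab-delimited phenotype table. Column names and rows
# are hypothetical, not copied from s_MTBLS24.txt.
pheno_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Sample Name\tFactor Value[Condition]",
  "S1\tAcute Kidney Injury",
  "S2\tBiopsy kidney normal",
  "S3\tBiopsy kidney normal"
), pheno_path)

# check.names = FALSE keeps column names with spaces and brackets intact.
pheno <- read.delim(pheno_path, check.names = FALSE, stringsAsFactors = FALSE)

# Count spectra per group label.
group_counts <- table(pheno[["Factor Value[Condition]"]])
print(group_counts)
```

Applied to the real table, the same tabulation should reproduce the group sizes stated above (72 controls and 34 AKI cases), provided the grouping column is chosen correctly.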
The phenotype table and spectra files originate from the public
MetaboLights study MTBLS24 (https://www.ebi.ac.uk/metabolights/MTBLS24). For
metabodecon, these files are filtered to the relevant
subset (phenodata plus required Bruker files for reading spectra), then
packaged into example_datasets.zip and re-distributed as a
convenience download.
How to download datasets
Due to size constraints for R packages, most of the datasets
mentioned above are not included when the package is
installed and must be downloaded explicitly afterwards. This can be
done with download_example_datasets():
library(metabodecon)
# Set persistent = TRUE to store the files at a persistent location. This way,
# the next time you call `download_example_datasets()`, the files will not be
# downloaded again.
path <- download_example_datasets(persistent = FALSE)
tree(path)
Spectra that come pre-installed with the package and do not require a separate download are: