metabodecon
metabodecon.Rmd
Introduction to metabodecon
The goal of metabodecon is to make deconvolution and alignment of 1D NMR spectra as easy as possible. The deconvolution part uses the codebase from MetaboDecon1D. The alignment of the deconvoluted spectra is done using functions from the speaq package (Beirnaert et al., 2018) that have been adapted for metabodecon.
Download the example data
The metabodecon
repository contains two example datasets: blood1 and urine.
The blood dataset contains 16 1D CPMG NMR spectra of blood
plasma in Bruker format. The urine dataset contains two 1D
NOESY NMR spectra of urine. These spectra are available in both Bruker
and jcamp-dx format. Due to the size constraints for R packages, these
datasets are not included by default when the package is installed, but
must be explicitly downloaded afterwards via command
download_example_datasets()
.
library(metabodecon)
download_example_datasets(persistent = FALSE)
# Set persistent to TRUE to keep the data after the R session ends
Use metabodecon to convert a spectrum into Lorentz curves
spectrum_data <- generate_lorentz_curves(
data_path = datadir("example_datasets/bruker/blood"),
file_format = "bruker"
)
After calling function generate_lorentz_curves()
, R will
request some interactive input from the user. Note that the data path
i.e. the folder where the samples with their spectra are stored is given
by the data_path
argument. All samples in this folder will
be automatically analyzed. Therefore, you do not have to specify the
individual sample names. But you have to specify which spectrum and
which processing of each sample should be used. The answers you should
give for the provided examples are written in bold:
- What is the name of the subfolder of your filepath? The different spectra of a sample are specified by numbers, here you have to specify which spectrum should be used (e.g. 10 for C:/Users/Username/Desktop/spectra_folder/spectrum_name/10) 10
- What is the name of the subsubsubfolder of your filepath? Each spectrum can be processed with different settings, each of these computations is stored under a different number (e.g. 10 for C:/Users/Username/Desktop/spectra_folder/spectrum_name/10/pdata/10): 10
- In case that more than one spectrum should be analyzed, you will be asked: Do you want to use the same parameters (signal_free_region, range_water_signal_ppm) for all spectra? (y/n) y
- “Test_01” “Test_02” “Test_03” “Test_04” “Test_05” … “Test_16”. Choose number of file which is used to adjust all parameters: (e.g. 1) 1
After providing the required input, metabodecon will show you some plots and ask some more questions regarding the plots. Again, the answers are shown in bold.
- Signal free region borders correct selected? (Area left and right of the green lines) (y/n) y
- Water artefact fully inside red vertical lines? (y/n) y
Now the actual deconvolution will start. The deconvolution of one single spectra usually takes approx. 1-10 minutes.
Look up global max and minimum ppm values
ppm_range <- get_ppm_range(
spectrum_data = spectrum_data
)
Generate matrix of features based on spectrum data
feat <- gen_feat_mat(
data_path = data_path,
ppm_range = ppm_range,
si_size_real_spectrum = 131072, # 1)
scale_factor_x = 1000 # 2)
)
# 1) Specify how many points were used to process the real spectrum? Often
# called "si" inside NMR software (TopSpin).
# 2) A factor which is used to avoid rounding errors due to numbers becoming
# too small for R to handle e.g., 1000.
Start alignment by using speaq package
after_speaq_mat <- speaq_align(
feat = feat,
maxShift = 50 # 1)
)
# 1) Maximum number of points along the "ppm-axis" a value can be moved by the
# speaq package. A value of 50 may be used as start value for plasma spectra.
# However, depending on your spectra and the used digital resolution this value
# may be be adapted.
Further optimize alignment by calling
combine_peaks
Even, after alignment by speaq data of some signals are spread over adjacent columns. Combination of this data is the purpose of the following routine.
<- combine_peaks(
aligned_res shifted_mat = after_speaq_mat,
range = 5, # 1) number of adjacent columns to be used for improving alignment
lower_bound = 1 2) # amount of columns that need to be skipped
= spectrum_data,
spectrum_data = data_path
data_path )
# 1) Number of columns of which data may be combined in one column
# 2) When results from speaq are used first column should be skipped
The returned results after step 5 contain two matrices
aligned_res$long
and aligned_res$short
where
in the short version all columns containing only zeros have been removed
Furthermore, results will be written into two .csv files in your
data_path
directory “aligned_res_short.csv” and
“aligned_res_long.csv”.
References
Beirnaert et al 2018
Beirnaert C, Meysman P, Vu TN, Hermans N, Apers S, Pieters L, et al. (2018) speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification. PLoS Comput Biol 14(3): e1006018. https://www.doi.org/ 10.1371/journal.pcbi.1006018