metabodecon

Introduction to metabodecon

The goal of metabodecon is to make deconvolution and alignment of 1D NMR spectra as easy as possible. The deconvolution part uses the codebase from MetaboDecon1D. The alignment of the deconvoluted spectra is done using functions from the speaq package (Beirnaert et al., 2018) that have been adapted for metabodecon.

Download the example data

The metabodecon repository contains two example datasets: blood and urine. The blood dataset contains 16 1D CPMG NMR spectra of blood plasma in Bruker format. The urine dataset contains two 1D NOESY NMR spectra of urine, available in both Bruker and JCAMP-DX format. Due to the size constraints for R packages, these datasets are not included when the package is installed. They must be downloaded explicitly afterwards by calling download_example_datasets().

library(metabodecon)
download_example_datasets(persistent = FALSE)
# Set persistent to TRUE to keep the data after the R session ends

Use metabodecon to convert a spectrum into Lorentz curves

spectrum_data <- generate_lorentz_curves(
  data_path = datadir("example_datasets/bruker/blood"),
  file_format = "bruker"
)

After calling generate_lorentz_curves(), R requests some interactive input from the user. The data_path argument gives the folder in which the samples and their spectra are stored. All samples in this folder are analyzed automatically, so you do not have to specify individual sample names. You do, however, have to specify which spectrum and which processing of each sample should be used. For the provided examples, the answer to give is shown at the end of each prompt:

What is the name of the subfolder of your filepath? The different spectra of a sample are specified by numbers, here you have to specify which spectrum should be used (e.g. 10 for C:/Users/Username/Desktop/spectra_folder/spectrum_name/10) 10
What is the name of the subsubsubfolder of your filepath? Each spectrum can be processed with different settings, each of these computations is stored under a different number (e.g. 10 for C:/Users/Username/Desktop/spectra_folder/spectrum_name/10/pdata/10): 10
In case that more than one spectrum should be analyzed, you will be asked: Do you want to use the same parameters (signal_free_region, range_water_signal_ppm) for all spectra? (y/n) y
“Test_01” “Test_02” “Test_03” “Test_04” “Test_05” … “Test_16”. Choose number of file which is used to adjust all parameters: (e.g. 1) 1
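
The two numbers requested above correspond to nested subfolders of each sample directory. As a minimal sketch, assuming the layout shown in the prompts (sample/spectrum number/pdata/processing number), the following base R helper builds that path for every sample in a folder and reports whether it exists. The helper name `bruker_proc_dirs` and the temporary demo folder are illustrative and not part of metabodecon.

```r
# Hypothetical helper (not part of metabodecon): build the path
# <data_path>/<sample>/<expno>/pdata/<procno> for every sample folder.
bruker_proc_dirs <- function(data_path, expno = 10, procno = 10) {
  samples <- list.dirs(data_path, full.names = FALSE, recursive = FALSE)
  paths <- file.path(data_path, samples, as.character(expno),
                     "pdata", as.character(procno))
  data.frame(sample = samples, path = paths, exists = dir.exists(paths))
}

# Demo on a throwaway folder that mimics the layout from the prompts.
demo_dir <- file.path(tempdir(), "spectra_folder")
dir.create(file.path(demo_dir, "Test_01", "10", "pdata", "10"),
           recursive = TRUE, showWarnings = FALSE)
# Second sample has a spectrum folder but no processing folder.
dir.create(file.path(demo_dir, "Test_02", "10"),
           recursive = TRUE, showWarnings = FALSE)

proc_dirs <- bruker_proc_dirs(demo_dir)
print(proc_dirs[, c("sample", "exists")])
```

A check like this before starting the deconvolution can reveal samples for which the chosen spectrum or processing number does not exist.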

After you provide the required input, metabodecon shows some plots and asks further questions about them. Again, the answer to give is shown at the end of each question.

Signal free region borders correct selected? (Area left and right of the green lines) (y/n) y
Water artefact fully inside red vertical lines? (y/n) y

Now the actual deconvolution starts. The deconvolution of a single spectrum usually takes approximately 1 to 10 minutes.
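
To make the term "Lorentz curves" concrete: the deconvolution describes a spectrum as a sum of such curves. The sketch below uses one common parameterisation of a Lorentzian in base R. The function name, parameter names and numeric values are illustrative assumptions, and the parameterisation used internally by metabodecon may differ.

```r
# One common Lorentzian parameterisation (an assumption for illustration):
# x0 = peak centre, lambda = half width at half maximum, A = scale factor.
lorentz <- function(x, x0, A, lambda) {
  A * lambda / (lambda^2 + (x - x0)^2)
}

ppm <- seq(0, 10, length.out = 2001)

# Hypothetical peak parameters; a real deconvolution estimates these.
peaks <- data.frame(
  x0     = c(3.0, 3.2, 7.5),
  A      = c(0.02, 0.01, 0.03),
  lambda = c(0.02, 0.02, 0.05)
)

# One column per peak; the modelled spectrum is the row-wise sum.
curves <- mapply(
  function(x0, A, lambda) lorentz(ppm, x0, A, lambda),
  peaks$x0, peaks$A, peaks$lambda
)
model_spectrum <- rowSums(curves)
```

In this parameterisation the peak height is A / lambda and the curve drops to half that height at a distance of lambda from the centre.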

Look up global maximum and minimum ppm values

ppm_range <- get_ppm_range(
  spectrum_data = spectrum_data
)
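
Conceptually, the global range is the smallest and the largest chemical shift found across all spectra. A minimal sketch of that idea in base R follows. It uses a hypothetical list of spectra with one `ppm` vector each; the actual structure of `spectrum_data` is not shown here and may differ.

```r
# Hypothetical stand-in for deconvolution results: one ppm axis per spectrum.
toy_spectra <- list(
  Test_01 = list(ppm = seq(-2.10, 11.90, length.out = 5)),
  Test_02 = list(ppm = seq(-2.05, 11.95, length.out = 5))
)

# Global minimum and maximum over all spectra.
toy_ppm_range <- range(unlist(lapply(toy_spectra, function(s) s$ppm)))
print(toy_ppm_range)
```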

Generate matrix of features based on spectrum data

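# Assumption: data_path is the same folder that was passed to generate_lorentz_curves() above
data_path <- datadir("example_datasets/bruker/blood")
feat <- gen_feat_mat(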
  data_path = data_path,
  ppm_range = ppm_range,
  si_size_real_spectrum = 131072, # 1)
  scale_factor_x = 1000 # 2)
)

# 1) Number of points used to process the real spectrum. Often called "si"
# in NMR software (e.g. TopSpin).
# 2) Factor used to avoid rounding errors caused by numbers becoming too small
# for R to handle, e.g. 1000.

Start alignment using the speaq package

after_speaq_mat <- speaq_align(
  feat = feat,
  maxShift = 50 # 1)
)

# 1) Maximum number of points along the "ppm-axis" a value can be moved by the
# speaq package. A value of 50 may be used as a starting value for plasma
# spectra. However, depending on your spectra and the digital resolution used,
# this value may need to be adapted.
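
Because maxShift is given in data points rather than in ppm, it can help to translate it into a chemical-shift distance. The sketch below does this under the assumption that the points are spread evenly over the ppm range. The ppm range used is a made-up example, not a value taken from the example datasets, and all variable names are illustrative.

```r
# Hypothetical values for illustration only.
example_ppm_range <- c(-2, 12)   # made-up global ppm range
si <- 131072                     # number of points, as in the gen_feat_mat() call
max_shift_points <- 50           # maxShift, as in the speaq_align() call

# Spacing between neighbouring points, assuming an evenly spaced ppm axis.
ppm_per_point <- diff(example_ppm_range) / (si - 1)

# Largest shift that may be applied, expressed in ppm.
max_shift_ppm <- max_shift_points * ppm_per_point
print(signif(c(ppm_per_point = ppm_per_point, max_shift_ppm = max_shift_ppm), 3))
```

With these numbers a shift of 50 points is roughly 0.005 ppm. Spectra with fewer points per ppm would need a smaller maxShift to allow the same shift in ppm.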

Further optimize alignment by calling `combine_peaks`

Even after alignment by speaq, the data of some signals are still spread over adjacent columns. The following routine combines these data.

aligned_res <- combine_peaks(
  shifted_mat = after_speaq_mat,
  range = 5, # 1) number of adjacent columns to be used for improving alignment
  lower_bound = 1, # 2) number of columns that need to be skipped
  spectrum_data = spectrum_data,
  data_path = data_path
)

# 1) Number of adjacent columns whose data may be combined into one column
# 2) When results from speaq are used, the first column should be skipped
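
The idea of combining adjacent columns can be illustrated with a toy example in base R. This is only a sketch of the general idea on made-up numbers, with a hypothetical helper `merge_columns`; it is not the algorithm implemented in combine_peaks.

```r
# Toy feature matrix: rows are spectra, columns are neighbouring data points.
# The same signal sits in column 2 for s1 but in column 3 for s2.
toy <- rbind(
  s1 = c(0, 5, 0, 0),
  s2 = c(0, 0, 4, 0)
)

# Hypothetical helper: merge column `from` into column `into`, but only if
# no spectrum has a non-zero value in both columns.
merge_columns <- function(mat, into, from) {
  conflict <- any(mat[, into] != 0 & mat[, from] != 0)
  if (!conflict) {
    mat[, into] <- mat[, into] + mat[, from]
    mat[, from] <- 0
  }
  mat
}

combined <- merge_columns(toy, into = 2, from = 3)
print(combined)
```

After the merge, both spectra carry their signal in column 2 and column 3 contains only zeros.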

The results returned by combine_peaks() contain two matrices, aligned_res$long and aligned_res$short. In the short version, all columns containing only zeros have been removed. Furthermore, the results are written to two .csv files in your data_path directory: “aligned_res_short.csv” and “aligned_res_long.csv”.
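
The relation between the two matrices can be illustrated in base R. This is a sketch on made-up numbers that shows only the removal of all-zero columns described above; the column names are hypothetical and the code is not taken from the package.

```r
# Toy "long" matrix: rows are spectra, two of the four columns are all zero.
long <- rbind(
  s1 = c(0, 5, 0, 2),
  s2 = c(0, 4, 0, 0)
)
colnames(long) <- c("p1", "p2", "p3", "p4")  # hypothetical column names

# Keep only columns with at least one non-zero entry.
short <- long[, colSums(long != 0) > 0, drop = FALSE]
print(short)
```

Using `drop = FALSE` keeps the result a matrix even if only one column remains.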

References

Beirnaert et al 2018

Beirnaert C, Meysman P, Vu TN, Hermans N, Apers S, Pieters L, et al. (2018) speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification. PLoS Comput Biol 14(3): e1006018. https://www.doi.org/10.1371/journal.pcbi.1006018