MetaboAnalystR Package

The MetaboAnalystR package is synchronized with the MetaboAnalyst website and is designed for metabolomics researchers who are comfortable using R to perform raw spectra processing and batch analysis. It provides an optimized global metabolomics analysis workflow that starts from raw spectra. The following tutorials complement our web-based functions by providing step-by-step instructions for several of the most common tasks using the R package.

1. Overview

1.1 Introduction

MetaboAnalystR 3 contains the R functions and libraries underlying the popular MetaboAnalyst web server, including metabolomic data analysis, visualization, and functional interpretation. The package is synchronized with the MetaboAnalyst web server. After installing and loading the package, users can reproduce the same results on their local computers using the corresponding R command history downloaded from the MetaboAnalyst website, thereby achieving maximum flexibility and reproducibility.

Version 3 aims to improve the global metabolomics workflow by implementing a fast parameter optimization algorithm for peak picking and automated identification of the most suitable batch effect correction method from 12 well-established approaches. In addition, it adds more support for functional interpretation directly from m/z peaks via mummichog2 (PMID: 23861661) and a new pathway-based method to integrate multiomics data. To demonstrate this new functionality, we perform an end-to-end metabolomics data analysis on clinical IBD samples in this tutorial. More case studies are provided in the vignettes of the R package. This tutorial starts from the installation of MetaboAnalystR 3.2 (updated 16-Dec-2021).

1.2 Installation

Step 1. Install package dependencies

To use MetaboAnalystR 3.2, first install all package dependencies. Ensure that the necessary system environment is configured.

For Linux (e.g. Ubuntu 18.04/20.04): libcairo2-dev, libnetcdf-dev, libxml2, libxt-dev and libssl-dev should be installed first.

For Windows (e.g. 7/8/8.1/10): Rtools should be installed.

For Mac OS: In order to compile R for Mac OS, you need Xcode and the GNU Fortran compiler installed.
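
Besides the system libraries, the R version itself matters: MetaboAnalystR 3.2 needs R 4.0 or newer. A minimal sketch for checking this from within R follows. `meets_r_requirement` is a hypothetical helper name, not part of MetaboAnalystR, and the 4.0.0 threshold is an assumption about how the requirement should be read.

```r
# Hypothetical helper (not part of MetaboAnalystR): TRUE when the given
# R version is at least `minimum`. The 4.0.0 threshold is an assumption
# based on the R version requirement of MetaboAnalystR 3.2.
meets_r_requirement <- function(current = getRversion(), minimum = "4.0.0") {
  current >= minimum
}

# Warn (rather than stop) so the rest of a setup script can still run
if (!meets_r_requirement()) {
  warning("R ", as.character(getRversion()),
          " is older than 4.0.0; please upgrade R first.")
}
```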

R base version > 4.0 is required. Compatibility with the latest version (v4.2) is under evaluation. There are two options for installing the package dependencies:

Option 1

Define the R function below (metanr_packages) and then call it. A printed message will tell you whether or not any R packages were installed.

Function to download packages:

metanr_packages <- function(){
  metr_pkgs <- c("impute", "pcaMethods", "globaltest", "GlobalAncova", "Rgraphviz", "preprocessCore", "genefilter", "SSPA", "sva", "limma", "KEGGgraph", "siggenes", "BiocParallel", "MSnbase", "multtest", "RBGL", "edgeR", "fgsea", "devtools", "crmn")
  list_installed <- installed.packages()
  # Keep only the packages that are not installed yet
  new_pkgs <- subset(metr_pkgs, !(metr_pkgs %in% list_installed[, "Package"]))
  if(length(new_pkgs) != 0){
    # BiocManager is needed to install the Bioconductor dependencies
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install(new_pkgs)
    print(c(new_pkgs, " packages added..."))
  } else {
    print("No new packages added...")
  }
}

Usage of function:

metanr_packages()

Option 2

Use the pacman R package (for those with R > 3.5.1).

install.packages("pacman")
library(pacman)

pacman::p_load(c("impute", "pcaMethods", "globaltest", "GlobalAncova", "Rgraphviz", "preprocessCore", "genefilter", "SSPA", "sva", "limma", "KEGGgraph", "siggenes", "BiocParallel", "MSnbase", "multtest", "RBGL", "edgeR", "fgsea"), character.only = TRUE)
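
After either option, it can be useful to confirm that every dependency is actually installed before moving on. The sketch below does this with base R only. `missing_deps` is a hypothetical helper introduced for illustration and is not part of MetaboAnalystR.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# system.file() returns "" for a package that cannot be found.
missing_deps <- function(pkgs) {
  installed <- vapply(pkgs,
                      function(p) nzchar(system.file(package = p)),
                      logical(1))
  pkgs[!installed]
}

metr_pkgs <- c("impute", "pcaMethods", "globaltest", "GlobalAncova",
               "Rgraphviz", "preprocessCore", "genefilter", "SSPA", "sva",
               "limma", "KEGGgraph", "siggenes", "BiocParallel", "MSnbase",
               "multtest", "RBGL", "edgeR", "fgsea")

still_missing <- missing_deps(metr_pkgs)
if (length(still_missing) == 0) {
  message("All dependencies are installed.")
} else {
  message("Still missing: ", paste(still_missing, collapse = ", "))
}
```

Any package reported as missing can then be installed individually, which usually makes the underlying error message easier to read than a bulk install.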

Step 2. Install the package

MetaboAnalystR 3.2 is freely available from GitHub. The package documentation, including the vignettes for each module and the user manual, is available within the downloaded R package file. You can install MetaboAnalystR 3.2 via any of three options: A) using the R package devtools, B) installing a pre-built source package (.tar.gz file), or C) cloning the GitHub repository and installing locally. Note that the MetaboAnalystR GitHub repository will have the most up-to-date version of the package.

Option A) Install the package directly from GitHub using the devtools package.

Due to issues with LaTeX, some users may find that they are only able to install MetaboAnalystR 3.2 without any documentation (i.e. vignettes). Open R and enter:

# Step 1: Install devtools
install.packages("devtools")
library(devtools)

# Step 2: Install MetaboAnalystR without documentation
devtools::install_github("xia-lab/MetaboAnalystR", build = TRUE, build_vignettes = FALSE)

# Step 2 (alternative): Install MetaboAnalystR with documentation
devtools::install_github("xia-lab/MetaboAnalystR", build = TRUE, build_vignettes = TRUE, build_manual = TRUE)

Option B) Install from a pre-built source package

install.packages("", repos = NULL, method = "wget")

Option C) Clone Github and install locally

The version number in the .tar.gz file name must be replaced by that of the package actually downloaded and built.

git clone
R CMD build MetaboAnalystR
R CMD INSTALL MetaboAnalystR_3.2.0.tar.gz
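
Whichever option is used, a quick way to confirm the installation is to query the installed package version from R. This is a minimal sketch; `installed_version` is a hypothetical helper and not a MetaboAnalystR function.

```r
# Hypothetical helper: return the installed version of `pkg`,
# or NULL when the package cannot be found.
installed_version <- function(pkg) {
  if (!nzchar(system.file(package = pkg))) {
    return(NULL)
  }
  utils::packageVersion(pkg)
}

v <- installed_version("MetaboAnalystR")
if (is.null(v)) {
  message("MetaboAnalystR is not installed yet.")
} else {
  message("MetaboAnalystR version ", as.character(v), " is installed.")
}
```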

2. Case Study IBD

Inflammatory bowel diseases (IBD), which include Crohn’s disease (CD) and ulcerative colitis, affect several million individuals around the world. Jason L. et al. performed a longitudinal multiomics study on the role of the microbiome in the pathogenesis of IBD. The metabolomics data from the fecal samples of that study are used here as an example case study for the new parameter optimization pipeline.

2.1 Raw Data Processing

Raw data processing is the first step in all metabolomics studies. MetaboAnalystR 3.2 supports the “.mzXML”, “.mzML” and “.CDF” formats. The original vendor formats (e.g. “.raw” and “.RAW”) will be supported soon. You can convert your original data into one of the supported formats with ProteoWizard (PMID: 23051804). Centroided data is preferred.
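
To see which files in a folder are already in a supported format, the file extensions can be matched with base R. This is a sketch under assumptions: `list_supported_spectra` is a hypothetical helper, the folder name is a placeholder, and the case-insensitive, recursive matching is a choice made here that MetaboAnalystR itself may or may not follow.

```r
# Hypothetical helper: list files whose extension is .mzXML, .mzML or .CDF.
# Matching is case-insensitive and descends into sub-folders (assumption).
list_supported_spectra <- function(folder) {
  list.files(folder,
             pattern = "\\.(mzXML|mzML|CDF)$",
             ignore.case = TRUE,
             recursive = TRUE,
             full.names = TRUE)
}

# Placeholder folder name; vendor files such as ".raw" are not listed
# and would need conversion with ProteoWizard first.
spectra <- list_supported_spectra("~/my_raw_data")
print(basename(spectra))
```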

2.1.1 Load MetaboAnalystR

Once you have finished the installation, use the library() function to load the package into R.

# Load the MetaboAnalystR package
library(MetaboAnalystR)

2.1.2 Download IBD Example QC Data

The example Quality Control (QC) samples will be used for the next steps. Download the MS data for parameter optimization at this step. To keep the tutorial quick and avoid the long running time of the whole dataset (over 600 samples), we only provide 5 samples each for the CD and nonIBD groups here. If you want to repeat and verify the results in our manuscript, please go to the IBD MultiOmics Database, download the full batch data and run them using this pipeline.

## Set the folders where the data will be stored
data_folder_Sample <- "~/Data_IBD"
data_folder_QC <- "~/QC_IBD"
# Use the Google Drive API to download the data here.
# Please run install.packages('googledrive') and install.packages('httpuv') first.
library(googledrive)
temp <- tempfile(fileext = ".zip")
# Please authorize your Google account to access the data
dl <- drive_download(
  as_id("10DBpPEWy2cZyXvmlLOIfpwqQYwFplKYK"), path = temp, overwrite = TRUE)
# Unzip into your own data folder
out <- unzip(temp, exdir = data_folder_QC)
# `out` now holds the paths of the data files for parameter optimization
# Now, download the small example data for the comparison between CD and nonIBD
temp <- tempfile(fileext = ".zip")
dl <- drive_download(
  as_id("1-wlFUkzEwWX1afRWLJlY_KEJs7BfsZim"), path = temp, overwrite = TRUE)
# Unzip into the sample data folder
out <- unzip(temp, exdir = data_folder_Sample)
# `out` now holds the paths of the data files for the normal processing example
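
After unzipping, it is worth confirming that the expected number of files arrived (5 samples per group in the small example set). The sketch below counts spectra files per sub-folder. It assumes the archives unpack into one sub-folder per group, which may not match the actual layout, and `count_by_group` is a hypothetical helper rather than a MetaboAnalystR function.

```r
# Hypothetical helper: count supported spectra files per immediate parent
# folder. Assumes one sub-folder per sample group (not confirmed here).
count_by_group <- function(folder) {
  files <- list.files(folder,
                      pattern = "\\.(mzXML|mzML|CDF)$",
                      ignore.case = TRUE,
                      recursive = TRUE,
                      full.names = TRUE)
  table(group = basename(dirname(files)))
}

# Folder names follow the download step of this tutorial
print(count_by_group("~/Data_IBD"))
print(count_by_group("~/QC_IBD"))
```

An empty table means no supported files were found, which usually points to a failed download or a different folder layout than assumed.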

2.1.3 Data Inspection

Before running the data analysis, the general data structure and information can be inspected with PerformDataInspect. Any highly significant contaminants will be apparent directly.

# Inspect the MS data via a 3D image. "res" specifies the resolution used for the MS data.
PerformDataInspect(data_folder_QC, res = 50)
## 0020a_XAV_iHMP2_FFA_PREFA02.mzXML
## [1] "RT range is: 18.0985 and 1139.81 seconds !"
## [1] "MZ range is: 69.999908447266 and 849.902038574219 Thomson !"