Enrichment Analysis

Zhiqiang Pang, Jasmine Chong, Jeff Xia

2023-07-24

1. Introduction

The enrichment analysis module performs metabolite set enrichment analysis (MSEA) for human and mammalian species based on several libraries containing ~6300 groups of metabolite sets. Users can upload either 1) a list of compounds, 2) a list of compounds with concentrations, or 3) a concentration table.

2. Enrichment Analysis Workflow

Below we go over two use cases for enrichment analysis: the first takes a list of compounds as input, and the second takes a concentration table.

2.1 Over Representation Analysis

The first workflow performs over representation analysis (ORA) on a list of compounds. The first step is to create a vector of compound names. The list is then cross-referenced (CrossReferencing) against the MetaboAnalyst compound libraries (HMDB, PubChem, KEGG, etc.), and any compound without a hit is marked NA. This step may take a while if the libraries do not already exist in your working directory and have to be downloaded.

library(MetaboAnalystR)

## When input is a list

# Create vector consisting of compounds for enrichment analysis 
tmp.vec <- c("Acetoacetic acid", "Beta-Alanine", "Creatine", "Dimethylglycine", "Fumaric acid", "Glycine", "Homocysteine", "L-Cysteine", "L-Isolucine", "L-Phenylalanine", "L-Serine", "L-Threonine", "L-Tyrosine", "L-Valine", "Phenylpyruvic acid", "Propionic acid", "Pyruvic acid", "Sarcosine", "Arsenic", "Benzene", "Caffeic acid", "Cotinine", "Cadmium", "Lead", "Thiocyanate")

# Create mSetObj
mSet<-InitDataObjects("conc", "msetora", FALSE)
## [1] "MetaboAnalyst R objects initialized ..."
#Set up mSetObj with the list of compounds
mSet<-Setup.MapData(mSet, tmp.vec);

# Cross reference list of compounds against libraries (hmdb, pubchem, chebi, kegg, metlin)
mSet<-CrossReferencing(mSet, "name");
## [1] "Loaded files from MetaboAnalyst web-server."
## [1] "Loaded files from MetaboAnalyst web-server."
## [1] "1"                                                                              
## [2] "Name matching OK, please inspect (and manual correct) the results then proceed."

To view the compound name map to identify any compounds within the uploaded list without hits…

# Example compound name map
mSet$name.map 

$query.vec
 [1] "Acetoacetic acid"   "Beta-Alanine"       "Creatine"           "Dimethylglycine"    "Fumaric acid"      
 [6] "Glycine"            "Homocysteine"       "L-Cysteine"         "L-Isolucine"        "L-Phenylalanine"   
[11] "L-Serine"           "L-Threonine"        "L-Tyrosine"         "L-Valine"           "Phenylpyruvic acid"
[16] "Propionic acid"     "Pyruvic acid"       "Sarcosine"          "Arsenic"            "Benzene"           
[21] "Caffeic acid"       "Cotinine"           "Cadmium"            "Lead"               "Thiocyanate"       

$hit.inx
 [1]  42  40  46  62  88  78 588 446  NA 104 120 109 103 702 131 159 164 185

$hit.values
 [1] "Acetoacetic acid"   "Beta-Alanine"       "Creatine"           "Dimethylglycine"    "Fumaric acid"      
 [6] "Glycine"            "Homocysteine"       "L-Cysteine"         NA                   "L-Phenylalanine"   
[11] "L-Serine"           "L-Threonine"        "L-Tyrosine"         "L-Valine"           "Phenylpyruvic acid"
[16] "Propionic acid"     "Pyruvic acid"       "Sarcosine"         

$match.state
 [1] 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
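
Compounds without a hit can also be pulled out programmatically. The following is a minimal base R sketch, not part of MetaboAnalystR: it uses a small made-up stand-in for mSet$name.map and assumes that query.vec, hit.values and match.state are parallel vectors (same length and order) and that a match state of 0 means no hit.

```r
# Illustrative stand-in for mSet$name.map (values are made up for this sketch;
# a real object is produced by CrossReferencing())
name.map <- list(
  query.vec   = c("Glycine", "L-Isolucine", "Sarcosine"),
  hit.values  = c("Glycine", NA, "Sarcosine"),
  match.state = c(1, 0, 1)
)

# Assumption: a match state of 0 marks a query name with no library hit
unmatched <- name.map$query.vec[name.map$match.state == 0]
print(unmatched)
```

With a real mSet object, the same indexing would be applied to mSet$name.map instead of the stand-in list.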

Continue with the enrichment analysis…

# Create the mapping results table
mSet<-CreateMappingResultTable(mSet)
## [1] "Loaded files from MetaboAnalyst web-server."
# Input the name of the compound without any matches 
mSet<-PerformDetailMatch(mSet, "L-Isolucine");
## [1] "Loaded files from MetaboAnalyst web-server."
## [1] "Loaded files from MetaboAnalyst web-server."
# Create list of candidates to replace the compound
mSet <- GetCandidateList(mSet);
## [1] "Loaded files from MetaboAnalyst web-server."
# Identify the name of the compound to replace
mSet<-SetCandidate(mSet, "L-Isolucine", "L-Isoleucine");
## [1] "Loaded files from MetaboAnalyst web-server."
# Set the metabolome filter
mSet<-SetMetabolomeFilter(mSet, F);

# Select metabolite set library, refer to 
mSet<-SetCurrentMsetLib(mSet, "smpdb_pathway", 0);

# Calculate hypergeometric score, results table generated in your working directory
mSet<-CalculateHyperScore(mSet)
## [1] "Loaded files from MetaboAnalyst web-server."
# Plot the ORA, bar-graph
mSet<-PlotORA(mSet, "ora_0_", "bar", "png", 72, width=NA)
## Warning in space + width: longer object length is not a multiple of shorter
## object length

## Warning in space + width: longer object length is not a multiple of shorter
## object length
Figure 1. Enrichment Analysis results (ORA).
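
To make the idea behind the hypergeometric score more concrete, here is a minimal base R sketch of a one-sided hypergeometric test for a single metabolite set. It is not the code of CalculateHyperScore; all counts below are hypothetical and chosen only for illustration, and the test is assumed to be the standard "at least this many hits" tail probability.

```r
# Hypothetical counts, chosen only for illustration
universe_size <- 1000  # metabolites in the reference metabolome
set_size      <- 40    # metabolites annotated to one metabolite set
query_size    <- 25    # metabolites in the uploaded list
hits          <- 5     # uploaded metabolites that fall in the set

# Expected number of hits if the uploaded list were a random draw
expected <- query_size * set_size / universe_size

# P(X >= hits) under the hypergeometric distribution
p_value <- phyper(hits - 1, set_size, universe_size - set_size,
                  query_size, lower.tail = FALSE)

cat("expected hits:", expected, "\n")
cat("p-value:", signif(p_value, 3), "\n")
```

A set is over-represented when the observed number of hits is clearly larger than the expected number, which shows up as a small p-value.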


2.2 Quantitative Enrichment Analysis

Below, we go over a second workflow that performs quantitative enrichment analysis (QEA). The input is a concentration table of 77 urine samples from cancer patients (cachexic vs. control) measured by 1H NMR (Eisner et al. 2010).
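
The Read.TextData call in this workflow uses the "rowu" format (samples in rows, unpaired) and the "disc" label type (discrete class labels). As a rough sketch of that layout, the base R code below builds a toy table and round-trips it through a CSV file. The column names and values are hypothetical and do not come from human_cachexia.csv.

```r
# Hypothetical toy table: one row per sample, first column sample IDs,
# second column class labels, remaining columns compound concentrations
conc_table <- data.frame(
  Sample   = c("S1", "S2", "S3", "S4"),
  Group    = c("cachexic", "cachexic", "control", "control"),
  Creatine = c(105.2, 88.1, 43.7, 51.0),
  Glycine  = c(210.4, 198.9, 120.3, 135.6)
)

# Write the table to a temporary CSV file and read it back to check the layout
csv_path <- tempfile(fileext = ".csv")
write.csv(conc_table, csv_path, row.names = FALSE)
check <- read.csv(csv_path)

dim(check)          # samples x (2 + number of compounds)
table(check$Group)  # number of samples per class
```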

# Load MetaboAnalystR and clean environment
library(MetaboAnalystR)
rm(list = ls())

# Create mSetObj
mSet<-InitDataObjects("conc", "msetqea", FALSE)
## Starting Rserve:
##  /opt/R/4.2.2/lib/R/bin/R CMD /opt/R/4.2.2/lib/R/library/Rserve/libs//Rserve --no-save 
## 
## [1] "MetaboAnalyst R objects initialized ..."
# Read in data table
mSet<-Read.TextData(mSet, "https://www.xialab.ca/api/download/metaboanalyst/human_cachexia.csv", "rowu", "disc");

# Perform cross-referencing of compound names
mSet<-CrossReferencing(mSet, "name");
## [1] "Loaded files from MetaboAnalyst web-server."
## [1] "Loaded files from MetaboAnalyst web-server."
## [1] "1"                                                                              
## [2] "Name matching OK, please inspect (and manual correct) the results then proceed."
# Create mapping results table
mSet<-CreateMappingResultTable(mSet)
## [1] "Loaded files from MetaboAnalyst web-server."
# Mandatory check of data 
mSet<-SanityCheckData(mSet);
##  [1] "Successfully passed sanity check!"                                                                                
##  [2] "Samples are not paired."                                                                                          
##  [3] "2 groups were detected in samples."                                                                               
##  [4] "Only English letters, numbers, underscore, hyphen and forward slash (/) are allowed."                             
##  [5] "<font color=\"orange\">Other special characters or punctuations (if any) will be stripped off.</font>"            
##  [6] "All data values are numeric."                                                                                     
##  [7] "A total of 0 (0%) missing values were detected."                                                                  
##  [8] "<u>By default, missing values will be replaced by 1/5 of min positive values of their corresponding variables</u>"
##  [9] "Click the <b>Proceed</b> button if you accept the default practice;"                                              
## [10] "Or click the <b>Missing Values</b> button to use other methods."
# Replace missing values using the default rule reported by the sanity check
# (1/5 of the minimum positive value of each variable)
mSet<-ReplaceMin(mSet);

# Perform no normalization
mSet<-PreparePrenormData(mSet)
mSet<-Normalization(mSet, "NULL", "NULL", "NULL", "PIF_178", ratio=FALSE, ratioNum=20)
## [1] 77 63
# Plot normalization
mSet<-PlotNormSummary(mSet, "norm_0_", "png", 72, width=NA)

# Plot sample-wise normalization
mSet<-PlotSampleNormSummary(mSet, "snorm_0_", "png", 72, width=NA)

# Set the metabolome filter
mSet<-SetMetabolomeFilter(mSet, F);

# Set the metabolite set library to pathway
mSet<-SetCurrentMsetLib(mSet, "smpdb_pathway", 0);

# Calculate the global test score
mSet<-CalculateGlobalTestScore(mSet)
## [1] "Loaded files from MetaboAnalyst web-server."
# Plot the QEA
mSet<-PlotQEA.Overview(mSet, "qea_0_", "bar", "png", 72, width=NA)
## Warning in space + width: longer object length is not a multiple of shorter
## object length

## Warning in space + width: longer object length is not a multiple of shorter
## object length
Figure 2. Enrichment Analysis results (QEA).
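
The sanity check output above states that missing values are, by default, replaced by 1/5 of the minimum positive value of the corresponding variable. The base R sketch below illustrates that rule on a hypothetical toy matrix. The helper replace_min_fifth is a made-up name for this example and is not the implementation of ReplaceMin.

```r
# Hypothetical toy matrix: rows are samples, columns are compounds
conc <- matrix(
  c(10, NA,  5,
     4,  2, NA,
     8,  6, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("A", "B", "C"))
)

# Replace NA entries in one column with 1/5 of that column's
# smallest positive value (illustrative helper, not a package function)
replace_min_fifth <- function(x) {
  min_pos <- min(x[!is.na(x) & x > 0])
  x[is.na(x)] <- min_pos / 5
  x
}

# Apply the rule column by column (one column per compound)
imputed <- apply(conc, 2, replace_min_fifth)
print(imputed)
```

In this toy example the missing value in column B becomes 2 / 5 and the one in column C becomes 5 / 5, while column A is left unchanged.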
