Computational Methods for Human Microbiome Analysis

Matthieu J. Miossec¹, Sandro L. Valenzuela¹, Katterinne N. Mendez¹, Eduardo Castro‐Nallar¹

¹ Center for Bioinformatics and Integrative Biology, Faculty of Biological Sciences, Universidad Andrés Bello, Santiago
Publication Name: Current Protocols in Microbiology
Unit Number: Unit 1E.14
DOI: 10.1002/cpmc.41
Online Posting Date: November 2017
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

As the field of microbiomics advances, the computational work that scientists must perform to extract biological insight has grown accordingly. Although human microbiome analyses increasingly integrate multiple types of high‐throughput data, a core set of methods forms the basis of nearly every study. In this unit, we present step‐by‐step protocols for five core stages of human microbiome research: quality control, decontamination of host reads, mapping of reads to microbial references, exploratory data analysis, and differential abundance testing. These protocols provide a base case for human microbiome analysis, with enough detail for researchers to tailor aspects of them to the specifics of their own data. © 2017 by John Wiley & Sons, Inc.

Keywords: alpha and beta diversity; differential abundance testing; human microbiome; metagenome decontamination; metagenome reads; read mapping

Table of Contents

  • Introduction
  • Basic Protocol 1: Loading the Example Data and Performing Quality Control
  • Basic Protocol 2: Decontaminating Reads by Mapping Them to the Human Genome Using Bowtie2
  • Basic Protocol 3: Setting Up a Local Microbial Reference Library and Mapping Metagenome Reads to the References Using Pathoscope 2.0
  • Basic Protocol 4: Exploratory Data Analysis Using PathoStat
  • Basic Protocol 5: Testing Differential Abundance in Bacteria Using Phyloseq
  • Commentary
  • Literature Cited
  • Figures

Materials

Basic Protocol 1: Loading the Example Data and Performing Quality Control

  Materials
  • To run programs that are essential to later protocols (i.e., Bowtie2 and PathoScope 2.0), a high‐performance computer (HPC) cluster running Linux is strongly recommended. Programs must be run from the command‐line terminal of the cluster's login node. For this protocol, FastQC and PRINSEQ must be installed. The tools and their installation guidelines are available at the following URLs:
  • FastQC, v.0.11.5+:
  • https://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc
  • PRINSEQ Lite v.0.20.4+:
  • https://sourceforge.net/projects/prinseq/files/standalone/
  • Additionally, to acquire the 14 example datasets stored in NCBI's Sequence Read Archive (SRA), the NCBI SRA Toolkit must be installed and configured. The toolkit (SRA Toolkit v.2.8.2‐1) and its documentation are available at https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
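The kind of filtering PRINSEQ performs can be illustrated with a minimal Python sketch that drops reads that are too short or whose mean Phred quality falls below a threshold. It assumes uncompressed FASTQ with four lines per record and Sanger/Illumina 1.8+ quality encoding (ASCII offset 33). The function names and the thresholds (mean quality 20, minimum length 50) are illustrative assumptions, not PRINSEQ's defaults or the settings used in this unit.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an uncompressed FASTQ stream with four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header, seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming ASCII offset 33."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records: Iterable[FastqRecord],
                 min_mean_qual: float = 20.0,
                 min_len: int = 50) -> Iterator[FastqRecord]:
    """Keep reads that are long enough and whose mean quality meets the threshold."""
    for rec in records:
        if len(rec.seq) >= min_len and mean_phred(rec.qual) >= min_mean_qual:
            yield rec


# Usage on a tiny in-memory FASTQ ('I' encodes Q40, '#' encodes Q2).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
kept = list(filter_reads(read_fastq(example), min_mean_qual=20, min_len=4))
print([r.header for r in kept])  # ['@r1']
```

Reading the file as a stream keeps memory use flat, which matters for metagenome FASTQ files that can hold tens of millions of reads.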

Basic Protocol 2: Decontaminating Reads by Mapping Them to the Human Genome Using Bowtie2

  Materials
  • To run Bowtie2, a high‐performance computer (HPC) cluster running on Linux is strongly recommended. The user must install Bowtie2 and the tool twoBitToFa, following the installation guidelines available at:
  • Bowtie2, v.2.2.9+: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  • twoBitToFa: https://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa
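Once the human genome has been converted from 2bit to FASTA with twoBitToFa, indexed with bowtie2-build, and the metagenome reads aligned to it with Bowtie2, decontamination amounts to keeping the reads that did not align. Bowtie2 can write such reads directly through its `--un` option; the Python sketch below instead shows the underlying logic on Bowtie2's SAM output, keeping records whose FLAG field has bit 0x4 (segment unmapped) set. It assumes single‐end reads and one SAM line per read (Bowtie2's default reporting mode); the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

SAM_FLAG_UNMAPPED = 0x4  # FLAG bit set when the read did not align


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for reads that did not align to the host genome."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # Mandatory SAM columns (0-based): QNAME=0, FLAG=1, ..., SEQ=9, QUAL=10
        if int(fields[1]) & SAM_FLAG_UNMAPPED:
            yield fields[0], fields[9], fields[10]


def to_fastq(reads: Iterable[Tuple[str, str, str]]) -> str:
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in reads)


# Usage: r1 aligned to chr1 (treated as human, excluded); r2 is unmapped (FLAG 4) and kept.
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
print(to_fastq(unmapped_reads(sam)), end="")
# @r2
# TTGA
# +
# IIII
```

For paired‐end data the same FLAG test applies to each mate, but deciding whether to keep a pair when only one mate aligns is a separate design choice not covered by this sketch.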

Basic Protocol 3: Setting Up a Local Microbial Reference Library and Mapping Metagenome Reads to the References Using Pathoscope 2.0

  Materials
  • To run PathoScope 2.0, a high‐performance computer (HPC) cluster running Linux is strongly recommended. To build a local library of bacterial reference genomes, the user must install pasteTaxID and, if not already installed, Bowtie2, as well as PathoScope 2.0 for mapping, using the installation guidelines at these URLs:
  • pasteTaxID: https://github.com/microgenomics/pasteTaxID
  • Bowtie2, v.2.2.9+: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
  • PathoScope 2.0: https://github.com/PathoScope/PathoScope
  • Metagenome reads will be mapped against reference sequences downloaded from the NCBI Reference Sequence Database (RefSeq). The full protocol gives the steps necessary to download and format the required files.
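PathoScope 2.0 resolves reads that map equally well to several reference genomes by fitting a Bayesian mixture model with the expectation‐maximization (EM) algorithm (Francis et al., 2013; Hong et al., 2014). The Python sketch below is a deliberately simplified version of that idea, not PathoScope's actual model. Each read carries a positive likelihood for every genome it aligned to (a hypothetical input format), and EM alternates between splitting each read across its genomes in proportion to the current abundance estimates (E‐step) and re‐estimating the abundances from those fractional assignments (M‐step). PathoScope's model includes further components, such as separate handling of non‐uniquely mapped reads, that are omitted here.

```python
from typing import Dict, List


def em_abundances(reads: List[Dict[str, float]],
                  max_iter: int = 500,
                  tol: float = 1e-10) -> Dict[str, float]:
    """Estimate genome proportions from ambiguous read alignments.

    reads[i] maps each genome that read i aligned to onto a positive
    alignment likelihood (hypothetical input format).
    """
    genomes = sorted({g for aln in reads for g in aln})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # uniform starting proportions
    for _ in range(max_iter):
        # E-step: split each read across its genomes in proportion to pi * likelihood.
        expected = dict.fromkeys(genomes, 0.0)
        for aln in reads:
            total = sum(pi[g] * q for g, q in aln.items())
            for g, q in aln.items():
                expected[g] += pi[g] * q / total
        # M-step: new proportions are each genome's expected share of all reads.
        new_pi = {g: expected[g] / len(reads) for g in genomes}
        converged = max(abs(new_pi[g] - pi[g]) for g in genomes) < tol
        pi = new_pi
        if converged:
            break
    return pi


# Usage: 3 reads unique to A, 1 unique to B, 4 tied between A and B.
reads = [{"A": 1.0}] * 3 + [{"B": 1.0}] + [{"A": 1.0, "B": 1.0}] * 4
print({g: round(p, 3) for g, p in em_abundances(reads).items()})  # {'A': 0.75, 'B': 0.25}
```

The four tied reads end up split 3:1 rather than evenly, following the evidence from the uniquely mapped reads; this is the intuition behind reassigning ambiguous reads instead of discarding them.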

Basic Protocol 4: Exploratory Data Analysis Using PathoStat

  Materials
  • The user must install the R statistical software environment and then, from the R console, install the PathoStat package from Bioconductor. R and the PathoStat package, with its installation guidelines, are available at the following URLs:
  • R (v.3.3.1):
  • https://www.r-project.org/
  • PathoStat:
  • https://www.bioconductor.org/packages/release/bioc/html/PathoStat.html
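Alpha and beta diversity, listed among this unit's keywords, are typical summaries in this kind of exploratory analysis. To illustrate the arithmetic behind two common choices, the Python sketch below computes the Shannon and Gini-Simpson indices (alpha diversity, within one sample) and the Bray-Curtis dissimilarity (beta diversity, between two samples) from taxon count vectors. It is not PathoStat code, and it makes no assumption about which indices PathoStat reports; the function names are illustrative.

```python
import math
from typing import List, Sequence


def _proportions(counts: Sequence[float]) -> List[float]:
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i), skipping absent taxa."""
    return -sum(p * math.log(p) for p in _proportions(counts) if p > 0)


def gini_simpson(counts: Sequence[float]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance that two reads drawn with
    replacement come from different taxa."""
    return 1.0 - sum(p * p for p in _proportions(counts))


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity sum|u_i - v_i| / sum(u_i + v_i); 0 = identical, 1 = no shared taxa."""
    if len(u) != len(v):
        raise ValueError("samples must list the same taxa in the same order")
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


# Usage: counts for the same three taxa in two samples.
s1, s2 = [50, 50, 0], [0, 50, 50]
print(round(shannon(s1), 4), gini_simpson(s1), bray_curtis(s1, s2))  # 0.6931 0.5 0.5
```

Because Bray-Curtis on raw counts is sensitive to differences in sequencing depth, it is often applied to relative abundances (or otherwise normalized counts) instead.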

Basic Protocol 5: Testing Differential Abundance in Bacteria Using Phyloseq

  Materials
  • The user must install the R statistical software environment and then, from the R console, install the DESeq2, phyloseq, and ggplot2 packages from Bioconductor and CRAN. R is available at the following URL:
  • R (v.3.4.0):
  • https://www.r-project.org/
  • Independently of R, the user must also download parseMethods, available at:
  • parseMethods:
  • https://github.com/microgenomics/parseMethods
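Differential abundance testing with DESeq2, one of the packages listed above, begins by normalizing for sequencing depth. The Python sketch below reproduces only that first step, DESeq2's default median‐of‐ratios size factors: each taxon's geometric mean across samples serves as a pseudo‐reference, and a sample's size factor is the median of its count‐to‐reference ratios. It does not implement DESeq2's negative binomial model or its tests, and the input layout (taxa in rows, samples in columns) and function names are assumptions of the sketch. Because sparse microbiome tables may contain no taxon observed in every sample, DESeq2 also offers `type = "poscounts"` in `estimateSizeFactors`, which this sketch does not cover.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[i][j] = reads of taxon i in sample j."""
    n_samples = len(counts[0])
    # A zero in any sample gives a zero geometric mean (log of -inf),
    # so only taxa observed in every sample are used.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no taxon has non-zero counts in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j's count to each taxon's geometric mean; median on log scale.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Sequence[Sequence[int]], factors: Sequence[float]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Usage: sample 2 was sequenced twice as deeply; taxon 4 is absent from sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
sf = size_factors(counts)
print([round(f, 4) for f in sf])  # [0.7071, 1.4142]
```

After dividing by these factors, the first taxon's normalized counts are about 14.14 in both samples, so the twofold difference in raw counts is attributed to sequencing depth rather than to a change in abundance.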

Literature Cited
  Andrews, S. (2010). FastQC: A quality control tool for high throughput sequence data. Available at http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
  Bik, E. M. (2016). The hoops, hopes, and hypes of human microbiome research. Yale Journal of Biology and Medicine, 89, 363–373.
  Castro‐Nallar, E., Shen, Y., Freishtat, R. J., Pérez‐Losada, M., Manimaran, S., Liu, G., … Crandall, K. A. (2015). Integrating microbial and host transcriptomics to characterize asthma‐associated microbial communities. BMC Medical Genomics, 8, 50. doi: 10.1186/s12920-015-0121-1.
  Francis, O. E., Bendall, M., Manimaran, S., Hong, C., Clement, N. L., Castro‐Nallar, E., … Johnson, W. E. (2013). Pathoscope: Species identification and strain attribution with unassembled sequencing data. Genome Research, 23, 1721–1729. doi: 10.1101/gr.150151.112.
  Hong, C., Manimaran, S., Shen, Y., Perez‐Rogers, J. F., Byrd, A. L., Castro‐Nallar, E., … Johnson, W. E. (2014). PathoScope 2.0: A complete computational framework for strain identification in environmental or clinical sequencing samples. Microbiome, 2, 33. doi: 10.1186/2049-2618-2-33.
  Langmead, B., & Salzberg, S. L. (2012). Fast gapped‐read alignment with Bowtie 2. Nature Methods, 9, 357–359. doi: 10.1038/nmeth.1923.
  Manimaran, S., Bendall, M., Valenzuela, S., Castro, E., Faits, T., & Johnson, W. E. (2016). PathoStat: PathoStat statistical microbiome analysis package. R package version 1.2.0. Available at http://bioconductor.org/packages/release/bioc/html/PathoStat.html.
  McMurdie, P. J., & Holmes, S. (2013). Phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One, 8, e61217. doi: 10.1371/journal.pone.0061217.
  Schmieder, R., & Edwards, R. (2011). Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27, 863–864. doi: 10.1093/bioinformatics/btr026.
  The Human Microbiome Project Consortium. (2012). Structure, function and diversity of the healthy human microbiome. Nature, 486, 207–214. doi: 10.1038/nature11234.
  Young, V. B. (2017). The role of the microbiome in human health and disease: An introduction for clinicians. BMJ, 356, j831. doi: 10.1136/bmj.j831.