Finding Pathogenic Nucleic Acid Sequences in Next Generation Sequencing Data

Michael Parfenov1, J.G. Seidman1

1 Harvard Medical School, Boston, Massachusetts
Publication Name:  Current Protocols in Human Genetics
Unit Number:  Unit 18.9
DOI:  10.1002/0471142905.hg1809s86
Online Posting Date:  July, 2015


Viruses and bacteria are established as major causes of human diseases ranging from hepatitis to cancer. Recently, the presence of such pathogens has been studied extensively using human whole genome and transcriptome sequencing data. However, detecting and studying pathogens in next generation sequencing data is challenging in terms of time and computational resources. In this protocol, we give instructions for a simple and quick method to find pathogenic DNA or RNA and to detect possible integration of the pathogen genome into the host genome. © 2015 by John Wiley & Sons, Inc.

Keywords: next generation sequencing; pathogens; viruses; bacteria; integration; detection




Basic Protocol 1:

  • Unix operating system (for a beginner's guide see Stein, 2007)
  • Burrows‐Wheeler Aligner (BWA) 0.5.9rc1 (r1561)
  • Samtools 0.1.19‐44428cd
  • bamUtil
  • Perl v5.10.1
  • Integrative Genomics Viewer (IGV)
  • Analysis scripts written in Perl and C
  • Reference human genome hg19 in FASTA format:
  • Database of reference viral genomes is provided. To update the database or to create a customized database one could download reference genomes from the NCBI database:
    • Viral genomes:
    • Bacterial genomes:
  • Database of reference pathogen genomes in FASTA format
  • Paired‐end DNA sequencing data in FASTQ format (sample.01.fastq and sample.02.fastq)
NOTE: A computing cluster is recommended.
NOTE: Later or earlier versions of the BWA aligner should be tested before use.
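
The protocol steps themselves are not reproduced in this summary, so the following is only a minimal sketch of how the listed tools are commonly chained, not the authors' pipeline. It assembles (without running) command lines for a BWA aln/sampe paired-end alignment against the human reference, followed by a Samtools filter that keeps reads flagged as unmapped, which could then be compared against a pathogen database. The function name, the reference file name hg19.fa, and the output file names are assumptions made for illustration; only sample.01.fastq and sample.02.fastq come from the materials list. The reference is assumed to have been indexed beforehand with bwa index.

```python
import shlex


def build_alignment_commands(reference, fastq1, fastq2, prefix):
    """Return shell command strings (not executed) for a paired-end
    BWA aln/sampe alignment and extraction of unmapped reads.

    All output file names are hypothetical and derived from `prefix`.
    """
    q = shlex.quote  # protect file names containing spaces or shell characters
    sai1 = f"{prefix}.1.sai"
    sai2 = f"{prefix}.2.sai"
    bam = f"{prefix}.bam"
    unmapped = f"{prefix}.unmapped.bam"
    return [
        # Align each mate separately; bwa aln writes its index file to stdout.
        f"bwa aln {q(reference)} {q(fastq1)} > {q(sai1)}",
        f"bwa aln {q(reference)} {q(fastq2)} > {q(sai2)}",
        # Pair the two alignments and convert the SAM stream to BAM.
        f"bwa sampe {q(reference)} {q(sai1)} {q(sai2)} {q(fastq1)} {q(fastq2)}"
        f" | samtools view -bS - > {q(bam)}",
        # Keep only reads with SAM flag 0x4 set (read unmapped).
        f"samtools view -b -f 4 {q(bam)} > {q(unmapped)}",
    ]


if __name__ == "__main__":
    for command in build_alignment_commands(
        "hg19.fa", "sample.01.fastq", "sample.02.fastq", "sample"
    ):
        print(command)
```

Building the commands as strings keeps the sketch independent of any particular cluster scheduler; each line can be pasted into a job script or passed to a shell once the tool versions have been checked.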



Literature Cited

  Bhaduri, A., Qu, K., Lee, C.S., Ungewickell, A., and Khavari, P.A. 2012. Rapid identification of non‐human sequences in high‐throughput sequencing datasets. Bioinformatics 28:1174‐1175.
  Chen, Y., Yao, H., Thompson, E.J., Tannir, N.M., Weinstein, J.N., and Su, X. 2013. VirusSeq: Software to identify viruses and their integration sites using next‐generation sequencing of human cancer tissue. Bioinformatics 29:266‐267.
  Heckenberg, S.G., Brouwer, M.C., and van de Beek, D. 2014. Bacterial meningitis. Handb. Clin. Neurol. 121:1361‐1375.
  Kostic, A.D., Ojesina, A.I., Pedamallu, C.S., Jung, J., Verhaak, R.G., Getz, G., and Meyerson, M. 2011. PathSeq: Software to identify or discover microbes by deep sequencing of human tissue. Nat. Biotechnol. 29:393‐396.
  Li, J.W., Wan, R., Yu, C.S., Co, N.N., Wong, N., and Chan, T.F. 2013. ViralFusionSeq: Accurately discover viral integration events and reconstruct fusion transcripts at single‐base resolution. Bioinformatics 29:649‐651.
  McCullers, J.A. 2013. Do specific virus‐bacteria pairings drive clinical outcomes of pneumonia? Clin. Microbiol. Infect. 19:113‐118.
  Oh, J.K. and Weiderpass, E. 2014. Infection and cancer: Global distribution and burden of diseases. Ann. Glob. Health 80:384‐392.
  Rautava, J. and Syrjänen, S. 2012. Biology of human papillomavirus infections in head and neck carcinogenesis. Head Neck Pathol. 6:S3‐S15.
  Stein, L.D. 2007. Unix survival guide. Curr. Protoc. Bioinform. 16:A.1C.1‐A.1C.24.
  Wang, Q., Jia, P., and Zhao, Z. 2013. VirusFinder: Software for efficient and accurate detection of viruses and their integration sites in host genomes through next generation sequencing data. PLoS One 8:e64465.
  Wentzensen, N., Vinokurova, S., and von Knebel Doeberitz, M. 2004. Systematic review of genomic integration sites of human papillomavirus genomes in epithelial dysplasia and invasive cancer of the female lower genital tract. Cancer Res. 64:3878‐3884.
  Werdan, K., Dietz, S., Löffler, B., Niemann, S., Bushnaq, H., Silber, R.E., Peters, G., and Müller‐Werdan, U. 2014. Mechanisms of infective endocarditis: Pathogen‐host interaction and risk states. Nat. Rev. Cardiol. 11:35‐50.