Using EMBL‐EBI Services via Web Interface and Programmatically via Web Services

Rodrigo Lopez¹, Andrew Cowley¹, Weizhong Li¹, Hamish McWilliam¹

¹ EMBL Outstation–European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 3.12
DOI:  10.1002/0471250953.bi0312s48
Online Posting Date:  December, 2014
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The European Bioinformatics Institute (EMBL‐EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. In addition to Web interfaces for these resources, EMBL‐EBI offers Web Services over the SOAP and REST protocols, which enable programmatic access to our resources and allow them to be integrated into other applications and analytical workflows. This unit describes the options available to a researcher or bioinformatician who wishes to use our resources through the Web interface or programmatically from a range of programming languages. © 2014 by John Wiley & Sons, Inc.

Keywords: Web Services; Programmatic access; SOAP; REST; analytical pipelines; workflows
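
As an illustration of the REST-style programmatic access mentioned in the abstract, the following is a minimal Python sketch that only builds a Dbfetch-style retrieval URL; it does not contact any server. The base URL, the query parameter names (`db`, `id`, `format`, `style`), the helper name `build_dbfetch_url`, and the example accessions are assumptions made for illustration and should be checked against the current EMBL‐EBI documentation before use.

```python
from urllib.parse import urlencode

# Assumed base URL of the Dbfetch REST endpoint (not taken from this page).
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def build_dbfetch_url(db, entry_ids, fmt="fasta", style="raw"):
    """Build a GET URL requesting one or more entries from a database.

    The parameter names used here (db, id, format, style) are assumptions
    for this sketch; the function is a hypothetical helper, not part of
    any EMBL-EBI client library.
    """
    if not entry_ids:
        raise ValueError("at least one entry identifier is required")
    params = {
        "db": db,
        "id": ",".join(entry_ids),  # several identifiers, comma-separated
        "format": fmt,
        "style": style,
    }
    # safe="," keeps the comma between identifiers unescaped in the URL
    return DBFETCH_BASE + "?" + urlencode(params, safe=",")


# Example accessions are placeholders chosen for illustration only.
url = build_dbfetch_url("uniprotkb", ["P12345", "P04637"])
print(url)
```

The resulting URL could then be passed to any HTTP client (for example `urllib.request.urlopen`) to retrieve the entries. Building the URL in a separate function keeps the request parameters easy to inspect and test before any network call is made.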

Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: Retrieving Data from EMBL‐EBI Using Dbfetch via the Web Interface
  • Alternate Protocol 1: Retrieving Data from EMBL‐EBI Using WSDbfetch via the REST Interface
  • Alternate Protocol 2: Retrieving Data from EMBL‐EBI Using WSDbfetch via SOAP Interface
  • Support Protocol 1: Installing Perl SOAP Web Services Clients
  • Basic Protocol 2: Sequence Similarity Search Using FASTA Search via the Web Interface
  • Basic Protocol 3: Sequence Similarity Search Using NCBI BLAST+ SOAP Web Services with Perl Client
  • Basic Protocol 4: Iterative Sequence Search Using PSI‐Search REST Web Services with Perl Client
  • Support Protocol 2: Installing Perl REST Web Services Clients
  • Basic Protocol 5: Protein Functional Analysis Using InterProScan 5 SOAP Web Services with Java Client
  • Support Protocol 3: Installing Java Web Services Clients
  • Basic Protocol 6: Multiple Sequence Alignment Using Clustal Omega via Web Interface
  • Alternate Protocol 3: Multiple Sequence Alignment Using Clustal Omega via C# .NET Client
  • Support Protocol 4: Installing C# .NET Web Services Clients
  • Basic Protocol 7: Putting Services Together in a Workflow
  • Guidelines for Understanding Results
  • Acknowledgements
  • Literature Cited
  • Figures
  • Tables
