Protein Structure and Function Prediction Using I‐TASSER

Jianyi Yang1, Yang Zhang2

1 School of Mathematical Sciences, Nankai University, Tianjin
2 Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 5.8
DOI: 10.1002/0471250953.bi0508s52
Online Posting Date: December 2015
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


I‐TASSER is a hierarchical protocol for automated protein structure prediction and structure‐based function annotation. Starting from the amino acid sequence of a target protein, I‐TASSER first generates full‐length atomic structural models from multiple threading alignments and iterative structural assembly simulations, followed by atomic‐level structure refinement. The biological functions of the protein, including ligand‐binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I‐TASSER is freely available as both an on‐line server and a stand‐alone package. This unit describes how to use the I‐TASSER protocol to generate structure and function predictions, how to interpret the prediction results, and which alternative approaches can further improve I‐TASSER modeling quality for distant‐homologous and multi‐domain protein targets. © 2015 by John Wiley & Sons, Inc.

Keywords: protein structure prediction; protein function annotation; I‐TASSER; threading

Table of Contents

  • Introduction
  • Basic Protocol 1: Using the I‐TASSER Server
  • Guidelines for Understanding I‐TASSER Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables
