Alternative Splicing Signatures in RNA‐seq Data: Percent Spliced in (PSI)

Sebastian Schafer1, Kui Miao2, Craig C. Benson3, Matthias Heinig4, Stuart A. Cook5, Norbert Hubner6

1 National Heart Center Singapore, 2 Duke‐National University of Singapore, 3 Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, 4 Present address: Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 5 National Heart and Lung Institute, Imperial College London, London, 6 Charité‐Universitätsmedizin, Berlin
Publication Name: Current Protocols in Human Genetics
Unit Number: Unit 11.16
DOI: 10.1002/0471142905.hg1116s87
Online Posting Date: October 2015
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


Thousands of alternative exons are spliced out of messenger RNA to increase protein diversity. High‐throughput sequencing of short cDNA fragments (RNA‐seq) generates a genome‐wide snapshot of these post‐transcriptional processes. RNA‐seq reads yield insights into the regulation of alternative splicing by revealing the usage of known or unknown splice sites as well as the expression level of exons. Constitutive exons are never skipped over by split alignments (reads whose alignment gap spans the exon), whereas alternative exonic parts lie within highly expressed splice junctions. The percent spliced in index (PSI), the number of reads that include an exon divided by the number of reads that either include or exclude it, indicates how efficiently sequences of interest are spliced into transcripts. This protocol describes a method to calculate the PSI without prior knowledge of splicing patterns. It provides a quantitative, global assessment of exon usage that can be integrated with other tools that identify differential isoform processing. Novel, complex splicing events along a genetic locus can be visualized in an exon‐centric manner and compared across conditions. © 2015 by John Wiley & Sons, Inc.
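
As a rough illustration of the ratio described in the abstract, the following Python sketch computes a PSI value for one exonic part from inclusion and exclusion read counts. The function name `percent_spliced_in`, its parameters, and the length normalization (inclusion reads scaled by exon length plus read length minus one, exclusion reads scaled by read length minus one) are assumptions made for this example; the abstract itself specifies only the ratio, so the full protocol should be consulted for the exact counting and normalization rules.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads, exon_length, read_length):
    """Return PSI as a fraction in [0, 1], or None if no read informs the exon.

    Hypothetical helper: the length normalization below is an assumption,
    not taken from the abstract.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    if exon_length < 1 or read_length < 2:
        raise ValueError("exon_length must be >= 1 and read_length >= 2")

    # Assumed normalization: a read overlapping the exon can start at
    # (exon_length + read_length - 1) positions, whereas a read spanning
    # a single exon-skipping junction can start at (read_length - 1).
    norm_inclusion = inclusion_reads / (exon_length + read_length - 1)
    norm_exclusion = exclusion_reads / (read_length - 1)

    total = norm_inclusion + norm_exclusion
    if total == 0:
        # Exon neither included nor excluded by any read: PSI is undefined.
        return None
    return norm_inclusion / total


if __name__ == "__main__":
    # Illustrative counts only, not data from the protocol.
    psi = percent_spliced_in(inclusion_reads=150, exclusion_reads=50,
                             exon_length=100, read_length=51)
    print(f"PSI = {psi:.2f}")
```

Returning `None` for exons without informative reads, rather than 0, keeps unexpressed exons distinguishable from exons that are expressed but always skipped.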

Keywords: alternative splicing; RNA‐seq; percent spliced in; PSI; transcript processing; isoform expression

Table of Contents

  • Introduction
  • Strategic Planning
  • Basic Protocol 1: Calculation of Percent Spliced in for Annotated Exons
  • Alternate Protocol 1: Calculation of Percent Spliced in for Annotated Exons with the Script
  • Support Protocol 1: Create an Exonic Part Annotation Based on UCSC Gene Annotation Files
  • Support Protocol 2: Reformat STAR Aligner Output Data to Calculate the PSI
  • Commentary
  • Figures



Internet Resources
  BEDTools Suite Web site, which allows for downloading and installing the BEDTools applications. This toolset can analyze genome‐wide datasets and work with genomic intervals.
  UCSC Genome Browser Web site, used to download gene annotations and view genomic interval files.
  Download the DEXSeq package from this site to obtain the script.
  Download, install, and read about the Python package HTSeq to analyze high‐throughput sequencing data.
  Python Web site to download and install the Python software to run code that was written in the Python programming language.
  Ensembl creates and provides gene annotations for several genomes. Gene annotations downloaded from this site can be used to create exonic part annotations.
  SAMtools is a software suite that enables users to read, write, and edit high‐throughput sequencing data.
