High-throughput RNA sequencing (RNA-seq) has recently been widely applied in transcriptome analysis. One important research direction in transcriptome studies is the detection of differential expression (DE) of genes and isoforms. RNA-seq experiments produce read counts that are affected by both biological and technical variation. To distinguish systematic changes in expression between conditions from noise, the counts are frequently modeled with the Negative Binomial distribution. Most proposed methods that use Negative Binomial models are based on statistics that compare read counts between conditions. Unfortunately, because of read-mapping ambiguity, it is difficult to obtain exact read counts for each isoform. As a result, these methods cannot be applied to detecting DE isoforms. In this paper, we propose a method, PGDiff, to detect differential expression for both genes and isoforms; it is based on the Negative Binomial models of gene and isoform expression derived from the package PGseq. Instead of modeling the distribution of the total count for each gene, PGseq models the variability of the count for each individual exon and obtains the expression of each gene and each isoform. Unlike count-based methods, PGDiff detects differential expression in two steps. The first step obtains the expression of genes and isoforms. In the second step, an exact test is used to detect differential expression from the obtained expressions and the Negative Binomial models. For detecting DE genes, we evaluated the proposed approach on the MAQC dataset and the Griffith dataset, and compared its performance with that of the currently popular packages MMDiff, Cuffdiff, BitSeq, DESeq and baySeq. For detecting DE isoforms, we designed two types of comparison using the human breast cancer dataset, and compared PGDiff with the packages Cuffdiff and BitSeq and with the t-test method. On these datasets, the proposed method performed favorably in sensitivity and specificity at both the gene and isoform levels.
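To make the idea of a Negative Binomial exact test concrete, the following is a minimal Python sketch of the general technique, in the count-based form popularized by packages such as DESeq. It is not PGDiff's implementation, whose details the abstract does not give. The function names, the use of integer counts, the single shared dispersion value and the equal library sizes are all assumptions made for illustration; PGDiff itself applies the test to expression estimates obtained from PGseq.

```python
import math


def nb_logpmf(k, mean, size):
    """Log-probability of count k under a Negative Binomial with the
    given mean and size (size = 1 / dispersion)."""
    return (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
            + size * math.log(size / (size + mean))
            + k * math.log(mean / (size + mean)))


def nb_exact_test(counts_a, counts_b, dispersion):
    """Two-sided exact test for one gene or isoform (illustrative only).

    Assumptions: integer counts, equal library sizes, and one known
    dispersion shared by both conditions. Under the null hypothesis every
    replicate is NB(mean=mu, size=1/dispersion), so the sum over n
    replicates is NB(mean=n*mu, size=n/dispersion).
    """
    n_a, n_b = len(counts_a), len(counts_b)
    k_a, k_b = sum(counts_a), sum(counts_b)
    total = k_a + k_b
    if total == 0:
        return 1.0  # no reads at all: no evidence of a difference

    mu = total / (n_a + n_b)      # pooled per-replicate mean under the null
    size = 1.0 / dispersion

    def log_joint(a):
        # Probability of splitting the total as (a, total - a).
        return (nb_logpmf(a, n_a * mu, n_a * size)
                + nb_logpmf(total - a, n_b * mu, n_b * size))

    log_probs = [log_joint(a) for a in range(total + 1)]
    peak = max(log_probs)          # subtract the maximum to avoid underflow
    probs = [math.exp(lp - peak) for lp in log_probs]
    observed = probs[k_a]

    # p-value: conditional probability of all splits that are no more
    # likely than the observed one, given the total count.
    extreme = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, extreme / sum(probs))


# Hypothetical counts for one gene in two conditions, two replicates each.
p_similar = nb_exact_test([10, 12], [11, 11], dispersion=0.1)
p_different = nb_exact_test([5, 6], [100, 120], dispersion=0.05)
```

In practice the dispersion would be estimated from the data rather than supplied by hand, and the resulting p-values would be corrected for multiple testing across all genes or isoforms.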
Wang Lili, Liu Xuejun, Zhang Li.
Detection of differential gene and isoform expression for RNA-seq data[J]. Journal of Nanjing University (Natural Sciences), 2016, 52(2): 253