Detection of differential gene and isoform expression for RNA-seq data

Wang Lili1, Liu Xuejun2*, Zhang Li3

Journal of Nanjing University (Natural Sciences), 2016, Vol. 52, Issue 2: 253.


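Abstract (translated from the Chinese)

Detecting differences in gene and isoform expression levels is an important route to understanding gene and isoform function, and differential expression detection has become a major research direction in transcriptome studies. In recent years, RNA-seq has been widely used to detect differentially expressed genes. To account for the non-uniform distribution of reads, read counts are usually modeled with the negative binomial distribution. Most existing negative binomial models are fitted directly to gene-level read counts and therefore cannot be used to detect differentially expressed isoforms. We propose a negative binomial model built on the gene and isoform expression levels computed by the PGseq model and use the exact test for differential analysis, which solves the problem of detecting differentially expressed isoforms. Experiments show that the method achieves high accuracy and sensitivity in differential detection for both genes and isoforms.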

Abstract

High-throughput RNA sequencing (RNA-seq) has recently been widely applied in transcriptome analysis. One important research direction in transcriptome studies is the detection of differential expression (DE) of genes and isoforms. RNA-seq experiments produce read counts that are affected by both biological and technical variation. To distinguish systematic changes in expression between conditions from noise, the counts are frequently modeled with the negative binomial distribution. Most existing methods that use negative binomial models rely on statistics that compare read counts between conditions. Unfortunately, because of read-mapping ambiguity, it is difficult to obtain exact read counts for each isoform, so these methods cannot be used to detect DE isoforms. In this paper, we propose a method, PGDiff, that detects differential expression for both genes and isoforms and is based on negative binomial models of the gene and isoform expression derived from the PGseq package. Instead of modeling the distribution of the total count for each gene, PGseq models the variability of the count for each individual exon and obtains the expression of each gene and each isoform. Unlike count-based methods, PGDiff detects differential expression in two steps. The first step obtains the expression of genes and isoforms. The second step uses an exact test to detect differential expression from the obtained expression values and the negative binomial models. For DE gene detection, we evaluated the proposed approach on the MAQC and Griffith datasets and compared its performance with that of the currently popular packages MMDiff, Cuffdiff, BitSeq, DESeq and baySeq. For DE isoform detection, we designed two types of comparison using a human breast cancer dataset and compared PGDiff with Cuffdiff, BitSeq and the t-test. On these datasets, the proposed method performed favorably in sensitivity and specificity at both the gene and isoform levels.
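The exact-test step described in the abstract can be illustrated with a minimal Python sketch of a generic conditional negative binomial exact test. This is not the PGDiff implementation. It assumes integer counts (PGDiff works on expression estimates from PGseq), equal library sizes, and a single known dispersion shared by both conditions. The function names `nb_logpmf` and `nb_exact_test` are hypothetical and chosen only for illustration.

```python
import math


def nb_logpmf(k, size, mean):
    """Log-probability of count k under a negative binomial distribution
    parameterized by size (1 / dispersion) and mean."""
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p))


def nb_exact_test(counts_a, counts_b, dispersion):
    """Two-sided conditional exact test for equal means in two conditions.

    Illustrative assumptions: equal library sizes, integer counts, and a
    known positive dispersion shared by both conditions.
    """
    n_a, n_b = len(counts_a), len(counts_b)
    s_a, s_b = sum(counts_a), sum(counts_b)
    total = s_a + s_b
    if total == 0:
        return 1.0  # no reads at all, so no evidence of a difference
    mu = total / (n_a + n_b)  # pooled per-sample mean under the null

    def log_joint(a):
        # The sum of n i.i.d. NB(mean=mu, size=1/phi) counts is
        # NB(mean=n*mu, size=n/phi), so each condition total is again NB.
        return (nb_logpmf(a, n_a / dispersion, n_a * mu)
                + nb_logpmf(total - a, n_b / dispersion, n_b * mu))

    # Conditional on the overall total, enumerate every possible split.
    logs = [log_joint(a) for a in range(total + 1)]
    peak = max(logs)
    probs = [math.exp(v - peak) for v in logs]
    observed = probs[s_a]
    # p-value: mass of all splits that are at most as likely as the observed one.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail / sum(probs))
```

Under these assumptions, a call such as `nb_exact_test([5, 6, 4], [50, 55, 60], dispersion=0.1)` yields a very small p-value, whereas identical count vectors in both conditions yield a p-value of 1. In practice the dispersion must itself be estimated from the data, which this sketch leaves out.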

Cite this article

Wang Lili, Liu Xuejun, Zhang Li. Detection of differential gene and isoform expression for RNA-seq data[J]. Journal of Nanjing University (Natural Sciences), 2016, 52(2): 253


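Funding

Grants: National Natural Science Foundation of China (61170152); Fundamental Research Funds for the Central Universities (CXZZ11_0217)
Received: 2015-06-09
*Corresponding author. E-mail: xuejun.liu@nuaa.edu.cn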
