[med-svn] [r-bioc-ebseq] 02/06: Imported Upstream version 1.10.0
Andreas Tille
tille at debian.org
Sun Nov 8 20:32:06 UTC 2015
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository r-bioc-ebseq.
commit e29ba01e3a3fc221dfa25bd60dce059b4fff66b0
Author: Andreas Tille <tille at debian.org>
Date: Sun Nov 8 21:11:37 2015 +0100
Imported Upstream version 1.10.0
---
DESCRIPTION | 12 +--
NAMESPACE | 4 +-
NEWS | 52 ++++++++++++-
R/EBMultiTest.R | 16 ++--
R/EBTest.R | 11 ++-
R/GetDEResults.R | 76 +++++++++++++++++++
R/MedianNorm.R | 18 ++++-
R/PlotPostVsRawFC.R | 2 +-
README.md | 126 ++++++++++++++++++++++++++++++++
build/vignette.rds | Bin 199 -> 203 bytes
demo/EBSeq.R | 28 ++-----
inst/doc/EBSeq_Vignette.R | 142 +++++++++++++++++-------------------
inst/doc/EBSeq_Vignette.Rnw | 169 +++++++++++++++++++++++++++++++++----------
inst/doc/EBSeq_Vignette.pdf | Bin 958612 -> 946690 bytes
man/EBMultiTest.Rd | 6 +-
man/EBTest.Rd | 7 +-
man/GetDEResults.Rd | 104 ++++++++++++++++++++++++++
man/MedianNorm.Rd | 3 +-
vignettes/EBSeq_Vignette.Rnw | 169 +++++++++++++++++++++++++++++++++----------
19 files changed, 740 insertions(+), 205 deletions(-)
diff --git a/DESCRIPTION b/DESCRIPTION
index a05a01d..a80992f 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -2,11 +2,11 @@ Package: EBSeq
Type: Package
Title: An R package for gene and isoform differential expression
analysis of RNA-seq data
-Version: 1.6.0
-Date: 2014-9-17
+Version: 1.10.0
+Date: 2015-7-28
Author: Ning Leng, Christina Kendziorski
-Maintainer: Ning Leng <nleng at wisc.edu>
-Depends: blockmodeling, gplots, R (>= 3.0.0)
+Maintainer: Ning Leng <lengning1 at gmail.com>
+Depends: blockmodeling, gplots, testthat, R (>= 3.0.0)
Description: Differential Expression analysis at both gene and isoform
level using RNA-seq data
License: Artistic-2.0
@@ -17,7 +17,9 @@ Collate: 'MedianNorm.R' 'GetNg.R' 'beta.mom.R' 'f0.R' 'f1.R'
'GetPPMat.R' 'GetMultiPP.R' 'GetMultiFC.R' 'PlotPostVsRawFC.R'
'crit_fun.R' 'DenNHist.R' 'GetNormalizedMat.R' 'PlotPattern.R'
'PolyFitPlot.R' 'QQP.R' 'QuantileNorm.R' 'RankNorm.R'
+ 'GetDEResults.R'
BuildVignettes: yes
biocViews: StatisticalMethod, DifferentialExpression,
MultipleComparison, RNASeq, Sequencing
-Packaged: 2014-10-14 03:22:48 UTC; biocbuild
+NeedsCompilation: no
+Packaged: 2015-10-14 02:58:51 UTC; biocbuild
diff --git a/NAMESPACE b/NAMESPACE
index 9b63076..97446ce 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -1,5 +1,5 @@
export(crit_fun, DenNHist, EBTest, GetNg, GetPP, MedianNorm,
PolyFitPlot, PostFC, QQP, QuantileNorm, RankNorm, EBMultiTest,
GetMultiPP, GetPatterns, PlotPattern, GetPPMat, GetMultiFC, PlotPostVsRawFC,
-GetNormalizedMat,f0,f1,LogN,LogNMulti)
-
+GetNormalizedMat,f0,f1,LogN,LogNMulti, GetDEResults)
+import(blockmodeling, gplots, testthat)
diff --git a/NEWS b/NEWS
index 0424f32..20cf398 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,49 @@
+
+CHANGES IN VERSION 1.9.3
+------------------------
+
+
+ o Correct typos in GetDEResults help file.
+ o Include an additional method for normalization.
+
+CHANGES IN VERSION 1.9.2
+------------------------
+
+
+ o Fixed a bug which may cause error when input a matrix to the sizeFactors parameter
+
+CHANGES IN VERSION 1.9.1
+------------------------
+
+
+ o Added Q&A seqction in vignette to address common questions
+
+CHANGES IN VERSION 1.7.1
+------------------------
+
+
+ o In EBSeq 1.7.1, EBSeq incorporates a new function
+ GetDEResults() which may be used to obtain a list of transcripts under a target FDR
+ in a two-condition experiment.
+ The results obtained by applying this function with its default setting will be
+ more robust to transcripts with low variance and potential outliers.
+ By using the default settings in this function,
+ the number of genes identified in any given analysis may
+ differ slightly from the previous version (1.7.0 or order).
+ To obtain results that are comparable
+ to results from earlier versions of EBSeq (1.7.0 or older), a user may set
+ Method="classic" in GetDEResults() function, or use the original GetPPMat() function.
+ The GeneDEResults() function also allows a user to modify thresholds to
+ target genes/isoforms with a pre-specified posterior fold change.
+
+ o Also, in EBSeq 1.7.1, the default settings in EBTest() and EBMultiTest() function
+ will only remove transcripts with all 0's (instead of removing transcripts with
+ 75th quantile less than 10 in version 1.3.3-1.7.0).
+ To obtain a list of transcripts comparable to the results generated by
+ EBSeq version 1.3.3-1.7.0, a user may change Qtrm = 0.75 and QtrmCut = 10
+ when applying EBTest() or EBMultiTest() function.
+
+
CHANGES IN VERSION 1.5.4
------------------------
@@ -34,7 +80,7 @@ NEW FEATURES
low expressed genes (genes whose 75th quantile of normalized counts is less
than 10) before identifying DE genes.
These two thresholds can be changed in EBTest function.
- We found that low expressed genes are more easily to be affected by noises.
- Removing these genes prior to downstream analyses can improve the
- model fitting and reduce impacts of noisy genes (e.g. genes with outliers).
+ Because low expressed genes are disproportionately noisy,
+ removing these genes prior to downstream analyses can improve model fitting and increase robustness
+ (e.g. by removing outliers).
diff --git a/R/EBMultiTest.R b/R/EBMultiTest.R
index ce2dd7e..54d6352 100644
--- a/R/EBMultiTest.R
+++ b/R/EBMultiTest.R
@@ -1,25 +1,23 @@
EBMultiTest <-
-function(Data,NgVector=NULL,Conditions,AllParti=NULL, sizeFactors, maxround, Pool=F, NumBin=1000, ApproxVal=10^-10,PoolLower=.25, PoolUpper=.75,Print=T,Qtrm=.75,QtrmCut=10)
+function(Data,NgVector=NULL,Conditions,AllParti=NULL, sizeFactors, maxround, Pool=F, NumBin=1000, ApproxVal=10^-10,PoolLower=.25, PoolUpper=.75,Print=T,Qtrm=1,QtrmCut=0)
{
+ expect_is(sizeFactors, c("numeric","integer"))
+ expect_is(maxround, c("numeric","integer"))
if(!is.factor(Conditions))Conditions=as.factor(Conditions)
if(is.null(rownames(Data)))stop("Please add gene/isoform names to the data matrix")
- if(!is.matrix(Data))stop("The input Data is not a matrix")
+ if(!is.matrix(Data))stop("The input Data is not a matrix")
if(length(Conditions)!=ncol(Data))stop("The number of conditions is not the same as the number of samples! ")
if(nlevels(Conditions)==2)stop("Only 2 conditions - Please use EBTest() function")
if(nlevels(Conditions)<2)stop("Less than 2 conditions - Please check your input")
if(length(sizeFactors)!=length(Data) & length(sizeFactors)!=ncol(Data))
stop("The number of library size factors is not the same as the number of samples!")
-
tau=CI=CIthre=NULL
Dataraw=Data
-
-
+
#Normalized
DataNorm=GetNormalizedMat(Data, sizeFactors)
-
-
QuantileFor0=apply(DataNorm,1,function(i)quantile(i,Qtrm))
AllZeroNames=which(QuantileFor0<=QtrmCut)
NotAllZeroNames=which(QuantileFor0>QtrmCut)
@@ -32,7 +30,7 @@ function(Data,NgVector=NULL,Conditions,AllParti=NULL, sizeFactors, maxround, Po
if(!is.null(NgVector))NgVector=NgVector[NotAllZeroNames]
if(is.null(NgVector))NgVector=rep(1,nrow(Data))
-
+ if(length(sizeFactors)!=ncol(Data))sizeFactors=sizeFactors[NotAllZeroNames,]
#ReNameThem
IsoNamesIn=rownames(Data)
@@ -360,7 +358,7 @@ if (length(AllNA)>0){
if(length(sizeFactors)==ncol(Data))
R.NotIn=matrix(outer(R.NotIn.raw,sizeFactors),nrow=length(AllNA.Ngorder))
if(length(sizeFactors)==length(Data))
- R.NotIn=matrix(R.NotIn.raw*sizeFactors[NotIn,],nrow=length(AllNA.Ngorder))
+ R.NotIn=matrix(R.NotIn.raw*sizeFactors[names(R.NotIn.raw),],nrow=length(AllNA.Ngorder))
DataListNotIn.unlistWithZ=matrix(DataList.unlist[AllNA.Ngorder,],
nrow=length(AllNA.Ngorder))
diff --git a/R/EBTest.R b/R/EBTest.R
index 4453909..ddd9e74 100644
--- a/R/EBTest.R
+++ b/R/EBTest.R
@@ -1,6 +1,8 @@
EBTest <-
-function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=1000,ApproxVal=10^-10, Alpha=NULL, Beta=NULL,PInput=NULL,RInput=NULL,PoolLower=.25, PoolUpper=.75,Print=T, Qtrm=.75,QtrmCut=10)
+function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=1000,ApproxVal=10^-10, Alpha=NULL, Beta=NULL,PInput=NULL,RInput=NULL,PoolLower=.25, PoolUpper=.75,Print=T, Qtrm=1,QtrmCut=0)
{
+ expect_is(sizeFactors, c("numeric","integer"))
+ expect_is(maxround, c("numeric","integer"))
if(!is.factor(Conditions))Conditions=as.factor(Conditions)
if(is.null(rownames(Data)))stop("Please add gene/isoform names to the data matrix")
@@ -17,6 +19,7 @@ function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=10
#Normalized
DataNorm=GetNormalizedMat(Data, sizeFactors)
+ expect_is(DataNorm, "matrix")
Levels=levels(as.factor(Conditions))
# Dixon Statistics
@@ -41,7 +44,7 @@ function(Data,NgVector=NULL,Conditions, sizeFactors, maxround, Pool=F, NumBin=10
if(length(NotAllZeroNames)==0)stop("0 transcript passed")
Data=Data[NotAllZeroNames,]
if(!is.null(NgVector))NgVector=NgVector[NotAllZeroNames]
- if(length(sizeFactors)==length(Data))sizeFactors=sizeFactors[NotAllZeroNames,]
+ if(length(sizeFactors)!=ncol(Data))sizeFactors=sizeFactors[NotAllZeroNames,]
if(is.null(NgVector))NgVector=rep(1,nrow(Data))
#Rename Them
@@ -528,7 +531,7 @@ if (length(AllNA)>0){
if(length(sizeFactors)==ncol(Data))
R.NotIn=outer(R.NotIn.raw,sizeFactors)
if(length(sizeFactors)==length(Data))
- R.NotIn=R.NotIn.raw*sizeFactors[NotIn,]
+ R.NotIn=R.NotIn.raw*sizeFactors[names(R.NotIn.raw),]
R.NotIn1=matrix(R.NotIn[,Conditions==levels(Conditions)[1]],nrow=nrow(R.NotIn))
R.NotIn2=matrix(R.NotIn[,Conditions==levels(Conditions)[2]],nrow=nrow(R.NotIn))
@@ -587,6 +590,6 @@ Result=list(Alpha=UpdateAlpha,Beta=UpdateBeta,P=UpdateP,
C2EstVar=RealName.C2VarList, PoolVar=RealName.PoolVarList ,
DataList=RealName.DataList,PPDE=AllZ,f0=AllF0, f1=AllF1,
AllZeroIndex=AllZeroNames,PPMat=PPMatNZ, PPMatWith0=PPMat,
- ConditionOrder=CondOut, Conditions=Conditions)
+ ConditionOrder=CondOut, Conditions=Conditions, DataNorm=DataNorm)
}
diff --git a/R/GetDEResults.R b/R/GetDEResults.R
new file mode 100644
index 0000000..f750ee7
--- /dev/null
+++ b/R/GetDEResults.R
@@ -0,0 +1,76 @@
+GetDEResults<-function(EBPrelim, FDR=0.05, Method="robust",
+ FDRMethod="hard", Threshold_FC=0.7,
+ Threshold_FCRatio=0.3, SmallNum=0.01)
+{
+ if(!"PPDE"%in%names(EBPrelim))stop("The input doesn't seem like an output from EBTest")
+
+ #################
+ Conditions = EBPrelim$Conditions
+ Levels = levels(as.factor(Conditions))
+ PPcut=FDR
+ # normalized data
+ GeneMat=EBPrelim$DataNorm
+
+
+ ###Get DEfound by FDRMethod type
+ PP=GetPPMat(EBPrelim)
+ if(FDRMethod=="hard")
+ {DEfound=rownames(PP)[which(PP[,"PPDE"]>=(1-PPcut))]}
+ else{SoftThre=crit_fun(PP[,"PPEE"],PPcut)
+ DEfound=rownames(PP)[which(PP[,"PPDE"]>=SoftThre)]}
+
+ # classic
+ if(Method=="classic"){
+ Gene_status=rep("EE",dim(GeneMat)[1])
+ names(Gene_status)=rownames(GeneMat)
+ Gene_status[DEfound]="DE"
+ NoTest_genes=rownames(GeneMat)[!(rownames(GeneMat)%in%rownames(PP))]
+ Gene_status[NoTest_genes]="Filtered: Low Expression"
+
+ PPMatWith0=EBPrelim$PPMatWith0
+ PPMatWith0[NoTest_genes,]=c(NA,NA)
+
+ return(list(DEfound=DEfound,PPMat=PPMatWith0,Status=Gene_status))
+ }
+ else{
+ ###Post_Foldchange
+ PostFoldChange=PostFC(EBPrelim)
+ PPFC=PostFoldChange$PostFC
+
+ OldPPFC=PPFC[DEfound]
+ OldPPFC[which(OldPPFC>1)]=1/OldPPFC[which(OldPPFC>1)]
+
+ FilterFC=names(OldPPFC)[which(OldPPFC>Threshold_FC)]
+
+ ###New Fold Change
+ NewFC1=apply(matrix(GeneMat[DEfound,which(Conditions==Levels[[1]])]+SmallNum,
+ nrow=length(DEfound)),1,median)
+ NewFC2=apply(matrix(GeneMat[DEfound,which(Conditions==Levels[[2]])]+SmallNum,
+ nrow=length(DEfound)),1,median)
+ NewFC=NewFC1/NewFC2
+ NewFC[which(NewFC>1)]=1/NewFC[which(NewFC>1)]
+
+ ###FC Ratio
+ FCRatio=NewFC/OldPPFC
+ FCRatio[which(OldPPFC<NewFC)]=1/FCRatio[which(OldPPFC<NewFC)]
+
+ FilterFCR=names(FCRatio)[which(FCRatio<Threshold_FCRatio)]
+
+ ###Results format Setting
+ Gene_status=rep("EE",dim(GeneMat)[1])
+ names(Gene_status)=rownames(GeneMat)
+ Gene_status[DEfound]="DE"
+ NoTest_genes=rownames(GeneMat)[!(rownames(GeneMat)%in%rownames(PP))]
+ Gene_status[NoTest_genes]="Filtered: Low Expression"
+ ###Filtering
+ Filtered_DEfound=setdiff(DEfound, union(FilterFC, FilterFCR))
+
+ PPMatWith0=EBPrelim$PPMatWith0
+ NAGenes=union(NoTest_genes,union(FilterFC, FilterFCR))
+ PPMatWith0[NAGenes,]=c(NA,NA)
+
+ Gene_status[FilterFC]="Filtered: Fold Change"
+ Gene_status[FilterFCR]="Filtered: Fold Change Ratio"
+
+ return(list(DEfound=Filtered_DEfound,PPMat=PPMatWith0,Status=Gene_status))
+}}
diff --git a/R/MedianNorm.R b/R/MedianNorm.R
index afd7b37..caa9a36 100644
--- a/R/MedianNorm.R
+++ b/R/MedianNorm.R
@@ -1,5 +1,17 @@
-MedianNorm=function(Data){
+MedianNorm <- function(Data, alternative=FALSE){
if(ncol(Data)==1)stop("Only 1 sample!")
- geomeans <- exp(rowMeans(log(Data)))
- apply(Data, 2, function(cnts) median((cnts/geomeans)[geomeans > 0]))
+ if(!alternative){
+ geomeans <- exp(rowMeans(log(Data)))
+ out <- apply(Data, 2, function(cnts) median((cnts/geomeans)[geomeans > 0]))}
+ if(alternative){
+ DataMatO <- Data
+ N <- ncol(DataMatO)
+ DataList0 <- sapply(1:N,function(i)DataMatO[,i]/DataMatO,simplify=F)
+ DataEachMed0 <- sapply(1:N,function(i)apply(DataList0[[i]],2,function(j)median(j[which(j>0
+ & j<Inf)])))
+ DataColgeo <- sapply(1:N,function(i)exp(mean(log(DataEachMed0[-i,i]))))
+ out <- DataColgeo
+ }
+
+ out
}
diff --git a/R/PlotPostVsRawFC.R b/R/PlotPostVsRawFC.R
index 64ba2b3..519bf6d 100644
--- a/R/PlotPostVsRawFC.R
+++ b/R/PlotPostVsRawFC.R
@@ -1,5 +1,5 @@
PlotPostVsRawFC<-function(EBOut,FCOut){
-library(gplots)
+#library(gplots)
par(fig=c(0,.8,0,1), new=F)
RainbowColor=rev(redgreen(length(FCOut$PostFC)))
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f876425
--- /dev/null
+++ b/README.md
@@ -0,0 +1,126 @@
+# EBSeq Q & A
+
+
+## ReadIn data
+
+csv file:
+
+```
+In=read.csv("FileName", stringsAsFactors=F, row.names=1, header=T)
+
+Data=data.matrix(In)
+```
+
+txt file:
+```
+In=read.table("FileName", stringsAsFactors=F, row.names=1, header=T)
+
+Data=data.matrix(In)
+```
+check str(Data) and make sure it is a matrix instead of data frame. You may need to play around with the row.names and header option depends on how the input file was generated.
+
+
+
+## GetDEResults() function not found
+
+You may on an earlier version of EBSeq. The GetDEResults function
+was introduced since version 1.7.1.
+The latest release version could be found at:
+http://www.bioconductor.org/packages/devel/bioc/html/EBSeq.html
+And you may check your package version by typing packageVersion("EBSeq")
+
+
+## Visualizing DE genes/isoforms
+
+To generate a heatmap, you may consider the heatmap.2 function in gplots package.
+For example, you may run
+```
+heatmap.2(NormalizedMatrix[GenesOfInterest,], scale="row", trace="none", Colv=F)
+```
+The normalized matrix may be obtained from GetNormalizedMat() function.
+
+
+## My favorite gene/isoform has NA in PP (status "NoTest")
+
+The NoTest status comes from two sources
+
+1) Using the default parameter settings of EBMultiTest(), the function
+will not test on genes with more than 75% values < 10 to ensure better
+model fitting. To disable this filter, you may set Qtrm=1 and
+QtrmCut=0.
+
+2) numerical over/underflow in R. That happens when the within
+condition variance is extremely large or small. I did implemented a numerical
+approximation step to calculate the approximated PP for these genes
+with over/underflow. Here I use 10^-10 to approximate the parameter p
+in the NB distribution for these genes (I set it to a small value
+since I want to cover more over/underflow genes with low
+within-condition variation). You may try to tune this value (to a larger value) in the
+approximation by setting ApproxVal in EBTest() or EBMultiTest() function.
+
+## Can I run more than 5 iterations when running EBSeq via RSEM wrapper?
+
+Yes you may modify the script rsem-for-ebseq-find-DE under RSEM/EBSeq
+change line 36
+```
+EBOut <- EBTest(Data = DataMat, NgVector = ngvector, Conditions =
+conditions, sizeFactors = Sizes, maxround = 5)
+```
+to
+```
+EBOut <- EBTest(Data = DataMat, NgVector = ngvector, Conditions =
+conditions, sizeFactors = Sizes, maxround = 10)
+```
+If you are running multiple condition analysis, you will need to change line 53:
+```
+MultiOut <- EBMultiTest(Data = DataMat, NgVector = ngvector,
+Conditions = conditions, AllParti = patterns, sizeFactors = Sizes,
+maxround = 5)
+```
+```
+MultiOut <- EBMultiTest(Data = DataMat, NgVector = ngvector,
+Conditions = conditions, AllParti = patterns, sizeFactors = Sizes,
+maxround = 10)
+```
+You will need to redo make after you make the changes.
+
+## I saw a gene has significant FC but is not called as DE by EBSeq, why does that happen?
+
+EBSeq calls a gene as DE (assign high PPDE) if the across-condition variability is significantly larger than the within-condition
+variability. In the cases that a gene has large within-condition variation, although the FC across two conditions is large (small),
+the across-condition difference could still be explained by biological variation within condition. In these cases the gene/isoform
+will have a moderate PPDE.
+
+## Can I look at TPMs/RPKMs/FPKMs across samples?
+
+In general, it is not appropriate to perform cross sample comparisons using TPM, FPKM or RPKM without further normalization.
+Instead, you may use normalized counts (It can be generated by GetNormalizedMat() function from raw count,
+note EBSeq testing functions takes raw counts and library size factors)
+
+Here is an example:
+
+Suppose there are 2 samples S1 and S2 from different conditions. Each has 5 genes. For simplicity, we assume
+each of 5 genes contains only one isoform and all genes have the same length.
+
+Assume only gene 5 is DE and the gene expressions of these 5 genes are:
+
+
+
+|Sample|g1|g2|g3|g4|g5|
+|---|---|---|---|---|---|
+|S1|10|10|10|10|10|
+|S2| 20 | 20 | 20 | 20 | 100 |
+
+Then the TPM/FPKM/RPKM will be (note sum TPM/FPKM/RPKM of all genes should be 10^6 ):
+
+|Sample|g1|g2|g3|g4|g5|
+|---|---|---|---|---|---|
+| S1 | 2x10^5 | 2x10^5 | 2x10^5 | 2x10^5 | 2x10^5 |
+| S2 | 1.1x10^5| 1.1x10^5| 1.1x10^5| 1.1x10^5| 5.6x10^5|
+
+Based on TPM/FPKM/RPKM, an investigator may conclude that the first 4 genes are down-regulated and the 5th gene is up-regulated.
+Then we will get 4 false positive calls.
+
+Cross-sample TPM/FPKM/RPKM comparisons will be feasible only when no hypothetical DE genes present across samples
+(Or when assuming the DE genes are sort of 'symmetric' regarding up and down regulation).
+
diff --git a/build/vignette.rds b/build/vignette.rds
index 6c7e94f..489e54e 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/demo/EBSeq.R b/demo/EBSeq.R
index 3d5024d..8a264b4 100644
--- a/demo/EBSeq.R
+++ b/demo/EBSeq.R
@@ -5,11 +5,8 @@ str(GeneMat)
Sizes=MedianNorm(GeneMat)
EBOut=EBTest(Data=GeneMat,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
-PP=GetPPMat(EBOut)
-str(PP)
-head(PP)
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
+DEOut=GetDEResults(EBOut)
+str(DEOut)
#3.2
data(IsoList)
str(IsoList)
@@ -23,10 +20,7 @@ IsoNgTrun=NgList$IsoformNgTrun
IsoNgTrun[c(1:3,201:203,601:603)]
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-str(IsoPP)
-head(IsoPP)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
+IsoDE=GetDEResults(IsoEBOut)
str(IsoDE)
#3.3
data(MultiGeneMat)
@@ -73,11 +67,7 @@ str(GeneMat)
Sizes=MedianNorm(GeneMat)
EBOut=EBTest(Data=GeneMat,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
-PP=GetPPMat(EBOut)
-str(PP)
-head(PP)
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
+DEOut=GetDEResults(EBOut)
EBOut$Alpha
EBOut$Beta
EBOut$P
@@ -102,9 +92,7 @@ IsoNgTrun=NgList$IsoformNgTrun
IsoNgTrun[c(1:3,201:203,601:603)]
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-str(IsoPP)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
+IsoDE=GetDEResults(IsoEBOut)
str(IsoDE)
IsoEBOut$Alpha
IsoEBOut$Beta
@@ -194,8 +182,7 @@ GeneMat.norep=GeneMat[,c(1,6)]
Sizes.norep=MedianNorm(GeneMat.norep)
EBOut.norep=EBTest(Data=GeneMat.norep,
Conditions=as.factor(rep(c("C1","C2"))),sizeFactors=Sizes.norep, maxround=5)
-PP.norep=GetPPMat(EBOut.norep)
-DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)]
+DE.norep=GetDEResults(EBOut.norep)
GeneFC.norep=PostFC(EBOut.norep)
@@ -210,8 +197,7 @@ IsoMat.norep=IsoMat[,c(1,6)]
IsoSizes.norep=MedianNorm(IsoMat.norep)
IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun,
Conditions=as.factor(c("C1","C2")),sizeFactors=IsoSizes.norep, maxround=5)
-IsoPP.norep=GetPPMat(IsoEBOut.norep)
-IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)]
+IsoDE.norep=GetDEResults(IsoEBOut.norep)
IsoFC.norep=PostFC(IsoEBOut.norep)
diff --git a/inst/doc/EBSeq_Vignette.R b/inst/doc/EBSeq_Vignette.R
index 1595b03..372d896 100644
--- a/inst/doc/EBSeq_Vignette.R
+++ b/inst/doc/EBSeq_Vignette.R
@@ -27,22 +27,16 @@ Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
###################################################
-### code chunk number 5: EBSeq_Vignette.Rnw:241-244
+### code chunk number 5: EBSeq_Vignette.Rnw:240-244
###################################################
-PP=GetPPMat(EBOut)
-str(PP)
-head(PP)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
+str(EBDERes$DEfound)
+head(EBDERes$PPMat)
+str(EBDERes$Status)
###################################################
-### code chunk number 6: EBSeq_Vignette.Rnw:249-251
-###################################################
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
-
-
-###################################################
-### code chunk number 7: EBSeq_Vignette.Rnw:285-291
+### code chunk number 6: EBSeq_Vignette.Rnw:289-295
###################################################
data(IsoList)
str(IsoList)
@@ -53,13 +47,13 @@ IsosGeneNames=IsoList$IsosGeneNames
###################################################
-### code chunk number 8: EBSeq_Vignette.Rnw:298-299
+### code chunk number 7: EBSeq_Vignette.Rnw:302-303
###################################################
IsoSizes=MedianNorm(IsoMat)
###################################################
-### code chunk number 9: EBSeq_Vignette.Rnw:320-323
+### code chunk number 8: EBSeq_Vignette.Rnw:324-327
###################################################
NgList=GetNg(IsoNames, IsosGeneNames)
IsoNgTrun=NgList$IsoformNgTrun
@@ -67,26 +61,25 @@ IsoNgTrun[c(1:3,201:203,601:603)]
###################################################
-### code chunk number 10: EBSeq_Vignette.Rnw:335-342
+### code chunk number 9: EBSeq_Vignette.Rnw:339-345
###################################################
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-str(IsoPP)
-head(IsoPP)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
-str(IsoDE)
+IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05)
+str(IsoEBDERes$DEfound)
+head(IsoEBDERes$PPMat)
+str(IsoEBDERes$Status)
###################################################
-### code chunk number 11: EBSeq_Vignette.Rnw:353-355
+### code chunk number 10: EBSeq_Vignette.Rnw:368-370
###################################################
data(MultiGeneMat)
str(MultiGeneMat)
###################################################
-### code chunk number 12: EBSeq_Vignette.Rnw:363-366
+### code chunk number 11: EBSeq_Vignette.Rnw:378-381
###################################################
Conditions=c("C1","C1","C2","C2","C3","C3")
PosParti=GetPatterns(Conditions)
@@ -94,14 +87,14 @@ PosParti
###################################################
-### code chunk number 13: EBSeq_Vignette.Rnw:374-376
+### code chunk number 12: EBSeq_Vignette.Rnw:389-391
###################################################
Parti=PosParti[-3,]
Parti
###################################################
-### code chunk number 14: EBSeq_Vignette.Rnw:381-384
+### code chunk number 13: EBSeq_Vignette.Rnw:396-399
###################################################
MultiSize=MedianNorm(MultiGeneMat)
MultiOut=EBMultiTest(MultiGeneMat,NgVector=NULL,Conditions=Conditions,
@@ -109,7 +102,7 @@ AllParti=Parti, sizeFactors=MultiSize, maxround=5)
###################################################
-### code chunk number 15: EBSeq_Vignette.Rnw:388-393
+### code chunk number 14: EBSeq_Vignette.Rnw:403-408
###################################################
MultiPP=GetMultiPP(MultiOut)
names(MultiPP)
@@ -119,7 +112,7 @@ MultiPP$Patterns
###################################################
-### code chunk number 16: EBSeq_Vignette.Rnw:412-420
+### code chunk number 15: EBSeq_Vignette.Rnw:427-435
###################################################
data(IsoMultiList)
IsoMultiMat=IsoMultiList[[1]]
@@ -132,21 +125,21 @@ Conditions=c("C1","C1","C2","C2","C3","C3","C4","C4")
###################################################
-### code chunk number 17: EBSeq_Vignette.Rnw:426-428
+### code chunk number 16: EBSeq_Vignette.Rnw:441-443
###################################################
PosParti.4Cond=GetPatterns(Conditions)
PosParti.4Cond
###################################################
-### code chunk number 18: EBSeq_Vignette.Rnw:433-435
+### code chunk number 17: EBSeq_Vignette.Rnw:448-450
###################################################
Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),]
Parti.4Cond
###################################################
-### code chunk number 19: EBSeq_Vignette.Rnw:440-444
+### code chunk number 18: EBSeq_Vignette.Rnw:455-459
###################################################
IsoMultiOut=EBMultiTest(IsoMultiMat,
NgVector=IsoNgTrun.Multi,Conditions=Conditions,
@@ -155,7 +148,7 @@ maxround=5)
###################################################
-### code chunk number 20: EBSeq_Vignette.Rnw:448-453
+### code chunk number 19: EBSeq_Vignette.Rnw:463-468
###################################################
IsoMultiPP=GetMultiPP(IsoMultiOut)
names(MultiPP)
@@ -165,26 +158,26 @@ IsoMultiPP$Patterns
###################################################
-### code chunk number 21: EBSeq_Vignette.Rnw:470-475 (eval = FALSE)
+### code chunk number 20: EBSeq_Vignette.Rnw:485-490 (eval = FALSE)
###################################################
## data(GeneMat)
## Sizes=MedianNorm(GeneMat)
## EBOut=EBTest(Data=GeneMat,
## Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
-## PP=GetPPMat(EBOut)
+## EBDERes=GetDEResults(EBOut, FDR=0.05)
###################################################
-### code chunk number 22: EBSeq_Vignette.Rnw:477-481
+### code chunk number 21: EBSeq_Vignette.Rnw:492-496
###################################################
-str(PP)
-head(PP)
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
+str(EBDERes$DEfound)
+head(EBDERes$PPMat)
+str(EBDERes$Status)
###################################################
-### code chunk number 23: EBSeq_Vignette.Rnw:491-494
+### code chunk number 22: EBSeq_Vignette.Rnw:506-509
###################################################
GeneFC=PostFC(EBOut)
str(GeneFC)
@@ -192,7 +185,7 @@ PlotPostVsRawFC(EBOut,GeneFC)
###################################################
-### code chunk number 24: EBSeq_Vignette.Rnw:515-518
+### code chunk number 23: EBSeq_Vignette.Rnw:530-533
###################################################
EBOut$Alpha
EBOut$Beta
@@ -200,21 +193,21 @@ EBOut$P
###################################################
-### code chunk number 25: EBSeq_Vignette.Rnw:537-539
+### code chunk number 24: EBSeq_Vignette.Rnw:552-554
###################################################
par(mfrow=c(1,2))
QQP(EBOut)
###################################################
-### code chunk number 26: EBSeq_Vignette.Rnw:555-557
+### code chunk number 25: EBSeq_Vignette.Rnw:570-572
###################################################
par(mfrow=c(1,2))
DenNHist(EBOut)
###################################################
-### code chunk number 27: EBSeq_Vignette.Rnw:578-583 (eval = FALSE)
+### code chunk number 26: EBSeq_Vignette.Rnw:593-598 (eval = FALSE)
###################################################
## data(IsoList)
## IsoMat=IsoList$IsoMat
@@ -224,7 +217,7 @@ DenNHist(EBOut)
###################################################
-### code chunk number 28: EBSeq_Vignette.Rnw:585-588
+### code chunk number 27: EBSeq_Vignette.Rnw:600-603
###################################################
names(NgList)
IsoNgTrun=NgList$IsoformNgTrun
@@ -232,36 +225,35 @@ IsoNgTrun[c(1:3,201:203,601:603)]
###################################################
-### code chunk number 29: EBSeq_Vignette.Rnw:619-620 (eval = FALSE)
+### code chunk number 28: EBSeq_Vignette.Rnw:634-635 (eval = FALSE)
###################################################
## IsoNgTrun = scan(file="output_name.ngvec", what=0, sep="\n")
###################################################
-### code chunk number 30: EBSeq_Vignette.Rnw:633-638 (eval = FALSE)
+### code chunk number 29: EBSeq_Vignette.Rnw:648-652 (eval = FALSE)
###################################################
## IsoSizes=MedianNorm(IsoMat)
## IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
## Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-## IsoPP=GetPPMat(IsoEBOut)
-## IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
+## IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05)
###################################################
-### code chunk number 31: EBSeq_Vignette.Rnw:640-641
+### code chunk number 30: EBSeq_Vignette.Rnw:654-655
###################################################
-str(IsoDE)
+str(IsoEBDERes)
###################################################
-### code chunk number 32: EBSeq_Vignette.Rnw:646-648
+### code chunk number 31: EBSeq_Vignette.Rnw:660-662
###################################################
IsoFC=PostFC(IsoEBOut)
str(IsoFC)
###################################################
-### code chunk number 33: EBSeq_Vignette.Rnw:659-662
+### code chunk number 32: EBSeq_Vignette.Rnw:673-676
###################################################
IsoEBOut$Alpha
IsoEBOut$Beta
@@ -269,7 +261,7 @@ IsoEBOut$P
###################################################
-### code chunk number 34: EBSeq_Vignette.Rnw:681-686
+### code chunk number 33: EBSeq_Vignette.Rnw:695-700
###################################################
par(mfrow=c(2,2))
PolyFitValue=vector("list",3)
@@ -279,7 +271,7 @@ for(i in 1:3)
###################################################
-### code chunk number 35: EBSeq_Vignette.Rnw:699-708
+### code chunk number 34: EBSeq_Vignette.Rnw:713-722
###################################################
PolyAll=PolyFitPlot(unlist(IsoEBOut$C1Mean), unlist(IsoEBOut$C1EstVar),5)
lines(log10(IsoEBOut$C1Mean[[1]][PolyFitValue[[1]]$sort]),
@@ -293,21 +285,21 @@ col=c("red","yellow","pink","green"),lty=1,lwd=3,box.lwd=2)
###################################################
-### code chunk number 36: EBSeq_Vignette.Rnw:721-723
+### code chunk number 35: EBSeq_Vignette.Rnw:735-737
###################################################
par(mfrow=c(2,3))
QQP(IsoEBOut)
###################################################
-### code chunk number 37: EBSeq_Vignette.Rnw:735-737
+### code chunk number 36: EBSeq_Vignette.Rnw:749-751
###################################################
par(mfrow=c(2,3))
DenNHist(IsoEBOut)
###################################################
-### code chunk number 38: EBSeq_Vignette.Rnw:754-758
+### code chunk number 37: EBSeq_Vignette.Rnw:768-772
###################################################
Conditions=c("C1","C1","C2","C2","C3","C3")
PosParti=GetPatterns(Conditions)
@@ -316,14 +308,14 @@ PlotPattern(PosParti)
###################################################
-### code chunk number 39: EBSeq_Vignette.Rnw:765-767
+### code chunk number 38: EBSeq_Vignette.Rnw:779-781
###################################################
Parti=PosParti[-3,]
Parti
###################################################
-### code chunk number 40: EBSeq_Vignette.Rnw:773-779 (eval = FALSE)
+### code chunk number 39: EBSeq_Vignette.Rnw:787-793 (eval = FALSE)
###################################################
## data(MultiGeneMat)
## MultiSize=MedianNorm(MultiGeneMat)
@@ -334,7 +326,7 @@ Parti
###################################################
-### code chunk number 41: EBSeq_Vignette.Rnw:783-788
+### code chunk number 40: EBSeq_Vignette.Rnw:797-802
###################################################
MultiPP=GetMultiPP(MultiOut)
names(MultiPP)
@@ -344,28 +336,28 @@ MultiPP$Patterns
###################################################
-### code chunk number 42: EBSeq_Vignette.Rnw:795-797
+### code chunk number 41: EBSeq_Vignette.Rnw:809-811
###################################################
MultiFC=GetMultiFC(MultiOut)
str(MultiFC)
###################################################
-### code chunk number 43: EBSeq_Vignette.Rnw:806-808
+### code chunk number 42: EBSeq_Vignette.Rnw:820-822
###################################################
par(mfrow=c(2,2))
QQP(MultiOut)
###################################################
-### code chunk number 44: EBSeq_Vignette.Rnw:816-818
+### code chunk number 43: EBSeq_Vignette.Rnw:830-832
###################################################
par(mfrow=c(2,2))
DenNHist(MultiOut)
###################################################
-### code chunk number 45: EBSeq_Vignette.Rnw:833-836
+### code chunk number 44: EBSeq_Vignette.Rnw:847-850
###################################################
Conditions=c("C1","C1","C2","C2","C3","C3","C4","C4")
PosParti.4Cond=GetPatterns(Conditions)
@@ -373,7 +365,7 @@ PosParti.4Cond
###################################################
-### code chunk number 46: EBSeq_Vignette.Rnw:841-844
+### code chunk number 45: EBSeq_Vignette.Rnw:855-858
###################################################
PlotPattern(PosParti.4Cond)
Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),]
@@ -381,7 +373,7 @@ Parti.4Cond
###################################################
-### code chunk number 47: EBSeq_Vignette.Rnw:851-862 (eval = FALSE)
+### code chunk number 46: EBSeq_Vignette.Rnw:865-876 (eval = FALSE)
###################################################
## data(IsoMultiList)
## IsoMultiMat=IsoMultiList[[1]]
@@ -397,7 +389,7 @@ Parti.4Cond
###################################################
-### code chunk number 48: EBSeq_Vignette.Rnw:864-869
+### code chunk number 47: EBSeq_Vignette.Rnw:878-883
###################################################
names(MultiPP)
IsoMultiPP$PP[1:10,]
@@ -407,7 +399,7 @@ IsoMultiFC=GetMultiFC(IsoMultiOut)
###################################################
-### code chunk number 49: EBSeq_Vignette.Rnw:880-883
+### code chunk number 48: EBSeq_Vignette.Rnw:894-897
###################################################
par(mfrow=c(3,4))
QQP(IsoMultiOut)
@@ -415,14 +407,14 @@ QQP(IsoMultiOut)
###################################################
-### code chunk number 50: EBSeq_Vignette.Rnw:893-895
+### code chunk number 49: EBSeq_Vignette.Rnw:907-909
###################################################
par(mfrow=c(3,4))
DenNHist(IsoMultiOut)
###################################################
-### code chunk number 51: EBSeq_Vignette.Rnw:927-936
+### code chunk number 50: EBSeq_Vignette.Rnw:941-949
###################################################
data(GeneMat)
GeneMat.norep=GeneMat[,c(1,6)]
@@ -430,13 +422,12 @@ Sizes.norep=MedianNorm(GeneMat.norep)
EBOut.norep=EBTest(Data=GeneMat.norep,
Conditions=as.factor(rep(c("C1","C2"))),
sizeFactors=Sizes.norep, maxround=5)
-PP.norep=GetPPMat(EBOut.norep)
-DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)]
+EBDERes.norep=GetDEResults(EBOut.norep)
GeneFC.norep=PostFC(EBOut.norep)
###################################################
-### code chunk number 52: EBSeq_Vignette.Rnw:946-960
+### code chunk number 51: EBSeq_Vignette.Rnw:959-972
###################################################
data(IsoList)
IsoMat=IsoList$IsoMat
@@ -449,13 +440,12 @@ IsoSizes.norep=MedianNorm(IsoMat.norep)
IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun,
Conditions=as.factor(c("C1","C2")),
sizeFactors=IsoSizes.norep, maxround=5)
-IsoPP.norep=GetPPMat(IsoEBOut.norep)
-IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)]
+IsoEBDERes.norep=GetDEResults(IsoEBOut.norep)
IsoFC.norep=PostFC(IsoEBOut.norep)
###################################################
-### code chunk number 53: EBSeq_Vignette.Rnw:969-981
+### code chunk number 52: EBSeq_Vignette.Rnw:981-993
###################################################
data(MultiGeneMat)
MultiGeneMat.norep=MultiGeneMat[,c(1,3,5)]
@@ -472,7 +462,7 @@ MultiFC.norep=GetMultiFC(MultiOut.norep)
###################################################
-### code chunk number 54: EBSeq_Vignette.Rnw:993-1012
+### code chunk number 53: EBSeq_Vignette.Rnw:1005-1024
###################################################
data(IsoMultiList)
IsoMultiMat=IsoMultiList[[1]]
diff --git a/inst/doc/EBSeq_Vignette.Rnw b/inst/doc/EBSeq_Vignette.Rnw
index b384e0c..bed7960 100644
--- a/inst/doc/EBSeq_Vignette.Rnw
+++ b/inst/doc/EBSeq_Vignette.Rnw
@@ -1,4 +1,4 @@
-%\VignetteIndexEntry{EBSeq}
+%\VignetteIndexEntry{EBSeq Vignette}
\documentclass{article}
\usepackage{fullpage}
@@ -236,21 +236,25 @@ Please note this may take several minutes:
EBOut=EBTest(Data=GeneMat,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
@
-\noindent The posterior probabilities of being DE are obtained as follows, where \verb+PP+ is a matrix containing the posterior probabilities of
-being EE or DE for each of the 1,000 simulated genes:
+\noindent The list of DE genes and the posterior probabilities of being DE are obtained as follows
<<>>=
-PP=GetPPMat(EBOut)
-str(PP)
-head(PP)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
+str(EBDERes$DEfound)
+head(EBDERes$PPMat)
+str(EBDERes$Status)
@
-\noindent The matrix \verb+PP+ contains two columns \verb+PPEE+ and \verb+PPDE+,
+\noindent \verb+EBDERes$DEfound+ is a list of genes identified with 5\% FDR. EBSeq found
+95 genes. The matrix \verb+EBDERes$PPMat+ contains two columns \verb+PPEE+ and \verb+PPDE+,
corresponding to the posterior probabilities of being EE or DE for each gene.
-\verb+PP+ may be used to form an FDR-controlled list of DE genes with a target FDR of 0.05 as follows:
-<<>>=
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
-@
-\noindent EBSeq found 98 DE genes in total with target FDR 0.05.
+\verb+EBDERes$Status+ contains each gene's status called by EBSeq.
+
+\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1.
+By using the default settings, the number of genes identified in any given analysis may
+differ slightly from the previous version. The updated algorithm is more robust to outliers
+and transcripts with low variance. To obtain results that are comparable
+to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set
+\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function.
+
\subsection{Isoform level DE analysis (two conditions)}
\label{sec:startisode}
@@ -335,13 +339,24 @@ required in practice (see Section \ref{sec:detailedisodeconverge} for details).
<<>>=
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-str(IsoPP)
-head(IsoPP)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
-str(IsoDE)
+IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05)
+str(IsoEBDERes$DEfound)
+head(IsoEBDERes$PPMat)
+str(IsoEBDERes$Status)
@
-\noindent We see that EBSeq found 105 DE isoforms at the target FDR of 0.05.
+\noindent We see that EBSeq found 104 DE isoforms at the target FDR of 0.05.
+
+\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1.
+By using the default settings, the number of transcripts identified in any given analysis may
+differ slightly from the previous version. The updated algorithm is more robust to outliers
+and transcripts with low variance. To obtain results that are comparable
+to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set
+\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function.
+
+
+
+
+
\subsection{Gene level DE analysis (more than two conditions)}
\label{sec:startmulticond}
@@ -472,15 +487,15 @@ data(GeneMat)
Sizes=MedianNorm(GeneMat)
EBOut=EBTest(Data=GeneMat,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
-PP=GetPPMat(EBOut)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
@
<<>>=
-str(PP)
-head(PP)
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
+str(EBDERes$DEfound)
+head(EBDERes$PPMat)
+str(EBDERes$Status)
@
-\noindent EBSeq found 98 DE genes at a target FDR of 0.05.\\
+\noindent EBSeq found 95 DE genes at a target FDR of 0.05.\\
\subsubsection{Calculating FC}
\label{sec:detailedgenedefc}
@@ -634,13 +649,12 @@ EBSeq can be applied as described in Section \ref{sec:startisoderun}.
IsoSizes=MedianNorm(IsoMat)
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
+IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05)
@
<<>>=
-str(IsoDE)
+str(IsoEBDERes)
@
-\noindent We see that EBSeq found 105 DE isoforms at a target FDR of 0.05.
+\noindent We see that EBSeq found 104 DE isoforms at a target FDR of 0.05.
The function \verb+PostFC+ could also be used here to calculate the Fold Change (FC)
as well as the posterior FC on the normalization factor adjusted data.
<<>>=
@@ -922,7 +936,7 @@ This approach works well when there are no more than 50\% DE genes in the data s
To generate a data set with no replicates, we take the first sample of each condition.
For example, using the data from Section \ref{sec:detailedgenede}, we take sample 1 from condition 1 and
-sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetPPMat+ and
+sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetDEResults+ and
\verb+PostFC+ may be used on data without replicates.
<<>>=
data(GeneMat)
@@ -931,8 +945,7 @@ Sizes.norep=MedianNorm(GeneMat.norep)
EBOut.norep=EBTest(Data=GeneMat.norep,
Conditions=as.factor(rep(c("C1","C2"))),
sizeFactors=Sizes.norep, maxround=5)
-PP.norep=GetPPMat(EBOut.norep)
-DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)]
+EBDERes.norep=GetDEResults(EBOut.norep)
GeneFC.norep=PostFC(EBOut.norep)
@
@@ -955,8 +968,7 @@ IsoSizes.norep=MedianNorm(IsoMat.norep)
IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun,
Conditions=as.factor(c("C1","C2")),
sizeFactors=IsoSizes.norep, maxround=5)
-IsoPP.norep=GetPPMat(IsoEBOut.norep)
-IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)]
+IsoEBDERes.norep=GetDEResults(IsoEBOut.norep)
IsoFC.norep=PostFC(IsoEBOut.norep)
@
@@ -1040,12 +1052,95 @@ proofreading the vignette.
low expressed genes (genes whose 75th quantile of normalized counts is less
than 10) before identifying DE genes.
These two thresholds can be changed in EBTest function.
-We found that low expressed genes are more easily to be affected by noises.
-Removing these genes prior to downstream analyses can improve the
-model fitting and reduce impacts of noisy genes (e.g. genes with outliers).
+Because low expressed genes are disproportionately noisy,
+removing these genes prior to downstream analyses can improve model fitting and increase robustness
+(e.g. by removing outliers).
2014-5-22: In EBSeq 1.5.2, numerical approximations are implemented to deal with
underflow. The underflow is likely due to large number of samples.
+
+2015-1-29: In EBSeq 1.7.1, EBSeq incorporates a new function
+GetDEResults() which may be used to obtain a list of transcripts under a target FDR in a two-condition experiment.
+The results obtained by applying this function with its default setting will be
+more robust to transcripts with low variance and potential outliers.
+By using the default settings in this function,
+the number of genes identified in any given analysis may
+differ slightly from the previous version (1.7.0 or order).
+To obtain results that are comparable
+to results from earlier versions of EBSeq (1.7.0 or older), a user may set
+Method="classic" in GetDEResults() function, or use the original GetPPMat() function.
+The GeneDEResults() function also allows a user to modify thresholds to
+target genes/isoforms with a pre-specified posterior fold change.
+
+Also, in EBSeq 1.7.1, the default settings in EBTest() and EBMultiTest() function
+will only remove transcripts with all 0's (instead of removing transcripts with
+75th quantile less than 10 in version 1.3.3-1.7.0).
+To obtain a list of transcripts comparable to the results generated by EBSeq version 1.3.3-1.7.0, a user may change Qtrm = 0.75 and QtrmCut = 10
+when applying EBTest() or EBMultiTest() function.
+
+\section{Common Q and A}
+\subsection{Read in data}
+
+csv file:
+
+\verb+In=read.csv("FileName", stringsAsFactors=F, row.names=1, header=T)+
+
+\verb+Data=data.matrix(In)+
+
+\noindent txt file:
+
+\verb+In=read.table("FileName", stringsAsFactors=F, row.names=1, header=T)+
+
+\verb+Data=data.matrix(In)+
+
+\noindent Check \verb+str(Data)+ and make sure it is a matrix instead of data frame. You may need to play around with the \verb+row.names+
+and \verb+header+ option depends on how the input file was generated.
+
+
+
+\subsection{GetDEResults() function not found}
+
+You may on an earlier version of EBSeq. The GetDEResults function
+was introduced since version 1.7.1.
+The latest release version could be found at:
+
+\url{http://www.bioconductor.org/packages/release/bioc/html/EBSeq.html}
+
+\noindent The latest devel version:
+
+\url{http://www.bioconductor.org/packages/devel/bioc/html/EBSeq.html}
+
+\noindent And you may check your package version by typing \verb+packageVersion("EBSeq")+.
+
+
+\subsection{Visualizing DE genes/isoforms}
+
+To generate a heatmap, you may consider the heatmap.2 function in gplots package.
+For example, you may run
+
+\verb+heatmap.2(NormalizedMatrix[GenesOfInterest,], scale="row", trace="none", Colv=F)+
+
+The normalized matrix may be obtained from \verb+GetNormalizedMat()+ function.
+
+
+\subsection{My favorite gene/isoform has NA in PP (status "NoTest")}
+
+\indent The NoTest status comes from two sources:
+
+1) In version 1.3.3-1.7.0, using the default parameter settings of EBMultiTest(), the function
+will not test on genes with more than 75\% values $\le$ 10 to ensure better
+model fitting. To disable this filter, you may set Qtrm=1 and
+QtrmCut=0.
+
+2) numerical over/underflow in R. That happens when the within
+condition variance is extremely large or small. we did implemented a numerical
+approximation step to calculate the approximated PP for these genes
+with over/underflow. Here we use $10^{-10}$ to approximate the parameter p
+in the NB distribution for these genes (we set it to a small value
+since we want to cover more over/underflow genes with low
+within-condition variation). You may try to tune this value (to a larger value) in the
+approximation by setting \verb+ApproxVal+ in \verb+EBTest()+ or \verb+EBMultiTest()+ function.
+
\pagebreak
\bibliographystyle{plain}
\bibliography{lengetal}
diff --git a/inst/doc/EBSeq_Vignette.pdf b/inst/doc/EBSeq_Vignette.pdf
index 106d468..2ba03da 100644
Binary files a/inst/doc/EBSeq_Vignette.pdf and b/inst/doc/EBSeq_Vignette.pdf differ
diff --git a/man/EBMultiTest.Rd b/man/EBMultiTest.Rd
index 28a5c34..7b93b9d 100644
--- a/man/EBMultiTest.Rd
+++ b/man/EBMultiTest.Rd
@@ -10,7 +10,7 @@ of interested patterns in a multiple condition study
\usage{
EBMultiTest(Data, NgVector = NULL, Conditions, AllParti = NULL,
sizeFactors, maxround, Pool = F, NumBin = 1000,
- ApproxVal=10^-10, PoolLower=.25, PoolUpper = .75, Print=T,Qtrm=.75,QtrmCut=10)
+ ApproxVal=10^-10, PoolLower=.25, PoolUpper = .75, Print=T,Qtrm=1,QtrmCut=0)
}
\arguments{
@@ -45,8 +45,8 @@ We use the cross condition variance estimations for the candidate genes and the
\item{Print}{Whether print the elapsed-time while running the test.}
\item{Qtrm, QtrmCut}{
-Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 0.75 and QtrmCut=10.
-By default setting, transcripts that have >75\% of the samples with expression less than 10
+Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 1 and QtrmCut=0.
+By default setting, transcripts with all 0's
won't be tested.
}
}
diff --git a/man/EBTest.Rd b/man/EBTest.Rd
index 97e4d39..28f23f4 100644
--- a/man/EBTest.Rd
+++ b/man/EBTest.Rd
@@ -11,7 +11,7 @@ Base on the assumption of NB-Beta Empirical Bayes model, the EM algorithm is use
EBTest(Data, NgVector = NULL, Conditions, sizeFactors, maxround,
Pool = F, NumBin = 1000, ApproxVal = 10^-10, Alpha = NULL,
Beta = NULL, PInput = NULL, RInput = NULL,
- PoolLower = .25, PoolUpper = .75, Print = T, Qtrm = .75,QtrmCut=10)
+ PoolLower = .25, PoolUpper = .75, Print = T, Qtrm = 1,QtrmCut=0)
}
\arguments{
@@ -36,8 +36,8 @@ We use the cross condition variance estimations for the candidate genes and the
\item{Alpha, Beta, PInput, RInput}{If the parameters are known and the user doesn't want to estimate them from the data, user could specify them here.}
\item{Print}{Whether print the elapsed-time while running the test.}
\item{Qtrm, QtrmCut}{
-Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 0.75 and QtrmCut=10.
-By default setting, transcripts that have >75\% of the samples with expression less than 10
+Transcripts with Qtrm th quantile < = QtrmCut will be removed before testing. The default value is Qtrm = 1 and QtrmCut=0.
+By default setting, transcripts with all 0's
won't be tested.
}
}
@@ -77,6 +77,7 @@ Transcripts with expression 0 for all samples are shown as PP(EE) = PP(DE) = NA
The transcript order is exactly the same as the order of the input data.}
\item{ConditionOrder}{The condition assignment for C1Mean, C2Mean, etc.}
\item{Conditions}{The input conditions.}
+\item{DataNorm}{Normalized expression matrix.}
}
\references{
Ning Leng, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M.G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart, and Christina Kendziorski. EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013)
diff --git a/man/GetDEResults.Rd b/man/GetDEResults.Rd
new file mode 100644
index 0000000..34c51f8
--- /dev/null
+++ b/man/GetDEResults.Rd
@@ -0,0 +1,104 @@
+\name{GetDEResults}
+\alias{GetDEResults}
+\title{
+Obtain Differential Expression Analysis Results in a Two-condition Test
+}
+\description{
+Obtain DE analysis results in a two-condition test using the output of EBTest()
+}
+\usage{
+GetDEResults(EBPrelim, FDR=0.05, Method="robust",
+ FDRMethod="hard", Threshold_FC=0.7,
+ Threshold_FCRatio=0.3, SmallNum=0.01)
+}
+\arguments{
+ \item{EBPrelim}{Output from the function EBTest().}
+ \item{FDR}{Target FDR, defaut is 0.05.}
+ \item{FDRMethod}{"hard" or "soft".
+ Giving a target FDR alpha, either hard threshold and soft
+ threshold may be used. If the hard threshold is preferred, DE transcripts are
+ defined as the the transcripts with PP(DE) greater than
+ (1-alpha). Using the hard threshold, any DE transcript in the list
+ has FDR <= alpha.
+
+ If the soft threshold is preferred, the DE transcripts are defined as the
+ transcripts with PP(DE) greater than crit_fun(PPEE, alpha). Using
+ the soft threshold, the list of DE transcripts has average FDR
+ alpha.
+
+ Based on results from our simulation studies, hard thresholds provide a better-controlled
+ empirical FDR when sample size is relatively small(Less than 10 samples in each condition).
+ User may consider the soft threshold when sample size is large to improve power.}
+ \item{Method}{"robust" or "classic".
+ Using the "robust" option, EBSeq is more robust to genes with outliers and
+ genes with extremely small variances.
+ Using the "classic" option, the results will be more comparable to those obtained
+ by using the GetPPMat() function from earlier version (<= 1.7.0) of EBSeq.
+ Default is "robust".}
+ \item{Threshold_FC}{Threshold for the fold change (FC) statistics.
+ The default is 0.7. The FC statistics are calculated as follows.
+ First the posterior FC estimates are calculated using PostFC() function.
+ The FC statistics is defined as exp(-|log posterior FC|) and therefore is always less than
+ or equal to 1.
+ The default threshold was selected as the optimal threshold learned from our simulation studies. By setting the
+ threshold as 0.7, the expected FC for a DE transcript is less than 0.7
+ (or greater than 1/0.7=1.4).
+ User may specify their own threshold here. A higher (less conservative) threshold
+ may be used here when sample size is large. Our simulation results
+ indicated that when there are more than or equal to 5 samples in each condition,
+ a less conservative threshold will improve the power when the FDR is still well-controlled.
+ The parameter will be ignored if Method is set as "classic".}
+ \item{Threshold_FCRatio}{Threshold for the fold change ratio (FCRatio) statistics.
+ The default is 0.3. The FCRatio statistics are calculated as follows.
+ First we get another revised fold change
+ statistic called Median-FC statistic for each transcript.
+ For each transcript, we calculate the median of
+ normalized expression values within each condition.
+ The MedianFC is defined as exp(-|log((C1Median+SmallNum)/(C2Median+SmallNum))|).
+ Note a small number is added to avoid Inf and NA. See SmallNum for more details.
+ The FCRatio is calculated as exp(-|log(FCstatistics/MedianFC)|).
+ Therefore it is always less than or equal to 1.
+ The default threshold was selected as the optimal threshold learned from our simulation studies.
+ By setting the threshold as 0.3, the FCRatio for a DE transcript is
+ expected to be larger than 0.3.
+ }
+ \item{SmallNum}{When calculating the FCRatio (or Median-FC), a small number is added for each transcript in each
+ condition to avoid Inf and NA. Default is 0.01.}
+}
+\details{
+GetDEResults() function takes output from EBTest() function and output a list of
+DE transcripts under a target FDR. It also provides posterior probability estimates for each
+transcript.
+}
+\value{
+ \item{DEfound}{A list of DE transcripts.}
+ \item{PPMat}{Posterior probability matrix. Transcripts are following the same order as
+ in the input matrix.
+ Transcripts that were filtered by magnitude (in EBTest function), FC, or FCR
+ are assigned with NA for both PPDE and PPEE.}
+ \item{Status}{Each transcript will be assigned with one of the following
+ values: "DE", "EE", "Filtered: Low Expression",
+ "Filtered: Fold Change" and "Filtered: Fold Change Ratio".
+ Transcripts are following the same order as in the input matrix.}
+}
+\references{
+Ning Leng, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M.G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart, and Christina Kendziorski. EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013)
+}
+\author{
+Ning Leng, Yuan Li
+}
+\seealso{
+EBTest
+}
+\examples{
+data(GeneMat)
+str(GeneMat)
+GeneMat.small = GeneMat[c(1:10,511:550),]
+Sizes = MedianNorm(GeneMat.small)
+EBOut = EBTest(Data = GeneMat.small,
+ Conditions = as.factor(rep(c("C1","C2"), each = 5)),
+ sizeFactors = Sizes, maxround = 5)
+Out = GetDEResults(EBOut)
+}
+\keyword{ DE }
+\keyword{ Two condition }
diff --git a/man/MedianNorm.Rd b/man/MedianNorm.Rd
index 9fc47ff..afc1475 100644
--- a/man/MedianNorm.Rd
+++ b/man/MedianNorm.Rd
@@ -7,10 +7,11 @@ Median Normalization
'MedianNorm' specifies the median normalization function from Anders et. al., 2010.
}
\usage{
-MedianNorm(Data)
+MedianNorm(Data, alternative = FALSE)
}
\arguments{
\item{Data}{The data matrix with transcripts in rows and lanes in columns.}
+ \item{alternative}{if alternative = TRUE, the alternative version of median normalization will be applied.}
}
\value{The function will return a vector contains the normalization factor for each lane.}
diff --git a/vignettes/EBSeq_Vignette.Rnw b/vignettes/EBSeq_Vignette.Rnw
index b384e0c..bed7960 100644
--- a/vignettes/EBSeq_Vignette.Rnw
+++ b/vignettes/EBSeq_Vignette.Rnw
@@ -1,4 +1,4 @@
-%\VignetteIndexEntry{EBSeq}
+%\VignetteIndexEntry{EBSeq Vignette}
\documentclass{article}
\usepackage{fullpage}
@@ -236,21 +236,25 @@ Please note this may take several minutes:
EBOut=EBTest(Data=GeneMat,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
@
-\noindent The posterior probabilities of being DE are obtained as follows, where \verb+PP+ is a matrix containing the posterior probabilities of
-being EE or DE for each of the 1,000 simulated genes:
+\noindent The list of DE genes and the posterior probabilities of being DE are obtained as follows
<<>>=
-PP=GetPPMat(EBOut)
-str(PP)
-head(PP)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
+str(EBDERes$DEfound)
+head(EBDERes$PPMat)
+str(EBDERes$Status)
@
-\noindent The matrix \verb+PP+ contains two columns \verb+PPEE+ and \verb+PPDE+,
+\noindent \verb+EBDERes$DEfound+ is a list of genes identified with 5\% FDR. EBSeq found
+95 genes. The matrix \verb+EBDERes$PPMat+ contains two columns \verb+PPEE+ and \verb+PPDE+,
corresponding to the posterior probabilities of being EE or DE for each gene.
-\verb+PP+ may be used to form an FDR-controlled list of DE genes with a target FDR of 0.05 as follows:
-<<>>=
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
-@
-\noindent EBSeq found 98 DE genes in total with target FDR 0.05.
+\verb+EBDERes$Status+ contains each gene's status called by EBSeq.
+
+\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1.
+By using the default settings, the number of genes identified in any given analysis may
+differ slightly from the previous version. The updated algorithm is more robust to outliers
+and transcripts with low variance. To obtain results that are comparable
+to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set
+\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function.
+
\subsection{Isoform level DE analysis (two conditions)}
\label{sec:startisode}
@@ -335,13 +339,24 @@ required in practice (see Section \ref{sec:detailedisodeconverge} for details).
<<>>=
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-str(IsoPP)
-head(IsoPP)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
-str(IsoDE)
+IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05)
+str(IsoEBDERes$DEfound)
+head(IsoEBDERes$PPMat)
+str(IsoEBDERes$Status)
@
-\noindent We see that EBSeq found 105 DE isoforms at the target FDR of 0.05.
+\noindent We see that EBSeq found 104 DE isoforms at the target FDR of 0.05.
+
+\noindent Note the \verb+GetDEResults()+ was incorporated in EBSeq since version 1.7.1.
+By using the default settings, the number of transcripts identified in any given analysis may
+differ slightly from the previous version. The updated algorithm is more robust to outliers
+and transcripts with low variance. To obtain results that are comparable
+to results from earlier versions of EBSeq ($\le$ 1.7.0), a user may set
+\verb+Method="classic"+ in \verb+GetDEResults()+ function, or use the \verb+GetPPMat()+ function.
+
+
+
+
+
\subsection{Gene level DE analysis (more than two conditions)}
\label{sec:startmulticond}
@@ -472,15 +487,15 @@ data(GeneMat)
Sizes=MedianNorm(GeneMat)
EBOut=EBTest(Data=GeneMat,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
-PP=GetPPMat(EBOut)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
@
<<>>=
-str(PP)
-head(PP)
-DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
-str(DEfound)
+EBDERes=GetDEResults(EBOut, FDR=0.05)
+str(EBDERes$DEfound)
+head(EBDERes$PPMat)
+str(EBDERes$Status)
@
-\noindent EBSeq found 98 DE genes at a target FDR of 0.05.\\
+\noindent EBSeq found 95 DE genes at a target FDR of 0.05.\\
\subsubsection{Calculating FC}
\label{sec:detailedgenedefc}
@@ -634,13 +649,12 @@ EBSeq can be applied as described in Section \ref{sec:startisoderun}.
IsoSizes=MedianNorm(IsoMat)
IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
-IsoPP=GetPPMat(IsoEBOut)
-IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
+IsoEBDERes=GetDEResults(IsoEBOut, FDR=0.05)
@
<<>>=
-str(IsoDE)
+str(IsoEBDERes)
@
-\noindent We see that EBSeq found 105 DE isoforms at a target FDR of 0.05.
+\noindent We see that EBSeq found 104 DE isoforms at a target FDR of 0.05.
The function \verb+PostFC+ could also be used here to calculate the Fold Change (FC)
as well as the posterior FC on the normalization factor adjusted data.
<<>>=
@@ -922,7 +936,7 @@ This approach works well when there are no more than 50\% DE genes in the data s
To generate a data set with no replicates, we take the first sample of each condition.
For example, using the data from Section \ref{sec:detailedgenede}, we take sample 1 from condition 1 and
-sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetPPMat+ and
+sample 6 from condition 2. Functions \verb+MedianNorm+, \verb+GetDEResults+ and
\verb+PostFC+ may be used on data without replicates.
<<>>=
data(GeneMat)
@@ -931,8 +945,7 @@ Sizes.norep=MedianNorm(GeneMat.norep)
EBOut.norep=EBTest(Data=GeneMat.norep,
Conditions=as.factor(rep(c("C1","C2"))),
sizeFactors=Sizes.norep, maxround=5)
-PP.norep=GetPPMat(EBOut.norep)
-DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)]
+EBDERes.norep=GetDEResults(EBOut.norep)
GeneFC.norep=PostFC(EBOut.norep)
@
@@ -955,8 +968,7 @@ IsoSizes.norep=MedianNorm(IsoMat.norep)
IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun,
Conditions=as.factor(c("C1","C2")),
sizeFactors=IsoSizes.norep, maxround=5)
-IsoPP.norep=GetPPMat(IsoEBOut.norep)
-IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)]
+IsoEBDERes.norep=GetDEResults(IsoEBOut.norep)
IsoFC.norep=PostFC(IsoEBOut.norep)
@
@@ -1040,12 +1052,95 @@ proofreading the vignette.
low expressed genes (genes whose 75th quantile of normalized counts is less
than 10) before identifying DE genes.
These two thresholds can be changed in EBTest function.
-We found that low expressed genes are more easily to be affected by noises.
-Removing these genes prior to downstream analyses can improve the
-model fitting and reduce impacts of noisy genes (e.g. genes with outliers).
+Because low expressed genes are disproportionately noisy,
+removing these genes prior to downstream analyses can improve model fitting and increase robustness
+(e.g. by removing outliers).
2014-5-22: In EBSeq 1.5.2, numerical approximations are implemented to deal with
underflow. The underflow is likely due to large number of samples.
+
+2015-1-29: In EBSeq 1.7.1, EBSeq incorporates a new function
+GetDEResults() which may be used to obtain a list of transcripts under a target FDR in a two-condition experiment.
+The results obtained by applying this function with its default setting will be
+more robust to transcripts with low variance and potential outliers.
+By using the default settings in this function,
+the number of genes identified in any given analysis may
+differ slightly from the previous version (1.7.0 or order).
+To obtain results that are comparable
+to results from earlier versions of EBSeq (1.7.0 or older), a user may set
+Method="classic" in GetDEResults() function, or use the original GetPPMat() function.
+The GeneDEResults() function also allows a user to modify thresholds to
+target genes/isoforms with a pre-specified posterior fold change.
+
+Also, in EBSeq 1.7.1, the default settings in EBTest() and EBMultiTest() function
+will only remove transcripts with all 0's (instead of removing transcripts with
+75th quantile less than 10 in version 1.3.3-1.7.0).
+To obtain a list of transcripts comparable to the results generated by EBSeq version 1.3.3-1.7.0, a user may change Qtrm = 0.75 and QtrmCut = 10
+when applying EBTest() or EBMultiTest() function.
+
+\section{Common Q and A}
+\subsection{Read in data}
+
+csv file:
+
+\verb+In=read.csv("FileName", stringsAsFactors=F, row.names=1, header=T)+
+
+\verb+Data=data.matrix(In)+
+
+\noindent txt file:
+
+\verb+In=read.table("FileName", stringsAsFactors=F, row.names=1, header=T)+
+
+\verb+Data=data.matrix(In)+
+
+\noindent Check \verb+str(Data)+ and make sure it is a matrix instead of data frame. You may need to play around with the \verb+row.names+
+and \verb+header+ option depends on how the input file was generated.
+
+
+
+\subsection{GetDEResults() function not found}
+
+You may on an earlier version of EBSeq. The GetDEResults function
+was introduced since version 1.7.1.
+The latest release version could be found at:
+
+\url{http://www.bioconductor.org/packages/release/bioc/html/EBSeq.html}
+
+\noindent The latest devel version:
+
+\url{http://www.bioconductor.org/packages/devel/bioc/html/EBSeq.html}
+
+\noindent And you may check your package version by typing \verb+packageVersion("EBSeq")+.
+
+
+\subsection{Visualizing DE genes/isoforms}
+
+To generate a heatmap, you may consider the heatmap.2 function in gplots package.
+For example, you may run
+
+\verb+heatmap.2(NormalizedMatrix[GenesOfInterest,], scale="row", trace="none", Colv=F)+
+
+The normalized matrix may be obtained from \verb+GetNormalizedMat()+ function.
+
+
+\subsection{My favorite gene/isoform has NA in PP (status "NoTest")}
+
+\indent The NoTest status comes from two sources:
+
+1) In version 1.3.3-1.7.0, using the default parameter settings of EBMultiTest(), the function
+will not test on genes with more than 75\% values $\le$ 10 to ensure better
+model fitting. To disable this filter, you may set Qtrm=1 and
+QtrmCut=0.
+
+2) numerical over/underflow in R. That happens when the within
+condition variance is extremely large or small. we did implemented a numerical
+approximation step to calculate the approximated PP for these genes
+with over/underflow. Here we use $10^{-10}$ to approximate the parameter p
+in the NB distribution for these genes (we set it to a small value
+since we want to cover more over/underflow genes with low
+within-condition variation). You may try to tune this value (to a larger value) in the
+approximation by setting \verb+ApproxVal+ in \verb+EBTest()+ or \verb+EBMultiTest()+ function.
+
\pagebreak
\bibliographystyle{plain}
\bibliography{lengetal}
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-ebseq.git
More information about the debian-med-commit
mailing list