[med-svn] [r-bioc-edger] 02/05: Imported Upstream version 3.10.2+dfsg

Andreas Tille tille at debian.org
Sun Jul 19 21:07:33 UTC 2015


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-bioc-edger.

commit b57357dd2e92be94ae4e4c7ff746d870e07afada
Author: Andreas Tille <tille at debian.org>
Date:   Sun Jul 19 21:40:03 2015 +0200

    Imported Upstream version 3.10.2+dfsg
---
 DESCRIPTION                          |  13 +-
 NAMESPACE                            |   5 +-
 R/DGEList.R                          |  35 ++-
 R/aveLogCPM.R                        |  13 +-
 R/dispBinTrend.R                     |   4 +-
 R/dispCoxReidSplineTrend.R           |   6 +-
 R/dropEmptyLevels.R                  |  13 +
 R/estimateDisp.R                     |  38 ++-
 R/estimateTrendedDisp.R              |   4 +-
 R/glmQLFTest.R                       |  99 ++++---
 R/{treatDGE.R => glmTreat.R}         |  57 ++--
 R/goana.R                            |   6 +-
 R/locfitByCol.R                      |   8 +-
 R/mglmOneGroup.R                     |   2 +-
 R/plotMDS.DGEList.R                  |  58 ++--
 R/processAmplicons.R                 |  80 ++++--
 R/residDF.R                          |  67 +++--
 R/romer.DGEList.R                    |  39 +++
 R/subsetting.R                       |   4 +-
 R/topTags.R                          |  23 +-
 R/zscoreNBinom.R                     |  24 +-
 build/vignette.rds                   | Bin 229 -> 228 bytes
 inst/CITATION                        |   2 +-
 inst/NEWS.Rd                         |  42 +++
 inst/doc/edgeR.pdf                   | Bin 48664 -> 48664 bytes
 man/DGEExact-class.Rd                |   6 +-
 man/DGEGLM-class.Rd                  |  26 +-
 man/DGELRT-class.Rd                  |  10 +-
 man/DGEList-class.Rd                 |  16 +-
 man/DGEList.Rd                       |   2 +-
 man/asdataframe.Rd                   |   2 +-
 man/aveLogCPM.Rd                     |  11 +-
 man/binomTest.Rd                     |  20 +-
 man/calcNormFactors.Rd               |   4 +-
 man/commonCondLogLikDerDelta.Rd      |   4 +-
 man/condLogLikDerSize.Rd             |   4 +-
 man/cpm.Rd                           |   2 +-
 man/decidetestsDGE.Rd                |   4 +-
 man/dglmStdResid.Rd                  |  16 +-
 man/dim.Rd                           |   6 +-
 man/dispBinTrend.Rd                  |   8 +-
 man/dispCoxReid.Rd                   |   6 +-
 man/dispCoxReidInterpolateTagwise.Rd |  24 +-
 man/dispCoxReidSplineTrend.Rd        |   8 +-
 man/dropEmptyLevels.Rd               |  32 +++
 man/edgeR-package.Rd                 |   3 +
 man/estimateCommonDisp.Rd            |   6 +-
 man/estimateDisp.Rd                  |  12 +-
 man/estimateGLMCommonDisp.Rd         |   6 +-
 man/estimateGLMRobustDisp.Rd         |  12 +-
 man/estimateGLMTagwiseDisp.Rd        |  17 +-
 man/estimateGLMTrendedDisp.Rd        |   9 +-
 man/estimateTagwiseDisp.Rd           |   8 +-
 man/estimateTrendedDisp.Rd           |   4 +-
 man/exactTest.Rd                     |   8 +-
 man/getCounts.Rd                     |   2 +-
 man/getPriorN.Rd                     |   4 +-
 man/glmQLFTest.Rd                    | 129 +++++----
 man/{treatDGE.Rd => glmTreat.Rd}     |  38 ++-
 man/glmfit.Rd                        |  32 +--
 man/gof.Rd                           |  12 +-
 man/goodTuring.Rd                    |   2 +-
 man/maPlot.Rd                        |   6 +-
 man/meanvar.Rd                       |  28 +-
 man/mglm.Rd                          |  12 +-
 man/nbinomDeviance.Rd                |   4 +-
 man/plotBCV.Rd                       |   6 +-
 man/plotMDS.DGEList.Rd               |  45 ++-
 man/plotQLDisp.Rd                    |   4 +-
 man/plotSmear.Rd                     |  14 +-
 man/plotSpliceDGE.Rd                 |   6 +-
 man/predFC.Rd                        |   2 +-
 man/processAmplicons.Rd              |   9 +-
 man/readDGE.Rd                       |   6 +-
 man/roast.DGEList.Rd                 |   2 +-
 man/romer.DGEList.Rd                 |  80 ++++++
 man/splitIntoGroups.Rd               |   2 +-
 man/subsetting.Rd                    |   2 +-
 man/topTags.Rd                       |   9 +-
 man/weightedCondLogLikDerDelta.Rd    |  13 +-
 src/R_one_group.cpp                  |  20 +-
 src/R_process_hairpin_reads.c        | 543 ++++++++++++++++-------------------
 src/init.cpp                         |   4 +-
 src/utils.h                          |  12 +-
 tests/edgeR-Tests.Rout.save          | 221 ++++----------
 vignettes/edgeR.Rnw                  |  48 ----
 86 files changed, 1235 insertions(+), 1020 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 4fcf5b6..e700f3b 100755
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,9 +1,10 @@
 Package: edgeR
-Version: 3.8.2
-Date: 2014/10/17
+Version: 3.10.2
+Date: 2015/05/24
 Title: Empirical analysis of digital gene expression data in R
-Author: Yunshun Chen <yuchen at wehi.edu.au>, Davis McCarthy <dmccarthy at wehi.edu.au>, Aaron Lun <alun at wehi.edu.au>, Xiaobei Zhou <xiaobei.zhou at uzh.ch>, Mark Robinson <mark.robinson at imls.uzh.ch>, Gordon Smyth <smyth at wehi.edu.au>
-Maintainer: Yunshun Chen <yuchen at wehi.edu.au>, Mark Robinson <mark.robinson at imls.uzh.ch>, Davis McCarthy <dmccarthy at wehi.edu.au>, Gordon Smyth <smyth at wehi.edu.au>
+Description: Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, SAGE and CAGE.
+Author: Yunshun Chen <yuchen at wehi.edu.au>, Aaron Lun <alun at wehi.edu.au>, Davis McCarthy <dmccarthy at wehi.edu.au>, Xiaobei Zhou <xiaobei.zhou at uzh.ch>, Mark Robinson <mark.robinson at imls.uzh.ch>, Gordon Smyth <smyth at wehi.edu.au>
+Maintainer: Yunshun Chen <yuchen at wehi.edu.au>, Aaron Lun <alun at wehi.edu.au>, Mark Robinson <mark.robinson at imls.uzh.ch>, Davis McCarthy <dmccarthy at wehi.edu.au>, Gordon Smyth <smyth at wehi.edu.au>
 License: GPL (>=2)
 Depends: R (>= 2.15.0), limma
 Imports: methods
@@ -14,5 +15,5 @@ biocViews: GeneExpression, Transcription, AlternativeSplicing,
         GeneSetEnrichment, Genetics, Bayesian, Clustering, Regression,
         TimeCourse, SAGE, Sequencing, ChIPSeq, RNASeq, BatchEffect,
         MultipleComparison, Normalization, QualityControl
-Description: Differential expression analysis of RNA-seq and digital gene expression profiles with biological replication.  Uses empirical Bayes estimation and exact tests based on the negative binomial distribution.  Also useful for differential signal analysis with other types of genome-scale count data.
-Packaged: 2014-10-18 00:59:00 UTC; biocbuild
+NeedsCompilation: yes
+Packaged: 2015-05-26 01:55:23 UTC; biocbuild
diff --git a/NAMESPACE b/NAMESPACE
index 4e884e7..1379f7c 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -8,7 +8,7 @@ exportClasses("DGEList","DGEExact","DGEGLM","DGELRT","TopTags")
 exportMethods("show")
 
 import(methods)
-importFrom("limma",camera,goana,is.fullrank,loessFit,mroast,nonEstimable,plotMDS,roast,squeezeVar,subsetListOfArrays)
+importFrom("limma",camera,contrastAsCoef,decideTests,goana,is.fullrank,lmFit,loessFit,mroast,nonEstimable,plotMDS,removeExt,roast,romer,squeezeVar,subsetListOfArrays,tricubeMovingAverage,zscoreGamma)
 importClassesFrom("limma","LargeDataObject","Roast","MDS")
 
 S3method(as.data.frame,TopTags)
@@ -36,6 +36,7 @@ S3method(plotMDS,DGEList)
 S3method(camera,DGEList)
 S3method(roast,DGEList)
 S3method(mroast,DGEList)
+S3method(romer,DGEList)
 S3method(aveLogCPM,default)
 S3method(aveLogCPM,DGEList)
 S3method(aveLogCPM,DGEGLM)
@@ -56,3 +57,5 @@ S3method(sumTechReps,default)
 S3method(sumTechReps,DGEList)
 S3method(goana,DGEExact)
 S3method(goana,DGELRT)
+S3method(glmQLFit,default)
+S3method(glmQLFit,DGEList)
\ No newline at end of file
diff --git a/R/DGEList.R b/R/DGEList.R
index 32d785d..e50565b 100644
--- a/R/DGEList.R
+++ b/R/DGEList.R
@@ -1,13 +1,13 @@
 DGEList <- function(counts=matrix(0,0,0), lib.size=colSums(counts), norm.factors=rep(1,ncol(counts)), group=rep(1,ncol(counts)), genes=NULL, remove.zeros=FALSE) 
 #	Construct DGEList object from components, with some checking
-#	Last modified 5 June 2014
+#	Last modified 8 Feb 2015
 {
 #	Check counts
 	counts <- as.matrix(counts)
 	nlib <- ncol(counts)
 	ntags <- nrow(counts)
-	if(nlib>0 && is.null(colnames(counts))) colnames(counts) <- paste("Sample",1:ncol(counts),sep="")
-	if(ntags>0 && is.null(rownames(counts))) rownames(counts) <- 1:ntags
+	if(nlib>0L && is.null(colnames(counts))) colnames(counts) <- paste0("Sample",1L:nlib)
+	if(ntags>0L && is.null(rownames(counts))) rownames(counts) <- 1L:ntags
 
 #	Check lib.size
 	if(is.null(lib.size)) lib.size <- colSums(counts)
@@ -19,24 +19,33 @@ DGEList <- function(counts=matrix(0,0,0), lib.size=colSums(counts), norm.factors
 
 #	Check group
 	if(is.null(group)) group <- rep(1,ncol(counts))
-	group <- .check.factor(group)
+	group <- dropEmptyLevels(group)
 	if(nlib != length(group)) stop("Length of 'group' must equal number of columns in 'counts'")
 
+#	Make data frame of sample information
 	samples <- data.frame(group=group,lib.size=lib.size,norm.factors=norm.factors)
-	row.names(samples) <- colnames(counts)
+	if(anyDuplicated(colnames(counts))) {
+		message("Repeated column names found in count matrix")
+		row.names(samples) <- 1L:nlib
+	} else 
+		row.names(samples) <- colnames(counts)
+
+#	Make object
 	x <- new("DGEList",list(counts=counts,samples=samples))
 
+#	Add data frame of gene information
 	if(!is.null(genes)) {
 		genes <- as.data.frame(genes, stringsAsFactors=FALSE)
-		if(nrow(genes) != ntags) stop("counts and genes have different numbers of rows")
+		if(nrow(genes) != ntags) stop("Counts and genes have different numbers of rows")
 		x$genes <- genes
 	}
 
+#	Remove rows with all zeros
 	if(remove.zeros) {
-		all.zeros <- rowSums(counts,na.rm=TRUE)==0
+		all.zeros <- rowSums(counts>0,na.rm=TRUE)==0
 		if(any(all.zeros)) {
 			x <- x[!all.zeros,]
-			message("Removing ",sum(all.zeros)," rows with all zero counts.")
+			message("Removing ",sum(all.zeros)," rows with all zero counts")
 		}
 	}
 
@@ -46,13 +55,3 @@ DGEList <- function(counts=matrix(0,0,0), lib.size=colSums(counts), norm.factors
 	x
 }
 
-.check.factor <- function(x)
-{
-	if(is.factor(x)) {
-		i <- table(x)>0
-		if(!all(i)) x <- factor(x,levels=levels(x)[i])
-	} else {
-		x <- factor(x)
-	}
-	x
-}
diff --git a/R/aveLogCPM.R b/R/aveLogCPM.R
index 5b88d53..0b3c601 100644
--- a/R/aveLogCPM.R
+++ b/R/aveLogCPM.R
@@ -45,7 +45,8 @@ aveLogCPM.default <- function(y,lib.size=NULL,offset=NULL,prior.count=2,dispersi
 	if(any(y<0)) stop("counts must be non-negative")
 
 #	Check prior.count
-	if(prior.count<0) prior.count <- 0
+	neg.prior <- prior.count < 0
+	if(any(neg.prior)) prior.count[neg.prior] <- 0
 
 #	Check dispersion
 	if(is.null(dispersion)) dispersion <- 0.05
@@ -70,14 +71,20 @@ aveLogCPM.default <- function(y,lib.size=NULL,offset=NULL,prior.count=2,dispersi
 		return( (abundance+log(1e6)) / log(2) )
 	}
 
+#	Ensuring lib.size has appropriate dimensions for prior.count
+	if(length(prior.count)>1L) {
+		if(nrow(y)!=length(prior.count)) stop("length of prior count vector should be equal to the number of rows")
+		lib.size <- expandAsMatrix(lib.size, dim(y)) 
+	}
+
 #	Scale prior counts to preserve fold changes
-	prior.count.scaled <- lib.size/mean.lib.size * prior.count
+	prior.count.scaled <- lib.size/mean.lib.size*prior.count
 
 #	Add double prior counts to library sizes
 	offset <- log(lib.size+2*prior.count.scaled)
 
 #	Add prior counts to y
-	if(is.null(dim(prior.count.scaled))) prior.count.scaled <- matrix(1,nrow(y),1) %*% prior.count.scaled
+	if (is.null(dim(prior.count.scaled))) prior.count.scaled <- expandAsMatrix(prior.count.scaled, dim(y))
 	y <- y+prior.count.scaled
 
 	abundance <- mglmOneGroup(y,dispersion=dispersion,offset=offset,weights=weights)
diff --git a/R/dispBinTrend.R b/R/dispBinTrend.R
index edf461c..d0fb75a 100644
--- a/R/dispBinTrend.R
+++ b/R/dispBinTrend.R
@@ -76,13 +76,13 @@ dispBinTrend <- function(y, design=NULL, offset=NULL, df=5, span=0.3, min.n=400,
 
 #	Spline smoother through binned dispersions
 	if( method.trend=="spline" ) {
-		require("splines")
+		if(!requireNamespace("splines",quietly=TRUE)) stop("splines required but is not available")
 		p1 <- (1:(df-1))/df
 		knots1 <- quantile(bin.A,p=p1)
 		r <- range(bin.A)
 		knots2 <- r[1]+p1*(r[2]-r[1])
 		knots <- 0.3*knots1+0.7*knots2
-		basisbins <- ns(bin.A,df=df,knots=knots,intercept=TRUE)
+		basisbins <- splines::ns(bin.A,df=df,knots=knots,intercept=TRUE)
 		beta <- coefficients(lm.fit(basisbins, sqrt(bin.d)))
 		basisall <- predict(basisbins,newx=AveLogCPM)
 		dispersion <- drop(basisall %*% beta)^2
diff --git a/R/dispCoxReidSplineTrend.R b/R/dispCoxReidSplineTrend.R
index f108840..f00142a 100644
--- a/R/dispCoxReidSplineTrend.R
+++ b/R/dispCoxReidSplineTrend.R
@@ -1,7 +1,7 @@
 dispCoxReidSplineTrend <- function(y, design, offset=NULL, df = 5, subset=10000, AveLogCPM=NULL, method.optim="Nelder-Mead", trace=0)
 #	Estimate spline trend dispersion
 #	Gordon Smyth, Yunshun Chen, Davis McCarthy
-#	Created 16 Dec 2010.  Last modified 3 Oct 2012.
+#	Created 16 Dec 2010.  Last modified 2 Feb 2015.
 {
 	y <- as.matrix(y)
 	nlibs <- ncol(y)
@@ -21,13 +21,13 @@ dispCoxReidSplineTrend <- function(y, design, offset=NULL, df = 5, subset=10000,
 	y.nonzero <- y[!all.zero,]
 	
 #	Spline basis
-	require("splines")
+	if(!requireNamespace("splines",quietly=TRUE)) stop("splines required but is not available")
 	p1 <- (1:(df-1))/df
 	knots1 <- quantile(abundance.nonzero,p=p1)
 	r <- range(abundance.nonzero)
 	knots2 <- r[1]+p1*(r[2]-r[1])
 	knots <- 0.3*knots1+0.7*knots2
-	X <- cbind(1, ns(abundance.nonzero, df, knots=knots))
+	X <- cbind(1, splines::ns(abundance.nonzero, df, knots=knots))
 	
 	fun <- function(par,y,design,offset,abundance,X) {
 		eta <- X %*% par
diff --git a/R/dropEmptyLevels.R b/R/dropEmptyLevels.R
new file mode 100644
index 0000000..b1dec00
--- /dev/null
+++ b/R/dropEmptyLevels.R
@@ -0,0 +1,13 @@
+dropEmptyLevels <- function(x)
+#	Drop levels of a factor that don't occur
+#	Gordon Smyth
+#	Created 25 March 2012.  Last modified 6 March 2015.
+{
+	if(is.factor(x)) {
+		i <- which(tabulate(as.integer(x))>0L)
+		if(length(i) < nlevels(x)) x <- factor(x, levels=levels(x)[i])
+		return(x)
+	} else {
+		return(factor(x))
+	}
+}
diff --git a/R/estimateDisp.R b/R/estimateDisp.R
index 0674f17..d98c87e 100644
--- a/R/estimateDisp.R
+++ b/R/estimateDisp.R
@@ -6,11 +6,9 @@ estimateDisp <- function(y, design=NULL, prior.df=NULL, trend.method="locfit", s
 #  Estimating dispersion using weighted conditional likelihood empirical Bayes.
 #  Use GLM approach if a design matrix is given, and classic approach otherwise.
 #  It calculates a matrix of likelihoods for each gene at a set of dispersion grid points, and then calls WLEB() to do the shrinkage.
-#  Yunshun Chen, Gordon Smyth. Created July 2012. Last modified 11 Sep 2014.
+#  Yunshun Chen, Gordon Smyth. Created July 2012. Last modified 03 Feb 2015.
 {
 	if( !is(y,"DGEList") ) stop("y must be a DGEList")
-	group <- y$samples$group <- as.factor(y$samples$group)
-
 	trend.method <- match.arg(trend.method, c("none", "loess", "locfit", "movingave"))
 	ntags <- nrow(y$counts)
 	nlibs <- ncol(y$counts)
@@ -31,6 +29,7 @@ estimateDisp <- function(y, design=NULL, prior.df=NULL, trend.method="locfit", s
 	# Classic edgeR
 	if(is.null(design)){
 		# One group
+		group <- y$samples$group <- as.factor(y$samples$group)
 		if(length(levels(group))==1)
 			design <- matrix(1,nlibs,1)
 		else
@@ -56,7 +55,7 @@ estimateDisp <- function(y, design=NULL, prior.df=NULL, trend.method="locfit", s
 		for(j in 1:grid.length) for(i in 1:length(ysplit)) 
 			l0[,j] <- condLogLikDerDelta(ysplit[[i]], grid.vals[j], der=0) + l0[,j]
 	}
-	# GLM edgeR (using the last fit to hot-start the next fit).
+	# GLM edgeR 
 	else {
 		design <- as.matrix(design)
 		if(ncol(design) >= ncol(y$counts)) {
@@ -64,11 +63,32 @@ estimateDisp <- function(y, design=NULL, prior.df=NULL, trend.method="locfit", s
 			y$common.dispersion <- NA
 			return(y)
 		}
-		last.beta <- NULL
-		for(i in 1:grid.length) {
-			out <- adjustedProfileLik(spline.disp[i], y=y$counts[sel, ], design=design, offset=offset[sel,], start=last.beta, get.coef=TRUE)
-			l0[,i] <- out$apl
-			last.beta <- out$beta
+		
+		# Protect against zeros.
+		glmfit <- glmFit(y$counts[sel,], design, offset=offset[sel,], dispersion=0.05, prior.count=0)
+		zerofit <- (glmfit$fitted.values < 1e-4) & (glmfit$counts < 1e-4)
+		by.group <- .comboGroups(zerofit)
+
+		for (subg in by.group) { 
+			cur.nzero <- !zerofit[subg[1],]
+			if (!any(cur.nzero)) { next } 
+			if (all(cur.nzero)) { 
+				redesign <- design
+			} else {
+				redesign <- design[cur.nzero,,drop=FALSE]
+				QR <- qr(redesign)
+				redesign <- redesign[,QR$pivot[1:QR$rank],drop=FALSE]
+				if (nrow(redesign) == ncol(redesign)) { next }
+			}
+
+			# Using the last fit to hot-start the next fit
+			last.beta <- NULL
+			for(i in 1:grid.length) {
+				out <- adjustedProfileLik(spline.disp[i], y=y$counts[sel, ][subg,cur.nzero,drop=FALSE], design=redesign, 
+					offset=offset[sel,][subg,cur.nzero,drop=FALSE], start=last.beta, get.coef=TRUE)
+				l0[subg,i] <- out$apl
+				last.beta <- out$beta
+			}
 		}
 	}
 
diff --git a/R/estimateTrendedDisp.R b/R/estimateTrendedDisp.R
index 6a40799..6d08bbe 100644
--- a/R/estimateTrendedDisp.R
+++ b/R/estimateTrendedDisp.R
@@ -24,7 +24,7 @@ estimateTrendedDisp <- function(object, method="bin.spline", df=5, span=2/3)
 	}
 
 	if( method=="bin.spline" ) {
-		require("splines")
+		if(!requireNamespace("splines",quietly=TRUE)) stop("splines required but is not available")
 		p1 <- (1:(df-1))/df
 		knots1 <- quantile(logCPM.bins,p=p1)
 		r <- range(logCPM.bins)
@@ -35,7 +35,7 @@ estimateTrendedDisp <- function(object, method="bin.spline", df=5, span=2/3)
 		ind[df+1] <- which.max(logCPM.bins)
 		for(i in 2:df)
 			ind[i] <- which.min(abs(knots[i-1]-logCPM.bins))
-		fit <- lm(disp.bins ~ ns(logCPM.bins, df=df, knots=knots))
+		fit <- lm(disp.bins ~ splines::ns(logCPM.bins, df=df, knots=knots))
 		f <- splinefun(logCPM.bins[ind], fit$fitted.value[ind], method="natural")
 		dispersion <- f(logCPM)
 	}
diff --git a/R/glmQLFTest.R b/R/glmQLFTest.R
index dbff67f..4a8cabc 100644
--- a/R/glmQLFTest.R
+++ b/R/glmQLFTest.R
@@ -1,26 +1,58 @@
-glmQLFit <- function(y, design=NULL, dispersion=NULL, abundance.trend=TRUE, robust=FALSE, winsor.tail.p=c(0.05, 0.1), ...)
-# 	Fits a GLM and computes quasi-likelihood dispersions for each gene.
-# 	Written by Aaron Lun, based on code by Davis McCarthy and Gordon Smyth
-# 	Created 15 September 2014. Last modified 17 September 2014.
+#  FIT QUASI-LIKELIHOOD GENERALIZED LINEAR MODELS
+
+glmQLFit <- function(y, ...)
+UseMethod("glmQLFit")
+
+glmQLFit.DGEList <- function(y, design=NULL, dispersion=NULL, offset=NULL, abundance.trend=TRUE, robust=FALSE, winsor.tail.p=c(0.05, 0.1), ...)
+#	Quasi-likelihood F-tests for DGE glms.
+# 	Yunshun Chen and Aaron Lun
+#	Created 05 November 2014.
 {
-#	Initial fit with trended dispersion
-	if(!is(y,"DGEList")) { stop("y must be a DGEList") }
 	if(is.null(dispersion)) {
 		dispersion <- y$trended.dispersion
 		if(is.null(dispersion)) dispersion <- y$common.dispersion
 		if(is.null(dispersion)) dispersion <- 0.05
 	}
-	glmfit <- glmFit(y,design=design,dispersion=dispersion, ...)
-	
+	if(is.null(dispersion)) stop("No dispersion values found in DGEList object.")
+	if(is.null(offset)) offset <- getOffset(y)
+	if(is.null(y$AveLogCPM)) y$AveLogCPM <- aveLogCPM(y)
+
+	fit <- glmQLFit(y=y$counts, design=design, dispersion=dispersion, offset=offset, lib.size=NULL, abundance.trend=abundance.trend, AveLogCPM=y$AveLogCPM, robust=robust, winsor.tail.p=winsor.tail.p, weights=y$weights, ...)
+	fit$samples <- y$samples
+	fit$genes <- y$genes
+#	fit$prior.df <- y$prior.df
+	fit$AveLogCPM <- y$AveLogCPM
+	new("DGEGLM",fit)
+}
+
+glmQLFit.default <- function(y, design=NULL, dispersion=0.05, offset=NULL, lib.size=NULL, abundance.trend=TRUE, AveLogCPM=NULL, robust=FALSE, winsor.tail.p=c(0.05, 0.1), ...)
+# 	Fits a GLM and computes quasi-likelihood dispersions for each gene.
+# 	Yunshun Chen and Aaron Lun, based on code by Davis McCarthy and Gordon Smyth
+# 	Created 15 September 2014. Last modified 05 November 2014.
+{
+#	Check y
+	y <- as.matrix(y)
+	ntag <- nrow(y)
+	nlib <- ncol(y)
+
+#	Check dispersion
+	if(is.null(dispersion)) dispersion <- 0.05
+
+#	Check offset and lib.size
+	if(is.null(offset)) {
+		if(is.null(lib.size)) lib.size <- colSums(y)
+		offset <- log(lib.size)
+	}
+	offset <- expandAsMatrix(offset,dim(y))
+
+	glmfit <- glmFit(y,design=design,dispersion=dispersion,offset=offset,lib.size=lib.size,...)
+
 #	Setting up the abundances.
-	A <- NULL
 	if(abundance.trend) {
-		if(is.null(y$AveLogCPM)) {
-			A <- aveLogCPM(y) # For consistency with call from estimateDisp.
-		} else {
-			A <- y$AveLogCPM
-		}
-		glmfit$AveLogCPM <- A
+		if(is.null(AveLogCPM)) AveLogCPM <- aveLogCPM(y) 
+		glmfit$AveLogCPM <- AveLogCPM
+	} else {
+		AveLogCPM <- NULL
 	}
 
 #	Adjust df.residual for fitted values at zero
@@ -31,66 +63,67 @@ glmQLFit <- function(y, design=NULL, dispersion=NULL, abundance.trend=TRUE, robu
 	s2 <- glmfit$deviance / df.residual
 	s2[df.residual==0] <- 0
 	s2 <- pmax(s2,0)
-	s2.fit <- squeezeVar(s2,df=df.residual,covariate=A,robust=robust,winsor.tail.p=winsor.tail.p)
+	s2.fit <- squeezeVar(s2,df=df.residual,covariate=AveLogCPM,robust=robust,winsor.tail.p=winsor.tail.p)
 
 #	Storing results
-	glmfit$df.residual <- df.residual
-	glmfit$s2.fit <- s2.fit
+	glmfit$df.residual.zeros <- df.residual
 	glmfit$df.prior <- s2.fit$df.prior
+	glmfit$var.post <- s2.fit$var.post
+	glmfit$var.prior <- s2.fit$var.prior
 	glmfit
 }
 
+
 glmQLFTest <- function(glmfit, coef=ncol(glmfit$design), contrast=NULL)
 #	Quasi-likelihood F-tests for DGE glms.
 #	Davis McCarthy and Gordon Smyth.
 #	Created 18 Feb 2011. Last modified 15 Sep 2014.
 {
     if(!is(glmfit,"DGEGLM")) stop("glmfit must be an DGEGLM object produced by glmQLFit") 
-	if(is.null(glmfit$s2.fit)) stop("need to run glmQLFit before glmQLFTest") 
+	if(is.null(glmfit$var.post)) stop("need to run glmQLFit before glmQLFTest") 
 	out <- glmLRT(glmfit, coef=coef, contrast=contrast)
 
 #	Compute the QL F-statistic
-	F <- out$table$LR / out$df.test / glmfit$s2.fit$var.post
-	df.total <- glmfit$s2.fit$df.prior + glmfit$df.residual
+	F.stat <- out$table$LR / out$df.test / glmfit$var.post
+	df.total <- glmfit$df.prior + glmfit$df.residual.zeros
 	max.df.residual <- ncol(glmfit$counts)-ncol(glmfit$design)
 	df.total <- pmin(df.total, nrow(glmfit)*max.df.residual)
 
 #	Compute p-values from the QL F-statistic
-	F.pvalue <- pf(F, df1=out$df.test, df2=df.total, lower.tail=FALSE, log.p=FALSE)
+	F.pvalue <- pf(F.stat, df1=out$df.test, df2=df.total, lower.tail=FALSE, log.p=FALSE)
 
 #	Ensure is not more significant than chisquare test
-	i <- glmfit$s2.fit$var.post < 1
+	i <- glmfit$var.post < 1
 	if(any(i)) {
 		chisq.pvalue <- pchisq(out$table$LR[i], df=out$df.test[i], lower.tail=FALSE, log.p=FALSE)
 		F.pvalue[i] <- pmax(F.pvalue[i],chisq.pvalue)
 	}
 
 	out$table$LR <- out$table$PValue <- NULL
-	out$table$F <- F
+	out$table$F <- F.stat
 	out$table$PValue <- F.pvalue
 	out$df.total <- df.total
 
 	out
 }
 
-plotQLDisp <- function(glmfit, xlab="Average Log2 CPM", ylab="Quarter-Root Mean Deviance", pch=16, cex=0.2, 
-		col.shrunk="red", col.trend="blue", col.raw="black", ...)
+plotQLDisp <- function(glmfit, xlab="Average Log2 CPM", ylab="Quarter-Root Mean Deviance", pch=16, cex=0.2, col.shrunk="red", col.trend="blue", col.raw="black", ...)
 # 	Plots the result of QL-based shrinkage.
-#	written by Aaron Lun, based on code by Davis McCarthy and Gordon Smyth
+#	Aaron Lun, based on code by Davis McCarthy and Gordon Smyth
 #	15 September 2014
 {
 	A <- glmfit$AveLogCPM
 	if(is.null(A)) A <- aveLogCPM(glmfit)
-	s2 <- glmfit$deviance / glmfit$df.residual
-	if(is.null(glmfit$s2.fit)) { stop("need to run glmQLFit before plotQLDisp") }
+	s2 <- glmfit$deviance / glmfit$df.residual.zeros
+	if(is.null(glmfit$var.post)) { stop("need to run glmQLFit before plotQLDisp") }
 
 	plot(A, sqrt(sqrt(s2)),xlab=xlab, ylab=ylab, pch=pch, cex=cex, col=col.raw, ...)
-	points(A,sqrt(sqrt(glmfit$s2.fit$var.post)),pch=16,cex=0.2,col=col.shrunk)
-	if (length(glmfit$s2.fit$var.prior)==1L) { 
-		abline(h=sqrt(sqrt(glmfit$s2.fit$var.prior)), col=col.trend)
+	points(A,sqrt(sqrt(glmfit$var.post)),pch=16,cex=0.2,col=col.shrunk)
+	if (length(glmfit$var.prior)==1L) { 
+		abline(h=sqrt(sqrt(glmfit$var.prior)), col=col.trend)
 	} else {
 		o <- order(A)
-		lines(A[o],sqrt(sqrt(glmfit$s2.fit$var.prior[o])),col=col.trend)
+		lines(A[o],sqrt(sqrt(glmfit$var.prior[o])),col=col.trend)
 	}
 
 	legend("topright",pch=16,col=c(col.raw,col.shrunk,col.trend),legend=c("Raw","Squeezed", "Trend"))
diff --git a/R/treatDGE.R b/R/glmTreat.R
similarity index 59%
rename from R/treatDGE.R
rename to R/glmTreat.R
index 36cdf89..df8d2f4 100644
--- a/R/treatDGE.R
+++ b/R/glmTreat.R
@@ -1,7 +1,13 @@
 treatDGE <- function(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
-#	Likelihood ratio test with threshold
+{
+	message("treatDGE() has been renamed to glmTreat().  Please use the latter instead.")
+	glmTreat(glmfit=glmfit,coef=coef,contrast=contrast,lfc=lfc)
+}
+
+glmTreat <- function(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
+#	Likelihood ratio test or quasi-likelihood F-test with threshold
 #	Yunshun Chen and Gordon Smyth
-#	05 May 2014.
+#	Created on 05 May 2014. Last modified on 25 Mar 2015
 {
 	if(lfc < 0) stop("lfc has to be non-negative")
 #	Check glmfit
@@ -9,11 +15,15 @@ treatDGE <- function(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
 		if(is(glmfit,"DGEList") && is(coef,"DGEGLM")) {
 			stop("First argument is no longer required. Rerun with just the glmfit and coef/contrast arguments.")
 		}
-		stop("glmfit must be an DGEGLM object (usually produced by glmFit).")
+		stop("glmfit must be an DGEGLM object (usually produced by glmFit or glmQLFit).")
 	}
 	if(is.null(glmfit$AveLogCPM)) glmfit$AveLogCPM <- aveLogCPM(glmfit)
 	nlibs <- ncol(glmfit)
 	ngenes <- nrow(glmfit)
+	
+#	Check if glmfit is from glmFit() or glmQLFit()
+	isLRT <- is.null(glmfit$df.prior)
+	if(isLRT) fun <- glmFit else fun <- glmQLFit
 
 #	Check design matrix
 	design <- as.matrix(glmfit$design)
@@ -56,24 +66,41 @@ treatDGE <- function(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
 #	Null design matrix
 	design0 <- design[, -coef, drop=FALSE]
 
-#	LRT of beta_0 = zero
-	fit0.zero <- glmFit(glmfit$counts, design=design0, offset=glmfit$offset, weights=glmfit$weights, dispersion=glmfit$dispersion, prior.count=0)
+#	Test statistics at beta_0 = zero
+	fit0.zero <- fun(glmfit$counts, design=design0, offset=glmfit$offset, weights=glmfit$weights, dispersion=glmfit$dispersion, prior.count=0)
 	X2.zero <- pmax(0, fit0.zero$deviance - glmfit$deviance)
 
-#	LRT of beta_0 = tau
+#	Test statistics at beta_0 = tau
 	offset.adj <- matrix(-lfc*log(2), ngenes, 1)
 	up <- logFC >= 0
 	offset.adj[up, ] <- lfc*log(2)
 	offset.new <- glmfit$offset + offset.adj %*% t(design[, coef, drop=FALSE])
-	fit0.tau <- glmFit(glmfit$counts, design=design0, offset=offset.new, weights=glmfit$weights, dispersion=glmfit$dispersion, prior.count=0)
-	fit1.tau <- glmFit(glmfit$counts, design=design, offset=offset.new, weights=glmfit$weights, dispersion=glmfit$dispersion, prior.count=0)
+	fit0.tau <- fun(glmfit$counts, design=design0, offset=offset.new, weights=glmfit$weights, dispersion=glmfit$dispersion, prior.count=0)
+	fit1.tau <- fun(glmfit$counts, design=design, offset=offset.new, weights=glmfit$weights, dispersion=glmfit$dispersion, prior.count=0)
 	X2.tau <- pmax(0, fit0.tau$deviance - fit1.tau$deviance)
-	
-	z.zero <- sqrt(X2.zero)
-	z.tau <- sqrt(X2.tau)
+
 	within <- abs(logFC) <= lfc
 	sgn <- 2*within - 1
-	p.value <- pnorm( z.tau*sgn ) + pnorm( -z.tau*sgn-2*z.zero )
+	z.zero <- sqrt(X2.zero)
+	z.tau <- sqrt(X2.tau)
+	
+	if(isLRT){
+		p.value <- pnorm( z.tau*sgn ) + pnorm( -z.tau*sgn-2*z.zero )
+	} else {
+		t.zero <- z.zero/sqrt(glmfit$var.post)
+		t.tau <- z.tau/sqrt(glmfit$var.post)
+		df.total <- glmfit$df.prior + glmfit$df.residual.zeros
+		max.df.residual <- ncol(glmfit$counts)-ncol(glmfit$design)
+		df.total <- pmin(df.total, nrow(glmfit)*max.df.residual)
+		p.value <- pt( t.tau*sgn, df=df.total ) + pt( -t.tau*sgn-2*t.zero, df=df.total )
+	
+	#	Ensure is not more significant than z-test
+		i <- glmfit$var.post < 1
+		if(any(i)) {
+			z.pvalue <- pnorm( z.tau[i]*sgn[i] ) + pnorm( -z.tau[i]*sgn[i]-2*z.zero[i] )
+			p.value[i] <- pmax(p.value[i], z.pvalue)
+		}
+	}
 
 	tab <- data.frame(
 		logFC=logFC,
@@ -88,9 +115,3 @@ treatDGE <- function(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
 	new("DGELRT",unclass(glmfit))
 }
 
-
-
-	
-	
-	
-	
\ No newline at end of file
diff --git a/R/goana.R b/R/goana.R
index 0b44802..10b14d0 100644
--- a/R/goana.R
+++ b/R/goana.R
@@ -1,7 +1,7 @@
 goana.DGELRT <- function(de, geneid = rownames(de), FDR = 0.05, species = "Hs", trend = FALSE, ...)
 #  Gene ontology analysis of DE genes from linear model fit
-#  Gordon Smyth and Yifang Hu
-#  Created 25 August 2014.  Last modified 26 August 2014.
+#  Gordon Smyth, Yifang Hu and Yunshun Chen
+#  Created 25 August 2014.  Last modified 6 April 2015.
 {
 	# Check fit
 	if(is.null(de$table$PValue)) stop("p values not found in fit object")	
@@ -60,7 +60,7 @@ goana.DGELRT <- function(de, geneid = rownames(de), FDR = 0.05, species = "Hs",
 	}
 	if(!trend) PW <- NULL
 
-	NextMethod(de = de.gene, universe = universe, species = species, prior.prob = PW, ...)
+	goana(de = de.gene, universe = universe, species = species, prior.prob = PW, ...)
 }
 
 goana.DGEExact <- goana.DGELRT
diff --git a/R/locfitByCol.R b/R/locfitByCol.R
index 2059c85..b38dcbc 100644
--- a/R/locfitByCol.R
+++ b/R/locfitByCol.R
@@ -1,13 +1,13 @@
 locfitByCol <- function(y, x=NULL, weights=1, span=0.5, degree=0)
 #	Gordon Smyth
-#	20 Aug 2012.
+#	20 Aug 2012.  Last modified 2 Feb 2015.
 {
 	y <- as.matrix(y)
 	ntags <- nrow(y)
+	weights <- rep_len(weights,ntags)
 	if(is.null(x)) x <- 1:ntags
 	if(span*ntags<2 || ntags<=1) return(y)
-	suppressPackageStartupMessages(require(locfit))
-	X <- cbind(1,x)
-	for (j in 1:ncol(y)) y[,j] <- fitted(locfit.raw(X,y[,j],weights=weights, alpha=span,deg=degree))
+	if(!requireNamespace("locfit",quietly=TRUE)) stop("locfit required but is not available")
+	for (j in 1:ncol(y)) y[,j] <- fitted(locfit::locfit(y[,j]~x,weights=weights, alpha=span,deg=degree))
 	y
 }
diff --git a/R/mglmOneGroup.R b/R/mglmOneGroup.R
index f2513f2..1d05252 100644
--- a/R/mglmOneGroup.R
+++ b/R/mglmOneGroup.R
@@ -42,7 +42,7 @@ mglmOneGroup <- function(y,dispersion=0,offset=0,weights=NULL,maxit=50,tol=1e-10
 	weights <- expandAsMatrix(weights,dim(y))
 
 #	Fisher scoring iteration.
-	output <- .Call(.cR_one_group, y, dispersion, offset, weights, maxit, tol, coef.start)
+	output <- .Call(.cR_one_group, nlibs, ntags, t(y), dispersion, offset, weights, maxit, tol, coef.start)
 
 #	Check error condition
 	if(is.character(output)) stop(output)
diff --git a/R/plotMDS.DGEList.R b/R/plotMDS.DGEList.R
index 6c9b4c5..85280fb 100644
--- a/R/plotMDS.DGEList.R
+++ b/R/plotMDS.DGEList.R
@@ -1,8 +1,20 @@
-plotMDS.DGEList <- function (x, top=500, labels=colnames(x), col=NULL, cex=1, dim.plot=c(1, 2), ndim=max(dim.plot), xlab=NULL, ylab=NULL, method="logFC", prior.count=2, gene.selection="pairwise", ...)
+plotMDS.DGEList <- function (x,top=500,labels=NULL,pch=NULL,cex=1,dim.plot=c(1,2),ndim=max(dim.plot),gene.selection="pairwise",xlab=NULL,ylab=NULL,method="logFC",prior.count=2,...)
 #	Multidimensional scaling plot of digital gene expression profiles
 #	Yunshun Chen, Mark Robinson and Gordon Smyth
-#	23 May 2011.  Last modified 28 May 2013.
+#	23 May 2011.  Last modified 25 Nov 2014.
 {
+	method <- match.arg(method, c("logfc","logFC","bcv","BCV"))
+	if(method=="logfc") method <- "logFC"
+	if(method=="BCV") method <- "bcv"
+
+#	Default method is to convert to moderated logCPM and call limma plotMDS
+	if(method=="logFC") {
+		y <- cpm(x,log=TRUE,prior.count=prior.count)
+		return(plotMDS(y,top=top,labels=labels,pch=pch,cex=cex,dim.plot=dim.plot,ndim=ndim,gene.selection=gene.selection,xlab=xlab,ylab=ylab,...))
+	}
+
+#	From here method="bcv"
+
 #	Check x
 	x$counts <- as.matrix(x$counts)
 	if(!all(is.finite(x$counts))) stop("Missing or infinite counts not allowed")
@@ -11,23 +23,17 @@ plotMDS.DGEList <- function (x, top=500, labels=colnames(x), col=NULL, cex=1, di
 	nsamples <- ncol(x)
 	if(nsamples < 3) stop("Need at least 3 columns of data")
 
-#	Check value for labels
-	if(is.null(labels)) labels <- 1:nsamples
-	labels <- as.character(labels)
-
-#	Check value for dim.plot
-	if(nsamples < ndim) stop("Dimension to be plotted is greater than number of libraries")
-
-#	Default method is to convert to moderated logCPM and call limma plotMDS
-	method <- match.arg(method, c("logFC","bcv","BCV"))
-	if(method=="logFC") {
-		if(is.null(xlab)) xlab <- paste("Leading logFC dim",dim.plot[1])
-		if(is.null(ylab)) ylab <- paste("Leading logFC dim",dim.plot[2])
-		y <- cpm(x,log=TRUE,prior.count=prior.count)
-		return(plotMDS(y,top=top,labels=labels,col=col,cex=cex,dim.plot=dim.plot,ndim=ndim,gene.selection=gene.selection,xlab=xlab,ylab=ylab,...))
+#	Check labels and pch
+	if(is.null(pch) & is.null(labels)) {
+		labels <- colnames(x)
+		if(is.null(labels)) labels <- 1:nsamples
 	}
+	if(!is.null(labels)) labels <- as.character(labels)
 
-#	From here method="bcv"
+#	Check dim
+	if(ndim < 2) stop("Need at least two dim.plot")
+	if(nsamples < ndim) stop("Too few samples")
+	if(nprobes < ndim) stop("Too few rows")
 
 	x$samples$group <- factor(rep.int(1,nsamples))
 
@@ -35,17 +41,17 @@ plotMDS.DGEList <- function (x, top=500, labels=colnames(x), col=NULL, cex=1, di
 	dd <- matrix(0,nrow=nsamples,ncol=nsamples,dimnames=list(cn,cn))	
 
 #	Check value for top
-	if (top < nprobes) { 
+	if(top < nprobes) { 
 		twd <- estimateTagwiseDisp(estimateCommonDisp(x), grid.length = 100) 
 		o <- order(twd$tagwise.dispersion, decreasing = TRUE)[1:top]
 		subdata <- x$counts[o,,drop=FALSE]
 	} else {
-		subdata<-x$counts
+		subdata <- x$counts
 	}
 
+#	Compute pairwise BCV values
 	lib.size <- x$samples$lib.size * x$samples$norm.factors
 	myFun <- function(delta, y, ...) sum(condLogLikDerDelta(y, delta, ...))
-
 	for (i in 2:(nsamples)) {
 		for (j in 1:(i - 1))  {
 			mm <- subdata[,c(i,j)]
@@ -57,15 +63,17 @@ plotMDS.DGEList <- function (x, top=500, labels=colnames(x), col=NULL, cex=1, di
 		}
 	}
 
-#	Securing against negative eigenvalues with non-Euclidian distance matrices.
+#	Multidim scaling
 	a1 <- cmdscale(as.dist(dd), k = ndim)
+
+#	Check whether dimensions have been removed (because of negative eigenvalues)
+#	Add random variate if necessary
 	ndiff <- ndim-ncol(a1)
-	if (ndiff > 0) a1<-cbind(a1, matrix(runif(ndiff*nsamples, -1e-6, 1e-6), ncol=ndiff, nrow=nsamples))
+	if(ndiff > 0) a1 <- cbind(a1, matrix(runif(ndiff*nsamples, -1e-6, 1e-6), ncol=ndiff, nrow=nsamples))
 
 	mds <- new("MDS",list(dim.plot=dim.plot,distance.matrix=dd,cmdscale.out=a1,top=top))
 	mds$x <- a1[,dim.plot[1]]
 	mds$y <- a1[,dim.plot[2]]
-	if(is.null(xlab)) xlab <- paste("BCV distance",dim.plot[1])
-	if(is.null(ylab)) ylab <- paste("BCV distance",dim.plot[2])
-	plotMDS(mds,labels=labels,col=col,cex=cex,xlab=xlab,ylab=ylab,...)
+	mds$axislabel <- "BCV distance"
+	plotMDS(mds,labels=labels,pch=pch,cex=cex,xlab=xlab,ylab=ylab,...)
 }
diff --git a/R/processAmplicons.R b/R/processAmplicons.R
index 50a78d2..d07e69a 100644
--- a/R/processAmplicons.R
+++ b/R/processAmplicons.R
@@ -1,15 +1,17 @@
 #  Code to process hairpin reads from Illumina sequencer
-#  Assume fixed structure of read:
+#  Assume basic fixed structure of read:
 #  Barcode + Common sequence + Hairpin sequence
+#  If barcode2Start and barcode2End are not NULL, forward read sequences contain a second barcode
 #  If readfile2, barcodeStartRev and barcodeEndRev are not NULL, reverse read sequences contain reverse barcodes
 
 processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
-                    barcodeStart=1, barcodeEnd=5, barcodeStartRev=NULL, barcodeEndRev=NULL, hairpinStart=37, hairpinEnd=57,
+                    barcodeStart=1, barcodeEnd=5, barcode2Start=NULL, barcode2End=NULL, barcodeStartRev=NULL, barcodeEndRev=NULL,
+                    hairpinStart=37, hairpinEnd=57,
                     allowShifting=FALSE, shiftingBase = 3,
                     allowMismatch=FALSE, barcodeMismatchBase = 1, hairpinMismatchBase = 2,
                     allowShiftedMismatch = FALSE, 
                     verbose = FALSE) {
-     
+    
   checkFileExistence = function(readfilenames){
     if ((length(readfilenames) == 1) && (!file.exists(readfilenames)))
       stop("Read file doesn't exist.\n")
@@ -22,6 +24,7 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
   }
   
   # Check file existence
+  numfiles = length(readfile)
   checkFileExistence(readfile);
   if (is.null(readfile2)) {
     IsPairedReads = FALSE;
@@ -31,7 +34,8 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
       stop("readfile and readfile2 should match each other.")
     checkFileExistence(readfile2);
   } 
-  
+  IsDualIndexingOnForwardRead = !is.null(barcode2Start) && !is.null(barcode2End)
+      
   if (!file.exists(barcodefile))
     stop("Barcode file doesn't exist.\n")
   if (!file.exists(hairpinfile))
@@ -54,7 +58,16 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
     stop("Invalid hairpin end position!")
   if (hairpinEnd <= hairpinStart)
     stop("Hairpin end position should be greater than hairpin start position. \n")
-
+  
+  if (IsDualIndexingOnForwardRead){
+    if ((barcode2Start < 1) || (barcode2Start > readlength))
+      stop("Invalid barcode2 start position!\n")
+    if ((barcode2End < 1) || (barcode2End > readlength))
+      stop("Invalid barcode2 end position!\n")
+    if (barcode2End <= barcode2Start)
+      stop("Barcode2 end position should be greater than barcode2 start position. \n")
+  }
+      
   close(reads)
   
   if (IsPairedReads) {
@@ -76,7 +89,6 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
   # Validate barcodes
   barcodelength <- barcodeEnd - barcodeStart + 1;
   barcodes <- read.table(barcodefile, header=TRUE, sep="\t");
-  numbc <- nrow(barcodes) 
   barcodeIDIndex = which(colnames(barcodes) == 'ID')
   if (length(barcodeIDIndex) < 1) 
     stop("Can't find column ID in ", barcodefile)
@@ -85,31 +97,42 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
     stop("Can't find column Sequences in ", barcodefile)
   barcodeIDs <- as.character(barcodes[, barcodeIDIndex]) 
   barcodeseqs <- as.character(barcodes[, barcodeseqIndex]) 
-  if (length(unique(barcodeIDs)) != numbc)
+  if (anyDuplicated(barcodeIDs))
     stop("There are duplicate barcode IDs.\n")
   if ((min(nchar(barcodeseqs)) != barcodelength) || (max(nchar(barcodeseqs)) != barcodelength))
     stop(paste("Barcode sequence length is set to ", barcodelength, ", there are barcode sequence not with specified length.\n", sep=""))
+
   if (IsPairedReads) {
     barcodeseqRevIndex = which(colnames(barcodes) == 'SequencesReverse')
     if (length(barcodeseqRevIndex) < 1) 
       stop("Can't find column SequencesReverse in ", barcodefile)
     barcodeseqsReverse <- as.character(barcodes[, barcodeseqRevIndex])
-	barcodelengthReverse <- barcodeEndRev - barcodeStartRev + 1;	
+    barcodelengthReverse <- barcodeEndRev - barcodeStartRev + 1;	
     if ((min(nchar(barcodeseqsReverse)) != barcodelengthReverse) || (max(nchar(barcodeseqsReverse)) != barcodelengthReverse))
       stop(paste("Reverse barcode sequence length is set to ", barcodelength, ", there are reverse barcode sequence not in specified length.\n", sep=""))  
-	concatenatedBarcodeseqs = paste(barcodeseqs, barcodeseqsReverse, sep="")
-    if (length(unique(concatenatedBarcodeseqs)) != numbc)
+    concatenatedBarcodeseqs = paste(barcodeseqs, barcodeseqsReverse, sep="")
+    if (anyDuplicated(concatenatedBarcodeseqs)) 
       stop("There are duplicate forward/reverse barcode sequences.\n")
+  } else if (IsDualIndexingOnForwardRead) {
+    barcodeseq2Index = which(colnames(barcodes) == 'Sequences2')
+    if (length(barcodeseq2Index) < 1) 
+      stop("Can't find column Sequences2 in ", barcodefile)
+    barcode2seqs <- as.character(barcodes[, barcodeseq2Index])
+    barcode2length <- barcode2End - barcode2Start + 1;
+
+    if ((min(nchar(barcode2seqs)) != barcode2length) || (max(nchar(barcode2seqs)) != barcode2length))
+      stop(paste("Forward barcode2 sequence length is set to ", barcode2length, ", there are barcode2 sequence not in specified length.\n", sep=""))  
+    concatenatedBarcodeseqs = paste(barcodeseqs, barcode2seqs, sep="")
+    if (anyDuplicated(concatenatedBarcodeseqs)) 
+      stop("There are duplicate barcode/barcode2 sequences.\n") 
   } else {
-    if (length(unique(barcodeseqs)) != numbc)
+    if (anyDuplicated(barcodeseqs)) 
       stop("There are duplicate barcode sequences.\n")
   }
   
   # Validate hairpins
   hairpinlength <- hairpinEnd - hairpinStart + 1;
   hairpins <- read.table(hairpinfile, header=TRUE, sep="\t");
-  numhp <- nrow(hairpins) 
-
   hairpinIDIndex = which(colnames(hairpins) == 'ID')
   if (length(hairpinIDIndex) < 1) 
     stop("Can't find column ID in ", hairpinfile)
@@ -121,9 +144,9 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
 
   if ((min(nchar(hairpinseqs)) != hairpinlength) || (max(nchar(hairpinseqs)) != hairpinlength))
     stop(paste("Hairpin sequence length is set to ", hairpinlength, ", there are hairpin sequences not with specified length.\n", sep=""))
-  if (length(unique(hairpinseqs)) != numhp)
+  if (anyDuplicated(hairpinseqs)) 
     stop("There are duplicate hairpin sequences.\n")
-  if (length(unique(hairpinIDs)) != numhp)
+  if (anyDuplicated(hairpinIDs)) 
     stop("There are duplicate hairpin IDs.\n")
  
   # validate mismatch/shifting input parameters
@@ -145,27 +168,40 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
 
   # passing only barcode/hairpin sequences to C function
   tempbarcodefile <- paste("Barcode", as.character(Sys.getpid()), "temp.txt", sep = "_")
+  on.exit({ if (file.exists(tempbarcodefile)) { file.remove(tempbarcodefile) }}, add=TRUE)
   if (IsPairedReads) {
     bothBarcodeSeqs = cbind(barcodeseqs, barcodeseqsReverse)
     write.table(bothBarcodeSeqs, file=tempbarcodefile, sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE);
+  } else if (IsDualIndexingOnForwardRead) {
+    bothBarcodeSeqs = cbind(barcodeseqs, barcode2seqs)
+    write.table(bothBarcodeSeqs, file=tempbarcodefile, sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE);
   } else {
     write.table(barcodeseqs, file=tempbarcodefile, sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE);
   }
   
   temphairpinfile <- paste("Hairpin", as.character(Sys.getpid()), "temp.txt", sep = "_")
+  on.exit({ if (file.exists(temphairpinfile)) { file.remove(temphairpinfile) }}, add=TRUE)
   write.table(hairpinseqs, file=temphairpinfile, sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE);
 
   tempoutfile <- paste("ReadcountSummary", as.character(Sys.getpid()), "output.txt", sep = "_")
+  on.exit({ if (file.exists(tempoutfile)) { file.remove(tempoutfile) }}, add=TRUE)
 
   tryCatch({
     if (!IsPairedReads) {
-       readfile2 = "DummyReadfile.fastq"
+       readfile2 = rep("DummyReadfile.fastq", numfiles)
        barcodeStartRev = 0;
        barcodeEndRev = 0;
     }
-    .C(.cprocessHairpinReads, IsPairedReads, readfile, readfile2, as.integer(length(readfile)),
+    if (!IsDualIndexingOnForwardRead) {
+       barcode2Start = 0;
+       barcode2End = 0;
+    }
+    
+    .C(.cprocessHairpinReads, as.integer(IsPairedReads), as.integer(IsDualIndexingOnForwardRead), 
+	   as.character(readfile), as.character(readfile2), as.integer(numfiles),
        as.character(tempbarcodefile), as.character(temphairpinfile),
-       as.integer(barcodeStart), as.integer(barcodeEnd), as.integer(barcodeStartRev), as.integer(barcodeEndRev), as.integer(hairpinStart), as.integer(hairpinEnd),
+       as.integer(barcodeStart), as.integer(barcodeEnd), as.integer(barcode2Start), as.integer(barcode2End), as.integer(barcodeStartRev), as.integer(barcodeEndRev),
+       as.integer(hairpinStart), as.integer(hairpinEnd),
        as.integer(allowShifting), as.integer(shiftingBase),
        as.integer(allowMismatch), as.integer(barcodeMismatchBase), as.integer(hairpinMismatchBase),
        as.integer(allowShiftedMismatch),
@@ -175,14 +211,6 @@ processAmplicons = function(readfile, readfile2=NULL, barcodefile, hairpinfile,
   },
   error = function(err) {
     print(paste("ERROR MESSAGE:  ",err))
-  },
-  finally = {
-    if (file.exists(tempbarcodefile))
-      file.remove(tempbarcodefile)
-    if (file.exists(temphairpinfile))
-      file.remove(temphairpinfile)
-    if (file.exists(tempoutfile))
-      file.remove(tempoutfile)
   }
   )
 
diff --git a/R/residDF.R b/R/residDF.R
index d458a60..f20259f 100644
--- a/R/residDF.R
+++ b/R/residDF.R
@@ -1,3 +1,42 @@
+.comboGroups <- function(truths) 
+# Function that returns a list of vectors of indices,
+# where each vector refers to the rows with the same
+# combination of TRUE/FALSE values in 'truths'.
+# 
+# written by Aaron Lun
+# Created 24 October 2014
+{
+#	Integer packing will only work for 31 libraries at a time.	
+	assembly <- list()
+	collected <- 0L
+	step <- 31L
+	bits <- as.integer(2^(1:step-1L))
+
+	while (collected < ncol(truths)) {
+		upper <- pmin(ncol(truths) - collected, step)
+		keys <- t(truths[,collected+1:upper,drop=FALSE]) * bits[1:upper]
+		assembly[[length(assembly)+1L]] <- as.integer(colSums(keys))
+		collected <- collected + step
+	}
+
+#	Figuring out the unique components.
+	o <- do.call(order, assembly)
+	nr <- nrow(truths)
+	is.different <- logical(nr)
+	for (i in 1:length(assembly)) { 
+		is.different <- is.different | c(TRUE, diff(assembly[[i]][o])!=0L)
+	}
+	first.of.each <- which(is.different)
+	last.of.each <- c(first.of.each[-1]-1L, nr)
+
+#	Returning the groups.
+	output <- list()
+	for (u in 1:length(first.of.each)) {
+		output[[u]] <- o[first.of.each[u]:last.of.each[u]]
+	}
+	return(output)
+}
+
 .residDF <- function(zero, design)
 #	Effective residual degrees of freedom after adjusting for exact zeros
 #	Gordon Smyth and Aaron Lun
@@ -17,38 +56,18 @@
 	somezero <- nzero>0L & nzero<nlib
 	if(any(somezero)) {
 		zero2 <- zero[somezero,,drop=FALSE]
-
-#		Integer packing will only work for 31 libraries at a time.	
-		assembly <- list()	
-		collected <- 0L
-		step <- 31L
-		bits <- as.integer(2^(1:step-1L))
-		while (collected < ncol(zero2)) {
-			upper <- pmin(ncol(zero2) - collected, step)
-			keys <- t(zero2[,collected+1:upper,drop=FALSE]) * bits[1:upper]
-			assembly[[length(assembly)+1L]] <- as.integer(colSums(keys))
-			collected <- collected + step
-		}
-
-#		Figuring out the unique components.
-		o <- do.call(order, assembly)
-		nzeros <- sum(somezero)
-		is.different <- logical(nzeros)
-		for (i in 1:length(assembly)) { 
-			is.different <- is.different | c(TRUE, diff(assembly[[i]][o])!=0L)
-		}
-		first.of.each <- which(is.different)
-		last.of.each <- c(first.of.each[-1]-1L, nzeros)
+		groupings <- .comboGroups(zero2)
 
 #		Identifying the true residual d.f. for each of these rows.			
 		DF2 <- nlib-nzero[somezero]
-		for (u in 1:length(first.of.each)) {
-			i <- o[first.of.each[u]:last.of.each[u]]
+		for (u in 1:length(groupings)) {
+			i <- groupings[[u]]
 			zeroi <- zero2[i[1],]
 			DF2[i] <- DF2[i]-qr(design[!zeroi,,drop=FALSE])$rank
 		}
 		DF2 <- pmax(DF2, 0L)
 		DF[somezero] <- DF2
 	}
+
 	DF
 }
diff --git a/R/romer.DGEList.R b/R/romer.DGEList.R
new file mode 100644
index 0000000..d98f4b8
--- /dev/null
+++ b/R/romer.DGEList.R
@@ -0,0 +1,39 @@
+romer.DGEList <- function(y, index, design=NULL, contrast=ncol(design), ...)
+#	rotation mean-rank version of GSEA (gene set enrichment analysis) for RNA-Seq data
+#	Yunshun Chen, Gordon Smyth
+#	Created 20 Oct 2014.
+{
+#	Check dispersion estimates in y
+	dispersion <- getDispersion(y)
+	if(is.null(dispersion)) stop("Dispersion estimate not found. Please estimate the dispersion(s) before you proceed.")
+
+#	Make default design matrix from group factor
+	if(is.null(design)) {
+		if(nlevels(y$samples$group)<2) stop("design not supplied and samples all belong to the same group")
+		design <- model.matrix(~y$samples$group)
+		rownames(design) <- colnames(y)
+	}
+	nbeta <- ncol(design)
+	if(nbeta < 2) stop("design matrix must have at least two columns")
+
+#	Check contrast
+	if(length(contrast) == 1) {
+		u <- rep.int(0, nbeta)
+		u[contrast] <- 1
+		contrast <- u
+	}
+	if(length(contrast) != nbeta) stop("length of contrast must match column dimension of design")
+	if(all(contrast==0)) stop("contrast all zero")
+
+#	Construct null hypothesis design matrix
+	QR <- qr(contrast)
+	design0 <- t(qr.qty(QR, t(design))[-1, , drop=FALSE])
+
+#	Null hypothesis fit
+	fit.null <- glmFit(y, design0, prior.count=0)
+
+#	Quantile residuals from null fit
+	y <- zscoreNBinom(y$counts, mu=fit.null$fitted.values, size=1/dispersion)
+
+	NextMethod("romer")
+}
diff --git a/R/subsetting.R b/R/subsetting.R
index 734143a..b4e148c 100644
--- a/R/subsetting.R
+++ b/R/subsetting.R
@@ -3,7 +3,7 @@
 assign("[.DGEList",
 function(object, i, j, keep.lib.sizes=TRUE)
 #  Subsetting for DGEList objects
-#  24 September 2009.  Last modified 5 June 2014.
+#  24 September 2009.  Last modified 8 Feb 2015.
 {  
 #	Recognized components
 	IJ <- c("counts","pseudo.counts","offset","weights")
@@ -14,7 +14,7 @@ function(object, i, j, keep.lib.sizes=TRUE)
 
 	out <- subsetListOfArrays(object,i,j,IJ=IJ,IX=IX,I=I,JX=JX)
 	if(!(missing(i) || keep.lib.sizes)) out$samples$lib.size <- colSums(out$counts)
-	if(!missing(j)) out$samples$group <- .check.factor(out$samples$group)
+	if(!missing(j)) out$samples$group <- dropEmptyLevels(out$samples$group)
 	out
 })
 
diff --git a/R/topTags.R b/R/topTags.R
index bf8241c..e2c1228 100644
--- a/R/topTags.R
+++ b/R/topTags.R
@@ -1,7 +1,7 @@
-topTags <- function(object,n=10,adjust.method="BH",sort.by="PValue") 
+topTags <- function(object,n=10,adjust.method="BH",sort.by="PValue",p.value=1) 
 #	Summary table of the n most differentially expressed tags
 #	Mark Robinson, Davis McCarthy, Gordon Smyth
-#	Created September 2008.  Last modified 25 Oct 2012.
+#	Created September 2008.  Last modified 31 Oct 2014 (Yunshun Chen).
 {
 #	Check object
 	if(is.null(object$table)) stop("Need to run exactTest or glmLRT first")
@@ -31,9 +31,9 @@ topTags <- function(object,n=10,adjust.method="BH",sort.by="PValue")
 
 #	Choose top genes
 	o <- switch(sort.by,
-		"logFC" = order(alfc,decreasing=TRUE)[1:n],
-		"PValue" = order(object$table$PValue,-alfc)[1:n],
-		"none" = 1:n
+		"logFC" = order(alfc,decreasing=TRUE),
+		"PValue" = order(object$table$PValue,-alfc),
+		"none" = 1:nrow(object$table)
 	)
 	tab <- object$table[o,]
 
@@ -50,10 +50,21 @@ topTags <- function(object,n=10,adjust.method="BH",sort.by="PValue")
 		if(is.null(dim(object$genes))) object$genes <- data.frame(ID=object$genes,stringsAsFactors=FALSE)
 		tab <- cbind(object$genes[o,,drop=FALSE], tab)
 	}
+	
+#	Thin out fit p.value threshold
+	if(p.value < 1) {
+		sig <- adj.p.val[o] <= p.value
+		sig[is.na(sig)] <- FALSE
+		tab <- tab[sig,]
+	}
 
+#	Enough rows left?
+	if(nrow(tab) < n) n <- nrow(tab)
+	if(n < 1) return(data.frame())
+		
 #	Output object
 	new("TopTags",list(
-		table=tab,
+		table=tab[1:n,],
 		adjust.method=adjust.method,
 		comparison=as.character(object$comparison),
 		test=test
diff --git a/R/zscoreNBinom.R b/R/zscoreNBinom.R
index cdce374..a4ccc65 100644
--- a/R/zscoreNBinom.R
+++ b/R/zscoreNBinom.R
@@ -2,22 +2,32 @@
 
 zscoreNBinom <- function(q, size, mu) 
 #  Z-score equivalents for negative binomial deviates
-#  Gordon Smyth
-#  10 December 2011
+#  Gordon Smyth, Aaron Lun
+#  created 10 December 2011
+#  last modified 9 March 2015
 {
 	z <- round(q)
 	n <- length(q)
 	size <- rep(size,length.out=n)
 	mu <- rep(mu,length.out=n)
-	d <- dnbinom(q,size=size,mu=mu)
+	d <- dnbinom(q,size=size,mu=mu, log=TRUE)
 	up <- (q >= mu)
+
 	if(any(up)) {
-		p <- pnbinom(q[up],size=size[up],mu=mu[up],lower.tail=FALSE)
-		z[up] <- qnorm(p+d[up]/2,lower.tail=FALSE)
+		p <- pnbinom(q[up],size=size[up],mu=mu[up],lower.tail=FALSE, log.p=TRUE)
+		refined.p <- p + suppressWarnings(log1p(exp(d[up] - p)/2))
+		z[up] <- suppressWarnings(qnorm(refined.p, lower.tail=FALSE, log.p=TRUE))
 	}
 	if(any(!up)) {
-		p <- pnbinom(q[!up],size=size[!up],mu=mu[!up],lower.tail=TRUE)
-		z[!up] <- qnorm(p-d[!up]/2,lower.tail=TRUE)
+		p <- pnbinom(q[!up],size=size[!up],mu=mu[!up],lower.tail=TRUE, log.p=TRUE)
+		refined.p <- p + suppressWarnings(log1p(-exp(d[!up] - p)/2))
+		z[!up] <- suppressWarnings(qnorm(refined.p,lower.tail=TRUE, log.p=TRUE))
+	}
+
+	not.fin <- !is.finite(z)
+	if (any(not.fin)) { 
+		z[not.fin] <- (2L*up[not.fin] -1L) * sqrt(nbinomUnitDeviance(q[not.fin], 
+			mean=mu[not.fin], dispersion=1/size[not.fin])) 
 	}
 	z
 }
diff --git a/build/vignette.rds b/build/vignette.rds
index d55a8e8..3114448 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/inst/CITATION b/inst/CITATION
index 1729cef..48c71f3 100644
--- a/inst/CITATION
+++ b/inst/CITATION
@@ -31,7 +31,7 @@ citEntry(
    volume = 23,
    pages = 2881-2887,
    year = 2007,
-	textVersion = "Robinson MD and Smyth GK (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881-2887"
+   textVersion = "Robinson MD and Smyth GK (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881-2887"
 )
 
 citEntry(
diff --git a/inst/NEWS.Rd b/inst/NEWS.Rd
index 3e4697c..e6de424 100644
--- a/inst/NEWS.Rd
+++ b/inst/NEWS.Rd
@@ -2,6 +2,48 @@
 \title{edgeR News}
 \encoding{UTF-8}
 
+\section{Version 3.10.0}{\itemize{
+\item
+An DGEList method for romer() has been added, allowing access to rotation gene set enrichment analysis.
+
+\item
+New function dropEmptyLevels() to remove unused levels from a factor.
+
+\item
+New argument p.value for topTags(), allowing users to apply a p-value or FDR cutoff for the results.
+
+\item
+New argument prior.count for aveLogCPM().
+
+\item
+New argument pch for the plotMDS method for DGEList objects.
+Old argument col is now removed, but can be passed using ....
+Various other improvements to the plotMDS method for DGEList objects, better labelling of the axes and protection against degenerate dimensions.
+
+\item
+treatDGE() is renamed as glmTreat().
+It can now optionally work with either likelihood ratio tests or with quasi-likelihood F-tests.
+
+\item
+glmQLFit() is now an S3 generic function.
+
+\item
+glmQLFit() now breaks the output component s2.fit into three separate components: df.prior, var.post and var.prior.
+
+\item
+estimateDisp() now protects against fitted values of zeros, giving a more accurate estimate of prior.df.
+
+\item
+DGEList() now gives a message rather than an error when the count matrix has non-unique column names.
+
+\item
+Minor corrections to User's Guide.
+
+\item
+requireNamespace() is now used internally instead of require() to access functions in suggested packages.
+}}
+
+
 \section{Version 3.8.0}{\itemize{
 
 \item
diff --git a/inst/doc/edgeR.pdf b/inst/doc/edgeR.pdf
index 5f809ce..94ca095 100644
Binary files a/inst/doc/edgeR.pdf and b/inst/doc/edgeR.pdf differ
diff --git a/man/DGEExact-class.Rd b/man/DGEExact-class.Rd
index 36e9d1d..8c6430e 100644
--- a/man/DGEExact-class.Rd
+++ b/man/DGEExact-class.Rd
@@ -14,9 +14,9 @@ The genomic features are called genes, but in reality might correspond to transc
 
 Objects of this class contain the following list components:
 \describe{
-  \item{\code{table:}}{ data frame containing columns for the log2-fold-change, \code{logFC}, the average log2-counts-per-million, \code{logCPM}, and the two-sided p-value \code{PValue}.}
-  \item{\code{comparison:}}{ vector giving the two experimental groups/conditions being compared.}
-  \item{\code{genes:}}{ a data frame containing information about each gene (can be \code{NULL}).}
+  \item{\code{table}:}{ data frame containing columns for the log2-fold-change, \code{logFC}, the average log2-counts-per-million, \code{logCPM}, and the two-sided p-value \code{PValue}.}
+  \item{\code{comparison}:}{ vector giving the two experimental groups/conditions being compared.}
+  \item{\code{genes}:}{ a data frame containing information about each gene (can be \code{NULL}).}
 }
 }
 
diff --git a/man/DGEGLM-class.Rd b/man/DGEGLM-class.Rd
index cc0647d..496a901 100644
--- a/man/DGEGLM-class.Rd
+++ b/man/DGEGLM-class.Rd
@@ -10,22 +10,22 @@ A list-based S4 class for storing results of a GLM fit to each gene in a DGE dat
 
 \section{List Components}{
 For objects of this class, rows correspond to genomic features and columns to coefficients in the linear model.
-The genomic features are called genes, but in reality might correspond to transcripts, tags, exons etc.
+The genomic features are called gene, but in reality might correspond to transcripts, tags, exons, etc.
 
 Objects of this class contain the following list components:
 \describe{
-  \item{\code{coefficients:}}{ matrix containing the coefficients computed from fitting the model defined by the design matrix to each gene in the dataset.}
-  \item{\code{df.residual:}}{ vector containing the residual degrees of freedom for the model fit to each gene in the dataset.}
-  \item{\code{deviance:}}{ vector giving the deviance from the model fit to each gene.}
-  \item{\code{design:}}{ design matrix for the full model from the likelihood ratio test.}
-  \item{\code{offset:}}{ scalar, vector or matrix of offset values to be included in the GLMs for each gene.}
-  \item{\code{samples:}}{ data frame containing information about the samples comprising the dataset.}
-  \item{\code{genes:}}{ data frame containing information about the genes or tags for which we have DGE data (can be \code{NULL} if there is no information available).}
-  \item{\code{dispersion:}}{ scalar or vector providing the value of the dispersion parameter used in the negative binomial GLM for each gene.}
-  \item{\code{lib.size:}}{ vector providing the effective library size for each sample in the dataset.}
-  \item{\code{weights:}}{ matrix of weights used in the GLM fitting for each gene.}
-  \item{\code{fitted.values:}}{ the fitted (expected) values from the GLM for each gene.}
-  \item{\code{AveLogCPM:}}{ numeric vector giving average log2 counts per million for each gene.}
+  \item{\code{coefficients}:}{ matrix containing the coefficients computed from fitting the model defined by the design matrix to each gene in the dataset.}
+  \item{\code{df.residual}:}{ vector containing the residual degrees of freedom for the model fit to each gene in the dataset.}
+  \item{\code{deviance}:}{ vector giving the deviance from the model fit to each gene.}
+  \item{\code{design}:}{ design matrix for the full model from the likelihood ratio test.}
+  \item{\code{offset}:}{ scalar, vector or matrix of offset values to be included in the GLMs for each gene.}
+  \item{\code{samples}:}{ data frame containing information about the samples comprising the dataset.}
+  \item{\code{genes}:}{ data frame containing information about the tags for which we have DGE data (can be \code{NULL} if there is no information available).}
+  \item{\code{dispersion}:}{ scalar or vector providing the value of the dispersion parameter used in the negative binomial GLM for each gene.}
+  \item{\code{lib.size}:}{ vector providing the effective library size for each sample in the dataset.}
+  \item{\code{weights}:}{ matrix of weights used in the GLM fitting for each gene.}
+  \item{\code{fitted.values}:}{ the fitted (expected) values from the GLM for each gene.}
+  \item{\code{AveLogCPM}:}{ numeric vector giving average log2 counts per million for each gene.}
 }
 }
 
diff --git a/man/DGELRT-class.Rd b/man/DGELRT-class.Rd
index 751faca..185083e 100644
--- a/man/DGELRT-class.Rd
+++ b/man/DGELRT-class.Rd
@@ -14,18 +14,18 @@ The genomic features are called genes, but in reality might correspond to transc
 
 Objects of this class contain the following list components:
 \describe{
-  \item{\code{table:}}{ data frame containing the log-concentration (i.e. expression level), the log-fold change in expression between the two groups/conditions and the exact p-value for differential expression, for each gene.}
-  \item{\code{coefficients.full:}}{ matrix containing the coefficients
+  \item{\code{table}:}{ data frame containing the log-concentration (i.e. expression level), the log-fold change in expression between the two groups/conditions and the exact p-value for differential expression, for each gene.}
+  \item{\code{coefficients.full}:}{ matrix containing the coefficients
   computed from fitting the full model (fit using \code{glmFit} and a
   given design matrix) to each gene in the dataset.}
-  \item{\code{coefficients.null:}}{ matrix containing the coefficients
+  \item{\code{coefficients.null}:}{ matrix containing the coefficients
   computed from fitting the null model to each gene in the
   dataset.  The null model is the model to which the full model is
   compared, and is fit using \code{glmFit} and dropping selected
   column(s) (i.e. coefficient(s)) from the design matrix for the full model.}
-  \item{\code{design:}}{ design matrix for the full model from the likelihood
+  \item{\code{design}:}{ design matrix for the full model from the likelihood
   ratio test.}
-  \item{\code{\dots:}}{ if the argument \code{y} to \code{glmLRT} (which
+  \item{\code{\dots}:}{ if the argument \code{y} to \code{glmLRT} (which
   produces the \code{DGELRT} object) was itself a \code{DGEList} object, then
   the \code{DGELRT} will contain all of the elements of \code{y},
   except for the table of counts and the table of pseudocounts.}
diff --git a/man/DGEList-class.Rd b/man/DGEList-class.Rd
index 3ee757d..2d11ae1 100644
--- a/man/DGEList-class.Rd
+++ b/man/DGEList-class.Rd
@@ -12,18 +12,18 @@ For objects of this class, rows correspond to genomic features and columns to sa
 The genomic features are called genes, but in reality might correspond to transcripts, tags, exons etc.
 Objects of this class contain the following essential list components:
 \describe{
-  \item{\code{counts:}}{ numeric matrix of read counts, one row for each gene and one column for each sample.}
-  \item{\code{samples:}}{ data.frame with a row for each sample and columns \code{group}, \code{lib.size} and \code{norm.factors} containing the group labels, library sizes and normalization factors.
+  \item{\code{counts}:}{ numeric matrix of read counts, one row for each gene and one column for each sample.}
+  \item{\code{samples}:}{ data.frame with a row for each sample and columns \code{group}, \code{lib.size} and \code{norm.factors} containing the group labels, library sizes and normalization factors.
   Other columns can be optionally added to give more detailed sample information.}
 }
 Optional components include:
 \describe{
-  \item{\code{genes:}}{ data.frame giving annotation information for each gene.  Same number of rows as \code{counts}.}
-  \item{\code{AveLogCPM:}}{ numeric vector giving average log2 counts per million for each gene.}
-  \item{\code{common.dispersion:}}{ numeric scalar giving the overall dispersion estimate.}
-  \item{\code{trended.dispersion:}}{ numeric vector giving trended dispersion estimates for each gene.}
-  \item{\code{tagwise.dispersion:}}{ numeric vector giving tagwise dispersion estimates for each gene.}
-  \item{\code{offset:}}{ numeric matrix of same size as \code{counts} giving offsets for use in log-linear models.}
+  \item{\code{genes}:}{ data.frame giving annotation information for each gene.  Same number of rows as \code{counts}.}
+  \item{\code{AveLogCPM}:}{ numeric vector giving average log2 counts per million for each gene.}
+  \item{\code{common.dispersion}:}{ numeric scalar giving the overall dispersion estimate.}
+  \item{\code{trended.dispersion}:}{ numeric vector giving trended dispersion estimates for each gene.}
+  \item{\code{tagwise.dispersion}:}{ numeric vector giving tagwise dispersion estimates for each gene (note that `tag' and `gene' are synonymous here).}
+  \item{\code{offset}:}{ numeric matrix of same size as \code{counts} giving offsets for use in log-linear models.}
 }
 }
 
diff --git a/man/DGEList.Rd b/man/DGEList.Rd
index 5c6fefa..7d20f42 100644
--- a/man/DGEList.Rd
+++ b/man/DGEList.Rd
@@ -20,7 +20,7 @@ DGEList(counts = matrix(0, 0, 0), lib.size = colSums(counts),
   \item{lib.size}{numeric vector giving the total count (sequence depth) for each library.}
   \item{norm.factors}{numeric vector of normalization factors that modify the library sizes.}
   \item{group}{vector or factor giving the experimental group/condition for each sample/library.}
-  \item{genes}{data frame containing annotation information for the tags/transcripts/genes.}
+  \item{genes}{data frame containing annotation information for each gene.}
   \item{remove.zeros}{logical, whether to remove rows that have 0 total count.}
 }
 
diff --git a/man/asdataframe.Rd b/man/asdataframe.Rd
index 19f83db..f606109 100644
--- a/man/asdataframe.Rd
+++ b/man/asdataframe.Rd
@@ -16,7 +16,7 @@ Turn a \code{TopTags} object into a \code{data.frame}.
   \item{\dots}{additional arguments to be passed to or from methods.}
 }
 \details{
-This method combines all the components of \code{x} which have a row for each tag (transcript) into a \code{data.frame}.
+This method combines all the components of \code{x} which have a row for each gene into a \code{data.frame}.
 }
 \value{
 A data.frame.
diff --git a/man/aveLogCPM.Rd b/man/aveLogCPM.Rd
index b9e2a95..e9c6da4 100644
--- a/man/aveLogCPM.Rd
+++ b/man/aveLogCPM.Rd
@@ -17,9 +17,9 @@ Compute average log2 counts-per-million for each row of counts.
 }
 
 \arguments{
-\item{y}{numeric matrix containing counts.  Rows for tags and columns for libraries.}
+\item{y}{numeric matrix containing counts.  Rows for genes and columns for libraries.}
 \item{normalized.lib.sizes}{logical, use normalized library sizes?}
-\item{prior.count}{average value to be added to each count, to avoid infinite values on the log-scale.}
+\item{prior.count}{numeric scalar or vector of length \code{nrow(y)}, containing the average value(s) to be added to each count to avoid infinite values on the log-scale.}
 \item{dispersion}{numeric scalar or vector of negative-binomial dispersions. Defaults to 0.05.}
 \item{lib.size}{numeric vector of library sizes. Defaults to \code{colSums(y)}. Ignored if \code{offset} is not \code{NULL}.}
 \item{offset}{numeric matrix of offsets for the log-linear models.}
@@ -30,6 +30,7 @@ Compute average log2 counts-per-million for each row of counts.
 \details{
 This function uses \code{mglmOneGroup} to compute average counts-per-million (AveCPM) for each row of counts, and returns log2(AveCPM).
 An average value of \code{prior.count} is added to the counts before running \code{mglmOneGroup}.
+If \code{prior.count} is a vector, each entry will be added to all counts in the corresponding row of \code{y}.
 
 This function is similar to
 
@@ -57,10 +58,14 @@ cpm(y, log=TRUE, prior.count=2)
 aveLogCPM(y,prior.count=0,dispersion=0)
 cpms <- rowSums(y)/sum(lib.size)*1e6
 log2(cpms)
+
+# The function works perfectly with prior.count or dispersion vectors:
+aveLogCPM(y, prior.count=runif(nrow(y), 1, 5))
+aveLogCPM(y, dispersion=runif(nrow(y), 0, 0.2))
 }
 
 \seealso{
-See \code{\link{cpm}} for individual logCPM values, rather than tagwise averages.
+See \code{\link{cpm}} for individual logCPM values, rather than genewise averages.
 
 The computations for \code{aveLogCPM} are done by \code{\link{mglmOneGroup}}.
 }
diff --git a/man/binomTest.Rd b/man/binomTest.Rd
index 7618bec..a013247 100644
--- a/man/binomTest.Rd
+++ b/man/binomTest.Rd
@@ -3,8 +3,8 @@
 
 \title{Exact Binomial Tests for Comparing Two Digital Libraries}
 \description{
-Computes p-values for differential abundance for each tag between two digital libraries,
-conditioning on the total count for each tag.
+Computes p-values for differential abundance for each gene between two digital libraries,
+conditioning on the total count for each gene.
 The counts in each group as a proportion of the whole are assumed to follow a binomial distribution.
 }
 
@@ -13,23 +13,23 @@ binomTest(y1, y2, n1=sum(y1), n2=sum(y2), p=n1/(n1+n2))
 }
 
 \arguments{
-\item{y1}{integer vector giving counts in first library.
+\item{y1}{integer vector giving the count for each gene in the first library.
 Non-integer values are rounded to the nearest integer.}
-\item{y2}{integer vector giving counts in second library.
-Of same length as \code{x}.
+\item{y2}{integer vector giving the count for each gene in the second library.
+Of same length as \code{y1}.
 Non-integer values are rounded to the nearest integer.}
-\item{n1}{total number of tags in first library.
+\item{n1}{total number of counts in the first library, across all genes.
 Non-integer values are rounded to the nearest integer. Not required if \code{p} is supplied.}
-\item{n2}{total number of tags in second library.
+\item{n2}{total number of counts in the second library, across all genes.
 Non-integer values are rounded to the nearest integer. Not required if \code{p} is supplied.}
-\item{p}{expected proportion of y1 to the total under the null hypothesis.}
+\item{p}{expected proportion of \code{y1} to the total for each gene under the null hypothesis.}
 }
 
 \details{
 This function can be used to compare two libraries from SAGE, RNA-Seq, ChIP-Seq or other sequencing technologies with respect to technical variation.
 
-An exact two-sided binomial test is computed for each tag.
-This test is closely related to Fisher's exact test for 2x2 contingency tables but, unlike Fisher's test, it conditions on the total number of counts for each tag.
+An exact two-sided binomial test is computed for each gene.
+This test is closely related to Fisher's exact test for 2x2 contingency tables but, unlike Fisher's test, it conditions on the total number of counts for each gene.
 The null hypothesis is that the expected counts are in the same proportions as the library sizes, i.e., that the binomial probability for the first library is \code{n1/(n1+n2)}.
 
 The two-sided rejection region is chosen analogously to Fisher's test.
diff --git a/man/calcNormFactors.Rd b/man/calcNormFactors.Rd
index 9d17f19..9a79136 100644
--- a/man/calcNormFactors.Rd
+++ b/man/calcNormFactors.Rd
@@ -21,7 +21,7 @@ Calculate normalization factors to scale the raw library sizes.
   \item{object}{either a \code{matrix} of raw (read) counts or a \code{DGEList} object}
   \item{lib.size}{numeric vector of library sizes of the \code{object}.}
   \item{method}{normalization method to be used}
-  \item{refColumn}{column to use as reference for \code{method="TMM"}. Can be a column number or a numeric vector of length \code{nrow(object))}.}
+  \item{refColumn}{column to use as reference for \code{method="TMM"}. Can be a column number or a numeric vector of length \code{nrow(object)}.}
   \item{logratioTrim}{amount of trim to use on log-ratios ("M" values) for \code{method="TMM"}}
   \item{sumTrim}{amount of trim to use on the combined absolute levels ("A" values) for \code{method="TMM"}}
   \item{doWeighting}{logical, whether to compute (asymptotic binomial precision) weights for \code{method="TMM"}}
@@ -38,7 +38,7 @@ If \code{refColumn} is unspecified, the library whose upper quartile is closest
 We call it "relative log expression", as median library is calculated from the geometric mean of all columns and the median ratio of each sample to the median library is taken as the scale factor.
 
 \code{method="upperquartile"} is the upper-quartile normalization method of Bullard et al (2010), in which the scale factors are calculated from the 75\% quantile of the counts for each library,
-after removing transcripts which are zero in all libraries.
+after removing genes which are zero in all libraries.
 This idea is generalized here to allow scaling by any quantile of the distributions.
 
 If \code{method="none"}, then the normalization factors are set to 1.
diff --git a/man/commonCondLogLikDerDelta.Rd b/man/commonCondLogLikDerDelta.Rd
index 16e6d0d..ca13f09 100644
--- a/man/commonCondLogLikDerDelta.Rd
+++ b/man/commonCondLogLikDerDelta.Rd
@@ -21,7 +21,7 @@ commonCondLogLikDerDelta(y, delta, der = 0)
 
 
 \details{
-The common conditional log-likelihood is constructed by summing over all of the individual tag conditional log-likelihoods. The common conditional log-likelihood is taken as a function of the dispersion parameter (\code{phi}), and here parameterized in terms of delta (\code{phi / (phi+1)}). The value of delta that maximizes the common conditional log-likelihood is converted back to the \code{phi} scale, and this value is the estimate of the common dispersion parameter used by all tags.
+The common conditional log-likelihood is constructed by summing over all of the individual genewise conditional log-likelihoods. The common conditional log-likelihood is taken as a function of the dispersion parameter (\code{phi}), and here parameterized in terms of delta (\code{phi / (phi+1)}). The value of delta that maximizes the common conditional log-likelihood is converted back to the \code{phi} scale, and this value is the estimate of the common dispersion parameter used by all genes.
 }
 
 \author{Davis McCarthy}
@@ -39,4 +39,4 @@ ll2<-commonCondLogLikDerDelta(y,delta=0.5,der=1)
 }
 
 
-\keyword{file}
\ No newline at end of file
+\keyword{file}
diff --git a/man/condLogLikDerSize.Rd b/man/condLogLikDerSize.Rd
index 0fe8a9f..8ca3203 100755
--- a/man/condLogLikDerSize.Rd
+++ b/man/condLogLikDerSize.Rd
@@ -4,7 +4,7 @@
 
 \title{Conditional Log-Likelihood of the Dispersion for a Single Group of Replicate Libraries}
 
-\description{Derivatives of the negative-binomial log-likelihood with respect to the dispersion parameter for each tag/transcript, conditional on the mean count, for a single group of replicate libraries of the same size.}
+\description{Derivatives of the negative-binomial log-likelihood with respect to the dispersion parameter for each gene, conditional on the mean count, for a single group of replicate libraries of the same size.}
 
 \usage{
 condLogLikDerSize(y, r, der=1L)
@@ -20,7 +20,7 @@ condLogLikDerDelta(y, delta, der=1L)
 
 \value{vector of row-wise derivatives with respect to \code{r} or \code{delta}}
 
-\details{The library sizes must be equalized before running this function. This function carries out the actual mathematical computations for the conditional log-likelihood and its derivatives, calculating the conditional log-likelihood for each tag/transcript.
+\details{The library sizes must be equalized before running this function. This function carries out the actual mathematical computations for the conditional log-likelihood and its derivatives, calculating the conditional log-likelihood for each gene.
 Derivatives are with respect to either the size (\code{r}) or the delta parametrization (\code{delta}) of the dispersion.
 }
 
diff --git a/man/cpm.Rd b/man/cpm.Rd
index 80f606d..45cc321 100644
--- a/man/cpm.Rd
+++ b/man/cpm.Rd
@@ -29,7 +29,7 @@
 \value{numeric matrix of CPM or RPKM values.}
 
 \details{
-CPM or RPKM values are useful descriptive measures for the expression level of a gene or transcript.
+CPM or RPKM values are useful descriptive measures for the expression level of a gene.
 By default, the normalized library sizes are used in the computation for \code{DGEList} objects but simple column sums for matrices.
 
 If log-values are computed, then a small count, given by \code{prior.count} but scaled to be proportional to the library size, is added to \code{x} to avoid taking the log of zero.
diff --git a/man/decidetestsDGE.Rd b/man/decidetestsDGE.Rd
index 5fe522f..06ebc3e 100644
--- a/man/decidetestsDGE.Rd
+++ b/man/decidetestsDGE.Rd
@@ -9,8 +9,8 @@ A number of different multiple testing schemes are offered which adjust for mult
 decideTestsDGE(object, adjust.method="BH", p.value=0.05, lfc=0)
 }
 \arguments{
-  \item{object}{\code{deDGElist} object, output from \code{exactTest},
-  or \code{DGELRT} object, output from \code{DGELRT}, from which
+  \item{object}{\code{DGEExact} object, output from \code{exactTest},
+  or \code{DGELRT} object, output from \code{glmLRT} or \code{glmQLFTest}, from which
   p-values for differential expression and log-fold change values may be extracted.}
 
  \item{adjust.method}{character string specifying p-value adjustment method.  Possible values are \code{"none"}, \code{"BH"}, \code{"fdr"} (equivalent to \code{"BH"}), \code{"BY"} and \code{"holm"}. See \code{\link[stats]{p.adjust}} for details.}
diff --git a/man/dglmStdResid.Rd b/man/dglmStdResid.Rd
index c520f8e..897b01f 100644
--- a/man/dglmStdResid.Rd
+++ b/man/dglmStdResid.Rd
@@ -4,7 +4,7 @@
 
 \title{Visualize the mean-variance relationship in DGE data using standardized residuals}
 
-\description{Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. However, the standard approach to visualizing the mean-variance relationship is not appropriate for general, complicated experimental designs that require generalized linear models (GLMs) for analysis. Here are functions to compute standardized residuals from a Poisson GLM and plot them for bins based on overall expression level of tags as a w [...]
+\description{Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. However, the standard approach to visualizing the mean-variance relationship is not appropriate for general, complicated experimental designs that require generalized linear models (GLMs) for analysis. Here are functions to compute standardized residuals from a Poisson GLM and plot them for bins based on overall expression level of genes as a  [...]
 
 
 \usage{
@@ -14,11 +14,11 @@ getDispersions(binned.object)
 }
 
 \arguments{ 
-\item{y}{numeric matrix of counts, each row represents one tag, each column represents one DGE library.}
+\item{y}{numeric matrix of counts, each row represents one genes, each column represents one DGE library.}
 
 \item{design}{numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.}
 
-\item{dispersion}{numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all tags, or a vector of length equal to the number of tags giving tag-wise dispersions.}
+\item{dispersion}{numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.}
 
 \item{offset}{numeric vector or matrix giving the offset that is to be included in teh log-linear model predictor. Can be a vector of length equal to the number of libraries, or a matrix of the same size as \code{y}.}
 
@@ -36,16 +36,16 @@ getDispersions(binned.object)
 
 }
 
-\value{ \code{dglmStdResid} produces a mean-variance plot based on standardized residuals from a Poisson model fitfor each tag for the DGE data. \code{dglmStdResid} returns a list with the following elements:
+\value{ \code{dglmStdResid} produces a mean-variance plot based on standardized residuals from a Poisson model fit for each gene for the DGE data. \code{dglmStdResid} returns a list with the following elements:
 	\item{ave.means}{vector of the average expression level within each bin of observations}
-	\item{ave.std.resid}{vector of the average standardized Poisson residual within each bin of tags}
+	\item{ave.std.resid}{vector of the average standardized Poisson residual within each bin of genes}
 	\item{bin.means}{list containing the average (mean) expression level (given by the fitted value from the given Poisson model) for observations divided into bins based on amount of expression}
 	\item{bin.std.resid}{list containing the standardized residual from the given Poisson model for observations divided into bins based on amount of expression}
 	\item{means}{vector giving the fitted value for each observed count}
 	\item{standardized.residuals}{vector giving approximate standardized residual for each observed count}
 	\item{bins}{list containing the indices for the observations, assigning them to bins}
 	\item{nbins}{scalar giving the number of bins used to split up the observed counts}
-	\item{ngenes}{scalar giving the number of genes/tags in the dataset}
+	\item{ngenes}{scalar giving the number of genes in the dataset}
 	\item{nlibs}{scalar giving the number of libraries in the dataset}
 
 \code{getDispersions} computes the dispersion from the standardized residuals and returns a list with the following components:
@@ -56,9 +56,9 @@ getDispersions(binned.object)
 }
 
 \details{
-This function is useful for exploring the mean-variance relationship in the data. Raw or pooled variances cannot be used for complex experimental designs, so instead we can fit a Poisson model using the appropriate design matrix to each tag and use the standardized residuals in place of the pooled variance (as in \code{plotMeanVar}) to visualize the mean-variance relationship in the data. The function will plot the average standardized residual for observations split into \code{nbins} bi [...]
+This function is useful for exploring the mean-variance relationship in the data. Raw or pooled variances cannot be used for complex experimental designs, so instead we can fit a Poisson model using the appropriate design matrix to each gene and use the standardized residuals in place of the pooled variance (as in \code{plotMeanVar}) to visualize the mean-variance relationship in the data. The function will plot the average standardized residual for observations split into \code{nbins} b [...]
 
-The function \code{mglmLS} is used to fit the Poisson models to the data. This code is fast for fitting models, but does not compute the value for the leverage, technically required to compute the standardized residuals. Here, we approximate the standardized residuals by replacing the usual denominator of \code{ ( 1 - leverage )} by \code{ ( 1 - p/n ) }, where n is the number of observations per tag (i.e. number of libraries) and p is the number of parameters in the model (i.e. number of [...]
+The function \code{mglmLS} is used to fit the Poisson models to the data. This code is fast for fitting models, but does not compute the value for the leverage, technically required to compute the standardized residuals. Here, we approximate the standardized residuals by replacing the usual denominator of \code{ ( 1 - leverage )} by \code{ ( 1 - p/n ) }, where n is the number of observations per gene (i.e. number of libraries) and p is the number of parameters in the model (i.e. number o [...]
 
 }
 
diff --git a/man/dim.Rd b/man/dim.Rd
index fbcf5dc..f03f31a 100644
--- a/man/dim.Rd
+++ b/man/dim.Rd
@@ -11,7 +11,7 @@
 \alias{length.DGELRT}
 \title{Retrieve the Dimensions of a DGEList, DGEExact, DGEGLM, DGELRT or TopTags Object}
 \description{
-Retrieve the number of rows (transcripts) and columns (libraries) for an DGEList, DGEExact or TopTags Object.
+Retrieve the number of rows (genes) and columns (libraries) for an DGEList, DGEExact or TopTags Object.
 }
 \usage{
 \method{dim}{DGEList}(x)
@@ -21,14 +21,14 @@ Retrieve the number of rows (transcripts) and columns (libraries) for an DGEList
   \item{x}{an object of class \code{DGEList}, \code{DGEExact}, \code{TopTags}, \code{DGEGLM} or \code{DGELRT}}
 }
 \details{
-Digital gene expression data objects share many analogies with ordinary matrices in which the rows correspond to transcripts or genes and the columns to arrays.
+Digital gene expression data objects share many analogies with ordinary matrices in which the rows correspond to genes and the columns to arrays.
 These methods allow one to extract the size of microarray data objects in the same way that one would do for ordinary matrices.
 
 A consequence is that row and column commands \code{nrow(x)}, \code{ncol(x)} and so on also work.
 }
 \value{
 Numeric vector of length 2.
-The first element is the number of rows (genes) and the second is the number of columns (arrays).
+The first element is the number of rows (genes) and the second is the number of columns (libraries).
 }
 \author{Gordon Smyth, Davis McCarthy}
 \seealso{
diff --git a/man/dispBinTrend.Rd b/man/dispBinTrend.Rd
index 59083d6..012dfd4 100644
--- a/man/dispBinTrend.Rd
+++ b/man/dispBinTrend.Rd
@@ -16,7 +16,7 @@ dispBinTrend(y, design=NULL, offset=NULL, df = 5, span=0.3, min.n=400,
 \arguments{
 \item{y}{numeric matrix of counts}
 \item{design}{numeric matrix giving the design matrix for the GLM that is to be fit.}
-\item{offset}{numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the transcripts. If a scalar, then this value will be used as an offset for all transcripts and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each transcript. If a matrix, then each library for each transcript can have a unique offset, if desired [...]
+\item{offset}{numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In \code{adjustedProfi [...]
 \item{df}{degrees of freedom for spline curve.}
 \item{span}{span used for loess curve.}
 \item{min.n}{minimim number of genes in a bins.}
@@ -51,10 +51,10 @@ McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of mul
 
 \author{Davis McCarthy and Gordon Smyth}
 \examples{
-ntags <- 1000
+ngenes <- 1000
 nlibs <- 4
-means <- seq(5,10000,length.out=ntags)
-y <- matrix(rnbinom(ntags*nlibs,mu=rep(means,nlibs),size=0.1*means),nrow=ntags,ncol=nlibs)
+means <- seq(5,10000,length.out=ngenes)
+y <- matrix(rnbinom(ngenes*nlibs,mu=rep(means,nlibs),size=0.1*means),nrow=ngenes,ncol=nlibs)
 keep <- rowSums(y) > 0
 y <- y[keep,]
 group <- factor(c(1,1,2,2))
diff --git a/man/dispCoxReid.Rd b/man/dispCoxReid.Rd
index 289ead3..4d7f877 100644
--- a/man/dispCoxReid.Rd
+++ b/man/dispCoxReid.Rd
@@ -64,7 +64,7 @@ Robinson and Smyth (2008) and McCarthy et al (2011) showed the Cox-Reid estimato
 
 \code{dispCoxReid} uses \code{optimize} to maximize the adjusted profile likelihood.
 \code{dispDeviance} uses \code{uniroot} to solve the estimating equation.
-The robust options use an order statistic instead the mean statistic, and have the effect that a minority of tags with very large (outlier) dispersions should have limited influence on the estimated value.
+The robust options use an order statistic instead the mean statistic, and have the effect that a minority of genes with very large (outlier) dispersions should have limited influence on the estimated value.
 \code{dispPearson} uses a globally convergent Newton iteration.
 }
 
@@ -84,9 +84,9 @@ McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of mul
 
 \author{Gordon Smyth}
 \examples{
-ntags <- 100
+ngenes <- 100
 nlibs <- 4
-y <- matrix(rnbinom(ntags*nlibs,mu=10,size=10),nrow=ntags,ncol=nlibs)
+y <- matrix(rnbinom(ngenes*nlibs,mu=10,size=10),nrow=ngenes,ncol=nlibs)
 group <- factor(c(1,1,2,2))
 lib.size <- rowSums(y)
 design <- model.matrix(~group)
diff --git a/man/dispCoxReidInterpolateTagwise.Rd b/man/dispCoxReidInterpolateTagwise.Rd
index d7beabf..d2c27cc 100644
--- a/man/dispCoxReidInterpolateTagwise.Rd
+++ b/man/dispCoxReidInterpolateTagwise.Rd
@@ -1,10 +1,10 @@
 \name{dispCoxReidInterpolateTagwise}
 \alias{dispCoxReidInterpolateTagwise}
 
-\title{Estimate Tagwise Dispersion for Negative Binomial GLMs by Cox-Reid Adjusted Profile Likelihood}
+\title{Estimate Genewise Dispersion for Negative Binomial GLMs by Cox-Reid Adjusted Profile Likelihood}
 
 \description{
-Estimate tagwise dispersion parameters across multiple negative binomial generalized linear models using weighted Cox-Reid Adjusted Profile-likelihood and cubic spline interpolation over a tagwise grid.
+Estimate genewise dispersion parameters across multiple negative binomial generalized linear models using weighted Cox-Reid Adjusted Profile-likelihood and cubic spline interpolation over a genewise grid.
 }
 
 \usage{
@@ -19,33 +19,35 @@ dispCoxReidInterpolateTagwise(y, design, offset=NULL, dispersion, trend=TRUE,
 
 \item{design}{numeric matrix giving the design matrix for the GLM that is to be fit.}
 
-\item{offset}{numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the transcripts. If a scalar, then this value will be used as an offset for all transcripts and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each transcript. If a matrix, then each library for each transcript can have a unique offset, if desired [...]
+\item{offset}{numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In \code{adjustedProfi [...]
 
-\item{dispersion}{numeric scalar or vector giving the dispersion(s) towards which the tagwise dispersion parameters are shrunk.}
+\item{dispersion}{numeric scalar or vector giving the dispersion(s) towards which the genewise dispersion parameters are shrunk.}
 
 \item{trend}{logical, whether abundance-dispersion trend is used for smoothing.}
 
-\item{AveLogCPM}{numeric vector giving average log2 counts per million for each tag.}
+\item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene.}
 
-\item{min.row.sum}{numeric scalar giving a value for the filtering out of low abundance tags. Only tags with total sum of counts above this value are used. Low abundance tags can adversely affect the estimation of the common dispersion, so this argument allows the user to select an appropriate filter threshold for the tag abundance.}
+\item{min.row.sum}{numeric scalar giving a value for the filtering out of low abundance genes. Only genes with total sum of counts above this value are used. Low abundance genes can adversely affect the estimation of the common dispersion, so this argument allows the user to select an appropriate filter threshold for the gene abundance.}
 
-\item{prior.df}{numeric scalar, prior degsmoothing parameter that indicates the weight to give to the common likelihood compared to the individual tag's likelihood; default \code{getPriorN(object)} gives a value for \code{prior.n} that is equivalent to giving the common likelihood 20 prior degrees of freedom in the estimation of the tag/genewise dispersion.}
+\item{prior.df}{numeric scalar, prior degsmoothing parameter that indicates the weight to give to the common likelihood compared to the individual gene's likelihood; default \code{getPriorN(object)} gives a value for \code{prior.n} that is equivalent to giving the common likelihood 20 prior degrees of freedom in the estimation of the genewise dispersion.}
 
 \item{span}{numeric parameter between 0 and 1 specifying proportion of data to be used in the local regression moving window. Larger numbers give smoother fits.}
 
-\item{grid.npts}{numeric scalar, the number of points at which to place knots for the spline-based estimation of the tagwise dispersion estimates.}
+\item{grid.npts}{numeric scalar, the number of points at which to place knots for the spline-based estimation of the genewise dispersion estimates.}
 
-\item{grid.range}{numeric vector of length 2, giving relative range, in terms of \code{log2(dispersion)}, on either side of trendline for each tag for spline grid points.}
+\item{grid.range}{numeric vector of length 2, giving relative range, in terms of \code{log2(dispersion)}, on either side of trendline for each gene for spline grid points.}
 \item{weights}{optional numeric matrix giving observation weights}
 }
 
-\value{\code{dispCoxReidInterpolateTagwise} produces a vector of tagwise dispersions having the same length as the number of genes in the count data.
+\value{\code{dispCoxReidInterpolateTagwise} produces a vector of genewise dispersions having the same length as the number of genes in the count data.
 }
 
 \details{
 In the \code{edgeR} context, \code{dispCoxReidInterpolateTagwise} is a low-level function called by \code{estimateGLMTagwiseDisp}.
 
-\code{dispCoxReidInterpolateTagwise} calls the function \code{maximizeInterpolant} to fit cubic spline interpolation over a tagwise grid. 
+\code{dispCoxReidInterpolateTagwise} calls the function \code{maximizeInterpolant} to fit cubic spline interpolation over a genewise grid. 
+
+Note that the terms `tag' and `gene' are synonymous here. The function is only named `Tagwise' for historical reasons.
 }
 
 
diff --git a/man/dispCoxReidSplineTrend.Rd b/man/dispCoxReidSplineTrend.Rd
index 9773f25..6485b5c 100644
--- a/man/dispCoxReidSplineTrend.Rd
+++ b/man/dispCoxReidSplineTrend.Rd
@@ -21,13 +21,13 @@ dispCoxReidPowerTrend(y, design, offset=NULL, subset=10000, AveLogCPM=NULL,
 
 \item{design}{numeric matrix giving the design matrix for the GLM that is to be fit.}
 
-\item{offset}{numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the transcripts. If a scalar, then this value will be used as an offset for all transcripts and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each transcript. If a matrix, then each library for each transcript can have a unique offset, if desired [...]
+\item{offset}{numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should be have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In \code{adjustedProfi [...]
 
 \item{df}{integer giving the degrees of freedom of the spline function, see \code{ns} in the splines package.}
 
 \item{subset}{integer, number of rows to use in the calculation.  Rows used are chosen evenly spaced by AveLogCPM using \code{\link{cutWithMinN}}.}
 
-\item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene}
+\item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene.}
 
 \item{method.optim}{the method to be used in \code{optim}. See \code{\link{optim}} for more detail.}
 
@@ -35,14 +35,14 @@ dispCoxReidPowerTrend(y, design, offset=NULL, subset=10000, AveLogCPM=NULL,
 }
 
 \value{
-List containing numeric vectors \code{dispersion} and \code{abundance} containing the estimated dispersion and abundance for each transcript.
+List containing numeric vectors \code{dispersion} and \code{abundance} containing the estimated dispersion and abundance for each gene.
 The vectors are of the same length as \code{nrow(y)}.
 }
 
 \details{
 In the \code{edgeR} context, these are low-level functions called by \code{estimateGLMTrendedDisp}.
 
-\code{dispCoxReidSplineTrend} and \code{dispCoxReidPowerTrend} fit abundance trends to the tagwise dispersions.
+\code{dispCoxReidSplineTrend} and \code{dispCoxReidPowerTrend} fit abundance trends to the genewise dispersions.
 \code{dispCoxReidSplineTrend} fits a regression spline whereas \code{dispCoxReidPowerTrend} fits a log-linear trend of the form \code{a*exp(abundance)^b+c}.
 In either case, \code{optim} is used to maximize the adjusted profile likelihood (Cox and Reid, 1987).
 }
diff --git a/man/dropEmptyLevels.Rd b/man/dropEmptyLevels.Rd
new file mode 100644
index 0000000..e50bdad
--- /dev/null
+++ b/man/dropEmptyLevels.Rd
@@ -0,0 +1,32 @@
+\name{dropEmptyLevels}
+\alias{dropEmptyLevels}
+\title{Drop Levels of a Factor that Never Occur}
+\description{
+Reform a factor so that only necessary levels are kept.
+}
+\usage{
+dropEmptyLevels(x)
+}
+\arguments{
+  \item{x}{a factor or a vector to be converted to a factor.}
+}
+\details{
+In general, the levels of a factor, \code{levels(x)}, may include values that never actually occur.
+This function drops any levels of that do not occur.
+
+If \code{x} is not a factor, then the function returns \code{factor(x)}.
+If \code{x} is a factor, then the function returns the same value as \code{factor(x)} or \code{x[,drop=TRUE]} but somewhat more efficiently.
+}
+\value{
+A factor with the same values as \code{x} but with a possibly reduced set of levels.
+}
+\author{Gordon Smyth}
+\seealso{
+   \code{\link{factor}}.
+}
+
+\examples{
+x <- factor(c("a","b"), levels=c("c","b","a"))
+x
+dropEmptyLevels(x)
+}
diff --git a/man/edgeR-package.Rd b/man/edgeR-package.Rd
index 8e396d0..fa6d6ce 100644
--- a/man/edgeR-package.Rd
+++ b/man/edgeR-package.Rd
@@ -9,6 +9,9 @@ edgeR is a package for the analysis of digital gene expression data arising from
 Particular strengths of the package include the ability to estimate biological variation between replicate libraries, and to conduct exact tests of significance which are suitable for small counts.
 The package is able to make use of even minimal numbers of replicates.
 
+The supplied counts are assumed to be those of genes in a RNA-seq experiment.
+However, counts can be supplied for any genomic feature of interest, e.g., tags, transcripts, exons, or even arbitrary intervals of the genome.
+
 An extensive User's Guide is available, and can be opened by typing \code{edgeRUsersGuide()} at the R prompt.
 Detailed help pages are also provided for each individual function.
 
diff --git a/man/estimateCommonDisp.Rd b/man/estimateCommonDisp.Rd
index 6ad981f..c291994 100644
--- a/man/estimateCommonDisp.Rd
+++ b/man/estimateCommonDisp.Rd
@@ -4,7 +4,7 @@
 \title{Estimate Common Negative Binomial Dispersion by Conditional Maximum Likelihood}
 
 \description{
-Maximizes the negative binomial conditional common likelihood to give the estimate of the common dispersion across all tags.
+Maximizes the negative binomial conditional common likelihood to give the estimate of the common dispersion across all genes.
 }
 
 \usage{
@@ -16,8 +16,8 @@ estimateCommonDisp(object, tol=1e-06, rowsum.filter=5, verbose=FALSE)
 
 \item{tol}{the desired accuracy, passed to \code{\link{optimize}}}
 
-\item{rowsum.filter}{numeric scalar giving a value for the filtering out of low abundance tags in the estimation of the common dispersion.
-Only tags with total sum of counts above this value are used in the estimation of the common dispersion.}
+\item{rowsum.filter}{numeric scalar giving a value for the filtering out of low abundance genes in the estimation of the common dispersion.
+Only genes with total sum of counts above this value are used in the estimation of the common dispersion.}
 
 \item{verbose}{logical, if \code{TRUE} estimated dispersion and BCV will be printed to standard output.}
 }
diff --git a/man/estimateDisp.Rd b/man/estimateDisp.Rd
index de54828..6cfc8d9 100644
--- a/man/estimateDisp.Rd
+++ b/man/estimateDisp.Rd
@@ -40,15 +40,21 @@ estimateDisp(y, design=NULL, prior.df=NULL, trend.method="locfit", span=NULL,
 \value{Returns \code{object} with the following added components:
 	\item{common.dispersion}{estimate of the common dispersion.}
 	\item{trended.dispersion}{estimates of the trended dispersions.}
-	\item{tagwise.dispersion}{tag- or gene-wise estimates of the dispersion parameter.}
-	\item{logCPM}{the tag abundance in log average counts per million.}
+	\item{tagwise.dispersion}{tagwise estimates of the dispersion parameter.}
+	\item{logCPM}{the average abundance of each tag, in log average counts per million.}
 	\item{prior.df}{prior degrees of freedom. It is a vector when robust method is used.}
 	\item{prior.n}{estimate of the prior weight, i.e. the smoothing parameter that indicates the weight to put on the common likelihood compared to the individual tag's likelihood.}
 	\item{span}{width of the smoothing window used in estimating dispersions.}
 }
 
 \details{
-This function calculates a matrix of likelihoods for each gene at a set of dispersion grid points, and then applies weighted likelihood empirical Bayes method to obtain posterior dispersion estimates. If there is no design matrix, it calculates the quantile conditional likelihood for each gene (tag) and then maximize it. The method is same as in the function \code{estimateCommonDisp} and \code{estimateTagwiseDisp}. If a design matrix is given, it then calculates the adjusted profile log- [...]
+This function calculates a matrix of likelihoods for each tag at a set of dispersion grid points, and then applies weighted likelihood empirical Bayes method to obtain posterior dispersion estimates. If there is no design matrix, it calculates the quantile conditional likelihood for each tag and then maximizes it. In this case, it is similar to the function \code{estimateCommonDisp} and \code{estimateTagwiseDisp}. If a design matrix is given, it calculates the adjusted profile log-likeli [...]
+
+Note that the terms `tag' and `gene' are synonymous here.
+}
+
+\note{
+The \code{estimateDisp} function doesn't give exactly the same estimates as the traditional calling sequences.
 }
 
 \references{
diff --git a/man/estimateGLMCommonDisp.Rd b/man/estimateGLMCommonDisp.Rd
index 32421e6..ae466c4 100644
--- a/man/estimateGLMCommonDisp.Rd
+++ b/man/estimateGLMCommonDisp.Rd
@@ -29,7 +29,7 @@ Possible values are \code{"CoxReid"}, \code{"Pearson"} or \code{"deviance"}.}
 
 \item{subset}{maximum number of rows of \code{y} to use in the calculation.  Rows used are chosen evenly spaced by AveLogCPM using \code{\link{systematicSubset}}.}
 
-\item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene}
+\item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene.}
 
 \item{verbose}{logical, if \code{TRUE} estimated dispersion and BCV will be printed to standard output.}
 \item{weights}{optional numeric matrix giving observation weights}
@@ -74,9 +74,9 @@ d2 <- estimateCommonDisp(d, verbose=TRUE)
 \seealso{
 \code{\link{dispCoxReid}}, \code{\link{dispPearson}}, \code{\link{dispDeviance}}
 
-\code{\link{estimateGLMTrendedDisp}} for trended dispersion and \code{\link{estimateGLMTagwiseDisp}} for tagwise dispersions in the context of a generalized linear model.
+\code{\link{estimateGLMTrendedDisp}} for trended dispersions or \code{\link{estimateGLMTagwiseDisp}} for genewise dispersions in the context of a generalized linear model.
 
-\code{\link{estimateCommonDisp}} for common dispersion or \code{\link{estimateTagwiseDisp}} for tagwise dispersion in the context of a multiple group experiment (one-way layout).
+\code{\link{estimateCommonDisp}} for the common dispersion or \code{\link{estimateTagwiseDisp}} for genewise dispersions in the context of a multiple group experiment (one-way layout).
 }
 
 \keyword{models}
diff --git a/man/estimateGLMRobustDisp.Rd b/man/estimateGLMRobustDisp.Rd
index cf4c927..db1f0d1 100644
--- a/man/estimateGLMRobustDisp.Rd
+++ b/man/estimateGLMRobustDisp.Rd
@@ -4,7 +4,7 @@
 \title{Empirical Robust Bayes Tagwise Dispersions for Negative Binomial GLMs using Observation Weights}
 
 \description{
-Compute a robust estimate of the negative binomial dispersion parameter for each tag or transcript, with expression levels specified by a log-linear model, using observation weights.  These observation weights will be stored and used later for estimating regression parameters.
+Compute a robust estimate of the negative binomial dispersion parameter for each gene, with expression levels specified by a log-linear model, using observation weights.  These observation weights will be stored and used later for estimating regression parameters.
 }
 
 \usage{
@@ -28,15 +28,18 @@ estimateGLMRobustDisp(y, design = NULL, prior.df = 10, update.trend = TRUE,
 }
 
 \value{
-\code{estimateGLMRobustDisp} produces a \code{DGEList} object, which contains the (robust) tagwise dispersion parameter estimate for each tag for the negative binomial model that maximizes the weighted Cox-Reid adjusted profile likelihood, as well as the observation weights.  The observation weights are calculated using residuals and the Huber function.
+\code{estimateGLMRobustDisp} produces a \code{DGEList} object, which contains the (robust) genewise dispersion parameter estimate for each gene for the negative binomial model that maximizes the weighted Cox-Reid adjusted profile likelihood, as well as the observation weights.  The observation weights are calculated using residuals and the Huber function.
 
 Note that when \code{record=TRUE}, a simple list of \code{DGEList} objects is returned, one for each iteration (this is for debugging or tracking purposes).
 }
 
 \details{
-At times, because of the moderation of dispersion estimates towards a trended values, features (typically, genes) can be sensitive to outliers and causing false positives.  That is, since the dispersion estimates are moderated downwards toward the trend and because the regression parameter estimates may be affected by the outliers, genes are deemed significantly differential expressed.  The function uses an iterative procedure where weights are calculated from residuals and estimates are [...]
+Moderation of dispersion estimates towards a trend can be sensitive to outliers, resulting in an increase in false positives. That is, since the dispersion estimates are moderated downwards toward the trend and because the regression parameter estimates may be affected by the outliers, some genes are incorrectly deemed to be significantly differentially expressed. This function uses an iterative procedure where weights are calculated from residuals and estimates are made after re-weighting.
 
-Note: it is not necessary to first calculate the common, trended and tagwise dispersion estimates.  If these are not available, the function will first calculate this (in an unweighted) fashion.
+The robustly computed genewise estimates are reported in the \code{tagwise.dispersion} vector of the returned \code{DGEList}.
+The terms `tag' and `gene' are synonymous in this context.
+
+Note: it is not necessary to first calculate the common, trended and genewise dispersion estimates.  If these are not available, the function will first calculate this (in an unweighted) fashion.
 }
 
 \references{
@@ -47,6 +50,7 @@ Zhou X, Lindsay H, Robinson MD (2014). Robustly detecting differential expressio
 \examples{
 y <- matrix(rnbinom(100*6,mu=10,size=1/0.1),ncol=6)
 d <- DGEList(counts=y,group=c(1,1,1,2,2,2),lib.size=c(1000:1005))
+d <- calcNormFactors(d)
 design <- model.matrix(~group, data=d$samples) # Define the design matrix for the full model
 d <- estimateGLMRobustDisp(d, design)
 summary(d$tagwise.dispersion)
diff --git a/man/estimateGLMTagwiseDisp.Rd b/man/estimateGLMTagwiseDisp.Rd
index ca0711f..d587cc4 100644
--- a/man/estimateGLMTagwiseDisp.Rd
+++ b/man/estimateGLMTagwiseDisp.Rd
@@ -7,15 +7,15 @@
 \title{Empirical Bayes Tagwise Dispersions for Negative Binomial GLMs}
 
 \description{
-Compute an empirical Bayes estimate of the negative binomial dispersion parameter for each tag or transcript, with expression levels specified by a log-linear model.
+Compute an empirical Bayes estimate of the negative binomial dispersion parameter for each tag, with expression levels specified by a log-linear model.
 }
 
 \usage{
 \S3method{estimateGLMTagwiseDisp}{DGEList}(y, design=NULL, prior.df=10,
-                       trend=!is.null(y$trended.dispersion), span=NULL, \dots)
+            trend=!is.null(y$trended.dispersion), span=NULL, \dots)
 \S3method{estimateGLMTagwiseDisp}{default}(y, design=NULL, offset=NULL, dispersion,
-                            prior.df=10, trend=TRUE, span=NULL, AveLogCPM=NULL,
-                            weights=NULL, \dots)
+            prior.df=10, trend=TRUE, span=NULL, AveLogCPM=NULL,
+            weights=NULL, \dots)
 }
 
 \arguments{
@@ -26,7 +26,7 @@ Compute an empirical Bayes estimate of the negative binomial dispersion paramete
 \item{dispersion}{common or trended dispersion estimates, used as an initial estimate for the tagwise estimates. By default uses values stored in the \code{DGEList} object.}
 \item{prior.df}{prior degrees of freedom.}
 \item{span}{width of the smoothing window, in terms of proportion of the data set. Default value decreases with the number of tags.}
-\item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene}
+\item{AveLogCPM}{numeric vector giving average log2 counts per million for each tag}
 \item{weights}{optional numeric matrix giving observation weights}
 \item{\ldots}{other arguments are passed to \code{\link{dispCoxReidInterpolateTagwise}}.}
 }
@@ -47,6 +47,9 @@ and the conditional likelihood is computed using Cox-Reid approximate conditiona
 The prior degrees of freedom determines the weight given to the global dispersion trend.
 The larger the prior degrees of freedom, the more the tagwise dispersions are squeezed towards the global trend.
 
+Note that the terms `tag' and `gene' are synonymous here. 
+The function is only named `Tagwise' for historical reasons.
+
 This function calls the lower-level function \code{\link{dispCoxReidInterpolateTagwise}}.
 }
 
@@ -70,7 +73,7 @@ summary(d$tagwise.dispersion)
 }
 
 \seealso{
-\code{\link{estimateGLMCommonDisp}} for common dispersion and \code{\link{estimateGLMTrendedDisp}} for trended dispersion in the context of a generalized linear model.
+\code{\link{estimateGLMCommonDisp}} for common dispersion or \code{\link{estimateGLMTrendedDisp}} for trended dispersion in the context of a generalized linear model.
 
-\code{\link{estimateCommonDisp}} for common dispersion or \code{\link{estimateTagwiseDisp}} for tagwise dispersion in the context of a multiple group experiment (one-way layout).
+\code{\link{estimateCommonDisp}} for common dispersion or \code{\link{estimateTagwiseDisp}} for tagwise dispersions in the context of a multiple group experiment (one-way layout).
 }
diff --git a/man/estimateGLMTrendedDisp.Rd b/man/estimateGLMTrendedDisp.Rd
index fbbf89c..0eca628 100644
--- a/man/estimateGLMTrendedDisp.Rd
+++ b/man/estimateGLMTrendedDisp.Rd
@@ -19,7 +19,7 @@ Estimates the abundance-dispersion trend by Cox-Reid approximate profile likelih
 \item{y}{a matrix of counts or a \code{DGEList} object.)}
 \item{design}{numeric design matrix, as for \code{\link{glmFit}}.}
 \item{method}{method (low-level function) used to estimated the trended dispersions.
-Possible values are \code{"auto"} (default, switch to \code{"bin.spline"} method if the number of tags is great than 200 and \code{"power"} method otherwise),\code{"bin.spline"}, \code{"bin.loess"} (which both result in a call to \code{dispBinTrend}), \code{"power"} (call to \code{dispCoxReidPowerTrend}), or \code{"spline"} (call to \code{dispCoxReidSplineTrend}).}
+Possible values are \code{"auto"} (default, switch to \code{"bin.spline"} method if the number of genes is great than 200 and \code{"power"} method otherwise),\code{"bin.spline"}, \code{"bin.loess"} (which both result in a call to \code{dispBinTrend}), \code{"power"} (call to \code{dispCoxReidPowerTrend}), or \code{"spline"} (call to \code{dispCoxReidSplineTrend}).}
 \item{offset}{numeric scalar, vector or matrix giving the linear model offsets, as for \code{\link{glmFit}}.}
 \item{AveLogCPM}{numeric vector giving average log2 counts per million for each gene.}
 \item{weights}{optional numeric matrix giving observation weights}
@@ -33,7 +33,8 @@ When the input object is a numeric matrix, the output of one of the lower-level
 }
 
 \details{
-Estimates the dispersion parameter for each transcript (tag) with a trend that depends on the overall level of expression for the transcript for a DGE dataset for general experimental designs by using Cox-Reid approximate conditional inference for a negative binomial generalized linear model for each transcript (tag) with the unadjusted counts and design matrix provided.
+Estimates the dispersion parameter for each gene with a trend that depends on the overall level of expression for that gene.
+This is done for a DGE dataset for general experimental designs by using Cox-Reid approximate conditional inference for a negative binomial generalized linear model for each gene with the unadjusted counts and design matrix provided.
 
 The function provides an object-orientated interface to lower-level functions.
 }
@@ -49,9 +50,9 @@ McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of mul
 
 \author{Gordon Smyth, Davis McCarthy, Yunshun Chen}
 \examples{
-ntags <- 250
+ngenes <- 250
 nlibs <- 4
-y <- matrix(rnbinom(ntags*nlibs,mu=10,size=10),ntags,nlibs)
+y <- matrix(rnbinom(ngenes*nlibs,mu=10,size=10),ngenes,nlibs)
 d <- DGEList(counts=y,group=c(1,1,2,2),lib.size=c(1000:1003))
 design <- model.matrix(~group, data=d$samples)
 disp <- estimateGLMTrendedDisp(d, design, min.n=25, df=3)
diff --git a/man/estimateTagwiseDisp.Rd b/man/estimateTagwiseDisp.Rd
index c6b7957..fa168dd 100644
--- a/man/estimateTagwiseDisp.Rd
+++ b/man/estimateTagwiseDisp.Rd
@@ -9,7 +9,7 @@ Estimates tagwise dispersion values by an empirical Bayes method based on weight
 
 \usage{
 estimateTagwiseDisp(object, prior.df=10, trend="movingave", span=NULL, method="grid",
-          grid.length=11, grid.range=c(-6,6), tol=1e-06, verbose=FALSE)
+            grid.length=11, grid.range=c(-6,6), tol=1e-06, verbose=FALSE)
 }
 
 \arguments{
@@ -52,12 +52,14 @@ Otherwise, the trend is determined by a moving average (\code{trend="movingave"}
 
 \code{method="optimize"} is not recommended for routine use as it is very slow.
 It is included for testing purposes.
+
+Note that the terms `tag' and `gene' are synonymous here. The function is only named `Tagwise' for historical reasons.
 }
 
 \value{
 An object of class \code{DGEList} with the same components as for \code{\link{estimateCommonDisp}} plus the following:
-	\item{prior.n}{estimate of the prior weight, i.e. the smoothing parameter that indicates the weight to put on the common likelihood compared to the individual tag's likelihood; prior.n of 10 means that the common likelihood is given 10 times the weight of the individual tag/gene's likelihood in the estimation of the tag/genewise dispersion}
-	\item{tagwise.dispersion}{tag- or gene-wise estimates of the dispersion parameter}
+	\item{prior.n}{estimate of the prior weight, i.e. the smoothing parameter that indicates the weight to put on the common likelihood compared to the individual tag's likelihood; prior.n of 10 means that the common likelihood is given 10 times the weight of the individual tag's likelihood in the estimation of the tagwise dispersion.}
+	\item{tagwise.dispersion}{tag-wise estimates of the dispersion parameter.}
 }
 
 \references{
diff --git a/man/estimateTrendedDisp.Rd b/man/estimateTrendedDisp.Rd
index d6e88ed..ffb8271 100644
--- a/man/estimateTrendedDisp.Rd
+++ b/man/estimateTrendedDisp.Rd
@@ -27,7 +27,7 @@ This function takes the binned common dispersion and abundance, and fits a smoot
 }
 
 \value{
-An object of class \code{DGEList} with the same components as for \code{\link{estimateCommonDisp}} plus the trended dispersion estimates for each gene or tag.
+An object of class \code{DGEList} with the same components as for \code{\link{estimateCommonDisp}} plus the trended dispersion estimates for each gene.
 }
 
 \author{Yunshun Chen and Gordon Smyth}
@@ -48,6 +48,6 @@ plotBCV(y)
 }
 
 \seealso{
-\code{\link{estimateCommonDisp}} estimates a common value for the dispersion parameter for all tags/genes - should generally be run before \code{estimateTrendedDisp}.
+\code{\link{estimateCommonDisp}} estimates a common value for the dispersion parameter for all genes - should generally be run before \code{estimateTrendedDisp}.
 }
 
diff --git a/man/exactTest.Rd b/man/exactTest.Rd
index f579394..ab354f1 100644
--- a/man/exactTest.Rd
+++ b/man/exactTest.Rd
@@ -24,7 +24,7 @@ exactTestBetaApprox(y1, y2, dispersion=0)
 \item{pair}{vector of length two, either numeric or character, providing the pair of groups to be compared; if a character vector, then should be the names of two groups (e.g. two levels of \code{object$samples$group}); if numeric, then groups to be compared are chosen by finding the levels of \code{object$samples$group} corresponding to those numeric values and using those levels as the groups to be compared; if \code{NULL}, then first two levels of \code{object$samples$group} (a factor [...]
 
 \item{dispersion}{either a numeric vector of dispersions or a character string indicating that dispersions should be taken from the data object.
-If a numeric vector, then can be either of length one or of length equal to the number of tags.
+If a numeric vector, then can be either of length one or of length equal to the number of genes.
 Allowable character values are \code{"common"}, \code{"trended"}, \code{"tagwise"} or \code{"auto"}.
 Default behavior (\code{"auto"} is to use most complex dispersions found in data object.}
 
@@ -35,11 +35,11 @@ Default behavior (\code{"auto"} is to use most complex dispersions found in data
 \item{prior.count}{average prior count used to shrink log-fold-changes. Larger values produce more shrinkage.}
 
 \item{y1}{numeric matrix of counts for the first the two experimental groups to be tested for differences.
-Rows correspond to genes or transcripts and columns to libraries.
+Rows correspond to genes and columns to libraries.
 Libraries are assumed to be equal in size - e.g. adjusted pseudocounts from the output of \code{\link{equalizeLibSizes}}.}
 
 \item{y2}{numeric matrix of counts for the second of the two experimental groups to be tested for differences.
-Rows correspond to genes or transcripts and columns to libraries.
+Rows correspond to genes and columns to libraries.
 Libraries are assumed to be equal in size - e.g. adjusted pseudocounts from the output of \code{\link{equalizeLibSizes}}. Must have the same number of rows as \code{y1}.}
 }
 
@@ -47,7 +47,7 @@ Libraries are assumed to be equal in size - e.g. adjusted pseudocounts from the
 \code{exactTest} produces an object of class \code{DGEExact} containing the following components:
 \item{table}{data frame containing columns for the log2-fold-change, \code{logFC}, the average log2-counts-per-million, \code{logCPM}, and the two-sided p-value \code{PValue}}
 \item{comparison}{character vector giving the names of the two groups being compared} 
-\item{genes}{optional data frame containing annotation for transcript; taken from \code{object}}
+\item{genes}{optional data frame containing annotation for each gene; taken from \code{object}}
 
 The low-level functions, \code{exactTestDoubleTail} etc, produce a numeric vector of genewise p-values, one for each row of \code{y1} and \code{y2}.
 }
diff --git a/man/getCounts.Rd b/man/getCounts.Rd
index 60da625..7443ab7 100644
--- a/man/getCounts.Rd
+++ b/man/getCounts.Rd
@@ -21,7 +21,7 @@ getDispersion(y)
 \code{getOffset(y)} returns offsets for the log-linear predictor account for sequencing depth and possibly other normalization factors.
 Specifically it returns the matrix \code{y$offset} if it is non-null, otherwise it returns the log product of \code{lib.size} and \code{norm.factors} from \code{y$samples}.
 
-\code{getDispersion(y)} returns the most complex dispersion estimates (common, trended or tagwise) found in \code{y}.
+\code{getDispersion(y)} returns the most complex dispersion estimates (common, trended or genewise) found in \code{y}.
 }
 
 \value{\code{getCounts} returns the matrix of counts.
diff --git a/man/getPriorN.Rd b/man/getPriorN.Rd
index 4eca256..c33b8c9 100644
--- a/man/getPriorN.Rd
+++ b/man/getPriorN.Rd
@@ -14,14 +14,14 @@ getPriorN(y, design=NULL, prior.df=20)
 
 \item{design}{numeric matrix (optional argument) giving the design matrix for the GLM that is to be fit. Must be of full column rank. If provided \code{design} is used to determine the number of parameters to be fit in the statistical model and therefore the residual degrees of freedom. If left as the default (\code{NULL}) then the \code{y$samples$group} element of the \code{DGEList} object is used to determine the residual degrees of freedom.}
 
-\item{prior.df}{numeric scalar giving the weight, in terms of prior degrees of freedom, to be given to the common parameter likelihood when estimating tagwise dispersion estimates.}
+\item{prior.df}{numeric scalar giving the weight, in terms of prior degrees of freedom, to be given to the common parameter likelihood when estimating genewise dispersion estimates.}
 
 }
 
 \value{\code{getPriorN} returns a numeric scalar }
 
 \details{
-When estimating tagwise dispersion values using \code{\link{estimateTagwiseDisp}} or \code{\link{estimateGLMTagwiseDisp}} we need to decide how much weight to give to the common parameter likelihood in order to smooth (or stabilize) the dispersion estimates. The best choice of value for the \code{prior.n} parameter varies between datasets depending on the number of samples in the dataset and the complexity of the model to be fit. The value of \code{prior.n} should be inversely proportion [...]
+When estimating genewise dispersion values using \code{\link{estimateTagwiseDisp}} or \code{\link{estimateGLMTagwiseDisp}} we need to decide how much weight to give to the common parameter likelihood in order to smooth (or stabilize) the dispersion estimates. The best choice of value for the \code{prior.n} parameter varies between datasets depending on the number of samples in the dataset and the complexity of the model to be fit. The value of \code{prior.n} should be inversely proportio [...]
 }
 
 \author{Davis McCarthy, Gordon Smyth}
diff --git a/man/glmQLFTest.Rd b/man/glmQLFTest.Rd
index 3effb29..cd14e57 100644
--- a/man/glmQLFTest.Rd
+++ b/man/glmQLFTest.Rd
@@ -1,93 +1,122 @@
 \name{glmQLFit}
 \alias{glmQLFit}
+\alias{glmQLFit.DGEList}
+\alias{glmQLFit.default}
 \alias{glmQLFTest}
 
-\title{Quasi-likelihood methods with empirical Bayes shrinkage}
+\title{Genewise Negative Binomial Generalized Linear Models with Quasi-likelihood Tests}
 
 \description{Fit a quasi-likelihood negative binomial generalized log-linear model to count data.
-Conduct genewise statistical tests for a given coefficient or coefficient contrast.}
+Conduct genewise statistical tests for a given coefficient or contrast.}
 
 \usage{
-glmQLFit(y, design=NULL, dispersion=NULL, abundance.trend=TRUE, robust=FALSE, winsor.tail.p=c(0.05, 0.1), \dots)
+\method{glmQLFit}{DGEList}(y, design=NULL, dispersion=NULL, offset=NULL, abundance.trend=TRUE, 
+        robust=FALSE, winsor.tail.p=c(0.05, 0.1), \dots)
+\method{glmQLFit}{default}(y, design=NULL, dispersion=0.05, offset=NULL, lib.size=NULL, 
+        abundance.trend=TRUE, AveLogCPM=NULL, robust=FALSE, 
+        winsor.tail.p=c(0.05, 0.1), \dots)
 glmQLFTest(glmfit, coef=ncol(glmfit$design), contrast=NULL) 
 }
 
 \arguments{
-\item{y}{a \code{DGEList} object containing count and sample data.}
+\item{y}{a matrix of counts, or a \code{DGEList} object with (at least) elements \code{counts} (table of unadjusted counts) and \code{samples} (data frame containing information about experimental group, library size and normalization factor for the library size)}
 
-\item{design}{numeric matrix giving the design matrix for the tagwise linear models.}
+\item{design}{numeric matrix giving the design matrix for the genewise linear models.}
 
-\item{dispersion}{numeric scalar or vector of negative binomial dispersions. Defaults to the trended dispersion, or the common dispersion (if no trend is available), or a value of 0.05 (if no common value is available).}
+\item{dispersion}{numeric scalar or vector of negative binomial dispersions.  If \code{NULL}, then will be extracted from the \code{DGEList} object \code{y}, with order of precedence: trended dispersions, common dispersion, a constant value of 0.05.}
+
+\item{offset}{numeric matrix of same size as \code{y} giving offsets for the log-linear models.  Can be a scalor or a vector of length \code{ncol(y)}, in which case it is expanded out to a matrix. If \code{NULL} will be computed by \code{getOffset(y)}.}
+
+\item{lib.size}{numeric vector of length \code{ncol(y)} giving library sizes. Only used if \code{offset=NULL}, in which case \code{offset} is set to \code{log(lib.size)}. Defaults to \code{colSums(y)}.}
 
 \item{abundance.trend}{logical, whether to allow an abundance-dependent trend when estimating the prior values for the quasi-likelihood multiplicative dispersion parameter.}
 
-\item{robust}{logical, whether to estimate the prior degrees of freedom robustly.}
+\item{AveLogCPM}{average log2-counts per million, the average taken over all libraries in \code{y}. If \code{NULL} will be computed by \code{aveLogCPM(y)}.}
+
+\item{robust}{logical, whether to estimate the prior QL dispersion distribution robustly.}
 
 \item{winsor.tail.p}{numeric vector of length 2 giving proportion to trim (Winsorize) from lower and upper tail of the distribution of genewise deviances when estimating the hyperparameters. Positive values produce robust empirical Bayes ignoring outlier small or large deviances. Only used when \code{robust=TRUE}.} 
 
 \item{\dots}{other arguments are passed to \code{\link{glmFit}}.}
 
-\item{glmfit}{a \code{DGEGLM} object, usually output from \code{qlmQLFit}.}
+\item{glmfit}{a \code{DGEGLM} object, usually output from \code{glmQLFit}.}
 
-\item{coef}{integer or character vector indicating which coefficients of the linear model are to be tested equal to zero.}
+\item{coef}{integer or character index vector indicating which coefficients of the linear model are to be tested equal to zero. Ignored if \code{contrast} is not \code{NULL}.}
 
 \item{contrast}{numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero.}
 }
 
-\value{
-\code{glmQLFit} produces an object of class \code{DGEGLM} with the same components as that produced by \code{\link{glmFit}}, plus:
-	\item{df.residual}{a numeric vector containing the number of residual degrees of freedom for the GLM fit of each gene.}	
-	\item{s2.fit}{a list containing \code{df.prior}, the prior degrees of fredom; and \code{var.prior}, the location of the prior distribution. Both are numeric vectors if \code{abundance.trend=TRUE} and scalars otherwise. \code{var.post} is a numeric vector containing the shrunk quasi-likelihood dispersion for each gene.}
-	\item{df.prior}{a numeric vector or scalar containing the prior degrees of freedom, same as that in \code{s2.fit}.}
-
-\code{glmQFTest} produces objects of class \code{DGELRT} with the same components as for \code{glmfit} plus the following:
-	\item{table}{data frame with the same rows as \code{y} containing the log2-fold changes, F-statistics and p-values, ready to be displayed by \code{topTags.}.}
-	\item{comparison}{character string describing the coefficient or the contrast being tested.}
-
-The data frame \code{table} contains the following columns:
-	\item{logFC}{log2-fold change of expression between conditions being tested.}
-	\item{logCPM}{average log2-counts per million, the average taken over all libraries in \code{y}.}
-	\item{F}{F-statistics.}
-	\item{PValue}{p-values.}
+\details{
+\code{glmQLFit} and \code{glmQLFTest} implement the quasi-likelihood (QL) methods of Lund et al (2012), with some enhancements and with slightly different glm, trend and FDR methods.
+See Lun et al (2015) for a tutorial describing the use of \code{glmQLFit} and \code{glmQLFit} as part of a complete analysis pipeline.
+Another case study using \code{glmQLFit} and \code{glmQLFTest} is given in Section 4.7 of the edgeR User's Guide.
+
+\code{glmQLFit} is similar to \code{glmFit} except that it also estimates QL dispersion values.
+It calls the limma function \code{\link{squeezeVar}} to conduct empirical Bayes moderation of the genewise QL dispersions.
+If \code{robust=TRUE}, then the robust hyperparameter estimation features of \code{squeezeVar} are used (Phipson et al, 2013).
+If \code{abundance.trend=TRUE}, then a prior trend is estimated based on the average logCPMs.
+
+\code{glmQLFit} gives special attention to handling of zero counts, and in particular to situations when fitted values of zero provide no useful residual degrees of freedom for estimating the QL dispersion.
+The usual residual degrees of freedom are returned as \code{df.residual} while the adjusted residual degrees of freedom are returned as \code{df.residuals.zeros}.
+
+\code{glmQLFTest} is similar to \code{glmLRT} except that it replaces likelihood ratio tests with empirical Bayes quasi-likelihood F-tests.
+The p-values from \code{glmQLFTest} are always greater than or equal to those that would be obtained from \code{glmLRT} using the same negative binomial dispersions.
 }
 
-\details{
-\code{glmQLFTest} implements the quasi-likelihood method of Lund et al (2012).
-It behaves the same as \code{glmLRT} except that it replaces likelihood ratio tests with quasi-likelihood F-tests for coefficients in the linear model.
-This function calls the limma function \code{\link{squeezeVar}} to conduct empirical Bayes smoothing of the genewise multiplicative dispersions.
-Note that the \code{QuasiSeq} package provides a alternative implementation of Lund et al (2012), with slightly different glm, trend and FDR methods.
-
-There are a number of subtleties involved in the use of QL models.
-The first is that the negative binomial dispersions \emph{must} be trended or common values.
-This is because the function assumes that the supplied values are the true values.
-For the trended/common values, the assumption is reasonable as information from many genes improves precision.
-This is not the case for the tagwise dispersions due to the limited information for each gene.
-
-Another subtlety involves the handling of zero counts.
-Observations with fitted values of zero provide no residual degrees of freedom.
-This must be considered when computing the value of the quasi-likelihood dispersion for genes with many zeros.
-Finally, a lower bound is defined for the p-value of each gene, based on the likelihood ratio test.
-This avoids spurious results involving weak shrinkage with very low quasi-likelihood dispersions.
+\note{
+The negative binomial dispersions \code{dispersion} supplied to \code{glmQLFit} and \code{glmQLFTest} must be based on a global model, that is, they must be either trended or common dispersions.
+It is not correct to supply genewise dispersions because \code{glmQLFTest} estimates genewise variability using the QL dispersion.
+}
+
+\value{
+\code{glmQLFit} produces an object of class \code{DGEGLM} with the same components as produced by \code{\link{glmFit}}, plus:
+	\item{df.residual.zeros}{a numeric vector containing the number of effective residual degrees of freedom for each gene, taking into account any treatment groups with all zero counts.}
+	\item{df.prior}{a numeric vector or scalar, giving the prior degrees of freedom for the QL dispersions.}
+	\item{var.prior}{a numeric vector of scalar, giving the location of the prior distribution for the QL dispersions.}
+	\item{var.post}{a numeric vector containing the posterior empirical Bayes QL dispersions.} 
+\code{df.prior} is a vector of length \code{nrow(y)} if \code{robust=TRUE}, otherwise it has length 1.
+\code{var.prior} is a vector of length \code{nrow(y)} if \code{abundance.trend=TRUE}, otherwise it has length 1.
+
+\code{glmQFTest} produce an object of class \code{DGELRT} with the same components as produced by \code{\link{glmLRT}}, except that the \code{table$LR} column becomes \code{table$F} and contains quasi-likelihood F-statistics.
+It also stores \code{df.total}, a numeric vector containing the denominator degrees of freedom for the F-test, equal to \code{df.prior + df.residual.zeros}.
 }
 
 \references{
+Lun, ATL, Chen, Y, and Smyth, GK (2015).
+It's DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR.
+Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.
+\url{http://www.statsci.org/smyth/pubs/QLedgeRPreprint.pdf">Preprint 8 April 2015}
+
 Lund, SP, Nettleton, D, McCarthy, DJ, and Smyth, GK (2012).
 Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates.
 \emph{Statistical Applications in Genetics and Molecular Biology} Volume 11, Issue 5, Article 8.
 \url{http://www.statsci.org/smyth/pubs/QuasiSeqPreprint.pdf}
+
+Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2013).
+Empirical Bayes in the presence of exceptional cases, with application to microarray data.
+Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.
+\url{http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf}
 }
 
-\author{Davis McCarthy and Gordon Smyth, with modifications by Aaron Lun}
+\author{Yunshun Chen, Aaron Lun, Davis McCarthy and Gordon Smyth}
+
+\seealso{
+\code{\link{topTags}} displays results from \code{glmQLFTest}.
+
+\code{\link{plotQLDisp}} can be used to visualize the distribution of QL dispersions after EB shrinkage from \code{glmQLFit}.
+
+The \code{QuasiSeq} package gives an alternative implementation of the Lund et al (2012) methods.
+}
 
 \examples{
 nlibs <- 4
-ntags <- 1000
-dispersion.true <- 1/rchisq(ntags, df=10)
+ngenes <- 1000
+dispersion.true <- 1/rchisq(ngenes, df=10)
 design <- model.matrix(~factor(c(1,1,2,2)))
 
 # Generate count data
-y <- rnbinom(ntags*nlibs,mu=20,size=1/dispersion.true)
-y <- matrix(y,ntags,nlibs)
+y <- rnbinom(ngenes*nlibs,mu=20,size=1/dispersion.true)
+y <- matrix(y,ngenes,nlibs)
 d <- DGEList(y)
 d <- calcNormFactors(d)
 
@@ -104,12 +133,4 @@ results <- glmQLFTest(fit)
 topTags(results)
 }
 
-\seealso{
-\code{\link{topTags}} displays results from \code{glmQLFTest}.
-
-\code{\link{plotQLDisp}} can be used to visualize the distribution of QL dispersions after EB shrinkage from \code{glmQLFit}.
-
-The \code{QuasiSeq} package gives an alternative implementation of \code{glmQLFTest} based on the same statistical ideas.
-}
-
 \keyword{models}
diff --git a/man/treatDGE.Rd b/man/glmTreat.Rd
similarity index 56%
rename from man/treatDGE.Rd
rename to man/glmTreat.Rd
index 53dbf95..3aafb61 100644
--- a/man/treatDGE.Rd
+++ b/man/glmTreat.Rd
@@ -1,16 +1,18 @@
-\name{treatDGE}
+\name{glmTreat}
+\alias{glmTreat}
 \alias{treatDGE}
 
-\title{Testing for Differential Expression Relative to a Threshold}
+\title{Test for Differential Expression Relative to a Threshold}
 
-\description{Conduct genewise statistical tests for a given coefficient or coefficient contrast relative to a specified threshold.}
+\description{Conduct genewise statistical tests for a given coefficient or contrast relative to a specified fold-change threshold.}
 
 \usage{
+glmTreat(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
 treatDGE(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
 }
 
 \arguments{
-\item{glmfit}{a \code{DGEGLM} object, usually output from \code{glmFit}.}
+\item{glmfit}{a \code{DGEGLM} object, usually output from \code{glmFit} or \code{glmQLFit}.}
 
 \item{coef}{integer or character vector indicating which coefficients of the linear model are to be tested equal to zero.  Values must be columns or column names of \code{design}. Defaults to the last coefficient.  Ignored if \code{contrast} is specified.}
 
@@ -20,7 +22,7 @@ treatDGE(glmfit, coef=ncol(glmfit$design), contrast=NULL, lfc=0)
 }
 
 \value{
-\code{treatDGE} produces an object of class \code{DGELRT} with the same components as for \code{glmfit} plus the following:
+\code{glmTreat} produces an object of class \code{DGELRT} with the same components as for \code{glmfit} plus the following:
 	\item{lfc}{absolute value of the specified log2-fold change threshold.}
 	\item{table}{data frame with the same rows as \code{glmfit} containing the log2-fold changes, average log2-counts per million and p-values, ready to be displayed by \code{topTags.}.}
 	\item{comparison}{character string describing the coefficient or the contrast being tested.}
@@ -32,11 +34,33 @@ The data frame \code{table} contains the following columns:
 }
 
 \details{
-\code{treatDGE} implements a two-sided modified likelihood ratio test.
+\code{glmTreat} implements a test for differential expression relative to a minimum required fold-change threshold.
+Instead of testing for genes which have log-fold-changes different from zero, it tests whether the log2-fold-change is greater than \code{lfc} in absolute value.
+\code{glmTreat} is analogous to the TREAT approach developed by McCarthy and Smyth (2009) for microarrays.
+
+\code{glmTreat} detects whether \code{glmfit} was produced by \code{glmFit} or \code{glmQLFit}.
+In the former case, it conducts a modified likelihood ratio test (LRT) against the fold-change threshold.
+In the latter case, it conducts a quasi-likelihood (QL) F-test against the threshold.
+
+If \code{lfc=0}, then \code{glmTreat} is equivalent to \code{glmLRT} or \code{glmQLFTest}, depending on whether likelihood or quasi-likelihood is being used.
+
+\code{glmTreat} was previously called \code{treatDGE}.
+The old function name is now deprecated and will be removed in a future release of edgeR.
 }
 
 \author{Yunshun Chen and Gordon Smyth}
 
+\seealso{
+\code{\link{topTags}} displays results from \code{glmTreat}.
+}
+
+\references{
+McCarthy, D. J., and Smyth, G. K. (2009).
+Testing significance relative to a fold-change threshold is a TREAT.
+\emph{Bioinformatics} 25, 765-771.
+\url{http://bioinformatics.oxfordjournals.org/content/25/6/765}
+}
+
 \examples{
 ngenes <- 100
 n1 <- 3
@@ -57,6 +81,6 @@ counts <- matrix(rnbinom(ngenes*nlibs, mu=mu, size=1/phi), ngenes, nlibs)
 d <- DGEList(counts=counts,lib.size=rep(1e6, nlibs), group=group)
 
 gfit <- glmFit(d, design, dispersion=phi)
-tr <- treatDGE(gfit, coef=2, lfc=1)
+tr <- glmTreat(gfit, coef=2, lfc=1)
 topTags(tr)
 }
diff --git a/man/glmfit.Rd b/man/glmfit.Rd
index 48af6bf..541342c 100644
--- a/man/glmfit.Rd
+++ b/man/glmfit.Rd
@@ -6,7 +6,7 @@
 
 \title{Genewise Negative Binomial Generalized Linear Models}
 
-\description{Fit a negative binomial generalized log-linear model to the read counts for each gene or transcript.
+\description{Fit a negative binomial generalized log-linear model to the read counts for each gene.
 Conduct genewise statistical tests for a given coefficient or coefficient contrast.}
 
 \usage{
@@ -19,17 +19,17 @@ glmLRT(glmfit, coef=ncol(glmfit$design), contrast=NULL, test="chisq")
 \arguments{
 \item{y}{an object that contains the raw counts for each library (the measure of expression level); alternatively, a matrix of counts, or a \code{DGEList} object with (at least) elements \code{counts} (table of unadjusted counts) and \code{samples} (data frame containing information about experimental group, library size and normalization factor for the library size)}
 
-\item{design}{numeric matrix giving the design matrix for the tagwise linear models.
+\item{design}{numeric matrix giving the design matrix for the genewise linear models.
 Must be of full column rank.
 Defaults to a single column of ones, equivalent to treating the columns as replicate libraries.}
 
-\item{dispersion}{numeric scalar or vector of negative binomial dispersions. Can be a common value for all tags, or a vector of values can provide a unique dispersion value for each tag. If \code{NULL} will be extracted from \code{y}, with order of precedence: tagwise dispersion, trended dispersions, common dispersion.}
+\item{dispersion}{numeric scalar or vector of negative binomial dispersions. Can be a common value for all genes, or a vector of values can provide a unique dispersion value for each gene. If \code{NULL} will be extracted from \code{y}, with order of precedence: genewise dispersion, trended dispersions, common dispersion.}
 
-\item{offset}{numeric matrix of same size as \code{y} giving offsets for the log-linear models.  Can be a scalor or a vector of length \code{ncol{y}}, in which case it is expanded out to a matrix.}
+\item{offset}{numeric matrix of same size as \code{y} giving offsets for the log-linear models.  Can be a scalor or a vector of length \code{ncol(y)}, in which case it is expanded out to a matrix.}
 
-\item{weights}{optional numeric matrix giving prior weights for the observations (for each library and transcript) to be used in the GLM calculations.  Not supported by methods \code{"linesearch"} or \code{"levenberg"}.}
+\item{weights}{optional numeric matrix giving prior weights for the observations (for each library and gene) to be used in the GLM calculations.}
 
-\item{lib.size}{numeric vector of length \code{ncol(y)} giving library sizes. Only used if \code{offset=NULL}, in which case \code{offset} is set to \code{log(lib.size}). Defaults to \code{colSums(y)}.}
+\item{lib.size}{numeric vector of length \code{ncol(y)} giving library sizes. Only used if \code{offset=NULL}, in which case \code{offset} is set to \code{log(lib.size)}. Defaults to \code{colSums(y)}.}
 
 \item{prior.count}{average prior count to be added to observation to shrink the estimated log-fold-changes towards zero.}
 
@@ -50,12 +50,12 @@ Defaults to a single column of ones, equivalent to treating the columns as repli
 \code{glmFit} produces an object of class \code{DGEGLM} containing components \code{counts}, \code{samples}, \code{genes} and \code{abundance} from \code{y} plus the following new components:
 	\item{design}{design matrix as input.}
 	\item{weights}{matrix of weights as input.}
-	\item{df.residual}{numeric vector of residual degrees of freedom, one for each tag.}
+	\item{df.residual}{numeric vector of residual degrees of freedom, one for each gene.}
 	\item{offset}{numeric matrix of linear model offsets.}
 	\item{dispersion}{vector of dispersions used for the fit.}
 	\item{coefficients}{numeric matrix of estimated coefficients from the glm fits, on the natural log scale, of size \code{nrow(y)} by \code{ncol(design)}.}
 	\item{fitted.values}{matrix of fitted values from glm fits, same number of rows and columns as \code{y}.}
-	\item{deviance}{numeric vector of deviances, one for each tag.}
+	\item{deviance}{numeric vector of deviances, one for each gene.}
 
 \code{glmLRT} produces objects of class \code{DGELRT} with the same components as for \code{glmfit} plus the following:
 	\item{table}{data frame with the same rows as \code{y} containing the log2-fold changes, likelhood ratio statistics and p-values, ready to be displayed by \code{topTags.}.}
@@ -73,7 +73,7 @@ The data frame \code{table} contains the following columns:
 
 \code{glmFit} fits genewise negative binomial glms, all with the same design matrix but possibly different dispersions, offsets and weights.
 When the design matrix defines a one-way layout, or can be re-parametrized to a one-way layout, the glms are fitting very quickly using \code{\link{mglmOneGroup}}.
-Otherwise the default fitting method, implemented in \code{\link{mglmLevenberg}} a Fisher scoring algorithm with Levenberg-style damping.
+Otherwise the default fitting method, implemented in \code{\link{mglmLevenberg}}, uses a Fisher scoring algorithm with Levenberg-style damping.
 
 Positive \code{prior.count} cause the returned coefficients to be shrunk in such a way that fold-changes between the treatment conditions are decreased.
 In particular, infinite fold-changes are avoided.
@@ -96,20 +96,20 @@ McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of mul
 
 \examples{
 nlibs <- 3
-ntags <- 100
+ngenes <- 100
 dispersion.true <- 0.1
 
-# Make first transcript respond to covariate x
+# Make first gene respond to covariate x
 x <- 0:2
 design <- model.matrix(~x)
-beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ntags-1)))
+beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ngenes-1)))
 mu.true <- 2^(beta.true \%*\% t(design))
 
 # Generate count data
-y <- rnbinom(ntags*nlibs,mu=mu.true,size=1/dispersion.true)
-y <- matrix(y,ntags,nlibs)
+y <- rnbinom(ngenes*nlibs,mu=mu.true,size=1/dispersion.true)
+y <- matrix(y,ngenes,nlibs)
 colnames(y) <- c("x0","x1","x2")
-rownames(y) <- paste("Gene",1:ntags,sep="")
+rownames(y) <- paste("gene",1:ngenes,sep=".")
 d <- DGEList(y)
 
 # Normalize
@@ -122,7 +122,7 @@ fit <- glmFit(d, design, dispersion=dispersion.true)
 results <- glmLRT(fit, coef=2)
 topTags(results)
 
-# Estimate the dispersion (may be unreliable with so few tags)
+# Estimate the dispersion (may be unreliable with so few genes)
 d <- estimateGLMCommonDisp(d, design, verbose=TRUE)
 }
 
diff --git a/man/gof.Rd b/man/gof.Rd
index c047736..4acc05c 100644
--- a/man/gof.Rd
+++ b/man/gof.Rd
@@ -43,20 +43,20 @@ McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of mul
 
 \examples{
 nlibs <- 3
-ntags <- 100
+ngenes <- 100
 dispersion.true <- 0.1
 
-# Make first transcript respond to covariate x
+# Make first gene respond to covariate x
 x <- 0:2
 design <- model.matrix(~x)
-beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ntags-1)))
+beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ngenes-1)))
 mu.true <- 2^(beta.true \%*\% t(design))
 
 # Generate count data
-y <- rnbinom(ntags*nlibs,mu=mu.true,size=1/dispersion.true)
-y <- matrix(y,ntags,nlibs)
+y <- rnbinom(ngenes*nlibs,mu=mu.true,size=1/dispersion.true)
+y <- matrix(y,ngenes,nlibs)
 colnames(y) <- c("x0","x1","x2")
-rownames(y) <- paste("Gene",1:ntags,sep="")
+rownames(y) <- paste("gene",1:ngenes,sep=".")
 d <- DGEList(y)
 
 # Normalize
diff --git a/man/goodTuring.Rd b/man/goodTuring.Rd
index a8a5431..20dd2ef 100644
--- a/man/goodTuring.Rd
+++ b/man/goodTuring.Rd
@@ -46,7 +46,7 @@ which gives identical results to this function.
 \code{goodTuringPlot} plots log-probability (i.e., log frequencies of frequencies) versus log-frequency.
 
 \code{goodTuringProportions} runs \code{goodTuring} on each column of data, then
-uses the results to predict the proportion of each tag in each library.
+uses the results to predict the proportion of each gene in each library.
 }
 
 \references{
diff --git a/man/maPlot.Rd b/man/maPlot.Rd
index 8576edc..100ae8f 100644
--- a/man/maPlot.Rd
+++ b/man/maPlot.Rd
@@ -17,8 +17,8 @@ maPlot(x, y, logAbundance=NULL, logFC=NULL, normalize=FALSE, plot.it=TRUE,
 \arguments{
   \item{x}{vector of counts or concentrations (group 1)}
   \item{y}{vector of counts or concentrations (group 2)}
-  \item{logAbundance}{vector providing the abundance of each tag on the log2 scale. Purely optional (default is \code{NULL}), but in combination with \code{logFC} provides a more direct way to create an MA-plot if the log-abundance and log-fold change are available.}
-  \item{logFC}{vector providing the log-fold change for each tag for a given experimental contrast. Default is \code{NULL}, only to be used together with \code{logAbundance} as both need to be non-null for their values to be used.}
+  \item{logAbundance}{vector providing the abundance of each gene on the log2 scale. Purely optional (default is \code{NULL}), but in combination with \code{logFC} provides a more direct way to create an MA-plot if the log-abundance and log-fold change are available.}
+  \item{logFC}{vector providing the log-fold change for each gene for a given experimental contrast. Default is \code{NULL}, only to be used together with \code{logAbundance} as both need to be non-null for their values to be used.}
   \item{normalize}{logical, whether to divide \code{x} and \code{y} vectors by their sum}
   \item{plot.it}{logical, whether to produce a plot}
   \item{smearWidth}{scalar, width of the smear}
@@ -26,7 +26,7 @@ maPlot(x, y, logAbundance=NULL, logFC=NULL, normalize=FALSE, plot.it=TRUE,
   \item{allCol}{colour of the non-smeared points}
   \item{lowCol}{colour of the smeared points}
   \item{deCol}{colour of the DE (differentially expressed) points}
- \item{de.tags}{indices for tags identified as being differentially expressed; use \code{exactTest} to identify DE genes}
+ \item{de.tags}{indices for genes identified as being differentially expressed; use \code{exactTest} or \code{glmLRT} to identify DE genes. Note that `tag' and `gene' are synonymous here.}
   \item{smooth.scatter}{logical, whether to produce a 'smooth scatter' plot using the KernSmooth::smoothScatter function or just a regular scatter plot; default is \code{FALSE}, i.e. produce a regular scatter plot}
   \item{lowess}{logical, indicating whether or not to add a lowess curve to the MA-plot to give an indication of any trend in the log-fold change with log-concentration}
   \item{\dots}{further arguments passed on to \code{plot}}
diff --git a/man/meanvar.Rd b/man/meanvar.Rd
index 2b4f553..3a34868 100644
--- a/man/meanvar.Rd
+++ b/man/meanvar.Rd
@@ -4,7 +4,7 @@
 
 \title{Explore the mean-variance relationship for DGE data}
 
-\description{Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. Here are functions to compute tag/gene means and variances, as well at looking at these quantities when data is binned based on overall expression level.}
+\description{Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. Here are functions to compute gene means and variances, as well at looking at these quantities when data is binned based on overall expression level.}
 
 
 \usage{
@@ -18,15 +18,15 @@ binMeanVar(x, group, nbins=100, common.dispersion=FALSE, object=NULL)
 \arguments{ 
 \item{object}{\code{DGEList} object containing the raw data and dispersion value. According the method desired for computing the dispersion, either \code{estimateCommonDisp} and (possibly) \code{estimateTagwiseDisp} should be run on the \code{DGEList} object before using \code{plotMeanVar}. The argument \code{object} must be supplied in the function \code{binMeanVar} if common dispersion values are to be computed for each bin.}
 
-\item{meanvar}{list (optional) containing the output from \code{binMeanVar} or the returned value of \code{plotMeanVar}. Providing this object as an argument will save time in computing the tag/gene means and variances when producing a mean-variance plot. }
+\item{meanvar}{list (optional) containing the output from \code{binMeanVar} or the returned value of \code{plotMeanVar}. Providing this object as an argument will save time in computing the gene means and variances when producing a mean-variance plot. }
 
-\item{show.raw.vars}{logical, whether or not to display the raw (pooled) gene/tag variances on the mean-variance plot. Default is \code{FALSE}.}
+\item{show.raw.vars}{logical, whether or not to display the raw (pooled) genewise variances on the mean-variance plot. Default is \code{FALSE}.}
 
-\item{show.tagwise.vars}{logical, whether or not to display the estimated genewise/tagwise variances on the mean-variance plot. Default is \code{FALSE}.}
+\item{show.tagwise.vars}{logical, whether or not to display the estimated genewise variances on the mean-variance plot (note that `tag' and `gene' are synonymous). Default is \code{FALSE}.}
 
-\item{show.binned.common.disp.vars}{logical, whether or not to compute the common dispersion for each bin of tags and show the variances computed from those binned common dispersions and the mean expression level of the respective bin of tags. Default is \code{FALSE}.}
+\item{show.binned.common.disp.vars}{logical, whether or not to compute the common dispersion for each bin of genes and show the variances computed from those binned common dispersions and the mean expression level of the respective bin of genes. Default is \code{FALSE}.}
 
-\item{show.ave.raw.vars}{logical, whether or not to show the average of the raw variances for each bin of tags plotted against the average expression level of the tags in the bin. Averages are taken on the square root scale as regular arithmetic means are likely to be upwardly biased for count data, whereas averaging on the square scale gives a better summary of the mean-variance relationship in the data. The default is \code{TRUE}.}
+\item{show.ave.raw.vars}{logical, whether or not to show the average of the raw variances for each bin of genes plotted against the average expression level of the genes in the bin. Averages are taken on the square root scale as regular arithmetic means are likely to be upwardly biased for count data, whereas averaging on the square scale gives a better summary of the mean-variance relationship in the data. The default is \code{TRUE}.}
 
 \item{scalar}{vector (optional) of scaling values to divide counts by. Would expect to have this the same length as the number of columns in the count matrix (i.e. the number of libraries).}
 
@@ -42,11 +42,11 @@ binMeanVar(x, group, nbins=100, common.dispersion=FALSE, object=NULL)
 
 \item{\dots}{further arguments passed on to \code{plot}}
 
-\item{x}{matrix of count data, with rows representing tags/genes and columns representing samples}
+\item{x}{matrix of count data, with rows representing genes and columns representing samples}
 
 \item{group}{factor giving the experimental group or condition to which each sample (i.e. column of \code{x} or element of {y}) belongs}
 
-\item{common.dispersion}{logical, whether or not to compute the common dispersion for each bin of tags.}
+\item{common.dispersion}{logical, whether or not to compute the common dispersion for each bin of genes.}
 
 }
 
@@ -57,11 +57,11 @@ binMeanVar(x, group, nbins=100, common.dispersion=FALSE, object=NULL)
 	\item{bin.vars}{list containing the pooled variance for genes divided into bins based on amount of expression}
 	\item{means}{vector giving the mean expression level for each gene}
 	\item{vars}{vector giving the pooled variance for each gene}
-	\item{bins}{list giving the indices of the tags in each bin, ordered from lowest expression bin to highest}
+	\item{bins}{list giving the indices of the genes in each bin, ordered from lowest expression bin to highest}
 }
 
 \details{
-This function is useful for exploring the mean-variance relationship in the data. Raw variances are, for each gene, the pooled variance of the counts from each sample, divided by a scaling factor (by default the effective library size). The function will plot the average raw variance for tags split into \code{nbins} bins by overall expression level. The averages are taken on the square-root scale as for count data the arithmetic mean is upwardly biased. Taking averages on the square-root [...]
+This function is useful for exploring the mean-variance relationship in the data. Raw variances are, for each gene, the pooled variance of the counts from each sample, divided by a scaling factor (by default the effective library size). The function will plot the average raw variance for genes split into \code{nbins} bins by overall expression level. The averages are taken on the square-root scale as for count data the arithmetic mean is upwardly biased. Taking averages on the square-roo [...]
 }
 
 
@@ -74,12 +74,12 @@ plotMeanVar(d) # Produce a straight-forward mean-variance plot
 # Produce a mean-variance plot with the raw variances shown and save the means
 # and variances for later use
 meanvar <- plotMeanVar(d, show.raw.vars=TRUE) 
-## If we want to show estimated tagwise variances on the plot, we must first estimate them!
+## If we want to show estimated genewise variances on the plot, we must first estimate them!
 d <- estimateCommonDisp(d) # Obtain an estimate of the dispersion parameter
-d <- estimateTagwiseDisp(d)  # Obtain tagwise dispersion estimates
+d <- estimateTagwiseDisp(d)  # Obtain genewise dispersion estimates
 # Use previously saved object to speed up plotting
 plotMeanVar(d, meanvar=meanvar, show.tagwise.vars=TRUE, NBline=TRUE) 
-## We could also estimate common/tagwise dispersions using the Cox-Reid methods with an
+## We could also estimate common/genewise dispersions using the Cox-Reid methods with an
 ## appropriate design matrix
 }
 
@@ -89,4 +89,4 @@ plotMeanVar(d, meanvar=meanvar, show.tagwise.vars=TRUE, NBline=TRUE)
 
 }
 
-\keyword{algebra}
\ No newline at end of file
+\keyword{algebra}
diff --git a/man/mglm.Rd b/man/mglm.Rd
index 883a9e4..ef4eeeb 100644
--- a/man/mglm.Rd
+++ b/man/mglm.Rd
@@ -22,13 +22,13 @@ designAsFactor(design)
 }
 
 \arguments{
-\item{y}{numeric matrix containing the negative binomial counts.  Rows for tags and columns for libraries.}
+\item{y}{numeric matrix containing the negative binomial counts.  Rows for genes and columns for libraries.}
 
 \item{design}{numeric matrix giving the design matrix of the GLM.
 Assumed to be full column rank.}
 
 \item{dispersion}{numeric scalar or vector giving the dispersion parameter for each GLM.
-Can be a scalar giving one value for all tags, or a vector of length equal to the number of tags giving tag-wise dispersions.}
+Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.}
 
 \item{offset}{numeric vector or matrix giving the offset that is to be included in the log-linear model predictor.  Can be a scalar, a vector of length equal to the number of libraries, or a matrix of the same size as \code{y}.}
 
@@ -53,9 +53,9 @@ They are used as work-horses by higher-level functions in the edgeR package, esp
 
 \code{mglmOneGroup} fit the null model, with intercept term only, to each response vector.
 In other words, it treats the libraries as belonging to one group.
-It implements Fisher scoring with a score-statistic stopping criterion for each tag.
+It implements Fisher scoring with a score-statistic stopping criterion for each gene.
 Excellent starting values are available for the null model, so this function seldom has any problems with convergence.
-It is used by other edgeR functions to compute the overall abundance for each tag.
+It is used by other edgeR functions to compute the overall abundance for each gene.
 
 \code{mglmLevenberg} fits an arbitrary log-linear model to each response vector.
 It implements a Levenberg-Marquardt modification of the glm scoring algorithm to prevent divergence.
@@ -68,14 +68,14 @@ If the dispersion values are zero, then the Poisson deviance function is returne
 }
 
 \value{
-\code{mglmOneGroup} produces a vector of length equal to the number of tags/genes (number of rows of \code{y}) providing the single coefficent from the GLM fit for each tag/gene. This can be interpreted as a measure of the 'average expression' level of the tag/gene.
+\code{mglmOneGroup} produces a vector of length equal to the number of genes (number of rows of \code{y}) providing the single coefficent from the GLM fit for each gene. This can be interpreted as a measure of the 'average expression' level of the gene.
 
 \code{mglmLevenberg} produces a list with the following components: 
 	\item{coefficients}{matrix of estimated coefficients for the linear models}
 	\item{fitted.values}{matrix of fitted values}
 	\item{deviance}{residual deviances}
 	\item{iter}{number of iterations used}
-	\item{fail}{logical vector indicating tags for which the maximum damping was exceeded before convergence was achieved}
+	\item{fail}{logical vector indicating genes for which the maximum damping was exceeded before convergence was achieved}
 
 \code{deviances.function} returns a function to calculate the deviance as appropriate for the given values of the dispersion.
 
diff --git a/man/nbinomDeviance.Rd b/man/nbinomDeviance.Rd
index 58b29c0..58367f1 100644
--- a/man/nbinomDeviance.Rd
+++ b/man/nbinomDeviance.Rd
@@ -14,12 +14,12 @@ nbinomDeviance(y, mean, dispersion=0, weights=NULL)
 }
 
 \arguments{
-\item{y}{numeric vector or matrix containing the negative binomial counts.  If a matrix, then rows for tags and columns for libraries.  \code{nbinomDeviance} treats a vector as a matrix with one row.}
+\item{y}{numeric vector or matrix containing the negative binomial counts.  If a matrix, then rows for genes and columns for libraries.  \code{nbinomDeviance} treats a vector as a matrix with one row.}
 
 \item{mean}{numeric vector matrix of expected values, of same dimension as \code{y}.}
 
 \item{dispersion}{numeric vector or matrix of negative binomial dispersions.
-Can be a scalar, or a vector of length equal to the number of tags, or a matrix of same dimensions as \code{y}.}
+Can be a scalar, or a vector of length equal to the number of genes, or a matrix of same dimensions as \code{y}.}
 
 \item{weights}{numeric vector or matrix of non-negative weights, as for \code{glmFit}.}
 }
diff --git a/man/plotBCV.Rd b/man/plotBCV.Rd
index e4baf3f..37ecabe 100644
--- a/man/plotBCV.Rd
+++ b/man/plotBCV.Rd
@@ -2,7 +2,7 @@
 \name{plotBCV}
 \alias{plotBCV}
 \description{
-Plot genewise biological coefficient of variation (BCV) against gene abundance (in log2 counts per million).
+Plot the genewise biological coefficient of variation (BCV) against gene abundance (in log2 counts per million).
 }
 \usage{
 plotBCV(y, xlab="Average log CPM", ylab="Biological coefficient of variation",
@@ -16,13 +16,13 @@ plotBCV(y, xlab="Average log CPM", ylab="Biological coefficient of variation",
   \item{cex}{plot symbol expansion factor. See \code{\link{points}} for more details.}
   \item{col.common}{color of line showing common dispersion}
   \item{col.trend}{color of line showing dispersion trend}
-  \item{col.tagwise}{color of points showing tagwise dispersions}
+  \item{col.tagwise}{color of points showing genewise dispersions. Note that `tag' and `gene' are synonymous here.}
   \item{\dots}{any other arguments are passed to \code{plot}.}
 }
 
 \details{
 The BCV is the square root of the negative binomial dispersion.
-This function displays the common, trended and tagwise BCV estimates.
+This function displays the common, trended and genewise BCV estimates.
 }
 
 \value{
diff --git a/man/plotMDS.DGEList.Rd b/man/plotMDS.DGEList.Rd
index de23f70..f6c6644 100644
--- a/man/plotMDS.DGEList.Rd
+++ b/man/plotMDS.DGEList.Rd
@@ -1,54 +1,51 @@
-\title{Multidimensional scaling plot of digital gene expression profiles}
+\title{Multidimensional scaling plot of distances between digital gene expression profiles}
 \name{plotMDS.DGEList}
 \alias{plotMDS.DGEList}
 \description{
-Calculate distances between RNA-seq or DGE libraries, then produce a multidimensional scaling plot.
-Distances on the plot represent coefficient of variation of expression between samples
-for the top genes that best distinguish the samples.
+Plot samples on a two-dimensional scatterplot so that distances on the plot approximate the expression differences between the samples.
 }
 \usage{
-\method{plotMDS}{DGEList}(x, top=500, labels=colnames(x), col=NULL, cex=1,
-	        dim.plot=c(1,2), ndim=max(dim.plot), xlab=NULL,
-	        ylab=NULL, method="logFC", prior.count=2,
-	        gene.selection="pairwise", \dots)
+\method{plotMDS}{DGEList}(x, top = 500, labels = NULL, pch = NULL, cex = 1,
+	        dim.plot = c(1,2), ndim = max(dim.plot), gene.selection = "pairwise",
+	        xlab = NULL, ylab = NULL, method = "logFC", prior.count = 2,
+	         \dots)
 }
 \arguments{
-  \item{x}{an \code{DGEList} object.}
+  \item{x}{a \code{DGEList} object.}
   \item{top}{number of top genes used to calculate pairwise distances.}
   \item{labels}{character vector of sample names or labels. If \code{x} has no column names, then defaults the index of the samples.}
-  \item{col}{numeric or character vector of colors for the plotting characters. See \code{\link[graphics]{text}} for possible values.}
+  \item{pch}{plotting symbol or symbols. See \code{\link{points}} for possible values. Ignored if \code{labels} is non-\code{NULL}.}
   \item{cex}{numeric vector of plot symbol expansions. See \code{\link[graphics]{text}} for possible values.}
   \item{dim.plot}{which two dimensions should be plotted, numeric vector of length two.}
   \item{ndim}{number of dimensions in which data is to be represented}
+  \item{gene.selection}{character, \code{"pairwise"} to choose the top genes separately for each pairwise comparison between the samples, or \code{"common"} to select the same genes for all comparisons. Only used when \code{method="logFC"}.}
   \item{xlab}{x-axis label}
   \item{ylab}{y-axis label}
-  \item{method}{how to compute distances.  Possible values are "logFC" or \code{"bcv"}.}
+  \item{method}{method used to compute distances.  Possible values are "logFC" or \code{"bcv"}.}
   \item{prior.count}{average prior count to be added to observation to shrink the estimated log-fold-changes towards zero. Only used when \code{method="logFC"}.}
-  \item{gene.selection}{character, \code{"pairwise"} to choose the top genes separately for each pairwise comparison between the samples or \code{"common"} to select the same genes for all comparisons. Only used when \code{method="logFC"}.}
   \item{\dots}{any other arguments are passed to \code{plot}.}
 }
 
 \details{
-This function is a variation on the usual multdimensional scaling (or principle coordinate) plot, in that a distance measure particularly appropriate for the digital gene expression (DGE) context is used.
+The default method (\code{method="logFC"}) is to convert the counts to log-counts-per-million using \code{cpm} and to pass these to the limma \code{plotMDS} function.
+This method calculates distances between samples based on log2 fold changes.
+See the \code{\link[limma:plotMDS]{plotMDS help page}} for details.
+
+The alternative method (\code{method="bcv"}) calculates distances based on biological coefficient of variation.
 A set of top genes are chosen that have largest biological variation between the libraries
-(those with largest tagwise dispersion treating all libraries as one group).
+(those with largest genewise dispersion treating all libraries as one group).
 Then the distance between each pair of libraries (columns) is the biological coefficient of variation (square root of the common dispersion) between those two libraries alone, using
 the top genes.
 
-If \code{x} is a \code{DGEList}, then the library sizes and normalization factors found in the object are used.
-If \code{x} is a matrix, then library sizes are computed from the column sums, but no other normalization is done.
-
-The number \code{top} of top genes chosen for this exercise should roughly correspond to the number of differentially expressed genes with materially large fold-changes.
+The number of genes (\code{top}) chosen for this exercise should roughly correspond to the number of differentially expressed genes with materially large fold-changes.
 The default setting of 500 genes is widely effective and suitable for routine use, but a smaller value might be chosen for when the samples are distinguished by a specific focused molecular pathway.
 Very large values (greater than 1000) are not usually so effective.
 
-This function can be slow when there are many libraries.
+Note that the \code{"bcv"} method is slower than the \code{"logFC"} method when there are many libraries.
 }
 
 \value{
-A plot is created on the current graphics device.
-
-An object of class \code{\link[limma:plotMDS]{MDS}} is invisibly returned.
+An object of class \code{\link[limma:plotMDS]{MDS}} is invisibly returned and a plot is created on the current graphics device.
 }
 
 \author{Yunshun Chen, Mark Robinson and Gordon Smyth}
@@ -58,14 +55,14 @@ An object of class \code{\link[limma:plotMDS]{MDS}} is invisibly returned.
 }
 
 \examples{
-# Simulate DGE data for 1000 genes(tags) and 6 samples.
+# Simulate DGE data for 1000 genes and 6 samples.
 # Samples are in two groups
 # First 200 genes are differentially expressed in second group
 
 ngenes <- 1000
 nlib <- 6
 counts <- matrix(rnbinom(ngenes*nlib, size=1/10, mu=20),ngenes,nlib)
-rownames(counts) <- paste("Gene",1:ngenes)
+rownames(counts) <- paste("gene",1:ngenes, sep=".")
 group <- gl(2,3,labels=c("Grp1","Grp2"))
 counts[1:200,group=="Grp2"] <- counts[1:200,group=="Grp2"] + 10
 y <- DGEList(counts,group=group)
diff --git a/man/plotQLDisp.Rd b/man/plotQLDisp.Rd
index 68d2b4f..0d78dff 100644
--- a/man/plotQLDisp.Rd
+++ b/man/plotQLDisp.Rd
@@ -5,8 +5,8 @@
 Plot the genewise quasi-likelihood dispersion against the gene abundance (in log2 counts per million).
 }
 \usage{
-plotQLDisp(glmfit, xlab="Average Log2 CPM", ylab="Quarter-Root Mean Deviance", pch=16, cex=0.2, 
-    col.shrunk="red", col.trend="blue", col.raw="black", \dots)
+plotQLDisp(glmfit, xlab="Average Log2 CPM", ylab="Quarter-Root Mean Deviance", pch=16, 
+       cex=0.2, col.shrunk="red", col.trend="blue", col.raw="black", \dots)
 }
 \arguments{
   \item{glmfit}{a \code{DGEGLM} object produced by \code{\link{glmQLFit}}.}
diff --git a/man/plotSmear.Rd b/man/plotSmear.Rd
index fc8e5f3..dd8faa5 100644
--- a/man/plotSmear.Rd
+++ b/man/plotSmear.Rd
@@ -4,7 +4,7 @@
 Plots log-Fold Change versus log-Concentration (or, M versus A) for Count Data
 }
 \description{
-Both of these functions plot the log-fold change (i.e. the log of the ratio of expression levels for each tag between two experimential groups) against the log-concentration (i.e. the overall average expression level for each tag across the two groups). To represent counts that were low (e.g. zero in 1 library and non-zero in the other) in one of the two conditions, a 'smear' of points at low A value is presented in \code{plotSmear}.
+Both of these functions plot the log-fold change (i.e. the log of the ratio of expression levels for each gene between two experimential groups) against the log-concentration (i.e. the overall average expression level for each gene across the two groups). To represent counts that were low (e.g. zero in 1 library and non-zero in the other) in one of the two conditions, a 'smear' of points at low A value is presented in \code{plotSmear}.
 }
 \usage{
 plotSmear(object, pair=NULL, de.tags=NULL, xlab="Average logCPM", ylab="logFC", pch=19,
@@ -13,7 +13,7 @@ plotSmear(object, pair=NULL, de.tags=NULL, xlab="Average logCPM", ylab="logFC",
 \arguments{
  \item{object}{\code{DGEList}, \code{DGEExact} or \code{DGELRT} object containing data to produce an MA-plot.}
   \item{pair}{pair of experimental conditions to plot (if \code{NULL}, the first two conditions are used). Ignored if \code{object} is a \code{DGELRT} object.}
-  \item{de.tags}{rownames for tags identified as being differentially expressed; use \code{exactTest} to identify DE genes}
+  \item{de.tags}{rownames for genes identified as being differentially expressed; use \code{exactTest} or \code{glmLRT} to identify DE genes. Note that `tag' and `gene' are synonymous here.}
   \item{xlab}{x-label of plot}
   \item{ylab}{y-label of plot}
   \item{pch}{scalar or vector giving the character(s) to be used in the plot; default value of \code{19} gives a round point.}
@@ -28,7 +28,7 @@ plotSmear(object, pair=NULL, de.tags=NULL, xlab="Average logCPM", ylab="logFC",
 \value{A plot to the current device}
 
 \details{
-\code{plotSmear} is a more sophisticated and superior way to produce an 'MA plot'. \code{plotSmear} resolves the problem of plotting tags that have a total count of zero for one of the groups by adding the 'smear' of points at low A value. The points to be smeared are identified as being equal to the minimum estimated concentration in one of the two groups.  The smear is created by using random uniform numbers of width \code{smearWidth} to the left of the minimum A. \code{plotSmear} also [...]
+\code{plotSmear} is a more sophisticated and superior way to produce an 'MA plot'. \code{plotSmear} resolves the problem of plotting genes that have a total count of zero for one of the groups by adding the 'smear' of points at low A value. The points to be smeared are identified as being equal to the minimum estimated concentration in one of the two groups.  The smear is created by using random uniform numbers of width \code{smearWidth} to the left of the minimum A. \code{plotSmear} als [...]
 }
 
 \author{Mark Robinson, Davis McCarthy}
@@ -40,14 +40,14 @@ plotSmear(object, pair=NULL, de.tags=NULL, xlab="Average logCPM", ylab="logFC",
 \examples{
 y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
 d <- DGEList(counts=y, group=rep(1:2,each=2), lib.size=colSums(y))
-rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".")
+rownames(d$counts) <- paste("gene",1:nrow(d$counts),sep=".")
 d <- estimateCommonDisp(d)
 plotSmear(d)
 
 # find differential expression
 de <- exactTest(d)
 
-# highlighting the top 500 most DE tags
-de.tags <- rownames(topTags(de, n=500)$table)
-plotSmear(d, de.tags=de.tags)
+# highlighting the top 500 most DE genes
+de.genes <- rownames(topTags(de, n=500)$table)
+plotSmear(d, de.tags=de.genes)
 }
diff --git a/man/plotSpliceDGE.Rd b/man/plotSpliceDGE.Rd
index 4ef358f..433ce03 100644
--- a/man/plotSpliceDGE.Rd
+++ b/man/plotSpliceDGE.Rd
@@ -1,8 +1,8 @@
-\title{Plot exons on differentially spliced gene}
+\title{Plot exons of a differentially spliced gene}
 \name{plotSpliceDGE}
 \alias{plotSpliceDGE}
 \description{
-Plot exons of differentially spliced gene.
+Plot the exon-level log-fold changes for a differentially spliced gene.
 }
 \usage{
 plotSpliceDGE(lrt, geneid=NULL, rank=1L, FDR = 0.05)
@@ -23,4 +23,4 @@ Plots interaction log-fold-change by exon for the specified gene.
 \seealso{
 \code{\link{diffSpliceDGE}}
 }
-\examples{# See \code{\link{diffSpliceDGE}}}
\ No newline at end of file
+\examples{# See \code{\link{diffSpliceDGE}}}
diff --git a/man/predFC.Rd b/man/predFC.Rd
index 482ee59..c447854 100644
--- a/man/predFC.Rd
+++ b/man/predFC.Rd
@@ -40,7 +40,7 @@ This has the effect that any log-fold-change that was zero prior to augmentation
 The prior counts can be viewed as equivalent to a prior belief that the log-fold changes are small, and the output can be viewed as posterior log-fold-changes from this Bayesian viewpoint.
 The output coefficients are called \emph{predictive} log fold-changes because, depending on the prior, they may be a better prediction of the true log fold-changes than the raw estimates.
 
-Log-fold changes for transcripts with low counts are shrunk more than transcript with high counts.
+Log-fold changes for genes with low counts are shrunk more than those for genes with high counts.
 In particular, infinite log-fold-changes arising from zero counts are avoided.
 The exact degree to which this is done depends on the negative binomail dispersion.
 
diff --git a/man/processAmplicons.Rd b/man/processAmplicons.Rd
index a7efce8..3235273 100644
--- a/man/processAmplicons.Rd
+++ b/man/processAmplicons.Rd
@@ -10,6 +10,7 @@ Given a list of sample-specific index (barcode) sequences and hairpin/sgRNA-spec
 \usage{
 processAmplicons(readfile, readfile2=NULL, barcodefile, hairpinfile,
                     barcodeStart=1, barcodeEnd=5, 
+                    barcode2Start=NULL, barcode2End=NULL,
                     barcodeStartRev=NULL, barcodeEndRev=NULL, 
                     hairpinStart=37, hairpinEnd=57,
                     allowShifting=FALSE, shiftingBase=3,
@@ -25,6 +26,8 @@ processAmplicons(readfile, readfile2=NULL, barcodefile, hairpinfile,
 \item{hairpinfile}{filename containing hairpin/sgRNA-specific ids and sequences}
 \item{barcodeStart}{numeric value, starting position (inclusive) of barcode sequence in reads}
 \item{barcodeEnd}{numeric value, ending position (inclusive) of barcode sequence in reads}
+\item{barcode2Start}{numeric value, starting position (inclusive) of second barcode sequence in forward reads}
+\item{barcode2End}{numeric value, ending position (inclusive) of second barcode sequence in forward reads}
 \item{barcodeStartRev}{numeric value, starting position (inclusive) of barcode sequence in reverse reads, default to NULL}
 \item{barcodeEndRev}{numeric value, ending position (inclusive) of barcode sequence in reverse reads, default to NULL}
 \item{hairpinStart}{numeric value, starting position (inclusive) of hairpin/sgRNA sequence in reads}
@@ -45,9 +48,9 @@ processAmplicons(readfile, readfile2=NULL, barcodefile, hairpinfile,
 }
 
 \details{
-The input barcode file and hairpin/sgRNA files are tab-separated text files with at least two columns (named 'ID' and 'Sequences') containing the sample or hairpin/sgRNA ids and a second column indicating the sample index or hairpin/sgRNA sequences to be matched. If \code{readfile2}, \code{barcodeStartRev} and \code{barcodeEndRev} are specified, a third column 'SequencesReverse' is expected in the barcode file. The barcode file may also contain a 'group' column that indicates which exper [...]
+The input barcode file and hairpin/sgRNA files are tab-separated text files with at least two columns (named 'ID' and 'Sequences') containing the sample or hairpin/sgRNA ids and a second column indicating the sample index or hairpin/sgRNA sequences to be matched. If \code{barcode2Start} and \code{barcode2End} are specified, a third column 'Sequences2' is expected in the barcode file. If \code{readfile2}, \code{barcodeStartRev} and \code{barcodeEndRev} are specified, another column 'Seque [...]
 
-To compute the count matrix, matching to the given barcodes and hairpins/sgRNAs is conducted in two rounds. The first round looks for an exact sequence match for the given barcode sequences and hairpin/sgRNA sequences at the locations specified. If \code{allowShifting} is set to \code{TRUE}, the program also checks if a given hairpin/sgRNA sequence can be found at a neighbouring position in the read. For hairpins/sgRNAs without a match, the program performs a second round of matching whi [...]
+To compute the count matrix, matching to the given barcodes and hairpins/sgRNAs is conducted in two rounds. The first round looks for an exact sequence match for the given barcode sequences and hairpin/sgRNA sequences at the locations specified. If \code{allowShifting} is set to \code{TRUE}, the program also checks if a given hairpin/sgRNA sequence can be found at a neighbouring position in the read. If a match isn't found, the program performs a second round of matching which allows for [...]
 
 The program outputs a \code{\link[edgeR:DGEList-class]{DGEList}} object, with a count matrix indicating the number of times each barcode and hairpin/sgRNA combination could be matched in reads from input fastq file/(s).
 
@@ -57,5 +60,5 @@ For further examples and data, refer to the Case studies available from http://b
 \author{Zhiyin Dai and Matthew Ritchie}
 
 \references{
-Dai Z, Sheridan JM, et al. (2014). shRNA-seq data analysis with edgeR. F1000Research, http://f1000research.com/articles/10.12688/f1000research.4204/doi.
+Dai Z, Sheridan JM, et al. (2014). edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens. F1000Research, 3:95. http://f1000research.com/articles/3-95/v2. PMID: 24860646.
 }
diff --git a/man/readDGE.Rd b/man/readDGE.Rd
index 74ce789..4ea86f5 100755
--- a/man/readDGE.Rd
+++ b/man/readDGE.Rd
@@ -11,16 +11,16 @@
 \item{files}{character vector of filenames, or alternatively a data.frame with a column containing the file names of the files containing the libraries of counts and, optionally, columns containing the \code{group} to which each library belongs, descriptions of the other samples and other information.}
 \item{path}{character string giving the directory containing the files.
 The default is the current working directory.}
-\item{columns}{numeric vector stating which two columns contain the tag names and counts, respectively}
+\item{columns}{numeric vector stating which two columns contain the gene names and counts, respectively}
 \item{group}{vector, or preferably a factor, indicating the experimental group to which each library belongs. If \code{group} is not \code{NULL}, then this argument overrides any group information included in the \code{files} argument.}
 \item{labels}{character vector giving short names to associate with the libraries. Defaults to the file names.}
 \item{\dots}{other are passed to \code{read.delim}}
 }
 
 \details{
-Each file is assumed to contained digital gene expression data for one sample (or library), with transcript identifiers in the first column and counts in the second column. Transcript identifiers are assumed to be unique and not repeated in any one file. 
+Each file is assumed to contained digital gene expression data for one sample (or library), with gene identifiers in the first column and counts in the second column. Gene identifiers are assumed to be unique and not repeated in any one file. 
 By default, the files are assumed to be tab-delimited and to contain column headings.
-The function forms the union of all transcripts and creates one big table with zeros where necessary.
+The function forms the union of all genes and creates one big table with zeros where necessary.
 }
 
 \value{DGEList object}
diff --git a/man/roast.DGEList.Rd b/man/roast.DGEList.Rd
index ed9e026..45593ef 100644
--- a/man/roast.DGEList.Rd
+++ b/man/roast.DGEList.Rd
@@ -16,7 +16,7 @@ Rotation gene set testing for Negative Binomial generalized linear models.
   \item{index}{index vector specifying which rows (genes) of \code{y} are in the test set.  This can be a vector of indices, or a logical vector of the same length as \code{statistics}, or any vector such as \code{y[iset,]} contains the values for the gene set to be tested. Defaults to all genes. For \code{mroast} a list of index vectors.}
   \item{design}{design matrix}
   \item{contrast}{contrast for which the test is required. Can be an integer specifying a column of \code{design}, or else a contrast vector of length equal to the number of columns of \code{design}.}
-  \item{\dots}{other arguments are passed to \code{link{roast.default}} or \code{link{mroast.default}}.}
+  \item{\dots}{other arguments are passed to \code{\link{roast.default}} or \code{\link{mroast.default}}.}
 }
 
 \value{
diff --git a/man/romer.DGEList.Rd b/man/romer.DGEList.Rd
new file mode 100644
index 0000000..994ba38
--- /dev/null
+++ b/man/romer.DGEList.Rd
@@ -0,0 +1,80 @@
+\name{romer.DGEList}
+\alias{romer.DGEList}
+\title{Rotation Gene Set Tests for Digital Gene Expression Data}
+\description{
+Rotation gene set testing for Negative Binomial generalized linear models.
+}
+
+\usage{
+\method{romer}{DGEList}(y, index, design=NULL, contrast=ncol(design), \dots)
+}
+
+\arguments{
+  \item{y}{\code{DGEList} object.}
+  \item{index}{list of indices specifying the rows of \code{y} in the gene sets. The list can be made using \link{ids2indices}.}
+  \item{design}{design matrix}
+  \item{contrast}{contrast for which the test is required. Can be an integer specifying a column of \code{design}, or else a contrast vector of length equal to the number of columns of \code{design}.}
+  \item{\dots}{other arguments passed to \code{\link{romer.default}}.}
+}
+
+\value{
+Numeric matrix giving p-values and the number of matched genes in each gene set.
+Rows correspond to gene sets.
+There are four columns giving the number of genes in the set and p-values for the alternative hypotheses up, down or mixed.
+See \code{\link{romer}} for details.
+}
+
+\details{
+The ROMER procedure described by Majewski et al (2010) is implemented in \code{romer} in the \code{limma} package.
+This function makes the romer test available for digital gene expression data such as RNA-Seq data.
+The negative binomial count data is converted to approximate normal deviates by computing mid-p quantile residuals (Dunn and Smyth, 1996; Routledge, 1994) under the null hypothesis that the contrast is zero.
+See \code{\link{romer}} for more description of the test and for a complete list of possible arguments.
+
+The design matrix defaults to the \code{model.matrix(~y$samples$group)}.
+}
+
+\seealso{
+\code{\link{romer}}
+}
+
+\author{Yunshun Chen and Gordon Smyth}
+
+\references{
+Majewski, IJ, Ritchie, ME, Phipson, B, Corbin, J, Pakusch, M, Ebert, A, Busslinger, M, Koseki, H, Hu, Y, Smyth, GK, Alexander, WS, Hilton, DJ, and Blewitt, ME (2010).
+Opposing roles of polycomb repressive complexes in hematopoietic stem and progenitor cells.
+\emph{Blood}, published online 5 May 2010.
+\url{http://www.ncbi.nlm.nih.gov/pubmed/20445021}
+
+Dunn, K. P., and Smyth, G. K. (1996).
+Randomized quantile residuals.
+\emph{J. Comput. Graph. Statist.}, 5, 236-244. 
+\url{http://www.statsci.org/smyth/pubs/residual.html}
+
+Routledge, RD (1994).
+Practicing safe statistics with the mid-p.
+\emph{Canadian Journal of Statistics} 22, 103-110.
+}
+
+\examples{
+mu <- matrix(10, 100, 4)
+group <- factor(c(0,0,1,1))
+design <- model.matrix(~group)
+
+# First set of 10 genes that are genuinely differentially expressed
+iset1 <- 1:10
+mu[iset1,3:4] <- mu[iset1,3:4]+20
+
+# Second set of 10 genes are not DE
+iset2 <- 11:20
+
+# Generate counts and create a DGEList object
+y <- matrix(rnbinom(100*4, mu=mu, size=10),100,4)
+y <- DGEList(counts=y, group=group)
+
+# Estimate dispersions
+y <- estimateDisp(y, design)
+
+romer(y, iset1, design, contrast=2)
+romer(y, iset2, design, contrast=2)
+romer(y, list(set1=iset1, set2=iset2), design, contrast=2)
+}
diff --git a/man/splitIntoGroups.Rd b/man/splitIntoGroups.Rd
index 52a2903..c4d0ab0 100644
--- a/man/splitIntoGroups.Rd
+++ b/man/splitIntoGroups.Rd
@@ -29,7 +29,7 @@ splitIntoGroupsPseudo(pseudo, group, pair)
 # generate raw counts from NB, create list object
 y<-matrix(rnbinom(80,size=1,mu=10),nrow=20)
 d<-DGEList(counts=y,group=rep(1:2,each=2),lib.size=rep(c(1000:1001),2))
-rownames(d$counts)<-paste("tagno",1:nrow(d$counts),sep=".")
+rownames(d$counts)<-paste("gene",1:nrow(d$counts),sep=".")
 z1<-splitIntoGroups(d)
 
 z2<-splitIntoGroupsPseudo(d$counts,d$group,pair=c(1,2))
diff --git a/man/subsetting.Rd b/man/subsetting.Rd
index afc2411..e5ca468 100644
--- a/man/subsetting.Rd
+++ b/man/subsetting.Rd
@@ -18,7 +18,7 @@ Extract a subset of a \code{DGEList}, \code{DGEGLM}, \code{DGEExact} or \code{DG
 }
 \arguments{
   \item{object}{object of class \code{DGEList}, \code{DGEGLM}, \code{DGEExact} or \code{DGELRT}. For \code{subsetListOfArrays}, any list of conformal matrices and vectors.}
-  \item{i,j}{elements to extract. \code{i} subsets the tags or genes while \code{j} subsets the libraries.
+  \item{i,j}{elements to extract. \code{i} subsets the genes while \code{j} subsets the libraries.
   Note that columns of \code{DGEGLM}, \code{DGEExact} and \code{DGELRT} objects cannot be subsetted.}
   \item{keep.lib.sizes}{logical, if \code{TRUE} the lib.sizes will be kept unchanged on output, otherwise they will be recomputed as the column sums of the counts of the remaining rows.}
 }
diff --git a/man/topTags.Rd b/man/topTags.Rd
index 5593f59..f1bbd0e 100755
--- a/man/topTags.Rd
+++ b/man/topTags.Rd
@@ -8,7 +8,7 @@
 \description{Extracts the top DE tags in a data frame for a given pair of groups, ranked by p-value or absolute log-fold change.}
 
 \usage{
-topTags(object, n=10, adjust.method="BH", sort.by="PValue")
+topTags(object, n=10, adjust.method="BH", sort.by="PValue", p.value=1)
 }
 
 \arguments{ 
@@ -19,6 +19,8 @@ topTags(object, n=10, adjust.method="BH", sort.by="PValue")
 \item{adjust.method}{character string stating the method used to adjust p-values for multiple testing, passed on to \code{p.adjust}}
 
 \item{sort.by}{character string, should the top tags be sorted by p-value (\code{"PValue"}), by absolute log-fold change (\code{"logFC"}), or not sorted (\code{"none"}).}
+
+\item{p.value}{cutoff value for adjusted p-values. Only tags with lower p-values are listed.}
 }
 
 \value{
@@ -34,6 +36,7 @@ The dimensions, row names and column names of a \code{TopTags} object are define
 
 \code{TopTags} objects also have a \code{show} method so that printing produces a compact summary of their contents.
 
+Note that the terms `tag' and `gene' are synonymous here. The function is only named as `Tags' for historical reasons.
 }
 
 \author{Mark Robinson, Davis McCarthy, Gordon Smyth}
@@ -41,7 +44,7 @@ The dimensions, row names and column names of a \code{TopTags} object are define
 # generate raw counts from NB, create list object
 y <- matrix(rnbinom(80,size=1,mu=10),nrow=20)
 d <- DGEList(counts=y,group=rep(1:2,each=2),lib.size=rep(c(1000:1001),2))
-rownames(d$counts) <- paste("tag",1:nrow(d$counts),sep=".")
+rownames(d$counts) <- paste("gene",1:nrow(d$counts),sep=".")
 
 # estimate common dispersion and find differences in expression
 # here we demonstrate the 'exact' methods, but the use of topTags is
@@ -51,7 +54,7 @@ de <- exactTest(d)
 
 # look at top 10
 topTags(de)
-# Can specify how many tags to view
+# Can specify how many genes to view
 tp <- topTags(de, n=15)
 # Here we view top 15
 tp
diff --git a/man/weightedCondLogLikDerDelta.Rd b/man/weightedCondLogLikDerDelta.Rd
index 5104238..8afb964 100644
--- a/man/weightedCondLogLikDerDelta.Rd
+++ b/man/weightedCondLogLikDerDelta.Rd
@@ -3,7 +3,7 @@
 
 \title{Weighted Conditional Log-Likelihood in Terms of Delta}
 
-\description{Weighted conditional log-likelihood parameterized in terms of delta (\code{phi / (phi+1)}) for a given tag/gene - maximized to find the smoothed (moderated) estimate of the dispersion parameter}
+\description{Weighted conditional log-likelihood parameterized in terms of delta (\code{phi / (phi+1)}) for a given gene, maximized to find the smoothed (moderated) estimate of the dispersion parameter}
 
 \usage{
 weightedCondLogLikDerDelta(y, delta, tag, prior.n=10, ntags=nrow(y[[1]]), der=0)
@@ -14,19 +14,20 @@ weightedCondLogLikDerDelta(y, delta, tag, prior.n=10, ntags=nrow(y[[1]]), der=0)
 
 \item{delta}{delta (\code{phi / (phi+1)})parameter of negative binomial}
 
-\item{tag}{tag/gene at which the weighted conditional log-likelihood is evaluated}
+\item{tag}{gene at which the weighted conditional log-likelihood is evaluated}
 
-\item{prior.n}{smoothing paramter that indicates the weight to put on the common likelihood compared to the individual tag's likelihood; default \code{10} means that the common likelihood is given 10 times the weight of the individual tag/gene's likelihood in the estimation of the tag/genewise dispersion}
+\item{prior.n}{smoothing paramter that indicates the weight to put on the common likelihood compared to the individual gene's likelihood; default \code{10} means that the common likelihood is given 10 times the weight of the individual gene's likelihood in the estimation of the genewise dispersion}
 
-\item{ntags}{numeric scalar number of tags/genes in the dataset to be analysed}
+\item{ntags}{numeric scalar number of genes in the dataset to be analysed}
 
 \item{der}{derivative, either 0 (the function), 1 (first derivative) or 2 (second derivative)}
 }
 
-\value{ numeric scalar of function/derivative evaluated for the  given tag/gene and delta}
+\value{ numeric scalar of function/derivative evaluated for the  given gene and delta}
 
 \details{
-This function computes the weighted conditional log-likelihood for a given tag, parameterized in terms of delta. The value of delta that maximizes the weighted conditional log-likelihood is converted back to the \code{phi} scale, and this value is the estimate of the smoothed (moderated) dispersion parameter for that particular tag. The delta scale for convenience (delta is bounded between 0 and 1). 
+This function computes the weighted conditional log-likelihood for a given gene, parameterized in terms of delta. The value of delta that maximizes the weighted conditional log-likelihood is converted back to the \code{phi} scale, and this value is the estimate of the smoothed (moderated) dispersion parameter for that particular gene. The delta scale for convenience (delta is bounded between 0 and 1). 
+Users should note that `tag' and `gene' are synonymous when interpreting the names of the arguments for this function.
 }
 
 
diff --git a/src/R_one_group.cpp b/src/R_one_group.cpp
index 20388f6..4459ff3 100644
--- a/src/R_one_group.cpp
+++ b/src/R_one_group.cpp
@@ -1,10 +1,11 @@
 #include "glm.h"
 #include "matvec_check.h"
 
-SEXP R_one_group (SEXP y, SEXP disp, SEXP offsets, SEXP weights, SEXP max_iterations, SEXP tolerance, SEXP beta) try {
+SEXP R_one_group (SEXP nlib, SEXP ntag, SEXP y, SEXP disp, SEXP offsets, SEXP weights, SEXP max_iterations, SEXP tolerance, SEXP beta) try {
+    const int num_tags=asInteger(ntag);
+    const int num_libs=asInteger(nlib);
 	if (!isNumeric(disp)) { throw std::runtime_error("dispersion vector must be double precision"); }
-	const int num_tags=LENGTH(disp);
-	const int num_libs=LENGTH(y)/num_tags;
+	if (num_tags!=LENGTH(disp)) { throw std::runtime_error("length of dispersion vector is not equal to number of tags"); }
 	if (num_tags*num_libs != LENGTH(y) ) { throw std::runtime_error("dimensions of the count table are not as specified"); }  // Checking that it is an exact division.
   
 	if (!isNumeric(beta)) { throw std::runtime_error("beta start vector must be double precision"); }
@@ -33,7 +34,7 @@ SEXP R_one_group (SEXP y, SEXP disp, SEXP offsets, SEXP weights, SEXP max_iterat
 	const double* const* optr2=allo.access();
 	matvec_check allw(num_libs, num_tags, weights, false, "weight", true);
 	const double* const* wptr2=allw.access();
-	double* dptr=REAL(disp);
+	const double* dptr=REAL(disp);
 
     // Setting up beta for output. 
 	SEXP output=PROTECT(allocVector(VECSXP, 2));
@@ -44,15 +45,14 @@ SEXP R_one_group (SEXP y, SEXP disp, SEXP offsets, SEXP weights, SEXP max_iterat
 	try {
         
     	// Iterating through tags and fitting.
-    	int counter=0;
+    	int lib=0;
     	for (int tag=0; tag<num_tags; ++tag) {
-			counter=0;
 			if (is_integer) { 
-				for (int i=0; i<num_libs; ++i, counter+=num_tags) { yptr[i]=double(yiptr[counter]); }	
-				++yiptr;
+				for (lib=0; lib<num_libs; ++lib) { yptr[lib]=double(yiptr[lib]); }	
+				yiptr+=num_libs;
 			} else {
-				for (int i=0; i<num_libs; ++i, counter+=num_tags) { yptr[i]=ydptr[counter]; }
-				++ydptr;
+				yptr=ydptr;
+				ydptr+=num_libs;
 			}
 			std::pair<double, bool> out=glm_one_group(num_libs, maxit, tol, *optr2,
 #ifdef WEIGHTED					
diff --git a/src/R_process_hairpin_reads.c b/src/R_process_hairpin_reads.c
index df0c36b..f4a39c1 100644
--- a/src/R_process_hairpin_reads.c
+++ b/src/R_process_hairpin_reads.c
@@ -5,14 +5,13 @@
 #include <ctype.h>
 #include <time.h>
 
-#define MAX_BARCODE 1000
-#define MAX_HAIRPIN 100000
 #define BLOCKSIZE 10000000
 
 typedef struct {
-   char   *sequence;
-   char   *sequenceRev;
-   int    original_pos;
+  char   *sequence;
+  char   *sequence2;
+  char   *sequenceRev;
+  int    original_pos;
 } a_barcode;
 
 typedef struct {
@@ -20,20 +19,23 @@ typedef struct {
    int    original_pos;
 } a_hairpin;
 
-a_barcode *barcodes[MAX_BARCODE];
-a_hairpin *hairpins[MAX_HAIRPIN];
+a_barcode **barcodes;
+a_hairpin **hairpins;
 
-int isPairedReads;
+int is_PairedReads; 
+int is_DualIndexingReads;
 int num_barcode;
 int num_hairpin;
 long num_read;
-long hairpinreadcount[MAX_HAIRPIN];
-long summary[MAX_HAIRPIN][MAX_BARCODE];
+long **summary;
 int barcode_start;
 int barcode_end;
+int barcode2_start;
+int barcode2_end;
 int barcode_start_rev;
 int barcode_end_rev;
 int barcode_length;
+int barcode2_length;
 int barcode_length_rev;
 int hairpin_start;
 int hairpin_end;
@@ -41,7 +43,6 @@ int hairpin_length;
 int allow_shifting;
 int shifting_n_base; 
 int allow_mismatch;
-int num_mismatch_hairpin;
 int barcode_n_mismatch;
 int hairpin_n_mismatch;
 int allow_shifted_mismatch;
@@ -51,7 +52,20 @@ long barcodecount;
 long hairpincount;
 long bchpcount;
 
-a_hairpin *mismatch_hairpins[MAX_HAIRPIN];
+int Get_Lines_In_File(FILE* fin) {
+  int N=0, ch, last_ch='\n';
+  while (1) { 
+    ch=fgetc(fin);
+    if (ch=='\n') { ++N; }
+    else if (ch==EOF) { 
+      if (last_ch!='\n') { ++N; } // Capture non-newline-terminated last line.
+      break;
+    }
+    last_ch=ch;
+  }
+  rewind(fin);
+  return N;
+}
 
 void
 Read_In_Barcodes(char* filename){
@@ -61,9 +75,13 @@ Read_In_Barcodes(char* filename){
   char *readline;
 
   fin = fopen(filename,"r");
+  
+  // Getting number of lines in the file.
+  num_barcode = Get_Lines_In_File(fin);
+  barcodes=(a_barcode**)R_alloc(num_barcode+1, sizeof(a_barcode*));
+
   line = (char *)malloc(len+1);
   a_barcode *new_barcode;
-
   int count = 0;
   char * token;
 
@@ -73,24 +91,44 @@ Read_In_Barcodes(char* filename){
     new_barcode->sequence = (char *)malloc(barcode_length * sizeof(char));
     strncpy(new_barcode->sequence, line, barcode_length);
     new_barcode->original_pos = count;
-    if (isPairedReads > 0) {
+    if (is_PairedReads > 0) {
       token = strtok(line, "\t");
       token = strtok(NULL, "\t");
       new_barcode->sequenceRev = (char *)malloc(barcode_length_rev * sizeof(char));
       strncpy(new_barcode->sequenceRev, token, barcode_length_rev);
+    } else if (is_DualIndexingReads > 0) {
+      token = strtok(line, "\t");
+      token = strtok(NULL, "\t");
+      new_barcode->sequence2 = (char *)malloc(barcode_length_rev * sizeof(char));
+      strncpy(new_barcode->sequence2, token, barcode2_length);
     } else {
       new_barcode->sequenceRev = NULL;
     };
     barcodes[count] = new_barcode;
   }
   fclose(fin);
-  num_barcode = count;
   free(line);
 
   Rprintf(" -- Number of Barcodes : %d\n", num_barcode);
 }
 
 int
+Valid_Match(char *sequence1, char *sequence2, int length, int threshold){
+  int i_base;
+  int mismatchbasecount = 0;
+  for (i_base = 0; i_base < length; i_base++) {
+    if (sequence1[i_base] != sequence2[i_base]) {
+      mismatchbasecount++;		  
+    }
+  }
+  if (mismatchbasecount <= threshold) {
+    return 1;
+  } else {
+    return -1;
+  }
+}
+
+int
 locate_barcode(char *a_barcode){
   int imin, imax, imid;
   imin = 1;
@@ -107,9 +145,20 @@ locate_barcode(char *a_barcode){
       return barcodes[imid]->original_pos;
     }
   }
-  return -1;  
+ 
+  if (allow_mismatch > 0) {
+    int i;
+    for (i = 1; i <= num_barcode; i++){
+      if (Valid_Match(a_barcode, barcodes[i]->sequence, barcode_length, barcode_n_mismatch) > 0) {
+        return barcodes[i]->original_pos; 
+      }   
+    }
+  }
+  
+  return -1;
 }
 
+
 int
 locate_barcode_paired(char *a_barcode, char *a_barcode_rev){
   int imin, imax, imid;
@@ -134,102 +183,111 @@ locate_barcode_paired(char *a_barcode, char *a_barcode_rev){
       } 
     }    
   }
-  return -1; 
+  
+  if (allow_mismatch > 0){
+    int i;
+    for (i = 1; i <= num_barcode; i++){
+      if ((Valid_Match(a_barcode, barcodes[i]->sequence, barcode_length, barcode_n_mismatch) > 0) && 
+          (Valid_Match(a_barcode_rev, barcodes[i]->sequenceRev, barcode_length_rev, barcode_n_mismatch) > 0)) {
+         return barcodes[i]->original_pos;
+      }
+    }
+  }
+
+  return -1;
 }
 
 
 int
-locate_hairpin_impl(char *a_hairpin){
+locate_barcode_dualIndexing(char *a_barcode, char *a_barcode2){
   int imin, imax, imid;
   imin = 1;
-  imax = num_hairpin;
+  imax = num_barcode;
 
   while (imax >= imin) {
     imid = (imax + imin) / 2;
-    
-    if (strncmp(hairpins[imid]->sequence, a_hairpin, hairpin_length) < 0) {
+    // compare forward barcode sequence 1 
+    if (strncmp(barcodes[imid]->sequence, a_barcode, barcode_length) < 0) {
       imin = imid + 1;
-    } else if (strncmp(hairpins[imid]->sequence, a_hairpin, hairpin_length) > 0) {
+    } else if (strncmp(barcodes[imid]->sequence, a_barcode, barcode_length) > 0) {
       imax = imid - 1;
     } else {
-      return hairpins[imid]->original_pos;
-    }
+      // same forward sequence 1, compare forward barcode sequence 2
+      if (strncmp(barcodes[imid]->sequence2, a_barcode2, barcode2_length) < 0) {
+        imin = imid + 1;
+      } else if (strncmp(barcodes[imid]->sequence2, a_barcode2, barcode2_length) > 0) {
+        imax = imid - 1;
+      } else {
+        return barcodes[imid]->original_pos;     
+      } 
+    }    
   }
-  return -1;  
-}
-
-int
-Valid_Match(char *sequence1, char *sequence2, int length, int threshold){
-  int i_base;
-  int mismatchbasecount = 0;
-  for (i_base = 0; i_base < length; i_base++) {
-    if (sequence1[i_base] != sequence2[i_base]) {
-      mismatchbasecount++;		  
+  
+  if (allow_mismatch > 0) {
+    int i;
+    for (i = 1; i <= num_barcode; i++){
+      if ((Valid_Match(a_barcode, barcodes[i]->sequence, barcode_length, barcode_n_mismatch) > 0) && 
+	    (Valid_Match(a_barcode2, barcodes[i]->sequence2, barcode2_length, barcode_n_mismatch) > 0)) {
+        return barcodes[i]->original_pos;
+      }
     }
   }
-  if (mismatchbasecount <= threshold) {
-    return 1;
-  } else {
-    return -1;
-  }
+  
+  return -1;
 }
 
 int
-locate_mismatch_barcode(char *a_barcode){
-  int i;
-  int match_index = -1;
-  for (i = 1; i <= num_barcode; i++){
-    if (Valid_Match(a_barcode, barcodes[i]->sequence, barcode_length, barcode_n_mismatch) > 0) {
-      match_index = barcodes[i]->original_pos;
-      break;
-    }
-  }
-  return match_index;
-}
+locate_exactmatch_hairpin(char *a_hairpin){
+  int imin, imax, imid;
+  imin = 1;
+  imax = num_hairpin;
 
-int
-locate_mismatch_barcode_paired(char *a_barcode, char *a_barcode_rev){
-  int i;
-  int match_index = -1;
-  for (i = 1; i <= num_barcode; i++){
-    if ((Valid_Match(a_barcode, barcodes[i]->sequence, barcode_length, barcode_n_mismatch) > 0) && 
-	(Valid_Match(a_barcode_rev, barcodes[i]->sequenceRev, barcode_length_rev, barcode_n_mismatch) > 0)) {
-      match_index = barcodes[i]->original_pos;
-      break;
+  while (imax >= imin) {
+    imid = (imax + imin) / 2;
+    
+    if (strncmp(hairpins[imid]->sequence, a_hairpin, hairpin_length) < 0) {
+      imin = imid + 1;
+    } else if (strncmp(hairpins[imid]->sequence, a_hairpin, hairpin_length) > 0) {
+      imax = imid - 1;
+    } else {
+      return hairpins[imid]->original_pos;
     }
   }
-  return match_index;
+  return -1;  
 }
 
 
 int
 locate_mismatch_hairpin(char *a_hairpin){
   int i;
-  int match_index = -1;
-  for (i = 1; i <= num_mismatch_hairpin; i++){
-    if (Valid_Match(a_hairpin, mismatch_hairpins[i]->sequence, hairpin_length, hairpin_n_mismatch) > 0) {
-      match_index = mismatch_hairpins[i]->original_pos;
-      break;
+  for (i = 1; i <= num_hairpin; i++){
+    if (Valid_Match(a_hairpin, hairpins[i]->sequence, hairpin_length, hairpin_n_mismatch) > 0) {
+      return hairpins[i]->original_pos;
     }
   }
-  return match_index;
+  return -1;
 }
 
 
 int 
-locate_hairpin(char *a_hairpin, char *read, int doMismatch){
+locate_hairpin(char *a_hairpin, char *read){
   int hairpin_index;
-  if (doMismatch > 0){
-    hairpin_index = locate_mismatch_hairpin(a_hairpin);
-  } else {
-    hairpin_index = locate_hairpin_impl(a_hairpin);
+  
+  // check if a perfect match exists
+  hairpin_index = locate_exactmatch_hairpin(a_hairpin); 
+  if (hairpin_index > 0) {
+    return hairpin_index;
   }
-
-  if ((hairpin_index <= 0) && (allow_shifting > 0)) {
-    if ((doMismatch > 0) && (allow_shifted_mismatch <= 0)) {
+  
+  // if a perfect match doesn't exist, check if a mismatch exists
+  if (allow_mismatch > 0) {   
+    hairpin_index = locate_mismatch_hairpin(a_hairpin);
+	if (hairpin_index > 0) {
       return hairpin_index;
-      exit(0); 
     }
+  }
+
+  if (allow_shifting > 0) {
     // Check if given hairpin can be mapped to a shifted location. 
     char *shiftedharipinstr;
     shiftedharipinstr = (char *)malloc(hairpin_length * sizeof(char));
@@ -238,41 +296,48 @@ locate_hairpin(char *a_hairpin, char *read, int doMismatch){
     // check shifting leftwards
     for (index = 1; index <= shifting_n_base; index++){
       strncpy(shiftedharipinstr, read + hairpin_start - 1 - index, hairpin_length);
-      if (doMismatch > 0){
-        hairpin_index = locate_mismatch_hairpin(shiftedharipinstr);
-      } else {
-        hairpin_index = locate_hairpin_impl(shiftedharipinstr);
-      }
+      
+	  hairpin_index = locate_exactmatch_hairpin(shiftedharipinstr);
+	  
+	  // check if mismatch on a shifted position is allowed and try to match
+	  if ((hairpin_index <= 0) && (allow_shifted_mismatch)) {
+	    hairpin_index = locate_mismatch_hairpin(shiftedharipinstr);
+	  }
+	  
       if (hairpin_index > 0) {
-        break;
-      }
+        return hairpin_index;
+      } 
     }
-    // check shifting rightwards
-    if (hairpin_index <= 0){
-      for (index = 1; index <= shifting_n_base; index++){
-        strncpy(shiftedharipinstr, read + hairpin_start - 1 + index, hairpin_length);
-        
-        if (doMismatch > 0){
-          hairpin_index = locate_mismatch_hairpin(shiftedharipinstr);
-        } else {
-          hairpin_index = locate_hairpin_impl(shiftedharipinstr);
-        }
-
-        if (hairpin_index > 0) {
-          break;
-        }
+    
+	// check shifting rightwards
+    for (index = 1; index <= shifting_n_base; index++){
+      strncpy(shiftedharipinstr, read + hairpin_start - 1 + index, hairpin_length);
+ 
+      hairpin_index = locate_exactmatch_hairpin(shiftedharipinstr); 
+      
+      if ((hairpin_index <= 0) && (allow_shifted_mismatch)) {	  
+        hairpin_index = locate_mismatch_hairpin(shiftedharipinstr);
+      } 
+
+      if (hairpin_index > 0) {
+        return hairpin_index;
       }
     }
-  } 
-  return hairpin_index;
+  }
+  
+  return -1;
 }
 
 int 
 barcode_compare(a_barcode *barcode1, a_barcode *barcode2){
   int ans;
   ans = strncmp(barcode1->sequence, barcode2->sequence, barcode_length);
-  if ((ans == 0) && (isPairedReads > 0)){  
-    ans = strncmp(barcode1->sequenceRev, barcode2->sequenceRev, barcode_length_rev);
+  if (ans == 0) {
+    if (is_PairedReads > 0){  
+      ans = strncmp(barcode1->sequenceRev, barcode2->sequenceRev, barcode_length_rev);
+    } else if (is_DualIndexingReads > 0){
+      ans = strncmp(barcode1->sequence2, barcode2->sequence2, barcode2_length);
+    }
   }
 
   return ans;
@@ -301,10 +366,15 @@ Read_In_Hairpins(char *filename){
   char *readline;
 
   fin = fopen(filename,"r");
+
+  // Getting number of lines in the file.
+  num_hairpin = Get_Lines_In_File(fin);
+  hairpins=(a_hairpin**)R_alloc(num_hairpin+1, sizeof(a_hairpin*));
+
   line = (char *)malloc(len+1);
   a_hairpin *new_hairpin;
-
   int count = 0;
+
   while ((readline = fgets(line, len, fin)) != NULL){
     count++;
     new_hairpin = (a_hairpin *)malloc(sizeof(a_hairpin));
@@ -314,13 +384,8 @@ Read_In_Hairpins(char *filename){
     hairpins[count] = new_hairpin;
   }
   fclose(fin);
-  num_hairpin = count;
   free(line);
 
-  int i_hairpin;
-  for (i_hairpin = 1; i_hairpin <= num_hairpin; i_hairpin++){
-    hairpinreadcount[i_hairpin] = 0;
-  }
   Rprintf(" -- Number of Hairpins : %d\n", num_hairpin);
 }
 
@@ -351,18 +416,19 @@ Process_Hairpin_Reads(char *filename, char *filename2){
   long num_read_thisfile = 0;
 
   char *this_barcode_for = NULL;
+  char *this_barcode_2 = NULL;
   char *this_barcode_rev = NULL;
   char *this_hairpin = NULL;
 
   line = (char *)malloc(sizeof(char) * (len+1));
   fin = fopen(filename,"r");
-  if (isPairedReads > 0) {
+  if (is_PairedReads > 0) {
     finRev = fopen(filename2, "r");
     line2 = (char *)malloc(sizeof(char) * (len+1));
   }
 
   if (isverbose > 0){
-    if (isPairedReads > 0) {
+    if (is_PairedReads > 0) {
       Rprintf("Processing reads in %s and %s.\n", filename, filename2);
     } else {
       Rprintf("Processing reads in %s.\n", filename);
@@ -370,7 +436,10 @@ Process_Hairpin_Reads(char *filename, char *filename2){
   }
 
   this_barcode_for = (char *)malloc(barcode_length * sizeof(char));
-  if (isPairedReads > 0) {
+  if (is_DualIndexingReads > 0) {
+    this_barcode_2 = (char *)malloc(barcode2_length * sizeof(char));
+  }
+  if (is_PairedReads > 0) {
     this_barcode_rev = (char *)malloc(barcode_length_rev * sizeof(char));
   }
   this_hairpin = (char *)malloc(hairpin_length * sizeof(char));
@@ -380,7 +449,7 @@ Process_Hairpin_Reads(char *filename, char *filename2){
   int hairpin_index;
 
   while ((readline = fgets(line, len, fin)) != NULL){
-    if (isPairedReads > 0) {
+    if (is_PairedReads > 0) {
       readline2 = fgets(line2, len, finRev);
 	  if (readline2 == NULL) {
 	    break;
@@ -398,15 +467,18 @@ Process_Hairpin_Reads(char *filename, char *filename2){
     num_read_thisfile++;
   
     strncpy(this_barcode_for, line + barcode_start - 1, barcode_length);
-    if (isPairedReads > 0){    
+    if (is_PairedReads > 0){    
       strncpy(this_barcode_rev, line2 + barcode_start_rev - 1, barcode_length_rev);
       barcode_index = locate_barcode_paired(this_barcode_for, this_barcode_rev); 
+    } else if (is_DualIndexingReads > 0) {
+      strncpy(this_barcode_2, line + barcode2_start - 1, barcode2_length);
+      barcode_index = locate_barcode_dualIndexing(this_barcode_for, this_barcode_2);
     } else { 
       barcode_index = locate_barcode(this_barcode_for);
     }
 
     strncpy(this_hairpin, line + hairpin_start - 1, hairpin_length);
-    hairpin_index = locate_hairpin(this_hairpin, line, 0); // not allowing mismatch
+    hairpin_index = locate_hairpin(this_hairpin, line); 
 
     if (barcode_index > 0){
       barcodecount++;
@@ -414,7 +486,6 @@ Process_Hairpin_Reads(char *filename, char *filename2){
 
     if (hairpin_index > 0){
       hairpincount++;
-      hairpinreadcount[hairpin_index]++;
     }
        
     if ((barcode_index > 0) && (hairpin_index > 0)) {
@@ -425,7 +496,7 @@ Process_Hairpin_Reads(char *filename, char *filename2){
   } // end while
 
   if (isverbose > 0) {
-    if (isPairedReads > 0) {
+    if (is_PairedReads > 0) {
       Rprintf("Number of reads in file %s and %s: %ld\n", filename, filename2, num_read_thisfile);  
     } else {
       Rprintf("Number of reads in file %s : %ld\n", filename, num_read_thisfile);  
@@ -437,175 +508,37 @@ Process_Hairpin_Reads(char *filename, char *filename2){
   free(this_barcode_for);
   free(this_hairpin); 
 
-  if (isPairedReads > 0){  
+  if (is_PairedReads > 0){  
     fclose(finRev);
     free(line2);
     free(this_barcode_rev);
   }
 }
 
-void
-Create_Mismatch_Hairpins_List(void){
-  int i;
-  num_mismatch_hairpin = 0;
-
-  for (i = 1; i <= num_hairpin; i++){
-    if (hairpinreadcount[i] == 0){
-      num_mismatch_hairpin++;
-      mismatch_hairpins[num_mismatch_hairpin] = hairpins[i];
-    }
-  }
-  Rprintf("\nThere are %d hairpins without exact sequence match.\n", num_mismatch_hairpin);
-}
-
-
-void
-Process_Mismatch(char *filename, char *filename2){
-
-  FILE *fin = NULL;
-  FILE *finRev = NULL;
-  char * line = NULL;
-  char * line2 = NULL;
-  size_t len = 1000;
-  char *readline;
-  char *readline2;
-  long num_read_thisfile = 0;
-
-  char * this_hairpin;
-  char * this_barcode_for;
-  char * this_barcode_rev = NULL;
-
-  line = (char *)malloc(len+1);
-  fin = fopen(filename,"r");
-  if (isPairedReads > 0) {
-    finRev = fopen(filename2, "r");
-    line2 = (char *)malloc(sizeof(char) * (len+1));
-  }
-  if (isverbose > 0){
-    if (isPairedReads > 0) {
-      Rprintf("Re-processing reads in %s and %s, considering sequence mismatch\n", filename, filename2);
-    } else {
-      Rprintf("Re-processing reads in %s, considering sequence mismatch\n", filename);
-    }
-  }
-
-  this_barcode_for = (char *)malloc(barcode_length * sizeof(char));
-  if (isPairedReads > 0) {
-    this_barcode_rev = (char *)malloc(barcode_length_rev * sizeof(char));
-  }
-  this_hairpin = (char *)malloc(hairpin_length * sizeof(char));
-
-  long line_count = 0;
-
-  int barcode_index;
-  int hairpin_index;
-
-  int new_barcode_index;
-  int new_hairpin_index;
-
-  while ((readline = fgets(line, len, fin)) != NULL){
-    if (isPairedReads > 0) {
-      readline2 = fgets(line2, len, finRev);
-	  if (readline2 == NULL) {
-	    break;
-	  }
-    }
-
-    line_count++;
-    if ((line_count % 4) != 2) {
-      continue;
-    }
-
-    if ((isverbose > 0) && (num_read_thisfile % BLOCKSIZE == 0)) {
-      Rprintf(" -- Processing %d million reads\n", (num_read_thisfile / BLOCKSIZE + 1) * 10);
-    }
-    num_read++;
-    num_read_thisfile++;
-
-    // re-do the mapping in Process_Hairpin_Reads()
-    strncpy(this_barcode_for, line + barcode_start - 1, barcode_length);
-    if (isPairedReads > 0){    
-      strncpy(this_barcode_rev, line2 + barcode_start_rev - 1, barcode_length_rev);
-      barcode_index = locate_barcode_paired(this_barcode_for, this_barcode_rev); 
-    } else { 
-      barcode_index = locate_barcode(this_barcode_for);
-    }
-
-    strncpy(this_hairpin, line + hairpin_start - 1, hairpin_length);
-    hairpin_index = locate_hairpin(this_hairpin, line, 0); //not allowing mismatch
-
-    // only re-process reads without perfect hairpin match or without perfect barcode match;    
-    if ((barcode_index > 0) && (hairpin_index > 0)) {
-      continue;
-    }
-
-    if (barcode_index > 0) {
-      new_barcode_index = barcode_index;
-    } else {
-      if (isPairedReads > 0){
-        new_barcode_index = locate_mismatch_barcode_paired(this_barcode_for, this_barcode_rev);
-      } else {
-        new_barcode_index = locate_mismatch_barcode(this_barcode_for);
-      }
-
-      if (new_barcode_index > 0) {
-	barcodecount++;
-      }
-    } 
- 
-    // re-match hairpin:
-    if (hairpin_index > 0){
-      new_hairpin_index = hairpin_index;
-    } else {
-      new_hairpin_index = locate_hairpin(this_hairpin, line, 1);
-
-      if (new_hairpin_index > 0) {
-        hairpincount++;
-      } 
-    }
-  
-    if ((new_barcode_index > 0) && (new_hairpin_index > 0)) {
-      summary[new_hairpin_index][new_barcode_index]++;
-      bchpcount++;
-    }
-  } // end while 
-
-  fclose(fin); 
-  free(line);
-  free(this_barcode_for);
-  free(this_hairpin); 
-
-  if (isPairedReads > 0){  
-    fclose(finRev);
-    free(line2);
-    free(this_barcode_rev);
-  }
-
-}
-
 
 void 
-Initialise(int IsPaired, int barcodestart, int barcodeend, int barcodestartrev, int barcodeendrev, int hairpinstart, int hairpinend, 
-	   int allowshifting, int shiftingnbase,
-	   int allowMismatch, int barcodemismatch, int hairpinmismatch, 
-	   int allowShiftedMismatch, int verbose){
-  int i, j;
-  for(i = 0; i < MAX_HAIRPIN; i++) {
-    for(j = 0; j < MAX_BARCODE; j++) {
-      summary[i][j] = 0;
-    }
-  }
+Initialise(int IsPaired, int IsDualIndexing, 
+           int barcodestart, int barcodeend, int barcode2start, int barcode2end, int barcodestartrev, int barcodeendrev, 
+           int hairpinstart, int hairpinend, 
+           int allowshifting, int shiftingnbase,
+           int allowMismatch, int barcodemismatch, int hairpinmismatch, 
+           int allowShiftedMismatch, int verbose){
+			   
   num_barcode = 0;
   num_hairpin = 0;
 
-  isPairedReads = IsPaired;
+  is_PairedReads = IsPaired;
+  is_DualIndexingReads = IsDualIndexing;
   barcode_start = barcodestart;
   barcode_end = barcodeend;
+  barcode2_start = barcode2start;
+  barcode2_end = barcode2end;
   barcode_start_rev = barcodestartrev;
   barcode_end_rev = barcodeendrev;
   hairpin_start = hairpinstart;
   hairpin_end = hairpinend;
   barcode_length = barcode_end - barcode_start + 1;
+  barcode2_length = barcode2_end - barcode2_start + 1;
   barcode_length_rev = barcode_end_rev - barcode_start_rev + 1;
   hairpin_length = hairpin_end - hairpin_start + 1;
 
@@ -657,8 +590,11 @@ Clean_Up(void){
   int index;
   for (index = 1; index <= num_barcode; index++){
     free(barcodes[index]->sequence);
-    if (isPairedReads > 0){
+    if (is_PairedReads > 0){
       free(barcodes[index]->sequenceRev);
+    } 
+    if (is_DualIndexingReads > 0){
+      free(barcodes[index]->sequence2);
     }
     free(barcodes[index]);
   }
@@ -666,23 +602,45 @@ Clean_Up(void){
     free(hairpins[index]->sequence);
     free(hairpins[index]);
   }
+  
+  for (index = 0; index <= num_hairpin; index++){
+    free(summary[index]);
+  }
+  free(summary);
 }
 
-
+void
+Allocate_Summary_Table(void){
+  int i, j;
+  
+  summary = (long **)malloc((num_hairpin+1) * sizeof(long *));
+  for (i=0; i <= num_hairpin; i++){
+    summary[i] = (long *)malloc((num_barcode+1) * sizeof(long));
+  }
+ 
+  for (i = 0; i <= num_hairpin; i++){
+    for (j = 0; j <= num_barcode; j++){ 
+      summary[i][j] = 0;  
+	}
+  }
+}
 void 
-processHairpinReads(int *isPairedReads,
+processHairpinReads(int *isPairedReads, int *isDualIndexingReads, 
                     char **file, char **file2, int *filecount, 
-		    char**barcodeseqs, char**hairpinseqs, 
-		    int *barcodestart, int *barcodeend, int *barcodestartrev, int *barcodeendrev, int *hairpinstart, int *hairpinend, 
-		    int *allowShifting, int *shiftingnbase, 
-		    int *allowMismatch, int *barcodemismatch, int *hairpinmismatch,
-		    int *allowShiftedMismatch,
-		    char **output, int *verbose)
+                    char**barcodeseqs, char**hairpinseqs, 
+                    int *barcodestart, int *barcodeend, int *barcode2start, int *barcode2end, int *barcodestartrev, int *barcodeendrev, 
+                    int *hairpinstart, int *hairpinend, 
+                    int *allowShifting, int *shiftingnbase, 
+                    int *allowMismatch, int *barcodemismatch, int *hairpinmismatch,
+                    int *allowShiftedMismatch,
+                    char **output, int *verbose)
 {  
-  Initialise(*isPairedReads, *barcodestart, *barcodeend, *barcodestartrev, *barcodeendrev, *hairpinstart, *hairpinend, 
-	     *allowShifting, *shiftingnbase,
-	     *allowMismatch, *barcodemismatch, *hairpinmismatch, 
-	     *allowShiftedMismatch, *verbose);
+  Initialise(*isPairedReads, *isDualIndexingReads, 
+             *barcodestart, *barcodeend, *barcode2start, *barcode2end, *barcodestartrev, *barcodeendrev, 
+             *hairpinstart, *hairpinend, 
+             *allowShifting, *shiftingnbase,
+             *allowMismatch, *barcodemismatch, *hairpinmismatch, 
+             *allowShiftedMismatch, *verbose);
 
   Read_In_Barcodes(*barcodeseqs);
   Sort_Barcodes(); // () is necessary to invoke function without parameters
@@ -692,30 +650,20 @@ processHairpinReads(int *isPairedReads,
   Check_Hairpins();
   Sort_Hairpins();  
   
+  Allocate_Summary_Table();
+  
   int i_file;
 
   for (i_file = 0; i_file < *filecount; i_file++){
     Process_Hairpin_Reads(file[i_file], file2[i_file]);
   }
  
-  if (allow_mismatch > 0){
-    num_read = 0; 
-    // reset total number of read to 0, recheck initial barcode/hairpin index
-    Create_Mismatch_Hairpins_List();
-    if (num_mismatch_hairpin > 0){
-      for (i_file = 0; i_file < *filecount; i_file++){
-	if (isPairedReads > 0) {
-          Process_Mismatch(file[i_file], file2[i_file]);
-        } else {
-          Process_Mismatch(file[i_file], NULL);
-        }
-      }
-    }  
-  }
-
   Rprintf("\nThe input run parameters are: \n");
   Rprintf(" -- Barcode: start position %d\t end position %d\t length %d\n", barcode_start, barcode_end, barcode_length);  
-  if (isPairedReads > 0){
+  if (is_DualIndexingReads){
+    Rprintf(" -- Second Barcode in forward read: start position %d\t end position %d\t length %d\n", barcode2_start, barcode2_end, barcode2_length); 
+  }
+  if (is_PairedReads){
     Rprintf(" -- Barcode in reverse read: start position %d\t end position %d\t length %d\n", barcode_start_rev, barcode_end_rev, barcode_length_rev); 
   }
   Rprintf(" -- Hairpin: start position %d\t end position %d\t length %d\n", hairpin_start, hairpin_end, hairpin_length); 
@@ -739,10 +687,3 @@ processHairpinReads(int *isPairedReads,
 
   Clean_Up();
 }
-
-
-
-
-
-
-       
diff --git a/src/init.cpp b/src/init.cpp
index d91e718..94f6c9a 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -11,13 +11,13 @@ static const R_CallMethodDef all_call_entries[] = {
 	CALLDEF(R_levenberg, 11),
 	CALLDEF(R_loess_by_col, 4),
 	CALLDEF(R_maximize_interpolant, 2),
-	CALLDEF(R_one_group, 7),
+	CALLDEF(R_one_group, 9),
 	CALLDEF(R_simple_good_turing, 3),
 	{NULL, NULL, 0}
 };
 
 R_CMethodDef all_c_entries[] = {
-    {"processHairpinReads", (DL_FUNC) &processHairpinReads, 20},
+    {"processHairpinReads", (DL_FUNC) &processHairpinReads, 23},
     {NULL, NULL, 0}
   };
 
diff --git a/src/utils.h b/src/utils.h
index 98a690a..69aaa5c 100644
--- a/src/utils.h
+++ b/src/utils.h
@@ -36,14 +36,16 @@ SEXP R_loess_by_col(SEXP, SEXP, SEXP, SEXP);
 
 SEXP R_maximize_interpolant(SEXP, SEXP);
 
-SEXP R_one_group (SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
+SEXP R_one_group (SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
 
 SEXP R_simple_good_turing (SEXP, SEXP, SEXP);
 
-void processHairpinReads(int *, char**, char**, int*, 
-                char**, char**, int*, int*, int*, int*, 
-                int*, int*, int*, int*, int*, int*, 
-                int *, int *, char**, int*);
+
+void processHairpinReads(int *, int *, char**, char**, int*,
+		char**, char**, int*, int*, int*, int*, int*, int*,
+		int*, int*, int*, int*, int*, int*, int *,
+		int *, char**, int*);
+
 }
 
 #endif
diff --git a/tests/edgeR-Tests.Rout.save b/tests/edgeR-Tests.Rout.save
index b0c9742..d9479a2 100644
--- a/tests/edgeR-Tests.Rout.save
+++ b/tests/edgeR-Tests.Rout.save
@@ -147,177 +147,7 @@ Tag.21 -1.7266049 13.38327 0.03545446 0.3899990
 Tag.6  -1.6329841 12.81479 0.10632987 0.7797524
 Tag.2   4.0861092 11.54121 0.16057893 0.8831841
 Tag.16  0.9324935 13.57074 0.26348818 0.9658389
-Tag.20  0.8543140 13.76364 0.31674090 0.9658389
-Tag.19 -0.7976354 13.31405 0.35564858 0.9658389
-Tag.3  -0.7300593 13.54155 0.38833737 0.9658389
-Tag.12  0.7081041 14.31389 0.41513004 0.9658389
-Tag.8  -0.7918152 12.86353 0.48483449 0.9658389
-> 
-> # mglmOneWay
-> design <- model.matrix(~group,data=d$samples)
-> mglmOneWay(d[1:10,],design,dispersion=0.2)
-$coefficients
-        (Intercept)        group2
- [1,] -1.000000e+08  0.000000e+00
- [2,] -1.000000e+08  1.000000e+08
- [3,]  2.525729e+00 -5.108256e-01
- [4,]  2.525729e+00  1.484200e-01
- [5,]  2.140066e+00 -1.941560e-01
- [6,]  2.079442e+00 -1.163151e+00
- [7,]  2.014903e+00  2.363888e-01
- [8,]  1.945910e+00 -5.596158e-01
- [9,]  1.504077e+00  2.006707e-01
-[10,]  2.302585e+00  2.623643e-01
-
-$fitted.values
-      [,1] [,2] [,3] [,4]
- [1,]  0.0  0.0  0.0  0.0
- [2,]  0.0  0.0  2.0  2.0
- [3,] 12.5 12.5  7.5  7.5
- [4,] 12.5 12.5 14.5 14.5
- [5,]  8.5  8.5  7.0  7.0
- [6,]  8.0  8.0  2.5  2.5
- [7,]  7.5  7.5  9.5  9.5
- [8,]  7.0  7.0  4.0  4.0
- [9,]  4.5  4.5  5.5  5.5
-[10,] 10.0 10.0 13.0 13.0
-
-> mglmOneWay(d[1:10,],design,dispersion=0)
-$coefficients
-        (Intercept)        group2
- [1,] -1.000000e+08  0.000000e+00
- [2,] -1.000000e+08  1.000000e+08
- [3,]  2.525729e+00 -5.108256e-01
- [4,]  2.525729e+00  1.484200e-01
- [5,]  2.140066e+00 -1.941560e-01
- [6,]  2.079442e+00 -1.163151e+00
- [7,]  2.014903e+00  2.363888e-01
- [8,]  1.945910e+00 -5.596158e-01
- [9,]  1.504077e+00  2.006707e-01
-[10,]  2.302585e+00  2.623643e-01
-
-$fitted.values
-      [,1] [,2] [,3] [,4]
- [1,]  0.0  0.0  0.0  0.0
- [2,]  0.0  0.0  2.0  2.0
- [3,] 12.5 12.5  7.5  7.5
- [4,] 12.5 12.5 14.5 14.5
- [5,]  8.5  8.5  7.0  7.0
- [6,]  8.0  8.0  2.5  2.5
- [7,]  7.5  7.5  9.5  9.5
- [8,]  7.0  7.0  4.0  4.0
- [9,]  4.5  4.5  5.5  5.5
-[10,] 10.0 10.0 13.0 13.0
-
-> 
-> fit <- glmFit(d,design,dispersion=d$common.dispersion,prior.count=0.5/4)
-> lrt <- glmLRT(fit,coef=2)
-> topTags(lrt)
-Coefficient:  group2 
-            logFC   logCPM        LR     PValue       FDR
-Tag.17  2.0450964 13.73726 6.0485417 0.01391779 0.3058698
-Tag.2   4.0861092 11.54121 4.8400340 0.02780635 0.3058698
-Tag.21 -1.7265870 13.38327 4.0777825 0.04345065 0.3186381
-Tag.6  -1.6329986 12.81479 3.0078205 0.08286364 0.4557500
-Tag.16  0.9324996 13.57074 1.3477682 0.24566867 0.8276702
-Tag.20  0.8543138 13.76364 1.1890032 0.27553071 0.8276702
-Tag.19 -0.7976602 13.31405 0.9279151 0.33540526 0.8276702
-Tag.12  0.7081170 14.31389 0.9095513 0.34023349 0.8276702
-Tag.3  -0.7300410 13.54155 0.8300307 0.36226364 0.8276702
-Tag.8  -0.7917906 12.86353 0.7830377 0.37621371 0.8276702
-> 
-> fit <- glmFit(d,design,dispersion=d$common.dispersion,prior.count=0.5)
-> summary(fit$coef)
-  (Intercept)         group2        
- Min.   :-7.604   Min.   :-1.13681  
- 1st Qu.:-4.895   1st Qu.:-0.32341  
- Median :-4.713   Median : 0.15083  
- Mean   :-4.940   Mean   : 0.07817  
- 3rd Qu.:-4.524   3rd Qu.: 0.35163  
- Max.   :-4.107   Max.   : 1.60864  
-> 
-> fit <- glmFit(d,design,prior.count=0.5/4)
-> lrt <- glmLRT(fit,coef=2)
-> topTags(lrt)
-Coefficient:  group2 
-            logFC   logCPM        LR     PValue       FDR
-Tag.17  2.0450964 13.73726 6.0485417 0.01391779 0.3058698
-Tag.2   4.0861092 11.54121 4.8400340 0.02780635 0.3058698
-Tag.21 -1.7265870 13.38327 4.0777825 0.04345065 0.3186381
-Tag.6  -1.6329986 12.81479 3.0078205 0.08286364 0.4557500
-Tag.16  0.9324996 13.57074 1.3477682 0.24566867 0.8276702
-Tag.20  0.8543138 13.76364 1.1890032 0.27553071 0.8276702
-Tag.19 -0.7976602 13.31405 0.9279151 0.33540526 0.8276702
-Tag.12  0.7081170 14.31389 0.9095513 0.34023349 0.8276702
-Tag.3  -0.7300410 13.54155 0.8300307 0.36226364 0.8276702
-Tag.8  -0.7917906 12.86353 0.7830377 0.37621371 0.8276702
-> 
-> dglm <- estimateGLMCommonDisp(d,design)
-> dglm$common.dispersion
-[1] 0.2033282
-> dglm <- estimateGLMTagwiseDisp(dglm,design,prior.df=20)
-> summary(dglm$tagwise.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1756  0.1879  0.1998  0.2031  0.2135  0.2578 
-> fit <- glmFit(dglm,design,prior.count=0.5/4)
-> lrt <- glmLRT(fit,coef=2)
-> topTags(lrt)
-Coefficient:  group2 
-            logFC   logCPM        LR      PValue       FDR
-Tag.17  2.0450988 13.73726 6.8001118 0.009115216 0.2005348
-Tag.2   4.0861092 11.54121 4.8594088 0.027495756 0.2872068
-Tag.21 -1.7265904 13.38327 4.2537154 0.039164558 0.2872068
-Tag.6  -1.6329904 12.81479 3.1763761 0.074710253 0.4109064
-Tag.16  0.9324970 13.57074 1.4126709 0.234613512 0.8499599
-Tag.20  0.8543183 13.76364 1.2721097 0.259371274 0.8499599
-Tag.19 -0.7976614 13.31405 0.9190392 0.337727381 0.8499599
-Tag.12  0.7081163 14.31389 0.9014515 0.342392806 0.8499599
-Tag.3  -0.7300488 13.54155 0.8817937 0.347710872 0.8499599
-Tag.8  -0.7918166 12.86353 0.7356185 0.391068049 0.8603497
-> dglm <- estimateGLMTrendedDisp(dglm,design)
-> summary(dglm$trended.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1522  0.1676  0.1740  0.1887  0.2000  0.3469 
-> dglm <- estimateGLMTrendedDisp(dglm,design,method="power")
-> summary(dglm$trended.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1522  0.1676  0.1740  0.1887  0.2000  0.3469 
-> dglm <- estimateGLMTrendedDisp(dglm,design,method="spline")
-Loading required package: splines
-> summary(dglm$trended.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.09353 0.11080 0.15460 0.19010 0.23050 0.52010 
-> dglm <- estimateGLMTrendedDisp(dglm,design,method="bin.spline")
-> summary(dglm$trended.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1997  0.1997  0.1997  0.1997  0.1997  0.1997 
-> dglm <- estimateGLMTagwiseDisp(dglm,design,prior.df=20)
-> summary(dglm$tagwise.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1385  0.1792  0.1964  0.1935  0.2026  0.2709 
-> 
-> dglm2 <- estimateDisp(dglm, design)
-> summary(dglm2$tagwise.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1766  0.1789  0.1814  0.1846  0.1870  0.2119 
-> dglm2 <- estimateDisp(dglm, design, prior.df=20)
-> summary(dglm2$tagwise.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1527  0.1669  0.1814  0.1858  0.1951  0.2497 
-> dglm2 <- estimateDisp(dglm, design, robust=TRUE)
-Loading required package: statmod
-> summary(dglm2$tagwise.dispersion)
-   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
- 0.1766  0.1789  0.1814  0.1846  0.1870  0.2119 
-> 
-> # Continuous trend
-> nlibs <- 3
-> ntags <- 1000
-> dispersion.true <- 0.1
-> # Make first transcript respond to covariate x
-> x <- 0:2
-> design <- model.matrix(~x)
-> beta.true <- cbind(Beta1=2,Beta2=c(2,rep(0,ntags-1)))
+Tag.20
 > mu.true <- 2^(beta.true %*% t(design))
 > # Generate count data
 > y <- rnbinom(ntags*nlibs,mu=mu.true,size=1/dispersion.true)
@@ -562,3 +392,52 @@ $n0
 > proc.time()
    user  system elapsed 
    3.58    0.04    5.32 
+                                                                                                                                                                                                                                                                                                                                                                   edgeR/vignettes/                                                                                    0000755 0001263 0001264 00000000000 1 [...]
+%\VignetteKeyword{RNA-Seq}
+%\VignetteKeyword{differential expression}
+%\VignettePackage{edgeR}
+\documentclass[12pt]{article}
+
+\textwidth=6.2in
+\textheight=8.5in
+\oddsidemargin=0.2in
+\evensidemargin=0.2in
+\headheight=0in
+\headsep=0in
+
+\begin{document}
+
+\title{edgeR Package Introduction}
+\author{Yunshun Chen, Davis McCarthy, Aaron Lun,\\
+Xiaobei Zhou, Mark Robinson, Gordon K.\ Smyth}
+\date{10 October 2012\\
+Revised 8 October 2014}
+\maketitle
+
+
+edgeR is a package for the differential expression analysis of digital gene expression data,
+that is, of count data arising from DNA sequencing technologies.
+It is especially designed for differential expression analyses of RNA-Seq or SAGE data,
+or differential marking analyses of ChIP-Seq data.
+
+edgeR implements novel statistical methods based on the negative binomial distribution
+as a model for count variability, including empirical Bayes methods, exact tests, and generalized linear models.
+The package is especially suitable for analysing designed experiments with multiple
+experimental factors but possibly small numbers of replicates.
+It has unique abilities to model transcript specific variation even in small samples,
+a capability essential for prioritizing genes or transcripts that have consistent effects across replicates.
+
+The full edgeR User's Guide is available as part of the online documentation.
+To reach the User's Guide, install the edgeR package and load it into an R session by \texttt{library(edgeR)}.
+In R for Windows, the User's Guide will then be available from the drop-down menu called ``Vignettes''.
+In other operating systems, type
+\begin{Schunk}
+\begin{Sinput}
+> library(edgeR)
+> edgeRUsersGuide()
+\end{Sinput}
+\end{Schunk}
+at the R prompt to open the User's Guide in a pdf viewer.
+
+\end{document}
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             [...]
\ No newline at end of file
diff --git a/vignettes/edgeR.Rnw b/vignettes/edgeR.Rnw
deleted file mode 100755
index af5781e..0000000
--- a/vignettes/edgeR.Rnw
+++ /dev/null
@@ -1,48 +0,0 @@
-%\VignetteIndexEntry{edgeR Vignette}
-%\VignetteKeyword{RNA-Seq}
-%\VignetteKeyword{differential expression}
-%\VignettePackage{edgeR}
-\documentclass[12pt]{article}
-
-\textwidth=6.2in
-\textheight=8.5in
-\oddsidemargin=0.2in
-\evensidemargin=0.2in
-\headheight=0in
-\headsep=0in
-
-\begin{document}
-
-\title{edgeR Package Introduction}
-\author{Yunshun Chen, Davis McCarthy, Aaron Lun,\\
-Xiaobei Zhou, Mark Robinson, Gordon K.\ Smyth}
-\date{10 October 2012\\
-Revised 8 October 2014}
-\maketitle
-
-
-edgeR is a package for the differential expression analysis of digital gene expression data,
-that is, of count data arising from DNA sequencing technologies.
-It is especially designed for differential expression analyses of RNA-Seq or SAGE data,
-or differential marking analyses of ChIP-Seq data.
-
-edgeR implements novel statistical methods based on the negative binomial distribution
-as a model for count variability, including empirical Bayes methods, exact tests, and generalized linear models.
-The package is especially suitable for analysing designed experiments with multiple
-experimental factors but possibly small numbers of replicates.
-It has unique abilities to model transcript specific variation even in small samples,
-a capability essential for prioritizing genes or transcripts that have consistent effects across replicates.
-
-The full edgeR User's Guide is available as part of the online documentation.
-To reach the User's Guide, install the edgeR package and load it into an R session by \texttt{library(edgeR)}.
-In R for Windows, the User's Guide will then be available from the drop-down menu called ``Vignettes''.
-In other operating systems, type
-\begin{Schunk}
-\begin{Sinput}
-> library(edgeR)
-> edgeRUsersGuide()
-\end{Sinput}
-\end{Schunk}
-at the R prompt to open the User's Guide in a pdf viewer.
-
-\end{document}

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-edger.git



More information about the debian-med-commit mailing list