[med-svn] r9642 - in trunk/packages/R/r-cran-haplo.stats/trunk/debian: . doc_source
Andreas Tille
tille at alioth.debian.org
Fri Feb 10 22:52:57 UTC 2012
Author: tille
Date: 2012-02-10 22:52:56 +0000 (Fri, 10 Feb 2012)
New Revision: 9642
Added:
trunk/packages/R/r-cran-haplo.stats/trunk/debian/README.Debian
trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc-base
trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc_source/
trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc_source/manualHaploStats.Rnw
trunk/packages/R/r-cran-haplo.stats/trunk/debian/docs
Modified:
trunk/packages/R/r-cran-haplo.stats/trunk/debian/changelog
trunk/packages/R/r-cran-haplo.stats/trunk/debian/rules
Log:
Enhance documentation including source provided by upstream; upload to unstable
Added: trunk/packages/R/r-cran-haplo.stats/trunk/debian/README.Debian
===================================================================
--- trunk/packages/R/r-cran-haplo.stats/trunk/debian/README.Debian (rev 0)
+++ trunk/packages/R/r-cran-haplo.stats/trunk/debian/README.Debian 2012-02-10 22:52:56 UTC (rev 9642)
@@ -0,0 +1,5 @@
+Notes on how this package can be tested.
+────────────────────────────────────────
+
+This package can be tested by loading it into R with the command
+‘library(haplo.stats)’ in order to confirm its integrity.
Modified: trunk/packages/R/r-cran-haplo.stats/trunk/debian/changelog
===================================================================
--- trunk/packages/R/r-cran-haplo.stats/trunk/debian/changelog 2012-02-10 15:27:06 UTC (rev 9641)
+++ trunk/packages/R/r-cran-haplo.stats/trunk/debian/changelog 2012-02-10 22:52:56 UTC (rev 9642)
@@ -1,4 +1,4 @@
-r-cran-haplo.stats (1.5.2-1) UNRELEASED; urgency=low
+r-cran-haplo.stats (1.5.2-1) unstable; urgency=low
[ Charles Plessy ]
* debian/upstream-metadata.yaml -> debian/upstream
@@ -14,9 +14,13 @@
* debian/rules: use usual R:depends variable substitution
* debian/lintian-overrides: do not warn about duplicate license
* debian/copyright: DEP5
+ * debian/doc_source/manualHaploStats.Rnw: Source for user
+ manual provided by upstream
+ Closes: #655439
+ * debian/docs: Install user manual
+ * debian/doc-base: Doc-base control file
+ * debian/README.Debian: Hint how to test the package
- TODO: Clarify #655439 (just sendet mail to upstream)
-
-- Andreas Tille <tille at debian.org> Fri, 10 Feb 2012 15:54:27 +0100
r-cran-haplo.stats (1.4.4-1) unstable; urgency=low
Added: trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc-base
===================================================================
--- trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc-base (rev 0)
+++ trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc-base 2012-02-10 22:52:56 UTC (rev 9642)
@@ -0,0 +1,13 @@
+Document: r-cran-haplo.stats
+Title: Statistical Methods for Haplotypes When Linkage Phase is Ambiguous
+Author: Jason P. Sinnwell and Daniel J. Schaid
+Abstract: GNU R package for haplotype analysis
+ Haplo Stats is a suite of R routines for the analysis of indirectly
+ measured haplotypes. The statistical methods assume that all subjects
+ are unrelated and that haplotypes are ambiguous (due to unknown
+ linkage phase of the genetic markers), while also allowing for missing
+ alleles.
+Section: Science/Biology
+
+Format: pdf
+Files: /usr/share/doc/r-cran-haplo.stats/manualHaploStats.pdf
Added: trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc_source/manualHaploStats.Rnw
===================================================================
--- trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc_source/manualHaploStats.Rnw (rev 0)
+++ trunk/packages/R/r-cran-haplo.stats/trunk/debian/doc_source/manualHaploStats.Rnw 2012-02-10 22:52:56 UTC (rev 9642)
@@ -0,0 +1,2066 @@
+%\VignetteIndexEntry{haplo.stats}
+%\VignetteKeywords{haplotype, score, glm}
+%\VignetteDepends{haplo.stats}
+%\VignettePackage{haplo.stats}
+
+\RequirePackage[T1]{fontenc}
+\RequirePackage{graphicx,ae,fancyvrb}
+
+\IfFileExists{upquote.sty}{\RequirePackage{upquote}}{}
+\setkeys{Gin}{width=0.8\textwidth}
+
+
+\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontfamily=courier,
+ fontshape=sl, fontseries=b, xleftmargin=.25cm}
+
+\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontfamily=courier,
+ fontshape=sl, xleftmargin=.5cm}
+
+
+\documentclass[11pt]{article}
+\def\version{version 1.5.0}
+\def\Rversion{version 2.14.0}
+\def\package{{\sl haplo.stats}}
+\def\Explain{\paragraph{\large {\bf Explanation of Results \vspace{.1in}} \\}}
+
+\usepackage{Sweave}
+\usepackage{fullpage}
+
+%\textwidth 6.5in
+%\oddsidemargin 0in
+%\evensidemargin 0in
+%\textheight 7.7in
+
+\title{{\bf Haplo Stats} \\
+(\version) \vspace{.25in}\\
+Statistical Methods for Haplotypes When
+Linkage Phase is Ambiguous \vspace{1.5in}\\
+}
+
+\author{Jason P. Sinnwell\thanks{sinnwell at mayo.edu} and Daniel J. Schaid \\
+Mayo Clinic Division of Health Sciences Research\\Rochester MN USA \vspace{.25in} \\}
+
+\date{{\today} \vspace{2in}}
+\begin{document}
+
+\maketitle
+
+\pagebreak
+\tableofcontents
+\pagebreak
+
+\section{Introduction}
+
+Haplo Stats is a suite of R routines for the analysis of indirectly
+measured haplotypes. The statistical methods assume that all subjects
+are unrelated and that haplotypes are ambiguous (due to unknown linkage
+phase of the genetic markers), while also allowing for missing alleles.
+
+
+The user-level functions in Haplo Stats are:
+\begin{itemize}
+ \item{{\sl haplo.em:}\ \ for the estimation of haplotype frequencies
+ and posterior probabilities of haplotype pairs for each subject,
+ conditional on the observed marker data}
+ \item{{\sl haplo.glm:}\ \ generalized linear models for the regression of
+ a trait on haplotypes, with the option of including covariates and
+ interactions}
+ \item{{\sl haplo.score:}\ \ score statistics to test associations
+ between haplotypes and a variety of traits, including binary, ordinal,
+ quantitative, and Poisson}
+ \item{{\sl haplo.score.slide:} \ \ haplo.score computed on sub-haplotypes
+ of a larger region}
+  \item{{\sl seqhap:}\ \ sequentially scan markers, enlarging a haplotype
+    to test for association with a trait}
+ \item{{\sl haplo.cc:} \ \ run a combined analysis for haplotype
+ frequencies, scores, and regression results for a case-control study}
+  \item{{\sl haplo.power.qt/haplo.power.cc:}\ \ power or sample size
+    calculations for a quantitative or binary trait}
+ \item{{\sl haplo.scan:}\ \ search for a trait locus for all sizes
+ of sub-haplotypes within a fixed maximum window width for all
+ markers in a region}
+ \item{{\sl haplo.design:} \ \ create a design matrix for haplotype
+ effects}
+\end{itemize}
+
+\noindent This manual explains the basic and advanced usage of these
+routines, with guidelines for running the analyses and interpreting results.
+We provide many of these details in the function help pages, which are
+accessed within an R session using
+{\sl help(haplo.em)}, for example. We also provide brief examples in the
+help files, which can be run in the R session with {\sl example(haplo.em)}.
+
+\subsection{Updates}
+This version of Haplo Stats contains updates to {\sl haplo.glm}
+(section~\ref{hapGLM}), along with new methods written for it that resemble
+glm class methods. These methods include residuals, fitted.values, vcov, and
+anova, and they are detailed in section~\ref{glmMethods}. For the full history
+of updates, see the NEWS file, or type {\sl news(package=``haplo.stats'')}
+at the R command prompt.
+
+\subsection{Operating System and Installation}
+Haplo Stats \version\ is written for R (\Rversion). It has been uploaded
+to the Comprehensive R Archive Network (CRAN), and is made available
+for various operating systems through CRAN. The package is installed
+from within R using {\sl install.packages(``haplo.stats'')},
+but other procedures for installing R packages can be found at the R project
+website (http://www.r-project.org).
+
+\subsection{R Basics}
+For users who are new to the R environment, we demonstrate some basic
+concepts. In the following example we create a vector of character alleles and
+use the {\sl table} function to get allele counts. We first show how to save
+the results of {\sl table(snp)} into an R session variable, {\sl tab}. We show
+that {\sl tab} is an object of the {\sl table} class, and use the print and summary
+methods that are defined for {\sl table} objects. Note that when you enter just
+{\sl tab} or {\sl table(snp)} at the prompt, the print method is invoked.
+
+<<label=simpleR>>=
+snp <- c("A", "T", "T", "A", "A", "T", "T")
+
+tab <- table(snp)
+tab
+
+class(tab)
+print.table(tab)
+summary(tab)
+
+tab[2]
+table(snp)
+@
+
+\noindent The routines in \package\ are computationally intensive and
+return a large amount of information in their returned objects. Therefore, we
+assign classes to the returned objects and provide various methods for each
+of them.
+
+\section{Data Setup}
+
+We begin with some typical steps taken when you first load a package and look
+for details on a function of interest. In the sample code below, we
+load \package, check which functions are available in the package,
+view a help file, and run the example that is within the help file.
+
+<<getstarted, echo=TRUE, eval=FALSE>>=
+# load the library, list its functions, and view a help file and example
+library(haplo.stats)
+ls(name="package:haplo.stats")
+help(haplo.em)
+example(haplo.em)
+@
+
+<<hidsettings, echo=FALSE, eval=TRUE>>=
+options(width=80)
+rm(list=ls())
+library(haplo.stats, lib.loc="~/Rlib")
+@
+
+\subsection{Example Data}
+
+The \package\ package contains three example data sets. The primary data
+set used in this manual is named {\sl hla.demo}; it contains 11
+loci from the HLA region
+on chromosome 6, with covariates, qualitative, and quantitative
+responses. The data is stored in tab-delimited format in
+{\sl haplo.stats/data/hla.demo.tab}. Typically, data stored in this format
+can be read in using {\sl read.table()}. Since the data is provided in the
+package, we load it in R using {\sl data()} and view the names of
+the columns. Then, to make the columns of {\sl hla.demo} accessible without
+typing the data frame name each time, we attach it to the current session.
+
+<<label=hladat, echo=TRUE>>=
+# load and preview demo dataset stored in ~/haplo.stats/data/hla.demo.tab
+data(hla.demo)
+names(hla.demo)
+# attach hla.demo to make columns available in the session
+attach(hla.demo)
+@
+
+The column names of {\sl hla.demo} are shown above. They are defined as
+follows:
+\begin{itemize}
+\item{\bf resp:\ }{quantitative antibody response to measles vaccination}
+\item{\bf resp.cat:\ }{a factor with levels "low", "normal", "high", for
+categorical antibody response}
+\item{\bf male:\ }{gender code with {\sl 1="male"}, {\sl 0="female"}}
+\item{\bf age:\ }{age (in months) at immunization}
+\end{itemize}
+
+\noindent The remaining columns are genotypes for 11 HLA loci, with a prefix
+name (e.g., "DQB") and a suffix for each of two alleles (".a1" and ".a2").
+The variables in {\sl hla.demo}\ can be accessed by typing {\sl hla.demo\$}\
+before their names, such as {\sl hla.demo\$resp}.
+Alternatively, it is easier for these examples to attach {\sl hla.demo}
+(as shown above with {\sl attach()}) so the variables can be accessed by
+simply typing their names.
+
+\subsection{Creating a Genotype Matrix}
+
+Many of the functions require a matrix of genotypes, denoted below as
+{\sl geno}. This matrix is arranged such that each locus has a pair of
+adjacent columns of alleles, and the order of columns corresponds
+to the order of loci on a chromosome. If there are K loci, then the
+number of columns of {\sl geno} is $2K$. Rows represent the alleles for each
+subject. For example, if there are three loci, in the order A-B-C,
+then the 6 columns of {\sl geno} would be arranged as A.a1, A.a2, B.a1,
+B.a2, C.a1, C.a2. For illustration, three of the loci in
+{\sl hla.demo} will be used to demonstrate some of the functions. Create
+a separate data frame for 3 of the loci, and call this {\sl geno}. Then create
+a vector of labels for the loci.
+
+<<label=makegeno, echo=TRUE>>=
+geno <- hla.demo[,c(17,18,21:24)]
+label <-c("DQB","DRB","B")
+@
+
+\noindent The {\sl hla.demo} data already had alleles in two columns for each
+locus. For many SNP datasets, the data is in a one column format,
+giving the count of the minor allele. To assist in converting this
+format to two columns, a function named {\sl geno1to2} has been added
+to the package. See its help file for more details.
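+
+\noindent As an illustrative sketch only (the one-column input below is
+hypothetical; see {\sl help(geno1to2)} for the exact arguments), a two-SNP
+data set coded as minor-allele counts might be converted as follows:
+
+<<geno1to2sketch, echo=TRUE, eval=FALSE>>=
+# hypothetical minor-allele counts (0, 1, or 2) for two SNPs
+snp.counts <- data.frame(snp1=c(0,1,2), snp2=c(1,0,2))
+# convert to the two-columns-per-locus genotype matrix format
+geno2col <- geno1to2(snp.counts, locus.label=c("snp1","snp2"))
+@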
+
+\subsection{Preview Missing Data: {\sl summaryGeno}}
+
+Before performing a haplotype analysis, the user will want to assess
+missing genotype data to determine the completeness of the data. If
+many genotypes are missing, the functions may take a long time to
+compute results, or even run out of memory. For these reasons, the user
+may want to remove some of the subjects with a lot of missing data.
+This step can be guided by using the {\sl summaryGeno}\ function, which checks
+for missing allele information and counts the number of potential
+haplotype pairs that are consistent with the observed data (see the
+Appendix for a description of this counting scheme).
+
+The codes for missing alleles are defined by the parameter
+{\sl miss.val}, a vector to define all possible missing value
+codes. Below, the result is saved in {\sl geno.desc}, which is a data frame,
+so individual rows may be printed. Here we show the results for
+subjects 1-10, 80-85, and 135-140, some of which have missing alleles.
+
+<<summarygeno, echo=TRUE>>=
+geno.desc <- summaryGeno(geno, miss.val=c(0,NA))
+print(geno.desc[c(1:10,80:85,135:140),])
+@
+
+\noindent The columns labeled {\sl 'loc miss-'} give the number of loci
+missing either 0,
+1, or 2 alleles, and the last column, {\sl num\_enum\_rows}, gives the
+number of haplotype pairs that are consistent with the observed
+data. In the example above, subjects indexed by rows 81 and 137 have
+missing alleles. Subject \#81 has one locus missing two alleles, while
+subject \#137 has two loci missing two alleles. As indicated by
+{\sl num\_enum\_rows}, subject \#81 has 1,800 potential haplotype pairs, while
+subject \#137 has nearly 130,000.
+
+Although 130,000 haplotype pairs is a large number, {\sl haplo.em},
+{\sl haplo.score}, and {\sl haplo.glm} complete in roughly 3-6 minutes (depending
+on system limits or control parameter settings). If person \#137 were removed,
+the methods would take less than half that time. It is preferable to keep people
+who provide information to the analysis, provided that run time and memory usage
+are not too much of a burden.
+
+When a person has no genotype information, they do not provide
+information to any of the methods in \package. Furthermore, they cause a
+much longer run time. Below, using the {\sl table} function on the third
+column of {\sl geno.desc}, we can tabulate how many people are missing two alleles
+at any of the three loci. If there were people missing two alleles at
+all three loci, they should be removed. The second command below shows how to
+make an index of which people to remove from {\sl hla.demo} because they are
+missing all their alleles.
+
+<<tabledesc, echo=TRUE>>=
+
+# find if there are any people missing all alleles
+table(geno.desc[,3])
+@
+
+<<rmmiss, echo=TRUE, eval=FALSE>>=
+## create an index of people missing all alleles
+miss.all <- which(geno.desc[,3]==3)
+
+# use index to subset hla.demo
+hla.demo.updated <- hla.demo[-miss.all,]
+@
+
+
+\subsection{Random Numbers and Setting Seed \label{seed} }
+
+Random numbers are used in several of the functions (e.g., to determine
+random starting frequencies within {\sl haplo.em}, and to compute permutation
+p-values in {\sl haplo.score}). To reproduce calculations involving random
+numbers, we set the seed values stored in a vector called
+{\sl .Random.seed}. This vector can be set using {\sl set.seed()}
+before any function which uses random numbers. Section \ref{glmBaseline}
+shows one example of setting the seed for {\sl haplo.glm}. We illustrate
+setting the seed below.
+
+<<setseed, echo=TRUE>>=
+# this is how to set the seed for reproducing results where haplo.em is
+# involved, and also if simulations are run. In practice, don't reset seed.
+seed <- c(17, 53, 1, 40, 37, 0, 62, 56, 5, 52, 12, 1)
+set.seed(seed)
+@
+
+
+\section{Haplotype Frequency Estimation: {\sl haplo.em}}
+
+\subsection{Algorithm}
+For genetic markers measured on unrelated subjects, with linkage phase
+unknown, {\sl haplo.em} computes maximum likelihood estimates of haplotype
+probabilities. Because there may be more than one pair of haplotypes
+that are consistent with the observed marker phenotypes, posterior
+probabilities of haplotype pairs for each subject are also
+computed. Unlike the usual EM which attempts to enumerate all possible
+haplotype pairs before iterating over the EM steps, our
+{\em progressive insertion} algorithm progressively inserts batches of
+loci into haplotypes of growing lengths, runs the EM steps, trims
+off pairs of haplotypes per subject when the posterior probability of
+the pair is below a specified threshold, and then continues these
+insertion, EM, and trimming steps until all loci are inserted into
+the haplotype. The user can choose the batch size. If the batch size
+is chosen to be all loci, and the threshold for trimming is set to 0,
+then this reduces to the usual EM algorithm. The basis of this
+progressive insertion algorithm is from the "snphap" software by
+David Clayton\cite{Clayton}.
+Although some of the features and control parameters of {\sl haplo.em} are
+modeled after {\sl snphap}, there are substantial differences, such as
+extension to allow for more than two alleles per locus, and some
+other nuances on how the algorithm is implemented.
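+
+\noindent As an illustrative sketch of the special case noted above (syntax
+only, not evaluated here), the usual EM can be recovered by inserting all
+loci in one batch and setting the trimming threshold to zero:
+
+<<emusual, echo=TRUE, eval=FALSE>>=
+# batch size = all 3 loci, no trimming: reduces to the usual EM algorithm
+save.em.usual <- haplo.em(geno=geno, locus.label=label, miss.val=c(0, NA),
+          control = haplo.em.control(insert.batch.size = length(label),
+                                     min.posterior = 0))
+@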
+
+\subsection{Example Usage \label{hapEM}}
+We use {\sl haplo.em} on {\sl geno} for the 3 loci defined above and save
+the result in an object named {\sl save.em}, which has the {\sl haplo.em}
+class. The print method would normally print all 178 haplotypes from {\sl save.em},
+but to keep the results short for this manual, we give a quick glance of
+the output by using the option {\sl nlines=10}, which prints
+only the first 10 haplotypes of the full results. The {\sl nlines}
+parameter has been employed in some of Haplo Stats' print methods
+for when there are many haplotypes. In practice, it is best to exclude this
+parameter so that the default will print all results.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<label=runEM, echo=TRUE>>=
+save.em <- haplo.em(geno=geno, locus.label=label, miss.val=c(0,NA))
+names(save.em)
+print(save.em, nlines=10)
+@
+
+\Explain The print method shows the haplotypes and their estimated frequencies,
+followed by the final log-likelihood statistic and the {\sl lr stat for no LD},
+which is the likelihood ratio test statistic contrasting the {\sl lnlike} for the
+estimated haplotype frequencies versus the {\sl lnlike} under the null assuming
+that alleles from all loci are in linkage equilibrium. We note that the trimming
+by the progressive insertion algorithm can invalidate the {\sl lr stat} and the
+degrees of freedom ({\sl df}).\\
+
+\subsection{Summary Method}
+
+The {\sl summary} method for a {\sl haplo.em} object on {\sl save.em} shows the
+list of haplotypes per subject, and their posterior probabilities:
+<<summaryEM, echo=TRUE>>=
+summary(save.em, nlines=7)
+@
+
+\Explain The first part of the {\sl summary} output lists the subject id
+(row number of input {\sl geno} matrix), the codes for the haplotypes of
+each pair, and the posterior probabilities of the haplotype pairs. The
+second part gives a table of the maximum number of pairs of haplotypes per subject,
+versus the number of pairs used in the final posterior probabilities.
+The haplotype codes remove the clutter of illustrating all the alleles
+of the haplotypes, but may not be as informative as the actual
+haplotypes themselves. To see the actual haplotypes, use the
+{\sl show.haplo=TRUE} option, as in the following example.
+
+<<summaryEMshow, echo=TRUE>>=
+# show full haplotypes, instead of codes
+summary(save.em, show.haplo=TRUE, nlines=7)
+@
+
+\subsection{Control Parameters for {\sl haplo.em} \label{emControl}}
+
+A set of control parameters can be passed together to {\sl haplo.em} as the
+``{\sl control}" argument. This is a list of parameters that control the
+EM algorithm based on progressive insertion of loci. The default values are set by a
+function called {\sl haplo.em.control} (see {\sl help(haplo.em.control)}
+for a complete description). Although the user can accept the default
+values, there are times when control parameters may need to be adjusted.
+
+These parameters are defined below:
+
+\begin{itemize}
+\item{} {\bf insert.batch.size:\ } Number of loci to be inserted in a single
+batch.
+\item{} {\bf min.posterior:\ } Minimum posterior probability of haplotype pair,
+conditional on observed marker genotypes. Posteriors below this
+minimum value will have their pair of haplotypes "trimmed" off the
+list of possible pairs.
+\item{}{\bf max.iter:\ } Maximum number of iterations allowed for the EM
+algorithm before it stops and prints an error.
+\item{}{\bf n.try:\ } Number of times to try to maximize the {\sl lnlike}
+ by the EM algorithm. The first try will use, as initial starting values for
+the posteriors, either equal values or uniform random variables, as
+determined by {\sl random.start}. All subsequent tries will use uniform
+random values as initial starting values for the posterior probabilities.
+\item{}{\bf max.haps.limit:\ }
+Maximum number of haplotypes for the input genotypes. Within
+{\sl haplo.em}, the first step is to try to allocate memory for the
+sum of the result of {\sl geno.count.pairs()}; if that exceeds
+{\sl max.haps.limit}, it starts by allocating {\sl max.haps.limit}. If that is
+exceeded in the progressive-insertion steps, the C function doubles
+the memory until it can no longer request more.
+\end{itemize}
+
+One reason to adjust control parameters is to help find the global
+maximum of the log-likelihood, which can be difficult, particularly for
+small sample sizes and many possible haplotypes. Different
+maximizations of the log-likelihood may result in different results from
+{\sl haplo.em}, {\sl haplo.score}, or {\sl haplo.glm} when
+rerunning the analyses. The algorithm uses multiple attempts to maximize the
+log-likelihood, starting each attempt with random starting values.
+To increase the chance of finding the global maximum of the
+log-likelihood, the user can increase the number of attempts
+({\sl n.try}), increase the batch size ({\sl insert.batch.size}), or
+decrease the trimming threshold for posterior probabilities
+({\sl min.posterior}).
+
+Another reason to adjust control parameters is when the algorithm runs
+out of memory because there are too many haplotypes. If
+{\sl max.haps.limit} is exceeded when a batch of markers is added,
+the algorithm requests twice as much memory until it runs out. One option is
+to set {\sl max.haps.limit} to a different value, either to make {\sl haplo.em}
+request more memory initially, or to request more memory in smaller
+chunks. Another solution is to make the algorithm trim the number of
+haplotypes more aggressively by decreasing {\sl insert.batch.size} or
+increasing {\sl min.posterior}. Any changes to these parameters should be made
+with caution, and not drastically different from the default values.
+For instance, the default for {\sl min.posterior} used to be $1e-7$,
+and in some rare circumstances with many markers in only moderate linkage
+disequilibrium, some subjects had all their possible haplotype pairs
+trimmed. The default is now set at $1e-9$, and we recommend not
+increasing {\sl min.posterior} much greater than $1e-7$.
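+
+\noindent A memory-related adjustment might be sketched as follows (the
+values shown are illustrative only, not recommendations):
+
+<<emmemory, echo=TRUE, eval=FALSE>>=
+# request memory in smaller chunks and trim haplotype pairs more aggressively
+save.em.mem <- haplo.em(geno=geno, locus.label=label, miss.val=c(0, NA),
+          control = haplo.em.control(max.haps.limit = 1e6,
+                                     min.posterior = 1e-7))
+@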
+
+The example below gives the command for increasing the number of tries
+to $20$, and the batch size to $2$, since not much more can be done
+for three markers.
+
+<<emcontrol, echo=TRUE, eval=FALSE>>=
+# demonstrate only the syntax of control parameters
+save.em.try20 <- haplo.em(geno=geno, locus.label=label, miss.val=c(0, NA),
+ control = haplo.em.control(n.try = 20, insert.batch.size=2))
+@
+
+\subsection{Haplotype Frequencies by Group Subsets}
+To compute the haplotype frequencies for each level of a grouping
+variable, use the function {\sl haplo.group}. The following example
+illustrates the use of a binary response, {\sl y.bin}, based on {\sl resp.cat},
+which splits the subjects into two groups.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<groupBin, echo=TRUE >>=
+## run haplo.em on sub-groups
+## create ordinal and binary variables
+y.bin <- 1*(resp.cat=="low")
+group.bin <- haplo.group(y.bin, geno, locus.label=label, miss.val=0)
+print(group.bin, nlines=15)
+@
+
+\Explain The {\sl group.bin} object can be very large, depending on the number
+of possible haplotypes, so only a portion of the output is illustrated above
+(limited again by {\sl nlines}). The first section gives a short summary of
+how many subjects appear in each of the groups. The second section is a table
+with the following columns:
+\begin{itemize}
+\item The first column gives row numbers.
+\item The next columns (3 in this example) illustrate the alleles of
+the haplotypes.
+\item {\sl Total} are the estimated haplotype frequencies for the entire
+data set.
+\item The last columns are the estimated haplotype frequencies for
+the subjects in the levels of the group variable ({\sl y.bin=0} and
+{\sl y.bin=1} in this example). Note that some haplotype frequencies
+have an {\sl NA}, which appears when the haplotypes do not occur in the
+subgroups.
+\end{itemize}
+
+\section{Power and Sample Size for Haplotype Association Studies\label{hapPower}}
+
+In some circumstances, using haplotypes gives greater power than single
+markers to detect genetic association. There is little
+guidance, however, for determining sample size and power under
+different circumstances, some of which include marker type, dominance,
+and effect size. The \package\ package now includes functions to calculate
+sample size and power for haplotype association studies, which are flexible
+enough to handle these varied circumstances.
+
+Based on work in Schaid 2005\cite{Schaid 2005}, we can take a set of
+haplotypes with their population frequencies, assign a risk to a subset
+of the haplotypes, then determine either the sample size to achieve a
+stated power, or the power for a stated sample size. Sample size and power
+can be calculated for either quantitative traits or case-control studies.
+
+\subsection{Quantitative Traits: {\sl haplo.power.qt}\label{powerQT}}
+
+We assume that quantitative traits will be modeled by a linear
+regression. Some well-known tests for association between haplotypes and
+the trait include score statistics\cite{Schaid et al. 2002} and an
+F-test\cite{Zaykin}. For both types of tests, power depends on the
+amount of variance in the trait that is explained by haplotypes, or a
+multiple correlation coefficient, $R^{2}$. Rather than specifying the
+haplotype coefficients directly, we calculate the vector of
+coefficients based on an $R^{2}$ value.
+
+In the example below, we load an example set of haplotypes that contain 5
+markers, and specify the indices of the at-risk haplotypes; in this case,
+any haplotype with allele $1$ at the 2nd and 3rd markers.
+We set the first haplotype (most common) as the baseline.
+With these values we calculate the vector of coefficients for haplotype
+effects from {\sl find.haplo.beta.qt} using an $R^{2}=0.01$.
+Next, we use {\sl haplo.power.qt} to calculate the sample size for the
+set of haplotypes and their coefficients, type-I error (alpha) set to $0.05$,
+power at 80\%, and the same mean and variance used to get haplotype coefficients.
+Then we use the sample size needed for 80\% power for un-phased haplotypes
+($2,826$) to get the power for both phased and un-phased haplotypes.
+
+<<hapPowerQT, echo=TRUE >>=
+
+# load a set of haplotypes (hap-1 from Schaid 2005)
+
+data(hapPower.demo)
+
+#### an example using save.em hla markers may go like this.
+# keep <- which(save.em$hap.prob > .004) # get an index of non-rare haps
+# hfreq <- save.em$hap.prob[keep]
+# hmat <- save.em$haplotype[keep,]
+# hrisk <- which(hmat[,1]==31 & hmat[,2]==11) # contains 3 haps with freq=.01
+# hbase <- 4 # 4th hap has max freq of .103
+####
+
+## separate the haplotype matrix and the frequencies
+hmat <- hapPower.demo[,-6]
+hfreq <- hapPower.demo[,6]
+
+## Define risk haplotypes as those with "1" allele at loc2 and loc3
+hrisk <- which(hmat$loc.2==1 & hmat$loc.3==1)
+
+# define index for baseline haplotype
+hbase <- 1
+
+hbeta.list <- find.haplo.beta.qt(haplo=hmat, haplo.freq=hfreq, base.index=hbase,
+ haplo.risk=hrisk, r2=.01, y.mu=0, y.var=1)
+hbeta.list
+
+ss.qt <- haplo.power.qt(hmat, hfreq, hbase, hbeta.list$beta,
+ y.mu=0, y.var=1, alpha=.05, power=.80)
+ss.qt
+
+power.qt <- haplo.power.qt(hmat, hfreq, hbase, hbeta.list$beta,
+ y.mu=0, y.var=1, alpha=.05, sample.size=2826)
+power.qt
+@
+
+\subsection{Case-Control Studies: {\sl haplo.power.cc} \label{powerCC}}
+
+The steps to compute sample size and power for case-control studies are
+similar to the steps for quantitative traits. If we assume a
+log-additive model for haplotype effects, the haplotype
+coefficients can be specified first as odds ratios (OR), and then converted
+to logistic regression coefficients according to $log(OR)$.
+
+In the example below, we assume the same baseline and risk haplotypes
+defined in section~\ref{powerQT}, give
+the risk haplotypes an odds ratio of 1.50, and specify a population disease
+prevalence of 10\%. We also assume cases make up 50\% ({\sl case.frac})
+of the study's subjects. We first compute the sample size for this scenario for
+Type-I error (alpha) at $0.05$ and 80\% power, and then compute power for the
+sample size required for un-phased haplotypes ($4,566$).
+
+<<hapPowerCC, echo=TRUE >>=
+
+## get power and sample size for a case-control study
+## get beta vector based on odds ratios
+
+cc.OR <- 1.5
+
+# determine beta regression coefficients for risk haplotypes
+
+hbeta.cc <- numeric(length(hfreq))
+hbeta.cc[hrisk] <- log(cc.OR)
+
+# Compute sample size for stated power
+
+ss.cc <- haplo.power.cc(hmat, hfreq, hbase, hbeta.cc, case.frac=.5, prevalence=.1,
+ alpha=.05, power=.8)
+ss.cc
+
+# Compute power for given sample size
+
+power.cc <- haplo.power.cc(hmat, hfreq, hbase, hbeta.cc, case.frac=.5, prevalence=.1,
+ alpha=.05, sample.size=4566)
+power.cc
+@
+
+
+\section{Haplotype Score Tests: {\sl haplo.score} \label{hapScore}}
+
+The {\sl haplo.score} routine is used to compute score statistics to test
+association between haplotypes and a wide variety of traits,
+including binary, ordinal, quantitative, and Poisson. This function
+provides several different global and haplotype-specific tests for
+association and allows for adjustment for non-genetic covariates.
+Haplotype effects can be specified as additive,
+dominant, or recessive. This method also has an option to compute
+permutation p-values, which may be needed for sparse data when
+distribution assumptions may not be met. Details on the background and
+theory of the score statistics can be found in
+Schaid et al.\cite{Schaid et al. 2002}.
+
+\subsection{Quantitative Trait Analysis}
+First, we assess a haplotype association with a quantitative trait in
+{\sl hla.demo} called {\sl resp}. To tell {\sl haplo.score} the trait is
+quantitative, specify the parameter {\sl trait.type="gaussian"}
+(a reminder that a gaussian distribution is assumed for the error terms). The other
+arguments, all set to default values, are explained in the help file. Note that
+rare haplotypes can result in unstable variance estimates, and hence
+unreliable test statistics for rare haplotypes. We restrict the
+analysis to get scores for haplotypes with a minimum sample count using
+{\sl min.count=5}. For more explanation on handling rare haplotypes, see
+section~\ref{freqmin}. Below is an example of running {\sl haplo.score} with
+a quantitative trait, then viewing the results using the {\sl print}
+method for the {\sl haplo.score} class (again, output shortened by {\sl nlines}).
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scoregauss, echo=TRUE>>=
+# score statistics w/ Gaussian trait
+score.gaus.add <- haplo.score(resp, geno, trait.type="gaussian",
+ min.count=5,
+ locus.label=label, simulate=FALSE)
+print(score.gaus.add, nlines=10)
+@
+
+\Explain First, the model effect chosen by {\sl haplo.effect} is
+printed across the top. The section {\sl Global Score Statistics} shows results for
+testing an overall association between haplotypes and the response. The
+{\sl global-stat} has an asymptotic $\chi^{2}$ distribution, with degrees of
+freedom ({\sl df}) and {\sl p-value} as indicated. Next, {\sl Haplotype-specific
+scores} are given in a table format. The column descriptions are as follows:
+\begin{itemize}
+\item{} The first column gives row numbers.
+\item{} The next columns (3 in this example) illustrate the alleles of
+the haplotypes.
+\item{} {\sl Hap-Freq} is the estimated frequency of the haplotype in the
+pool of all subjects.
+\item{} {\sl Hap-Score} is the score for the haplotype, the results
+ are sorted by this value. Note, the score statistic should not be
+ interpreted as a measure of the haplotype effect.
+\item{} {\sl p-val} is the asymptotic $\chi^{2}_1$ p-value, calculated
+ from the square of the score statistic.
+\end{itemize}
+
+
+\subsection{Binary Trait Analysis}
+Let us assume that ``low'' responders are of primary interest, so we create
+a binary trait that has values of 1 when {\sl resp.cat} is ``low'', and 0 otherwise.
+Then in {\sl haplo.score} specify the parameter {\sl trait.type="binomial"}.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scorebinom, echo=TRUE>>=
+# scores, binary trait
+y.bin <- 1*(resp.cat=="low")
+score.bin <- haplo.score(y.bin, geno, trait.type="binomial",
+ x.adj = NA, min.count=5,
+ haplo.effect="additive", locus.label=label,
+ miss.val=0, simulate=FALSE)
+print(score.bin, nlines=10)
+@
+
+\subsection{Ordinal Trait Analysis}
+To create an ordinal trait, here we convert {\sl resp.cat} (described above) to
+numeric values, {\sl y.ord} (with levels 1, 2, 3). For {\sl haplo.score},
+use {\sl y.ord} as the response variable, and set the parameter
+{\sl trait.type = "ordinal"}.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scoreOrd, echo=TRUE >>=
+# scores w/ ordinal trait
+y.ord <- as.numeric(resp.cat)
+score.ord <- haplo.score(y.ord, geno, trait.type="ordinal",
+ x.adj = NA, min.count=5,
+ locus.label=label,
+ miss.val=0, simulate=FALSE)
+print(score.ord, nlines=7)
+@
+
+\noindent{\bf \large Warning for Ordinal Traits \vspace{.06in}\\}
+
+When analyzing an ordinal trait with adjustment for covariates (using
+the {\sl x.adj} option), the software requires the {\sl rms} package,
+distributed by Frank Harrell \cite{Harrell}. If the user
+does not have this package installed, it will not be possible
+to use the {\sl x.adj} option. However, the unadjusted scores
+for an ordinal trait (using the default option {\sl x.adj=NA}) do not
+require it. Check which packages are installed locally by entering
+{\sl library()} at the R prompt.
+
+
+\subsection{Haplotype Scores, Adjusted for Covariates}
+To adjust for covariates in {\sl haplo.score}, first set up a
+matrix of covariates from the example data. For example, use
+a column for male (1 if male; 0 if female), and
+a second column for age. Then pass the matrix to {\sl haplo.score}
+using parameter {\sl x.adj}. The results change slightly in this example.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scoreAdj, echo=TRUE >>=
+# score w/gaussian, adjusted by covariates
+x.ma <- cbind(male, age)
+score.gaus.adj <- haplo.score(resp, geno, trait.type="gaussian",
+ x.adj = x.ma, min.count=5,
+ locus.label=label, simulate=FALSE)
+print(score.gaus.adj, nlines=10)
+@
+
+
+\subsection{Plots and Haplotype Labels}
+A convenient way to view results from {\sl haplo.score} is a plot of
+the haplotype frequencies ({\sl Hap-Freq}) versus the haplotype score
+statistics ({\sl Hap-Score}). This plot, and the syntax for creating
+it, are shown in Figure~\ref{scorePlot}.
+
+Some points on the plot may be of interest. To identify
+individual points on the plot, use \mbox{\sl locator.haplo(score.gaus.add)},
+which is similar to {\sl locator()}. Use the mouse to select points on
+the plot. After points are chosen, click on the middle
+mouse button, and the points are labeled with their haplotype labels.
+Note, in constructing Figure~\ref{scorePlot}, we had to define which points
+to label, and then assign labels in the same way as done within the
+{\sl locator.haplo} function.
+
+\begin{figure}[h]
+ \begin{center}
+<<scorePlot, fig=TRUE, echo=TRUE>>=
+## plot score vs. frequency, gaussian response
+plot(score.gaus.add, pch="o")
+
+## locate and label pts with their haplotypes
+## works similar to locator() function
+#> pts.haplo <- locator.haplo(score.gaus.add)
+
+pts.haplo <- list(x.coord=c(0.05098, 0.03018, .100),
+ y.coord=c(2.1582, 0.45725, -2.1566),
+ hap.txt=c("62:2:7", "51:1:35", "21:3:8"))
+
+text(x=pts.haplo$x.coord, y=pts.haplo$y.coord, labels=pts.haplo$hap.txt)
+@
+ \caption{Haplotype Statistics: Score vs. Frequency, Quantitative
+ Response} \label{scorePlot}
+ \end{center}
+\end{figure}
+\clearpage
+
+\subsection{Skipping Rare Haplotypes\label{freqmin}}
+For {\sl haplo.score}, the {\sl skip.haplo} and {\sl min.count}
+parameters control which rare haplotypes are pooled
+into a common group. The {\sl min.count} parameter is a more recent
+addition to {\sl haplo.score}; it performs the same task as
+{\sl skip.haplo}, and is the same idea as {\sl haplo.min.count} used in
+{\sl haplo.glm.control} for {\sl haplo.glm}. As a guideline, you may wish
+to set {\sl min.count} to calculate scores for haplotypes with expected
+haplotype counts of $5$ or greater in the sample. We concentrate on
+this expected count because it adjusts to the size of the input data.
+If $N$ is the number of subjects and $f$ the haplotype frequency,
+then the expected haplotype count is \mbox{$count=2 \times N \times f$}.
+Alternatively, you can choose \mbox{$skip.haplo= \frac{count}{2 \times N}$}.
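+
+The conversion above can be checked directly in R. This small sketch is not
+part of the package's own examples; it assumes the $N=220$ subjects of
+{\sl hla.demo}, and converts a minimum expected count to a frequency
+cut-off and back:
+
+<<minCountSketch, echo=TRUE, eval=FALSE>>=
+## convert a minimum expected count to a skip.haplo frequency, N=220
+N <- 220
+min.count <- 5
+min.count / (2 * N)     # skip.haplo cut-off, about 0.0114
+
+## and back: expected count for a haplotype with frequency f
+f <- 0.02
+2 * N * f               # expected haplotype count, 8.8
+@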
+
+In the following example we try a different cut-off than before,
+{\sl min.count=10}, which corresponds to a {\sl skip.haplo}
+of $10 \div (2 \times 220) = .023$.
+In the output, note that the global statistic, degrees of freedom, and
+p-value change because fewer haplotypes are scored, while the haplotype-specific
+scores do not change.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scoremin10, echo=TRUE>>=
+# increase min.count so expected hap counts >= 10
+score.gaus.min10 <- haplo.score(resp, geno, trait.type="gaussian",
+ x.adj = NA, min.count=10,
+ locus.label=label, simulate=FALSE)
+print(score.gaus.min10)
+@
+
+\subsection{Score Statistic Dependencies:
+ the {\sl eps.svd} parameter \label{scoreEPS}}
+
+The global score test is calculated using the vector of scores and
+the generalized inverse of their variance-covariance matrix,
+performed by the {\sl Ginv} function. This function
+determines the rank of the variance matrix by its singular value
+decomposition, and an epsilon value is used as the cut-off for small
+singular values. If all of the haplotypes in the sample are scored,
+then there is dependence between them and the variance matrix is
+not of full rank. More often, however, one or more
+rare haplotypes are not scored because of low frequency. It is then not
+clear how strong the dependencies are between the remaining score statistics,
+and likewise, the rank of the variance matrix is ambiguous. For these
+instances we give the user control over the epsilon
+parameter for {\sl haplo.score} with {\sl eps.svd}.
+
+We have seen instances where the global score test had a very
+significant p-value, but none of the haplotype-specific scores showed strong
+association. In such instances, we found the default epsilon value
+in {\sl Ginv} was incorrectly considering the variance matrix as
+having full rank, and the misleading global score test was corrected when
+we increased epsilon for {\sl Ginv}. We now set the default for {\sl eps.svd}
+at $1e-5$, which seems to perform better in the troublesome circumstances
+than the previous default of $1e-6$.
+
+
+\subsection{Haplotype Model Effect \label{scoreEffect} }
+A recent addition to {\sl haplo.score} is the ability to select
+non-additive effects to score haplotypes. The possible effects for haplotypes
+are additive, dominant, and recessive. Under recessive effects, fewer
+haplotypes may be scored, because subjects are required
+to be homozygous for haplotypes. Furthermore, there would have
+to be {\sl min.count} such persons in the sample to have the recessive
+effect scored. Therefore, a recessive model should
+only be used on samples with common haplotypes. In the example below with
+the gaussian response, set the haplotype effect to dominant using the parameter
+{\sl haplo.effect = ``dominant''}. Notice the results change slightly
+compared to the {\sl score.gaus.add} results above.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scoreDom, echo=TRUE >>=
+# score w/gaussian, dominant effect
+
+score.gaus.dom <- haplo.score(resp, geno, trait.type="gaussian",
+ x.adj=NA, min.count=5,
+ haplo.effect="dominant", locus.label=label,
+ simulate=FALSE)
+print(score.gaus.dom, nlines=10)
+@
+
+\subsection{Simulation p-values \label{scoreSim}}
+When {\sl simulate=TRUE}, {\sl haplo.score} gives simulated p-values.
+Simulated haplotype score statistics are re-computed from a permuted
+re-ordering of the trait and covariates, while the ordering of the
+genotype matrix is left unchanged. The simulated p-value
+for the global score statistic
+(\mbox{\sl Global sim. p-val}) is the number of times the simulated
+global score statistic exceeds the observed, divided by the total
+number of simulations. Likewise, simulated p-value for the maximum
+score statistic (\mbox{\sl Max-stat sim. p-val}) is the number of
+times the simulated maximum haplotype score statistic exceeds the observed
+maximum score statistic, divided by the total number of simulations.
+The maximum score statistic is the maximum of the square of the
+haplotype-specific score statistics, which has an unknown distribution, so its
+significance can only be given by the simulated p-value.
+Intuitively, if only one or two haplotypes are associated with
+the trait, the maximum score statistic should have greater power to
+detect association than the global statistic.
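+
+The permutation idea can be sketched in a few lines of plain R. This toy
+example is not {\sl haplo.score} itself: it uses a simple correlation
+statistic and simulated data purely to illustrate how permuting the trait,
+while leaving the predictor fixed, yields a simulated p-value:
+
+<<permSketch, echo=TRUE, eval=FALSE>>=
+## toy permutation p-value (illustration only, not haplo.score)
+set.seed(1)
+x <- rnorm(100)
+y <- 0.2 * x + rnorm(100)
+stat <- function(y, x) abs(cor(x, y))   # stand-in for a score statistic
+obs <- stat(y, x)
+sim <- replicate(1000, stat(sample(y), x))
+mean(sim >= obs)                        # simulated p-value
+@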
+
+The {\sl score.sim.control} function manages control parameters for
+simulations. The {\sl haplo.score} function employs the simulation p-value
+precision criteria of Besag and Clifford\cite{Besag and Clifford 1991}.
+These criteria ensure that the simulated p-values for both the global
+and the maximum score statistics are precise for small p-values.
+The algorithm performs a user-defined minimum number of permutations
+({\sl min.sim}) to guarantee sufficient precision for the simulated
+p-values for score statistics of individual haplotypes. Permutations
+beyond this minimum are then conducted until the sample standard
+errors for simulated p-values for both the {\sl global-stat} and {\sl
+ max-stat} score statistics are less than a threshold
+\mbox{({\sl p.threshold * p-value})}. The default value for
+\mbox{{\sl p.threshold= $\frac{1}{4}$}} provides a two-sided 95\%
+confidence interval for the p-value with a width that is approximately
+as wide as the p-value itself. Effectively, simulations are
+more precise for smaller p-values. The following example illustrates
+computation of simulation p-values with {\sl min.sim=1000}.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<scorePerm, echo=TRUE >>=
+# simulations when binary response
+score.bin.sim <- haplo.score(y.bin, geno, trait.type="binomial",
+ x.adj = NA, locus.label=label, min.count=5,
+ simulate=TRUE, sim.control = score.sim.control() )
+print(score.bin.sim)
+@
+
+\section{Regression Models: {\sl haplo.glm} \label{hapGLM}}
+The {\sl haplo.glm} function computes the regression of a trait on
+haplotypes, and possibly other covariates and their interactions with
+haplotypes. We currently support the gaussian, binomial, and Poisson
+families of traits with their canonical link functions. The effects
+of haplotypes on the link function can be modeled as either additive,
+dominant (heterozygotes and homozygotes for a particular haplotype
+assumed to have equivalent effects), or recessive (homozygotes of a
+particular haplotype considered to have an alternative effect on the
+trait). The basis of the algorithm is a two-step iteration process;
+the posterior probabilities of pairs of haplotypes per subject are
+used as weights to update the regression coefficients, and the
+regression coefficients are used to update the haplotype posterior
+probabilities. See Lake et al.\cite{Lake et al. 2003} for details.
+
+\subsection{New and Updated Methods for haplo.glm}
+
+We initially wrote {\sl haplo.glm} without much effort to make it work with
+R's glm class methods. We have now refined the {\sl haplo.glm} class to
+look and act as much like a glm object as possible, with methods defined
+specifically for the {\sl haplo.glm} class. We provide print and
+summary methods that make use of the corresponding methods
+for glm and then add extra information for the haplotypes and their
+frequencies. Furthermore, we have defined for the {\sl haplo.glm} class
+some of the standard methods for
+regression fits, including residuals, fitted.values, vcov, and anova. We
+describe the challenges that haplotype regression presents with these methods
+in section~\ref{glmMethods}.
+
+\subsection{Preparing the {\sl data.frame} for {\sl haplo.glm} \label{glmSetup}}
+A critical distinction between {\sl haplo.glm} and all other functions
+in Haplo Stats is that the definition of the regression model follows
+the S/R formula standard (see {\sl lm} or {\sl glm}). So, a
+{\sl data.frame} must be defined, and this object must contain the trait and
+other optional covariates, plus a special kind of genotype matrix
+({\sl geno.glm} for this example) that contains the genotypes of the
+marker loci. We require the genotype matrix to be prepared using
+{\sl setupGeno()}, which handles character, numeric, or factor alleles,
+and keeps the columns of the genotype matrix as a single unit when
+inserting into (and extracting from) a {\sl data.frame}. The
+{\sl setupGeno} function recodes all missing genotype value codes given
+by {\sl miss.val} to NA, and also recodes alleles to integer values.
+The original allele codes are preserved within an attribute of {\sl geno.glm},
+and are utilized within {\sl haplo.glm}. The returned object has class
+{\sl model.matrix}, and it can be included in a {\sl data.frame} to
+be used in {\sl haplo.glm}. In the example below we prepare a
+genotype matrix, {\sl geno.glm}, and create a {\sl data.frame} object,
+{\sl glm.data}, for use in {\sl haplo.glm}.
+
+<<glmGeno, echo=TRUE>>=
+# set up data for haplo.glm, include geno.glm,
+# covariates age and male, and responses resp and y.bin
+geno <- hla.demo[,c(17,18,21:24)]
+geno.glm <- setupGeno(geno, miss.val=c(0,NA), locus.label=label)
+attributes(geno.glm)
+y.bin <- 1*(resp.cat=="low")
+glm.data <- data.frame(geno.glm, age=age, male=male, y=resp, y.bin=y.bin)
+@
+
+\subsection{Rare Haplotypes \label{rareHap}}
+The issue of deciding which haplotypes to use for association is
+critical in {\sl haplo.glm}. By default it will model a rare
+haplotype effect so that the effects of the other haplotypes are in
+reference to the baseline effect of the most common haplotype. The
+rules for choosing haplotypes to be modeled in {\sl haplo.glm} are
+similar to the rules in {\sl haplo.score}: by a minimum frequency or a
+minimum expected count in the sample.
+
+Two control parameters in {\sl haplo.glm.control} may be used to
+control this setting: {\sl haplo.freq.min} may be set to a selected
+minimum haplotype frequency, and {\sl haplo.min.count} may be set to select
+the cut-off for minimum expected haplotype count in the sample. The
+default minimum frequency cut-off in {\sl haplo.glm} is set to
+$0.01$. More discussion on rare haplotypes takes place in
+section~\ref{controlRare}.
+
+\subsection{Regression for a Quantitative Trait \label{glmQuant}}
+
+The following illustrates how to fit a regression of a quantitative
+trait {\sl y} on the haplotypes estimated from the {\sl geno.glm} matrix,
+and the covariate {\sl male}. For {\sl na.action}, we use
+{\sl na.geno.keep}, which keeps a subject with missing values in the genotype
+matrix if they are not missing all alleles, but removes subjects
+with missing values ({\sl NA}) in either the response or covariate.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<glmGauss, echo=TRUE>>=
+# glm fit with haplotypes, additive gender covariate on gaussian response
+fit.gaus <- haplo.glm(y ~ male + geno.glm, family=gaussian, data=glm.data,
+ na.action="na.geno.keep", locus.label = label, x=TRUE,
+ control=haplo.glm.control(haplo.freq.min=.02))
+
+summary(fit.gaus)
+@
+
+\Explain The new summary function for {\sl haplo.glm} shows much the same
+information as summary for glm objects with the extra table for the haplotype
+frequencies. The above table for {\sl Coefficients} lists the estimated regression
+coefficients ({\sl coef}), standard errors ({\sl se}), the
+corresponding t-statistics ({\sl t.stat}), and p-values ({\sl pval}).
+The labels for haplotype coefficients are a concatenation of the name of the
+genotype matrix ({\sl geno.glm}) and unique haplotype codes assigned
+within {\sl haplo.glm}. The haplotypes corresponding to these haplotype
+codes are listed in the {\sl Haplotypes} table, along with the estimates of the
+haplotype frequencies ({\sl hap.freq}). The rare haplotypes, those with
+expected counts less than \mbox{\sl haplo.min.count=5} (equivalent to
+having frequencies less than \mbox{{\sl haplo.freq.min =} %.01136
+\Sexpr{5/(2*nrow(glm.data))}} in the
+above example), are pooled into a single category labeled {\sl geno.glm.rare}.
+The haplotype chosen as the baseline category for the design matrix (most
+frequent haplotype is the default) is labeled as {\sl haplo.base}; more
+information on the baseline may be found in section \ref{glmBaseline}.
+
+
+\subsection{Fitting Haplotype x Covariate Interactions}
+Interactions are fit by the standard S-language model syntax, using
+a '$*$' in the model formula to indicate main effects and interactions.
+Some other formula constructs are not supported, so use the formula
+parameter with caution. Below is an example of modeling the
+interaction of {\sl male} and the haplotypes. Because more terms will
+be estimated in this case, we limit how many haplotypes will be
+included by increasing {\sl haplo.min.count} to 10.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<glmInter, echo=TRUE >>=
+# glm fit haplotypes with covariate interaction
+fit.inter <- haplo.glm(formula = y ~ male * geno.glm,
+ family = gaussian, data=glm.data,
+ na.action="na.geno.keep",
+ locus.label = label,
+ control = haplo.glm.control(haplo.min.count = 10))
+summary(fit.inter)
+@
+
+\Explain The listed results are as explained under section~\ref{glmQuant}.
+The main difference is that the interaction coefficients are labeled as a
+concatenation of the covariate ({\sl male} in this example) and the
+name of the haplotype, as described above. In addition, estimates may
+differ because the model has changed.
+
+\subsection{Regression for a Binomial Trait \label{glmBinom}}
+Next we illustrate the fitting of a binomial trait with the same
+genotype matrix and covariate.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<glmBinom, echo=TRUE>>=
+# gender and haplotypes fit on binary response,
+# return model matrix
+fit.bin <- haplo.glm(y.bin ~ male + geno.glm, family = binomial,
+ data=glm.data, na.action = "na.geno.keep",
+ locus.label=label,
+ control = haplo.glm.control(haplo.min.count=10))
+summary(fit.bin)
+@
+
+\Explain The underlying methods for {\sl haplo.glm} are based on a prospective
+likelihood. Normally, this type of likelihood works well for
+case-control studies with standard covariates. For ambiguous
+haplotypes, however, one needs to be careful when interpreting
+the results from fitting {\sl haplo.glm} to case-control data. Because
+cases are over-sampled, relative to the population prevalence (or
+incidence, for incident cases), haplotypes associated with disease
+will be over-represented in the case sample, and so estimates
+of haplotype frequencies will be biased. Positively associated
+haplotypes will have haplotype frequency estimates that are higher
+than the population haplotype frequency. To avoid this problem, one
+can weight each subject. The weights for the cases should be the
+population prevalence, and the weights for controls should be 1
+(assuming the disease is rare in the population, and controls are
+representative of the general population). See Stram et al.\cite{Stram et al.
+2003} for background on using weights, and see the help file for
+{\sl haplo.glm} for how to implement weights.
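+
+As a rough sketch of the weighting scheme just described (the prevalence
+value here is hypothetical, and {\sl help(haplo.glm)} documents the actual
+mechanism for supplying weights), the weight vector might be built as:
+
+<<ccWeightSketch, echo=TRUE, eval=FALSE>>=
+## hypothetical sketch: weight cases by an assumed population
+## prevalence and controls by 1 (Stram et al.); see help(haplo.glm)
+prevalence <- 0.1
+wts <- ifelse(y.bin == 1, prevalence, 1)
+@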
+
+The estimated regression coefficients for case-control studies can be
+biased by either a large amount of haplotype ambiguity and
+mis-specified weights, or by departures from Hardy-Weinberg
+Equilibrium of the haplotypes in the pool of cases and controls.
+Generally, the bias is small, but tends to be towards the null of no
+association. See Stram et al. \cite{Stram et al. 2003} and Epstein and
+Satten \cite{Epstein and Satten 2003} for further details.
+
+\subsubsection{Caution on Rare Haplotypes with Binomial Response \label{rarebinom}}
+If a rare haplotype occurs only in cases or only in controls, the
+fitted values go to 0 or 1, and R issues a
+warning. Also, the coefficient estimate for that haplotype diverges
+to positive or negative infinity. If the default {\sl haplo.min.count=5}
+were used above, this warning would appear. To keep this from
+occurring in other model fits, increase the minimum count or minimum frequency.
+
+\subsection{Control Parameters \label{glmControl}}
+Additional parameters are handled using {\sl control}, which is a list of
+parameters providing additional functionality in {\sl haplo.glm}. This list
+is set up by the function {\sl haplo.glm.control}. See the help
+file ({\sl help(haplo.glm.control)}) for a full list of control
+parameters, with details of their usage. Some of the options are
+described here.
+
+\subsubsection{Controlling Genetic Models: {\sl haplo.effect}}
+The {\sl haplo.effect} control parameter for {\sl haplo.glm}
+instructs whether the haplotype effects are fit as additive, dominant,
+or recessive. That is, {\sl haplo.effect} determines whether the
+covariate ($x$) coding of haplotypes follows the values in Table 1
+for each effect type. Heterozygous means a subject has one copy of a
+particular haplotype, and homozygous means a subject has two copies of
+a particular haplotype. \\ \\ \\ \\
+
+\noindent {\bf Table 1:} Coding haplotype covariates in a model matrix\\
+\begin{center}
+\begin{tabular}{|r|c|c|c|}
+\hline
+Hap - Pair & additive & dominant & recessive\\ \hline \hline
+Heterozygous & 1 & 1 & 0 \\ \hline
+Homozygous & 2 & 1 & 1 \\ \hline
+\end{tabular}
+\end{center}
+
+\noindent Note that in a recessive model, the haplotype effects are
+estimated only from subjects who are homozygous for a haplotype.
+Some of the haplotypes that meet the {\sl haplo.freq.min} and
+{\sl haplo.min.count} cut-offs may occur as homozygous in only a few of the
+subjects. As stated in section~\ref{scoreEffect}, recessive models should only
+be used when the region has multiple common haplotypes.
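+
+The codings in Table 1 can be reproduced from a subject's count of copies
+of a particular haplotype (0, 1, or 2); a brief sketch:
+
+<<hapCodingSketch, echo=TRUE, eval=FALSE>>=
+## x = copies of a given haplotype carried by each subject
+x <- c(0, 1, 2)
+additive  <- x                  # 0, 1, 2
+dominant  <- 1 * (x >= 1)       # 0, 1, 1
+recessive <- 1 * (x == 2)       # 0, 0, 1
+@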
+
+The default {\sl haplo.effect} is {\sl additive}, whereas the example below
+illustrates the fit of a {\sl dominant} effect of haplotypes for the
+gaussian trait with the gender covariate.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<glmDom, echo=TRUE >>=
+# control dominant effect of haplotypes (haplo.effect)
+# by using haplo.glm.control
+fit.dom <- haplo.glm(y ~ male + geno.glm, family = gaussian,
+ data = glm.data, na.action = "na.geno.keep",
+ locus.label = label,
+ control = haplo.glm.control(haplo.effect='dominant',
+ haplo.min.count=8))
+
+summary(fit.dom)
+@
+
+\subsubsection{Selecting the Baseline Haplotype \label {glmBaseline} }
+The haplotype chosen for the baseline in the model is the one with the
+highest frequency. Sometimes the most frequent haplotype may be an at-risk
+haplotype, and so the measure of its effect is desired. To specify a
+more appropriate haplotype as the baseline in the binomial example, choose
+from the list of other common haplotypes, {\sl fit.bin\$haplo.common}. To specify an
+alternative baseline, such as haplotype 77, use the control parameter
+{\sl haplo.base} and haplotype code, as in the example below.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<glmBaseline, echo=TRUE >>=
+# control baseline selection, perform the same exact run as fit.bin,
+# but different baseline by using haplo.base chosen from haplo.common
+fit.bin$haplo.common
+fit.bin$haplo.freq.init[fit.bin$haplo.common]
+fit.bin.base77 <- haplo.glm(y.bin ~ male + geno.glm, family = binomial,
+ data = glm.data, na.action = "na.geno.keep",
+ locus.label = label,
+ control = haplo.glm.control(haplo.base=77,
+ haplo.min.count=8))
+summary(fit.bin.base77)
+@
+
+\Explain The above model has the same haplotypes as
+{\sl fit.bin}, except haplotype $4$, the old baseline, now has an
+effect estimate while haplotype $77$ is the new baseline.
+Due to randomness in the starting values of the haplotype
+frequency estimation, different runs of {\sl haplo.glm} may result in a
+different set of haplotypes meeting the minimum counts requirement for being
+modeled. Therefore, once you have arrived at a suitable model, and you wish
+to modify it by changing baseline and/or effects, you can make results
+consistent by controlling the randomness using {\sl set.seed}, as described
+in section \ref{seed}. In this document, we use the same seed before
+making {\sl fit.bin} and {\sl fit.bin.base77}.
+
+
+\subsubsection{Rank of Information Matrix and eps.svd ({\bf NEW}) \label{glmSVD}}
+
+Similar to recent additions to {\sl haplo.score} in section~\ref{scoreEPS},
+we give the user control over the epsilon parameter determining the number of
+singular values when determining the rank of the information matrix in
+{\sl haplo.glm}. Finding the generalized inverse of this matrix can be
+problematic when either the response variable or a covariate has a large variance
+and is not scaled before being passed to {\sl haplo.glm}. The rank of the information
+matrix is determined by the number of singular values larger than a small cutoff,
+epsilon. When the singular values for the
+coefficients are on a larger numeric scale than those for the haplotype
+frequencies, the generalized inverse may incorrectly determine the information
+matrix is not of full rank. Therefore, we allow the user to specify the epsilon
+as {\sl eps.svd} in the control parameters for {\sl haplo.glm}.
+A simpler fix, which we strongly suggest, is for the user to pre-scale any
+continuous responses or covariates with a large variance.
+
+Here we demonstrate what happens when we multiply a gaussian
+response by $50$, increasing its variance by a factor of $2500$. We see that the
+coefficients are all highly significant
+and the rank of the information matrix is much smaller than for the original gaussian fit.
+
+<<eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+
+<<glmEPS1, eval=TRUE>>=
+glm.data$ybig <- glm.data$y*50
+fit.gausbig <- haplo.glm(formula = ybig ~ male + geno.glm, family = gaussian,
+ data = glm.data, na.action = "na.geno.keep", locus.label = label,
+ control = haplo.glm.control(haplo.freq.min = 0.02), x = TRUE)
+
+summary(fit.gausbig)
+
+fit.gausbig$rank.info
+fit.gaus$rank.info
+@
+
+Now we set a smaller value for the {\sl eps.svd} control parameter and find the fit
+matches the original Gaussian fit.
+
+<<echo=FALSE, eval=TRUE>>=
+set.seed(seed)
+@
+<<glmEPS2>>=
+
+fit.gausbig.eps <- haplo.glm(formula = ybig ~ male + geno.glm, family = gaussian,
+ data = glm.data, na.action = "na.geno.keep", locus.label = label,
+ control = haplo.glm.control(eps.svd=1e-10, haplo.freq.min = 0.02), x = TRUE)
+
+summary(fit.gausbig.eps)
+fit.gausbig.eps$rank.info
+@
+
+
+\subsubsection{Rare Haplotypes and {\sl haplo.min.info} \label{controlRare}}
+
+Another notable control parameter is the minimum frequency for a rare
+haplotype to be included in the calculations for standard error (se)
+of the coefficients, or {\sl haplo.min.info}. The default value is
+$0.001$, which means that haplotypes with frequency less
+than that will be part of the rare haplotype coefficient estimate,
+but it will not be used in the standard error calculation.
+
+The following example demonstrates a possible result when dealing with
+the rare haplotype effect. Using the hla genotype data, we show one
+consequence of this setting. We force it to happen by
+setting {\sl haplo.freq.min} equal to {\sl haplo.min.info}, a practice we
+strongly advise against in real analyses.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<glmRare2, echo=TRUE >>=
+## set haplo.freq.min and haplo.min.info to same value to show how the
+## rare coefficient may be modeled but standard error estimate is not
+## calculated because all haps are below haplo.min.info
+
+fit.bin.rare02 <- haplo.glm(y.bin ~ geno.glm, family = binomial,
+ data = glm.data, na.action = "na.geno.keep", locus.label = label,
+ control = haplo.glm.control(haplo.freq.min=.02, haplo.min.info=.02))
+summary(fit.bin.rare02)
+@
+
+\Explain The above results show the standard error for the rare
+haplotype coefficient is ``NaN'', or ``Not a Number'' in R, which is a
+consequence of having most, or all, of the rare haplotypes discarded for the
+standard error estimate. In other datasets there may be only a few
+haplotypes between {\sl haplo.min.info} and {\sl haplo.freq.min}, which
+may yield misleading results for the rare haplotype coefficient. For
+this reason, we recommend that any inference on the rare
+haplotypes be made with caution, if at all.
+
+
+\section{Methods for {\sl haplo.glm} {\bf (NEW)} \label{glmMethods}}
+
+The latest update to \package\ is our work to make {\sl haplo.glm}
+act more like a glm object, with methods to compare and assess model
+fits. In this section we describe the challenges and caveats of defining
+these methods for a {\sl haplo.glm} object and show how to use them.
+
+
+\subsection{fitted.values}
+
+A challenge when defining methods for {\sl haplo.glm} is accounting for
+the ambiguity in a person's haplotype pair. To handle this in the glm framework, the
+response and non-haplotype covariates are expanded for each person, with the
+posterior probability of each haplotype pair given the genotype used as a weight.
+The returned object from {\sl haplo.glm} looks somewhat like a regular glm,
+but the model matrix, response, and thus the fitted values, are all expanded.
+Users who want to work with the expanded versions of those items are welcome
+to access them from the returned object.
+
+We now provide a method to get the fitted values for each person,
+{\sl fitted.haplo.glm}. These collapsed fitted values are calculated
+by a weighted sum of the expanded fitted values for each person where
+the weights are the posterior probabilities of the person's expanded
+haplotype pairs.
+
+\subsection{residuals}
+
+The residuals within the {\sl haplo.glm} object are also expanded over the
+haplotype pairs for each subject. We provide {\sl residuals.haplo.glm} to
+get the collapsed deviance, Pearson, working, and response residuals for
+each person. Because we have not implemented a predict method for
+{\sl haplo.glm}, the method does not calculate partial residuals.
+
+\subsection{vcov}
+
+We provide {\sl vcov.haplo.glm} as a method to get the variance-covariance matrix of
+model parameters in the {\sl haplo.glm} object. Unlike the standard glm object, this
+matrix is computed and retained in the returned object, because the model
+parameters include both the model coefficients and the haplotype frequencies,
+and the matrix is computationally intensive to compute.
+
+We show how to get the variance matrix for all the parameters and for only the model
+coefficients.
+
+<<vcovGLM>>=
+
+varmat <- vcov(fit.gaus)
+dim(varmat)
+
+varmat <- vcov(fit.gaus, freq=FALSE)
+dim(varmat)
+print(varmat, digits=2)
+@
+
+\subsection{anova and Model Comparison}
+
+We use the {\sl anova.glm} method as a framework for {\sl anova.haplo.glm} to allow
+comparisons of model fits. We limit the model comparisons to multiple
+nested model fits, which requires that each model to be compared be either a
+{\sl haplo.glm} or {\sl glm} fitted object. We eliminate the functionality of testing
+sub-models of a single fit because removal of a single covariate would require
+re-fitting the reduced model to get updated coefficient and haplotype frequency
+estimates with a maximized log-likelihood. We decided to simplify the usage and
+require that all models to be compared are fully fitted.
+
+As with the {\sl anova.glm} method, it is difficult to check for truly nested models,
+so we pass the responsibility on to the user. We discuss some of the requirements.
+
+One type of two-model comparison is between models with haplotypes (expanded subjects)
+and a reduced model without haplotypes. We check for the same sample size in these models
+by comparing the collapsed sample size from a {\sl haplo.glm} fit to the sample size from
+the {\sl glm} fit, which we remind users is only a loose check of model comparability.
+
+The other comparison of two models in {\sl anova.haplo.glm} is to compare two models
+that contain the same genotypes, and inherently the same haplotypes. This is
+trickier because a subject may not have the same expanded set of possible haplotype pairs
+across two fits of {\sl haplo.glm} unless the same seed is set before both fits.
+Even if the seed is the same, the other effects that differ between the two models
+will affect the haplotype frequency estimates, and may still result in a different
+expansion of haplotype pairs per subject. Our check of the collapsed sample size for the
+two models still applies with the same pitfalls, but a better assurance of model
+comparability is to use the same seed.
+
+In the {\sl haplo.glm} fit we provide the likelihood ratio test of the null model
+against the full model, which is the most appropriate
+test available for {\sl haplo.glm} objects, but it is difficult to compare the log-likelihood
+across two {\sl haplo.glm} fits. Therefore, we remain consistent with
+glm model comparison \cite{McCullaghNelder}, and use the difference in deviance to
+compare models. Furthermore, we restrict the asymptotic test for model
+comparison to be the $\chi^{2}$ test for goodness of fit.
+
+Below we show how to get the LRT from the {\sl fit.gaus} result, then show how to
+compare some of the nested models fit above, including a regular
+glm fit of $y \sim male$. The anova method requires the nested model to be given first,
+and any anova with a {\sl haplo.glm} object should explicitly call {\sl anova.haplo.glm}.
+
+<<anovaGLM>>=
+fit.gaus$lrt
+
+glmfit.gaus <- glm(y~male, family="gaussian", data=glm.data)
+
+anova.haplo.glm(glmfit.gaus, fit.gaus)
+anova.haplo.glm(fit.gaus, fit.inter)
+anova.haplo.glm(glmfit.gaus, fit.gaus, fit.inter)
+@
+
+
+
+\section{Extended Applications}
+The following functions are designed to wrap the functionality of the
+major functions in Haplo Stats into other useful applications.
+
+\subsection{Combine Score and Group Results:
+{\sl haplo.score.merge}}
+When analyzing a qualitative trait, such as a binary trait, it can be helpful
+to align the results from {\sl haplo.score} with those from {\sl haplo.group}.
+To do so, use the function {\sl haplo.score.merge}, as illustrated in
+the following example:
+
+<<scoreMerge,echo=TRUE>>=
+# merge haplo.score and haplo.group results
+merge.bin <- haplo.score.merge(score.bin, group.bin)
+print(merge.bin, nlines=10)
+@
+
+\Explain The first column is a row index, the next columns (3 in this example)
+display the haplotype alleles, the {\sl Hap.Score} column is the score
+statistic, and {\sl p.val} is the corresponding $ \chi^{2} $ p-value.
+{\sl Hap.Freq} is the haplotype frequency for the total sample, and
+the remaining columns are the estimated haplotype frequencies
+for each of the group levels ({\sl y.bin} in this example). The default
+print method only prints results for haplotypes appearing in the
+{\sl haplo.score} output. To view all haplotypes, use the print option
+{\sl all.haps=TRUE}, which prints all haplotypes from the
+{\sl haplo.group} output. The output is ordered by the score
+statistic, but the {\sl order.by} parameter can specify ordering by
+haplotypes or by haplotype frequencies.
+
+
+\subsection{Case-Control Haplotype Analysis: {\sl haplo.cc} \label{hapCC} }
+It is possible to combine the results of {\sl haplo.score},
+{\sl haplo.group}, and {\sl haplo.glm} for case-control data, all
+performed within {\sl haplo.cc}. The function performs a score test and
+fits a glm on the same haplotypes. The parameters that determine which
+haplotypes are used are {\sl haplo.min.count} and {\sl haplo.freq.min},
+which are set in the {\sl control} parameter, as done for {\sl
+ haplo.glm}. This is a change from previous versions, where
+{\sl haplo.min.count} was in the parameter list for {\sl haplo.cc}.
+
+Below we run {\sl haplo.cc} setting the minimum haplotype
+frequency at $0.02$. The print results are shown, in addition to the
+names of the objects stored in the {\sl cc.hla} result.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<hapCC, echo=TRUE >>=
+
+# demo haplo.cc where haplo.min.count is specified
+# use geno, and this function prepares it for haplo.glm
+y.bin <- 1*(hla.demo$resp.cat=="low")
+cc.hla <- haplo.cc(y=y.bin, geno=geno, locus.label = label,
+ control=haplo.glm.control(haplo.freq.min=.02))
+
+print(cc.hla, nlines=25, digits=2)
+names(cc.hla)
+@
+
+\Explain First, from the names function we see that {\sl cc.hla} also contains
+{\sl score.lst} and {\sl fit.lst}, which are the {\sl haplo.score} and
+{\sl haplo.glm} objects, respectively.
+For the printed results of {\sl haplo.cc}, first are the global statistics
+from {\sl haplo.score}, followed by cell counts for cases and
+controls. The last portion of the output is a data frame containing combined
+results for individual haplotypes:
+
+\begin{itemize}
+\item{} {\bf Hap-Score:}{\ haplotype score statistic}
+\item{} {\bf p-val:}{\ haplotype score statistic p-value}
+\item{} {\bf sim p-val:}{\ (if simulations performed) simulated p-value for the haplotype score statistic}
+\item{} {\bf pool.hf:}{\ haplotype frequency for the pooled sample}
+\item{} {\bf control.hf:}{\ haplotype frequencies for the control sample only}
+\item{} {\bf case.hf:}{\ haplotype frequencies for the case sample only}
+\item{} {\bf glm.eff:}{\ one of three ways the haplotype appeared in the
+ glm model: {\sl Eff}: modeled as an effect; {\sl Base}: part of the
+ baseline; and {\sl R}: a rare haplotype, included in the effect of
+ pooled rare haplotypes}
+\item{} {\bf OR.lower:}{\ Odds Ratio confidence interval
+ lower limit}
+\item{} {\bf OR:}{\ Odds Ratio for each effect in the model}
+\item{} {\bf OR.upper:}{\ Odds Ratio confidence interval upper limit}
+\end{itemize}
+
+Significance levels are indicated by the p-values for the score statistics,
+and the odds ratio (OR) confidence intervals for the haplotype effects. Note
+that the Odds Ratios are effect sizes of haplotypes, assuming haplotype
+effects are multiplicative. Since this last table has many columns,
+lines are wrapped in the output in this manual. You can align wrapped
+lines by the haplotype code which appears on the far left.
+Alternatively, instruct the print function to only print {\sl digits}
+significant digits, and set the width settings for output in your session
+using the {\sl options()} function.
+
+
+\subsection{Score Tests on Sub-Haplotypes: {\sl haplo.score.slide}}
+To evaluate the association of sub-haplotypes (subsets of alleles from
+the full haplotype) with a trait, the user can evaluate a "window" of
+alleles by {\sl haplo.score}, and slide this window across the entire
+haplotype. This procedure is implemented by the function
+{\sl haplo.score.slide}. To illustrate this method, we use all 11 loci
+in the demo data, {\sl hla.demo}.
+
+First, make the geno matrix and the locus labels for the 11 loci.
+Then use {\sl haplo.score.slide} for a window of 3 loci
+({\sl n.slide=3}), which will slide along the haplotype for all 9
+contiguous subsets of size 3, using the previously defined gaussian trait
+{\sl resp}.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+
+<<label=scoreSlide, echo=TRUE, eval=TRUE >>=
+# haplo.score on 11 loci, slide on 3 consecutive loci at a time
+geno.11 <- hla.demo[,-c(1:4)]
+label.11 <- c("DPB","DPA","DMA","DMB","TAP1","TAP2","DQB","DQA","DRB","B","A")
+score.slide.gaus <- haplo.score.slide(hla.demo$resp, geno.11, trait.type =
+ "gaussian", n.slide=3, min.count=5, locus.label=label.11)
+print(score.slide.gaus)
+@
+
+\Explain The first column is the row index of the nine calls to
+{\sl haplo.score}, the second column is the number of the starting
+locus of the sub-haplotype, and the third column is the global score
+statistic p-value for each call. The last two columns are the
+simulated p-values for the global and maximum score statistics,
+respectively. If you specify {\sl simulate=TRUE} in the function
+call, the simulated p-values will be computed and shown.
+
+\subsubsection{Plot Results from {\sl haplo.score.slide}}
+The results from {\sl haplo.score.slide} can be easily viewed in a plot
+shown in Figure~\ref{slidePlot} below.
+The x-axis has tick marks for each locus, and the y-axis is the
+$ -log_{10}(pval) $. To select which p-value to plot, use the
+parameter pval, with choices "{\sl global}", "{\sl global.sim}", and
+"{\sl max.sim}" corresponding to p-values described above. If the
+simulated p-values were not computed, the default is to plot the
+global p-values. For each p-value, a horizontal line is drawn at the
+height of $ -log_{10}(pval) $\ across the loci over which it was
+calculated. For example, the p-value {\sl score.global.p = 0.009963}
+for loci 8-10 is plotted as a horizontal line at $ y=2.002 $
+spanning the $8^{th}$, $9^{th}$, and $10^{th}$ x-axis tick marks.
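As a quick arithmetic check of the plotted height for that example (a one-line
computation, shown in Python purely for illustration):

```python
import math

# height of the horizontal line for loci 8-10: -log10 of its global p-value
y = -math.log10(0.009963)
print(round(y, 3))  # 2.002
```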
+
+\begin{figure}[ht]
+<<label=plotSlide, fig=TRUE, echo=TRUE >>=
+# plot global p-values for sub-haplotypes from haplo.score.slide
+plot.haplo.score.slide(score.slide.gaus)
+@
+ \caption{Global p-values for sub-haplotypes;
+ Gaussian Response} \label{slidePlot}
+\end{figure}
+\clearpage
+
+\subsection{Scanning Haplotypes Within a Fixed-Width Window: {\sl haplo.scan}}
+Another method to search for a candidate locus within a genome region
+is {\sl haplo.scan}, an implementation of the method proposed in Cheng
+et al. 2005 \cite{Cheng2005}. This method searches for a region for which the haplotypes
+have the strongest association with a binary trait by sliding a window of fixed
+width over each marker locus, and then scans over all haplotype lengths within
+each window. This latter step, scanning over all possible haplotype
+lengths within a window, distinguishes {\sl haplo.scan} from
+{\sl haplo.score.slide} (which considers only the maximum haplotype
+length within a window). To account for unknown linkage phase, the
+function {\sl haplo.em} is called prior to scanning, to create a list of
+haplotype pairs and posterior probabilities. To illustrate the scanning
+window, consider a 10-locus dataset. When placing a window of
+width 3 over locus 5, the possible haplotype lengths that contain locus 5
+are three (loci 3-4-5, 4-5-6, and 5-6-7), two (loci 4-5 and 5-6) and one
+(locus 5).
+For each of these loci subsets a score statistic is computed, which is
+based on the difference between the mean vector of haplotype counts
+for cases and that for controls. The maximum of these score
+statistics, over all possible haplotype lengths within a window, is
+the locus-specific test statistic, or the locus scan statistic. The global
+test statistic is the maximum over all computed score statistics. To
+compute p-values, the case/control status is randomly permuted. Below we
+run {\sl haplo.scan} on the 11-locus HLA dataset with a binary
+response and a window width of $3$. Because all $11$ loci are highly
+polymorphic and have many missing alleles, we first use the results of
+{\sl summaryGeno} to restrict the analysis to subjects with fewer than
+$50,000$ possible haplotype pairs, to speed the calculations.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+
+<<hapScan, echo=TRUE, eval=TRUE >>=
+geno.11 <- hla.demo[,-c(1:4)]
+y.bin <- 1*(hla.demo$resp.cat=="low")
+hla.summary <- summaryGeno(geno.11, miss.val=c(0,NA))
+
+# track those subjects with too many possible haplotype pairs ( > 50,000)
+many.haps <- (1:length(y.bin))[hla.summary[,4] > 50000]
+
+# For speed, or even just so it will finish, make y.scan and geno.scan
+# for genotypes that don't have too many ambiguous haplotypes
+geno.scan <- geno.11[-many.haps,]
+y.scan <- y.bin[-many.haps]
+
+# scan haplotypes for regions within width of 3 for each locus.
+# test statistic measures difference in haplotype counts in cases and controls
+# p-values are simulated for each locus and the maximum statistic;
+# we do 100 simulations here, but default settings should be used for analysis
+
+scan.hla <- haplo.scan(y.scan, geno.scan, width=3,
+ sim.control=score.sim.control(min.sim=100, max.sim=100),
+ em.control=haplo.em.control())
+
+print(scan.hla)
+@
+
+\Explain In the output we report the simulated p-values for each
+locus test statistic. Additionally, we report the loci (or locus)
+which provided the maximum observed test statistic, and the
+{\sl Max-Stat Simulated Global p-value} is the simulated p-value for that
+maximum statistic. We print the number of simulations, because they
+are performed until p-value precision criteria are met, as described
+in section~\ref{scoreSim}. We would typically allow simulations to
+run under default parameters rather than limiting to $100$ by the
+control parameters.
+
+\subsection{Sequential Haplotype Scan Methods: {\sl seqhap} \label{seqhap}}
+
+Another approach for choosing loci for haplotype associations is by
+{\sl seqhap}, as described in Yu and Schaid, 2007 \cite{Yu}.
+The {\sl seqhap} method performs three tests for association
+of a binary trait over a set of bi-allelic loci. When evaluating each
+locus, loci close to it are added in a sequential manner based on the
+Mantel-Haenszel test \cite{Mantel}. For each marker locus, three tests
+are provided:
+
+\begin{itemize}
+\item{\bf single locus}, the traditional single-locus $\chi^{2}_1$ test
+ of association,
+\item{\bf sequential haplotype}, based on a haplotype test for sequentially
+ chosen loci,
+\item{\bf sequential sum}, based on the sum of a series of conditional
+ $\chi^{2}$ statistics.
+\end{itemize}
+
+All three tests are assessed for significance with permutation
+p-values, in addition to the asymptotic p-value. The point-wise p-value
+for a statistic at a locus is the fraction of times that the statistic
+for the permuted data is larger than that for the observed data. The
+regional p-value is the chance of observing a permuted test statistic,
+maximized over a region, that is greater than that for the observed
+data.
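Both permutation p-values are simple to state computationally. A sketch
(Python, illustrative only; the inputs {\sl obs}, the observed statistic per
locus, and {\sl perm}, one replicate per permutation, are hypothetical):

```python
def perm_pvalues(obs, perm):
    """Point-wise and regional permutation p-values.

    obs:  observed statistic at each locus.
    perm: one list per permuted replicate, parallel to obs."""
    n = len(perm)
    # point-wise: fraction of replicates whose statistic at locus i
    # exceeds the observed statistic at locus i
    pointwise = [sum(p[i] > obs[i] for p in perm) / n
                 for i in range(len(obs))]
    # regional: fraction of replicates whose maximum over the region
    # exceeds the observed maximum
    obs_max = max(obs)
    regional = sum(max(p) > obs_max for p in perm) / n
    return pointwise, regional

pw, reg = perm_pvalues(obs=[4.0, 1.0],
                       perm=[[5.0, 0.5], [3.0, 2.0],
                             [1.0, 0.2], [2.0, 6.0]])
# pw = [0.25, 0.5]; reg = 0.5
```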
+
+Similar to the permutation p-values in {\sl haplo.score} as described in
+section \ref{scoreSim}, permutations are performed until a precision threshold
+is reached for the regional p-values. A minimum and maximum number of
+permutations, specified in the {\sl sim.control} parameter list, ensure
+that a certain accuracy is met for every simulated p-value while imposing a
+limit to avoid excessive run-time.
+
+Below is an example of using {\sl seqhap} on data with case-control
+response for a chromosome region. First set up the binary response, y,
+with 0=control, 1=case, then a genotype matrix with two columns per
+locus, and a vector of chromosome positions. The genotype data is available
+in the {\sl seqhap.dat} dataset while the chromosome positions are in
+{\sl seqhap.pos}. The following example runs {\sl seqhap} with default
+settings for permutations and threshold parameters.
+
+<<seed, eval=TRUE, echo=FALSE>>=
+set.seed(seed)
+@
+<<seqhap, echo=TRUE, eval=TRUE>>=
+# define binary response and genotype matrix
+data(seqhap.dat)
+data(seqhap.pos)
+y <- seqhap.dat$disease
+geno <- seqhap.dat[,-1]
+# get vector with chrom position
+pos <- seqhap.pos$pos
+seqhap.out <- seqhap(y=y, geno=geno, pos=pos, miss.val=c(0,NA),
+ r2.threshold=.95, mh.threshold=3.84)
+seqhap.out$n.sim
+print(seqhap.out)
+@
+
+\Explain
+The output from this example first shows {\sl n.sim}, the number of
+permutations needed for precision on the regional p-values. Next, in
+the printed results, the first section ({\sl Single-locus Chi-square
+Test}) shows a table with columns for single-locus tests. The table
+includes test statistics, permuted p-values, and asymptotic p-values
+based on a $\chi^{2}_1$ distribution. The second section ({\sl Sequential Scan}) shows which loci are
+combined for association. In this example, the table shows the first locus is
+not combined with other loci, whereas the second locus is combined
+with loci 3, 4, and 5. The third section ({\sl Sequential Haplotype
+ Test}), shows the test statistics for the sequential haplotype method with
+degrees of freedom and permuted and asymptotic p-values. The fourth
+section ({\sl Sequential Sum Test}) shows similar information for the
+sequential sum tests.
+
+\subsubsection{Plot Results from {\sl seqhap}}
+The results from {\sl seqhap} can be viewed in a useful plot
+shown in Figure~\ref{seqhapPlot}. The plot is similar to the plot
+for {\sl haplo.score.slide} results: the x-axis has tick marks for all
+loci and the y-axis is the $ -log_{10}(pval) $\ for the tests performed.
+For the sequential result for each locus, a horizontal line at the
+height of $ -log_{10}(pval) $\ is drawn across the loci combined. The start
+locus is indicated by a filled triangle, and other loci combined with
+the start locus are indicated by an asterisk or circle.
+The choices for pval include {\sl "hap"} (sequential haplotype asymptotic
+p-value), {\sl "hap.sim"} (sequential haplotype simulated p-value),
+{\sl "sum"} (sequential sum asymptotic p-value), and {\sl "sum.sim"}
+(sequential sum simulated p-value). The other parameter option
+is {\sl single}, indicating whether to plot a line for the single-locus tests.
+
+\begin{figure}[tpb]
+<<label=plotSeqHap,fig=TRUE, echo=TRUE >>=
+# plot p-values for sequential haplotype scan and single-locus tests
+plot(seqhap.out, pval="hap", single=TRUE, las=2)
+@
+ \caption{Plot p-values for sequential haplotype scan
+ and single-locus tests} \label{seqhapPlot}
+\end{figure}
+\clearpage
+
+\subsection{Creating Haplotype Effect Columns: {\sl haplo.design}
+\label{hapDesign}}
+
+In some instances, the desired model for haplotype effects is not
+possible with the methods given in {\sl haplo.glm}. Examples
+include modeling just one haplotype effect, or modeling an
+interaction of haplotypes from
+different chromosomes, or analyzing censored data. To circumvent
+these limitations, we provide a function called {\sl haplo.design}, which
+will set up an expected haplotype design matrix from a {\sl haplo.em}
+object, to create columns that can be used to model haplotype effects
+in other modeling functions.
+
+The function {\sl haplo.design} first creates a design matrix for all
+pairs of haplotypes over all subjects, and then uses the posterior
+probabilities to create a weighted average contribution for each
+subject, so that the number of rows of the final design matrix is
+equal to the number of subjects. This is sometimes called the
+expectation-substitution method, as proposed by Zaykin et al.
+2002 \cite{Zaykin}, and using this haplotype design matrix in a
+regression model is asymptotically equivalent to the score statistics
+from {\sl haplo.score} (Xie and Stram 2005 \cite{XieStram}).
+Although this provides much flexibility, by using the design matrix in
+any type of regression model, the estimated regression parameters can
+be biased toward zero (see Lin and Zeng, 2006 \cite{LinZeng} for concerns
+about the expectation-substitution method).
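The expectation-substitution step itself reduces to a posterior-weighted
haplotype dosage per subject. A sketch (Python, illustrative only; the input
rows, with subject id, haplotype pair, and posterior probability, mimic a
{\sl haplo.em}-style expansion and are hypothetical):

```python
from collections import defaultdict

def expected_design(rows, haplotypes):
    """Expectation-substitution (additive coding): for each subject,
    E[# copies of haplotype h] = sum over the subject's possible pairs
    of posterior * (copies of h in that pair)."""
    dosage = {h: defaultdict(float) for h in haplotypes}
    for sid, h1, h2, post in rows:
        for h in haplotypes:
            dosage[h][sid] += post * ((h1 == h) + (h2 == h))
    return dosage

# subject 1: pair (A,B) with posterior 0.6, pair (A,A) with posterior 0.4
d = expected_design(rows=[(1, "A", "B", 0.6), (1, "A", "A", 0.4)],
                    haplotypes=["A", "B"])
# E[A] = 0.6*1 + 0.4*2 = 1.4; E[B] = 0.6*1 = 0.6
```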
+
+In the first example below, using default parameters, the returned data.frame
+contains a column for each haplotype that meets a minimum count
+({\sl min.count}) in the sample. The columns are named by the code they are
+assigned in {\sl haplo.em}.
+
+<<label=hapDesign1, echo=TRUE>>=
+# create a matrix of haplotype effect columns from haplo.em result
+hap.effect.frame <- haplo.design(save.em)
+
+names(hap.effect.frame)
+
+hap.effect.frame[1:10,1:8]
+@
+
+Additionally, {\sl haplo.design} gives the user flexibility to make
+a more specific design matrix with the following parameters:
+
+\begin{itemize}
+\item{}{\bf hapcodes:\ }{codes assigned in the {\sl haplo.em} object,
+ the only haplotypes to be made into effects}
+\item{}{\bf haplo.effect:\ }{the coding of haplotypes as additive,
+ dominant, or recessive}
+\item{}{\bf haplo.base:\ }{code for the baseline haplotype}
+\item{}{\bf min.count:\ }{minimum haplotype count}
+\end{itemize}
+
+This second example below creates columns for specific haplotype
+codes that were most interesting in {\sl score.gaus.add}, haplotypes
+with alleles 21-3-8 and 62-2-7, corresponding to codes 4 and 138 in
+{\sl haplo.em}, respectively. Assume we want to test their individual
+effects when they are coded with {\sl haplo.effect=``dominant''}.
+
+<<label=hapDesign2,echo=TRUE>>=
+# create haplotype effect cols for haps 4 and 138
+hap4.hap138.frame <- haplo.design(save.em, hapcodes=c(4,138),
+ haplo.effect="dominant")
+
+hap4.hap138.frame[1:10,]
+
+dat.glm <- data.frame(resp, male, age,
+ hap.4=hap4.hap138.frame$hap.4,
+ hap.138=hap4.hap138.frame$hap.138)
+
+glm.hap4.hap138 <- glm(resp ~ male + age + hap.4 + hap.138,
+ family="gaussian", data=dat.glm)
+summary(glm.hap4.hap138)
+@
+
+\pagebreak
+
+\section{License and Warranty}
+
+\noindent License:\linebreak
+
+\noindent Copyright 2003 Mayo Foundation for Medical Education and Research. \linebreak
+
+\noindent This program is free software; you can redistribute it and/or modify
+it under the terms of
+the GNU General Public License as published by the Free Software
+Foundation; either
+version 2 of the License, or (at your option) any later version.\\
+
+\noindent This program is distributed in the hope that it will be useful, but
+WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details. \\
+
+\noindent You should have received a copy of the GNU General Public License
+along with this program; if not, write to \\
+\noindent Free Software Foundation, Inc. \\
+59 Temple Place, Suite 330 \\
+Boston, MA 02111-1307 USA \\
+
+\noindent For other licensing arrangements, please contact Daniel J. Schaid.\\
+Daniel J. Schaid, Ph.D.\\
+Division of Biostatistics\\
+Harwick Building - Room 775\\
+Mayo Clinic\\
+200 First St., SW\\
+Rochester, MN 55905\\
+phone: 507-284-0639\\
+fax:\ \ \ 507-284-9542\\
+email: schaid at mayo.edu\\
+
+
+\section{Acknowledgements}
+This research was supported by United States Public Health Services,
+National Institutes of Health; Contract grant numbers R01 DE13276,
+R01 GM 65450, N01 AI45240, and R01 2AI33144. The {\sl hla.demo} data is
+kindly provided by Gregory A. Poland, M.D. and the Mayo Vaccine
+Research Group for illustration only, and may not be used for
+publication.
+
+\pagebreak
+
+\appendix
+\noindent{\large {\bf Appendix}}
+
+\section{Counting Haplotype Pairs When Marker Phenotypes
+ Have Missing Alleles}
+
+The following describes the process for counting the number of
+haplotype pairs that are consistent with a subject's observed marker
+phenotypes, allowing for some loci with missing data. Note that we
+refer to marker phenotypes, but our algorithm is oriented towards
+typical markers that have a one-to-one correspondence with their
+genotypes. We first describe how to count when none of the loci
+have missing alleles, and then generalize to allow loci to have
+either one or two missing alleles. When there are no missing alleles,
+note that homozygous loci are not ambiguous with respect to the
+underlying haplotypes, because at these loci the underlying
+haplotypes will not differ if we interchange alleles between
+haplotypes. In contrast, heterozygous loci are ambiguous,
+because we do not know the haplotype origin of the
+distinguishable alleles (i.e., unknown linkage phase).
+However, if there is only one heterozygous locus, then it
+doesn't matter if we interchange alleles, because the pair of
+haplotypes will be the same. In this situation, if parental
+origin of alleles were known, then interchanging alleles would
+switch parental origin of haplotypes, but not the composition
+of the haplotypes. Hence, ambiguity arises only when there
+are at least two heterozygous loci. For each heterozygous
+locus beyond the first one, the number of possible haplotypes
+increases by a factor of 2, because we interchange the two
+alleles at each heterozygous locus to create all possible
+pairs of haplotypes. Hence, the number of possible haplotype
+pairs can be expressed as $2^{x}$, where $x = H-1$, if $H$\ (the number of
+heterozygous loci) is at least 2, otherwise $x = 0$.
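This no-missing-data rule can be written directly (a Python sketch of the
$2^{x}$\ count, for illustration only):

```python
def n_hap_pairs(n_het):
    """Haplotype pairs consistent with a phenotype that has n_het
    heterozygous loci and no missing alleles: 2**(n_het - 1) when
    n_het >= 2, otherwise 1 (no phase ambiguity)."""
    x = n_het - 1 if n_het >= 2 else 0
    return 2 ** x

counts = [n_hap_pairs(h) for h in range(5)]
# counts == [1, 1, 2, 4, 8]
```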
+
+Now consider a locus with missing alleles. The possible
+alleles at a given locus are considered to be those that
+are actually observed in the data. Let $a_{i}$\ denote the number
+of distinguishable alleles at the locus. To count the number
+of underlying haplotypes that are consistent with the observed
+and missing marker data, we need to enumerate all possible
+genotypes for the loci with missing data, and consider whether
+the imputed genotypes are heterozygous or homozygous.
+
+To develop our method, first consider how to count the number
+of genotypes at a locus, say the $i^{th}$ locus, when either one or two
+alleles are missing. This locus could have either a homozygous
+or heterozygous genotype, and both possibilities must be
+considered for our counting method. If the locus is considered
+as homozygous, and there is one allele missing, then there is
+only one possible genotype; if there are two alleles missing,
+then there are $a_{i}$ possible genotypes. A function to perform this
+counting for homozygous loci is denoted $f(a_{i})$. If the locus is
+considered as heterozygous, and there is one allele missing,
+then there are $ a_{i}-1$\ possible genotypes; if there are two alleles
+missing, then there are $\frac{a_{i}(a_{i}-1)}{2}$\ possible
+genotypes. A function to
+perform this counting for heterozygous loci is denoted $g(a_{i})$.
+These functions and counts are summarized in Table A.1. \\
+
+\noindent {\bf Table A.1:} Factors for when a locus having missing allele(s)
+is counted as homozygous ($f()$) or heterozygous ($g()$)\\
+
+\begin{table}[h]
+\begin{center}
+\begin{tabular}{|r|c|c|}
+\hline
+Number of & Homozygous & Heterozygous \\
+missing alleles & function $f(a_{i})$ & function $g(a_{i})$ \\
+\hline \hline
+1 & 1 & $a_{i} - 1$ \\ \hline
+2 & $ a_{i}$ & $\frac{a_{i}(a_{i}-1)}{2}$ \\ \hline
+\end{tabular}
+\end{center}
+\end{table}
+
+Now, to use these genotype counting functions to determine the number
+of possible haplotype pairs, first consider a simple case where only
+one locus, say the $i^{th}$ locus, has two missing alleles. Suppose that
+the phenotype has H heterozygous loci (H is the count of heterozygous
+loci among those without missing data). We consider whether the locus with
+missing data is either homozygous or heterozygous, to give the count of possible
+haplotype pairs as
+
+\begin{equation}
+a_{i}2^{x} + \left[{\frac{a_{i}(a_{i}-1)}{2}}\right]2^{x+1}
+\end{equation}
+
+\noindent where again $x = H-1$\ if $H$\ is at least 2, otherwise $x = 0$. This special
+case can be represented by our more general genotype counting functions as
+
+\begin{equation}
+f(a_{i})\,2^{x} + g(a_{i})\,2^{x+1}
+\end{equation}
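As a worked instance of this count (a Python sketch; the values $a_{i}=4$\ and
$H=3$\ are made up for illustration):

```python
def pairs_one_locus_two_missing(a, H):
    """Haplotype-pair count when a single locus has two missing alleles:
    f(a)*2**x     (locus imputed homozygous,   f(a) = a)  plus
    g(a)*2**(x+1) (locus imputed heterozygous, g(a) = a*(a-1)/2),
    where x = H - 1 if H >= 2 else 0."""
    x = H - 1 if H >= 2 else 0
    f = a                 # homozygous genotypes with 2 missing alleles
    g = a * (a - 1) // 2  # heterozygous genotypes with 2 missing alleles
    return f * 2 ** x + g * 2 ** (x + 1)

# a = 4 observed alleles, H = 3 heterozygous complete loci:
# 4*2**2 + 6*2**3 = 16 + 48 = 64
total = pairs_one_locus_two_missing(4, 3)
```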
+
+When multiple loci have missing data, we need to sum over all possible
+combinations of heterozygous and homozygous genotypes for the
+incomplete loci. The rows of Table A.2 below present these
+combinations for up to $m=3$\ loci with missing data. Note that as the
+number of heterozygous loci increases (across the columns of Table A.2),
+so too does the exponent of 2. To calculate the total number of pairs
+of haplotypes, given observed and possibly missing genotypes, we need
+to sum the terms in Table A.2 across the appropriate row. For example,
+with $m=3$, there are eight terms to sum over. The general
+formulation for this counting method can be expressed as
+\begin{equation}
+\mbox{Total Pairs} = \sum_{j=0}^{m} \sum_{combo} C(combo,j)
+\end{equation}
+
+\noindent where $combo$\ is a particular pattern of heterozygous and homozygous
+loci among the loci with missing values (e.g., for $m = 3$, one
+combination is the first locus heterozygous and the $2^{nd}$\ and
+$3^{rd}$\ as homozygous), and $C(combo,j)$\ is the corresponding count for
+this pattern when there are $j$\ loci that are heterozygous (e.g., for
+$m = 3$\ and $j = 1$, as illustrated in Table A.2).
+
+
+\noindent {\bf Table A.2:} Genotype counting terms when $m$ loci
+have missing \\
+alleles, grouped by number of heterozygous loci (out of $m$)\\
+\begin{table}[h]
+\begin{center}
+\begin{tabular}{|r||r|r|r|r|}
+\hline
+$m$ & $j=0\,of\, m$ & $j=1\, of\, m$ &$j=2\, of\, m$ & $j=3\, of\, m$ \\
+\hline \hline
+$0$ & $2^{x}$ & & & \\
+\hline
+$1$ & $f(a_{1})2^{x}$ & $g(a_{1})2^{x+1}$ & & \\
+\hline
+$2$ & $f(a_{1})f(a_{2})2^{x}$ & $g(a_{1})f(a_{2})2^{x+1}$ &$g(a_{1})g(a_{2})2^{x+1}$ & \\
+ & & $f(a_{1})g(a_{2})2^{x+1}$ & & \\
+\hline
+$3$ & $f(a_{1})f(a_{2})f(a_{3})2^{x}$&$g(a_{1})f(a_{2})f(a_{3})2^{x+1}$&$g(a_{1})g(a_{2})f(a_{3})2^{x+2}$&
+ $g(a_{1})g(a_{2})g(a_{3})2^{x+2}$ \\
+ & &$f(a_{1})g(a_{2})f(a_{3})2^{x+1}$&$g(a_{1})f(a_{2})g(a_{3})2^{x+2}$& \\
+ & &$f(a_{1})f(a_{2})g(a_{3})2^{x+1}$&$f(a_{1})g(a_{2})g(a_{3})2^{x+2}$& \\
+\hline
+\end{tabular}
+\end{center}
+\end{table}
+
+\pagebreak
+
+\begin{thebibliography}{}
+
+ %% using AMA, listed in order of appearance
+\bibitem{Clayton}
+ Clayton, David. Personal web page, software list.
+ \mbox{<http://www-gene.cimr.cam.ac.uk/clayton/software/>.}
+ Accessed April 1, 2004.
+
+\bibitem{Schaid 2005}
+ Schaid DJ. Power and Sample Size for Testing Associations of
+ Haplotypes with Complex Traits. {\em Annals of Human Genetics}
+ 2005;70:116-130.
+
+\bibitem{Schaid et al. 2002}
+ Schaid DJ, Rowland CM, Tines DE, Jacobson RM,
+ Poland GA. Score tests for association between traits and
+ haplotypes when linkage phase is ambiguous. {\em Am J Hum Genet}
+ 2002;70:425-34.
+
+\bibitem{Zaykin}
+ Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG.
+ Testing Association of Statistically Inferred Haplotypes with Discrete
+ and Continuous Traits in Samples of Unrelated Individuals.
+ {\em Human Heredity} 2002;53:79-91.
+
+\bibitem{Harrell}
+ Harrell, FE. {\em Regression Modeling Strategies.} New York: Springer-Verlag; 2001.
+
+\bibitem{Besag and Clifford 1991}
+ Besag J, Clifford P. Sequential Monte Carlo p-Values.
+ {\em Biometrika} 1991;78:301-304.
+
+\bibitem{Lake et al. 2003}
+ Lake S, Lyon H, Silverman E, Weiss S, Laird N, Schaid D.
+ Estimation and tests of haplotype-environment interaction
+ when linkage phase is ambiguous. {\em Human Heredity} 2003;55:56-65.
+
+\bibitem{Stram et al. 2003}
+ Stram D, Pearce C, Bretsky P, Freedman M, Hirschhorn J,
+ Altshuler D, Kolonel L, Henderson B, Thomas D. Modeling and
+ E-M estimation of haplotype-specific relative risks from genotype
+ data for case-control study of unrelated individuals.
+ {\em Hum Hered} 2003;55:179-190.
+
+\bibitem{Epstein and Satten 2003}
+ Epstein MP, Satten GA. Inference on haplotype effects
+ in case-control studies using unphased genotype data.
+ {\em Am J Hum Genet} 2003;73:1316-1329.
+
+\bibitem{McCullaghNelder}
+ McCullagh P, Nelder JA. {\em Generalized Linear Models, Second Edition}.
+ Boca Raton, FL: Chapman and Hall. 1989:35-36.
+
+%\bibitem{Cheng2003}
+% Cheng R, Ma JZ, Wright FA, Lin S, Gau X, Wang D, Elston RC, Li
+% MD. Nonparametric disequilibrium mapping of functional sites using
+% haplotypes of multiple tightly linked single-nucleotide polymorphism markers.
+% {\em Genetics} 2003;164:1175-1187.
+
+\bibitem{Cheng2005}
+ Cheng R, Ma JZ, Elston RC, Li MD. Fine Mapping Functional Sites or
+ Regions from Case-Control Data Using Haplotypes of Multiple Linked SNPs.
+ {\em Annals of Human Genetics} 2005;69: 102-112.
+
+\bibitem{Yu}
+ Yu Z, Schaid DJ. Sequential haplotype scan methods for
+ association analysis. {\em Gen Epi} 2007;31:553-564.
+
+\bibitem{Mantel}
+ Mantel N, Haenszel W. Statistical aspects of the analysis of
+  data from retrospective studies of disease. {\em J Natl Cancer Inst}
+ 1959;22:719-48.
+
+\bibitem{XieStram}
+ Xie R, Stram DO. Asymptotic equivalence between two score tests
+ for haplotype-specific risk in general linear models. {\em Gen Epi}
+ 2005;29:166-170.
+
+\bibitem{LinZeng}
+ Lin DY, Zeng D. Likelihood-based inference on haplotype effects
+ in genetic association studies. {\em J Am Stat Assoc} 2006;101:473.
+
+\end{thebibliography}
+
+\end{document}
Added: trunk/packages/R/r-cran-haplo.stats/trunk/debian/docs
===================================================================
--- trunk/packages/R/r-cran-haplo.stats/trunk/debian/docs (rev 0)
+++ trunk/packages/R/r-cran-haplo.stats/trunk/debian/docs 2012-02-10 22:52:56 UTC (rev 9642)
@@ -0,0 +1 @@
+inst/doc/manualHaploStats.pdf
Modified: trunk/packages/R/r-cran-haplo.stats/trunk/debian/rules
===================================================================
--- trunk/packages/R/r-cran-haplo.stats/trunk/debian/rules 2012-02-10 15:27:06 UTC (rev 9641)
+++ trunk/packages/R/r-cran-haplo.stats/trunk/debian/rules 2012-02-10 22:52:56 UTC (rev 9642)
@@ -4,6 +4,8 @@
# Copyright 2012 by Andreas Tille <tille at debian.org>
# GPL
+DEB_COMPRESS_EXCLUDE_ALL := .pdf
+
include /usr/share/R/debian/r-cran.mk
# Require a version equal to or greater than the R version the package was built with.