Given two sets of SNPs typed in the same subjects, this function calculates rules which can be used to impute one set from the other in a subsequent sample. The function can also calculate rules for imputing each SNP in a single dataset from other SNPs in the same dataset.
X | An object of class
Y | An object of same class as
pos.X | The positions of the predictor SNPs. Can be missing if there is no
pos.Y | The positions of the target SNPs. Only required when a
phase | See "Details" below
try | The number of potential predictor SNPs to be considered in the stepwise regression procedure around each target SNP. The nearest
stopping | Parameters of the stopping rule for the stepwise regression (see below)
use.hap | Parameters to control use of the haplotype imputation method (see below)
em.cntrl | Parameters to control test for convergence of EM algorithm for fitting phased haplotypes (see below)
minA | A minimum data quantity measure for estimating pairwise linkage disequilibrium (see below)
The routine first carries out a series of stepwise least-squares
regression analyses in which each Y SNP is regressed on the nearest
try predictor (X) SNPs. If phase is TRUE, the regressions will be
calculated at the chromosome (haplotype) level, variances being simply
p(1-p) and covariances estimated from the estimated two-locus
haplotypes (this option is not yet implemented). Otherwise, the
analysis is carried out at the genotype level based on conventional
variance and covariance estimates using the "pairwise.complete.obs"
missing value treatment (see cov). New SNPs are added to the
regression until either (a) the value of R^2 exceeds the first
parameter of stopping, (b) the number of "tag" SNPs has reached the
maximum set in the second parameter of stopping, or (c) the change in
R^2 does not achieve the target set by the third parameter of
stopping. If the third parameter of stopping is NA, this last test is
replaced by a test for improvement in the Akaike information criterion
(AIC).
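The three stopping tests can be illustrated with a minimal sketch in base R. The helper stop_reason and its arguments are hypothetical and are not part of the package; the threshold values in the usage lines are illustrative only, and the order in which the tests are applied is an assumption.

```r
# Hypothetical helper (not part of the package): decide whether the stepwise
# search for tag SNPs should stop after the most recent addition.
#   r2           R^2 of the current regression
#   n_tags       number of tag SNPs selected so far
#   delta_r2     gain in R^2 from the most recent addition
#   aic_improved TRUE if the most recent addition improved the AIC
#   stopping     c(R^2 target, maximum number of tags, minimum R^2 gain or NA)
stop_reason <- function(r2, n_tags, delta_r2, aic_improved, stopping) {
  if (r2 > stopping[1]) {
    return("R^2 target reached")           # test (a)
  }
  if (n_tags >= stopping[2]) {
    return("maximum number of tags")       # test (b)
  }
  if (is.na(stopping[3])) {
    # Third parameter is NA: use the AIC test in place of test (c)
    if (!aic_improved) return("no AIC improvement")
  } else if (delta_r2 < stopping[3]) {
    return("R^2 gain too small")           # test (c)
  }
  "continue"
}

# Illustrative thresholds only
stop_reason(0.96, 1, 0.20, TRUE, c(0.95, 4, 0.05))
stop_reason(0.80, 2, 0.01, FALSE, c(0.95, 4, NA))
```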
After choosing the set of "tag" SNPs in this way, a prediction
rule is generated by calculating phased haplotype frequencies,
either (a) under a log-linear model for linkage disequilibrium with
only first order association terms fitted, or (b) under the
"saturated" model.
These methods do not differ if there is only one tag SNP but,
otherwise, the choice between methods is controlled by the use.hap
parameters. If the prediction, as measured by the R^2 achieved with
the log-linear smoothing model, exceeds a threshold (the first
parameter of use.hap), then this method is used. Otherwise, if the
gain in R^2 achieved by using the second method exceeds the second
parameter of use.hap, then the second method is used.
Current experience is that the log-linear method is rarely preferred
with reasonable choices for use.hap, and imputation is much faster
when only the second method is considered.
The current default ensures that this second method is used, but the
other possibility might be considered if imputing from very small
samples; however, this code is not extensively tested and should be
regarded as experimental.
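The choice between the two methods can be sketched as a small decision function in base R. The name choose_method is hypothetical and not part of the package. The text does not say what happens when neither test is met, so the sketch returns NA in that case rather than guessing.

```r
# Hypothetical helper (not part of the package): pick the haplotype model
# used for the prediction rule, following the two use.hap tests.
#   r2_loglinear  R^2 achieved with the log-linear smoothing model
#   r2_saturated  R^2 achieved with the saturated model
#   use.hap       c(R^2 threshold for log-linear, minimum gain for saturated)
choose_method <- function(r2_loglinear, r2_saturated, use.hap) {
  if (r2_loglinear > use.hap[1]) {
    return("log-linear")
  }
  if (r2_saturated - r2_loglinear > use.hap[2]) {
    return("saturated")
  }
  NA_character_  # outcome not specified in the text
}

# Illustrative values only
choose_method(0.97, 0.98, c(0.95, 0.02))
choose_method(0.80, 0.90, c(0.95, 0.02))
```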
The argument em.cntrl controls convergence testing for the EM
algorithm for fitting haplotype frequencies and the IPF algorithm for
fitting the log-linear model. The first parameter is the maximum
number of EM iterations, and the second parameter is the threshold for
the change in log likelihood below which the iteration is judged to
have converged. The third and fourth parameters give the maximum
number of IPF iterations and the convergence tolerance. There should
be no need to change the default values.
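To show how an iteration cap and a log-likelihood tolerance interact, here is a minimal sketch of a textbook EM algorithm for two-locus haplotype frequencies, written in base R. It is not the package's internal code: the function em_two_locus, its arguments and the values 50 and 0.01 are assumptions made for illustration.

```r
# Illustrative EM for two-locus haplotype frequencies (not the package code).
# n is a 3 x 3 table of genotype counts; n[i + 1, j + 1] is the number of
# subjects with i copies of the coded allele at locus 1 and j at locus 2.
em_two_locus <- function(n, max_iter = 50, tol = 0.01) {
  # Genotype probabilities implied by haplotype frequencies p = (h11, h10, h01, h00)
  geno_prob <- function(p) {
    h11 <- p[1]; h10 <- p[2]; h01 <- p[3]; h00 <- p[4]
    rbind(
      c(h00^2,         2 * h01 * h00,               h01^2),
      c(2 * h10 * h00, 2 * (h11 * h00 + h10 * h01), 2 * h11 * h01),
      c(h10^2,         2 * h11 * h10,               h11^2)
    )
  }
  loglik <- function(p) {
    P <- geno_prob(p)
    sum(n[n > 0] * log(P[n > 0]))
  }
  p <- rep(0.25, 4)
  ll <- loglik(p)
  converged <- FALSE
  iter <- 0
  while (iter < max_iter && !converged) {
    iter <- iter + 1
    # E-step: split double heterozygotes between the two possible phases
    denom <- p[1] * p[4] + p[2] * p[3]
    w <- if (denom > 0) p[1] * p[4] / denom else 0.5
    counts <- c(
      2 * n[3, 3] + n[3, 2] + n[2, 3] + w * n[2, 2],        # h11
      2 * n[3, 1] + n[3, 2] + n[2, 1] + (1 - w) * n[2, 2],  # h10
      2 * n[1, 3] + n[2, 3] + n[1, 2] + (1 - w) * n[2, 2],  # h01
      2 * n[1, 1] + n[2, 1] + n[1, 2] + w * n[2, 2]         # h00
    )
    # M-step: haplotype frequencies are expected counts over 2N chromosomes
    p <- counts / (2 * sum(n))
    ll_new <- loglik(p)
    # Convergence test: change in log likelihood below the tolerance
    converged <- abs(ll_new - ll) < tol
    ll <- ll_new
  }
  list(p = p, iterations = iter, converged = converged, loglik = ll)
}

# Small made-up table with strong linkage disequilibrium
n <- matrix(0, 3, 3)
n[1, 1] <- 10; n[2, 2] <- 20; n[3, 3] <- 10
fit <- em_two_locus(n, max_iter = 50, tol = 0.01)
```

A looser tolerance or a smaller iteration cap stops the loop earlier at the cost of less precise frequency estimates, which is the trade-off the first two em.cntrl parameters govern.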
All SNPs selected for imputation must have sufficient data for
estimating pairwise linkage disequilibrium with each other and with
the target SNP. The statistic chosen is based on the four-fold tables
of two-locus haplotype frequencies. If the frequencies in such a table
are labelled a, b, c and d then, if ad > bc, t = min(a, d) and,
otherwise, t = min(b, c). The cell frequency t must exceed minA for
all pairwise comparisons.
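As a worked illustration of this statistic, the following base R sketch computes t for a single four-fold table and compares it with a threshold. The function name min_a_stat, the example frequencies and the threshold value are hypothetical.

```r
# Hypothetical helper (not part of the package): the data quantity statistic t
# for a four-fold table of two-locus haplotype frequencies laid out as
#   a b
#   c d
min_a_stat <- function(a, b, c, d) {
  if (a * d > b * c) min(a, d) else min(b, c)
}

minA <- 5                               # illustrative threshold only
t1 <- min_a_stat(40, 5, 3, 30)          # ad > bc, so t = min(a, d)
t2 <- min_a_stat(2, 30, 4, 1)           # ad <= bc, so t = min(b, c)
ok <- c(t1 > minA, t2 > minA)           # does each pair pass the test?
```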
An object of class "ImputationRules".
The phase=TRUE option is not yet implemented.
David Clayton dc208@cam.ac.uk
Chapman J.M., Cooper J.D., Todd J.A. and Clayton D.G. (2003) Human Heredity, 56:18-31.
Wallace, C. et al. (2010) Nature Genetics, 42:68-71
ImputationRules-class, imputation.maf, imputation.r2
# Remove 5 SNPs from a dataset and derive imputation rules for them
library(snpStats)
data(for.exercise)
sel <- c(20, 1000, 2000, 3000, 5000)
to.impute <- snps.10[,sel]
impute.from <- snps.10[,-sel]
pos.to <- snp.support$position[sel]
pos.fr <- snp.support$position[-sel]
imp <- snp.imputation(impute.from, to.impute, pos.fr, pos.to)