Major overhaul from last version (1.9.0, last updated 2018-02-11). Overall, added unit testing to all functions, which resulted in the identification and fixing of several edge-case bugs, and also minor improvements.
model.gof
: entirely redundant with sHWE
in this same packageread.tped.recode
: obsolete file format; instead, use plink
for file conversions!read.bed
: instead, use read_plink
from the genio
package!center
: only worked for matrices without missingness (useless for real data), no dependencies in code, plus centering is trivial in Rtrunc.svd
-> trunc_svd
af_snp
and pca_af
already had underscores instead of periods (unchanged).
All other functions unchanged.trunc_svd
d=1
case (output matrices were dropped to vectors, which was a fatal error in lfa
after it called trunc_svd
).force
that, when TRUE
, forces the Lanczos algorithm to be used in all cases (most useful for testing purposes).lfa
af_snp
and af
NaN
allele frequencies instead of 1 as they should be in the limit.NA
genotypes.
Original version preserved NA
values (a genotype that was NA
for a particular individual and locus combination resulted in an NA
in the corresponding allele frequency only, and conversely non-NA
genotypes never resulted in NA
allele frequencies).
The new version takes advantage of knowing the LFs of all individuals (regardless of genotype missingness), and LFs and their coefficients are never NA
, permitting allele frequencies to always be imputed into non-NA
values!pca_af
NA
genotypes (internal change was trivial)d=1
case, which incorrectly returned an intercept column matrix instead of an allele frequency matrix.check_geno
Function sHWE
(through internal inverse_2x2
)
NA
p-value.NA
cases here are avoided now that af
never returns NA
values.Internal changes
.gitignore
files from another project.src/lfa.so
from version control tracking.testthat
.R CMD check
requirements.lfa
added support for BEDMatrix objects for the genotype matrix X
.m
is very large, so it enables analysis of larger datasets.n
gets larger.RSpectra
package dependency (for fast truncated eigendecomposition).X
. Although BEDMatrix is supported, in these cases there are minimal memory reduction advantages as outputs or intermediate matrices are necessarily as large as the input genotype data.af
. Although there is memory saving by not loading X
entirely into memory, the output individual-specific allele frequency matrix P
has the same dimensions so memory usage may still be excessive for in large datasets, negating the BEDMatrix advantage.pca_af
. Note same memory usage issue as af
.sHWE
. A worse memory problem is present, since internal code calculates the entire individual-specific allele frequency matrix P
, then simulates B
random genotype matrices of the same dimensions as input (each in memory) from which LF
and ultimately HWE statistics are obtained.sHWE
(in internal function compute_nulls
), which only occurred if the number of individuals times the number of loci exceeded the maximum integer size in R (the value of .Machine$integer.max
, which is 2,147,483,647 in my 64-bit machine).lfa
added rspectra
option (FALSE
by default), as an alternative way of calculating SVD internally (for experimental use only).trunc_svd
is now exported.sHWE
fixed bug: an error could occur when internal statistics vector included NA
values.NA
values:Error in while ((i2 <= B0) & (obs_stat[i1] >= stat0[i2])) { :
missing value where TRUE/FALSE needed
pvals_empir
, and its tested against a new naive version pvals_empir_brute
(slower brute-force algorithm, used to validate outputs only) in unit tests including simulated data with NA
values.sHWE
code into a new internal function gof_stat
, which by itself better handles BEDMatrix files (though overall memory savings are not yet there on the broader sHWE
).af_snp
, af
, and sHWE
added parameters max_iter
(default 100) and tol
(default 1e-10).max_iter = 20
used to be the default value, which in downstream tests was not routinely sufficient to converge with comparable numerical accuracy to glm
fits (not in this package lfa
, but in downstream packages gcatest
and jackstraw
, which require calculating deviances).trunc_svd
:seed
, ltrace
, and V
options.maxit
option.tol
from 1e-10 to .Machine$double.eps
(about 2e-16)lfa
:tol
from 1e-13 to .Machine$double.eps
(about 2e-16)Authors@R
.README.md
.NEWS.md
slightly to improve its automatic parsing.README.md
, inst/CITATION
.read.bed
has been removed.center
model.gof
read.bed
read.tped.recode
1:x
with seq_len(x)
several functions.reformatR
and otherwise match Bioconductor guidelines.inverse_2x2
, probably speeding up sHWE
slightly.mv
(all instances called C code directly instead of this R wrapper).trunc_svd
source considerably.LICENSE.md
.README.md
.BEDMatrix
for loading data.fastmat.c: In function ‘mv’:
fastmat.c:22:14: error: too few arguments to function ‘dgemv_’
22 | F77_CALL(dgemv)(&tr,dimA,dimA+1,&alpha,A,dimA,v,&one,&zero,ret,&one);
| ^~~~~
/home/biocbuild/bbs-3.17-bioc/R/include/R_ext/RS.h:77:25: note: in definition of macro ‘F77_CALL’
77 | # define F77_CALL(x) x ## _
| ^
/home/biocbuild/bbs-3.17-bioc/R/include/R_ext/BLAS.h:107:10: note: declared here
107 | F77_NAME(dgemv)(const char *trans, const int *m, const int *n,
| ^~~~~
...
make: *** [/home/biocbuild/bbs-3.17-bioc/R/etc/Makeconf:176: fastmat.o] Error 1
ERROR: compilation failed for package ‘lfa’
R CMD check --as-cran now uses FC_LEN_T
(I was testing locally using --as-cran
, perhaps it manifests later otherwise.)FC_LEN_T
led me to R news, which pointed me to Writing R Extensions: 6.6.1 Fortran character strings, which shows that an argument of type FC_LEN_T
now has to be added to specify the length of a string passed to Fortran code.dgemv
in the R source code and came across array.c
examples where it sufficed to append the C macro FCONE
to my existing dgemv
call, and that solves it!
(FCONE
, defined in R_ext/BLAS.h
, equal to ,(FC_LEN_T)1
if FC_LEN_T
has been defined, otherwise it is blank.)--as-cran
notes:README.md
updated an http
link to https
to which it redirects.sHWE
documentation used \doi
instead of direct link..lreg
against glm
, which differ more often than expected due to poor or lack of convergence.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.