sme | R Documentation |
SME fits a linear mixed model in order to test for marginal epistasis. It concentrates the scans for epistasis to regions of the genome that have known functional enrichment for a trait of interest.
sme(
plink_file,
pheno_file,
mask_file = NULL,
gxg_indices = NULL,
chunk_size = NULL,
n_randvecs = 10,
n_blocks = 100,
n_threads = 1,
gxg_h5_group = "gxg",
ld_h5_group = "ld",
rand_seed = -1,
log_level = "WARNING"
)
plink_file |
Character. File path to the PLINK dataset
(without *.bed extension).
The function will append |
pheno_file |
Character. File path to a phenotype file in PLINK format. The file should contain exactly one phenotype column. |
mask_file |
Character or NULL. File path to an HDF5 file specifying
per-SNP masks for gene-by-gene interaction tests. This file informs which
SNPs are tested for marginal epistasis. Defaults to |
gxg_indices |
Integer vector or NULL. List of indices corresponding to
SNPs to test for marginal epistasis.
If |
chunk_size |
Integer or NULL. Number of SNPs processed per chunk.
This influences memory
usage and can be left |
n_randvecs |
Integer. Number of random vectors used for stochastic trace estimation. Higher values yield more accurate estimates but increase computational cost. Default is 10. |
n_blocks |
Integer. Number of blocks into which SNPs are divided for processing. This parameter affects memory requirements. Default is 100. |
n_threads |
Integer. Number of threads for OpenMP parallel processing. Default is 1. |
gxg_h5_group |
Character. Name of the HDF5 group within the mask file containing gene-by-gene interaction masks. SNPs in this group will be included in the gene-by-gene interactions. Defaults to "gxg". |
ld_h5_group |
Character. Name of the HDF5 group within the mask file containing linkage disequilibrium masks. SNPs in this group are excluded from analysis. Defaults to "ld". |
rand_seed |
Integer. Seed for random vector generation. If |
log_level |
Character. Logging level for messages. Must be in uppercase (e.g., "DEBUG", "INFO", "WARNING", "ERROR"). Default is "WARNING". |
This function integrates PLINK-formatted genotype and phenotype data to
perform marginal epistasis tests on a set of SNPs. Using stochastic trace
estimation, the method computes variance components for gene-by-gene
interaction and genetic relatedness using the MQS estimator. The process is
parallelized using OpenMP when n_threads > 1
.
The memory requirements and computation time scaling can be optimized through
the parameters chunk_size
, n_randvecs
, and n_blocks
.
Mask Format Requirements
The mask file format is an HDF5 file used for storing index data for the masking process. This format supports data retrieval by index. Below are the required groups and datasets within the HDF5 file:
The required group names can be configured as input parameters. The defaults are described below.
Groups:
ld
: Stores SNPs in LD with the focal SNP. These SNPs will be excluded.
gxg
: Stores indices of SNPs that the marginal epistasis test is conditioned on. These SNPs will be included.
Datasets:
ld/<j>
: For each focal SNP <j>
, this dataset contains indices of SNPs
in the same LD block as that SNP. These SNPs will be excluded from the gene-by-gene interaction covariance matrix.
gxg/<j>
: For each focal SNP <j>
, this dataset contains indices of SNPs to include in the
the gene-by-gene interaction covariance matrix for focal SNP <j>
.
Important: All indices in the mask file data are zero-based, matching the zero-based indices of the PLINK .bim
file.
A list containing:
summary
: A tibble summarizing results for each tested SNP, including:
id
: Variant ID.
index
: Index of the SNP in the dataset.
chromosome
: Chromosome number.
position
: Genomic position of the SNP.
p
: P value for the gene-by-gene interaction test.
pve
: Proportion of variance explained (PVE) by gene-by-gene interactions.
vc
: Variance component estimate.
se
: Standard error of the variance component.
pve
: A long-format tibble of PVE for all variance components.
vc_estimate
: A long-format tibble of variance component estimates.
vc_se
: A long-format tibble of standard errors for variance components.
average_duration
: Average computation time per SNP.
Stamp, J., Pattillo Smith, S., Weinreich, D., & Crawford, L. (2025). Sparse modeling of interactions enables fast detection of genome-wide epistasis in biobank-scale studies. bioRxiv, 2025.01.11.632557.
Stamp, J., DenAdel, A., Weinreich, D., & Crawford, L. (2023). Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3: Genes, Genomes, Genetics, 13(8), jkad118.
Crawford, L., Zeng, P., Mukherjee, S., & Zhou, X. (2017). Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS genetics, 13(7), e1006869.
plink_file <- gsub("\\.bed", "", system.file("testdata", "test.bed", package="smer"))
pheno_file <- system.file("testdata", "test_h2_0.5.pheno", package="smer")
mask_file <- ""
# Parameter inputs
chunk_size <- 10
n_randvecs <- 10
n_blocks <- 10
n_threads <- 1
# 1-based Indices of SNPs to be analyzed
n_snps <- 100
snp_indices <- 1:n_snps
sme_result <- sme(
plink_file,
pheno_file,
mask_file,
snp_indices,
chunk_size,
n_randvecs,
n_blocks,
n_threads
)
head(sme_result$summary)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.