knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(smer)
The Sparse Marginal Epistasis test (SME) requires a HDF5 file containing binary masking data. The effectiveness of the test relies on how informative the mask is for a specific trait. For instance, to test for marginal epistasis related to transcriptional activity, DNase I hypersensitivity sites data can be used to condition the detection on accessible chromatin regions.
Figure 1. The illustration shows open chromatin data. DNase I hypersensitive
sites indicate transcriptional activity. The DNase-seq signal is converted to a
binary mask, where genetic variants in inaccessible chromatin regions are
excluded from the covariance matrix for marginal epistasis.
The sme()
function expects the mask data to be in an HDF5 file. The HDF5
format includes two primary object types:
The mask data should be organized into the following groups and datasets:
Groups:
ld
: Contains SNPs in linkage disequilibrium (LD) with the focal SNP, which
will be excluded.gxg
: Contains indices of SNPs used to condition the marginal epistasis test,
which will be included.The required group names can be configured as input parameters of sme()
.
The defaults are ld
and gxg
.
Datasets:
ld/<j>
: For each focal SNP <j>
, this dataset contains indices of SNPs in
the same LD block as that SNP. These SNPs will be excluded from the
gene-by-gene interaction covariance matrix.gxg/<j>
: For each focal SNP <j>
, this dataset contains indices of SNPs to
include in the the gene-by-gene interaction covariance matrix for focal
SNP <j>
.Important: All indices in the mask file must be zero-based and should
correspond to the zero-based row indices of the PLINK .bim
file. This includes
the dataset index (<j>
in gxg/<j>
) and the data itself.
This zero-based indexing is necessary because the mask data is read by a C++
subroutine in
sme()
, which uses zero-based indexing, unlike R's one-based indexing for SNP
indices in the function call.
The package provides utility functions to create, write, and read valid mask
files for sme()
.
hdf5_file <- tempfile() # Group names gxg_h5_group <- "gxg" ld_h5_group <- "ld" # Data (still in 1-based R indexing) include_gxg_snps <- 1:10 exclude_ld_snps <- 5:6 # Focal SNP (still in 1-based R indexing) focal_snp <- 4 # Dataset names dataset_name_pattern <- "%s/%s" # 0-based index! gxg_dataset <- sprintf(dataset_name_pattern, gxg_h5_group, focal_snp - 1) ld_dataset <- sprintf(dataset_name_pattern, ld_h5_group, focal_snp - 1) # Create an empty HDF5 file create_hdf5_file(hdf5_file) # Write LD data write_hdf5_dataset(hdf5_file, ld_dataset, exclude_ld_snps - 1) # 0-based index! # Write GXG data write_hdf5_dataset(hdf5_file, gxg_dataset, include_gxg_snps - 1)
We can check that the data was written correctly.
ld_read <- read_hdf5_dataset(hdf5_file, ld_dataset) gxg_read <- read_hdf5_dataset(hdf5_file, gxg_dataset) print(sprintf("Zero-based indices of SNPs to exclude: %s", str(ld_read))) print(sprintf("Zero-based indices of SNPs to include: %s", str(gxg_read)))
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.