knitr::opts_chunk$set(echo = TRUE, eval=TRUE)
In this tutorial, we prepare input data for the analysis.
Load the package.
library(mapgen)
We need a data frame of GWAS summary statistics with the following columns:
Let's load a small example GWAS dataset (test.sumstats.txt.gz
), and
process the data (see below).
We can use the vroom
package to read the GWAS summary statistics.
library(vroom) gwas.file <- system.file('extdata', 'test.sumstats.txt.gz', package='mapgen') sumstats <- vroom(gwas.file, col_names = TRUE, show_col_types = FALSE) head(as.data.frame(sumstats),3)
For the LD blocks, we need a data frame with four columns: chromosome, start and end positions, and the indices of the LD blocks.
We provided the LD blocks from LDetect
for the 1000 Genomes (1KG) European population (in hg19).
LD_Blocks <- readRDS(system.file('extdata', 'LD.blocks.EUR.hg19.rds', package='mapgen')) head(LD_Blocks, 3)
* You can skip this if you only need to run enrichment analysis.
Fine-mapping requires linkage-disequilibrium (LD) information.
You could either provide LD matrices or use a reference genotype panel, which will compute LD matrices internally.
To use the reference genotype panel, we utilize the R package bigsnpr
to
read in PLINK files (bed/bim/fam) into R and
match alleles between GWAS summary statistics and reference genotype panel.
If you have reference genotypes in PLINK format, you can use bigsnpr::snp_readBed()
to read bed/bim/fam
files into a bigSNP object and save as a '.rds' file.
This '.rds' file could be used for downstream analyses when attached with
\code{bigsnpr::snp_attach()}.
We provided a bigSNP
object of the reference genotype panel from
the 1000 Genomes (1KG) European population. If you are in the He lab at UChicago,
you can load the bigSNP
object from RCC as below.
library(bigsnpr) bigSNP <- snp_attach(rdsfile = '/project2/xinhe/1kg/bigsnpr/EUR_variable_1kg.rds')
We also have pre-computed LD matrices of European samples from UK Biobank.
They can be downloaded [here][UKBB_LD].
If you are at UChicago,
you can directly access those LD matrices from RCC at
/project2/mstephens/wcrouse/UKB_LDR_0.1_b37/
.
We create a data frame region_info
, with filenames of UKBB reference LD matrices and SNP info
adding to the LD_Blocks.
region_info <- get_UKBB_region_info(LD_Blocks, LDREF.dir = "/project2/mstephens/wcrouse/UKB_LDR_0.1_b37", prefix = "ukb_b37_0.1")
* You can skip this if you only need to run enrichment analysis, or if you only need to run downstream analysis (e.g. gene mapping) with precomputed fine-mapping result.
We run process_gwas_sumstats()
to process the summary statistics.
This checks and cleans GWAS summary statistics,
add locus ID from the LD_Blocks
if available,
match alleles in GWAS SNPs with the reference panel from
the bigSNP
object if bigSNP
is specified,
or harmonize GWAS SNPs with the reference LD information
from the precopmuted LD matrices if region_info
is specified.
gwas.sumstats <- process_gwas_sumstats(sumstats, chr='chr', pos='position_b37', beta='bcac_onco2_beta', se='bcac_onco2_se', a0='a0', a1='a1', snp='phase3_1kg_id', pval='bcac_onco2_P1df_Wald', LD_Blocks=LD_Blocks, bigSNP=bigSNP)
* You do not need to specify bigSNP
, region_info
, or LD_Blocks
if you only need to run enrichment analysis.
Check that the summary statistics is processed and has appropriate columns:
gwas.sumstats <- readRDS(system.file('extdata', 'test.processed.sumstats.rds', package='mapgen'))
head(as.data.frame(gwas.sumstats),3)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.