munge: Clean and munge files to enable LD score regression

View source: R/munge.R

munge    R Documentation

Clean and munge files to enable LD score regression

Description

Function to process GWAS summary statistics files and prepare them for LD score regression.

Usage

munge(files, hm3, trait.names = NULL, N, info.filter = 0.9, maf.filter = 0.01, column.names = list(), parallel = FALSE, cores = NULL, overwrite = TRUE, ...)

Arguments

files

A vector of file names. The files must be located in the working directory; otherwise, a path to each file must be provided.

hm3

A file of SNPs with A1, A2 and rsID, used to align alleles across traits. We suggest using an unzipped file of HapMap3 SNPs with some basic cleaning applied (e.g., MHC region removed), which is created and supplied by the original LD score regression developers and is available here: https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz2

trait.names

A vector of trait names, which will be used as the names of the munged files.

N

A vector of sample sizes.

info.filter

Numeric value used as a lower bound for imputation quality (INFO).

maf.filter

Numeric value used as a lower bound for minor allele frequency (MAF).

column.names

Optional list detailing which columns represent SNP, MAF, etc., e.g. list(SNP=my_snp_column).

parallel

Indicates whether munge should process the summary statistics files in parallel or in serial fashion. The default shown under Usage is FALSE, meaning the files are processed serially; set to TRUE to run in parallel.

cores

Indicates how many cores to use when running in parallel. The default is NULL, in which case munge will use 1 less than the total number of cores available in the local environment.

overwrite

Indicates whether existing .sumstats.gz files should be overwritten. The default is TRUE.
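
As a rough illustration of what the info.filter and maf.filter lower bounds mean, the base R sketch below applies both thresholds to a toy data frame. The data values are made up, and the inclusive comparison (>=) is an assumption for illustration rather than a statement of how munge implements the filters internally.

```r
# Toy summary statistics; all values are invented for illustration.
sumstats <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4"),
  INFO = c(0.95, 0.85, 0.99, 0.92),
  MAF  = c(0.20, 0.30, 0.005, 0.01),
  stringsAsFactors = FALSE
)

# Thresholds matching the defaults shown under Usage.
info.filter <- 0.9
maf.filter <- 0.01

# Keep SNPs at or above both lower bounds.
# Assumption: the bounds are inclusive; munge itself may differ in detail.
keep <- sumstats$INFO >= info.filter & sumstats$MAF >= maf.filter
kept <- sumstats[keep, ]

print(kept$SNP)
```

In this toy example, rs2 falls below the INFO bound and rs3 falls below the MAF bound, so only rs1 and rs4 remain.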

Value

The function writes files in the ".sumstats" format, which can be used to estimate SNP heritability and genetic covariance using the ldsc() function. The function also outputs a .log file, which should be examined to ensure that column names are being interpreted correctly.
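
Examples

A minimal sketch of a call, using only the arguments listed under Usage. The file names, trait names and sample sizes are hypothetical placeholders, and the guard around the call is added here only so that the snippet runs harmlessly when GenomicSEM or the input files are not available.

```r
# Hypothetical inputs: file names, trait names and sample sizes are placeholders.
files <- c("trait1_gwas.txt", "trait2_gwas.txt")
trait.names <- c("trait1", "trait2")
N <- c(100000, 50000)
hm3 <- "w_hm3.snplist"  # unzipped HapMap3 SNP list, as described under 'hm3'

# Only call munge() when GenomicSEM is installed and the placeholder files exist.
inputs_ready <- requireNamespace("GenomicSEM", quietly = TRUE) &&
  all(file.exists(c(files, hm3)))

if (inputs_ready) {
  GenomicSEM::munge(
    files = files,
    hm3 = hm3,
    trait.names = trait.names,
    N = N,
    info.filter = 0.9,
    maf.filter = 0.01
  )
  # Per the Value section, results are written to disk together with a .log
  # file; inspect the log to confirm the columns were interpreted correctly.
}
```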


MichelNivard/GenomicSEM documentation built on Dec. 24, 2024, 3:23 a.m.