View source: R/infer_effect_column.R
infer_effect_column | R Documentation |
Three checks are made to infer which allele the effect/frequency information relates to when the allele columns are named ambiguously (A1 and A2 or equivalent):
1. Check whether ambiguous naming conventions are used (i.e. allele 1 and 2 or equivalent). If not, exit; otherwise continue to the next checks. This is done by using the mapping file and splitting the A1/A2 mappings into those that contain 1 or 2 (ambiguous) and those that do not, e.g. effect, tested (unambiguous, so fine for MSS to handle as is).
2. Look for an effect or frequency column in which A1/A2 is explicitly mentioned. If one is found, the direction is known and the A1/A2 naming should be updated so that A2 is the effect column. Such columns can be found by generating every combination of A1/A2 naming and effect/frq naming.
3. If nothing is found in check 2, a final check is made against the reference genome: whichever of A1 and A2 matches the reference genome more often is taken as the non-effect allele. This relies on an assumption, but is still better than guessing from the ambiguous allele naming.
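The three checks can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the header lists, `find_explicit_allele()` and `pick_effect_by_reference()` are hypothetical names made up for this example, and the tie-breaking rule in the last function is an assumption.

```r
# Hypothetical header lists, for illustration only. The real mapping lives in
# the package's mapping file and is not reproduced here.
allele_names <- c("A1", "A2", "ALLELE1", "ALLELE2",
                  "EFFECT_ALLELE", "TESTED_ALLELE")
effect_names <- c("BETA", "OR", "FRQ")

# Check 1: a name is treated as ambiguous if it contains a 1 or a 2.
is_ambiguous <- function(x) grepl("[12]", x)
ambiguous <- allele_names[is_ambiguous(allele_names)]

# Check 2: build every combination of ambiguous allele name and
# effect/frequency name, in both orders (e.g. BETA_A1 and A1_BETA).
combos <- expand.grid(allele = ambiguous, effect = effect_names,
                      stringsAsFactors = FALSE)
candidates <- rbind(
  data.frame(header = paste(combos$effect, combos$allele, sep = "_"),
             allele = combos$allele, stringsAsFactors = FALSE),
  data.frame(header = paste(combos$allele, combos$effect, sep = "_"),
             allele = combos$allele, stringsAsFactors = FALSE)
)

# Return the allele name(s) explicitly tied to an effect/frequency column,
# or character(0) if no header names one.
find_explicit_allele <- function(headers) {
  unique(candidates$allele[candidates$header %in% toupper(headers)])
}

# Check 3: the allele column that matches the reference allele more often is
# taken as the non-effect allele. Ties default to A2 as effect (assumption).
pick_effect_by_reference <- function(a1, a2, ref) {
  if (mean(a1 == ref) >= mean(a2 == ref)) "A2" else "A1"
}

find_explicit_allele(c("SNP", "CHR", "BP", "A1", "A2", "BETA_A1", "P"))
# "A1"
pick_effect_by_reference(a1  = c("A", "C", "G", "T"),
                         a2  = c("G", "T", "A", "C"),
                         ref = c("A", "C", "G", "C"))
# "A2"
```

In the first call the header BETA_A1 names A1 explicitly, so the direction is known without consulting the reference genome. In the second, A1 matches the reference at three of four SNPs, so A2 is taken as the effect allele.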
infer_effect_column(
sumstats_dt,
dbSNP = 155,
sampled_snps = 10000,
mapping_file = sumstatsColHeaders,
nThread = nThread,
ref_genome = NULL,
on_ref_genome = TRUE,
infer_eff_direction = TRUE,
return_list = TRUE
)
sumstats_dt | data.table object of the summary statistics file for the GWAS. |
dbSNP | Version of dbSNP to be used for imputation (144 or 155). |
sampled_snps | Downsample the number of SNPs used when inferring genome build to save time. |
mapping_file | MungeSumstats has a pre-defined column-name mapping file which should cover the most common column headers and their interpretations. However, if a column header that is in your file is missing or the mapping we give is incorrect, you can supply your own mapping file. Must be a 2-column data frame with column names "Uncorrected" and "Corrected". See data(sumstatsColHeaders) for the default mapping and necessary format. |
nThread | Number of threads to use for parallel processes. |
ref_genome | Name of the reference genome used for the GWAS ("GRCh37" or "GRCh38"). Argument is case-insensitive. Default is NULL, which infers the reference genome from the data. |
on_ref_genome | Binary. Should a check take place that all SNPs are on the reference genome by SNP ID? Default is TRUE. |
infer_eff_direction | Binary. Should a check take place to ensure the alleles match the effect direction? Default is TRUE. |
return_list | Return the
list containing sumstats_dt, the modified summary statistics data table object
sumstats <- MungeSumstats::formatted_example()
# For speed, skip the on_ref_genome part of the check (on_ref_genome = FALSE)
sumstats_dt2 <- infer_effect_column(sumstats_dt = sumstats, on_ref_genome = FALSE)
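The mapping_file argument expects a two-column data frame named "Uncorrected" and "Corrected". A minimal sketch of that shape is shown below; the header names in the Uncorrected column are hypothetical, and mapping the effect allele to A2 follows the convention described above.

```r
# Hypothetical raw headers, chosen only to show the required two-column shape.
custom_mapping <- data.frame(
  Uncorrected = c("MY_EFFECT_ALLELE", "MY_OTHER_ALLELE"),
  Corrected   = c("A2", "A1"),  # A2 is the effect allele, A1 the non-effect
  stringsAsFactors = FALSE
)

# The column names must match the documented format exactly.
stopifnot(identical(names(custom_mapping), c("Uncorrected", "Corrected")))

# Assumed usage (not run here): append to the default mapping and pass it on.
# full_mapping <- rbind(MungeSumstats::sumstatsColHeaders, custom_mapping)
# infer_effect_column(sumstats_dt = sumstats, mapping_file = full_mapping,
#                     on_ref_genome = FALSE)
```

Appending to the default mapping, rather than replacing it, keeps the common headers recognised while adding the extra ones.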