SNPedia is a curated database containing information about thousands of SNPs. Related diseases, genotypes and references to relevant scientific publications are available through its website. The site is powered by MediaWiki, and the information about each SNP is written in the corresponding wiki page.
The SNPediaR library provides tools to automatically search for and download such pages.
It also implements a few functions to scrape relevant information from the downloaded wiki text,
and allows users to extend this parsing functionality.
For a known set of pages, the function getPages downloads the corresponding wiki content using the MediaWiki web API.
For instance, we can download the page Rs53576, corresponding to the rs53576 SNP, with:
library (SNPediaR)
pg <- getPages (titles = "Rs53576")
pg
The same function can download several pages at a time. For instance, we can download the three genotype pages corresponding to the same SNP, Rs53576(A;A), Rs53576(A;G) and Rs53576(G;G), as:
pgs <- getPages (titles = c ("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)"))
pgs
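To give a rough idea of what a MediaWiki web API request for several pages can look like, the sketch below builds a query URL by hand. It is not the code used inside getPages. The helper name `build_query_url` is hypothetical, and the default `base_url` is an assumption about where the SNPedia API is served. The query parameters (`action`, `prop`, `rvprop`, `format`, `titles`) are standard MediaWiki ones, and several titles are joined with `|`.

```r
# Sketch only: assemble a MediaWiki "query" URL for a set of page titles.
# `build_query_url` is a hypothetical helper, and the default `base_url`
# is an assumption, not something taken from this document.
build_query_url <- function(titles,
                            base_url = "https://bots.snpedia.com/api.php") {
  params <- c(
    action = "query",
    prop   = "revisions",
    rvprop = "content",
    format = "json",
    # MediaWiki accepts several titles separated by "|"
    titles = paste(titles, collapse = "|")
  )
  # Percent-encode every value so that characters such as ";" or "(" are safe
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?",
         paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

url <- build_query_url(c("Rs53576(A;A)", "Rs53576(A;G)"))
url
```

No request is sent here; the function only returns the URL string, so it can be inspected offline.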
Extracting relevant information requires parsing the wiki text. Some utility functions for this purpose are already implemented in the library, and users can implement others.
The function extractSnpTags, for instance, extracts the "tabular" information from SNP pages:
extractSnpTags (pg$Rs53576)
The function extractGenotypeTags can be used to get the "tabular" information from genotype pages:
sapply (pgs, extractGenotypeTags)
This same parsing can also be done while downloading the pages, by passing the wiki processing function as an argument in the getPages query.
If, for instance, we are only interested in the alleles and the magnitude associated with each genotype, we can do:
getPages (titles = c ("Rs53576(A;A)", "Rs53576(A;G)", "Rs53576(G;G)"),
          wikiParseFunction = extractGenotypeTags,
          tags = c ("allele1", "allele2", "magnitude"))
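To make the idea of selecting tags from "tabular" wiki text more concrete, here is a minimal offline sketch. It is not the implementation of extractGenotypeTags. The function `parse_template_tags` is hypothetical, and `toy_page` is invented text that only imitates `|name=value` template lines; it is not real SNPedia content.

```r
# Sketch only: read "|name=value" lines from wiki-like text into a named vector.
# `parse_template_tags` is a hypothetical helper, not part of SNPediaR.
parse_template_tags <- function(wiki_text, tags = NULL) {
  lines <- unlist(strsplit(wiki_text, split = "\n", fixed = TRUE))
  # Keep only lines that look like "|name=value"
  lines <- grep("^\\|[^=]+=", lines, value = TRUE)
  tag_names  <- trimws(sub("^\\|([^=]+)=.*$", "\\1", lines))
  tag_values <- trimws(sub("^\\|[^=]+=", "", lines))
  out <- stats::setNames(tag_values, tag_names)
  # Optionally restrict the result to the requested tags, in the given order
  if (!is.null(tags)) out <- out[tags]
  out
}

# Invented example text, for illustration only
toy_page <- paste("{{Genotype",
                  "|rsid=53576",
                  "|allele1=A",
                  "|allele2=G",
                  "|magnitude=2.5",
                  "}}",
                  "Free text follows.",
                  sep = "\n")

parse_template_tags(toy_page, tags = c("allele1", "allele2", "magnitude"))
```

All values are returned as character strings; converting a field such as the magnitude to a number with `as.numeric` is left to the caller.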
Any wiki processing function can be passed to getPages.
If a user wants, for instance, to extract all PubMed IDs from the pages Rs53576 and Rs1815739,
they can first define a parsing function like:
findPMID <- function (x) {
    x <- unlist (strsplit (x, split = "\n"))
    x <- grep ("PMID=", x, value = TRUE)
    x
}
and then call getPages as:
getPages (titles = c ("Rs53576", "Rs1815739"), wikiParseFunction = findPMID)
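A custom parsing function can be tried offline on a small piece of text before using it in a query. The sketch below repeats the line filter from findPMID and adds a hypothetical helper, `extractPMIDNumbers`, that keeps only the numeric identifiers. The text in `toy_text` and the identifiers in it are made up for illustration and are not taken from SNPedia.

```r
# Same line filter as the findPMID function defined above
findPMID <- function(x) {
  x <- unlist(strsplit(x, split = "\n"))
  grep("PMID=", x, value = TRUE)
}

# Hypothetical helper: keep only the digits that follow "PMID="
extractPMIDNumbers <- function(x) {
  hits <- findPMID(x)
  m <- regmatches(hits, gregexpr("PMID=[0-9]+", hits))
  unique(sub("PMID=", "", unlist(m), fixed = TRUE))
}

# Invented wiki-like text with made-up identifiers
toy_text <- paste("Intro sentence.",
                  "|PMID=12345678",
                  "|Title=Invented title",
                  "More text.",
                  "|PMID=87654321",
                  "|PMID=12345678",
                  sep = "\n")

extractPMIDNumbers(toy_text)
```

Once it behaves as expected on toy input, such a function can be supplied through the wikiParseFunction argument in the same way as findPMID.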
Besides the SNP and genotype pages, other interesting SNPedia resources are the category pages. They constitute indexes of all the other pages that may be queried.
Most used categories are:
Full list of categories may be found here.
The function getCategoryElements is designed to retrieve all elements under a given category.
It can be used to explore what information is available in SNPedia.
For instance, we can get all medical conditions:
res <- getCategoryElements (category = "Is_a_medical_condition")
head (res)
and find those related to cancer:
grep ('cancer', res, value = TRUE)
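Note that grep is case sensitive by default, so a pattern such as 'cancer' does not match titles in which the word is capitalized. The sketch below shows the difference on a small, invented character vector, `toy_res`, which stands in for the result of getCategoryElements and is not real SNPedia output.

```r
# Invented stand-in for a vector of category elements
toy_res <- c("Breast cancer", "Cancer", "Asthma", "Prostate cancer")

# Case-sensitive search: skips the capitalized "Cancer"
lower_only <- grep("cancer", toy_res, value = TRUE)

# Case-insensitive search: matches both spellings
any_case <- grep("cancer", toy_res, value = TRUE, ignore.case = TRUE)

lower_only
any_case
```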
sessionInfo ()