To install the development version of inferorg, run
devtools::install_github("camlab-bioml/inferorg")
Load the package:
library(inferorg)
To infer the organism and gene ID format, call the inferorg function. For example, for the human MHC-I genes:
human_mhc_genes <- c("HLA-A", "HLA-B", "HLA-C")
inferorg(human_mhc_genes)
This returns a list with four entries:
organism: the best guess of the organism the symbols correspond to
format: the best guess of the format the symbols correspond to
confidence_organism: the confidence in the organism guess
confidence_format: the confidence in the format guess
For a full list of supported organisms and formats, see supported ID formats and organisms.
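As a rough illustration of the idea (not of inferorg's actual scoring, which this sketch does not reproduce), each organism/format pair can be scored by the fraction of input IDs found in a lookup table, keeping the best pair. The tables in toy_tables and the function toy_infer are hypothetical, and the returned organism/format/score list is a simplification of the four entries described above.

```r
# HYPOTHETICAL lookup tables (organism -> format -> known IDs). inferorg uses
# its own annotation data; these few entries are only for illustration.
toy_tables <- list(
  human = list(symbol = c("HLA-A", "HLA-B", "HLA-C")),
  mouse = list(symbol = c("Tap1", "Tap2"), entrez = c("21354", "21355")),
  rat   = list(symbol = c("Tap1", "Tap2"))
)

# Score every organism/format pair by the fraction of input IDs it contains.
# A pair replaces the current best only if it scores strictly higher, so on
# ties the pair listed first in the tables wins.
toy_infer <- function(ids, tables = toy_tables) {
  ids <- as.character(ids)
  best <- list(organism = NA_character_, format = NA_character_, score = 0)
  for (org in names(tables)) {
    for (fmt in names(tables[[org]])) {
      score <- mean(ids %in% tables[[org]][[fmt]])
      if (score > best$score) {
        best <- list(organism = org, format = fmt, score = score)
      }
    }
  }
  best
}

toy_infer(c("HLA-A", "HLA-B", "HLA-C"))  # human / symbol
toy_infer(c("Tap1", "Tap2"))             # mouse / symbol (ties with rat)
```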
Sometimes identifiers match multiple organisms: Tap1 and Tap2, for example, match both mouse and rat. In this case the confidence score is lower, but the recommended organism is the first in the preferred order given in supported ID formats and organisms (e.g. human before mouse before fruit fly):
inferorg(c("Tap1", "Tap2"))
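The preference rule can be sketched in base R. This assumes the preferred order is the order of the organisms listed under supported ID formats and organisms; pick_organism and the example scores are hypothetical and do not come from inferorg.

```r
# Preferred organism order, taken from the supported-organisms list in this document.
preferred_order <- c("human", "mouse", "fruit_fly", "macaque", "worm", "chicken", "rat")

# Keep only the organisms sharing the top score, then return whichever of
# them appears first in the preferred order.
pick_organism <- function(scores, order = preferred_order) {
  tied <- names(scores)[scores == max(scores)]
  order[order %in% tied][1]
}

# Hypothetical scores: every input symbol is found in both mouse and rat.
pick_organism(c(rat = 1, mouse = 1, human = 0))  # returns "mouse"
```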
The confidence scores for each organism and format are as follows:
To convert automatically between formats, use the autoconvert function. Under the hood, this calls the inferorg function to work out the gene ID format and organism, then converts to the desired format for that organism.
For example, to convert the genes of the human MHC-I complex to Ensembl IDs, we can call:
human_mhc_genes <- c("HLA-A", "HLA-B", "HLA-C")
autoconvert(human_mhc_genes, to = 'ensgene')
Similarly, we can convert the mouse genes Tap1 and Tap2 to their Entrez IDs:
mouse_tap_genes <- c("Tap1", "Tap2")
autoconvert(mouse_tap_genes, to = "entrez")
and we can convert these back to symbols:
autoconvert(c(21354, 21355), to = "symbol")
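The two steps autoconvert performs (work out the input format, then translate to the requested one) can be mimicked with a toy mapping table. This is a sketch under stated assumptions: toy_mouse and toy_autoconvert are hypothetical, the table only covers the two mouse genes from the examples above, and the real function also infers the organism and relies on its own annotation data.

```r
# HYPOTHETICAL mapping table for two mouse genes, one row per gene.
toy_mouse <- data.frame(
  symbol = c("Tap1", "Tap2"),
  entrez = c("21354", "21355"),
  stringsAsFactors = FALSE
)

toy_autoconvert <- function(ids, to, table = toy_mouse) {
  ids <- as.character(ids)
  # Step 1: guess the input format as the column containing the most input IDs.
  hits <- vapply(table, function(col) sum(ids %in% col), integer(1))
  from <- names(hits)[which.max(hits)]
  # Step 2: look each ID up in that column and return the requested column;
  # IDs with no match come back as NA.
  table[[to]][match(ids, table[[from]])]
}

toy_autoconvert(c("Tap1", "Tap2"), to = "entrez")   # "21354" "21355"
toy_autoconvert(c(21354, 21355), to = "symbol")     # "Tap1" "Tap2"
```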
Note that if the gene ID format and/or organism can't be confidently inferred, or if any of the genes provided can't be confidently mapped, an NA is returned:
autoconvert(c("fake", "gene"))
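Because failure is signalled by NA rather than by an error, it may be worth checking results before passing them downstream. The helper require_mapped is hypothetical and uses only base R; it stops with an error naming the unmapped inputs when the input and result have the same length.

```r
# Hypothetical helper: fail loudly if a conversion result contains NA.
require_mapped <- function(converted, input = NULL) {
  if (anyNA(converted)) {
    msg <- "Gene ID conversion failed: result contains NA"
    # Name the offending inputs when they line up one-to-one with the result.
    if (!is.null(input) && length(input) == length(converted)) {
      msg <- paste0(msg, " for: ", paste(input[is.na(converted)], collapse = ", "))
    }
    stop(msg, call. = FALSE)
  }
  converted
}
```

For example, `require_mapped(autoconvert(human_mhc_genes, to = "ensgene"), human_mhc_genes)` would return the converted IDs unchanged or stop with an informative error.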
But be careful! Made-up identifiers will sometimes match real genes, which is especially a problem for very small input gene sets:
autoconvert(c("made", "up", "gene"), to='ensgene')
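One defensive option is to accept an inference only when the input gene set is reasonably large and both confidence entries clear a threshold. The function trust_inference and its min_genes and min_confidence arguments are hypothetical; suitable values depend on the scale of inferorg's confidence scores, so none are built in as defaults.

```r
# Hypothetical guard for an inferorg()-style result list. It relies only on the
# documented confidence_organism and confidence_format entries; isTRUE() makes
# missing or NA confidences count as a failure.
trust_inference <- function(res, n_genes, min_genes, min_confidence) {
  n_genes >= min_genes &&
    isTRUE(res$confidence_organism >= min_confidence) &&
    isTRUE(res$confidence_format >= min_confidence)
}
```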
The following organisms are supported:
human
mouse
fruit_fly
macaque
worm
chicken
rat
and the following gene ID formats:
symbol: HGNC symbol
ensgene: Ensembl gene ID
entrez: Entrez gene ID
print(sessionInfo())