geneToSymbol | R Documentation |
If your organism is not in this list of supported
organisms, manually assign the input arguments.
There are 2 main fetch modes:
By gene ids (Single accession per gene)
By tx ids (Multiple accessions per gene)
Run the mode you need depending on your required attributes.
The function checks for an already existing table of all genes and uses that
instead of re-downloading every time. If you input a valid experiment or txdb
and have run makeTxdbFromGenome with symbols = TRUE, you have a file called
gene_symbol_tx_table.fst, which will load instantly.
If df = NULL, it can still search the cache, but loading is a bit slower.
geneToSymbol(
df,
organism_name = organism(df),
gene_ids = filterTranscripts(df, by = "gene", 0, 0, 0),
org.dataset = paste0(tolower(substr(organism_name, 1, 1)), gsub(".* ", replacement =
"", organism_name), "_gene_ensembl"),
ensembl = biomaRt::useEnsembl("ensembl", dataset = org.dataset),
attribute = "external_gene_name",
include_tx_ids = FALSE,
uniprot_id = FALSE,
force = FALSE,
verbose = TRUE
)
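The default for org.dataset is built from organism_name with plain string functions. The sketch below repeats that expression from the usage above in base R only, so you can check what dataset name a given organism would produce before calling geneToSymbol. The helper name default_org_dataset is hypothetical and is not part of ORFik.

```r
# Hypothetical helper (not an ORFik function): repeats the default
# expression for 'org.dataset' shown in the usage above.
default_org_dataset <- function(organism_name) {
  # First letter of the organism name, lower-cased, e.g. "h"
  first_letter <- tolower(substr(organism_name, 1, 1))
  # Everything up to and including the last space is removed, e.g. "sapiens"
  last_word <- gsub(".* ", replacement = "", organism_name)
  paste0(first_letter, last_word, "_gene_ensembl")
}

org.dataset <- default_org_dataset("Homo sapiens")
print(org.dataset)  # "hsapiens_gene_ensembl"
```

Because the pattern ".* " is greedy, only the last word is kept for names with more than two words. In that case it may be safer to compare the result against the dataset names reported by biomaRt and pass org.dataset manually.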
df |
an ORFik |
organism_name |
default, |
gene_ids |
default, |
org.dataset |
default, |
ensembl |
default, |
attribute |
default "external_gene_name", the biomaRt column or columns to fetch (default: primary gene symbol names). These always come from a species-specific database, such as HGNC symbols for human, MGI symbols for mouse and rat, SGD for yeast, etc. |
include_tx_ids |
logical, default FALSE. If TRUE, also match tx ids, which are then returned as the 3rd column. Only allowed when 'df' is defined. If |
uniprot_id |
logical, default FALSE. Include uniprotsptrembl and/or uniprotswissprot. If include_tx_ids is TRUE, you get ids per isoform if available; otherwise you get the canonical uniprot id per gene. If both uniprotsptrembl and uniprotswissprot exist, a merged uniprot id column is made with this rule: if an id exists in uniprotswissprot, keep it; if not, use the id from the uniprotsptrembl column. |
force |
logical FALSE, if TRUE will not look for existing file made through |
verbose |
logical TRUE, if FALSE, do not output messages. |
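The merge rule described for uniprot_id can be illustrated with a small base R sketch. It is not the ORFik implementation. It assumes a missing id shows up as either NA or an empty string, and the gene ids and accession values are made-up placeholders.

```r
# Hypothetical toy table; the two id columns are named after the
# biomaRt attributes mentioned for 'uniprot_id'. All values are placeholders.
ids <- data.frame(
  gene_id          = c("gene_A", "gene_B", "gene_C"),
  uniprotswissprot = c("P00001", "", NA),
  uniprotsptrembl  = c("A0A001", "A0A002", NA),
  stringsAsFactors = FALSE
)

# Assumption: an id counts as present if it is neither NA nor "".
has_swissprot <- !is.na(ids$uniprotswissprot) & nzchar(ids$uniprotswissprot)

# Rule: keep the swissprot id when present, otherwise use the trembl id.
ids$uniprot_id <- ifelse(has_swissprot, ids$uniprotswissprot, ids$uniprotsptrembl)

print(ids[, c("gene_id", "uniprot_id")])
```

Here gene_A keeps its swissprot id, gene_B falls back to the trembl id, and gene_C stays NA because neither column has a value.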
data.table with 2, 3 or 4 columns: gene_id, gene_symbol, tx_id and uniprot_id, named after attribute and sorted in the order of the gene_ids input (for example, 3 columns are returned if include_tx_ids is TRUE). More columns are returned if additional columns are specified in the 'attribute' argument.
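To show what "sorted in the order of the gene_ids input" means in practice, here is a minimal base R sketch that reorders a fetched table to follow the input ids. How geneToSymbol does this internally is not stated here, so match() is only one plausible way to get that ordering. The ids and symbols are placeholders.

```r
# Placeholder input order and a placeholder fetched table in a different order.
gene_ids <- c("gene_C", "gene_A", "gene_B")
fetched <- data.frame(
  gene_id     = c("gene_A", "gene_B", "gene_C"),
  gene_symbol = c("SYM_A", "SYM_B", "SYM_C"),
  stringsAsFactors = FALSE
)

# match() gives, for each input id, its row position in the fetched table.
ordered <- fetched[match(gene_ids, fetched$gene_id), ]
rownames(ordered) <- NULL

print(ordered)
```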
## Without ORFik experiment input
gene_id_ATF4 <- "ENSG00000128272"
#geneToSymbol(NULL, organism_name = "Homo sapiens", gene_ids = gene_id_ATF4)
# With uniprot canonical isoform id:
#geneToSymbol(NULL, organism_name = "Homo sapiens", gene_ids = gene_id_ATF4, uniprot_id = TRUE)
## All genes from Organism using ORFik experiment
# df <- read.experiment("some_experiment")
# geneToSymbol(df)
## Non vertebrate species (the ones not in ensembl, but in ensemblGenomes mart)
#txdb_ylipolytica <- loadTxdb("txdb_path")
#dt2 <- geneToSymbol(txdb_ylipolytica, include_tx_ids = TRUE,
# ensembl = useEnsemblGenomes(biomart = "fungi_mart", dataset = "ylipolytica_eg_gene"))