knitr::opts_chunk$set(warning=FALSE, comment="# ", collapse=TRUE)
The hciRdata package includes feature annotations from the latest Ensembl release 98 for twelve reference genomes (human, mouse, elephant, fly, pig, rabbit, rat, sheep, vervet, worm, yeast and zebrafish), plus earlier even numbered releases for mouse and human (human90, human92, etc). Load the package and select an annotation table.
options(width=110) library(hciRdata) dplyr::select(human98, 1:4,8)
To download additional species, use read_biomart
to view a list of all available datasets.
x <- read_biomart(list="datasets") Using Ensembl release 98 Downloaded 212 datasets > x # # A tibble: 212 x 3 # dataset description version # <chr> <chr> <chr> # 1 abrachyrhynchus_gene_ensembl Pink-footed goose genes (ASM259213v1) ASM259213v1 # 2 acalliptera_gene_ensembl Eastern happy genes (fAstCal1.2) fAstCal1.2 # 3 acarolinensis_gene_ensembl Anole lizard genes (AnoCar2.0) AnoCar2.0 # 4 acitrinellus_gene_ensembl Midas cichlid genes (Midas_v5) Midas_v5 # 5 ahaastii_gene_ensembl Great spotted kiwi genes (aptHaa1) aptHaa1 # 6 amelanoleuca_gene_ensembl Panda genes (ailMel1) ailMel1 # 7 amexicanus_gene_ensembl Mexican tetra genes (Astyanax_mexicanus-2.0) Astyanax_mexicanus-2.0 # # … with 205 more rows
Use the first letter of the genus and species name in the first column to download features (common names are supported for the reference genomes with existing release 98 tables).
panda98 <- read_biomart("amelanoleuca") # Using Ensembl release 98 # Downloaded 23262 features
To download earlier versions, check the Ensembl archives for available releases. Archived versions are available for 54, 67, and all versions after 77.
biomaRt::listEnsemblArchives() # name date url version current_release # 1 Ensembl GRCh37 Feb 2014 http://grch37.ensembl.org GRCh37 # 2 Ensembl 98 Sep 2019 http://sep2019.archive.ensembl.org 98 * # 3 Ensembl 97 Jul 2019 http://jul2019.archive.ensembl.org 97 # ... # 22 Ensembl 78 Dec 2014 http://dec2014.archive.ensembl.org 78 # 23 Ensembl 77 Oct 2014 http://oct2014.archive.ensembl.org 77 # 24 Ensembl 75 Feb 2014 http://feb2014.archive.ensembl.org 75 # 25 Ensembl 67 May 2012 http://may2012.archive.ensembl.org 67 # 26 Ensembl 54 May 2009 http://may2009.archive.ensembl.org 54
Add the version number to download earlier releases.
zb87 <- read_biomart("zebrafish", version=87) # Downloaded 32266 features
In some cases, the default attributes may be missing. For example, the
external_gene_name
is missing from early fly releases. Use the list option
to view the available attributes.
fly67 <- read_biomart("fly", version=67) # Error in biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name", : # Invalid attribute(s): external_gene_name # Please use the function 'listAttributes' to get valid attribute names fly67_columns <- read_biomart("fly", version=67, list="attributes") fly67_columns # name description page # 1 ensembl_gene_id Ensembl Gene ID feature_page # 2 ensembl_transcript_id Ensembl Transcript ID feature_page # 3 ensembl_peptide_id Ensembl Protein ID feature_page # 4 description Description feature_page # ... # 13 external_gene_id Associated Gene Name feature_page # ...
Replace the default external_gene_name
with external_gene_id
by passing a
vector of attributes and then follow the code in read_biomart
to update column
names and descriptions.
fly67 <- read_biomart("fly", version=67, attributes = c('ensembl_gene_id', 'external_gene_id', 'gene_biotype', 'chromosome_name', 'start_position', 'end_position','strand', 'description', 'transcript_count')) names(fly67)[1:6] <- c("id", "gene_name", "biotype", "chromosome", "start", "end") fly67$description <- gsub(" \\[Source.*", "", fly67$description)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.