knitr::opts_chunk$set(echo = TRUE)
InterMine is a powerful open source data warehouse system integrating diverse biological data sets (e.g. genomic, expression and protein data) for various organisms. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge. A selected list of databases powered by InterMine is shown in Table 1:
Database | Organism | Data |
---------|----------|------|
FlyMine | Drosophila | Genes, homology, proteins, interactions, gene ontology, expression, regulation, phenotypes, pathways, diseases, resources, publications
HumanMine | H. sapiens | Genomics, SNPs, GWAS, proteins, gene ontology, pathways, gene expression, interactions, publications, disease, orthologues, alleles
MouseMine | M. musculus | Genomics, proteins, gene ontology, expression, interactions, pathways, phenotypes, diseases, homology, publications
RatMine | R. norvegicus | Disease, gene ontology, genomics, interactions, phenotype, pathway, proteins, publication, QTL, SNP
WormMine | C. elegans | Genes, alleles, homology, GO annotation, phenotypes, strains
YeastMine | S. cerevisiae | Genomics, proteins, gene ontology, comparative genomics, phenotypes, interactions, literature, pathways, gene expression
ZebrafishMine | D. rerio | Genes, constructs, disease, gene ontology, genotypes, homology, morpholinos, phenotypes
TargetMine | H. sapiens, M. musculus | Genes, protein structures, chemical compounds, protein domains, gene function, pathways, interactions, disease, drug targets
MitoMiner | H. sapiens, M. musculus, R. norvegicus, D. rerio, S. cerevisiae, S. pombe | Genes, homology, localisation evidence, mitochondrial reference gene lists, phenotypes, diseases, expression, interactions, pathways, exome
IndigoMine | Archaea | Genomics
ThaleMine | A. thaliana | Genomics, proteins, domains, homology, gene ontology, interactions, expression, publications, pathways, GeneRIF, stocks, phenotypes, alleles, insertions, TAIR
MedicMine | Medicago truncatula | Genomics, pathways, gene ontology, publications, proteins, homology
PhytoMine | over 50 plant genomes | Genes, proteins, expression, transcripts, homology
Please see the InterMine home page for a full list of available InterMines.
InterMine includes an attractive, user-friendly web interface that works 'out of the box' and a powerful, scriptable web-service API to allow programmatic access to your data. This R package provides an interface with the InterMine-powered databases through Web services.
Let's start with a simple task - find the pathways of the gene ABO.
First, we look at what databases are available.
library(InterMineR)
listMines()
Since we would like to query human genes, we select HumanMine.
# load HumanMine
im <- initInterMine(mine=listMines()["HumanMine"])
im

You can build custom queries both on an InterMine database website and in InterMineR. To make retrieving information from InterMine databases easier, a variety of pre-built queries, called templates, are also available. Templates are queries that have already been created with a fixed set of output columns and one or more constraints.
# Get template (collection of pre-defined queries)
template = getTemplates(im)
head(template)
We would like to find templates involving genes.
# Get gene-related templates
template[grep("gene", template$name, ignore.case=TRUE),]
The template Gene_Pathway seems to be what we want. Let's look at this template in more detail.
# Query for gene pathways
queryGenePath = getTemplateQuery(
  im = im,
  name = "Gene_Pathway"
)
queryGenePath
There are three essential members in a query - SELECT, WHERE and constraintLogic.
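As a rough illustration of these three members, the sketch below builds a plain R list with the member names that are used later in this document (select, where, constraintLogic). The object name toyQuery and every value in it are assumptions made for illustration; this is not the output of getTemplateQuery.

```r
# Hypothetical, hand-built list that mimics the shape of a list query.
# All values are illustrative assumptions, not the real template contents.
toyQuery <- list(
  # SELECT: the output columns
  select = c("Gene.symbol", "Gene.pathways.identifier", "Gene.pathways.name"),
  # WHERE: a list of constraints, each one itself a list
  where = list(
    list(path = "Gene.symbol", op = "=", value = "PPARG", code = "A")
  ),
  # constraintLogic: how the constraint codes are combined
  constraintLogic = "A"
)

names(toyQuery)
toyQuery$where[[1]][["value"]]
```

Each member can be read or replaced with the usual list accessors, which is the access pattern used when a template query is modified.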
What does 'Gene.symbol' mean? What is 'Gene.pathways.identifier'?
Let's take a look at the data model. NOTE: Section temporarily removed due to errors
Let's look at the children of the Gene data type.

# model[which(model$type=="Gene"),]

Gene has a field called 'symbol' (hence the column Gene.symbol). Gene also has a 'pathways' reference, which is of the 'Pathway' data type.
## Run a Query

Let's now run our template.

```r
resGenePath <- runQuery(im, queryGenePath)
head(resGenePath)
```
Let's modify the query to find the pathways of the gene ABO. We want to change the 'value' attribute from PPARG to ABO.
There are two ways to build a query in InterMineR. We can either build a query as a list object with the newQuery function and assign all input values (selection of retrieved data type, constraints, etc.) as items of that list, or we can build the query as an InterMineR-class object with the functions setConstraints, which allows us to generate a new list of constraints or modify an existing one, and setQuery, which generates the query as an InterMineR-class object.

The setConstraints and setQuery functions are designed to facilitate the generation of queries for InterMine instances and to avoid multiple iterative loops, especially when a query needs multiple constraints or constraint values (e.g. genes, organisms).
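To make the idea of avoiding explicit loops concrete, here is a minimal base R sketch that assembles a set of constraints from parallel vectors. The helper buildConstraints is hypothetical and is not how setConstraints is implemented; it only mirrors the path/op/value/code layout that constraints have in this document.

```r
# Hypothetical helper: one constraint per position of the input vectors.
buildConstraints <- function(paths, operators, values) {
  stopifnot(length(paths) == length(operators),
            length(paths) == length(values))
  codes <- LETTERS[seq_along(paths)]  # "A", "B", ... (at most 26 constraints)
  unname(Map(
    function(p, o, v, k) list(path = p, op = o, value = v, code = k),
    paths, operators, values, codes
  ))
}

# Illustrative input; the paths and values are assumptions.
toyConstraints <- buildConstraints(
  paths     = c("Gene.symbol", "Gene.organism.name"),
  operators = c("=", "="),
  values    = list("ABO", "Homo sapiens")
)
str(toyConstraints)
```

The three inputs are matched by position, so the first path goes with the first operator and the first value.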
# modify directly the value of the first constraint from the list query
queryGenePath$where[[1]][["value"]] <- "ABO"

# or modify the value of the first constraint from the list query with setConstraints
queryGenePath$where = setConstraints(
  modifyQueryConstraints = queryGenePath,
  m.index = 1,
  values = list("ABO")
)

queryGenePath$where
Note the value is now equal to 'ABO'. Let's rerun our query with the new constraint.
resGenePath <- runQuery(im, queryGenePath)
head(resGenePath)
Now we are seeing pathways for the ABO gene.
You can also add additional filters. Let's look for a specific pathway.

There are four parts of a constraint to add (path, op, value and code):
newConstraint <- list(
  path=c("Gene.pathways.name"),
  op=c("="),
  value=c("ABO blood group biosynthesis"),
  code=c("B")
)

queryGenePath$where[[2]] <- newConstraint
queryGenePath$where
Our new filter has been added successfully. Rerun the query and you will see that only one pathway, ABO blood group biosynthesis, is returned.
resGenePath <- runQuery(im, queryGenePath)
resGenePath
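Because a constraint is just a list with the four parts path, op, value and code, it can be sanity-checked locally before a query is run. The sketch below is one possible check in base R; the helper hasAllParts and the toyWhere list are hypothetical, and the first constraint's path is an assumption rather than the template's actual content.

```r
# Hypothetical check: does a constraint carry all four parts?
hasAllParts <- function(constraint) {
  all(c("path", "op", "value", "code") %in% names(constraint))
}

# Illustrative constraint list shaped like the 'where' member of a list query.
toyWhere <- list(
  list(path = "Gene.symbol", op = "=", value = "ABO", code = "A"),
  list(path = "Gene.pathways.name", op = "=",
       value = "ABO blood group biosynthesis", code = "B")
)

# Every constraint is complete and no code is used twice.
allComplete <- all(vapply(toyWhere, hasAllParts, logical(1)))
codes <- vapply(toyWhere, function(x) x$code, character(1))
allComplete
anyDuplicated(codes) == 0
```

Distinct codes matter because the constraint logic refers to each constraint by its code.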
You can also add additional columns to the output. For instance, is the Gene also involved in any disease? Let's add this information.
Let's see what we know about diseases.
The Gene data type has a 'Diseases' reference of type 'Disease'.

Disease has an attribute called "name". Add Gene.diseases.name to the view. We'll add it as the last column: as we can see from above, there are already 7 columns, so it becomes column 8:
# use setQuery function which will create an InterMineR-class query
queryGenePath.InterMineR = setQuery(
  inheritQuery = queryGenePath,
  select = c(queryGenePath$select, "Gene.diseases.name")
)

getSelect(queryGenePath.InterMineR)
#queryGenePath.InterMineR@select

# or assign new column directly to the existing list query
queryGenePath$select[[8]] <- "Gene.diseases.name"
queryGenePath$select

# run queries
resGenePath.InterMineR <- runQuery(im, queryGenePath.InterMineR)
resGenePath <- runQuery(im, queryGenePath)

all(resGenePath == resGenePath.InterMineR)

head(resGenePath, 3)
NB: adding columns can result in changing the row count.
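One way to picture why the row count can change is to treat the result as a join between a gene's related records. That reading is an assumption of this sketch, not a statement about InterMine internals, and the two tables below are invented toy data rather than HumanMine results.

```r
# Invented toy tables, not HumanMine results.
pathways <- data.frame(
  symbol  = c("GENE1", "GENE1"),
  pathway = c("pathway_1", "pathway_2"),
  stringsAsFactors = FALSE
)
diseases <- data.frame(
  symbol  = c("GENE1", "GENE1", "GENE1"),
  disease = c("disease_1", "disease_2", "disease_3"),
  stringsAsFactors = FALSE
)

# Each pathway row is paired with each disease row of the same gene.
combined <- merge(pathways, diseases, by = "symbol")
nrow(pathways)
nrow(combined)
```

In this toy case two pathway rows become six rows once the disease column is joined in, so it is worth comparing row counts before and after adding a column.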
The constraintLogic, if not given, defaults to 'A and B'. We will now specify the constraintLogic explicitly. A and B correspond to the 'code' of each constraint.
queryGenePath$constraintLogic <- "A and B"
queryGenePath$constraintLogic
Run the query again and see no change:
resGenePath <- runQuery(im, queryGenePath)
resGenePath
Change to be 'A or B' and see how the results change.
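The difference between the two logics can be previewed offline on a small table. The sketch below uses invented rows (not HumanMine output) and plain logical vectors standing in for constraints A and B; it only illustrates how 'and' narrows and 'or' widens a selection.

```r
# Invented toy rows, not HumanMine results.
toy <- data.frame(
  symbol  = c("ABO", "ABO", "GENE_X", "GENE_Y"),
  pathway = c("ABO blood group biosynthesis", "pathway_2",
              "ABO blood group biosynthesis", "pathway_3"),
  stringsAsFactors = FALSE
)

# Stand-ins for the two constraints.
A <- toy$symbol == "ABO"
B <- toy$pathway == "ABO blood group biosynthesis"

rowsAnd <- toy[A & B, ]  # rows that satisfy both constraints
rowsOr  <- toy[A | B, ]  # rows that satisfy at least one constraint
nrow(rowsAnd)
nrow(rowsOr)
```

For the real query, the same change is made by assigning "A or B" to queryGenePath$constraintLogic and running the query again.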
- Start with the template Gene GO
queryGeneGO <- getTemplateQuery(im, "Gene_GO")
queryGeneGO

- Modify the view to display a compact view

queryGeneGO$select <- queryGeneGO$select[2:5]
queryGeneGO$select

- Modify the constraints to look for gene ABO.

queryGeneGO$where[[1]][["value"]] <- "ABO"
queryGeneGO$where

- Run the query

resGeneGO <- runQuery(im, queryGeneGO)
head(resGeneGO)
- Start with the template GOterm Gene

queryGOGene <- getTemplateQuery(im, "GOterm_Gene")
queryGOGene

- Modify the view to display a compact view

queryGOGene$select <- queryGOGene$select[2:5]
queryGOGene$select

- Modify the constraints to look for GO term 'metal ion binding'

queryGOGene$where[[1]]$value = "metal ion binding"
queryGOGene$where

- Run the query

resGOGene <- runQuery(im, queryGOGene)
head(resGOGene)
- Start with the Gene_Location template and update it to search for the ABCA6 gene.

queryGeneLoc = getTemplateQuery(im, "Gene_Location")
queryGeneLoc$where[[2]][["value"]] = "ABCA6"
resGeneLoc = runQuery(im, queryGeneLoc)

resGeneLoc
We're going to use the output (gene location) as input for the next query.
- Define a new query
# set constraints
constraints = setConstraints(
  paths = c(
    "Gene.chromosome.primaryIdentifier",
    "Gene.locations.start",
    "Gene.locations.end",
    "Gene.organism.name"
  ),
  operators = c(
    "=",
    ">=",
    "<=",
    "="
  ),
  values = list(
    resGeneLoc[1, "Gene.chromosome.primaryIdentifier"],
    as.character(as.numeric(resGeneLoc[1, "Gene.locations.start"])-50000),
    as.character(as.numeric(resGeneLoc[1, "Gene.locations.end"])+50000),
    "Homo sapiens"
  )
)

# set InterMineR-class query
queryNeighborGene = setQuery(
  select = c("Gene.primaryIdentifier",
             "Gene.symbol",
             "Gene.chromosome.primaryIdentifier",
             "Gene.locations.start",
             "Gene.locations.end",
             "Gene.locations.strand"),
  where = constraints
)

summary(queryNeighborGene)
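The window arithmetic used for the start and end constraints can be isolated in a small helper. The sketch below is a hypothetical function, flankWindow, shown with placeholder coordinates rather than the real ABCA6 location. It clamps the lower bound at 1 and uses format(scientific = FALSE) so that round values are not written in scientific notation, which as.character can do for some numbers.

```r
# Hypothetical helper: widen a [start, end] interval by 'flank' bases on each
# side and return both bounds as plain (non-scientific) character strings.
flankWindow <- function(start, end, flank = 50000) {
  lo <- max(1, as.numeric(start) - flank)  # never go below position 1
  hi <- as.numeric(end) + flank
  c(start = format(lo, scientific = FALSE, trim = TRUE),
    end   = format(hi, scientific = FALSE, trim = TRUE))
}

# Placeholder coordinates supplied as strings.
win <- flankWindow("150000", "200000")
win

# A gene close to the chromosome start is clamped at 1.
flankWindow("10000", "20000")
```

The two returned strings could then be supplied as the values of the ">=" and "<=" constraints.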
- Run the query
resNeighborGene <- runQuery(im, queryNeighborGene)
resNeighborGene
- Plot the genes
resNeighborGene$Gene.locations.strand[which(resNeighborGene$Gene.locations.strand==1)]="+"
resNeighborGene$Gene.locations.strand[which(resNeighborGene$Gene.locations.strand==-1)]="-"

gene.idx = which(nchar(resNeighborGene$Gene.symbol)==0)
resNeighborGene$Gene.symbol[gene.idx]=resNeighborGene$Gene.primaryIdentifier[gene.idx]
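The tidying step before plotting does two things: it recodes the strand values and it falls back to the primary identifier when a gene has no symbol. The sketch below repeats that logic on invented toy rows, using vectorised ifelse and logical indexing instead of which. The identifiers are placeholders, and the sketch assumes the strand column only ever holds "1" or "-1".

```r
# Invented toy rows that mimic the columns used above.
toyGenes <- data.frame(
  Gene.primaryIdentifier = c("ID_1", "ID_2", "ID_3"),
  Gene.symbol            = c("SYM_1", "", "SYM_3"),
  Gene.locations.strand  = c("1", "-1", "1"),
  stringsAsFactors = FALSE
)

# Recode strand: "1" becomes "+", anything else (assumed "-1") becomes "-".
toyGenes$Gene.locations.strand <- ifelse(
  toyGenes$Gene.locations.strand == "1", "+", "-"
)

# Fall back to the primary identifier when the symbol is empty.
noSymbol <- nchar(toyGenes$Gene.symbol) == 0
toyGenes$Gene.symbol[noSymbol] <- toyGenes$Gene.primaryIdentifier[noSymbol]
toyGenes
```

After this step every row has a non-empty label, so each feature can be annotated in the plot.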
require(Gviz)
annTrack = AnnotationTrack(
  start=resNeighborGene$Gene.locations.start,
  end=resNeighborGene$Gene.locations.end,
  strand=resNeighborGene$Gene.locations.strand,
  chromosome=resNeighborGene$Gene.chromosome.primaryIdentifier[1],
  genome="GRCh38",
  name="around ABCA6",
  id=resNeighborGene$Gene.symbol)

gtr <- GenomeAxisTrack()
itr <- IdeogramTrack(genome="hg38", chromosome="chr17")

plotTracks(list(gtr, itr, annTrack), shape="box", showFeatureId=TRUE, fontcolor="black")
sessionInfo()
The InterMine model can be accessed from the mine homepage by clicking the "QueryBuilder" tab and selecting the appropriate data type under "Select a Data Type to Begin a Query":
Here we select Gene as the data type:
We can select Symbol and Chromosome->Primary Identifier by clicking Show to the right of each. Then click "Export XML" at the bottom right corner of the webpage:
The column names Gene.symbol and Gene.chromosome.primaryIdentifier are contained in the XML output: