View source: R/extract_eupath_orthologs.R
extract_eupath_orthologs | R Documentation |
The eupathdb provides a tremendous wealth of information. For me though, it is sometimes difficult to boil it down into just the bits of comparison I want for one species or between two species. A question I am asked very often is: "What are the most similar genes between species x and y among these two arbitrary parasites?" There are many ways to poke at this question: run BLAST/fasta36, use biomart, query the ortholog tables from the eupathdb, etc. In all of these cases, however, it is not trivial to ask the next question: what about a:b and b:a? This function attempts to address that for the case of two eupath species from the same domain (tritrypdb, fungidb, etc.). It does, however, assume that the sqlite package has been installed locally; if it has not, it suggests running the make_organismdbi function to install it.
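The a:b versus b:a question can be illustrated on a toy table. This is a minimal sketch in base R, not this function's implementation: the gene IDs are invented, and the RECIPROCAL column is a hypothetical name used only for this illustration.

```r
# Toy ortholog table: each row links a gene (GID) to one ortholog (ORTHOLOG_ID).
# All IDs are made up for illustration.
orthologs <- data.frame(
  GID = c("a1", "a2", "b1", "b3"),
  ORTHOLOG_ID = c("b1", "b2", "a1", "a2"),
  stringsAsFactors = FALSE
)

# Directional keys: "query:hit" for each row, and the same pair reversed.
forward <- paste(orthologs$GID, orthologs$ORTHOLOG_ID, sep = ":")
reverse <- paste(orthologs$ORTHOLOG_ID, orthologs$GID, sep = ":")

# A pair is reciprocal when its reversed key also occurs as a forward key.
orthologs$RECIPROCAL <- reverse %in% forward
print(orthologs)
```

Only the a1/b1 rows are flagged here, because b2 never points back at a2 and a2 never points at b3.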
extract_eupath_orthologs(
db,
master = "GID",
query_species = NULL,
id_column = "ORTHOLOGS_GID",
org_column = "ORTHOLOGS_ORGANISM",
group_column = "ANNOT_GENE_ORTHOMCL_NAME",
name_column = "ORTHOLOGS_PRODUCT",
count_column = "ORTHOLOGS_COUNT",
print_speciesnames = FALSE,
webservice = "eupathdb"
)
db |
Species name (subset) from one eupath database. |
master |
Primary keytype to use for indexing the various tables. |
query_species |
A list of exact species names to search for. If uncertain about them, add print_speciesnames=TRUE and be ready for a big blob of text. If left NULL, all species are pulled. |
id_column |
What column in the database provides the set of ortholog IDs? |
org_column |
What column provides the species name? |
group_column |
Ortholog group column name. |
name_column |
Name of the column providing the gene name for this group. |
count_column |
Name of the column with the count of species represented. |
print_speciesnames |
Dump the species names for diagnostics? |
webservice |
Which eupathdb project to query? |
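The documented behaviour of query_species, org_column, and print_speciesnames can be sketched on a toy table. The helper filter_species below is hypothetical and exists only for this illustration; it is not part of the package, and the species labels are invented.

```r
# Hypothetical helper mirroring the documented argument semantics:
# exact-match filtering on the organism column, NULL meaning "keep everything".
filter_species <- function(tbl, query_species = NULL,
                           org_column = "ORTHOLOGS_ORGANISM",
                           print_speciesnames = FALSE) {
  if (isTRUE(print_speciesnames)) {
    # Diagnostic dump of every species name present in the table.
    message(toString(sort(unique(tbl[[org_column]]))))
  }
  if (is.null(query_species)) {
    return(tbl)
  }
  tbl[tbl[[org_column]] %in% query_species, , drop = FALSE]
}

# Toy data with invented IDs and species labels.
toy <- data.frame(
  ORTHOLOGS_GID = c("g1", "g2", "g3"),
  ORTHOLOGS_ORGANISM = c("species A", "species B", "species A"),
  stringsAsFactors = FALSE
)

all_rows <- filter_species(toy)
only_a <- filter_species(toy, query_species = "species A")
```

Because matching is exact, a name that differs by even one character from the stored species name selects nothing, which is why dumping the names first can help.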
One other important caveat: this function assumes queries in the format 'table_column', where in this particular instance the table is further assumed to be the ortholog table.
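As a rough illustration of the 'table_column' naming, the sketch below builds and splits such keys. Both helpers (make_key, split_key) are hypothetical, and treating the first underscore as the separator is an assumption made for this example rather than something the package states.

```r
# Hypothetical helper: join a table name and a column name into TABLE_COLUMN.
make_key <- function(table, column) {
  toupper(paste(table, column, sep = "_"))
}

# Hypothetical helper: split a key at its first underscore (an assumption).
split_key <- function(key) {
  c(table = sub("_.*$", "", key), column = sub("^[^_]*_", "", key))
}

key <- make_key("orthologs", "gid")
parts <- split_key(key)
print(key)
print(parts)
```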
A big table of orthoMCL families. The columns are:
GID: The gene ID
ORTHOLOG_ID: The gene ID of the associated ortholog.
ORTHOLOG_SPECIES: The species of the associated ortholog.
ORTHOLOG_URL: The URL of the OrthoMCL group.
ORTHOLOG_COUNT: The number of all genes from all species represented in this group.
ORTHOLOG_GROUP: The family ID.
QUERIES_IN_GROUP: How many of the query species are represented in this group?
GROUP_REPRESENTATION: ORTHOLOG_COUNT / the number of possible species.
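The last two columns can be worked through on a toy table. This sketch follows the column descriptions above literally and may differ from how the function computes them internally; the group IDs, species labels, and the value of n_possible_species are all invented.

```r
# Toy result rows: one row per ortholog hit, with invented values.
hits <- data.frame(
  ORTHOLOG_GROUP = c("OG_1", "OG_1", "OG_1", "OG_2"),
  ORTHOLOG_SPECIES = c("sp_a", "sp_b", "sp_a", "sp_c"),
  ORTHOLOG_COUNT = c(3, 3, 3, 1),
  stringsAsFactors = FALSE
)
query_species <- c("sp_a", "sp_b")
n_possible_species <- 4  # assumed size of the species pool

# QUERIES_IN_GROUP: how many distinct query species occur in each group.
queries_in_group <- vapply(
  split(hits$ORTHOLOG_SPECIES, hits$ORTHOLOG_GROUP),
  function(sp) length(intersect(unique(sp), query_species)),
  integer(1)
)
hits$QUERIES_IN_GROUP <- unname(queries_in_group[hits$ORTHOLOG_GROUP])

# GROUP_REPRESENTATION: ORTHOLOG_COUNT divided by the number of possible species.
hits$GROUP_REPRESENTATION <- hits$ORTHOLOG_COUNT / n_possible_species
print(hits)
```

Group OG_1 contains both query species, while OG_2 contains neither, so its QUERIES_IN_GROUP is zero.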
atb