prioritise_targets: Prioritise target genes

Prioritise target genes using the following multi-step filtering procedure:

  1. Disease-level: keep_deaths: Keep only diseases with a certain age of death.

  2. Disease-level: severity_threshold_max:

    Keep only diseases annotated as a certain degree of severity or greater
     (filters on maximum severity per disease).
  3. Phenotype-level: prune_ancestors:

    Remove redundant ancestral phenotypes when at least one of their
     descendants already exists.
  4. Phenotype-level: keep_descendants:

    Keep only phenotypes belonging to a certain branch of the HPO,
     as defined by an ancestor term.
  5. Phenotype-level: keep_ont_levels: Keep only phenotypes at certain absolute ontology levels within the HPO.

  6. Phenotype-level: pheno_ndiseases_threshold: The maximum number of diseases each phenotype can be associated with.

  7. Phenotype-level: keep_tiers: Keep only phenotypes with high severity Tiers.

  8. Phenotype-level: severity_threshold: Keep only phenotypes with mean Severity equal to or below the threshold.

  9. Phenotype-level: gpt_filters:

    Keep only phenotypes with certain GPT annotations in specific
     severity metrics.
  10. Phenotype-level: severity_score_gpt_threshold: Keep only phenotypes with a minimum GPT severity score.

  11. Phenotype-level: info_content_threshold:

    Keep only phenotypes with a minimum information content score
     (computed from the HPO).
  12. Symptom-level: pheno_frequency_threshold:

    Keep only phenotypes with mean frequency equal to or above the threshold
     (i.e. how frequently a phenotype is associated with any diseases in
     which it occurs).
  13. Symptom-level: keep_onsets: Keep only symptoms with a certain age of onset.

  14. Symptom-level: symptom_p_threshold: Uncorrected p-value threshold to filter cell type-symptom associations by.

  15. Symptom-level: symptom_intersection_threshold:

    Minimum proportion of genes overlapping between a symptom gene list
     (phenotype-associated genes in the context of a particular disease)
     and the phenotype-cell type association driver genes.
  16. Cell type-level: q_threshold:

    Keep only cell type-phenotype association results at or below the q-value threshold (default: q<=0.05).
  17. Cell type-level: effect_threshold: Keep only cell type-phenotype association results at or above the effect size threshold (default: effect size>=1).

  18. Cell type-level: keep_celltypes: Keep only terminally differentiated cell types.

  19. Gene-level: keep_chr: Remove genes on non-standard chromosomes.

  20. Gene-level: evidence_score_threshold:

    Remove genes that are below an aggregate phenotype-gene
     evidence score threshold.
  21. Gene-level: gene_size: Keep only genes within a given size range (e.g. <4.3kb in length). The default range (min = 0, max = Inf) applies no size filter.

  22. Gene-level: add_driver_genes:

    Keep only genes that are driving the association with a given phenotype
     (inferred by the intersection of phenotype-associated genes and genes with
     high-specificity quantiles in the target cell type).
  23. Gene-level: keep_biotypes: Keep only genes belonging to certain biotypes.

  24. Gene-level: gene_frequency_threshold:

    Keep only genes at or above a certain mean frequency threshold
     (i.e. how frequently a gene is associated with a given phenotype
     when observed within a disease).
  25. Gene-level: keep_specificity_quantiles:

    Keep only genes in top specificity quantiles
     from the cell type dataset (CTD).
  26. Gene-level: keep_mean_exp_quantiles:

    Keep only genes in top mean expression quantiles
     from the cell type dataset (CTD).
  27. Gene-level: symptom_gene_overlap:

    Ensure that genes nominated at the phenotype-level also
     appear in the genes overlapping at the cell type-specific symptom-level.
  28. All levels: sort_cols:

    Sort candidate targets by one or more columns
     (e.g. "severity_score_gpt", "q").
  29. All levels: top_n:

    Only return the top N targets per variable group
     (specified with the "group_vars" argument).
     For example, setting "group_vars" to "hpo_id" and "top_n" to 1 would
     only return one target (row) per phenotype ID after sorting.
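
The sorting and top_n steps can be sketched in base R as follows. This is only an illustration on toy data, not the package's implementation: the gene names, IDs and values are made up, and the grouping is done with ave() rather than whatever prioritise_targets uses internally.

```r
# Toy candidate targets (hypothetical values, for illustration only).
targets <- data.frame(
  hpo_id = c("HP:1", "HP:1", "HP:2", "HP:2"),
  gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  severity_score_gpt = c(30, 30, 25, 40),
  q = c(0.01, 0.001, 0.02, 0.03),
  stringsAsFactors = FALSE
)

# Sort: severity_score_gpt descending, then q ascending.
sorted <- targets[order(-targets$severity_score_gpt, targets$q), ]

# Rank rows within each group (here: one group per hpo_id),
# then keep the first `top_n` rows of every group.
top_n <- 1
rank_in_group <- ave(seq_len(nrow(sorted)), sorted$hpo_id, FUN = seq_along)
top_targets <- sorted[rank_in_group <= top_n, ]
print(top_targets)
```

With "group_vars" set to "hpo_id" and "top_n" set to 1, each phenotype ID contributes exactly one row, namely the one that sorts first.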


prioritise_targets(
  results = load_example_results(),
  ctd_list = load_example_ctd(c("ctd_DescartesHuman.rds", "ctd_HumanCellLandscape.rds"),
    multi_dataset = TRUE),
  phenotype_to_genes = HPOExplorer::load_phenotype_to_genes(),
  hpo = HPOExplorer::get_hpo(),
  keep_deaths = HPOExplorer::list_deaths(exclude = c("Miscarriage", "Stillbirth",
    "Prenatal death"), include_na = TRUE),
  keep_descendants = c("Phenotypic abnormality"),
  keep_ont_levels = NULL,
  pheno_ndiseases_threshold = NULL,
  gpt_filters = NULL,
  severity_score_gpt_threshold = 20,
  keep_tiers = NULL,
  severity_threshold_max = NULL,
  info_content_threshold = 8,
  run_prune_ancestors = TRUE,
  severity_threshold = NULL,
  pheno_frequency_threshold = NULL,
  keep_onsets = HPOExplorer::list_onsets(include_na = TRUE),
  effect_var = "logFC",
  q_threshold = 0.05,
  effect_threshold = 1,
  symptom_intersection_threshold = 0.25,
  keep_celltypes = NULL,
  evidence_score_threshold = 15,
  keep_chr = c(seq(22), "X", "Y"),
  gene_size = list(min = 0, max = Inf),
  gene_frequency_threshold = NULL,
  keep_biotypes = NULL,
  keep_specificity_quantiles = seq(30, 40),
  keep_mean_exp_quantiles = seq(30, 40),
  sort_cols = c(severity_score_gpt = -1, q = 1, logFC = -1, specificity = -1,
    mean_exp = -1, pheno_freq_mean = -1, gene_freq_mean = -1, width = 1),
  top_n = NULL,
  group_vars = c("hpo_id"),
  return_report = TRUE,
  verbose = TRUE
)



The cell type-phenotype enrichment results generated by gen_results and merged together with merge_results.


A named list of CellTypeDataset objects each created with generate_celltype_data.


Output of load_phenotype_to_genes mapping phenotypes to gene annotations.


Human Phenotype Ontology object, loaded from get_ontology.


The age of death associated with each HPO ID to keep. If >1 age of death is associated with the term, only the earliest age is considered. See add_death for details.


Terms whose descendants should be kept (including themselves). Set to NULL to skip this filtering step.


Only keep phenotypes at certain absolute ontology levels. See add_ont_lvl for details.


Filter phenotypes by the maximum number of diseases they are associated with.


A named list of filters to apply to the GPT annotations.


The minimum GPT severity score that a phenotype can have across any disease.


Tiers from hpo_tiers to keep. Include NA if you wish to retain phenotypes that do not have any Tier assignment.


The max severity score that a phenotype can have across any disease.


Minimum phenotype information content threshold.


Prune redundant ancestral terms if any of their descendants are present. Passes to prune_ancestors.
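
To make the pruning rule concrete, here is a minimal base R sketch on a toy ontology. It assumes each term has a single parent (the real HPO allows several), and the term names and the helper get_ancestors are hypothetical; it is not how prune_ancestors itself is written.

```r
# Toy ontology as a child -> parent map (hypothetical terms, one parent each).
parent_of <- c(B = "A", C = "B", D = "A")

# Walk up the parent map to collect all ancestors of one term.
get_ancestors <- function(term, parent_of) {
  ancestors <- character(0)
  while (term %in% names(parent_of)) {
    term <- parent_of[[term]]
    ancestors <- c(ancestors, term)
  }
  ancestors
}

# Terms present in the results.
present <- c("A", "B", "C", "D")

# A present term is redundant if it is an ancestor of another present term.
redundant <- unique(unlist(lapply(present, get_ancestors, parent_of = parent_of)))
pruned <- setdiff(present, redundant)
print(pruned)
```

Here "A" and "B" are dropped because their descendants "C" and "D" are already present, leaving only the most specific terms.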


Only keep phenotypes with a mean severity score (averaged across multiple associated diseases) below the set threshold. The severity score ranges from 1-4 where 1 is the MOST severe. Include NA if you wish to retain phenotypes that do not have any severity score.
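
As an illustration of the averaging described above, the following base R sketch computes a mean severity per phenotype across diseases and applies a threshold. The data, the keep_na switch and the use of "equal to or below" as the comparison are assumptions made for the example.

```r
# Toy symptom-level severity annotations (hypothetical; 1 = most severe).
symptoms <- data.frame(
  hpo_id = c("HP:1", "HP:1", "HP:2", "HP:2", "HP:3"),
  disease_id = c("D1", "D2", "D1", "D3", "D2"),
  severity = c(1, 2, 3, 4, NA),
  stringsAsFactors = FALSE
)

# Mean severity per phenotype, averaged across its associated diseases.
# Phenotypes with no severity data at all end up as NaN.
mean_severity <- sapply(split(symptoms$severity, symptoms$hpo_id),
                        mean, na.rm = TRUE)

severity_threshold <- 2
keep_na <- TRUE  # assumption: retain phenotypes that have no severity score

keep <- (!is.na(mean_severity) & mean_severity <= severity_threshold) |
  (keep_na & is.na(mean_severity))
kept_ids <- names(mean_severity)[keep]
print(kept_ids)
```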


Only keep phenotypes with frequency above the set threshold. Frequency ranges from 0-100 where 100 is a phenotype that occurs 100% of the time in all associated diseases. Include NA if you wish to retain phenotypes that do not have any frequency data. See add_pheno_frequency for details.


The age of onset associated with each HPO ID to keep. If >1 age of onset is associated with the term, only the earliest age is considered. See add_onset for details.


Name of the effect size column in the results.


The q value threshold to subset the results by.


The minimum fold change in specific expression to subset the results by.
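
The q-value and effect size filters amount to a simple row subset. A minimal sketch in base R, using a made-up results table (the CellType column name and all values are hypothetical), with the effect size column looked up by name as effect_var suggests:

```r
# Toy association results (hypothetical values).
results <- data.frame(
  hpo_id = c("HP:1", "HP:1", "HP:2"),
  CellType = c("ct_a", "ct_b", "ct_a"),
  q = c(0.01, 0.20, 0.04),
  logFC = c(1.5, 2.0, 0.5),
  stringsAsFactors = FALSE
)

effect_var <- "logFC"
q_threshold <- 0.05
effect_threshold <- 1

# Keep rows that pass both the q-value and the effect size threshold.
keep <- results$q <= q_threshold & results[[effect_var]] >= effect_threshold
sig <- results[keep, ]
print(sig)
```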


Minimum proportion of genes overlapping between a symptom gene list (phenotype-associated genes in the context of a particular disease) and the phenotype-cell type association driver genes.
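
A rough sketch of how such an overlap proportion could be computed is shown below. The gene names are placeholders, and using the symptom gene list as the denominator is an assumption; the package may normalise differently.

```r
# Hypothetical gene sets.
symptom_genes <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
driver_genes <- c("GENE_B", "GENE_C", "GENE_E")

# Proportion of symptom genes that are also driver genes
# (assumed denominator: the symptom gene list).
overlap <- intersect(symptom_genes, driver_genes)
intersection_prop <- length(overlap) / length(symptom_genes)

symptom_intersection_threshold <- 0.25
passes <- intersection_prop >= symptom_intersection_threshold
print(c(proportion = intersection_prop, passes = passes))
```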


Cell types to keep.


The minimum threshold of mean evidence scores of each gene-phenotype association to keep.


Chromosomes to keep.


Min/max gene size (important for therapeutics design).


Only keep genes with frequency above the set threshold. Frequency ranges from 0-100 where 100 is a gene that occurs 100% of the time in a given phenotype. Include NA if you wish to retain genes that do not have any frequency data. See add_gene_frequency for details.


Which gene biotypes to keep. (e.g. "protein_coding", "processed_transcript", "snRNA", "lincRNA", "snoRNA", "IG_C_gene")


Which cell type specificity quantiles to keep (max quantile is 40).


Which cell type mean expression quantiles to keep (max quantile is 40).
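
To show what keeping quantiles 30 to 40 means in practice, here is a base R sketch that bins simulated values into 40 quantile bins and keeps the top 11. The binning via quantile() and cut() is an assumption for illustration; the bins stored in a real CTD may be computed differently.

```r
set.seed(1)
# Simulated per-gene specificity values for one cell type (not real data).
specificity <- runif(200)

# Assign each gene to one of 40 quantile bins (1 = lowest, 40 = highest).
n_bins <- 40
breaks <- quantile(specificity, probs = seq(0, 1, length.out = n_bins + 1))
spec_quantile <- cut(specificity, breaks = breaks,
                     include.lowest = TRUE, labels = FALSE)

# Keep only genes in the requested quantiles.
keep_specificity_quantiles <- seq(30, 40)
keep <- spec_quantile %in% keep_specificity_quantiles
print(sum(keep))
```

The same pattern applies to mean expression quantiles, with the expression values substituted for specificity.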


How to sort the rows using setorderv. names(sort_cols) will be supplied to the cols= argument and values will be supplied to the order= argument.
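
The meaning of the named vector can be illustrated without data.table. In this base R sketch (toy data, numeric sort columns assumed), each name in sort_cols selects a column and each value gives its direction, where 1 is ascending and -1 is descending:

```r
# Toy targets (hypothetical values).
targets <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  severity_score_gpt = c(20, 35, 35),
  q = c(0.01, 0.04, 0.002),
  stringsAsFactors = FALSE
)

sort_cols <- c(severity_score_gpt = -1, q = 1)

# Multiply each numeric column by its direction so that a plain
# ascending order() on the transformed keys gives the requested sort.
keys <- Map(function(col, dir) dir * targets[[col]],
            names(sort_cols), sort_cols)
sorted <- targets[do.call(order, unname(keys)), ]
print(sorted)
```

Ties on the first column are broken by the next one, so the order of entries in sort_cols sets the sorting priority.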


Top N genes to keep when grouping by group_vars.


Columns to group by when selecting top_n genes.


If TRUE, will return a named list containing a report that shows the number of phenotypes/celltypes/genes remaining after each filtering step.
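
The idea of a per-step report can be sketched as follows. The helper count_remaining, the column names and the report layout are hypothetical and only show the bookkeeping; the report actually returned by prioritise_targets may be structured differently.

```r
# Toy results (hypothetical values).
results <- data.frame(
  hpo_id = c("HP:1", "HP:1", "HP:2", "HP:3"),
  gene = c("GENE_A", "GENE_B", "GENE_A", "GENE_C"),
  q = c(0.01, 0.20, 0.04, 0.03),
  logFC = c(1.5, 2.0, 0.5, 1.2),
  stringsAsFactors = FALSE
)

# Summarise what remains in the table after a given step.
count_remaining <- function(x, step) {
  data.frame(step = step,
             rows = nrow(x),
             phenotypes = length(unique(x$hpo_id)),
             genes = length(unique(x$gene)),
             stringsAsFactors = FALSE)
}

report <- count_remaining(results, "start")
results <- results[results$q <= 0.05, ]
report <- rbind(report, count_remaining(results, "q_threshold"))
results <- results[results$logFC >= 1, ]
report <- rbind(report, count_remaining(results, "effect_threshold"))
print(report)
```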


Print messages.


Term key:

  • Disease:

    A disease defined in the OMIM, DECIPHER and/or Orphanet
    databases.
  • Phenotype: A clinical feature associated with one or more diseases.

  • Symptom:

    A phenotype within the context of a particular disease.
    Within a given phenotype, there may be multiple symptoms with
     partially overlapping genetic mechanisms.
  • Association:

    A cell type-specific enrichment test result conducted
    at the disease-level, phenotype-level, or symptom-level.


A data.table of the prioritised phenotype- and cell type-specific gene targets.


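# Keep only significant associations (q < 0.05) before prioritising.
results <- load_example_results()[q<0.05]
# Run the prioritisation with the default filters.
out <- prioritise_targets(results=results)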

