View source: R/GeneAccord_main_functions.R
Method to detect clonally exclusive gene or pathway pairs in a cohort of cancer patients
GeneAccord(clone_tbl, avg_rates_m, ecdf_list, alternative = "greater",
  genes_of_interest = "ALL", AND_OR = "OR")
clone_tbl |
The tibble containing the information of which
gene/pathway is mutated in which
clone from which patient and in which tree from the collection of
trees. Can be generated with |
avg_rates_m |
The average rates of clonal exclusivity for each
patient as computed with
|
ecdf_list |
The list of ECDFs of the test statistic under the
null distribution. Can be generated with
|
alternative |
A character string indicating whether pairs should only be tested if delta > 0 ("greater") or whether all pairs should be tested ("two.sided"). Default: "greater". |
genes_of_interest |
A character vector of genes to test for clonal exclusivity. The genes must use the same identifier type as the one in the tibble. By default, all genes are tested. Default: "ALL". |
AND_OR |
If |
After running a tool such as Cloe
that identifies clones in a
tumor and infers the phylogenetic history, the user has for each tumor
a list of alterations and their clone assignments. Since the tree
inference includes uncertainty, it may be run several times. Given a
tibble containing the information of which genes/pathways are mutated
in which patient and clone and from which tree, this function
systematically tests the data for significant clonal exclusivities.
That is, it checks for each gene/pathway pair whether the number of
clonal exclusivities is significantly different from what would be
expected by chance. Such a tibble can be generated with
create_tbl_tree_collection, followed by adding the
column 'tree_id' to indicate which tree of the tree
inference was used. For instance, if the tree inference tool was
run several times using different seeds, the column 'tree_id' may
contain the seed of the respective tree. Hence, the tibble is
expected to have the columns 'file_name', 'patient_id',
'altered_entity', 'clone1', 'clone2', ... up to the maximal number
of clones (Default: until 'clone7'), and 'tree_id'. Note that the
labelling of the clones does not matter and only needs to stay fixed
within each patient and tree inference. There is also the option to
test two-sided, meaning that pairs are tested whether they tend to
occur together in the same clones or separately in different
clones. This also makes it possible to detect significant clonal
co-occurrence. An additional option is to test only a specific subset
of genes.
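The input layout described above can be sketched in base R. This is only an illustration of the expected columns, not the package's own workflow: the helper `make_run` is hypothetical, the values are toy data, and the result is a plain data frame that would still need converting to a tibble before being passed to GeneAccord.

```r
# Hypothetical helper: one table per tree inference run for a single patient.
# Rows are altered genes, clone columns are 0/1 mutation indicators.
make_run <- function(seed, clone1, clone2) {
  data.frame(
    file_name      = "fn1",
    patient_id     = "pat1",
    altered_entity = c("geneA", "geneB", "geneC"),
    clone1         = clone1,
    clone2         = clone2,
    tree_id        = seed,  # here: the seed used for this tree inference run
    stringsAsFactors = FALSE
  )
}

# Two runs of the tree inference with different seeds, stacked row-wise.
clone_df <- rbind(
  make_run(seed = 5,  clone1 = c(0, 1, 0), clone2 = c(1, 0, 1)),
  make_run(seed = 10, clone1 = c(0, 1, 1), clone2 = c(1, 0, 0))
)

# Check that the non-clone columns named in the text are present.
required     <- c("file_name", "patient_id", "altered_entity", "tree_id")
missing_cols <- setdiff(required, names(clone_df))

# Clone columns follow the pattern 'clone1', 'clone2', ...
clone_cols <- grep("^clone[0-9]+$", names(clone_df), value = TRUE)

# Every unordered gene pair is a candidate for testing.
pairs <- t(combn(unique(clone_df$altered_entity), 2))
print(pairs)
```

Using the seed as 'tree_id' keeps rows from different runs distinguishable after stacking, which is the purpose the text gives for that column.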
A tibble containing the test result for each pair of mutated genes/pathways that was tested. More precisely, it contains the columns 'entity_A', 'entity_B', 'num_patients', 'pval', 'mle_delta', 'test_statistic', and 'qval'. Each row is a gene or pathway pair, specified by 'entity_A' and 'entity_B'. Note that the test is symmetric, hence switching the labels A and B does not change the results. The column 'num_patients' gives the number of patients in which both genes/pathways were mutated, and hence how many patients' rates were used for the test. The 'pval' is the p-value of the clonal exclusivity test. The 'mle_delta' is the maximum likelihood estimate of the delta for the elevated clonal exclusivity rate in the alternative model. The column 'test_statistic' is the likelihood ratio test statistic. The 'qval' is the adjusted p-value after multiple testing correction with Benjamini-Hochberg.
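To make the relation between 'pval' and 'qval' concrete, the following sketch applies the Benjamini-Hochberg correction with base R's `p.adjust`. The p-values and gene names are made up for illustration, the table holds only a subset of the columns listed above, and the 0.05 cutoff is an assumption rather than something the function prescribes.

```r
# Hypothetical result table with made-up p-values (not real output).
res <- data.frame(
  entity_A = c("geneA", "geneA", "geneB", "geneC"),
  entity_B = c("geneB", "geneC", "geneC", "geneD"),
  pval     = c(0.01, 0.04, 0.03, 0.20),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment across all tested pairs.
res$qval <- p.adjust(res$pval, method = "BH")

# Keep pairs below an assumed significance threshold of 0.05.
significant <- res[res$qval < 0.05, ]
print(res)
print(significant)
```

Because the adjustment depends on the number of tested pairs, restricting the test to a subset of genes changes the resulting q-values even for pairs whose p-values stay the same.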
Ariane L. Moore, ariane.moore@bsse.ethz.ch
library(GeneAccord)

# Toy input: two patients, three genes, two clones, two trees (tree_id 5 and 10)
clone_tbl <- dplyr::tibble("file_name"=
        rep(c(rep(c("fn1", "fn2"), each=3)), 2),
    "patient_id"=rep(c(rep(c("pat1", "pat2"), each=3)), 2),
    "altered_entity"=c(rep(c("geneA", "geneB", "geneC"), 4)),
    "clone1"=c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
    "clone2"=c(1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1),
    "tree_id"=c(rep(5, 6), rep(10, 6)))

# Split by patient and compute the rates of clonal exclusivity
clone_tbl_pat1 <- dplyr::filter(clone_tbl, patient_id == "pat1")
clone_tbl_pat2 <- dplyr::filter(clone_tbl, patient_id == "pat2")
rates_exmpl_1 <- compute_rates_clon_excl(clone_tbl_pat1)
rates_exmpl_2 <- compute_rates_clon_excl(clone_tbl_pat2)

# Average the rates per patient
avg_rates_m <- apply(cbind(rates_exmpl_1, rates_exmpl_2), 2, mean)
names(avg_rates_m) <- c(names(rates_exmpl_1)[1],
    names(rates_exmpl_2)[1])

# Per-patient values needed to simulate the null distribution
values_clon_excl_num_trees_pat1 <- get_hist_clon_excl(clone_tbl_pat1)
values_clon_excl_num_trees_pat2 <- get_hist_clon_excl(clone_tbl_pat2)
list_of_num_trees_all_pats <-
    list(pat1=values_clon_excl_num_trees_pat1[[1]],
        pat2=values_clon_excl_num_trees_pat2[[1]])
list_of_clon_excl_all_pats <-
    list(pat1=values_clon_excl_num_trees_pat1[[2]],
        pat2=values_clon_excl_num_trees_pat2[[2]])

# ECDFs of the test statistic under the null
num_pat_pair_max <- 2
num_pairs_sim <- 10
ecdf_list <- generate_ecdf_test_stat(avg_rates_m,
    list_of_num_trees_all_pats,
    list_of_clon_excl_all_pats,
    num_pat_pair_max,
    num_pairs_sim)

# Default: only test pairs with delta > 0
alternative <- "greater"
GeneAccord(clone_tbl, avg_rates_m, ecdf_list, alternative)

# Test all pairs
alternative <- "two.sided"
GeneAccord(clone_tbl, avg_rates_m, ecdf_list, alternative)

# Restrict the test to a subset of genes
genes_of_interest <- c("geneB", "geneC")
GeneAccord(clone_tbl, avg_rates_m, ecdf_list,
    alternative, genes_of_interest)

# Same subset with AND_OR set to "AND"
AND_OR <- "AND"
GeneAccord(clone_tbl, avg_rates_m, ecdf_list,
    alternative, genes_of_interest, AND_OR)