generate_ecdf_test_stat: Generate the ECDF of the test statistic under the null...


View source: R/sim_functions.R

Description

Generate the empirical cumulative distribution function (ECDF) of the test statistic under the null distribution. The simulation uses the average rates of clonal exclusivity and, for each patient, samples from the real data in how many trees a pair occurs and in how many of them it is clonally exclusive.
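The per-patient sampling described above can be sketched as follows. This is a minimal illustration that assumes each patient's two input vectors are index-aligned (entry i describes the same pair in both) and that a pair is drawn by sampling one index, so its tree count and its clonal-exclusivity count stay together. The helper `sample_pair_counts` and the toy vectors are hypothetical and not part of GeneAccord.

```r
# Hypothetical toy data for one patient; entry i refers to the same pair
# in both vectors (assumption: the inputs are index-aligned).
num_trees_pat1 <- c(10, 10, 7, 3)  # trees in which each pair is mutated
clon_excl_pat1 <- c(2, 0, 7, 1)    # trees in which each pair is clonally exclusive

# Hypothetical helper, not a GeneAccord function: draw `size` pairs with
# replacement, keeping each pair's two counts together.
sample_pair_counts <- function(num_trees, clon_excl, size) {
  stopifnot(length(num_trees) == length(clon_excl))
  idx <- sample.int(length(num_trees), size = size, replace = TRUE)
  data.frame(num_trees = num_trees[idx], clon_excl = clon_excl[idx])
}

set.seed(1)
draws <- sample_pair_counts(num_trees_pat1, clon_excl_pat1, size = 5)
draws
```

Sampling a single index per draw, rather than sampling the two vectors independently, preserves the constraint that a pair cannot be clonally exclusive in more trees than it is mutated in.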

Usage

generate_ecdf_test_stat(avg_rates_m, list_of_num_trees_all_pats,
  list_of_clon_excl_all_pats, num_pat_pair_max, num_pairs_sim,
  beta_distortion = 1000)

Arguments

avg_rates_m

The average rates of clonal exclusivity for all patients in the cohort, each averaged over several trees from the collection of tree inferences.

list_of_num_trees_all_pats

A named list with one entry per patient. Each entry is a vector giving, for every pair in that patient, the number of trees in which the pair was mutated. The patient ordering in the list must be the same as in avg_rates_m.

list_of_clon_excl_all_pats

A named list with one entry per patient. Each entry is a vector giving, for every pair in that patient, the number of trees in which the pair was clonally exclusive. The patient ordering in the list must be the same as in avg_rates_m.

num_pat_pair_max

The maximum number of patients a pair is mutated in.

num_pairs_sim

The number of simulated gene/pathway pairs to generate, i.e. the number of times the test statistic is computed. A large number, e.g. 100000, is recommended.

beta_distortion

The value M = alpha + beta of the beta distribution used to distort the average rates. The larger M, the more sharply the distribution peaks around the actual rate; hence, the smaller M, the more strongly the rates are distorted. Default: 1000.
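One common way to obtain a beta distribution with mean m and concentration M = alpha + beta is to set alpha = m * M and beta = (1 - m) * M, which gives variance m(1 - m) / (M + 1). The sketch below assumes this parameterisation; this page does not state how alpha and beta are split, and `distort_rate` is a hypothetical helper, not a GeneAccord function.

```r
# Hypothetical helper: draw distorted versions of an average rate m from a
# beta distribution with mean m and concentration M = alpha + beta
# (assumed parameterisation alpha = m * M, beta = (1 - m) * M).
distort_rate <- function(m, M, n = 1) {
  stopifnot(m > 0, m < 1, M > 0)  # keep both shape parameters positive
  rbeta(n, shape1 = m * M, shape2 = (1 - m) * M)
}

set.seed(42)
sharp <- distort_rate(0.3, M = 1000, n = 10000)  # default M: tightly peaked
loose <- distort_rate(0.3, M = 10, n = 10000)    # small M: strongly distorted

# Compare empirical spread with the theoretical sd sqrt(m * (1 - m) / (M + 1))
c(empirical_sharp = sd(sharp), theoretical_sharp = sqrt(0.3 * 0.7 / 1001))
c(empirical_loose = sd(loose), theoretical_loose = sqrt(0.3 * 0.7 / 11))
```

This sketch rejects rates of exactly 0 or 1, for which one shape parameter would be zero; an actual implementation may treat those cases differently.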

Details

This function takes the average rates of clonal exclusivity computed from the data (m1, ..., mN), which are specific to each patient and averaged over several trees from the collection of tree inferences. It also takes, for each patient, the histogram of how often a pair was clonally exclusive relative to the number of trees in which it was mutated. It then simulates the test statistic under the null for each number of patients in which a pair can be mutated, n = 2, 3, ..., 'num_pat_pair_max'. Afterwards, it generates the empirical cumulative distribution function (ECDF) with the ecdf function of the stats package and returns a list with the ECDFs for n = 2, 3, ..., 'num_pat_pair_max'. This step must be run for each new data set before the clonal exclusivity test can be performed; in that test, the observed test statistics are compared to the ECDFs.
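The final step, turning simulated statistics into one ECDF per number of patients with stats::ecdf and leaving the first list entry NULL, can be sketched as below. The simulation itself is replaced by a hypothetical placeholder (`simulate_stat_placeholder`, drawing normal noise) because the test statistic is not defined on this page; only the list layout and the use of stats::ecdf follow the description.

```r
# Hypothetical placeholder for the null simulation of the test statistic for a
# pair mutated in n patients; NOT the statistic used by GeneAccord.
simulate_stat_placeholder <- function(n, num_pairs_sim) {
  rnorm(num_pairs_sim, mean = 0, sd = sqrt(n))
}

build_ecdf_list <- function(num_pat_pair_max, num_pairs_sim) {
  stopifnot(num_pat_pair_max >= 2)
  # vector("list", k) starts with k NULL entries; entry 1 is left NULL
  ecdf_list <- vector("list", num_pat_pair_max)
  for (n in 2:num_pat_pair_max) {
    sim_stats <- simulate_stat_placeholder(n, num_pairs_sim)
    ecdf_list[[n]] <- stats::ecdf(sim_stats)  # ECDF for pairs mutated in n patients
  }
  ecdf_list
}

set.seed(7)
sketch_ecdfs <- build_ecdf_list(num_pat_pair_max = 4, num_pairs_sim = 1000)
```

Indexing the list by n keeps the lookup for a pair mutated in n patients a plain `sketch_ecdfs[[n]]`, which is one plausible reason for leaving entry 1 empty.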

Value

A list of ECDFs, one for each number of patients n = 2, ..., num_pat_pair_max. The first list entry is set to NULL for technical reasons.
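Assuming entry n of the returned list holds the ECDF for pairs mutated in n patients (consistent with the first entry being NULL and with the example plotting ecdf_list[[2]]), an observed statistic can be evaluated against it as below. The toy ECDF is built directly with stats::ecdf on made-up values; how GeneAccord turns this value into a p-value, including which tail is used, is not stated on this page, so the upper-tail fraction is only illustrative.

```r
# Toy stand-in for the returned list: entry 1 is NULL, entry 2 is the ECDF
# for pairs mutated in 2 patients (values are made up for illustration).
ecdf_list <- list(NULL, stats::ecdf(c(0.1, 0.2, 0.2, 0.5)))

n_patients <- 2
obs_stat <- 0.2                        # hypothetical observed test statistic
F_obs <- ecdf_list[[n_patients]](obs_stat)
F_obs                                  # 0.75: three of four simulated values are <= 0.2
upper_tail <- 1 - F_obs                # illustrative one-sided tail fraction
```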

Author(s)

Ariane L. Moore

Examples

library(GeneAccord)
# Toy clone table: 2 patients x 3 genes, 2 tree inferences (tree_id 5 and 10)
clone_tbl <- dplyr::tibble("file_name" =
   rep(c(rep(c("fn1", "fn2"), each=3)), 2),
   "patient_id"=rep(c(rep(c("pat1", "pat2"), each=3)), 2),
   "altered_entity"=c(rep(c("geneA", "geneB", "geneC"), 4)),
   "clone1"=c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
   "clone2"=c(1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1),
   "tree_id"=c(rep(5, 6), rep(10, 6)))
# Split the table by patient
clone_tbl_pat1 <- dplyr::filter(clone_tbl, patient_id == "pat1")
clone_tbl_pat2 <- dplyr::filter(clone_tbl, patient_id == "pat2")
# Rates of clonal exclusivity per patient, then averaged over the trees
rates_exmpl_1 <- compute_rates_clon_excl(clone_tbl_pat1)
rates_exmpl_2 <- compute_rates_clon_excl(clone_tbl_pat2)
avg_rates_m <- apply(cbind(rates_exmpl_1, rates_exmpl_2), 2, mean)
names(avg_rates_m) <- c(names(rates_exmpl_1)[1], names(rates_exmpl_2)[1])
# Per-patient histograms: [[1]] in how many trees each pair is mutated,
# [[2]] in how many trees it is clonally exclusive
values_clon_excl_num_trees_pat1 <- get_hist_clon_excl(clone_tbl_pat1)
values_clon_excl_num_trees_pat2 <- get_hist_clon_excl(clone_tbl_pat2)
list_of_num_trees_all_pats <-
    list(pat1=values_clon_excl_num_trees_pat1[[1]],
    pat2=values_clon_excl_num_trees_pat2[[1]])
list_of_clon_excl_all_pats <-
    list(pat1=values_clon_excl_num_trees_pat1[[2]],
    pat2=values_clon_excl_num_trees_pat2[[2]])
# Small values for a quick run; use a much larger num_pairs_sim in practice
num_pat_pair_max <- 2
num_pairs_sim <- 10
ecdf_list <- generate_ecdf_test_stat(avg_rates_m,
                list_of_num_trees_all_pats, list_of_clon_excl_all_pats,
                num_pat_pair_max, num_pairs_sim)
# Entry 1 is NULL; plot the ECDF for pairs mutated in 2 patients
plot(ecdf_list[[2]])

GeneAccord documentation built on Nov. 8, 2020, 8:04 p.m.