evalCand: Evaluate candidate levels and select the optimal one

evalCand R Documentation

Description

Evaluate all candidate levels proposed by getCand and select the one with the best performance. For more details about how the scoring is done, see Huang et al. (2021): https://doi.org/10.1186/s13059-021-02368-1.

Usage

evalCand(
  tree,
  type = c("single", "multiple"),
  levels,
  score_data = NULL,
  node_column,
  p_column,
  sign_column,
  feature_column = NULL,
  method = "BH",
  limit_rej = 0.05,
  use_pseudo_leaf = FALSE,
  message = FALSE
)

Arguments

tree

A phylo object.

type

A character scalar indicating whether the evaluation is for a DA-type (differential abundance) workflow (set type = "single") or a DS-type (differential state) workflow (set type = "multiple").

levels

A list of candidate levels that are returned by getCand. If type = "single", elements in the list are candidate levels, and are named by the value of the tuning parameter. If type = "multiple", a nested list is required and the list should be named by the feature (e.g., genes). In that case, each element is a list of candidate levels for that feature.
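The two shapes described for levels can be sketched in base R as follows. This is only an illustration of the nesting: the node numbers, the tuning parameter values used as names, and the feature names gene1 and gene2 are all made up, and in practice the list would come from getCand rather than being written by hand.

```r
# Hypothetical candidate levels; node numbers and tuning values are made up.

# type = "single": a flat list, named by the value of the tuning parameter.
levels_single <- list(
    "0"    = c(1, 2, 3, 4, 5),
    "0.05" = c(13, 4, 5),
    "0.1"  = c(13, 18)
)

# type = "multiple": an outer list named by feature; each element is itself
# a list of candidate levels shaped like levels_single.
levels_multiple <- list(
    gene1 = levels_single,
    gene2 = list("0" = c(1, 2, 3), "0.05" = c(13))
)

names(levels_single)    # tuning parameter values
names(levels_multiple)  # feature names
```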

score_data

A data.frame (type = "single") or a list of data.frames (type = "multiple"). Each data.frame must have at least one column containing the node IDs (defined by node_column), one column with p-values (defined by p_column), one column with the direction of change (defined by sign_column) and one optional column with the feature (feature_column, for type="multiple").
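As a rough sketch of the column requirements above, the following base R code checks that a score table contains the columns named by the *_column arguments. The helper has_required_columns is hypothetical and not part of treeclimbR, the score values are made up, and naming the list elements by feature for type = "multiple" is an assumption made here for readability.

```r
# Hypothetical helper (not part of treeclimbR): check that a score table
# contains the columns named by the *_column arguments.
has_required_columns <- function(df, node_column, p_column, sign_column,
                                 feature_column = NULL) {
    # c() drops NULL, so feature_column is only required when supplied
    required <- c(node_column, p_column, sign_column, feature_column)
    all(required %in% colnames(df))
}

# Made-up scores for three nodes; values are illustrative only.
score_single <- data.frame(node = c(1, 2, 13),
                           pvalue = c(0.001, 0.2, 0.0004),
                           logFoldChange = c(1, -1, 1))

# For type = "multiple": one data.frame per feature, each with a feature column.
score_multiple <- list(
    gene1 = cbind(score_single, feature = "gene1"),
    gene2 = cbind(score_single, feature = "gene2")
)

ok_single <- has_required_columns(score_single, "node", "pvalue",
                                  "logFoldChange")
ok_multiple <- vapply(score_multiple, has_required_columns, logical(1),
                      node_column = "node", p_column = "pvalue",
                      sign_column = "logFoldChange",
                      feature_column = "feature")
```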

node_column

The name of the column that contains the node information.

p_column

The name of the column that contains p-values of nodes.

sign_column

The name of the column that contains the direction of the (estimated) change.

feature_column

The name of the column that contains information about the feature ID.

method

The multiple testing correction method. Please refer to the argument method in p.adjust. Default is "BH".
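To illustrate what the method argument refers to, here is a small example of stats::p.adjust with the Benjamini-Hochberg ("BH") correction on made-up p-values. It shows only the correction itself; how evalCand applies the adjusted values and the limit_rej threshold internally is not shown here.

```r
# Made-up raw p-values for five hypothetical nodes.
p_raw <- c(0.001, 0.01, 0.02, 0.2, 0.5)

# Benjamini-Hochberg adjustment, as selected by method = "BH".
p_adj <- p.adjust(p_raw, method = "BH")

# Count how many adjusted p-values fall at or below a 0.05 threshold.
n_rejected <- sum(p_adj <= 0.05)
n_rejected
```

Any other value accepted by p.adjust (for example "holm" or "bonferroni") can be passed in the same way.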

limit_rej

The desired false discovery rate threshold.

use_pseudo_leaf

A logical scalar. If FALSE, the FDR is calculated on the leaf level of the tree; if TRUE, the FDR is calculated on the pseudo-leaf level. The pseudo-leaf level is the level that is closest to the leaf level and on which entities have sufficient data to run the analysis.

message

A logical scalar, indicating whether progress messages should be printed.

Value

A list with the following components:

candidate_best

The best candidate level

output

Node-level information for the best candidate level

candidate_list

A list of candidates

level_info

Summary information of all candidates

FDR

The specified FDR level

method

The method used for multiple testing correction.

column_info

A list with the specified node, p-value, sign and feature column names

More details about the columns in level_info:

  • t The thresholds.

  • r The upper limit of t to control FDR on the leaf level.

  • is_valid Whether the threshold is in the range to control leaf FDR.

  • limit_rej The specified FDR.

  • level_name The name of the candidate level.

  • rej_leaf The number of rejections on the leaf level.

  • rej_pseudo_leaf The number of rejected pseudo-leaf nodes.

  • rej_node The number of rejections on the tested candidate level (leaves or internal nodes).
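The columns listed above can be explored with ordinary data.frame operations. The sketch below uses a mock table with the documented column names so that it runs without the package; every value in it is made up, and the actual column types returned by evalCand may differ. With a real result, the mock would be replaced by the level_info component of the returned list.

```r
# Mock table with the documented columns; every value is made up.
level_info <- data.frame(
    t = c(0, 0.05, 0.1),
    r = c(0.08, 0.08, 0.08),
    is_valid = c(TRUE, TRUE, FALSE),
    limit_rej = 0.05,
    level_name = c("0", "0.05", "0.1"),
    rej_leaf = c(5, 8, 8),
    rej_pseudo_leaf = c(5, 8, 8),
    rej_node = c(5, 3, 2)
)

# Keep only candidates whose threshold is in the valid range.
valid <- level_info[level_info$is_valid, ]

# Compare rejections on the leaf level and on the candidate level.
valid[, c("level_name", "rej_leaf", "rej_node")]
```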

Author(s)

Ruizhu Huang

Examples

suppressPackageStartupMessages({
    library(TreeSummarizedExperiment)
    library(ggtree)
})

## Generate example tree and assign p-values and signs to each node
data(tinyTree)
ggtree(tinyTree, branch.length = "none") +
   geom_text2(aes(label = node)) +
   geom_hilight(node = 13, fill = "blue", alpha = 0.5) +
   geom_hilight(node = 18, fill = "orange", alpha = 0.5)
set.seed(1)
pv <- runif(19, 0, 1)
pv[c(seq_len(5), 13, 14, 18)] <- runif(8, 0, 0.001)

fc <- sample(c(-1, 1), 19, replace = TRUE)
fc[c(seq_len(3), 13, 14)] <- 1
fc[c(4, 5, 18)] <- -1
df <- data.frame(node = seq_len(19),
                 pvalue = pv,
                 logFoldChange = fc)

## Propose candidates
ll <- getCand(tree = tinyTree, score_data = df,
               node_column = "node",
               p_column = "pvalue",
               sign_column = "logFoldChange")

## Evaluate candidates
cc <- evalCand(tree = tinyTree, levels = ll$candidate_list,
               score_data = ll$score_data, node_column = "node",
               p_column = "pvalue", sign_column = "logFoldChange",
               limit_rej = 0.05)

## Best candidate
cc$candidate_best

## Details for best candidate
cc$output


fionarhuang/treeclimbR documentation built on Jan. 1, 2025, 9:02 p.m.