run: run() : Invokes a routine inferCNV analysis to Infer CNV...


View source: R/inferCNV_ops.R

Description

Performs the actual inferCNV analysis before calling the plotting functions.

Usage

run(
  infercnv_obj,
  cutoff = 1,
  min_cells_per_gene = 3,
  out_dir = NULL,
  window_length = 101,
  smooth_method = c("pyramidinal", "runmeans", "coordinates"),
  num_ref_groups = NULL,
  ref_subtract_use_mean_bounds = TRUE,
  cluster_by_groups = FALSE,
  cluster_references = TRUE,
  k_obs_groups = 1,
  hclust_method = "ward.D2",
  max_centered_threshold = 3,
  scale_data = FALSE,
  HMM = FALSE,
  HMM_transition_prob = 1e-06,
  HMM_report_by = c("subcluster", "consensus", "cell"),
  HMM_type = c("i6", "i3"),
  HMM_i3_pval = 0.05,
  HMM_i3_use_KS = TRUE,
  BayesMaxPNormal = 0.5,
  sim_method = "meanvar",
  sim_foreground = FALSE,
  reassignCNVs = TRUE,
  analysis_mode = c("samples", "subclusters", "cells"),
  tumor_subcluster_partition_method = c("random_trees", "qnorm", "pheight", "qgamma",
    "shc"),
  tumor_subcluster_pval = 0.1,
  denoise = FALSE,
  noise_filter = NA,
  sd_amplifier = 1.5,
  noise_logistic = FALSE,
  outlier_method_bound = "average_bound",
  outlier_lower_bound = NA,
  outlier_upper_bound = NA,
  final_scale_limits = NULL,
  final_center_val = NULL,
  debug = FALSE,
  num_threads = 4,
  plot_steps = FALSE,
  resume_mode = TRUE,
  png_res = 300,
  plot_probabilities = TRUE,
  save_rds = TRUE,
  save_final_rds = TRUE,
  diagnostics = FALSE,
  remove_genes_at_chr_ends = FALSE,
  prune_outliers = FALSE,
  mask_nonDE_genes = FALSE,
  mask_nonDE_pval = 0.05,
  test.use = "wilcoxon",
  require_DE_all_normals = "any",
  hspike_aggregate_normals = FALSE,
  no_plot = FALSE,
  no_prelim_plot = FALSE,
  output_format = "png",
  useRaster = TRUE,
  up_to_step = 100
)

Arguments

infercnv_obj

An infercnv object populated with raw count data

cutoff

Cut-off for the minimum average read count per gene among reference cells. (default: 1)

min_cells_per_gene

Minimum number of reference cells with expression measurements required to include the corresponding gene. (default: 3)

out_dir

Path to the directory where outputs are written. (default: NULL; a non-NULL value must be provided)
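
To illustrate how cutoff and min_cells_per_gene combine, the base-R sketch below keeps a gene only if its average count across reference cells is at least cutoff and it is detected in at least min_cells_per_gene reference cells. The helper name filter_genes_sketch, the toy matrix, and the assumption that "detected" means a count above zero are hypothetical; this is not inferCNV's internal code.

```r
# Hypothetical helper (not part of inferCNV): apply both reference-cell filters.
# counts: genes x cells matrix of raw counts; ref_cells: names of reference columns.
filter_genes_sketch <- function(counts, ref_cells, cutoff = 1, min_cells_per_gene = 3) {
  ref <- counts[, ref_cells, drop = FALSE]
  mean_ok  <- rowMeans(ref) >= cutoff                   # average count among reference cells
  cells_ok <- rowSums(ref > 0) >= min_cells_per_gene    # assumption: detected = count > 0
  counts[mean_ok & cells_ok, , drop = FALSE]
}

counts <- matrix(c(5, 4, 6, 3,    # geneA: passes both filters
                   0, 0, 1, 0,    # geneB: reference mean below cutoff
                   6, 0, 0, 9),   # geneC: detected in only one reference cell
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("ref1", "ref2", "ref3", "tumor1")))
kept <- filter_genes_sketch(counts, ref_cells = c("ref1", "ref2", "ref3"))
rownames(kept)   # "geneA"
```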

## Smoothing params

window_length

Length of the window for the moving average (smoothing). Should be an odd integer. (default: 101)

smooth_method

Method to use for smoothing: "pyramidinal", "runmeans", or "coordinates". (default: "pyramidinal")
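
The two windowed options can be sketched as weighted moving averages along genes ordered by genomic position: "runmeans" gives every gene in the window equal weight, while "pyramidinal" gives genes linearly less weight the farther they sit from the window center. The function smooth_sketch is hypothetical, it omits the "coordinates" method, and its edge handling (truncating the window at chromosome ends and renormalizing the weights) is an assumption rather than inferCNV's exact behavior.

```r
# Hypothetical sketch (not inferCNV's code): smooth values for one chromosome.
# x: values ordered by genomic position; window_length must be an odd integer.
smooth_sketch <- function(x, window_length = 101, method = c("pyramidinal", "runmeans")) {
  method <- match.arg(method)
  stopifnot(window_length %% 2 == 1)
  half <- (window_length - 1) / 2
  # Flat weights for a running mean; triangular weights 1, 2, ..., half + 1, ..., 2, 1 otherwise.
  w <- if (method == "runmeans") rep(1, window_length) else c(seq_len(half + 1), rev(seq_len(half)))
  n <- length(x)
  out <- numeric(n)
  for (i in seq_len(n)) {
    idx  <- (i - half):(i + half)
    keep <- idx >= 1 & idx <= n                            # truncate the window at the ends
    out[i] <- sum(x[idx[keep]] * w[keep]) / sum(w[keep])   # renormalize the remaining weights
  }
  out
}

x <- c(0, 0, 3, 0, 0)
smooth_sketch(x, window_length = 3, method = "runmeans")     # 0 1 1 1 0
smooth_sketch(x, window_length = 3, method = "pyramidinal")  # 0.00 0.75 1.50 0.75 0.00
```

The pyramidal weights spread a single spike less widely than a flat window of the same length, which is why they preserve sharper CNV boundaries.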


num_ref_groups

Either the number of reference groups, or a list giving, for each reference group, the indices of its cells relative to reference_obs. (default: NULL)

ref_subtract_use_mean_bounds

If TRUE, determine means separately for each reference group, then remove intensities within the bounds of those means. Otherwise, use the mean of the means across groups. (default: TRUE)
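
Read literally, ref_subtract_use_mean_bounds = TRUE treats the range spanned by the per-group reference means as "normal": values inside that range become 0, and values outside it are expressed as their distance from the nearest bound. With FALSE, the mean of the group means is subtracted instead. The single-gene sketch below is one hypothetical reading of that description; subtract_ref_sketch is not an inferCNV function.

```r
# Hypothetical sketch for one gene (not inferCNV's code).
# x: the gene's values in observation cells; ref_group_means: its mean in each reference group.
subtract_ref_sketch <- function(x, ref_group_means, use_mean_bounds = TRUE) {
  if (!use_mean_bounds) {
    return(x - mean(ref_group_means))   # subtract the mean of the group means
  }
  lo <- min(ref_group_means)
  hi <- max(ref_group_means)
  out <- numeric(length(x))             # values within [lo, hi] stay 0
  out[x > hi] <- x[x > hi] - hi         # above the bounds: distance to the upper bound
  out[x < lo] <- x[x < lo] - lo         # below the bounds: distance to the lower bound
  out
}

subtract_ref_sketch(c(0.5, 1.0, 1.1, 2.0), ref_group_means = c(0.9, 1.2))  # -0.4 0.0 0.0 0.8
```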


cluster_by_groups

If observations are defined according to groups (i.e. patients), each group of cells is clustered separately. (default: FALSE, in which case the k_obs_groups setting is used)

cluster_references

Whether to cluster reference cells within their annotation groups (the dendrogram is not displayed). (default: TRUE)

k_obs_groups

Number of groups in which to break the observations. (default: 1)

hclust_method

Method used for hierarchical clustering of cells. Valid choices are: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid". (default: "ward.D2")
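
The clustering options map onto base R's hierarchical clustering: cells are clustered with hclust() using the chosen hclust_method, and the tree can then be cut into k_obs_groups groups with cutree(). The toy data and the use of Euclidean distance between cells are assumptions for illustration; inferCNV's own pipeline may compute distances differently.

```r
set.seed(1)
# Toy genes x cells matrix: cells 1-4 centered near 0, cells 5-8 near 10.
expr <- cbind(matrix(rnorm(40, mean = 0), nrow = 10),
              matrix(rnorm(40, mean = 10), nrow = 10))
colnames(expr) <- paste0("cell", 1:8)

d  <- dist(t(expr))                    # Euclidean distances between cells (columns)
hc <- hclust(d, method = "ward.D2")    # same value as the hclust_method default
groups <- cutree(hc, k = 2)            # analogous to k_obs_groups = 2
groups
```

With cluster_by_groups = TRUE, the same procedure would be applied to each annotation group separately.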

max_centered_threshold

Maximum value allowed after centering; this also sets a lower bound of -1 * this value. Can be a numeric value, or "auto" to bound by the mean bounds across cells. Set to NA to turn off. (default: 3)
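
For a numeric threshold, the bounding amounts to clipping centered values into [-max_centered_threshold, max_centered_threshold]. A minimal sketch, assuming the data are already centered; the helper bound_centered_sketch is hypothetical, and the "auto" option is not modeled.

```r
# Hypothetical helper: clip centered values to [-threshold, threshold].
bound_centered_sketch <- function(x, threshold = 3) {
  if (is.na(threshold)) return(x)         # NA turns the bounding off
  pmin(pmax(x, -threshold), threshold)    # raise values below -threshold, lower values above it
}

bound_centered_sketch(c(-5, -1, 0, 2, 7))        # -3 -1 0 2 3
bound_centered_sketch(c(-5, 7), threshold = NA)  # -5 7
```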

scale_data

Perform Z-scaling of log-transformed data. (default: FALSE) Consider turning this on if your normal and tumor samples come from very different kinds of data, for example when you must use GTEx representative normal expression profiles instead of normal single-cell data from your own experiment.
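
Z-scaling standardizes values to mean 0 and standard deviation 1. The sketch below assumes the scaling is applied per gene (row) across the cells of a genes x cells matrix; that orientation is an assumption. Base R's scale() works on columns, hence the transposes.

```r
# Toy log-transformed expression, genes x cells.
log_expr <- log2(1 + matrix(c(1, 3, 7, 15,
                              0, 1, 0, 1),
                            nrow = 2, byrow = TRUE))
# scale() centers and scales columns, so transpose to scale each gene (row).
z <- t(scale(t(log_expr)))
rowMeans(z)       # both approximately 0
apply(z, 1, sd)   # both 1
```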

## Downstream Analyses (HMM or non-DE-masking) based on tumor subclusters

HMM

When set to TRUE, runs an HMM to predict CNV levels. (default: FALSE)

HMM_transition_prob

Transition probability in the HMM. (default: 1e-6)

HMM_report_by

Level at which HMM results are reported: "cell", "consensus", or "subcluster". (default: "subcluster") Note that reporting is performed entirely separately from the HMM prediction, so you can predict on subclusters but report per cell (more voluminous output).

HMM_type

HMM model type, "i6" or "i3". i6: inferCNV 6-state model (0, 0.5, 1, 1.5, 2, >2) in which state emissions are calibrated on simulated CNV levels. i3: inferCNV 3-state model (del, neutral, amp) configured from the normal cells and HMM_i3_pval.
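
HMM_transition_prob sets how likely the HMM is to switch state from one gene to the next along a chromosome. A common way to build such a matrix, sketched below for the six i6 states, gives every off-diagonal transition the same probability and keeps the remaining mass on the diagonal. This uniform construction is an assumption for illustration; inferCNV's exact parameterization is not described on this page.

```r
states <- c("0", "0.5", "1", "1.5", "2", ">2")   # i6 copy-number states listed above
p <- 1e-6                                        # HMM_transition_prob default
n <- length(states)

# Assumed construction: equal switching probability p to every other state,
# and the rest of the probability on staying in the current state.
trans <- matrix(p, nrow = n, ncol = n, dimnames = list(states, states))
diag(trans) <- 1 - (n - 1) * p
rowSums(trans)   # each row sums to 1
```

A very small p makes state changes rare, so predicted CNV segments stay long unless the data strongly support a change.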

HMM_i3_pval

p-value for HMM i3 state overlap (default: 0.05)

HMM_i3_use_KS

(boolean) Use the KS test statistic to estimate the means of the amp/del distributions (à la HoneyBadger). (default: TRUE)

## Filtering low-confidence HMM predictions via BayesNet P(Normal)

BayesMaxPNormal

Maximum P(Normal) allowed for a CNV prediction according to the BayesNet. (default: 0.5; setting it to 0 turns this filter off)
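
Filtering by BayesMaxPNormal can be pictured as dropping each CNV prediction whose BayesNet P(Normal) exceeds the threshold. The data frame, its column names, the helper filter_by_p_normal, and the choice to keep predictions exactly at the threshold are hypothetical.

```r
# Hypothetical table of CNV predictions with their BayesNet P(Normal).
cnv_preds <- data.frame(cnv_id   = c("cnv1", "cnv2", "cnv3"),
                        p_normal = c(0.10, 0.70, 0.50),
                        stringsAsFactors = FALSE)

filter_by_p_normal <- function(preds, BayesMaxPNormal = 0.5) {
  if (BayesMaxPNormal == 0) return(preds)                   # zero turns the filter off
  preds[preds$p_normal <= BayesMaxPNormal, , drop = FALSE]  # drop likely-normal calls
}

kept <- filter_by_p_normal(cnv_preds)
kept$cnv_id   # "cnv1" "cnv3"
```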

sim_method

Method for calibrating CNV levels in the i6 HMM. (default: 'meanvar')

sim_foreground

Developer option for debugging; do not use.

reassignCNVs

(boolean) Given the CNV-associated probability of belonging to each possible state, reassign the state assignments made by the HMM to the state with the highest probability. (default: TRUE)
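
Reassignment is an argmax over each CNV's state probabilities. The sketch below uses a hypothetical probability matrix (one row per CNV, one column per state) and hypothetical HMM calls; it illustrates the rule described above, not inferCNV's internal data structures.

```r
# Hypothetical posterior probabilities: one row per CNV, one column per state.
post <- matrix(c(0.1, 0.7, 0.2,
                 0.6, 0.3, 0.1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("cnv1", "cnv2"), c("0.5", "1.5", "2")))
hmm_state <- c(cnv1 = "2", cnv2 = "0.5")   # hypothetical states originally called by the HMM

# Reassign each CNV to its highest-probability state.
reassigned <- colnames(post)[apply(post, 1, which.max)]
names(reassigned) <- rownames(post)
changed <- reassigned != hmm_state[names(reassigned)]
reassigned   # cnv1 "1.5", cnv2 "0.5"
```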

## Tumor subclustering

analysis_mode

Grouping level for image filtering or HMM predictions: "samples", "subclusters", or "cells". (default: "samples", which is fastest, but "subclusters" is ideal)

tumor_subcluster_partition_method

Method for defining tumor subclusters. Options include "random_trees" and "qnorm"; the usage also lists "pheight", "qgamma", and "shc". random_trees (default): slow but best; uses permutation statistics with tree construction. qnorm: defines the tree height from the quantile given by tumor_subcluster_pval.

tumor_subcluster_pval

Maximum p-value for defining a significant tumor subcluster. (default: 0.1)
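
One reading of the qnorm option: treat the merge heights of the tumor cells' clustering tree as approximately normally distributed, and cut the tree at the height whose upper-tail probability equals tumor_subcluster_pval. The sketch below implements that reading with base R; the toy data and the Ward linkage are assumptions, and the random_trees default (permutation statistics) is not shown.

```r
set.seed(2)
# Toy tumor cells (genes x cells): two subclones of three cells each.
expr <- cbind(matrix(rnorm(60, mean = 0), nrow = 20),
              matrix(rnorm(60, mean = 10), nrow = 20))
hc <- hclust(dist(t(expr)), method = "ward.D2")

tumor_subcluster_pval <- 0.1
# Cut where a normal fitted to the merge heights has upper-tail probability p.
cut_height  <- qnorm(1 - tumor_subcluster_pval, mean = mean(hc$height), sd = sd(hc$height))
subclusters <- cutree(hc, h = cut_height)
subclusters
```

Only merges that are unusually tall relative to the other merges in the tree are split, so a larger tumor_subcluster_pval lowers the cut height and yields more subclusters.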

## De-noising parameters

denoise

If TRUE, turns on denoising according to the options below. (default: FALSE)

noise_filter

Values within +/- this amount of the reference cell mean are set to zero (whitening effect). (default: NA, in which case sd_amplifier below is used instead)

sd_amplifier

Noise is defined as mean(reference_cells) +/- sdev(reference_cells) * sd_amplifier. (default: 1.5)

noise_logistic

Use the noise_filter- or sd_amplifier-based threshold (whichever is in effect) as the midpoint of a logistic model that downscales values close to the mean. (default: FALSE)
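
The denoising options above can be sketched as follows: compute the half-width of the noise band (noise_filter if set, otherwise sd(reference) * sd_amplifier), then either collapse values inside the band onto the reference mean, or, with noise_logistic, multiply each deviation by a logistic weight whose midpoint is the threshold. On data centered at the reference mean, "collapse onto the mean" is the "set to zero" described above. The helper denoise_sketch and the logistic steepness are hypothetical; inferCNV's exact logistic parameterization is not given on this page.

```r
# Hypothetical helper (not inferCNV's code).
# obs: values to denoise; ref: reference-cell values on the same scale.
denoise_sketch <- function(obs, ref, sd_amplifier = 1.5, noise_filter = NA,
                           noise_logistic = FALSE) {
  center <- mean(ref)
  # Half-width of the noise band: noise_filter if set, otherwise sd(ref) * sd_amplifier.
  thresh <- if (is.na(noise_filter)) sd(ref) * sd_amplifier else noise_filter
  dev <- obs - center
  if (!noise_logistic) {
    dev[abs(dev) < thresh] <- 0   # whitening: values inside the band collapse onto the mean
  } else {
    steepness <- 10 / thresh      # hypothetical steepness of the logistic curve
    # Weight near 0 close to the mean, 0.5 at the threshold, near 1 far from it.
    w <- 1 / (1 + exp(-steepness * (abs(dev) - thresh)))
    dev <- dev * w
  }
  center + dev
}

ref <- c(-0.1, 0, 0.1)   # reference mean 0, sd 0.1, so the band is +/- 0.15
denoise_sketch(c(-0.5, -0.1, 0.05, 0.3), ref)                         # -0.5 0.0 0.0 0.3
denoise_sketch(c(-0.5, -0.1, 0.05, 0.3), ref, noise_logistic = TRUE)  # smooth version
```

The logistic variant avoids the hard step at the band edge: values just inside the threshold are shrunk rather than zeroed outright.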

## Outlier pruning

outlier_method_bound

Method to use for bounding outlier values. (default: "average_bound") Will preferentially use outlier_lower_bound and outlier_upper_bound if set.

outlier_lower_bound

Outliers below this lower bound will be set to this value.

outlier_upper_bound

Outliers above this upper bound will be set to this value.
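
A plausible reading of "average_bound" is to average the per-cell minima and maxima and clip every value into that range, with explicit outlier_lower_bound and outlier_upper_bound taking precedence as stated above. The helper bound_outliers_sketch and that definition of the average bounds are assumptions for illustration.

```r
# Hypothetical helper (not inferCNV's code): "average_bound" style clipping.
# expr: genes x cells matrix; explicit bounds, when given, take precedence.
bound_outliers_sketch <- function(expr, lower = NA, upper = NA) {
  if (is.na(lower)) lower <- mean(apply(expr, 2, min))   # average of per-cell minima
  if (is.na(upper)) upper <- mean(apply(expr, 2, max))   # average of per-cell maxima
  pmin(pmax(expr, lower), upper)                         # clip; the matrix shape is kept
}

expr <- cbind(cell1 = c(-2, 0, 1), cell2 = c(-1, 0, 3))
bound_outliers_sketch(expr)   # bounds -1.5 and 2: cell1 -> -1.5 0 1, cell2 -> -1 0 2
```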

## Misc options

final_scale_limits

Scale limits for the final heatmap output by run(). Default: "auto"; alternatively, c(low, high).

final_center_val

Center value for final heatmap output by the run() method.

debug

If TRUE, output debug-level logging.

num_threads

(int) number of threads for parallel steps (default: 4)

plot_steps

If TRUE, saves infercnv objects and plots the data at intermediate steps.

resume_mode

Leverage precomputed and stored infercnv objects where possible. (default: TRUE)

png_res

Resolution for PNG output. (default: 300)

plot_probabilities

Option to plot posterior probabilities. (default: TRUE)

save_rds

Whether to save the current step object results as an .rds file (default: TRUE)

save_final_rds

Whether to save the final object results as an .rds file (default: TRUE)

diagnostics

Option to create diagnostic plots after running the Bayesian model. (default: FALSE)

## Experimental options

remove_genes_at_chr_ends

Experimental option: if TRUE, removes window_length/2 genes at both ends of each chromosome. (default: FALSE)
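
Trimming chromosome ends can be sketched with a gene-order table sorted by position: for each chromosome, drop the first and last floor(window_length / 2) genes, where a full smoothing window does not fit. The table, the helper trim_chr_ends_sketch, the rounding down, and dropping chromosomes too short to keep any genes are all assumptions.

```r
# Hypothetical gene-order table: one row per gene, already sorted by position.
gene_order <- data.frame(gene  = paste0("g", 1:12),
                         chr   = rep(c("chr1", "chr2"), each = 6),
                         start = rep(seq(100, 600, by = 100), 2),
                         stringsAsFactors = FALSE)

trim_chr_ends_sketch <- function(genes, window_length) {
  n_trim <- floor(window_length / 2)
  keep <- unlist(lapply(split(seq_len(nrow(genes)), genes$chr), function(idx) {
    if (length(idx) <= 2 * n_trim) return(integer(0))   # chromosome too short: drop it all
    idx[(n_trim + 1):(length(idx) - n_trim)]            # drop n_trim genes at each end
  }))
  genes[sort(keep), , drop = FALSE]
}

trimmed <- trim_chr_ends_sketch(gene_order, window_length = 5)
trimmed$gene   # "g3" "g4" "g9" "g10"
```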

prune_outliers

Define outliers loosely as those that exceed the mean boundaries among all cells. These are set to the bounds.

## Experimental options involving DE analysis

mask_nonDE_genes

If TRUE, sets genes not significantly differentially expressed between tumor and normal to the mean value for the complete data set. (default: FALSE)

mask_nonDE_pval

p-value threshold for defining statistically significant DE genes between tumor and normal. (default: 0.05)

test.use

Statistical test to use. (default: "wilcoxon") Alternatives include "perm" and "t".

require_DE_all_normals

If mask_nonDE_genes is set, determines in how many of the comparisons to normal cells a gene must be found DE (according to test.use and mask_nonDE_pval) to avoid being masked. Options: "any", "most", "all". (default: "any")
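
The masking step can be sketched for a single normal group: test each gene for a tumor vs normal difference with base R's wilcox.test() as a stand-in for test.use = "wilcoxon", then set genes with p-values above mask_nonDE_pval to the mean of the complete data set. With one normal group, require_DE_all_normals has nothing to aggregate, so it is not exercised. The toy data are hypothetical, and inferCNV's implementation may differ.

```r
# Toy genes x cells data: gene1 is clearly shifted in tumor cells; gene2 and gene3 are not.
normal <- rbind(gene1 = 1:20,    gene2 = seq(1, 39, by = 2), gene3 = seq(2, 40, by = 2))
tumor  <- rbind(gene1 = 101:120, gene2 = seq(2, 40, by = 2), gene3 = seq(1, 39, by = 2))
mask_nonDE_pval <- 0.05

# Wilcoxon rank-sum test per gene, tumor vs normal.
pvals <- sapply(rownames(normal), function(g) wilcox.test(tumor[g, ], normal[g, ])$p.value)

all_expr <- cbind(normal, tumor)
non_de   <- pvals > mask_nonDE_pval
masked   <- all_expr
masked[non_de, ] <- mean(all_expr)   # non-DE genes set to the mean of the complete data set
non_de                               # gene1 FALSE, gene2 TRUE, gene3 TRUE
```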

## Other experimental options

hspike_aggregate_normals

Instead of modeling the different normal groupings individually, merge them in the hspike. (default: FALSE)

no_plot

Don't generate any images; instead, generate all non-image outputs as part of the run. (default: FALSE)

no_prelim_plot

Don't generate the preliminary infercnv image. (default: FALSE)

output_format

Output format for the figure. Choose between "png", "pdf" and NA. NA means to only write the text outputs without generating the figure itself. (default: "png")

useRaster

Whether to use rasterization for drawing the heatmap. Disable it only if it produces an error, since rasterized drawing is much faster. (default: TRUE)

up_to_step

Run only up to this exact step number. (default: 100, well above the 23 steps currently in the process)

Value

An infercnv_obj containing the filtered and transformed data.

Examples

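library(infercnv)

# Example raw counts, cell annotations, and gene order shipped with the package
data(infercnv_data_example)
data(infercnv_annots_example)
data(infercnv_genes_example)

# Build the infercnv object, using the "normal" cells as the reference group
infercnv_object_example <- infercnv::CreateInfercnvObject(raw_counts_matrix=infercnv_data_example,
                                                          gene_order_file=infercnv_genes_example,
                                                          annotations_file=infercnv_annots_example,
                                                          ref_group_names=c("normal"))

# Run the analysis with denoising, without the HMM, and without writing images
infercnv_object_example <- infercnv::run(infercnv_object_example,
                                         cutoff=1,
                                         out_dir=tempfile(),
                                         cluster_by_groups=TRUE,
                                         denoise=TRUE,
                                         HMM=FALSE,
                                         num_threads=2,
                                         no_plot=TRUE)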

infercnv documentation built on Nov. 8, 2020, 11:10 p.m.