Nothing
#' @title Statistical Inference with DESeq package based on the provided reads count for exomic bins.
#'
#' @description \code{call_peaks_with_GLM} conduct inference on every exome bins using negative binomial model,
#' the significant bins will be the merged into peaks.
#'
#' @details \code{call_peaks_with_GLM} will performe exome level peak calling using DESeq2 model,
#'
#' The significant bins will be merged into modification peaks.
#'
#' The insignificant bins (pass the row means filtering) will also be merged into control peaks.
#'
#' @param SE_bins a \code{SummarizedExperiment} object. The meta-data collumn should contain the design information of IP/input + treated/control.
#'
#' @param glm_type a character, which can be one of the "Poisson", "NB", and "DESeq2". This argument specify the type of generalized linear model used in peak calling; Default to be "Poisson".
#' The DESeq2 method is only recommended for high power experiments with more than 3 biological replicates for both IP and input.
#'
#' @param txdb the txdb object that is necessary for the calculation of the merge of the peaks.
#'
#' @param correct_GC_bg a \code{logical} value of whether to estimate the GC content linear effect on background regions; default \code{= TRUE}.
#'
#' If \code{correct_GC_bg = TRUE}, it could result in a more accurate estimation of the technical effect of GC content for the RNA modifications that are highly biologically related to GC content.
#'
#' @param qtnorm a \code{logical} of whether to perform subset quantile normalization after the GC content linear effect correction; default \code{= TRUE}.
#'
#' Subset quantile normalization will be applied within the IP and input samples seperately to account for the inherent differences between the marginal distributions of IP and input samples.
#'
#' @param count_cutoff an integer value indicating the cutoff of the mean of reads count in a row, inference is only performed on the windows with the row average read count bigger than the cutoff. Default value is 5.
#'
#' @param p_cutoff a numeric value of the p value cutoff used in DESeq inference.
#'
#' @param p_adj_cutoff a numeric value of the adjusted p value cutoff used in DESeq2 inference; if provided, the value of \code{p_cutoff} will be ignored; Default = 0.05.
#'
#' @param log2FC_cutoff a non negative numeric value of the log2 fold change (log2 IP/input) cutoff used in the inferene of peaks.
#'
#' @param consistent_peak a \code{logical} of whether the positive peaks returned should be consistent among replicates; default \code{= TRUE}.
#'
#' @param consistent_log2FC_cutoff a \code{numeric} for the modification log2 fold changes cutoff in the peak consisency calculation; default = 0.
#'
#' @param consistent_fdr_cutoff a \code{numeric} for the BH adjusted C-test p values cutoff in the peak consistency calculation; default { = 0.05}. Check \code{\link{ctest}}.
#'
#' @param alpha a \code{numeric} for the binomial quantile used in the consitent peak filter; default\code{ = 0.05}.
#'
#' @param p0 a \code{numeric} for the binomial proportion parameter used in the consistent peak filter; default \code{= 0.8}.
#'
#' For a peak to be consistently methylated, the minimum number of significant enriched replicate pairs is defined as the 1 - alpha quantile of a binomial distribution with p = p0 and N = number of possible pairs between replicates.
#'
#' The consistency defined in this way is equivalent to the rejection of an exact binomial test with null hypothesis of p < p0 and N = replicates number of IP * replicates number of input.
#'
#' @return This function will return a list of \code{GRangesList} object storing peaks for both modification and control.
#'
#' @import SummarizedExperiment
#'
#'
call_peaks_with_GLM <- function(SE_bins,
glm_type = c("Poisson", "NB", "DESeq2"),
correct_GC_bg = TRUE,
qtnorm = TRUE,
txdb,
count_cutoff = 5,
p_cutoff = NULL,
p_adj_cutoff = 0.05,
log2FC_cutoff = 0,
consistent_peak = TRUE,
consistent_log2FC_cutoff = 0,
consistent_fdr_cutoff = 0.05,
alpha = 0.05,
p0 = 0.8) {
design_IP_temp <- rep("input", ncol(SE_bins))
design_IP_temp[colData(SE_bins)$design_IP] <- "IP"
SE_bins$design_IP <- factor(design_IP_temp)
SE_bins$design_IP <- relevel(SE_bins$design_IP, "input")
rm(design_IP_temp)
index_mod <- GLM_inference(
SE_bins = SE_bins,
glm_type = glm_type,
p_cutoff = p_cutoff,
p_adj_cutoff = p_adj_cutoff,
count_cutoff = count_cutoff,
log2FC_mod = log2FC_cutoff ,
correct_GC_bg = correct_GC_bg,
consistent_peak = consistent_peak,
consistent_log2FC_cutoff = consistent_log2FC_cutoff,
consistent_fdr_cutoff = consistent_fdr_cutoff,
alpha = alpha,
p0 = p0
)
gr_mod <-
reduce_peaks(
peaks_grl = rowRanges(SE_bins)[as.numeric(rownames(SE_bins)) %in% index_mod],
txdb = txdb
)
return(split(gr_mod, names(gr_mod)))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.