computeProbs | R Documentation |
This function computes posterior probabilities of chemical modification in a structure probing experiment. It uses empirical p-values, which quantify the difference between LDRs (log-ratios of drop-off rates) from treatment-control and control-control experimental comparisons, as observations in a HMM with an emission model defined as a Beta-Uniform mixture.
computeProbs(LDR_C, LDR_CT, Nc, Nt, strand, nuclPosition, analysedC,
analysedCT, stretches, optimise)
LDR_C |
A matrix of transformed LDRs for control-control comparisons as returned
by |
LDR_CT |
A matrix of transformed LDRs for treatment-control comparisons as
returned by |
Nc |
Number of control experimental replicates. Must be at least 2. |
Nt |
Number of treatment experimental replicates. Must be at least 2. |
strand |
A character, indicating the plus ( |
nuclPosition |
A list where each component corresponds to a pattern (indicated by the
field |
analysedC |
The first element of the list returned by
|
analysedCT |
The second element of the list returned by
|
stretches |
An |
optimise |
An indicator variable to use the EM optimisation of shape parameters of
the Beta distribution (emission model of the HMM for the modified
state). If optimisation is to be used, the variable should be set to the
desired tolerance for parameter estimation (once the estimates stop
changing within this tolerance, the algorithm stops). Set to |
The function first computes percentiles of LDR null distributions, for each
percent from 1 to 89 and for a tenth of a percent from 90 to 100. If
multiple nucleotide patterns are used, then null distributions are computed
separately for each pattern. Then, LDRs from treatment-control comparisons
are compared to the percentiles of a corresponding null distribution via
computing the p-values, defined as 1 - closest percentile
. Thus, if
the closest percentile to an LDR is 99th, then its p-value will be 0.01,
reflecting a small probability for it to be from this null distribution.
The p-values are then used as observations in a hidden Markov model.
P-values corresponding to multiple comparisons at the same nucleotide
position are considered to be independent measurements.
HMM has a binary latent state corresponding to the true state of a nucleotide, modified or unmodified. Transition probabilities are set according to the empirically derived expectations of unmodified nucleotide stretches of average length 20 and modified stretches of average length 5. Emission probabilites come from a Beta-Uniform mixture, where the Uniform component represents the conditional model for the unmodifed state, and the Beta component represents the conditional model for the modified state. The shape parameters of Beta distribution are set to (1, 10) to approximately match the likelihood under the null hypothesis (unmodified state) for the LDRs in the top quintile of the distribution.
We also provide an option to optimise the shape parameters of Beta
distribution with the EM algorithm. To do this, make the function call with
the last parameter optimise
set to the desired tolerance, within
which the parameter estimates are allowed not to change at the next
iteration. The EM algorithm is run for 10 iterations and each M-step is
performed with Newton's optimisation method and is allowed 20 iterations. We
remark that during our experiments, this optimisation appeared vulnerable to
local minima. Thus, the default version of the pipeline does not use this
optimisation.
An n-by-2 matrix of posterior probabilities for all selected nucleotides. First column corresponds to the probability of the nucleotide being unmodified (given the observations) and the second column - to the probability of the nucleotide being modified. Those nucleotides not belonging to stretches for which the HMM was run are assigned an ‘NA’.
The following errors are returned if:
"Number of control and treatment replicates must be at least 2." the number of control or treatment experimental replicates is less than 2;
"All lists of positions selected for pair-wise comparisons should be non-empty." a list of nucleotides for control-control or treatment-control comparisons is empty;
"Strand should be either plus or minus, specified with a sign." strand is not specified as "+" or "-";
"The list of considered nucleotide positions should be non-empty. If patterns are not used, a vector with all positions should be provided in the first element of the list." the list of considered nucleotides (which should either correspond to patterns if addressing sequence bias or list all positions) is empty;
"The matrix of control-control (treatment-control) LDRs should have as many columns as there are control-control (treatment-control) comparisons." dimensionalities of matrices with LDRs do not agree with the number of pair-wise comparisons;
"Please provide a tolerance if shape parameters are to be optimised with EM
algorithm."
the optimise
parameter was not set to a (real) tolerance value.
Alina Selega, Sander Granneman, Guido Sanguinetti
Selega et al. "Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments", Nature Methods (2016).
See Also stabiliseVariance
,
selectNuclPos
, findPatternPos
,
and computeStretches
.
library(SummarizedExperiment)
Nc <- 3
Nt <- 3
t <- 1
nuclSelection <- selectNuclPos(se, Nc, Nt, t)
assay(se, "dropoff_rate") <- scaleDOR(se, nuclSelection, Nc, Nt)
stretches <- computeStretches(se, t)
varStab <- stabiliseVariance(se, nuclSelection, Nc, Nt)
LDR_C <- varStab$LDR_C
LDR_CT <- varStab$LDR_CT
sequence <- subject(rowData(se)$nucl)
nuclPosition <- list()
nuclPosition[[1]] <- 1:nchar(sequence)
posteriors <- computeProbs(LDR_C, LDR_CT, Nc, Nt, '+', nuclPosition,
nuclSelection$analysedC, nuclSelection$analysedCT,
stretches)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.