ASE-GLM-edgeR | R Documentation |
These functions allow users to fit custom GLMs included/excluded counts using edgeR for differential Alternative Splice Events (ASEs)
fitASE_edgeR(
se,
strModelFormula,
strASEFormula,
useQL = TRUE,
IRmode = c("all", "annotated", "annotated_binary"),
filter_antiover = TRUE,
filter_antinear = FALSE
)
fitASE_edgeR_custom(
se,
model_IncExc,
model_ASE,
useQL = TRUE,
IRmode = c("all", "annotated", "annotated_binary"),
filter_antiover = TRUE,
filter_antinear = FALSE
)
testASE_edgeR(
se,
fit,
coef_IncExc = ncol(fit[["model_IncExc"]]),
contrast_IncExc = NULL,
coef_ASE = ncol(fit[["model_ASE"]]),
contrast_ASE = NULL
)
addPSI_edgeR(results, se, condition, conditionList)
se |
The NxtSE object created by |
strModelFormula |
A string specifying the model formula to fit isoform
counts to assess differential expression in isolation. Should take the form
of |
strASEFormula |
A string specifying the model formula to fit PSIs
(isoform ratios). The variate of interest should be specified as an
interactiion term with |
useQL |
(default |
IRmode |
(default |
filter_antiover , filter_antinear |
Whether to remove novel IR events that overlap over or near anti-sense genes. Default will exclude antiover but not antinear introns. These are ignored if strand-specific RNA-seq protocols are used. |
model_IncExc |
A model matrix in which to model differential expression
of isoform counts in isolation. The number of rows must equal that of the
number of samples in |
model_ASE |
A model matrix in which to model differential PSIs.
The number of rows must be twice that of the number of samples in |
fit |
The output returned by the |
coef_IncExc , coef_ASE |
model coefficients to be dropped for LRT test
between full and reduced models. Directly parsed onto |
contrast_IncExc , contrast_ASE |
numeric vector specifying one or more #' contrasts of the linear model coefficients to be tested.
Directly parsed onto |
results |
The return value of |
condition |
The name of the column containing the condition values in
|
conditionList |
A list (or vector) of condition values of which to calculate mean PSIs |
edgeR accounts appropriately for zero-counts which are often problematic as PSI approaches zero or one, leading to false positives. The following functions allow users to define model formulas to test relative expressions of included / excluded counts (to assess whether isoforms are differentially regulated, in isolation), as well as together as an interaction (the latter provides results of differential ASE analysis)
See the examples section for a brief explanation of how to use these functions.
See also ASE-methods for further explanations of results output.
fitASE_edgeR
and fitASE_edgeR_custom
returns a named list containing
the following:
IncExc
and ASE
are DGEGLM
objects containing the fitted models for
isoform counts and PSIs, respectively
model_IncExc
and model_ASE
are model matrices of the above fitted
models.
testASE_edgeR()
returns a data.table containing the following:
EventName: The name of the ASE event. This identifies each ASE in downstream functions including makeMeanPSI, makeMatrix, and plotCoverage
EventType: The type of event. See details section above.
EventRegion: The genomic coordinates the event occupies. This spans the most upstream and most downstream splice junction involved in the ASE, and is use to guide the plotCoverage function.
flags: Indicates which isoforms are NMD substrates and/or which are formed by novel splicing only.
edgeR specific output equivalent to statistics returned by
edgeR::topTags()
:
logFC, logCPM, F, PValue, FDR: log fold change, log counts per million, F statistic, p value and (Benjamini Hochberg) adjusted p values of the differential PSIs for the contrasts or coefficients tested.
inc/exc_(...): edgeR statistics corresponding to differential expression testing for raw included / excluded counts in isolation (not of the PSIs).
addPSI_edgeR()
appends the following columns to the above output
AvgPSI_X: the average percent spliced in / percent IR levels for condition X. Note this is a geometric mean, based on the arithmetic mean of logit PSI values.
deltaPSI: The difference in PSI between the mean values of the two conditions.
abs_deltaPSI: The absolute value of difference in PSI between the mean values of the two conditions.
fitASE_edgeR()
: Use edgeR to fit counts and ASE models with a
given design formula
fitASE_edgeR_custom()
: Use edgeR to fit counts and ASE models with a
given design formula
testASE_edgeR()
: Use edgeR to return differential ASE results. coef
and contrast are parsed onto edgeR's glmQLFTest function
addPSI_edgeR()
: Adds average and delta PSIs of conditions of
interest onto results produced by testASE_edgeR(). Note this is done
automatically for other methods described in ASE-methods
.
Lun A, Smyth G (2017). 'No counts, no variance: allowing for loss of degrees of freedom when assessing biological variability from RNA-seq data' Stat Appl Genet Mol Biol, 017 Apr 25;16(2):83-93. https://doi.org/10.1515/sagmb-2017-0010
# Load the NxtSE object and set up the annotations
# - see ?makeSE on example code of generating this NxtSE object
se <- SpliceWiz_example_NxtSE()
colData(se)$treatment <- rep(c("A", "B"), each = 3)
colData(se)$replicate <- rep(c("P","Q","R"), 2)
require("edgeR")
fit <- fitASE_edgeR(
se,
strModelFormula = "~0 + replicate + treatment",
strASEFormula = "~0 + replicate + treatment + treatment:ASE"
)
# Get coefficient terms of Included / Excluded counts isolated model
colnames(fit$model_IncExc)
# [1] "replicateP" "replicateQ" "replicateR" "treatmentB"
# Get coefficient terms of PSI model
colnames(fit$model_ASE)
# [1] "replicateP" "replicateQ" "replicateR" "treatmentB"
# [5] "treatmentA:ASEIncluded" "treatmentB:ASEIncluded"
# Contrast between treatment "B" against treatment "A"
res <- testASE_edgeR(se, fit,
contrast_IncExc = c(0,0,0,1),
contrast_ASE = c(0,0,0,0,-1,1)
)
### # Add mean PSI values to results:
res_withPSI <- addPSI_edgeR(res, se, "treatment", c("B", "A"))
### Using custom model matrices to model counts
# - the equivalent analysis can be performed as follows:
# Sample annotations for isoform count expressions
colData <- as.data.frame(colData(se))
# Sample annotations for isoform count PSI analysis
colData_ASE <- rbind(colData, colData)
colData_ASE$ASE <- rep(c("Included", "Excluded"), each = nrow(colData))
rownames(colData_ASE) <- c(
paste0(rownames(colData), ".Included"),
paste0(rownames(colData), ".Excluded")
)
model_IncExc <- model.matrix(
~0 + replicate + treatment,
data = colData
)
model_ASE <- model.matrix(
~0 + replicate + treatment + treatment:ASE,
data = colData_ASE
)
fit <- fitASE_edgeR_custom(se, model_IncExc, model_ASE)
res_customModel <- testASE_edgeR(se, fit,
contrast_IncExc = c(0,0,0,1),
contrast_ASE = c(0,0,0,0,-1,1)
)
# Check this produces identical results:
identical(res_customModel, res)
### Time series examples using edgeR and splines
# - similar to section 4.8 in the edgeR vignette
colData(se)$timepoint <- rep(c(1,2,3), each = 2)
colData(se)$batch <- rep(c("1", "2"), 3)
# First, we set up a polynomial spline with 2 degrees of freedom:
Time <- poly(colData(se)$timepoint, df = 2)
# Next, we define the batch factor:
Batch <- factor(colData(se)$batch)
# Finally, we construct the same factors for ASE analysis. Note that
# each factor must be repeated twice
Time_ASE <- rbind(Time, Time)
Batch_ASE <- c(Batch, Batch)
ASE <- factor(
rep(c("Included", "Excluded"), each = nrow(colData(se)))
)
# Now, we set up the model matrices for isoform and PSI count modelling
model_IncExc <- model.matrix(~0 + Batch + Time)
model_ASE <- model.matrix(~0 + Batch_ASE + Time_ASE + Time_ASE:ASE)
fit <- fitASE_edgeR_custom(se, model_IncExc, model_ASE)
# Note the coefficients of interest in the constructed models:
colnames(model_IncExc)
# [1] "Batch1" "Batch2" "Time1" "Time2"
colnames(model_ASE)
# [1] "Batch_ASE1" "Batch_ASE2" "Time_ASE1" "Time_ASE2"
# [5] "Time_ASE1:ASEIncluded" "Time_ASE2:ASEIncluded"
# We are interested in a model in which `Time` is excluded, thus:
res <- testASE_edgeR(se, fit,
coef_IncExc = 3:4,
coef_ASE = 5:6
)
# Finally, add PSI values for each time point:
res_withPSI <- addPSI_edgeR(res, se, "timepoint", c(1, 2, 3))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.