Description
Fits an impulse model to time course data and uses this model as a basis to detect differentially expressed (DE) genes. If a single time course dataset is given, DE genes are detected over time; if an additional control time course dataset is present, DE genes are detected between the two datasets.
Usage
impulse_DE(expression_table = NULL, annotation_table = NULL,
  colname_time = NULL, colname_condition = NULL,
  control_timecourse = FALSE, control_name = NULL, case_name = NULL,
  expr_type = "Array", plot_clusters = TRUE, n_iter = 100,
  n_randoms = 50000, n_process = 4, Q_value = 0.01, new_device = TRUE)
Arguments
expression_table
numeric matrix of expression values; genes should be in rows, samples in columns. Data should be properly normalized and log2-transformed, and filtered for present or variable genes.
annotation_table
table providing co-variables for the samples, including condition and time points. Time points must be numeric.
colname_time
character string specifying the column name of the time co-variable in annotation_table (e.g. "Time").
colname_condition
character string specifying the column name of the condition co-variable in annotation_table (e.g. "Condition").
control_timecourse
logical indicating whether a control time course is part of the dataset (TRUE) or not (FALSE). Default is FALSE.
control_name
character string specifying the name of the control condition in annotation_table.
case_name
character string specifying the name of the case condition in annotation_table.
expr_type
character string specifying the type of expression data; allowed values include "Array" (the default).
plot_clusters
logical indicating whether to plot the clusters (TRUE) or not (FALSE). Default is TRUE.
n_iter
numeric value specifying the number of iterations performed to fit the impulse model to the clusters. Default is 100.
n_randoms
numeric value specifying the number of generated randomized background iterations used for the differential expression analysis. Default is 50000.
n_process
numeric value specifying the number of processes that can be used on the machine to run calculations in parallel. Default is 4.
Q_value
numeric value specifying the cutoff for calling genes significantly differentially expressed after FDR correction (adjusted p-value). Default is 0.01.
new_device
logical indicating whether each plot should be drawn in a new device (TRUE) or not (FALSE). Default is TRUE.
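To make the input requirements above concrete, here is a minimal sketch in base R that builds a toy expression_table and annotation_table. The sample IDs, gene names, time points and random values are all hypothetical, and the stopifnot() checks only restate the documented requirements; they are not ImpulseDE's own input validation.

```r
# Hypothetical toy inputs shaped as described in the Arguments section.
set.seed(1)
time_points <- c(0, 2, 4, 6, 8)
n_reps <- 3
sample_ids <- paste0("s", seq_len(length(time_points) * n_reps))

# annotation_table: one row per sample, numeric time, character condition
annot <- data.frame(
  Time = rep(time_points, each = n_reps),
  Condition = "activated",
  row.names = sample_ids,
  stringsAsFactors = FALSE
)

# expression_table: genes in rows, samples in columns (values assumed log2)
n_genes <- 20
expr <- matrix(rnorm(n_genes * nrow(annot), mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), sample_ids))

# Restate the documented requirements as checks
stopifnot(is.numeric(expr),
          is.numeric(annot$Time),
          all(rownames(annot) %in% colnames(expr)))
```

Under these assumptions the two objects could be passed positionally in the same way as in the Examples, e.g. impulse_DE(expr, annot, "Time", "Condition", n_process = 1).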
Details
ImpulseDE is based on the impulse model proposed by Chechik and Koller, which describes the two-step response of genes in a cell to environmental changes (Chechik and Koller, 2009). To detect differentially expressed genes, a five-step workflow is followed:
1. The genes are clustered into a limited number of groups using k-means clustering. If plot_clusters = TRUE, the clusters are plotted.
2. The impulse model is fitted to the mean expression profiles of the clusters. The best parameter sets are then used in the next step.
3. The impulse model is fitted to each gene separately, using the parameter sets from step 2 as initial guesses.
4. The impulse model is fitted to a randomized dataset (bootstrap), which provides the background needed to detect significantly differentially expressed genes (Storey et al., 2005).
5. Differentially expressed genes are detected using the fits to the real and randomized datasets. FDR correction is performed to obtain adjusted p-values (Benjamini and Hochberg, 1995).
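For readers unfamiliar with the impulse model, the following R sketch writes it down as the product of two sigmoids and fits it to one profile by least squares, loosely mirroring steps 2 and 3. It assumes the Chechik and Koller parametrization with a single shared slope beta, onset/offset times t1 and t2, and levels h0, h1, h2; the helper names (sigmoid, impulse_model, fit_impulse) are hypothetical, and ImpulseDE's actual optimizer, start values and n_iter restarts are not reproduced here.

```r
# Hypothetical sketch of the impulse model (assumed shared-slope form).
sigmoid <- function(x) 1 / (1 + exp(-x))

impulse_model <- function(t, beta, h0, h1, h2, t1, t2) {
  onset  <- h0 + (h1 - h0) * sigmoid(beta * (t - t1))   # transition h0 -> h1
  offset <- h2 + (h1 - h2) * sigmoid(-beta * (t - t2))  # transition h1 -> h2
  onset * offset / h1   # ~h0 before t1, ~h1 between t1 and t2, ~h2 after t2
}

# Least-squares fit of the six parameters to one profile (Nelder-Mead via optim)
fit_impulse <- function(t, y, start) {
  sse <- function(p) {
    sum((y - impulse_model(t, p[1], p[2], p[3], p[4], p[5], p[6]))^2)
  }
  optim(start, sse)
}

# Usage: noise-free profile from known parameters, fitted from a rough guess
t_obs <- c(0, 2, 4, 6, 8, 12, 18, 24)
y_obs <- impulse_model(t_obs, beta = 1, h0 = 2, h1 = 5, h2 = 3, t1 = 3, t2 = 14)
start <- c(beta = 0.5, h0 = 2.5, h1 = 4, h2 = 3.5, t1 = 2, t2 = 12)
fit <- fit_impulse(t_obs, y_obs, start)
fit$par  # fitted parameter vector
```

Dividing by h1 is what makes the plateau between the two transitions equal h1; in this sketch, a fit with h1 close to zero would be numerically unstable.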
Value
A list containing the following elements:
impulse_fit_results
List containing fitted values and model parameters:
impulse_parameters_case
Matrix of fitted impulse model parameters and sum of squared fitting errors for the case dataset. If a control time course is present, corresponding list entries exist for the control and the combined dataset as well (named impulse_parameters_control and impulse_parameters_combined, respectively).
impulse_fits_case
Matrix of impulse values calculated from the analyzed time points and the fitted model parameters for the case dataset. If a control time course is present, corresponding list entries exist for the control and the combined dataset as well (named impulse_fits_control and impulse_fits_combined, respectively).
DE_results
List containing the results of the differential expression analysis:
DE_genes
data.frame containing the names of the genes called differentially expressed according to the specified cutoff Q_value, together with their adjusted p-values.
pvals_and_flags
data.frame containing all gene names together with their adjusted p-values and flags for differential expression according to additional tests.
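As an illustration of how adjusted p-values and the Q_value cutoff can produce a DE_genes-like table, here is a hedged R sketch. It assumes one scalar statistic per gene (larger meaning stronger evidence) and a pooled set of statistics from randomized data, computes empirical p-values against that pool, applies Benjamini-Hochberg with p.adjust(), and keeps genes at or below the cutoff. The statistic, the helper names and the column names are hypothetical; this is not ImpulseDE's internal test.

```r
# Hypothetical sketch: empirical p-values from a pooled null, then BH + cutoff.
empirical_pvalues <- function(stat_obs, stat_null) {
  # share of null statistics at least as large as each observed statistic;
  # the +1 terms keep every p-value strictly above zero
  vapply(stat_obs,
         function(s) (sum(stat_null >= s) + 1) / (length(stat_null) + 1),
         numeric(1))
}

call_de_genes <- function(stat_obs, stat_null, Q_value = 0.01) {
  p <- empirical_pvalues(stat_obs, stat_null)
  adj <- p.adjust(p, method = "BH")  # Benjamini-Hochberg FDR correction
  res <- data.frame(Gene = names(stat_obs), adj.p = adj,
                    stringsAsFactors = FALSE)
  res[res$adj.p <= Q_value, , drop = FALSE]
}

# Usage with simulated stand-in statistics
set.seed(42)
null_stats <- rexp(5000)  # stand-in for statistics from randomized data
obs <- c(g1 = 12, g2 = 0.3, g3 = 15, g4 = 1.1)
de <- call_de_genes(obs, null_stats, Q_value = 0.01)
de$Gene
```

Pooling the null statistics across genes follows the general idea of Storey et al. (2005); whether ImpulseDE pools in exactly this way is an assumption of the sketch.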
clustering_results
List containing the clustering results:
kmeans_clus_case
Numeric vector of the cluster IDs to which the genes were finally assigned.
cluster_means_case
Matrix containing the mean expression values of each cluster (taken over all genes assigned to that cluster).
pre_clus_case
Number of clusters determined in the first (preliminary) clustering step.
fine_clus_case
Number of final clusters determined in the second clustering step.
If a control time course is present, these four list entries exist correspondingly for the control and the combined dataset as well (ending in _control and _combined instead of _case, respectively).
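To show what entries like kmeans_clus_case and cluster_means_case contain, here is a minimal R sketch using stats::kmeans(). It performs a single k-means pass with a fixed, user-chosen k on replicate-averaged gene profiles; ImpulseDE's two-step procedure for choosing the number of clusters (pre_clus and fine_clus) is not reproduced, and the function and variable names are hypothetical.

```r
# Hypothetical sketch: one k-means pass over gene profiles plus cluster means.
cluster_profiles <- function(expr, time, k) {
  # average replicates so each gene becomes a profile over unique time points
  prof <- t(apply(expr, 1, function(g) tapply(g, time, mean)))
  km <- kmeans(prof, centers = k, nstart = 10)
  # kmeans centers are the per-cluster means of the assigned profiles
  list(kmeans_clus = km$cluster, cluster_means = km$centers)
}

# Usage: 10 rising and 10 falling genes, 4 time points x 3 replicates
set.seed(7)
time <- rep(c(0, 2, 4, 6), each = 3)
up   <- matrix(rep(c(0, 1, 2, 3), each = 3), 10, 12, byrow = TRUE) +
        matrix(rnorm(120, sd = 0.1), 10)
down <- matrix(rep(c(3, 2, 1, 0), each = 3), 10, 12, byrow = TRUE) +
        matrix(rnorm(120, sd = 0.1), 10)
res <- cluster_profiles(rbind(up, down), time, k = 2)
res$cluster_means
```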
Author(s)
Jil Sander
References
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol., 57, 289-300.
Storey, J.D. et al. (2005) Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. USA, 102, 12837-12841.
Rangel, C. et al. (2004) Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics, 20, 1361-1372.
Chechik, G. and Koller, D. (2009) Timing of gene expression responses to environmental changes. J. Comput. Biol., 16, 279-290.
Yosef, N. et al. (2013) Dynamic regulatory network controlling TH17 cell differentiation. Nature, 496, 461-468.
Examples
# Load ImpulseDE and the longitudinal package, which provides the tcell data
# (install longitudinal first if it is not available)
library(ImpulseDE)
library(longitudinal)
# Attach datasets
data(tcell)
# Check the dimensions of the data matrix of interest
dim(tcell.10)
# Generate a matching annotation table: each time point of tcell.10 repeated 10 times
annot <- as.data.frame(cbind("Time" =
  sort(rep(get.time.repeats(tcell.10)$time, 10)),
  "Condition" = "activated"), stringsAsFactors = FALSE)
# The time column must be numeric (cbind() coerced it to character)
annot$Time <- as.numeric(annot$Time)
# Row names of the annotation table (sample IDs) must match the sample names in the data table
rownames(annot) <- rownames(tcell.10)
# Apply ImpulseDE in single time course mode.
# Since genes must be in rows, transpose the data matrix using t().
# To keep the example fast, reduce iterations to 10, randomizations to 50,
# the number of genes to 20 and the number of processes to 1:
impulse_results <- impulse_DE(t(tcell.10)[1:20, ], annot, "Time", "Condition",
  n_iter = 10, n_randoms = 50, n_process = 1)