cluster | R Documentation |
Given a sample-by-feature matrix and sample-associated metadata including their biological condition groupings, cluster samples hierarchically and use external cluster validity measures (Adjusted Rand Index, Normalized Mutual Information, and V measure) to assess the agreement between the inferred clusters and the biological conditions. Optionally, produce a heatmap reflecting the hierarchical clustering result.
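To make the external validity measures concrete, here is a minimal base-R sketch of the Adjusted Rand Index, computed from the contingency table of two labelings. The helper name `adjusted_rand_index` and the toy cluster assignments are illustrative assumptions, not part of this package, and the package may compute the statistic differently.

```r
# Hypothetical helper (not part of the package): Adjusted Rand Index
# between two labelings, computed from their contingency table.
adjusted_rand_index <- function(labels_a, labels_b) {
  tab <- table(labels_a, labels_b)
  n <- sum(tab)
  sum_cells <- sum(choose(tab, 2))            # pairs grouped together in both labelings
  sum_rows  <- sum(choose(rowSums(tab), 2))   # pairs grouped together in labels_a
  sum_cols  <- sum(choose(colSums(tab), 2))   # pairs grouped together in labels_b
  expected  <- sum_rows * sum_cols / choose(n, 2)
  max_index <- (sum_rows + sum_cols) / 2
  # Undefined (0/0) when both labelings put every sample in a single group
  (sum_cells - expected) / (max_index - expected)
}

# Toy data: biological conditions versus two made-up cluster assignments
conditions <- c(rep("Brain", 3), rep("Other", 3))
perfect <- c(1, 1, 1, 2, 2, 2)   # clusters match the conditions exactly
mixed   <- c(1, 1, 2, 2, 2, 1)   # one sample per condition is misplaced

ari_perfect <- adjusted_rand_index(conditions, perfect)
ari_mixed   <- adjusted_rand_index(conditions, mixed)
```

A value of 1 indicates that the clusters reproduce the conditions exactly, while values near 0 (or below) indicate agreement no better than chance.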
cluster(ft_mat, metadata, query, heatmap = FALSE, title = NULL,
outdir = NULL, optimal_clusters = TRUE, n_features = FALSE,
estimate_state = FALSE, method = NULL, test_condition = NULL,
signal_col = NULL, mark = NULL)
ft_mat |
matrix where columns are features and rows are samples as
returned by |
metadata |
A dataframe with a column "Sample" which stores the sample identifiers, and a column "Condition", which stores the biological condition labels of the samples |
query |
GRanges object specifying the query region |
heatmap |
(Optional) Logical value indicating whether to plot the heatmap for hierarchical clustering. Default: FALSE |
title |
(Optional) If |
outdir |
(Optional) String specifying the name of the directory where the PDF of the heatmap should be saved |
optimal_clusters |
(Optional) Logical value indicating whether to find the optimal clustering solution by choosing the set of clusters that maximizes the average Silhouette width (TRUE), or to cluster samples into two groups (FALSE). Default: TRUE |
n_features |
(Optional) Logical value indicating whether to include a column "n_features" in the output storing the number of features in the feature matrix constructed for the region, which may be useful for understanding the behaviour of the binary strategy for constructing feature matrices. Default: FALSE |
estimate_state |
(Optional) Logical value indicating whether to include a column "state" in the output specifying the estimated chromatin state of a test condition. The state will be one of "ON", "OFF", or NA, where NA results if a binary switch between the conditions is unclear. Default: FALSE |
method |
(Optional) If |
test_condition |
(Optional) If |
signal_col |
(Optional) If |
mark |
(Optional) If |
A dataframe with the region, the number of clusters inferred, the cluster validity statistics, and the cluster assignments for each sample
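The idea behind `optimal_clusters = TRUE`, choosing the partition that maximizes the average Silhouette width, can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper `mean_silhouette`, the toy feature matrix, the Euclidean distance, and the complete linkage are all choices made for this example.

```r
# Hypothetical helper (not part of the package): average Silhouette width
# of a cluster assignment, given a distance object.
mean_silhouette <- function(d, assignment) {
  d <- as.matrix(d)
  widths <- vapply(seq_along(assignment), function(i) {
    own <- assignment == assignment[i]
    own[i] <- FALSE
    if (!any(own)) return(0)   # singleton cluster: width is defined as 0
    a <- mean(d[i, own])       # mean distance to the sample's own cluster
    others <- setdiff(unique(assignment), assignment[i])
    # mean distance to the nearest other cluster
    b <- min(vapply(others, function(k) mean(d[i, assignment == k]),
                    numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Toy sample-by-feature matrix with two well-separated groups (made-up values)
toy <- rbind(c(0.0, 0.1), c(0.1, 0.0), c(0.1, 0.2),
             c(5.0, 5.1), c(5.1, 4.9), c(4.9, 5.0))
rownames(toy) <- c("E068", "E071", "E074", "E101", "E102", "E110")

d <- dist(toy)                           # Euclidean distance (assumption)
tree <- hclust(d, method = "complete")   # complete linkage (assumption)

# Score each candidate number of clusters and keep the best one
candidate_k <- 2:(nrow(toy) - 1)
scores <- vapply(candidate_k,
                 function(k) mean_silhouette(d, cutree(tree, k = k)),
                 numeric(1))
best_k <- candidate_k[which.max(scores)]
assignments <- cutree(tree, k = best_k)
```

With `optimal_clusters = FALSE`, the analogous step in this sketch would simply be `cutree(tree, k = 2)`.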
library(chromswitch)
library(GenomicRanges)  # provides GRanges(); attaches IRanges for IRanges()
samples <- c("E068", "E071", "E074", "E101", "E102", "E110")
bedfiles <- system.file("extdata", paste0(samples, ".H3K4me3.bed"),
package = "chromswitch")
Conditions <- c(rep("Brain", 3), rep("Other", 3))
metadata <- data.frame(Sample = samples,
H3K4me3 = bedfiles,
Condition = Conditions,
stringsAsFactors = FALSE)
region <- GRanges(seqnames = "chr19",
ranges = IRanges(start = 54924104, end = 54929104))
# H3K4me3 is assumed here to be the example peak data shipped with the package
lpk <- retrievePeaks(H3K4me3,
metadata = metadata,
region = region)
ft_mat <- summarizePeaks(lpk, mark = "H3K4me3",
cols = c("qValue", "signalValue"))
cluster(ft_mat, metadata, region)
# Estimate the state of the test condition, "Brain"
cluster(ft_mat, metadata, region,
estimate_state = TRUE,
method = "summary",
signal_col = "signalValue",
mark = "H3K4me3",
test_condition = "Brain")