plotMDS.DGEList {edgeR}    R Documentation
Plot samples on a two-dimensional scatterplot so that distances on the plot approximate the expression differences between the samples.
## S3 method for class 'DGEList'
plotMDS(x, top = 500, labels = NULL, pch = NULL, cex = 1, dim.plot = c(1,2),
        gene.selection = "pairwise", xlab = NULL, ylab = NULL, method = "logFC",
        prior.count = 2, plot = TRUE, var.explained = TRUE, ...)
x: a DGEList object.
top: number of top genes used to calculate pairwise distances.
labels: character vector of sample names or labels. If …
pch: plotting symbol or symbols. See …
cex: numeric vector of plot symbol expansions. See …
dim.plot: which two dimensions should be plotted, numeric vector of length two.
gene.selection: character, …
xlab: x-axis label.
ylab: y-axis label.
method: method used to compute distances. Possible values are "logFC" (the default) or "bcv".
prior.count: average prior count to be added to observations to shrink the estimated log-fold-changes towards zero. Only used when method="logFC".
plot: logical. If TRUE, a plot is created on the current graphics device.
var.explained: logical. If …
...: any other arguments are passed to …
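The dim.plot argument picks two MDS dimensions, and var.explained is assumed here (as in limma) to add a percentage of variation explained to the axis labels. The base-R sketch below illustrates both ideas with stats::cmdscale(eig = TRUE) applied to a ready-made distance matrix. The helper mds_axes is hypothetical, and the percentage formula (each eigenvalue's share of the positive eigenvalues) is an assumption that may not match limma's calculation exactly.

```r
# Hypothetical helper (not part of edgeR or limma): classical MDS coordinates
# for the two requested dimensions, plus axis labels carrying an approximate
# percentage of variation explained.
mds_axes <- function(d, dim.plot = c(1, 2)) {
  fit <- cmdscale(as.dist(d), k = max(dim.plot), eig = TRUE)
  # Assumption: each eigenvalue's share of the positive eigenvalues;
  # limma's own calculation may differ.
  pct <- round(100 * fit$eig[dim.plot] / sum(fit$eig[fit$eig > 0]))
  list(
    x = fit$points[, dim.plot[1]],
    y = fit$points[, dim.plot[2]],
    xlab = paste0("Dim ", dim.plot[1], " (", pct[1], "%)"),
    ylab = paste0("Dim ", dim.plot[2], " (", pct[2], "%)")
  )
}

# Four points in the plane: a two-dimensional MDS should reproduce their distances.
pts <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0.5))
ax <- mds_axes(dist(pts), dim.plot = c(1, 2))
```

The returned x, y, xlab and ylab can be passed directly to plot(). The requested dimensions should not exceed the number of positive eigenvalues, otherwise cmdscale returns fewer columns than asked for.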
The default method (method="logFC") is to convert the counts to log-counts-per-million using cpm and to pass these to the limma plotMDS function. This method calculates distances between samples based on log2 fold changes. See the plotMDS help page for details.
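A base-R sketch of the "logFC" distance follows. It assumes a simplified log-CPM transform (a prior count scaled by relative library size, which resembles but is not guaranteed to match edgeR's cpm(log=TRUE)) and the pairwise top-gene rule used by limma's plotMDS: for each pair of samples, keep the top genes with the largest squared log2 difference and take the root-mean-square of those differences as the distance. The function names log_cpm_sketch and leading_logfc_dist are hypothetical.

```r
# Simplified log2-CPM (assumption: approximates edgeR::cpm(log = TRUE)).
log_cpm_sketch <- function(counts, prior.count = 2) {
  lib.size <- colSums(counts)
  # Scale the prior by relative library size so larger libraries get a larger offset
  prior <- prior.count * lib.size / mean(lib.size)
  adj.counts <- sweep(counts, 2, prior, "+")
  adj.lib <- lib.size + 2 * prior
  log2(sweep(adj.counts, 2, adj.lib, "/") * 1e6)
}

# Pairwise "leading log-fold-change" distances (gene.selection = "pairwise").
leading_logfc_dist <- function(logcpm, top = 500) {
  n <- ncol(logcpm)
  top <- min(top, nrow(logcpm))
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      diff2 <- (logcpm[, i] - logcpm[, j])^2
      # Keep the `top` genes with the largest squared difference for this pair
      lead <- sort(diff2, decreasing = TRUE)[seq_len(top)]
      d[i, j] <- d[j, i] <- sqrt(mean(lead))
    }
  }
  d
}

set.seed(1)
counts <- matrix(rnbinom(1000 * 6, size = 10, mu = 100), 1000, 6)
# Genes 1-100 are 4-fold up in samples 4-6; genes 101-200 are 4-fold up in samples 1-3
counts[1:100, 4:6] <- rnbinom(300, size = 10, mu = 400)
counts[101:200, 1:3] <- rnbinom(300, size = 10, mu = 400)
d <- leading_logfc_dist(log_cpm_sketch(counts), top = 200)
coords <- cmdscale(as.dist(d), k = 2)  # classical MDS, two dimensions
```

Selecting genes separately for each pair lets every comparison use the genes that actually separate those two samples; the simulated data are invented purely for illustration.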
The alternative method (method="bcv") calculates distances based on the biological coefficient of variation. A set of top genes is chosen that have the largest biological variation between the libraries (those with the largest genewise dispersion when all libraries are treated as one group). The distance between each pair of libraries (columns) is then the biological coefficient of variation (square root of the common dispersion) between those two libraries alone, using the top genes.
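The "bcv" steps can be sketched in base R, with one important substitution stated plainly: edgeR estimates dispersions by conditional maximum likelihood, whereas this sketch uses a simple method-of-moments estimate, phi = (var - mean) / mean^2, on counts scaled to a common library size. It follows the procedure described above (rank genes by genewise dispersion across all libraries, then compute a pooled dispersion for each pair of libraries over those top genes) and is meant only to illustrate the idea. The names moment_disp and bcv_dist_sketch are hypothetical.

```r
# Method-of-moments NB dispersion per gene (assumption: stands in for edgeR's
# conditional maximum likelihood estimator). 'y' is genes x libraries.
moment_disp <- function(y) {
  m <- rowMeans(y)
  v <- apply(y, 1, var)
  phi <- ifelse(m > 0, (v - m) / m^2, 0)
  pmax(phi, 0)  # negative moment estimates are truncated at zero
}

bcv_dist_sketch <- function(counts, top = 500) {
  lib.size <- colSums(counts)
  # Scale every library to the average library size
  y <- sweep(counts, 2, mean(lib.size) / lib.size, "*")
  top <- min(top, nrow(y))
  # Top genes: largest genewise dispersion with all libraries as one group
  keep <- order(moment_disp(y), decreasing = TRUE)[seq_len(top)]
  y <- y[keep, , drop = FALSE]
  n <- ncol(y)
  d <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      yy <- y[, c(i, j)]
      m <- rowMeans(yy)
      v <- apply(yy, 1, var)
      # Pooled ("common") moment estimate over the top genes for this pair only
      phi <- max((sum(v) - sum(m)) / sum(m^2), 0)
      d[i, j] <- d[j, i] <- sqrt(phi)  # BCV = square root of the dispersion
    }
  }
  d
}

set.seed(1)
counts <- matrix(rnbinom(1000 * 6, size = 10, mu = 100), 1000, 6)
counts[1:100, 4:6] <- rnbinom(300, size = 10, mu = 400)
counts[101:200, 1:3] <- rnbinom(300, size = 10, mu = 400)
d <- bcv_dist_sketch(counts, top = 200)
```

With size = 10 the simulated within-group dispersion is 0.1, so within-group distances should sit near a BCV of about 0.3, while pairs from different groups pick up the extra between-group variation.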
Beware that the "bcv" method is slow when the number of samples is large. The "bcv" method is in general much less used than "logFC" and is scheduled to be removed in a future version of edgeR.
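Because "bcv" estimates a separate dispersion for every pair of libraries, the amount of work grows quadratically with the number of samples. The quick arithmetic below only counts those pairs with choose(); the cost of each individual estimate depends on edgeR's estimator and is not modelled here.

```r
# Number of library pairs for which the "bcv" method must estimate a dispersion
n.samples <- c(6, 20, 100, 500)
n.pairs <- choose(n.samples, 2)
data.frame(n.samples, n.pairs)
```

Going from 20 to 100 samples multiplies the number of pairs from 190 to 4950, roughly a 26-fold increase.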
The number of genes (top) used for the distance calculation should roughly correspond to the number of differentially expressed genes with materially large fold-changes. The default setting of 500 genes is widely effective and suitable for routine use, but a smaller value might be chosen when the samples are distinguished by a specific, focused molecular pathway. Very large values (greater than 1000) are not usually so effective.
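One way to see why very large values of top dilute the signal: the distance is a root-mean-square over the top genes, so once top exceeds the number of genes that genuinely differ, the extra genes contribute only noise-sized differences and pull the distance down. The log2 differences below are simulated for illustration (200 genes differing by 2, the rest by noise only), and leading_rms is a hypothetical helper.

```r
set.seed(2)
ngenes <- 1000
# Hypothetical log2 differences between two samples
lfc <- c(rep(2, 200), rep(0, ngenes - 200)) + rnorm(ngenes, sd = 0.5)

# Root-mean-square of the `top` largest absolute differences
leading_rms <- function(lfc, top) {
  sqrt(mean(sort(lfc^2, decreasing = TRUE)[seq_len(top)]))
}

tops <- c(100, 200, 500, 1000)
rms <- sapply(tops, leading_rms, lfc = lfc)
round(setNames(rms, tops), 2)
```

The distance falls steadily as top grows past 200, the number of genes that truly differ in this toy example.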
An object of class MDS is invisibly returned and (if plot=TRUE) a plot is created on the current graphics device.
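The return convention (compute, optionally plot, then return the result invisibly so that a bare call prints nothing) is a standard R pattern and can be sketched as follows. The function mds_sketch and the class name "MDS_sketch" are hypothetical; the components of the real MDS object are documented on the limma plotMDS help page.

```r
# Hypothetical sketch of the return convention, not the edgeR/limma code.
mds_sketch <- function(d, plot = TRUE) {
  fit <- cmdscale(as.dist(d), k = 2)
  out <- structure(list(x = fit[, 1], y = fit[, 2], distance.matrix = d),
                   class = "MDS_sketch")
  if (plot) graphics::plot(out$x, out$y, xlab = "Dim 1", ylab = "Dim 2")
  invisible(out)  # result is returned but not auto-printed
}

d <- as.matrix(dist(rbind(c(0, 0), c(1, 0), c(0, 2))))
res <- withVisible(mds_sketch(d, plot = FALSE))
res$visible       # FALSE: nothing is printed at the console
class(res$value)  # "MDS_sketch"
```

Assigning the result, as in mds <- plotMDS(y), is what makes the invisible object available for re-plotting.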
Yunshun Chen, Mark Robinson and Gordon Smyth
See Also: plotMDS, cmdscale, as.dist
library(edgeR)

# Simulate DGE data for 1000 genes and 6 samples.
# Samples are in two groups.
# The first 200 genes are differentially expressed in the second group.
ngenes <- 1000
nlib <- 6
counts <- matrix(rnbinom(ngenes*nlib, size=1/10, mu=20), ngenes, nlib)
rownames(counts) <- paste("gene", 1:ngenes, sep=".")
group <- gl(2, 3, labels=c("Grp1","Grp2"))
counts[1:200, group=="Grp2"] <- counts[1:200, group=="Grp2"] + 10
y <- DGEList(counts, group=group)
y <- calcNormFactors(y)

# Without labels, indexes of samples are plotted.
col <- as.numeric(group)
mds <- plotMDS(y, top=200, col=col)

# Or labels can be provided, here group indicators:
plotMDS(mds, col=col, labels=group)