knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
The leafSummarizedExperiment
and treeSummarizedExperiment
classes are both
extensions of the SummarizedExperiment
class. They are used to store
rectangular data of experimental results as in a SummarizedExperiment
, and
also support the storage of a hierarchical structure and its link information to
the rectangular data. The leafSummarizedExperiment
is intended to store
experimental data and annotation information for the leaf nodes of a tree, while
the treeSummarizedExperiment
class contains experimental (or derived) data and
annotation information also for the internal nodes of the tree.
The leafSummarizedExperiment
class has exactly the same structure as the
SummarizedExperiment
class, including assays
, rowData
, colData
and
metadata
. More details about the SummarizedExperiment
structure can be found
in SummarizedExperiment for coordinating experimental assays, samples, and
regions of
interest.
The difference to SummarizedExperiment
is that leafSummarizedExperiment
has
more restrictions on the data.
phylo
object stored in metadata
and named tree
.phylo
object has a unique label for each leaf node.nodeLab
in rowData
, or row names of rowData
, to
provide the label of the node that each row is mapped to.The link between the assays
data and the nodes of the tree strucutre is
checked when the leafSummarizedExperiment
object is build. Only rows that
could be mapped to the nodes of the tree are kept in the matrix-like
elements of assays
.
We generate a toyTable with observations of 5 entities collected from 4 samples.
suppressPackageStartupMessages({ library(treeAGG) library(S4Vectors)}) # assays data set.seed(1) toyTable <- matrix(rnbinom(20, size = 1, mu = 10), nrow = 5) colnames(toyTable) <- paste(rep(LETTERS[1:2], each = 2), rep(1:2, 2), sep = "_") rownames(toyTable) <- paste("entity", seq_len(5), sep = "") toyTable
Descriptions of the r nrow(toyTable)
entities and r ncol(toyTable)
samples are given in the rowInf and colInf, respectively.
# row data rowInf <- DataFrame(var1 = sample(letters[1:2], 5, replace = TRUE), var2 = sample(c(TRUE, FALSE), 5, replace = TRUE), row.names = rownames(toyTable)) rowInf # column data colInf <- DataFrame(gg = c(1, 2, 3, 3), group = rep(LETTERS[1:2], each = 2), row.names = colnames(toyTable)) colInf
The hierarchical structure of the 5 entities is denoted as toyTree. We create
it by using the function rtree
from the package r CRANpkg("ape")
. It's a
phylo
object.
# create a toy tree suppressPackageStartupMessages(library(ape)) toyTree <- rtree(5) class(toyTree)
As we see, the phylo
object is actually a list of four elements.
str(toyTree) plot(toyTree)
To store the toy data, toyTable, rowInf, colInf, and toyTree, we could
use a leafSummarizedExperiment
container. The map between the rows of
toyTable and the leaf nodes of toyTree is checked when creating a
leafSummarizedExperiment
object. If the object is built as below, we get a
message that 5 rows of toyTable are removed because they can't be matched to
any node of the tree. The assays
table of the resulting test1 object
consequently has no rows.
# use the toy data to create a leafSummarizedExperiment object. test1 <- leafSummarizedExperiment(assays = list(toyTable), rowData = rowInf, colData = colInf, tree = toyTree) assays(test1)[[1]]
To correctly store data, we need to additionally provide the link information
between the rows of toyTable and the nodes of the toyTree via a column
nodeLab
in rowData
or via the row names of rowData
.
Option 1: Create a leafSummarizedExperiment
object by changing the row names
of rowData
.
# change the row names of the row data rowInf_1 <- rowInf rownames(rowInf_1) <- toyTree$tip.label toyTable_1 <- toyTable rownames(toyTable_1) <- toyTree$tip.label test2 <- leafSummarizedExperiment(assays = list(toyTable_1), rowData = rowInf_1, colData = colInf, tree = toyTree) assays(test2)[[1]]
Option 2: Create a leafSummarizedExperiment
object by adding a column
nodeLab
to the rowData
.
# add a column (nodeLab) to the row data rowInf$nodeLab <- toyTree$tip.label lse <- leafSummarizedExperiment(assays = list(toyTable), rowData = rowInf, colData = colInf, tree = toyTree) assays(lse)[[1]]
Although the leafSummarizedExperiment
object could be sucessfully created in
either way, the row names of the table in assays
are different. We recommend
to do it as option 2 because it keeps the original row names and works better
especially when there are multiple rows mapped to a same leaf node of the tree.
The leafSummarizedExperiment
only supports data on the level of the leaf
nodes. If users have rows mapped to internal nodes of a tree, then the
treeSummarizedExperiment
class should be used instead. Technically, the
treeSummarizedExperiment
could also store data on the level of leaf nodes.
Users could choose either leafSummarizedExperiment
or
treeSummarizedExperiment
to store data on the level of leaf node as shown in
Section \@ref(sec:tse-build)).
knitr::include_graphics("tse.png")
Compared with the SummarizedExperiment
class, there are two more slots, the
tree data (treeData
) and the link data (linkData
), in the
treeSummarizedExperiment
class. Other slots, including assays
, rowData
,
colData
and metadata
, are the same as those in SummarizedExperiment
.
Here, we construct a treeSummarizedExperiment
object using the toy example
from the previous section. With the function nodeValue
, data at all nodes of
the tree is created based on the available data on the leaf node level. The
argument fun
specifies how the value at an internal node is calculated from
its descendant nodes. Here, the sum is used, and thus the value for each node is
equal to the sum of the values of all its descendants. To view the running
process, message = TRUE
is used.
tse <- nodeValue(data = lse, fun = sum, message = TRUE)
The output tse is a treeSummarizedExperiment
object. The class
treeSummarizedExperiment
is an extension of the SummarizedExperiment
class.
class(tse) showClass("treeSummarizedExperiment")
As shown, there are two more slots, treeData and linkData, in
treeSummarizedExperiment
compared with SummarizedExperiment
.
tse
To extract a table in assays
from treeSummarizedExperiment
object, we could
use the assays
accessor function. This is similar to the
SummarizedExperiment
class.
(aData <- assays(tse)[[1]])
We could use the node labels from the link data (\@ref(sec:linkData)) as the row
names via the argument use.nodeLab = TRUE
. Commonly, the column nodeLab
is
used as the row names. However, if it has duplicated values, the column
nodeLab_alias
is used.
aData <- assays(tse, use.nodeLab = TRUE)[[1]]
The value at each node (from sample A_1) could be visualized with the
following figure. We see that the value at each internal node is the sum of
those at its descendant leaves. More details about how to use
r Biocpkg("ggtree")
to plot the tree could be seen
here.
# load packages suppressPackageStartupMessages({ library(ggtree) # to plot the tree }) # extract a sample column from assays ex1 <- assays(tse)[[1]][, 1, drop = FALSE] # combine it with the data extracted from linkData # rename column nodeNum as node # optional : datF <- cbind.data.frame(ex1, linkData(tse)) %>% # rename(node=nodeNum) datF <- cbind.data.frame(ex1, linkData(tse)) colN <- colnames(datF) colN[colN == "nodeNum"] <- "node" colnames(datF) <- colN # plot ggtree(treeData(tse)) %<+% datF + geom_text2(aes(label = A_1), color = "brown1", size = 8)
The data datF is used to annotate the tree via %<+%
(see `?"%<+%")
The row data, column data, and metadata could be accessed exactly as in the
SummarizedExperiment
class.
(rData <- rowData(tse))
The column nodeLab
is now moved to the linkData
(see Section
\@ref(sec:linkData)), and isn't in the rowData
(rData) of
treeSummarizedExperiment
anymore.
(cData <- colData(tse))
# It is empty in the metadata (mData <- metadata(tse))
The linkData()
accessor is used to view the link information between rows of
matrix-like elements in the assays
and nodes of the tree.
(linkD <- linkData(tse))
Rows of the link data are one-to-one mapped to rows of the matrix-like element
in assays
. Their orders are exactly the same. Each row of the matrix-like
element in assays
could be mapped to a node of the tree, whose label and
number are in the column nodeLab and nodeNum, respectively. The column
isLeaf gives information whether a node is a leaf node. The column rowID
contains the corresponding row number in assays
.
If the labels of nodes are available and unique on the tree, the link data would have 4 columns including nodeLab, nodeNum, isTip, and rowID; otherwise, it would have one additonal column nodeLab_alias.
As shown in the Figure \@ref(fig:toyTREE), the tree we have used has no labels (orange text) for internal nodes.
The tree structure could be accessed using treeData()
. It is a phylo
object.
treeD <- treeData(tse) class(treeD)
The figure of the tree structure is shown in the Figure \@ref(fig:toyTREE). The node label is given in orange text, and the node number is in blue.
ggtree(treeD) + geom_text2(aes(label = label), color = "darkorange", hjust = -0.1, vjust = -0.7, size = 6) + geom_text2(aes(label = node), color = "darkblue", hjust = -0.5, vjust = 0.7, size = 5)
A treeSummarizedExperiment
object could be constructed using the
function treeSummarizedExperiment
.
tseA <- treeSummarizedExperiment(tree = treeD, assays = list(aData), rowData = rData, colData = cData, metadata = mData)
If the provided tree has a unique label for each node, we could also construct a
treeSummarizedExperiment
without providing the link data.
We recreate a tree treeN based on the old tree by adding labels to the internal nodes.
treeN <- treeD treeN$node.label <- paste("Node_", 6:9, sep = "")
In Figure \@ref(fig:newTREE), we see that all nodes have different labels in orange text.
# plot ggtree(treeN) + geom_text2(aes(label = label), color = "darkorange", hjust = -0.05, vjust = -0.4, size = 6) + geom_text2(aes(label = node), color = "darkblue", hjust = -0.5, vjust = 0.7, size = 5)
We would need to provide the mapping information between rows and nodes of the
tree if the link data is not given in the creation of a
treeSummarizedExperiment
object. This could be done by adding a nodeLab
column to the row data. Users need to make sure they have assigned a correct
label to a row.
# give a column nodeLab in row data # the linkData is created automatically lab <- ifelse(is.na(linkD$nodeLab), linkD$nodeLab_alias, linkD$nodeLab) lab (nData <- cbind(nodeLab = lab, rData))
Create tseB without providing the link data.
tseB <- treeSummarizedExperiment(tree = treeN, assays = list(aData), rowData = nData, colData = cData, metadata = mData)
The link data has been generated automatically and has only four columns as described in Section \@ref(sec:linkData).
linkData(tseB)
As users like, data on the level of leaf nodes could be organized as a
leafSummarizedExperiment
object (lseC) or a treeSummarizedExperiment
object (tseC) as below.
lseC <- leafSummarizedExperiment(assays = list(toyTable), rowData = rowInf, colData = colInf, tree = toyTree)
tseC <- treeSummarizedExperiment(assays = list(toyTable), rowData = rowInf, colData = colInf, tree = toyTree)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.