In version 2.0.0 of clusterExperiment
we have made some major changes to the package. The most relevant changes for the user are:
These changes are fully detailed in the NEWS file of the package (all releases since October 31, 2017).
Below we will explain the first two of these changes in more detail. We would also note that many more plotting functions were introduced in Bioconductor 3.6 (the previous release).
This version includes a major update to how dimensionality reduction and filtering are done. The ClusterExperiment
class has been updated to extend the new Bioconductor class, SingleCellExperiment
(see the SingleCellExperiment package).
Briefly, the SingleCellExperiment
class extends the SummarizedExperiment
class to give a structure for saving the results of dimensionality reductions. This is done by adding a slot reducedDims
, which is a SimpleList
of datasets that have the same number of observations but reduced dimensions (i.e. genes). This gives a unified slot for saving the results of applying a dimensionality reduction method and helper functions to access them, etc. The user gives them names, e.g. "PCA" or "tSNE".
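As a minimal sketch of the reducedDims slot described above, the following stores a PCA in a SingleCellExperiment object under a user-chosen name. The data are simulated and the names (counts, sce, pca) are illustrative assumptions; the PCA itself is computed with base R's prcomp, not with any clusterExperiment function.

```r
library(SingleCellExperiment)

set.seed(1)
# Simulated toy data: 200 genes (rows) by 30 cells (columns).
counts <- matrix(rpois(200 * 30, lambda = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# Any reduction with one row per cell can be stored; here, a base-R PCA
# on log-transformed counts, keeping the first 5 components.
pca <- prcomp(t(log2(counts + 1)), rank. = 5)$x

# Store it under a user-chosen name.
reducedDim(sce, "PCA") <- pca

reducedDimNames(sce)          # "PCA"
dim(reducedDim(sce, "PCA"))   # 30 5
```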
Dimensionality Reduction: clusterExperiment
now makes use of these stored dimensionality reductions in functions like RSEC
and clusterMany
. This allows clusterExperiment
to make use of any dimensionality reduction method so long as the user saves it in the appropriate slot in a SingleCellExperiment
object. The user can also choose not to precompute any dimensionality reduction and instead, as before, have a function like clusterMany
do the dimensionality reduction (i.e. PCA) internally. The difference is that now the results of the PCA will be stored in the appropriate slot so that they will not need to be recalculated in the future.
Filtering of genes: We also added a similar option for filtering statistics to the clusterExperiment
package. In particular, clusterExperiment
has always allowed the user, as an alternative to dimensionality reduction via PCA-like methods, to reduce the dimensionality of the problem by filtering to the top set of genes, e.g. the top 500 most variable genes. In this case clusterExperiment
will calculate the variance for every gene and reduce down to the top 500 genes. Now, when calculating statistics (like var
or mad
), clusterExperiment
will add the per-gene value of the statistic as a column of the rowData
of the ClusterExperiment
object (rowData
is a standard slot of a SummarizedExperiment
). Similarly, if the user has already calculated a per-gene statistic and saved it as a column in the rowData
slot, this user-defined statistic can be used for filtering. This means that the user is not limited to the built-in functions provided in clusterExperiment
.
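The following sketch illustrates both routes to a stored filtering statistic: a built-in one computed by makeFilterStats, and a user-defined one saved directly as a rowData column. The data are simulated, and the column name maxCount is a hypothetical user-defined statistic; the argument names follow the package documentation as we understand it and should be checked against the installed version.

```r
library(SingleCellExperiment)
library(clusterExperiment)

set.seed(1)
# Simulated toy data: 200 genes (rows) by 30 cells (columns).
counts <- matrix(rpois(200 * 30, lambda = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# Built-in statistics: computed per gene and stored as columns of rowData.
sce <- makeFilterStats(sce, filterStats = c("var", "mad"), isCount = TRUE)

# User-defined statistic (hypothetical example): maximum count per gene.
rowData(sce)$maxCount <- apply(counts, 1, max)

colnames(rowData(sce))
```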
Note that, for simplicity, we use "dimensionality reduction methods" to mean methods like PCA that do not simply select existing variables but instead create new variables to represent the data. We make this distinction because a simple selection of variables can be stored as a single vector with one value per gene, and the reduced data can then be obtained by subsetting the original matrix. The more complicated methods must instead save a matrix with a value for each observation for each new variable.
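The storage difference between the two approaches can be seen in a small base-R sketch. It uses simulated data and does not call clusterExperiment; it only illustrates what has to be saved in each case.

```r
set.seed(1)
# Simulated matrix: 100 genes (rows) by 20 samples (columns).
x <- matrix(rnorm(100 * 20), nrow = 100)

# Filtering: one statistic per gene, i.e. a vector of length nrow(x).
geneVar <- apply(x, 1, var)
topGenes <- order(geneVar, decreasing = TRUE)[1:10]
filtered <- x[topGenes, ]   # obtained by subsetting the original matrix

# Dimensionality reduction: new variables are created, so a
# samples-by-components matrix has to be stored.
pcs <- prcomp(t(x))$x[, 1:10]

length(geneVar)  # 100
dim(filtered)    # 10 20
dim(pcs)         # 20 10
```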
A great deal has changed under the hood of any function that allowed for dimensionality reduction, and these changes have greatly simplified and unified our treatment of filtering and dimensionality reduction. The main function affected is clusterMany
, which runs the clustering, but makeDendrogram
is another example (and of course RSEC
which is a wrapper around these).
To make the functions compatible with SingleCellExperiment
, we have changed many of the names of our arguments related to dimensionality reduction. This is because the slot names and related functions of SingleCellExperiment
take the form of reducedDims
and our previous versions of clusterExperiment
used dimReduce
format instead. We also changed the names of arguments to be less tied to 'PCA' and 'var':
- nPCADims changed to nReducedDims in clusterMany-related functions
- nVarDims changed to nFilterDims in clusterMany-related functions
- dimReduce changed to reduceMethod across functions
- plotDimReduce changed to plotReducedDims
- ndims changed to nDims in clusterSingle and makeDendrogram to keep consistency
The package clusterExperiment
has built-in functions for both dimensionality reduction (right now only PCA) and for filtering. The available methods can be listed with the new functions listBuiltInReducedDims
and listBuiltInFilterStats
, which give a character vector of the names of currently available functions for dimensionality reduction and statistics for filtering genes, respectively.
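A short sketch of how these two listing functions might be called is given below. Both take no arguments here; the exact set of names returned depends on the installed version of the package, so no output is shown.

```r
library(clusterExperiment)

# Names of built-in dimensionality reduction methods (per the text, currently only PCA).
listBuiltInReducedDims()

# Names of built-in per-gene statistics usable for filtering (e.g. var, mad).
listBuiltInFilterStats()
```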
The argument reduceMethod
, like the previous argument dimReduce
, defines either the dimensionality reduction method or the filtering method to be used to reduce the number of dimensions of the data that will be used (and for clusterMany
this can be a mixture of the two if the user wants to compare them). In places where multiple values can be given (i.e. clusterMany
or RSEC
) the user has the choice to give to reduceMethod
either
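- names that match values already stored in the object (stored dimensionality reductions or filtering statistics), or
- names of built-in methods for clusterExperiment
to be calculated internally (as given by listBuiltInReducedDims
and listBuiltInFilterStats
)
The user cannot do both (i.e. give reduceMethod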
2 names that match user-defined stored values and 3 names that are built-in functions). To combine the two, the user can first call the new functions makeReducedDims
and makeFilterStats
, which apply the built-in methods for dimensionality reduction and for filtering statistics, respectively, and store the results appropriately. These are indeed the functions called internally by functions like clusterMany
. In the above example, after calling these functions for the 3 built-in functions, the user can then call clusterMany
on all 5 of the names that are stored values (the 2 user defined, and the 3 that were created by the built-in functions).
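The workflow just described can be sketched as follows, on a smaller scale: one user-defined stored statistic plus two built-in statistics and a built-in PCA, all stored first and then passed by name to clusterMany. The data are simulated, myStat is a hypothetical user-defined statistic, and the specific argument values (ks, maxDims, nReducedDims, nFilterDims, clusterFunction) are illustrative assumptions rather than recommendations; argument names should be checked against the installed version.

```r
library(SingleCellExperiment)
library(clusterExperiment)

set.seed(1)
# Simulated toy data: 300 genes (rows) by 40 cells (columns).
counts <- matrix(rpois(300 * 40, lambda = 5), nrow = 300,
                 dimnames = list(paste0("gene", 1:300), paste0("cell", 1:40)))
sce <- SingleCellExperiment(assays = list(counts = counts))

# User-defined stored statistic (hypothetical): mean count per gene.
rowData(sce)$myStat <- rowMeans(counts)

# Apply the built-in methods and store their results in the object.
sce <- makeReducedDims(sce, reducedDims = "PCA", maxDims = 10, isCount = TRUE)
sce <- makeFilterStats(sce, filterStats = c("var", "mad"), isCount = TRUE)

# Every name now refers to a stored value, so they can be mixed freely.
ce <- clusterMany(sce, ks = 2:3, clusterFunction = "pam",
                  reduceMethod = c("PCA", "var", "mad", "myStat"),
                  nReducedDims = 5, nFilterDims = 50, isCount = TRUE)
ce
```

Because the PCA and the filtering statistics are stored in the returned object, later calls that request the same names do not need to recompute them.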
The clusterExperiment
package is now compatible with SummarizedExperiment
objects that have DelayedArray
classes in their assay
slot (which includes HDF5Matrix
and DelayedMatrix
). These are classes from the DelayedArray
and HDF5Array
packages that allow the assay to be stored on file rather than in memory.
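As a minimal sketch of what such an object looks like, the following writes a simulated matrix to a temporary HDF5 file and uses it as the assay of a SummarizedExperiment. It uses only the HDF5Array and SummarizedExperiment packages; passing the resulting object on to clusterExperiment functions is not shown.

```r
library(HDF5Array)
library(SummarizedExperiment)

set.seed(1)
# Simulated toy data: 100 genes (rows) by 10 samples (columns).
counts <- matrix(rpois(100 * 10, lambda = 5), nrow = 100)

# Write the matrix to an HDF5 file (a temporary file by default);
# the returned object points to the data on disk.
h5 <- writeHDF5Array(counts)

se <- SummarizedExperiment(assays = list(counts = h5))

# The assay is a delayed, file-backed matrix rather than an in-memory one.
is(assay(se), "DelayedMatrix")

# Subsetting is delayed and does not load the full matrix.
sub <- assay(se)[1:5, ]
dim(sub)
```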
Note, however, that although the package accepts these objects, it does not necessarily take advantage of the HDF5 structure. In many cases, if the assay
object must actually be used beyond simple subsetting, the entire matrix will be read into memory for the computations. The current advantages of having the full dataset in HDF5 format are:
reducedDims
slot, since subsetting of the HDF5 matrix doesn't call the entire matrix into memory. Furthermore, some (but not all) of our built-in filtering statistics are HDF5 aware, and do not call the entire matrix into memory to calculate the statistics used for filtering.