writeTENxMatrix: Write a matrix-like object as an HDF5-based sparse matrix

View source: R/writeTENxMatrix.R

writeTENxMatrixR Documentation

Write a matrix-like object as an HDF5-based sparse matrix

Description

The 1.3 Million Brain Cell Dataset and other datasets published by 10x Genomics use an HDF5-based sparse matrix representation instead of the conventional (a.k.a. dense) HDF5 representation.

writeTENxMatrix writes a matrix-like object to this format.

IMPORTANT NOTE: Only use writeTENxMatrix if the matrix-like object to write is sparse, that is, if most of its elements are zero. Using writeTENxMatrix on dense data is very inefficient! In this case, you should use writeHDF5Array instead.

Usage

writeTENxMatrix(x, filepath=NULL, group=NULL, level=NULL, verbose=NA)

Arguments

x

The matrix-like object to write to an HDF5 file.

The object to write should typically be sparse, that is, most of its elements should be zero.

If x is a DelayedMatrix object, writeTENxMatrix realizes it on disk, that is, all the delayed operations carried by the object are executed while the object is written to disk.

filepath

NULL or the path (as a single string) to the (new or existing) HDF5 file where to write the data. If NULL, then the data will be written to the current HDF5 dump file i.e. to the file whose path is getHDF5DumpFile.

group

NULL or the name of the HDF5 group where to write the data. If NULL, then the name returned by getHDF5DumpName will be used.

level

The compression level to use for writing the data to disk. By default, getHDF5DumpCompressionLevel() will be used. See ?getHDF5DumpCompressionLevel for more information.

verbose

Whether block processing progress should be displayed or not. If set to NA (the default), verbosity is controlled by DelayedArray:::get_verbose_block_processing(). Setting verbose to TRUE or FALSE overrides this.

Details

Please note that, depending on the size of the data to write to disk and the performance of the disk, writeTENxMatrix can take a long time to complete. Use verbose=TRUE to see its progress.

Use setHDF5DumpFile and setHDF5DumpName to control the location of automatically created HDF5 datasets.

Value

A TENxMatrix object pointing to the newly written HDF5 data on disk.

See Also

  • TENxMatrix objects.

  • The TENxBrainData dataset (in the TENxBrainData package).

  • HDF5-dump-management to control the location and physical properties of automatically created HDF5 datasets.

  • h5ls to list the content of an HDF5 file.

Examples

## ---------------------------------------------------------------------
## A SIMPLE EXAMPLE
## ---------------------------------------------------------------------
m0 <- matrix(0L, nrow=25, ncol=12,
             dimnames=list(letters[1:25], LETTERS[1:12]))
m0[cbind(2:24, c(12:1, 2:12))] <- 100L + sample(55L, 23, replace=TRUE)
out_file <- tempfile()
M0 <- writeTENxMatrix(m0, out_file, group="m0")
M0
sparsity(M0)

path(M0)  # same as 'out_file'

## Use h5ls() to list the content of this HDF5 file:
h5ls(path(M0))

## ---------------------------------------------------------------------
## USING THE "1.3 Million Brain Cell Dataset"
## ---------------------------------------------------------------------

## The 1.3 Million Brain Cell Dataset from 10x Genomics is available via
## ExperimentHub:
library(ExperimentHub)
hub <- ExperimentHub()
query(hub, "TENxBrainData")
fname <- hub[["EH1039"]]
oneM <- TENxMatrix(fname, group="mm10")  # see ?TENxMatrix for the details
oneM

## Note that the following transformation preserves sparsity:
M2 <- log(oneM + 1)  # delayed
M2                   # a DelayedMatrix instance

## In order to reduce computation times, we'll write only the first
## 5000 columns of M2 to disk:
out_file <- tempfile()
M3 <- writeTENxMatrix(M2[ , 1:5000], out_file, group="mm10", verbose=TRUE)
M3                   # a TENxMatrix instance

Bioconductor/HDF5Array documentation built on Jan. 16, 2025, 10:19 a.m.