README.md

DelayedArrays to HDF5

Overview

Save delayed operations to HDF5 using the chihaya specification. This extracts operations out of a DelayedArray and stores them in a HDF5 file, where they can be used to reconstitute the same DelayedArray in a new R session - or indeed, in a different analysis framework altogether. The idea is to save the operations, which is usually cheap; rather than the results of the operations, which may be expensive for large datasets or when sparsity is broken.

Quick start

If we make a DelayedArray with arbitrary operations:

library(DelayedArray)
x <- DelayedArray(matrix(runif(1000), ncol=10))
x <- x[11:15,] / runif(5) 
x <- log2(x + 1)
x
## <5 x 10> matrix of class DelayedMatrix and type "double":
##             [,1]        [,2]        [,3] ...       [,9]      [,10]
## [1,] 1.318228112 1.789374232 1.854133153   . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794   . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104   . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910   . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438   . 1.83012744 2.11469960

We can save it to file with the chihaya R package:

library(chihaya)
fpath <- tempfile(fileext=".h5")
saveDelayed(x, fpath, "my_delayed_array")
rhdf5::h5ls(fpath)
##                                     group             name       otype  dclass      dim
## 0                                       / my_delayed_array   H5I_GROUP
## 1                       /my_delayed_array             base H5I_DATASET   FLOAT    ( 0 )
## 2                       /my_delayed_array           method H5I_DATASET  STRING    ( 0 )
## 3                       /my_delayed_array             seed   H5I_GROUP
## 4                  /my_delayed_array/seed           method H5I_DATASET  STRING    ( 0 )
## 5                  /my_delayed_array/seed             seed   H5I_GROUP
## 6             /my_delayed_array/seed/seed            along H5I_DATASET INTEGER    ( 0 )
## 7             /my_delayed_array/seed/seed           method H5I_DATASET  STRING    ( 0 )
## 8             /my_delayed_array/seed/seed             seed   H5I_GROUP
## 9        /my_delayed_array/seed/seed/seed            index   H5I_GROUP
## 10 /my_delayed_array/seed/seed/seed/index                0 H5I_DATASET INTEGER        5
## 11       /my_delayed_array/seed/seed/seed             seed   H5I_GROUP
## 12  /my_delayed_array/seed/seed/seed/seed             data H5I_DATASET   FLOAT 100 x 10
## 13  /my_delayed_array/seed/seed/seed/seed           native H5I_DATASET INTEGER    ( 0 )
## 14            /my_delayed_array/seed/seed             side H5I_DATASET  STRING    ( 0 )
## 15            /my_delayed_array/seed/seed            value H5I_DATASET   FLOAT        5
## 16                 /my_delayed_array/seed             side H5I_DATASET  STRING    ( 0 )
## 17                 /my_delayed_array/seed            value H5I_DATASET   FLOAT    ( 0 )

And then reload it in a separate session:

y <- loadDelayed(fpath, "my_delayed_array")
y
## <5 x 10> matrix of class DelayedMatrix and type "double":
##             [,1]        [,2]        [,3] ...       [,9]      [,10]
## [1,] 1.318228112 1.789374232 1.854133153   . 1.10085064 1.22825033
## [2,] 0.340258109 0.598988926 0.005719794   . 0.05900444 0.19562976
## [3,] 0.205758979 0.624928389 0.574661104   . 0.96990885 0.31573385
## [4,] 0.129171362 1.149253865 0.091821910   . 0.10878614 0.45618400
## [5,] 1.317402933 1.753933055 1.857993438   . 1.83012744 2.11469960

The file at fpath follows the specification described here. This provides cross-language portability and ensures that the serialization process is robust to changes in the DelayedArray class structure.

Comments

Many of the basic operations in DelayedArray are supported. However, there are a few operations that are not described by the chihaya specification. An incomplete list is provided below:



LTLA/DelayedArraySaver documentation built on Sept. 19, 2024, 1:37 p.m.