DelayedDataFrame class

library(DelayedDataFrame)
(ddf <- DelayedDataFrame())
lazyIndex(ddf)

Constructor:

Each argument in "..." is coerced to a DataFrame and combined column-wise.

DelayedDataFrame(..., row.names=NULL, check.names=TRUE)
(obj <- DelayedDataFrame(letters, LETTERS, row.names=LETTERS))

lazyIndex(obj)

On-disk data representation in DataFrame format

The CoreArray Genomic Data Structure (GDS) is designed for large-scale datasets, especially data that are much larger than the available random-access memory. The Bioconductor package gdsfmt provides a high-level R interface to GDS.


file <- SeqArray::seqExampleFileName("gds")
f <- gdsfmt::openfn.gds(file)
f
gdsfmt::closefn.gds(f)

GDSArray is a Bioconductor package that represents GDS files as DelayedArray instances.

library(GDSArray)
gdsnodes(file)

Use GDSArray to represent the GDS nodes for variant annotation.

varid <- GDSArray(file, "annotation/id")
AA <- GDSArray(file, "annotation/info/AA")
varid

seed(varid)

Construct a DelayedDataFrame object with GDSArray columns.

(ddf <- DelayedDataFrame(varid, AA))

Subsetting

(ddf1 <- ddf[1:20,])
identical(ddf@listData, ddf1@listData)

lazyIndex(ddf1)
nrow(ddf1)

lazyIndex realization

as(ddf1, "DataFrame")
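
The subsetting and realization steps above can be illustrated with a small base R sketch. This is a conceptual toy only, not the package's implementation: `lazy_table`, `lazy_subset`, `lazy_nrow`, and `realize` are hypothetical names introduced here. It assumes a single row index shared by all columns. Subsetting only updates that index and leaves the column data untouched, and the index is applied to the columns only when the table is realized.

```r
# Conceptual sketch only: a toy "lazy" table in base R.
# All function names here are hypothetical, not part of DelayedDataFrame.

lazy_table <- function(columns) {
  n <- length(columns[[1]])
  stopifnot(all(lengths(columns) == n))
  # index = NULL stands for "all rows, in original order"
  list(columns = columns, index = NULL, n = n)
}

lazy_subset <- function(x, i) {
  # Compose the new row selection with any pending one;
  # the columns themselves are never touched.
  current <- if (is.null(x$index)) seq_len(x$n) else x$index
  x$index <- current[i]
  x
}

lazy_nrow <- function(x) {
  if (is.null(x$index)) x$n else length(x$index)
}

realize <- function(x) {
  # Only now is the pending index applied to every column.
  idx <- if (is.null(x$index)) seq_len(x$n) else x$index
  as.data.frame(lapply(x$columns, function(col) col[idx]),
                stringsAsFactors = FALSE)
}

tbl  <- lazy_table(list(lower = letters, upper = LETTERS))
sub1 <- lazy_subset(tbl, 1:20)       # like ddf[1:20, ]
sub2 <- lazy_subset(sub1, c(3, 1))   # a second subset composes with the first

identical(tbl$columns, sub2$columns) # column data is shared unchanged
lazy_nrow(sub2)                      # row count comes from the index
realize(sub2)                        # index applied to the columns here
```

In this toy version, the `identical()` check plays the same role as comparing `ddf@listData` with `ddf1@listData` above: the stored columns are the same before and after subsetting, and only the recorded index differs.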

Availability

The development version can be installed from GitHub:

devtools::install_github("Bioconductor/DelayedDataFrame")


Bioconductor/DelayedDataFrame documentation built on Nov. 2, 2024, 7:21 a.m.