HDF Server "extends the HDF5 data model to efficiently store large data objects (e.g. up to multi-TB data arrays) and access them over the web using a RESTful API." In this package, several data structures are introduced.
We maintain, thanks to a grant from the National Cancer Institute, the server http://h5s.channingremotedata.org:5000/. Visit this URL to get a flavor of the server structure: datasets, groups, and datatypes are high-level elements to be manipulated to work with data values from the server.
A key application of the rhdf5client package is support for the restfulSE package, which defines an interface between the SummarizedExperiment class and the HDF5 Server. The server provides content for assay() requests to RESTfulSummarizedExperiment instances.
Extensive human and computational effort is expended on downloading and managing large genomic data at the site of analysis. Interoperable formats that are accessible via generic operations, like those in RESTful APIs, may help to improve the cost-effectiveness of genome-scale analyses.
In this report we examine the use of HDF5 server as a back end for assay data.
A modest server configured to deliver HDF5 content via a RESTful API has been prepared and is used in this vignette.
We want to provide rapid read-only access to array-like data. To do this, the hierarchy and additional formalities of the HDF5 server data architecture are exposed through R functions and related classes. Full details on the HDF5 server are available at the HDFgroup site.
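To make the idea of array access over a RESTful API concrete, here is a minimal base R sketch of how a rectangular selection might be turned into a request URL. The function `value_url`, the example server address, and the `DATASET-UUID` placeholder are all hypothetical and are not part of rhdf5client. The sketch assumes the HDF5 server convention of a `select` query parameter with zero-based starts and exclusive ends; consult the HDF Group documentation for the authoritative request format.

```r
# Hypothetical helper (not part of rhdf5client): build a value-request URL.
# Assumes a `select` parameter with zero-based starts and exclusive ends.
value_url <- function(server, dataset_id, rows, cols) {
  # This sketch only handles contiguous, increasing ranges.
  stopifnot(all(diff(rows) == 1), all(diff(cols) == 1))
  sel <- sprintf("[%d:%d,%d:%d]",
                 as.integer(min(rows) - 1), as.integer(max(rows)),
                 as.integer(min(cols) - 1), as.integer(max(cols)))
  paste0(server, "/datasets/", dataset_id, "/value?select=", sel)
}

# R subscripts are one-based and inclusive, so 15:20 becomes 14:20.
value_url("http://example.org:5000", "DATASET-UUID", 15:20, 1905:1906)
```

A real client would also need to URL-encode the selection and identify the target file on the server; both are omitted here to keep the sketch short.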
suppressPackageStartupMessages({ library(rhdf5client) })
The dsmeta function returns top-level groups and datasets.
library(rhdf5client)
bigec2 = H5S_source(URL_h5serv())
bigec2
dsmeta(bigec2)[1:2,]       # two groups
dsmeta(bigec2)[1,2][[1]]   # all dataset candidates in group 1
Given the URL of a server running HDF5 server, we create an instance of H5S_source:
mys = H5S_source(serverURL=URL_h5serv())
mys
The server identifies a collection of 'groups'. For the server we are working with, only one group, at the root, is of interest.
groups(mys)
There is a class to hold the link set for any group:
lks = links(mys,1)
lks
We use double-bracket subscripting to grab a reference to a dataset from an H5S source. A dataset must be two-dimensional, i.e., accessible with two subscripts.
dta = bigec2[["tenx_100k_sorted"]]
dta
Data are accessed by subscripting. The subscripts are vectors of increasing, positive integers.
x = dta[ 15:20, 1905:1906 ]
x
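The requirement on subscripts can be checked locally before a request is sent. The function `is_valid_subscript` below is a hypothetical base R sketch, not part of rhdf5client, and it assumes that "increasing" means strictly increasing.

```r
# Hypothetical check (not part of rhdf5client): TRUE when idx is a vector of
# strictly increasing, positive whole numbers.
is_valid_subscript <- function(idx) {
  is.numeric(idx) && length(idx) > 0 && !anyNA(idx) &&
    all(idx == round(idx)) && all(idx >= 1) && all(diff(idx) > 0)
}

is_valid_subscript(15:20)       # increasing and positive
is_valid_subscript(c(20, 15))   # decreasing
is_valid_subscript(c(0, 1))     # contains a non-positive value
```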
(Obsolete) The subscripts are colon-delimited character strings giving the initial value, the final value, and an optional stride.
x = dta["15:20", "1904:1906"]
x
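For readers moving old scripts to integer subscripts, the colon-delimited form can be expanded in base R. The helper `parse_colon_index` below is hypothetical and not part of rhdf5client. It treats both endpoints as inclusive, one-based positions, which is an assumption; the obsolete interface may have interpreted the endpoints or the stride differently.

```r
# Hypothetical helper (not part of rhdf5client): expand a colon-delimited
# subscript "initial:final" or "initial:final:stride" into integers.
parse_colon_index <- function(s) {
  parts <- as.integer(strsplit(s, ":", fixed = TRUE)[[1]])
  stopifnot(length(parts) %in% c(2L, 3L), !anyNA(parts))
  stride <- if (length(parts) == 3L) parts[3] else 1L  # stride defaults to 1
  stopifnot(parts[1] >= 1L, parts[2] >= parts[1], stride >= 1L)
  seq(parts[1], parts[2], by = stride)
}

parse_colon_index("15:20")         # every position from 15 through 20
parse_colon_index("1904:1906:2")   # every second position: 1904 and 1906
```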