title: "Working with Zarr archives" author: - name: Martin Morgan affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY date: "2019-12-03" output: BiocStyle::html_document: toc: true toc_float: true package: ZarrExperiment vignette: > %\VignetteIndexEntry{Working with Zarr archives} %\VignetteEngine{knitr::rmarkdown} %\VignetteEndcoding{UTF-8}
The ZarrExperiment package is currently under construction. The
purpose of the package is to build an interface between
zarr(https://zarr.readthedocs.io/en/stable/) and the Bioconductor
SummarizedExperiment
family.
Install the package with
R -e "BiocManager::install('Bioconductor/ZarrExperiment')"
The ZarrExperiment
package uses the zarr module from
python. Configuring python currently requires a git clone.
git clone https://github.com/Bioconductor/ZarrExperiment
We suggest using a python virtual environment to configure python. There are different ways of establishing virtual environments.
venv
virtualenv
virtualenv
is a distinct program. Do these from the command line.
1) Install python3
and virtualenv
2) Create the virtual environment.
virtualenv -p python3 ~/.virtualenvs/Bioconductor
3) Activate this virtual environment.
source ~/.virtualenvs/Bioconductor/bin/activate
4) Install the required python packages from the file
python-requirements.txt
in the package's base directory using
pip.
pip install -r python-requirements.txt
When done, deactivate the virtual environment.
deactivate
6) Set the enviroment RETICULATE_PYTHON
variable to use the Bioconductor
virtual environment.
export RETICULATE_PYTHON=~/.virtualenvs/Bioconductor/bin/python
7) Test that zarr
can be imported with reticulate.
R -e "reticulate::import('zarr')"
Load libraries used in this vignette
library(SummarizedExperiment)
library(tibble)
library(ZarrExperiment)
Point to a sample archive. Archives are folders with a collection of files.
fl <- system.file(
package="ZarrExperiment", "extdata",
"stahl-2016-science-olfactory-bulb.matrix.zarr"
)
dir(fl, recursive = TRUE, all.files = TRUE)
## [1] ".zattrs" ".zgroup" "gene_name/.zarray"
## [4] "gene_name/.zattrs" "gene_name/0" "gene_name/1"
## [7] "gene_name/2" "gene_name/3" "matrix/.zarray"
## [10] "matrix/.zattrs" "matrix/0.0" "matrix/0.1"
## [13] "matrix/0.10" "matrix/0.11" "matrix/0.12"
## [16] "matrix/0.13" "matrix/0.14" "matrix/0.15"
## [19] "matrix/0.16" "matrix/0.2" "matrix/0.3"
## [22] "matrix/0.4" "matrix/0.5" "matrix/0.6"
## [25] "matrix/0.7" "matrix/0.8" "matrix/0.9"
## [28] "region_id/.zarray" "region_id/.zattrs" "region_id/0"
## [31] "x_region/.zarray" "x_region/.zattrs" "x_region/0"
## [34] "y_region/.zarray" "y_region/.zattrs" "y_region/0"
Load the archive and view, via the show()
or tree()
method, the
available groups (datasets) in the archive. These groups are analogous
to hdf5 datasets.
arr <- ZarrArchive(fl)
arr
## class: ZarrArchive
## resource: /Users/ma38727/Librar.../stahl-2016-science-olfactory-bulb.matrix.zarr
## /
## ├── gene_name (16573,) <U14
## ├── matrix (267, 16573) int64
## ├── region_id (267,) int64
## ├── x_region (267,) float64
## └── y_region (267,) float64
The archive allows $
subsetting, including tab completion. Access
the matrix
group member and coerce it to an R (dense) matrix.
m <- t(as(arr$matrix, "matrix"))
m[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0 1 0 0
## [2,] 5 1 0 0 0
## [3,] 4 2 0 2 1
## [4,] 2 2 1 0 0
## [5,] 2 4 2 0 4
The components of this archive (archive format is completely general, so it is not possible to write a 'smart' import function) can be accessed...
rowData <- tibble(
gene_name = as(arr$gene_name, "matrix")
)
rowData
## # A tibble: 16,573 x 1
## gene_name
## <chr>
## 1 Nop58
## 2 Arl6ip4
## 3 Lix1
## 4 Chrm1
## 5 Nap1l1
## 6 Kat6a
## 7 Fam134c
## 8 Lrpprc
## 9 Srgap3
## 10 Slc1a3
## # … with 16,563 more rows
colData <- tibble(
region_id = as(arr$region_id, "matrix"),
x_region = as(arr$x_region, "matrix"),
y_region = as(arr$y_region, "matrix")
)
colData
## # A tibble: 267 x 3
## region_id x_region y_region
## <dbl> <dbl> <dbl>
## 1 0 4637. 2333.
## 2 1 4894. 2334.
## 3 2 5463. 2333.
## 4 3 5187. 2330.
## 5 4 5769. 2896.
## 6 5 5774. 2604.
## 7 6 5767. 3475.
## 8 7 5766. 3190.
## 9 8 5189. 2620.
## 10 9 5477. 2626.
## # … with 257 more rows
Form a SummarizedExperiment
from these components:
se <- SummarizedExperiment(
assays = list(count = m),
rowData = rowData,
colData = colData
)
se
## class: SummarizedExperiment
## dim: 16573 267
## metadata(0):
## assays(1): count
## rownames: NULL
## rowData names(1): gene_name
## colnames: NULL
## colData names(3): region_id x_region y_region
The SummarizedExperiment
object can then be used in standard R /
Bioconductor single-cell and other work flows.
sessionInfo()
## R Under development (unstable) (2019-12-01 r77489)
## Platform: x86_64-apple-darwin17.7.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
##
## Matrix products: default
## BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
## LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ZarrExperiment_0.0.6 tibble_2.1.3
## [3] SummarizedExperiment_1.17.0 DelayedArray_0.13.0
## [5] BiocParallel_1.21.0 matrixStats_0.55.0
## [7] Biobase_2.47.1 GenomicRanges_1.39.1
## [9] GenomeInfoDb_1.23.0 IRanges_2.21.2
## [11] S4Vectors_0.25.1 BiocGenerics_0.33.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 compiler_4.0.0
## [3] pillar_1.4.2 XVector_0.27.0
## [5] bitops_1.0-6 tools_4.0.0
## [7] zlibbioc_1.33.0 SingleCellExperiment_1.9.0
## [9] digest_0.6.23 jsonlite_1.6
## [11] evaluate_0.14 lattice_0.20-38
## [13] pkgconfig_2.0.3 rlang_0.4.2
## [15] Matrix_1.2-18 xfun_0.11
## [17] GenomeInfoDbData_1.2.2 rtracklayer_1.47.0
## [19] stringr_1.4.0 knitr_1.26
## [21] Biostrings_2.55.2 grid_4.0.0
## [23] reticulate_1.13.0-9005 XML_3.98-1.20
## [25] magrittr_1.5 codetools_0.2-16
## [27] Rsamtools_2.3.2 GenomicAlignments_1.23.1
## [29] stringi_1.4.3 RCurl_1.95-4.12
## [31] crayon_1.3.4
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.