Generating HCAMatrix queries with the API

Installation

Pull the development branch from Bioconductor/HCAMatrixBrowser

if (!requireNamespace("BiocManager", quietly = TRUE)
    install.packages("BiocManager")

BiocManager::install("HCAMatrixBrowser")

Load packages

library(HCAMatrixBrowser)
library(rapiclient)
library(AnVIL)

Create an HCA api object using the rapiclient package:

(hca <- HCAMatrix())

Schemas

Check that the schemas are available as found in the api configuration file: Note. Due to downgrade from Open API Specification (OAS) version 3 to OAS version 2 (formerly known as Swagger), the full list of schemas may not be shown.

schemas(hca)

Exploring the endpoints

Here we make available a number of helper functions that allow the user to explore the possible queries supported by the HCA Matrix Service data model.

Filters

available_filters(hca)

We can get details for a particular filter:

filter_detail(hca, "genes_detected")

Formats

We can obtain the available formats for any request by doing:

available_formats(hca)

We can get additional details (via HTML pop-up page):

format_detail(hca, "mtx")

Features

The matrix service provides two types of features (i.e., genes and transcripts):

available_features(hca)

and a short description of a selected feature:

feature_detail(hca, "gene")

Request generation from schema

Using the rapiclient package, we can obtain schema or models for queries:

bundle_fqids <-
    c("980b814a-b493-4611-8157-c0193590e7d9.2018-11-12T131442.994059Z",
    "7243c745-606d-4827-9fea-65a925d5ab98.2018-11-07T002256.653887Z")
req <- schemas(hca)$v0_MatrixRequest(
    bundle_fqids = bundle_fqids, format = "loom"
)
req

For requests such as these, it is better to use the API layer provided by HCAMatrixBrowser (see examples below).

Python example

The HCA group has provided example JSON requests written in python:

The published notebook is available here: https://github.com/HumanCellAtlas/matrix-service/blob/master/docs/HCA%20Matrix%20Service%20to%20Scanpy.ipynb

The following JSON request applies a couple of filters on the project short name and the number of genes detected.

The provided example request looks like:

jsonlite::fromJSON(
    txt = '{"filter": {"op": "and", "value": [ {"op": "=", "value": "Single cell transcriptome analysis of human pancreas", "field": "project.project_core.project_short_name"}, {"op": ">=", "value": 300, "field": "genes_detected"} ] }}'
)

HCAMatrixBrowser example

Advanced usage note. We are interested in using the following endpoint:

hca$matrix.lambdas.api.v1.core.post_matrix

The HCAMatrixBrowser allows filtering using a non-standard evaluation method typical of the 'tidyverse'. Here we recreate the filters provided in the example above.

hcafilt <- filter(hca,
    project.project_core.project_short_name ==
        "Single cell transcriptome analysis of human pancreas" &
        genes_detected >= 300)

We can subsequently view the generated filter from the operation:

filters(hcafilt)

Query send-off

In order to send the query with the appropriate filters, we simply use the provided loadHCAMatrix function along with the API object that contains the filter structure. See the following section for more information.

Data representations

The matrix service allows you to request three different file formats:

  1. loom (default)
  2. mtx
  3. csv

These can be requested using the format argument in the loadHCAMatrix function:

loom

The loom format is supported by the LoomExperiment package in Bioconductor.

(loomex <- loadHCAMatrix(hcafilt, format = "loom"))

mtx

For the mtx format, we represent the data as a SingleCellExperiment class:

(mtmat <- loadHCAMatrix(hcafilt, format = "mtx"))

csv

The csv format option gives the user the option to obtain the data as a list of tibble data.frames from the output of the readr package in the 'tidyverse'.

Note. Loading multiple CSV files may take considerable time depending on system configuration.

(tib <- loadHCAMatrix(hcafilt, format = "csv"))

Advanced Usage

To use a lower level API endpoint, we can use the API object to select a particular endpoint. This will usually show you a description of the endpoint and the parameters for the query.

hca$matrix.lambdas.api.v0.core.post_matrix

In the above example, there are three parameters required:



Try the HCAMatrixBrowser package in your browser

Any scripts or data that you put into this service are public.

HCAMatrixBrowser documentation built on Nov. 8, 2020, 5:15 p.m.