library(knitr) opts_chunk$set(warning=FALSE, message=FALSE) BiocStyle::markdown()
From the Human Cell Atlas (HCA) website:
The cell is the core unit of the human body—the key to understanding the biology of health and the ways in which molecular dysfunction leads to disease. Yet our characterization of the hundreds of types and subtypes of cells in the human body is limited, based partly on techniques that have limited resolution and classifications that do not always map neatly to each other. Genomics has offered a systematic approach, but it has largely been applied in bulk to many cell types at once—masking critical differences between cells—and in isolation from other valuable sources of data.
Recent advances in single-cell genomic analysis of cells and tissues have put systematic, high-resolution and comprehensive reference maps of all human cells within reach. In other words, we can now realistically envision a human cell atlas to serve as a basis for both understanding human health and diagnosing, monitoring, and treating disease.
At its core, a cell atlas would be a collection of cellular reference maps, characterizing each of the thousands of cell types in the human body and where they are found. It would be an extremely valuable resource to empower the global research community to systematically study the biological changes associated with different diseases, understand where genes associated with disease are active in our bodies, analyze the molecular mechanisms that govern the production and activity of different cell types, and sort out how different cell types combine and work together to form tissues.
The Human Cell Atlas facilitates queries on it's [data coordination platform with a RESTFUL API] (https://dss.data.humancellatlas.org/).
To install this package, use Bioconductor's BiocManager
package.
if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install('HCAExplorer')
library(HCAExplorer)
One of the primary tasks of the HCAExplorer is to obtain metadata files of projects and then pass them down to other pipelines to download useful information like expression matrices. To illustrate the functionality of the package, we will first embark on the task of obtaining expression matrices from a selection of projects. We will first, initiate an HCAExplorer object, then look at functions useful for navigating the HCAExplorer object, and finally we will download the manifest file and use it to obtain expression matrices as a LoomExperiment object.
The r Biocpkg("HCAExplorer")
package relies on having network
connectivety. Also, the a link to a viable digest of the Human Cell Atlas must
also be operational. The backend that we are using will be using is refered to
as the "azul backend". This package is meant to mirror the functionality of the
HCA Data Explorer.
The HCAExplorer
object serves as the representation of the Human Cell
Atlas. Upon creation, it will automatically perform a cursorary query and
display a small table showing the first few project of the entire HCA. This
intial table contains some columns that we have determined are most useful
to users. The output also displays the url of the instance of the HCA digest
being used, the current query, relevant information about the quantity of data
being displayed, and finally the table of projects.
By default, 15 entries per page will be displayed in the result and the default url to the HCA DCP will be used. These two values can be changed in the constructor or later on using methods.
If the HCA cannot be reached, an error will be thrown displaying the status of the request.
hca <- HCAExplorer(url = 'https://service.explore.data.humancellatlas.org', per_page = 15) hca
Upon displaying the object, multiple fields can be seen:
- The class: HCAExplorer
- The azul-backend address that is currently being used.
- The current query.
- The projects being shown and whether a link
to more results is available.
- The number of projects being shown per_page.
- The results tibble
of the query. (This table is abbreviated to show only
columns that we determined are most useful to the user.)
The results tibble
can be obtained using the results()
method.
results(hca)
There are various columns that can be displayed in an HCAExplorer object. By
default, only a few columns are shown. We can change which columns are shown
by using select
. For example, the following will only show
projects.projectTitle
and samples.organ
columns when the object as shown.
hca <- hca %>% select('projects.projectTitle', 'samples.organ') hca
The original selection can be restored with resetSelect()
hca <- resetSelect(hca) hca
To toggle whether projects, samples, or file are being displayed in the
tibble
, the activate()
method can be used to choose which to display.
## The HCAExplorer object is activated here by 'samples' hca <- hca %>% activate('samples') hca ## Revert back to showing projects with 'projects' hca <- hca %>% activate('projects') hca
Looking at the bottom of the output, it can be that there are more pages of
results to be shown. The next set of entries can be obtained using the
nextResults
method.
hca <- nextResults(hca) hca
Once the HCAExplorer object is made, one can beging browsing the data present in the Human Cell Atlas.
Suppose we would like to search projects that have samples taken from a
particular organ. First, it is helpdul to understand which fields are available
to query upon. To do this, use the fields()
method.
hca <- HCAExplorer() fields(hca)
This function return all possible fields that can be queried upon. We can now
see that their is a field named "organ". Since, we are looking at what
values are avaiable for querying on organs, we can now use the values()
method to do just that.
values(hca, 'organ')
We can now see all possible values of 'organ' across all project as well as their frequency. Let's now decide that we would like to see projects that involve either blood or brain samples. The next step is to perform the query.
The HCAExplorer extends the functionality of the r CRANpkg("dplyr")
package's
filter()
and select()
methods.
The filter()
method allows the user to query the Human Cell Atlas by relating
fields to certain values. Character fields can be queried using the operators:
- ==
- %in%
Combination operators can be used to combine queries
- &
We can use either the ==
or %in%
operator in a filter statement to contruct
a query.
hca2 <- hca %>% filter(organ == c('blood', 'brain')) hca <- hca %>% filter(organ %in% c('blood', 'brain')) hca
Suppose we also wish to also search for results based on the disease.
We already know a "disease" field exists from our field()
function. Now we
can see what disease values are present in our current results.
values(hca, 'disease')
These are the possible values only for the results of our previous search.
Now suppose we would like to search for project only that have samples with no
disease (we see through values()
that this is labeled as "normal"). We can
now accomplish this with any of the following searchs. To show multiple
searches, we will also use the methods undoQuery()
and resetQuery()
to step
reset our search. undoQuery()
can step back one or many queries.
resetQuery()
undos all queries.
hca <- hca %>% filter(disease == 'normal') hca <- undoQuery(hca, n = 2L) hca <- hca %>% filter(organ %in% c('Brain', 'brain'), disease == 'normal') hca <- resetQuery(hca) hca <- hca %>% filter(organ %in% c('Brain', 'brain') & disease == 'normal') hca
We can refine out search further by using subsetting to only include
a few results. Here, the [
symbol can be used to select paricular rows by
either index or project name. These selections are added to our search as a
query against the "projectId". Here we take the first two results from our
HCAExplorer object.
hca <- hca[1:2,] hca
Now that we have completed our query, we can obtain the file manifest of our
selected projects. First, we must find which possible file formats are
available for download. To do this, we use the getManifestFileFormats()
.
formats <- getManifestFileFormats(hca) formats
Now that we have the possible file formats, we can download the manifest
as a tibble
. To do this, we use the getManifest()
method.
manifest <- getManifest(hca, fileFormat = formats[1]) manifest
HCAExplorer is able to download expression matrices availiable on the HCA Data Portal site. These are precomputed matrices and the HCAMatrixBrowser package should be used if the user wants to generate their own matrices.
The checkExpressionMatricesAvailability()
method returns a tibble displaying
whether the projects in the HCAExplorer
object are available for download.
hca <- HCAExplorer() checkExpressionMatricesAvailability(hca, format = "loom")
The downloadExpressionMatrices()
method downloads the expression matrices
and returns them as a certain format.
- If format
is "loom"
, a list of LoomExperiment
s objects will be returned.
- If format
is "csv"
, a list of tibble
s objects will be returned.
- If format
is "mtx"
, a list of SingleCellExperiment
s objects will be returned.
Some entries may contain multiple organism
s for download, usually either
"Homo sapiens"
or "Mus musculus"
. If the organism
argument is not
specified, all tables will attempt to be downloaded.
By default, expression matrices will be saved using BiocFileCache
to mantain a
persistent copy of the file between sessions, as specified by the
useBiocFileCache
argument. If useBiocFileCache = FALSE
, a temporary copy of
the expression matrices will be saved. Although using BiocFileCache is
recommeneded, we specify useBiocFileCache = FALSE
here so that this example
does not create a persistent copy of the file.
## Create HCAExplorer object hca <- HCAExplorer() ## Obtain the fifth project by subsetting hca <- hca[5] ## Download project's expression matrix file as a LoomExperiment object le <- downloadExpressionMatrices(hca, format = "loom", useBiocFileCache = FALSE) le
sessionInfo()
S3
object-oriented programming paradigm is used.dplyr
package can be used to manipulate objects in the
HCAExplorer
package.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.