Human Protein Atlas in R

suppressPackageStartupMessages(library("BiocStyle"))
suppressPackageStartupMessages(library("org.Hs.eg.db"))
suppressPackageStartupMessages(library("GO.db"))

Introduction

The HPA project

From the Human Protein Atlas [@Uhlen2005; @Uhlen2010] site:

The Swedish Human Protein Atlas project, funded by the Knut and Alice Wallenberg Foundation, has been set up to allow for a systematic exploration of the human proteome using Antibody-Based Proteomics. This is accomplished by combining high-throughput generation of affinity-purified antibodies with protein profiling in a multitude of tissues and cells assembled in tissue microarrays. Confocal microscopy analysis using human cell lines is performed for more detailed protein localisation. The program hosts the Human Protein Atlas portal with expression profiles of human proteins in tissues and cells.

The hpar package provides access to HPA data from the R interface. It also distributes the following data sets:

hpaNormalTissue: Normal tissue data. Expression profiles for proteins in human tissues based on immunohistochemistry using tissue microarrays. The tab-separated file includes Ensembl gene identifier ("Gene"), tissue name ("Tissue"), annotated cell type ("Cell type"), expression value ("Level"), and the gene reliability of the expression value ("Reliability").
hpaNormalTissue16.1: Same as above, for version 16.1.
hpaCancer: Pathology data. Staining profiles for proteins in human tumor tissue based on immunohistochemistry using tissue microarrays, and log-rank p-values for Kaplan-Meier analysis of correlation between mRNA expression level and patient survival. The tab-separated file includes Ensembl gene identifier ("Gene"), gene name ("Gene name"), tumor name ("Cancer"), the number of patients annotated for different staining levels ("High", "Medium", "Low" and "Not detected") and log-rank p-values for patient survival and mRNA correlation ("prognostic - favourable", "unprognostic - favourable", "prognostic - unfavourable", "unprognostic - unfavourable").
hpaCancer16.1: Same as above, for version 16.1.
rnaGeneTissue: RNA HPA tissue gene data. Transcript expression levels summarized per gene in 37 tissues based on RNA-seq. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Tissue"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("NX").
rnaGeneCellLine: RNA HPA cell line gene data. Transcript expression levels summarized per gene in 64 cell lines. The tab-separated file includes Ensembl gene identifier ("Gene"), analysed sample ("Cell line"), transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("NX").
rnaGeneCellLine16.1: Same as above, for version 16.1.
hpaSubcellularLoc: Subcellular location data. Subcellular location of proteins based on immunofluorescently stained cells. The tab-separated file includes the following columns: Ensembl gene identifier ("Gene"), name of gene ("Gene name"), gene reliability score ("Reliability"), enhanced locations ("Enhanced"), supported locations ("Supported"), approved locations ("Approved"), uncertain locations ("Uncertain"), locations with single-cell variation in intensity ("Single-cell variation intensity"), locations with spatial single-cell variation ("Single-cell variation spatial"), locations with observed cell cycle dependency (type can be one or more of biological definition, custom data or correlation) ("Cell cycle dependency"), and Gene Ontology Cellular Component term identifier ("GO id").
hpaSubcellularLoc14 and hpaSubcellularLoc16.1: Same as above, for versions 14 and 16.1.
hpaSecretome: Secretome data. The human secretome is here defined as all Ensembl genes with at least one predicted secreted transcript according to HPA predictions. The complete information about the HPA secretome data is given at https://www.proteinatlas.org/humanproteome/blood/secretome. This data set has 230 columns and includes the Ensembl gene identifier ("Gene"). Information about the additional variables can be found on the HPA search page (https://www.proteinatlas.org/search) by clicking on Show/hide columns.

HPA data usage policy

The use of data and images from the HPA in publications and presentations is permitted provided that the following conditions are met:

The publication and/or presentation are solely for informational and non-commercial purposes.
The source of the data and/or image is referred to the HPA site (www.proteinatlas.org) and/or one or more of our publications are cited.

Installation

The hpar package is available through the Bioconductor project. Details about the package and the installation procedure can be found on its landing page. To install using the dedicated Bioconductor infrastructure, run:

## install BiocManager only once
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## install hpar
BiocManager::install("hpar")

After installation, hpar will have to be explicitly loaded with

library("hpar")

so that all the package's functionality and data are available to the user.

The hpar package

Data sets

The data sets described above can be loaded with the data function, as illustrated below for hpaNormalTissue. Each data set is a data.frame and can be easily manipulated using standard R functionality. The code chunk below illustrates some of its properties.

data(hpaNormalTissue)
dim(hpaNormalTissue)
names(hpaNormalTissue)
## Number of genes
length(unique(hpaNormalTissue$Gene))
## Number of cell types
length(unique(hpaNormalTissue$Cell.type))
head(levels(hpaNormalTissue$Cell.type))
## Number of tissues
length(unique(hpaNormalTissue$Tissue))
head(levels(hpaNormalTissue$Tissue))
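Because each data set is a plain data.frame, the usual base R tools apply. The sketch below uses a small made-up data frame with the same kind of columns as hpaNormalTissue to show subsetting and tabulation. The gene identifiers and all values are hypothetical placeholders, not HPA data; the same calls should carry over to the real data set, assuming its columns are named as printed by names(hpaNormalTissue) above.

```r
## Toy stand-in for hpaNormalTissue: all values are invented for illustration
toy <- data.frame(
    Gene = c("TOY0001", "TOY0001", "TOY0002", "TOY0002", "TOY0003"),
    Tissue = c("liver", "kidney", "liver", "kidney", "liver"),
    Cell.type = c("hepatocytes", "cells in tubules", "hepatocytes",
                  "cells in glomeruli", "hepatocytes"),
    Level = c("High", "Not detected", "Medium", "Low", "High"),
    Reliability = c("Approved", "Approved", "Uncertain", "Supported",
                    "Enhanced"),
    stringsAsFactors = FALSE)

## Keep only rows whose reliability is not "Uncertain"
reliable <- toy[toy$Reliability != "Uncertain", ]

## Cross-tabulate tissues against expression levels
level_by_tissue <- table(reliable$Tissue, reliable$Level)
level_by_tissue

## Genes annotated with a "High" level in liver
high_liver <- unique(reliable$Gene[reliable$Tissue == "liver" &
                                   reliable$Level == "High"])
high_liver
```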

HPA interface

The package provides an interface to the HPA data. The getHpa function queries the data sets described above. It takes three arguments, id, hpadata and type, that control the query, which data set to interrogate and how to report results, respectively. The HPA data uses Ensembl gene identifiers, so id must be a valid Ensembl gene identifier. hpadata must be one of the available data sets. type can be either "data" or "details". The former is the default and returns a data.frame containing the information relevant to id. The latter gives access to detailed information (including cell images) by opening the corresponding page on the HPA web site.

We will illustrate this functionality using the TSPAN6 (tetraspanin 6) gene (ENSG00000000003) as an example.

id <- "ENSG00000000003"
head(getHpa(id, hpadata = "hpaNormalTissue"))
getHpa(id, hpadata = "hpaSubcellularLoc")
head(getHpa(id, hpadata = "rnaGeneCellLine"))
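Conceptually, a "data" query amounts to selecting the rows of a data set whose Gene column matches id. The following is a minimal sketch of that idea on a made-up data frame. toy_lookup is a hypothetical helper written for this illustration; it is not the package's implementation of getHpa, and its handling of unknown identifiers need not match what getHpa does.

```r
## Hypothetical helper: return the rows of `data` whose Gene matches `id`
toy_lookup <- function(id, data) {
    stopifnot(is.character(id), "Gene" %in% names(data))
    data[data$Gene %in% id, , drop = FALSE]
}

## Invented data with a Gene/Tissue/Level layout
toy <- data.frame(
    Gene = c("TOY0001", "TOY0001", "TOY0002"),
    Tissue = c("liver", "kidney", "liver"),
    Level = c("High", "Low", "Medium"),
    stringsAsFactors = FALSE)

res <- toy_lookup("TOY0001", toy)
res

## In this toy version, an identifier absent from the data
## gives a data frame with zero rows
missing_res <- toy_lookup("TOY9999", toy)
nrow(missing_res)
```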

If we ask for "details", a browser window pointing to the relevant HPA page is opened (see figure below).

getHpa(id, type = "details")

The HPA web page for the tetraspanin 6 gene (ENSG00000000003).

If a user is interested specifically in one data set, it is possible to set hpadata globally and omit it in getHpa. This is done by setting the hpar option hpadata with the setHparOptions function. The current default data set can be checked with getHparOptions.

getHparOptions()
setHparOptions(hpadata = "hpaSubcellularLoc")
getHparOptions()
getHpa(id)
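The general pattern behind such a global default can be sketched in a few lines of base R: a stored value is used whenever the corresponding argument is not supplied, and an explicit argument still takes precedence. All names below (.toy_opts, toy_set_default, toy_get_default, toy_query) and the initial default value are hypothetical and only illustrate the pattern; they say nothing about how hpar stores its options internally.

```r
## Hypothetical option store, kept in a private environment
.toy_opts <- new.env()
assign("hpadata", "hpaNormalTissue", envir = .toy_opts)

toy_set_default <- function(hpadata)
    assign("hpadata", hpadata, envir = .toy_opts)
toy_get_default <- function()
    get("hpadata", envir = .toy_opts)

## The data set argument falls back to the stored default when omitted
toy_query <- function(id, hpadata = toy_get_default()) {
    paste("query", id, "in", hpadata)
}

before <- toy_query("TOY0001")
toy_set_default("hpaSubcellularLoc")
after <- toy_query("TOY0001")
## An explicit argument overrides the stored default
explicit <- toy_query("TOY0001", hpadata = "rnaGeneCellLine")
```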

HPA release information

Information about the HPA release used to build the installed hpar package can be accessed with getHpaVersion, getHpaDate and getHpaEnsembl. Full release details can be found on the HPA release history page.

getHpaVersion()
getHpaDate()
getHpaEnsembl()

A small use case

Let's compare the subcellular localisation annotation obtained from the HPA subcellular location data set and the information available in the Bioconductor annotation packages.

id <- "ENSG00000001460"
getHpa(id, "hpaSubcellularLoc")

Below, we first extract all cellular component GO terms available for id from the org.Hs.eg.db human annotation and then retrieve their term definitions using the GO.db database.

library("org.Hs.eg.db")
library("GO.db")
ans <- select(org.Hs.eg.db, keys = id,
              columns = c("ENSEMBL", "GO", "ONTOLOGY"),
              keytype = "ENSEMBL")
ans <- ans[ans$ONTOLOGY == "CC", ]
ans
sapply(as.list(GOTERM[ans$GO]), slot, "Term")
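To make the comparison explicit, the GO identifiers from the two sources can be reduced to plain character vectors and compared with set operations. The sketch below assumes that a "GO id" entry is a single string in which each location is followed by its identifier in parentheses, with entries separated by semicolons. That layout and the input values are assumptions made for illustration, not query results, and should be checked against the output of getHpa above.

```r
## Assumed layout of a "GO id" entry (illustrative string, not a query result)
hpa_go_string <- "Nucleoplasm (GO:0005654);Cytosol (GO:0005829)"

## Pull out every GO identifier of the form GO:nnnnnnn
hpa_go <- regmatches(hpa_go_string,
                     gregexpr("GO:[0-9]{7}", hpa_go_string))[[1]]

## Stand-in for ans$GO from the annotation query (illustrative values)
anno_go <- c("GO:0005829", "GO:0005886")

shared <- intersect(hpa_go, anno_go)    # present in both sources
hpa_only <- setdiff(hpa_go, anno_go)    # only in the HPA-style string
anno_only <- setdiff(anno_go, hpa_go)   # only in the annotation stand-in
list(shared = shared, hpa_only = hpa_only, anno_only = anno_only)
```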

Session information

sessionInfo()


hpar documentation built on Nov. 8, 2020, 8:32 p.m.
