knitr::opts_chunk$set( comment = "##" )
We recommend installing the stable release version of PhIPData
in Bioconductor.
This can be done using BiocManager
:
if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("PhIPData")
To load the package:
library(PhIPData)
The PhIPData
class is used to store experimental results from phage-immunoprecipitation sequencing (PhIP-set) experiments in a matrix-like container.
Building on the RangedSummarizedExperiment
class, PhIPData
contains all of functionality of SummarizedExperiments
and includes additional operations to facilitate analysis with PhIP-seq data. Like SummarizedExperiments
, a key feature of PhIPData
is the coordination of metadata when subsetting PhIPData
objects. For example, if you wanted to examine experimental data for peptides from one particular virus, you can subset the experimental data and the associated peptide annotation with one command. This ensures all metadata (for samples, peptides, etc.) remain synced with the experimental data throughout analysis.
PhIPData
ObjectAs reflected in the figure below, the structure of a PhIPData
object is nearly identical to the structure of a SummarizedExperiment
/RangedSummarizedExperiment
object.
Each object contains at least three assays of data. These assays are:
counts
: matrix of raw read counts,logfc
: matrix of log2 estimated fold-changes (in comparison to negative control samples),prob
: matrix of probabilities (p-values or posterior probabilities) associated with whether a sample shows an enriched antibody response to the particular peptide. Though counts
typically contain integer values for the number of reads aligned to each peptide, PhIPData
only requires that stored values are non-negative numeric values. Pseudocounts or non-integer count values can also be stored in the counts
assay.
The rows of a PhIPData
object represent peptides of interest and the columns represent samples. Sample and peptide metadata are stored in DataFrame
s. Each row of the metadata DataFrame
specifies the peptide/sample, and the columns represent different features associated with the peptides/samples.
In addition to sample- and peptide-specific metadata, experimental metadata such as associated papers, experimental parameters, sequencing dates, etc. are stored in a list-like component named metadata
.
knitr::include_graphics("extras/PhIPData.png")
PhIPData
objectlibrary(dplyr) library(readr)
To demonstrate the PhIPData
class and functions, we will use a simulated example data set. Suppose we have PhIP-seq data from 5 individuals for 1288 peptides derived from known human viruses.
set.seed(20210120) # Read in peptide metadata ------------- virscan_file <- system.file("extdata", "virscan.tsv", package = "PhIPData") virscan_info <- readr::read_tsv(virscan_file, col_types = readr::cols( pep_id = readr::col_character(), pro_id = readr::col_character(), pos_start = readr::col_double(), pos_end = readr::col_double(), UniProt_acc = readr::col_character(), pep_dna = readr::col_character(), pep_aa = readr::col_character(), pro_len = readr::col_double(), taxon_id = readr::col_double(), species = readr::col_character(), genus = readr::col_character(), product = readr::col_character() )) %>% as.data.frame() # Simulate data ------------- n_samples <- 5L n_peps <- nrow(virscan_info) counts_dat <- matrix(sample(1:1e6, n_samples*n_peps, replace = TRUE), nrow = n_peps) logfc_dat <- matrix(rnorm(n_samples*n_peps, mean = 0, sd = 10), nrow = n_peps) prob_dat <- matrix(rbeta(n_samples*n_peps, shape1 = 1, shape2 = 1), nrow = n_peps) # Sample metadata ------------- sample_meta <- data.frame(sample_name = paste0("sample", 1:n_samples), gender = sample(c("M", "F"), n_samples, TRUE), group = sample(c("ctrl", "trt", "beads"), n_samples, TRUE)) # Set row/column names ------------- rownames(counts_dat) <- rownames(logfc_dat) <- rownames(prob_dat) <- rownames(virscan_info) <- paste0("pep_", 1:n_peps) colnames(counts_dat) <- colnames(logfc_dat) <- colnames(prob_dat) <- rownames(sample_meta) <- paste0("sample_", 1:n_samples) # Experimental metadata ------------- exp_meta <- list(date_run = as.Date("2021/01/20"), reads_per_sample = colSums(counts_dat))
To create a PhIPData
object, we will use the homonymous constructor PhIPData()
.
phip_obj <- PhIPData(counts_dat, logfc_dat, prob_dat, virscan_info, sample_meta, exp_meta) phip_obj
The PhIPData()
constructor is quite flexible; mismatched dimension names across assays and metadata are automatically corrected, and missing assays are initialized with empty matrices of the same dimensions. For more details on the constructor, type help(PhIPData)
.
PhIPData
objectAssays store matrix-like data. For PhIPData
objects, the assays counts
, logfc
, and prob
are required. If any of these matrices were missing from the constructor, they are initialized with empty matrices of the same dimensions. Experimental data can be accessed via assays(phip_obj)
or assay(phip_obj, i)
command. assays(phip_obj)
returns a list of all assays in the object, and list items can be accessed using the $
or [[
operators.
assays(phip_obj) head(assays(phip_obj)$counts) # Returns the same as assays(phip_obj)[["counts"]]
While assays(phip_obj)
returns a list of all assays in the PhIPData
object, assay(phip_obj, i)
returns a matrix
of the specified assay. If i
is missing, assay(phip_obj)
defaults to the first assay (counts
). i
can be a character specifying the assay name or a numeric index of the assay.
head(assay(phip_obj, "logfc")) # Returns the same as assay(phip_obj, 2)
Since all PhIPData
objects must contain the counts
, logfc
, and prob
assays, we have defined three homonyous function to conveniently access and modify these assays.
head(counts(phip_obj)) head(logfc(phip_obj)) head(prob(phip_obj))
After a PhIPData
object has been created, data for new and existing assays can be set using <-
. Dimension names of the replacement assays are automatically corrected to be identical to the object names. As we expect assays to contain homogenous data, replacement assays are coerced into matrices. Replacement assays must also be on the same dimension of the existing object.
replacement_dat <- matrix(1, nrow = n_peps, ncol = n_samples) # Replace the counts matrix ------------- head(counts(phip_obj)) counts(phip_obj) <- replacement_dat head(counts(phip_obj)) # Add a new assay ----------- head(assay(phip_obj, "new_assay")) assay(phip_obj, "new_assay") <- replacement_dat head(assay(phip_obj, "new_assay")) # Returns and error because `counts`, `logfc`, and `prob` must be in the # assays of a PhIPData object assays(phip_obj) <- list(new_assay1 = replacement_dat, new_assay2 = replacement_dat)
Information associated with peptides can be accessed using peptideInfo(phip_obj)
or the inherited rowRanges(phip_obj)
function. Both functions return a GRanges
object. GRanges
objects behave similar to matrices and can be subsetted using the usual 2-dimensional methods. More information about GRanges
objects can be found here.
Information about peptide positions in the protein sequence are stored as IRanges
in the GRanges
object. These are specified by pos_start
and pos_end
columns in the peptide metadata in the constructor. If these columns do not exist, the start and end positions are set to 0 by default.
# Only showing 2 columns for easier viewing peptideInfo(phip_obj)[, 8:9]
Sample metadata describing the samples can be extracted from the PhIPData
object using sampleInfo(phip_obj)
or the inherited function colData(phip_obj)
. Both functions return a DataFrame
where each row corresponds to a sample and each column corresponds to some sample description.
sampleInfo(phip_obj)
Like in SummarizedExperiments
/RangedSummarizedExperiments
, sample metadata can be accessed using the $
operator from the object. As demonstrated in the Subsetting section, this makes subetting data for a subgroup of samples very easy.
phip_obj$group
Data associated with the experiment such as papers, date the samples were run, etc. can be accessed via the metadata(phip_obj)
function. Experimental metadata is stored as a list, so it can be used to store any type of data.
metadata(phip_obj)
PhIPData
objectsLike subsetting matrices and dataframes, [
can be used for two-dimensional subsetting of PhIPData
objects.
phip_obj[1:10, 1:2]
As described in the Sample metadata section, $
operates on the sample metadata column names, so we can also use $
to select samples of a particular subgroups.
phip_obj[, phip_obj$group == "beads"]
In addition to subsetting by row indices, PhIPData
supports subsetting rows using peptide metadata information. subset(phip_obj, row_condition, column_condition)
returns a PhIPData
object with rows where the specified condition holds.
ebv_sub <- subset(phip_obj, grepl("Epstein-Barr virus", species)) peptideInfo(ebv_sub)[, "species"]
To subset all beads-only samples from a PhIPData
object, we can use the convenient wrapper function subsetBeads()
:
subsetBeads(phip_obj)
PhIPData
summariesTo assess the quality of the data, we are often interested in the number of reads per sample. This can be done using the librarySize()
function. Names can be removed by setting the withDimnames
parameter to FALSE
.
librarySize(phip_obj)
The proportion of sample reads pulled by each peptide can also be obtained via propReads()
. Like librarySize()
, names can also be removed by setting withDimnames
to FALSE
.
head(propReads(phip_obj))
Rather than re-importing peptide annotations, PhIPData
allows the user to create and reuse existing libraries. To save a library for future use, we can use makeLibrary(object, name_of_library)
. The peptide metadata should be in a matrix-like form such as a DataFrame
or data.frame
.
# Save virscan_info as the human_virus library makeLibrary(virscan_info, "human_virus")
The stored library can be accessed using getLibrary(library_name)
. We can then use the stored library to construct a new PhIPData
object as follows.
PhIPData(counts = counts_dat, sampleInfo = sample_meta, peptideInfo = getLibrary("human_virus"))
Libraries can be removed with the removeLibrary(library_name)
command.
removeLibrary("human_virus")
Peptides are often derived from species with long virus names. To quickly search up all viruses with "[Hh]uman immunodeficiency virus" in the species, you can create an alias of "HIV" to encode for the corresponding regex of interest. Alias's can be managed with:
getAlias(key)
: return the regex for the alias key
. setAlias(key, pattern)
: create or modify the alias for key
. If the key-pattern combination already exists in the database, then no changes are made. Otherwise, the pattern is replaced with pattern
.deleteAlias(key)
: remove an key-pattern combination from the alias database. # Create alias for HIV ---------- setAlias("hiv", "[Hh]uman immunodeficiency virus") # Use alias ---------- hiv_sub <- subset(phip_obj, grepl(getAlias("hiv"), species)) peptideInfo(hiv_sub)[, "species"] # Remove alias from database ----------- deleteAlias("hiv") # The following command returns an error that the virus does # not exist in the alias database. subset(phip_obj, grepl(getAlias("hiv"), species))
Alias-pattern combinations are case senstive, so an entry with key "hiv" would differ from an entry with key "HIV." Like libraries, by default the location of the alias database is in the extdata
folder of the PhIPData
package. The location to an .rda file with the key-pattern values in a dataframe called alias can be retreived and specified using the getAliasPath()
and setAliasPath()
functions, respectively.
PhIPData
to other containersAs packages for identifying differential experession are also commonly used in analyzing PhIP-seq data, the PhIPData
package supports coercion from PhIPData
objects to Lists
, lists
, DataFrame
, and DGELists
. The function as(phip_obj, "object_type")
converts a PhIPData
object to a object of object type
.
For DGELists
, the group
slot is automatically populated with the group
column in the sample metadata, if such a column exists.
# PhIPData to DGEList as(phip_obj, "DGEList") as(phip_obj, "DataFrame")
While PhIPData
objects can be roughly constructed from Lists
, lists
, and DGELists
, the coercion functions only construct a bare bones PhIPData
object (with only counts
, logfc
, and prob
assays and sample and peptide information). Any additional list/object components may be discarded.
sessionInfo()
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.