library(proteus) library(ggplot2) library(knitr) knitr::opts_chunk$set(echo = TRUE)
This tutorial demonstrates how to analyse data from SILAC MS/MS experiment in Proteus. We strongly suggest the user should read the main tutorial first and familiarize themselves with the package.
vignette("proteus")
Here, we are only going to point out the differences between label-free and SILAC analysis. For this, we use a small subset of data from Ly et al. (2018). These data are distributed in a separate package proteusSILAC, which needs loading first:
library(proteusSILAC, warn.conflicts=FALSE) data(proteusSILAC)
library(proteusSILAC) data(proteusSILAC)
The default measure.cols
object is designed for label-free data. For SILAC data we need to specify isotope ratios. Our data contain M/L ratios:
measCols <- list( ML = "ratio_ml" )
The column naming convention in our evidence file is different from the default, included in the evidenceColumns
object. Hence, we need to create a new named list, as follows:
eviCols <- list( sequence = 'pep_sequence', modified_sequence = 'modified_sequence', modifications = 'modifications', protein_group = 'proteins', protein = 'leading_razor_protein', experiment = 'experiment', charge = 'charge', reverse = 'reverse', contaminant = 'potential_contaminant' )
Due to its large size we do not attach the SILAC evidence file to this package. The command to read the file would be:
evi <- readEvidenceFile(evidenceFile, measure.cols=measCols, data.cols=eviCols, zeroes.are.missing=FALSE)
Please note the argument zeroes.are.missing=FALSE
. By default, readEvidenceFile
assumes that zeroes represent missing data (which usually is true for label-free and TMT experiments). In this case we read ratios which, in theory, can be zeroes. In most cases its doesn't make any difference.
We do attach an example of processed evidence data with this package, it is called evi
. It contains all reported intensity columns:
head(evi)
Metadata for this experiment needs to specify all the reporter columns, experiments and corresponding condition and replicates:
metadataFile <- system.file("extdata", "metadata.txt", package="proteusSILAC") meta <- read.delim(metadataFile, header=TRUE, sep="\t")
meta
We chose the median to aggregate peptide intensities from multiple entries in the evidence data:
pepdat <- makePeptideTable(evi, meta, measure.cols=measCols, aggregate.fun=aggregateMedian, experiment.type="SILAC")
We can use generic summary
function to see more information about xppepdat
.
summary(pepdat)
This is the number of non-zero peptide intensities per sample.
plotCount(pepdat)
Now we create protein data. The aggregate.fun=aggregateMedian
indicates that protein ratios are medians of constituent peptide ratios. This approach is appropriate for SILAC data, unlike the default hi-flyer method which only works for label-free (or TMT) data.
prodat <- makeProteinTable(pepdat, aggregate.fun=aggregateMedian)
Again, we can use a generic summary
function to see its properties.
summary(prodat)
We normalize protein ratios to the median, that is, after the normalization each sample is going to have the same median. Median is the default normalization, so we do not need to specify the norm.fun
parameter:
prodat.norm <- normalizeData(prodat)
These two figures show reporter intensity distributions before and after normalization.
plotSampleDistributions(prodat, fill="replicate") + labs(title="Before") plotSampleDistributions(prodat.norm, fill="replicate") + labs(title="After")
For SILAC data we sometimes need to do differential expression on an isotope ratio, e.g. $M/L$. In our case, $M$ and $L$ labelled two different biological conditions, hence the $M/L$ ratio compares them. The null hypothesis is that $M/L = 1$ or $\log H/L = 0$. Because SILAC ratios tend to be symmetric in log space (see figure above), we chose the latter approach. Let's do the differential expression for the time point T48
.
res <- limmaRatioDE(prodat.norm, condition="T48") res <- res[order(res$P.Value),] head(res)
There are no statistically significant departures from the null hypothesis. Here is the volcano plot:
plotVolcano(res, binhex=FALSE)
Alternatively, we can compare the two time points with a two-condition differential expression:
res2 <- limmaDE(prodat.norm) res2 <- res2[order(res2$P.Value),] head(res2)
Though not statistically significant at the assumes 0.05 level, protein P03372-3 stands out. Here is a SILAC ratio plot for this protein:
plotIntensities(prodat.norm, id="P03372-3")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.