library(proteus) library(knitr) library(ggplot2) knitr::opts_chunk$set(echo = TRUE)
This tutorial demonstrates how to analyse data from TMT MS/MS experiment in Proteus. We strongly suggest the user should read the main tutorial first and familiarize themselves with the package.
vignette("proteus")
Here, we are only going to point out the differences between label-free and TMT analysis. For this, we use data from Crozier et al. 2018, Pride accession PXD008741. These data are distributed in a separate package proteusTMT, which needs loading first:
library(proteusTMT, warn.conflicts=FALSE) data(proteusTMT)
library(proteusTMT) data(proteusTMT)
The default measure.cols
object is designed for label-free data. For TMT data we need to specify all reporter intensity columns. In out example we have 10 reporter columns, numbered from 0 to 9:
measCols <- paste0("Reporter intensity ", 0:9) names(measCols) <- paste0("reporter_", 0:9)
The resulting object is
str(as.list(measCols))
We are going to use the default evidenceColumns
object, but the user should carefully check that the column names correspond to those in the evidence file.
Due to its large size we do not attach the TMT evidence file to this package. The command to read the file would be:
evi <- readEvidenceFile(evidenceFile, measure.cols=measCols)
We do attach an example processed evidence data with the package, it is called evi
. It contains all reported intensity columns:
head(evi)
Metadata for this experiment needs to specify all the reporter columns, experiments and corresponding condition and replicates:
metadataFile <- system.file("extdata", "metadata.txt", package="proteusTMT") meta <- read.delim(metadataFile, header=TRUE, sep="\t")
meta
It is possible to run MaxQuant for one experiment where conditions are encoded in reporter columns only. In this case the evidence file will not contain the column 'Experiment'. Proteus can deal with such configuration. All you need is to remove the 'experiment' field from the default column list:
eviCols <- evidenceColumns eviCols$experiment <- NULL
and read evidence file with an additional argument:
evi <- readEvidenceFile(evidenceFile, measure.cols=measCols, data.cols = eviCols)
The 'Experiment' column will be skipped. Obviously, metadata should not contain 'Experiment' column either.
Please note that in such case peptide and protein objects created by Proteus will contain a metadata data frame (metadata
field in the object, see, e.g., pepdat$metadata
) with column experiment
filled with dummy values. This is done for consistent internal processing. OK, we are lazy and it was easier to do it this way.
We chose the median to aggregate peptide intensities from multiple entries in the evidence data:
pepdat <- makePeptideTable(evi, meta, measure.cols=measCols, aggregate.fun=aggregateMedian, experiment.type="TMT")
We can use generic summary
function to see more information about pepdat
.
summary(pepdat)
This is the number of non-zero peptide intensities per sample.
plotCount(pepdat)
We create protein data using the high-flyer method.
prodat <- makeProteinTable(pepdat, aggregate.fun=aggregateHifly, hifly=3)
Again, we can use a generic summary
function to see its properties.
summary(prodat)
For TMT data we recommend using CONSTANd normalization Maes et al. 2016.
prodat.norm <- normalizeTMT(prodat)
These two figures show reporter intensity distributions before and after normalization.
plotSampleDistributions(prodat, fill="replicate") + labs(title="Before") plotSampleDistributions(prodat.norm, log.scale=FALSE, fill="replicate") + labs(title="After")
We can use the same function plotClustering()
to see the dendrogram for the proteins.
plotClustering(prodat.norm)
These particular data come from a time-course experiment that would require analysis beyond Proteus package. We can however show how differential expression can be used to identify proteins changing between time points 0.5^h^ and 8^h^.
res <- limmaDE(prodat.norm, conditions=c("T0.5", "T8"))
The distribution of p-values looks fine.
plotPdist(res)
The top of the table of differentially expressed proteins:
res <- res[order(res$adj.P.Val),] head(res)
And the top protein from the table:
prot <- as.character(res$protein[1]) plotIntensities(prodat.norm, id=prot)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.