``` {r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 6
)
```

``` {r}
library(MSstatsTMTPTM)
library(MSstatsTMT)
```
This vignette summarizes the functionalities and options of MSstatsTMTPTM and provides a workflow example.

MSstatsTMTPTM includes the following two functions for data visualization and statistical testing:

1. `dataProcessPlotsTMTPTM`
2. `groupComparisonTMTPTM`
## Installation

To install this package, start R (version "4.0") and enter:

``` {r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("MSstatsTMTPTM")
```
## 1. dataProcessPlotsTMTPTM()

To illustrate the quantitative data and the quality control of MS runs, `dataProcessPlotsTMTPTM` takes the quantitative data from the MSstatsTMT converter functions as input and generates two types of figures as PDF files:

1. Profile plot (specify "ProfilePlot" in option `type`), to identify the potential sources of variation for each protein;
2. Quality control plot (specify "QCPlot" in option `type`), to evaluate the systematic bias between MS runs.

### Arguments

* `data.ptm` name of the PTM data, in which the protein names include the PTM sites. This can be the output of the MSstatsTMT converter functions.
* `data.protein` name of the peptide-level protein data, which can be the output of the MSstatsTMT converter functions.
* `data.ptm.summarization` name of the summarized PTM data (PTM sites in the protein names), which can be the output of the MSstatsTMT `proteinSummarization` function.
* `data.protein.summarization` name of the summarized protein-level data, which can be the output of the MSstatsTMT `proteinSummarization` function.
* `type` choice of visualization. "ProfilePlot" draws profile plots of log intensities across MS runs. "QCPlot" draws box plots of log intensities across channels and MS runs.
* `ylimUp` upper limit of the y-axis on the log scale. FALSE (default) for Profile Plot and QC Plot uses the rounded maximum of log2(intensities) after normalization, plus 3.
* `ylimDown` lower limit of the y-axis on the log scale. FALSE (default) for Profile Plot and QC Plot uses 0.
* `x.axis.size` size of the x-axis labels for "Run" and "Channel" in Profile Plot and QC Plot.
* `y.axis.size` size of the y-axis labels. Default is 10.
* `text.size` size of the labels representing each condition at the top of Profile Plot and QC Plot. Default is 4.
* `text.angle` angle of the labels representing each condition at the top of Profile Plot and QC Plot. Default is 0.
* `legend.size` size of the legend above Profile Plot. Default is 7.
* `dot.size.profile` size of the dots in Profile Plot. Default is 2.
* `ncol.guide` number of columns for the legends at the top of the plot. Default is 5.
* `width` width of the saved PDF file. Default is 10.
* `height` height of the saved PDF file. Default is 10.
* `which.Protein` list of proteins for which to draw plots. The list can contain protein names or protein order numbers. Default is "all", which generates plots for every protein. For QC Plot, "allonly" generates a single QC plot with all proteins.
* `originalPlot` TRUE (default) draws the original profile plots, without normalization.
* `summaryPlot` TRUE (default) draws profile plots with protein summarization for each channel and MS run.
* `address` name of the folder that will store the results. The default folder is the current working directory; any other folder must already exist under the current working directory. The output PDF file is created automatically with the default name "ProfilePlot.pdf" or "QCplot.pdf". The `address` argument specifies where the file is stored and can also change the beginning of the file name. If `address = FALSE`, the plots are not saved as a PDF file but displayed in a window.

### Example

The raw PTM and protein datasets are both required by the plotting function. They can be the output of the MSstatsTMT converter functions `PDtoMSstatsTMTFormat`, `SpectroMinetoMSstatsTMTFormat`, and `OpenMStoMSstatsTMTFormat`. Both datasets must include the following columns: `ProteinName`, `PeptideSequence`, `Charge`, `PSM`, `Mixture`, `TechRepMixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, and `Intensity`.

``` {r}
# read in raw data files
# raw.ptm <- read.csv(file = "raw.ptm.csv", header = TRUE)
# raw.protein <- read.csv(file = "raw.protein.csv", header = TRUE)
head(raw.ptm)
head(raw.protein)

# Run the MSstatsTMT proteinSummarization function on both datasets
quant.msstats.ptm <- proteinSummarization(raw.ptm,
                                          method = "msstats",
                                          global_norm = TRUE,
                                          reference_norm = FALSE,
                                          MBimpute = TRUE)
quant.msstats.protein <- proteinSummarization(raw.protein,
                                              method = "msstats",
                                              global_norm = TRUE,
                                              reference_norm = FALSE,
                                              MBimpute = TRUE)

head(quant.msstats.ptm)
head(quant.msstats.protein)

# Profile Plot
dataProcessPlotsTMTPTM(data.ptm = raw.ptm,
                       data.protein = raw.protein,
                       data.ptm.summarization = quant.msstats.ptm,
                       data.protein.summarization = quant.msstats.protein,
                       type = 'ProfilePlot')

# Quality Control Plot
# dataProcessPlotsTMTPTM(data.ptm = raw.ptm,
#                        data.protein = raw.protein,
#                        data.ptm.summarization = quant.msstats.ptm,
#                        data.protein.summarization = quant.msstats.protein,
#                        type = 'QCPlot')
```
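Before passing your own converter output to the plotting function, it can help to confirm that the required columns listed for the raw PTM and protein datasets are present. The following is a minimal base-R sketch; `check_tmt_columns()` and the toy data frame are hypothetical illustrations and are not part of MSstatsTMTPTM or MSstatsTMT.

```r
# Columns listed as required for both the raw PTM and raw protein inputs
required.columns <- c("ProteinName", "PeptideSequence", "Charge", "PSM",
                      "Mixture", "TechRepMixture", "Run", "Channel",
                      "Condition", "BioReplicate", "Intensity")

# Hypothetical helper (not a package function): returns the names of the
# required columns that are missing from `df`, in the order listed above.
check_tmt_columns <- function(df, required = required.columns) {
  setdiff(required, colnames(df))
}

# Hypothetical toy input with BioReplicate and Intensity left out on purpose
toy <- data.frame(ProteinName = "P1_S10", PeptideSequence = "PEPTIDE",
                  Charge = 2, PSM = "PEPTIDE_2", Mixture = "M1",
                  TechRepMixture = 1, Run = "R1", Channel = "126",
                  Condition = "Ctrl")

absent.cols <- check_tmt_columns(toy)
absent.cols  # "BioReplicate" "Intensity"
```

An empty result means every required column name is present; it does not check the column contents.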
## 2. groupComparisonTMTPTM()

`groupComparisonTMTPTM` tests for significant changes in PTM abundance, adjusted for global protein abundance, across conditions, based on a family of linear mixed-effects models for TMT experiments. The experimental design (for example, a case-control study in which patients are not repeatedly measured) is determined automatically, and the corresponding statistical model is used.
### Arguments

* `data.ptm` name of the output of the `proteinSummarization` function with PTM data. It should have columns named `Protein`, `TechRepMixture`, `Mixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, and `Abundance`.
* `data.protein` name of the output of the `proteinSummarization` function with protein data. It should have columns named `Protein`, `TechRepMixture`, `Mixture`, `Run`, `Channel`, `Condition`, `BioReplicate`, and `Abundance`.
* `contrast.matrix` comparisons between conditions of interest. 1) The default is "pairwise", which compares all possible pairs of conditions. 2) Otherwise, users can specify the comparisons of interest: based on the levels of the conditions, assign 1 or -1 to the conditions of interest and 0 to all others. The levels of the conditions are sorted alphabetically.
* `moderated` if `moderated = TRUE`, a moderated t statistic is calculated; otherwise, an ordinary t statistic is used.
* `adj.method` adjustment method for multiple comparisons. "BH" is the default.

### Example

``` {r}
# test for all the possible pairs of conditions
model.results.pairwise <- groupComparisonTMTPTM(data.ptm = quant.msstats.ptm,
                                                data.protein = quant.msstats.protein)
names(model.results.pairwise)
head(model.results.pairwise[[1]])

# Load specific contrast matrix
# example.contrast.matrix <- read.csv(file = "example.contrast.matrix.csv", header = TRUE)
example.contrast.matrix

# test for specified condition comparisons only
model.results.contrast <- groupComparisonTMTPTM(data.ptm = quant.msstats.ptm,
                                                data.protein = quant.msstats.protein,
                                                contrast.matrix = example.contrast.matrix)
names(model.results.contrast)
head(model.results.contrast[[1]])
```
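Instead of reading `example.contrast.matrix` from a CSV file, a contrast matrix can also be built by hand. The following is a minimal base-R sketch, assuming one row per comparison and one column per condition level in alphabetical order, with 1 and -1 marking the two conditions being compared and 0 elsewhere. The condition names `Ctrl`, `Trt1`, and `Trt2` and the row labels are hypothetical; replace them with the levels of the `Condition` column in your own data.

```r
# Hypothetical condition labels, deliberately given out of order
conditions <- c("Ctrl", "Trt2", "Trt1")

# Condition levels are sorted alphabetically, so order the columns the same way
levs <- sort(unique(conditions))   # "Ctrl" "Trt1" "Trt2"

# One row per comparison; start with 0 for every condition
contrast <- matrix(0, nrow = 2, ncol = length(levs),
                   dimnames = list(c("Trt1-Ctrl", "Trt2-Ctrl"), levs))

# The condition marked 1 is compared against the condition marked -1
contrast["Trt1-Ctrl", c("Trt1", "Ctrl")] <- c(1, -1)
contrast["Trt2-Ctrl", c("Trt2", "Ctrl")] <- c(1, -1)

contrast
```

A matrix built this way could then be passed as `contrast.matrix` to `groupComparisonTMTPTM()`; make sure its column names match your condition levels exactly.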
``` {r}
sessionInfo()
```