knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=8, fig.height=8 )
This vignette provides an example workflow for how to use the package
MSstatsPTM
for a mass spectrometry based proteomics experiment acquired with
a labelfree acquisition methods, such as DDA/DIA/SRM/PRM. It also provides
examples and an analysis of how adjusting for global protein levels allows for
better interpretations of PTM modeling results.
To install this package, start R (version "4.0") and enter:
``` {r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("MSstatsPTM")
```r library(MSstatsPTM) library(MSstats)
Note: We are actively developing dedicated converters for MSstatsPTM
. If you
have data from a processing tool that does not have a dedicated converter in
MSstatsPTM please add a github issue
https://github.com/Vitek-Lab/MSstatsPTM/issues
and we will add the
converter.
The first step is to load in the raw dataset for both the PTM and Protein
datasets. Each dataset can formatted using dedicated converters in MSstatsPTM
,
such as FragePipetoMSstatsPTMFormat
. If there is not a dedicated converter
for your tool in MSstatsPTM
, you can alternatively leverage converters from
base MSstats
. If using converters from MSstats
note they will need to be
run both on the global protein and PTM datasets.
You might notice a FASTA file is also needed for some converters. This FASTA file can be obtained by querying Uniprot with all of the protein IDs present in your PTM dataset. The FASTA file is a necessary input because some tools (e.g. MaxQuant) do not report the specific amino acid that is modified relative to the whole protein sequence. Rather, they report the specific amino acid relative to the reported peptide. This distinction is important because modifications like phosphorylation, methylation, or acetylation often have specific roles depending on where they occur within the full-length protein. With the help of a FASTA file, MSstatsPTM can determine the specific amino acid that is modified in the context of the whole protein sequence.
Please note for the PTM dataset, both the protein and modification site (or
peptide), must be added into the ProteinName
column. This allows for the
package to summarize to the peptide level, and avoid the off chance there are
matching peptides between proteins. For an example of how this can be done
please see the code below.
The output of the converter is a list with two formatted data.tables. One each
for the PTM and Protein datasets. If a global profiling run was not performed,
the Protein data.table will just be NULL
MaxQtoMSstatsPTMFormat
MSstatsPTM
includes a dedicated converter for MaxQuant output. Experiments can
be acquired with label-free or TMT labeling methods. No matter the experiment,
we recommend using the evidence.txt
file, and not a PTM specific file such as
the Phospho (STY).txt
file. The MaxQtoMSstatsPTMFormat
converter includes a
variety of parameters. Examples for processing a TMT and LF dataset can be seen
below.
``` {r maxq, eval=TRUE}
head(maxq_tmt_evidence) head(maxq_tmt_annotation)
msstats_format_tmt = MaxQtoMSstatsPTMFormat(evidence=maxq_tmt_evidence, annotation=maxq_tmt_annotation, fasta=system.file("extdata", "maxq_tmt_fasta.fasta", package="MSstatsPTM"), fasta_protein_name="uniprot_ac", mod_id="\(Phospho \(STY\)\)", use_unmod_peptides=TRUE, labeling_type = "TMT", which_proteinid_ptm = "Proteins")
head(msstats_format_tmt$PTM) head(msstats_format_tmt$PROTEIN)
head(maxq_lf_evidence) head(maxq_lf_annotation)
msstats_format_lf = MaxQtoMSstatsPTMFormat(evidence=maxq_lf_evidence, annotation=maxq_lf_annotation, fasta=system.file("extdata", "maxq_lf_fasta.fasta", package="MSstatsPTM"), fasta_protein_name="uniprot_ac", mod_id="\(Phospho \(STY\)\)", use_unmod_peptides=TRUE, labeling_type = "LF", which_proteinid_ptm = "Proteins")
head(msstats_format_lf$PTM) head(msstats_format_lf$PROTEIN)
#### 1.1.2 FragPipe - `FragPipetoMSstatsPTMFormat` `MSstatsPTM` includes a dedicated converter for FragPipe output. Experiments must be acquired with TMT labeling methods (a label-free converter is currently in development).The input is the `msstats.csv` and annotation files automatically generated by FragPipe. FragPipe provides additional info on the localization of modification sites, and the `FragPipetoMSstatsPTMFormat` converter includes localization options that are not present in other converters. ``` {r fragpipe, eval=TRUE} head(fragpipe_input) head(fragpipe_annotation) head(fragpipe_input_protein) head(fragpipe_annotation_protein) msstats_data = FragPipetoMSstatsPTMFormat(fragpipe_input, fragpipe_annotation, fragpipe_input_protein, fragpipe_annotation_protein, mod_id_col = "STY", localization_cutoff=.75, remove_unlocalized_peptides=TRUE) head(msstats_data$PTM) head(msstats_data$PROTEIN)
PDtoMSstatsPTMFormat
MSstatsPTM
includes a dedicated converter for Proteome Discoverer output.
Experiments can be acquired with label-free or TMT labeling methods. The input
is the psm
file and a custom built annotation file. Proteome Discoverer
provides additional info on the localization of modification sites, and the
PDtoMSstatsPTMFormat
converter includes localization options that are not
present in other converters.
``` {r pd, eval=TRUE} head(pd_psm_input) head(pd_annotation)
msstats_format = PDtoMSstatsPTMFormat(pd_psm_input, pd_annotation, system.file("extdata", "pd_fasta.fasta", package="MSstatsPTM"), use_unmod_peptides=TRUE, which_proteinid = "Master.Protein.Accessions")
head(msstats_format$PTM) head(msstats_format$PROTEIN)
#### 1.1.4 Spectronaut - `SpectronauttoMSstatsPTMFormat` `MSstatsPTM` includes a dedicated converter for Spectronaut output. Experiments can be acquired with label-free labeling methods. ``` {r Spectronaut, eval=TRUE} head(spectronaut_input) head(spectronaut_annotation) msstats_input = SpectronauttoMSstatsPTMFormat(spectronaut_input, annotation=spectronaut_annotation, fasta_path=system.file("extdata", "spectronaut_fasta.fasta", package="MSstatsPTM"), use_unmod_peptides=TRUE, mod_id = "\\[Phospho \\(STY\\)\\]", fasta_protein_name = "uniprot_iso" ) head(msstats_input$PTM) head(msstats_input$PROTEIN)
SkylinetoMSstatsPTMFormat
MSstatsPTM
includes a dedicated converter for Skyline output.
Experiments can be acquired with label-free labeling methods.
PStoMSstatsPTMFormat
MSstatsPTM
includes a dedicated converter for Peak Studio output.
Experiments can be acquired with label-free labeling methods.
ProgenesistoMSstatsPTMFormat
MSstatsPTM
includes a dedicated converter for Progenesis output.
Experiments can be acquired with label-free labeling methods.
If there is not a dedicated MSstatsPTM
converter for a processing tool,
the existing converters in MSstats
and MSstatsTMT
converters can be used as
described below.
Note, in order to do this, it is critical that the ProteinName column be a combination of the Protein Name and modification site.
As an example, if you would like to analyze an experiment processed with DIANN,
you can leverage the DIANNtoMSstatsFormat
in MSstats
. Given two datasets,
named raw_ptm_df
and raw_protein_df
, and an annotation file, we can process
the data as follows.
# Add site into ProteinName column raw_ptm_df$ProteinName = paste(raw_ptm_df$ProteinName, raw_ptm_df$Site, sep = "_") # Run MSstats Converters PTM_data = MSstats::DIANNtoMSstatsFormat(raw_ptm_df, annotation) PROTEIN_data = MSstats::DIANNtoMSstatsFormat(raw_protein_df, annotation) # Combine into one list msstatsptm_input_data = list(PTM = PTM_data, PROTEIN = PROTEIN_data)
The variable msstatsptm_input_data
can now be used as the input to the
remainder of the MSstatsPTM
processing pipeline.
dataSummarizationPTM
After loading in the input data, the next step is to use the
dataSummarizationPTM
function. This provides the summarized dataset needed to
model the protein/PTM abundance. The function will summarize the
Protein dataset up to the protein level and will summarize the PTM dataset up to
the peptide level. There are multiple options for normalization and missing
value imputation. These options should be reviewed in the package documentation.
MSstatsPTM.summary = dataSummarizationPTM(raw.input, verbose = FALSE, use_log_file = FALSE, append = FALSE) head(MSstatsPTM.summary$PTM$ProteinLevelData) head(MSstatsPTM.summary$PROTEIN$ProteinLevelData)
The summarize function returns a list with PTM and Protein summarization
information. Each PTM and Protein include a list of data.tables: FeatureLevelData
is a data.table of reformatted input of dataSummarizationPTM, ProteinLevelData
is
the run level summarization data.
Once summarized, MSstatsPTM provides multiple plots to analyze the experiment. Here we show the quality control boxplot. The first plot shows the modified data and the second plot shows the global protein dataset.
dataProcessPlotsPTM(MSstatsPTM.summary, type = 'QCPLOT', which.PTM = "allonly", address = FALSE)
Here we show a profile plot. Again the top plot shows the modified peptide, and the bottom shows the overall protein.
dataProcessPlotsPTM(MSstatsPTM.summary, type = 'ProfilePlot', which.Protein = "Q9Y6C9", address = FALSE)
groupComparisonPTM
After summarization, the summarized datasets can be modeled using the
groupComparisonPTM
function. This function will model the PTM and Protein
summarized datasets, and then adjust the PTM model for changes in overall
protein abundance. The output of the function is a list containing these three
models named: PTM.Model
, PROTEIN.Model
, ADJUSTED.Model
.
# Specify contrast matrix comparison = matrix(c(-1,0,1,0),nrow=1) row.names(comparison) = "CCCP-Ctrl" colnames(comparison) = c("CCCP", "Combo", "Ctrl", "USP30_OE") MSstatsPTM.model = groupComparisonPTM(MSstatsPTM.summary, data.type = "LabelFree", contrast.matrix = comparison, use_log_file = FALSE, append = FALSE, verbose = FALSE) head(MSstatsPTM.model$PTM.Model) head(MSstatsPTM.model$PROTEIN.Model) head(MSstatsPTM.model$ADJUSTED.Model)
The models from the groupComparisonPTM
function can be used in the model
visualization function, groupComparisonPlotsPTM
. Here we show Volcano Plots
for the models.
``` {r volcano, message=FALSE, warning=FALSE} groupComparisonPlotsPTM(data = MSstatsPTM.model, type = "VolcanoPlot", FCcutoff= 2, logBase.pvalue = 2, address=FALSE)
### 1.3.2 Heatmap Plot Here we show a Heatmap for the models. ``` {r heatmap, message=FALSE, warning=FALSE} groupComparisonPlotsPTM(data = MSstatsPTM.model, type = "Heatmap", which.PTM = 1:30, address=FALSE)
designSampleSizePTM
Finally, sample size calculation can be performed using the output of the model
and the designSampleSizePTM
# Specify contrast matrix sample_size = designSampleSizePTM(MSstatsPTM.model, c(2.0, 2.75), FDR = 0.05, numSample = TRUE, power = 0.8) head(sample_size)
The output of the sample size function can be plotted using the MSstats
designSampleSizePlots
function.
MSstats::designSampleSizePlots(sample_size)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.