In Vitek-Lab/MSstatsPTM: Statistical Characterization of Post-translational Modifications

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=8, 
  fig.height=8
)

This vignette provides an example workflow for how to use the package MSstatsPTM for a mass spectrometry based proteomics experiment acquired with a labelfree acquisition methods, such as DDA/DIA/SRM/PRM. It also provides examples and an analysis of how adjusting for global protein levels allows for better interpretations of PTM modeling results.

Installation

To install this package, start R (version "4.0") and enter:

``` {r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("MSstatsPTM")

```r
library(MSstatsPTM)
library(MSstats)

1. Workflow

1.1 Raw Data Format

Note: We are actively developing dedicated converters for MSstatsPTM. If you have data from a processing tool that does not have a dedicated converter in MSstatsPTM please add a github issue https://github.com/Vitek-Lab/MSstatsPTM/issues and we will add the converter.

The first step is to load in the raw dataset for both the PTM and Protein datasets. Each dataset can formatted using dedicated converters in MSstatsPTM, such as FragePipetoMSstatsPTMFormat. If there is not a dedicated converter for your tool in MSstatsPTM, you can alternatively leverage converters from base MSstats. If using converters from MSstats note they will need to be run both on the global protein and PTM datasets.

You might notice a FASTA file is also needed for some converters. This FASTA file can be obtained by querying Uniprot with all of the protein IDs present in your PTM dataset. The FASTA file is a necessary input because some tools (e.g. MaxQuant) do not report the specific amino acid that is modified relative to the whole protein sequence. Rather, they report the specific amino acid relative to the reported peptide. This distinction is important because modifications like phosphorylation, methylation, or acetylation often have specific roles depending on where they occur within the full-length protein. With the help of a FASTA file, MSstatsPTM can determine the specific amino acid that is modified in the context of the whole protein sequence.

Please note for the PTM dataset, both the protein and modification site (or peptide), must be added into the ProteinName column. This allows for the package to summarize to the peptide level, and avoid the off chance there are matching peptides between proteins. For an example of how this can be done please see the code below.

The output of the converter is a list with two formatted data.tables. One each for the PTM and Protein datasets. If a global profiling run was not performed, the Protein data.table will just be NULL

1.1.1 MaxQuant - `MaxQtoMSstatsPTMFormat`

MSstatsPTM includes a dedicated converter for MaxQuant output. Experiments can be acquired with label-free or TMT labeling methods. No matter the experiment, we recommend using the evidence.txt file, and not a PTM specific file such as the Phospho (STY).txt file. The MaxQtoMSstatsPTMFormat converter includes a variety of parameters. Examples for processing a TMT and LF dataset can be seen below.

``` {r maxq, eval=TRUE}

TMT experiment

head(maxq_tmt_evidence) head(maxq_tmt_annotation)

msstats_format_tmt = MaxQtoMSstatsPTMFormat(evidence=maxq_tmt_evidence, annotation=maxq_tmt_annotation, fasta=system.file("extdata", "maxq_tmt_fasta.fasta", package="MSstatsPTM"), fasta_protein_name="uniprot_ac", mod_id="$Phospho \(STY$\)", use_unmod_peptides=TRUE, labeling_type = "TMT", which_proteinid_ptm = "Proteins")

head(msstats_format_tmt$PTM) head(msstats_format_tmt$PROTEIN)

LF experiment

head(maxq_lf_evidence) head(maxq_lf_annotation)

msstats_format_lf = MaxQtoMSstatsPTMFormat(evidence=maxq_lf_evidence, annotation=maxq_lf_annotation, fasta=system.file("extdata", "maxq_lf_fasta.fasta", package="MSstatsPTM"), fasta_protein_name="uniprot_ac", mod_id="$Phospho \(STY$\)", use_unmod_peptides=TRUE, labeling_type = "LF", which_proteinid_ptm = "Proteins")

head(msstats_format_lf$PTM) head(msstats_format_lf$PROTEIN)

#### 1.1.2 FragPipe - `FragPipetoMSstatsPTMFormat`

`MSstatsPTM` includes a dedicated converter for FragPipe output. Experiments 
must be acquired with TMT labeling methods (a label-free converter is currently 
in development).The input is the `msstats.csv` and annotation files 
automatically generated by FragPipe. FragPipe provides additional info on the 
localization of modification sites, and the `FragPipetoMSstatsPTMFormat` 
converter includes localization options that are not present in other 
converters.

``` {r fragpipe, eval=TRUE}
head(fragpipe_input)
head(fragpipe_annotation)
head(fragpipe_input_protein)
head(fragpipe_annotation_protein)

msstats_data = FragPipetoMSstatsPTMFormat(fragpipe_input,
                                          fragpipe_annotation,
                                          fragpipe_input_protein, 
                                          fragpipe_annotation_protein,
                                          mod_id_col = "STY",
                                          localization_cutoff=.75,
                                          remove_unlocalized_peptides=TRUE)
head(msstats_data$PTM)
head(msstats_data$PROTEIN)

1.1.3 Proteome Discoverer - `PDtoMSstatsPTMFormat`

MSstatsPTM includes a dedicated converter for Proteome Discoverer output. Experiments can be acquired with label-free or TMT labeling methods. The input is the psm file and a custom built annotation file. Proteome Discoverer provides additional info on the localization of modification sites, and the PDtoMSstatsPTMFormat converter includes localization options that are not present in other converters.

``` {r pd, eval=TRUE} head(pd_psm_input) head(pd_annotation)

msstats_format = PDtoMSstatsPTMFormat(pd_psm_input, pd_annotation, system.file("extdata", "pd_fasta.fasta", package="MSstatsPTM"), use_unmod_peptides=TRUE, which_proteinid = "Master.Protein.Accessions")

head(msstats_format$PTM) head(msstats_format$PROTEIN)

#### 1.1.4 Spectronaut - `SpectronauttoMSstatsPTMFormat`

`MSstatsPTM` includes a dedicated converter for Spectronaut output. 
Experiments can be acquired with label-free labeling methods.

``` {r Spectronaut, eval=TRUE}
head(spectronaut_input)
head(spectronaut_annotation)

msstats_input = SpectronauttoMSstatsPTMFormat(spectronaut_input, 
                  annotation=spectronaut_annotation, 
                  fasta_path=system.file("extdata", "spectronaut_fasta.fasta", 
                                         package="MSstatsPTM"),
                  use_unmod_peptides=TRUE,
                  mod_id = "\\[Phospho \\(STY\\)\\]",
                  fasta_protein_name = "uniprot_iso"
                  )

head(msstats_input$PTM)
head(msstats_input$PROTEIN)

1.1.5 Skyline - `SkylinetoMSstatsPTMFormat`

MSstatsPTM includes a dedicated converter for Skyline output. Experiments can be acquired with label-free labeling methods.

1.1.6 Peak Studio - `PStoMSstatsPTMFormat`

MSstatsPTM includes a dedicated converter for Peak Studio output. Experiments can be acquired with label-free labeling methods.

1.1.6 Progenesis - `ProgenesistoMSstatsPTMFormat`

MSstatsPTM includes a dedicated converter for Progenesis output. Experiments can be acquired with label-free labeling methods.

1.1.7 Additional tools

If there is not a dedicated MSstatsPTM converter for a processing tool, the existing converters in MSstats and MSstatsTMT converters can be used as described below.

Note, in order to do this, it is critical that the ProteinName column be a combination of the Protein Name and modification site.

As an example, if you would like to analyze an experiment processed with DIANN, you can leverage the DIANNtoMSstatsFormat in MSstats. Given two datasets, named raw_ptm_df and raw_protein_df, and an annotation file, we can process the data as follows.

# Add site into ProteinName column
raw_ptm_df$ProteinName = paste(raw_ptm_df$ProteinName,
                                raw_ptm_df$Site, sep = "_")

# Run MSstats Converters
PTM_data = MSstats::DIANNtoMSstatsFormat(raw_ptm_df, annotation)
PROTEIN_data = MSstats::DIANNtoMSstatsFormat(raw_protein_df, annotation)

# Combine into one list
msstatsptm_input_data = list(PTM = PTM_data, PROTEIN = PROTEIN_data)

The variable msstatsptm_input_data can now be used as the input to the remainder of the MSstatsPTM processing pipeline.

1.2 Summarization - `dataSummarizationPTM`

After loading in the input data, the next step is to use the dataSummarizationPTM function. This provides the summarized dataset needed to model the protein/PTM abundance. The function will summarize the Protein dataset up to the protein level and will summarize the PTM dataset up to the peptide level. There are multiple options for normalization and missing value imputation. These options should be reviewed in the package documentation.

MSstatsPTM.summary = dataSummarizationPTM(raw.input, verbose = FALSE, 
                                          use_log_file = FALSE, append = FALSE)

head(MSstatsPTM.summary$PTM$ProteinLevelData)
head(MSstatsPTM.summary$PROTEIN$ProteinLevelData)

The summarize function returns a list with PTM and Protein summarization information. Each PTM and Protein include a list of data.tables: FeatureLevelData is a data.table of reformatted input of dataSummarizationPTM, ProteinLevelData is the run level summarization data.

1.2.1 QCPlot

Once summarized, MSstatsPTM provides multiple plots to analyze the experiment. Here we show the quality control boxplot. The first plot shows the modified data and the second plot shows the global protein dataset.

dataProcessPlotsPTM(MSstatsPTM.summary,
                    type = 'QCPLOT',
                    which.PTM = "allonly",
                    address = FALSE)

1.2.2 Profile Plot

Here we show a profile plot. Again the top plot shows the modified peptide, and the bottom shows the overall protein.

dataProcessPlotsPTM(MSstatsPTM.summary,
                    type = 'ProfilePlot',
                    which.Protein = "Q9Y6C9",
                    address = FALSE)

1.3 Modeling - `groupComparisonPTM`

After summarization, the summarized datasets can be modeled using the groupComparisonPTM function. This function will model the PTM and Protein summarized datasets, and then adjust the PTM model for changes in overall protein abundance. The output of the function is a list containing these three models named: PTM.Model, PROTEIN.Model, ADJUSTED.Model.

# Specify contrast matrix
comparison = matrix(c(-1,0,1,0),nrow=1)
row.names(comparison) = "CCCP-Ctrl"
colnames(comparison) = c("CCCP", "Combo", "Ctrl", "USP30_OE")

MSstatsPTM.model = groupComparisonPTM(MSstatsPTM.summary, 
                                      data.type = "LabelFree",
                                      contrast.matrix = comparison,
                                      use_log_file = FALSE, append = FALSE,
                                      verbose = FALSE)
head(MSstatsPTM.model$PTM.Model)
head(MSstatsPTM.model$PROTEIN.Model)
head(MSstatsPTM.model$ADJUSTED.Model)

1.3.1 Volcano Plot

The models from the groupComparisonPTM function can be used in the model visualization function, groupComparisonPlotsPTM. Here we show Volcano Plots for the models.

``` {r volcano, message=FALSE, warning=FALSE} groupComparisonPlotsPTM(data = MSstatsPTM.model, type = "VolcanoPlot", FCcutoff= 2, logBase.pvalue = 2, address=FALSE)

### 1.3.2 Heatmap Plot

Here we show a Heatmap for the models.

``` {r heatmap, message=FALSE, warning=FALSE}
groupComparisonPlotsPTM(data = MSstatsPTM.model,
                        type = "Heatmap",
                        which.PTM = 1:30,
                        address=FALSE)

1.4 Sample Size Calculation - `designSampleSizePTM`

Finally, sample size calculation can be performed using the output of the model and the designSampleSizePTM

# Specify contrast matrix
sample_size = designSampleSizePTM(MSstatsPTM.model, c(2.0, 2.75), FDR = 0.05, 
                                  numSample = TRUE, power = 0.8)

head(sample_size)

1.4.1 Sample Size Plot

The output of the sample size function can be plotted using the MSstats designSampleSizePlots function.

MSstats::designSampleSizePlots(sample_size)

Vitek-Lab/MSstatsPTM documentation built on Dec. 19, 2024, 6:02 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Vitek-Lab/MSstatsPTM
Statistical Characterization of Post-translational Modifications

In Vitek-Lab/MSstatsPTM: Statistical Characterization of Post-translational Modifications

Installation

1. Workflow

1.1 Raw Data Format

1.1.1 MaxQuant - `MaxQtoMSstatsPTMFormat`

TMT experiment

LF experiment

1.1.3 Proteome Discoverer - `PDtoMSstatsPTMFormat`

1.1.5 Skyline - `SkylinetoMSstatsPTMFormat`

1.1.6 Peak Studio - `PStoMSstatsPTMFormat`

1.1.6 Progenesis - `ProgenesistoMSstatsPTMFormat`

1.1.7 Additional tools

1.2 Summarization - `dataSummarizationPTM`

1.2.1 QCPlot

1.2.2 Profile Plot

1.3 Modeling - `groupComparisonPTM`

1.3.1 Volcano Plot

1.4 Sample Size Calculation - `designSampleSizePTM`

1.4.1 Sample Size Plot

R Package Documentation

Browse R Packages

We want your feedback!

Vitek-Lab/MSstatsPTM Statistical Characterization of Post-translational Modifications

In Vitek-Lab/MSstatsPTM: Statistical Characterization of Post-translational Modifications

Installation

1. Workflow

1.1 Raw Data Format

1.1.1 MaxQuant - MaxQtoMSstatsPTMFormat

TMT experiment

LF experiment

1.1.3 Proteome Discoverer - PDtoMSstatsPTMFormat

1.1.5 Skyline - SkylinetoMSstatsPTMFormat

1.1.6 Peak Studio - PStoMSstatsPTMFormat

1.1.6 Progenesis - ProgenesistoMSstatsPTMFormat

1.1.7 Additional tools

1.2 Summarization - dataSummarizationPTM

1.2.1 QCPlot

1.2.2 Profile Plot

1.3 Modeling - groupComparisonPTM

1.3.1 Volcano Plot

1.4 Sample Size Calculation - designSampleSizePTM

1.4.1 Sample Size Plot

R Package Documentation

Browse R Packages

We want your feedback!

Vitek-Lab/MSstatsPTM
Statistical Characterization of Post-translational Modifications

1.1.1 MaxQuant - `MaxQtoMSstatsPTMFormat`

1.1.3 Proteome Discoverer - `PDtoMSstatsPTMFormat`

1.1.5 Skyline - `SkylinetoMSstatsPTMFormat`

1.1.6 Peak Studio - `PStoMSstatsPTMFormat`

1.1.6 Progenesis - `ProgenesistoMSstatsPTMFormat`

1.2 Summarization - `dataSummarizationPTM`

1.3 Modeling - `groupComparisonPTM`

1.4 Sample Size Calculation - `designSampleSizePTM`