knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width=8, fig.height=8 )
This Vignette provides an example workflow for how to use the package MSstatsLiP.
To install this package, start R (version "4.1") and enter:
``` {r, eval = FALSE} if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("MSstatsLiP")
```r library(MSstatsLiP) library(tidyverse) library(data.table)
The first step is to load in the raw dataset for both the PTM and Protein datasets. This data can then be converted into MSstatsLiP format with one of the built in converters, or manually converted. In this case we use the converter for Spectronaut.
## Read in raw data files head(LiPRawData) head(TrPRawData)
## Run converter MSstatsLiP_data <- SpectronauttoMSstatsLiPFormat(LiPRawData, "../inst/extdata/ExampleFastaFile.fasta", TrPRawData, use_log_file = FALSE, append = FALSE) head(MSstatsLiP_data[["LiP"]]) head(MSstatsLiP_data[["TrP"]]) ## Make conditions match MSstatsLiP_data[["LiP"]][MSstatsLiP_data[["LiP"]]$Condition == "Control", "Condition"] = "Ctrl" MSstatsLiP_data[["TrP"]]$Condition = substr(MSstatsLiP_data[["TrP"]]$Condition, 1, nchar(MSstatsLiP_data[["TrP"]]$Condition)-1)
In the case above the SpectronauttoMSstatsLiPFormat was ran to convert the data into the format required for MSstatsLiP. Note that the condition names did not match between the LiP and TrP datasets. Here we edit the conditions in each dataset to match.
Additionally, it is important that the column "FULL_PEPTIDE" in the LiP dataset contains both the Protein and Peptide information, seperated by an underscore. This allows us to summarize up to the peptide level, while keeping important information about which protein the peptide corresponds to.
The next step in the MSstatsLiP workflow is to summarize feature intensities per run into one value using the dataSummarizationLiP function. This function takes as input the formatted data from the converter.
MSstatsLiP_Summarized <- dataSummarizationLiP(MSstatsLiP_data, MBimpute = FALSE, use_log_file = FALSE, append = FALSE) names(MSstatsLiP_Summarized[["LiP"]]) lip_protein_data <- MSstatsLiP_Summarized[["LiP"]]$ProteinLevelData trp_protein_data <- MSstatsLiP_Summarized[["TrP"]]$ProteinLevelData head(lip_protein_data) head(trp_protein_data)
Again the summarization function returns a list of two dataframes one each for LiP and TrP. Each LiP and TrP is also a list with additional summary information. This summarized data can be used as input into some of the plotting functions included in the package.
MSstatsLiP has a wide variety of plotting functionality to analysis and assess the results of experiments. Here we plot the number of half vs fully tryptic peptides per replicate.
trypticHistogramLiP(MSstatsLiP_Summarized, "../inst/extdata/ExampleFastaFile.fasta", color_scale = "bright", address = FALSE)
MSstatsLiP also provides a function to plot run correlation.
correlationPlotLiP(MSstatsLiP_Summarized, address = FALSE)
Here we provide a simple script to examine the coefficient of variance between conditions
MSstatsLiP_Summarized[["LiP"]]$FeatureLevelData %>% group_by(PEPTIDE, GROUP) %>% summarize(cv = sd(INTENSITY) / mean(INTENSITY)) %>% ggplot() + geom_violin(aes(x = GROUP, y = cv, fill = GROUP)) + labs(title = "Coefficient of Variation between Condtions", y = "Coefficient of Variation", x = "Conditon")
The following plots are used to view the summarized data and check for potential systemic issues.
## Quality Control Plot dataProcessPlotsLiP(MSstatsLiP_Summarized, type = 'QCPLOT', which.Peptide = "allonly", address = FALSE)
dataProcessPlotsLiP(MSstatsLiP_Summarized, type = 'ProfilePlot', which.Peptide = c("P14164_ILQNDLK"), address = FALSE)
In addition, Priciple Component Analysis can also be done on the summarized dataset. Three different PCA plots can be created one each for: Percent of explained variance per component, PC1 vs PC2 for peptides, and PC1 vs PC2 for conditions.
PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = FALSE, protein.pca = FALSE, comparison.pca = TRUE, which.comparison = c("Ctrl", "Osmo"), address=FALSE) PCAPlotLiP(MSstatsLiP_Summarized, bar.plot = FALSE, protein.pca = TRUE, comparison.pca = FALSE, which.pep = c("P14164_ILQNDLK", "P17891_ALQLINQDDADIIGGRDR"), address=FALSE)
Finally, the trypticity of a peptide can also be calculated and added to any dataframe with the ProteinName and PeptideSequence column.
feature_data <- data.table::copy(MSstatsLiP_Summarized$LiP$FeatureLevelData) data.table::setnames(feature_data, c("PEPTIDE", "PROTEIN"), c("PeptideSequence", "ProteinName")) feature_data$PeptideSequence <- substr(feature_data$PeptideSequence, 1, nchar(as.character( feature_data$PeptideSequence)) - 2) calculateTrypticity(feature_data, "../inst/extdata/ExampleFastaFile.fasta") MSstatsLiP_Summarized$LiP$FeatureLevelData%>% rename(PeptideSequence=PEPTIDE, ProteinName=PROTEIN)%>% mutate(PeptideSequence=substr(PeptideSequence, 1, nchar(as.character(PeptideSequence))-2) ) %>% calculateTrypticity("../inst/extdata/ExampleFastaFile.fasta")
The modeling function groupComparisonLiP takes as input the output of the summarization function dataSummarizationLiP.
MSstatsLiP_model <- groupComparisonLiP(MSstatsLiP_Summarized, fasta = "../inst/extdata/ExampleFastaFile.fasta", use_log_file = FALSE, append = FALSE) lip_model <- MSstatsLiP_model[["LiP.Model"]] trp_model <- MSstatsLiP_model[["TrP.Model"]] adj_lip_model <- MSstatsLiP_model[["Adjusted.LiP.Model"]] head(lip_model) head(trp_model) head(adj_lip_model) ## Number of significant adjusted lip peptides adj_lip_model %>% filter(adj.pvalue < .05) %>% nrow()
The groupComparisonLiP function outputs a list with three separate models. These models are as follows: LiP model, TrP model, and adjusted LiP model.
groupComparisonPlotsLiP(MSstatsLiP_model, type = "VolcanoPlot", address = FALSE)
groupComparisonPlotsLiP(MSstatsLiP_model, type = "HEATMAP", numProtein=50, address = FALSE)
Here we show a barcode plot, showing the coverage of LiP and TrP peptides. This function requires the data in MSstatsLiP format and the path to a fasta file.
BarcodePlotLiP(MSstatsLiP_model, "../inst/extdata/ExampleFastaFile.fasta", model_type = "Adjusted", which.prot = c("P53235"), address = FALSE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.