knitr::opts_chunk$set(
  #collapse = TRUE,
  comment = "#>",
  fig.width = 4,
  fig.height = 4,
  message = FALSE,
  warning = FALSE,
  tidy.opts = list(
    keep.blank.line = TRUE,
    width.cutoff = 150
  ),
  options(width = 150),
  eval = TRUE
)
In Vignette 6 we created a data frame called protPepNSA_AT5tmtMS2 that consists of all protein profiles, with each protein profile followed by its component peptide profiles. In this vignette, we first calculate RSA-transformed profiles for all proteins and peptides, and then compute the constrained proportional assignments (CPA) for all proteins and peptides in a form ready for export. We then show how to use these results to plot profiles for any protein and its component peptides, with outlier peptides labelled in the plot.
First, we attach the protlocassign package, which includes the protPepNSA_AT5tmtMS2 data frame. Note that for rows containing proteins, the peptide column contains just the protein name, while for rows containing peptides, the peptide column contains the concatenated protein name and peptide sequence. As in previous vignettes, we rename the embedded data frames to remove experiment-specific designations (e.g., AT5tmtMS2) for ease of presentation.
library(protlocassign)
data(protPepNSA_AT5tmtMS2)
data(totProtAT5)
protPepNSA <- protPepNSA_AT5tmtMS2
str(protPepNSA, strict.width="cut", width=65)
totProt <- totProtAT5
totProt
Next, we extract the NSA reference profiles from the nine profile columns of protPepNSA:
data(markerListJadot)
refLocationProfilesNSA <- locationProfileSetup(profile=protPepNSA[, 4 + (1:9)],
                                               markerList=markerListJadot,
                                               numDataCols=9)
round(refLocationProfilesNSA, digits=4)
Using the RSAfromNSA function described previously in Vignette 3, we calculate the RSA-transformed marker profiles:
refLocationProfilesRSA <- RSAfromNSA(NSA=refLocationProfilesNSA,
                                     NstartMaterialFractions=6,
                                     totProt=totProtAT5)
round(refLocationProfilesRSA, digits=4)
We transform the protein/peptide profiles in the same way: we take the nine columns containing the profile data from protPepNSA and apply the RSAfromNSA function again, which yields an intermediate nine-column data frame, protPepRSA_trimmed, of RSA-transformed profiles.
protPepRSA_trimmed <- RSAfromNSA(NSA=protPepNSA[, 4 + (1:9)],
                                 NstartMaterialFractions=6,
                                 totProt=totProtAT5)
str(protPepRSA_trimmed, strict.width="cut", width=65)
Finally, we add the four reference columns back in as the first columns of protPepRSA, and append the two columns listing the number of spectra and peptides per protein after the profile data. The resulting data frame protPepRSA has the same structure as the original data frame protPepNSA.
# add in the ref columns
protPepRSA <- data.frame(protPepNSA[, 1:4], protPepRSA_trimmed,
                         protPepNSA[, 14:15])
str(protPepRSA, strict.width="cut", width=65)
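The column bookkeeping used above (four reference columns, nine profile columns, two count columns) can be checked on a small toy data frame. The following is a minimal sketch in base R only; the toy column names and values are hypothetical, and the doubling step is just a placeholder for a transformation, not the RSAfromNSA calculation.

```r
# Toy stand-in for protPepNSA: 4 reference columns, 9 profile columns,
# 2 count columns. All names and values are hypothetical.
toyNSA <- data.frame(
  prot = c("P1", "P1"),
  peptide = c("P1", "P1_AAK"),
  refC = 1:2,
  refD = 3:4,
  matrix(seq(0.01, 0.18, by = 0.01), nrow = 2,
         dimnames = list(NULL, paste0("frac", 1:9))),
  Nspectra = c(5L, 2L),
  Npep = c(2L, 1L),
  stringsAsFactors = FALSE
)

# In R, ":" binds more tightly than "+", so 4 + (1:9) and 4+1:9
# both select columns 5 through 13.
profileCols <- 4 + (1:9)
sameCols <- all(profileCols == 5:13) && all((4+1:9) == 5:13)

# Placeholder transformation of the profile columns (NOT RSAfromNSA)
toyTransformed <- toyNSA[, profileCols] * 2

# Reassemble: reference columns, transformed profiles, count columns
toyRSA <- data.frame(toyNSA[, 1:4], toyTransformed, toyNSA[, 14:15])

# The rebuilt data frame should have the same column layout as the input
sameLayout <- identical(names(toyRSA), names(toyNSA))
```

With the real data, an analogous comparison of names(protPepRSA) and names(protPepNSA) is a quick way to confirm that the two data frames have the same layout.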
Next, we identify rows with proteins only, and extract them. The resulting data frame, protRSA, parallels the structure of protNSA. We also extract the rows with peptides only into the data frame pepRSA.
protRSA.ind <- {protPepRSA$prot == protPepRSA$peptide}  # protein indicators
protRSA <- protPepRSA[protRSA.ind,]   # these are the data for proteins only
dim(protRSA)
pepRSA <- protPepRSA[!protRSA.ind,]   # these are the data for peptides only
data.frame(colnames(protRSA))
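The protein/peptide split relies only on the convention that protein rows carry the protein name in the peptide column. A minimal base R sketch on toy data illustrates this; the protein names, the "_" separator in the peptide labels, and the values are hypothetical and not taken from the package data.

```r
# Toy stand-in for protPepRSA (hypothetical names and values)
toy <- data.frame(
  prot    = c("P1", "P1", "P1", "P2", "P2"),
  peptide = c("P1", "P1_AAK", "P1_CCR", "P2", "P2_DDK"),
  frac1   = c(0.6, 0.7, 0.5, 0.2, 0.2),
  stringsAsFactors = FALSE
)

# A row describes a protein when its peptide entry equals its protein name
isProtein <- toy$prot == toy$peptide

toyProt <- toy[isProtein, ]   # protein rows only
toyPep  <- toy[!isProtein, ]  # peptide rows only

# Number of peptide rows per protein
pepCounts <- table(toyPep$prot)
```

Every row lands in exactly one of the two subsets, so the row counts of toyProt and toyPep add up to the row count of the combined data frame.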
Now we calculate the constrained proportional assignments on proteins only, using RSA-transformed profiles:
protCPAfromRSA <- fitCPA(profile=protRSA[, 4+1:9],
                         refLocationProfiles=refLocationProfilesRSA,
                         numDataCols=9)
str(protCPAfromRSA, strict.width="cut", width=65)
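To give some intuition for what a constrained proportional assignment involves, the sketch below fits a single toy profile as a mixture of reference profiles. It assumes that the constraints are non-negative proportions that sum to one and that the fit minimizes squared error. It is not the fitCPA implementation: the softmax reparameterization, the use of optim, and all names and values are illustrative choices of ours.

```r
# Toy reference profiles: 3 hypothetical compartments x 4 fractions
refProfiles <- rbind(
  compA = c(0.7, 0.2, 0.1, 0.0),
  compB = c(0.1, 0.6, 0.2, 0.1),
  compC = c(0.0, 0.1, 0.3, 0.6)
)

# Simulate an observed profile as a known mixture of the references
trueProps <- c(compA = 0.6, compB = 0.3, compC = 0.1)
observed <- as.vector(trueProps %*% refProfiles)

# Softmax maps unconstrained parameters to proportions that are
# non-negative and sum to one
softmax <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Sum of squared differences between observed and fitted profile
sse <- function(theta) {
  fitted <- as.vector(softmax(theta) %*% refProfiles)
  sum((observed - fitted)^2)
}

# Minimize over the unconstrained parameters, then map back to proportions
fit <- optim(par = rep(0, nrow(refProfiles)), fn = sse, method = "BFGS")
estProps <- setNames(softmax(fit$par), rownames(refProfiles))
round(estProps, 3)
```

Because the toy profile is an exact mixture of the references, the estimated proportions should come out close to the simulated ones; real profiles contain noise, so their fits are only approximate.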
The following commands generate a plot of TLN1 protein/peptides, with CPA estimates. Outlier peptide profiles are in orange. The header reports the number of peptides and spectra used to compute the protein profile, which in this case excludes outlier peptides and outlier spectra.
#windows(width=7.5, height=10)  # open a window 7.5 by 10 inches
protPepPlotfun(protName="TLN1", protProfile=protRSA[,5:15],
               Nspectra=TRUE, pepProfile=pepRSA, numRefCols=4,
               numDataCols=9, n.compartments=8,
               refLocationProfiles=refLocationProfilesRSA,
               assignPropsMat=protCPAfromRSA,
               yAxisLabel="Relative Specific Amount")
Note that the outlier peptides do not contribute to the CPA analysis of the proteins, but these may be of interest. For instance, they may represent protein isoforms with distinct distributions. Thus, there may be specific biological questions that require CPA estimates for all proteins and peptides without outlier removal. This can be accomplished using the following command:
protPepCPAfromRSA <- fitCPA(profile=protPepRSA[,4 + 1:9],
                            refLocationProfiles=refLocationProfilesRSA,
                            numDataCols=9)
str(protPepCPAfromRSA, strict.width="cut", width=65)
We next assemble the final CPA values for the protein/peptide data along with ancillary information, ready for export. Then we output the data to C:\temp\myProteinOutput; users will select their own directory.
protPepCPAfromRSAout <- data.frame(protPepRSA[,1:4], protPepCPAfromRSA,
                                   protPepRSA[,14:15])
protPepCPAfromRSAout$prot <- paste("`", protPepCPAfromRSAout$prot, sep="")
protPepCPAfromRSAout$peptide <- paste("`", protPepCPAfromRSAout$peptide, sep="")
setwd("C:\\temp\\myProteinOutput")
write.csv(protPepCPAfromRSAout, file="protPepCPAfromRSAout.csv",
          row.names=FALSE, na=".")
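The export step can be tried on toy data without changing the working directory. The sketch below writes to a temporary directory using file.path rather than setwd. The data frame, its column names, and the file name are hypothetical. Our assumption is that the backtick prefix is there to keep spreadsheet programs from reinterpreting names such as SEPT2 as dates; the source does not state the reason.

```r
# Toy output table (hypothetical names and values), with one missing value
toyOut <- data.frame(
  prot    = c("TLN1", "SEPT2"),
  peptide = c("TLN1", "SEPT2_AAAK"),
  propA   = c(0.8, NA),
  stringsAsFactors = FALSE
)

# Prefix identifiers with a backtick, as in the export code above
toyOut$prot    <- paste("`", toyOut$prot, sep = "")
toyOut$peptide <- paste("`", toyOut$peptide, sep = "")

# Write to a temporary directory instead of calling setwd()
outFile <- file.path(tempdir(), "toyCPAout.csv")
write.csv(toyOut, file = outFile, row.names = FALSE, na = ".")

# Inspect the raw lines that were written
csvLines <- readLines(outFile)
csvLines
```

Building the path with file.path leaves the session's working directory unchanged, which can be convenient when the export is part of a longer script.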
To output plots of all of the protein and peptide profiles into a single pdf file, we first use setwd to point to the desired output directory, and then we can set up a loop as follows:
setwd("C:\\temp\\myProteinOutput")
pdf(file="allProtPepPlotsRSA.pdf", width=7, height=10)
n.prots <- nrow(protRSA)
for (i in 1:n.prots) {
  protPepPlotfun(protName=protRSA$prot[i], protProfile=protRSA[,5:15],
                 Nspectra=TRUE, pepProfile=pepRSA, numRefCols=4,
                 numDataCols=9, n.compartments=8,
                 refLocationProfiles=refLocationProfilesRSA,
                 assignPropsMat=protCPAfromRSA,
                 yAxisLabel="Relative Specific Amount")
}
dev.off()
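When looping over many proteins, a single failed plot would otherwise stop the loop and leave the pdf device open. The following base R sketch shows one way to guard against this, using toy profiles and the generic plot function in place of protPepPlotfun. The toy matrix, the file name, and the error-handling strategy are our own illustrative assumptions.

```r
# Toy profiles: 3 hypothetical proteins x 4 fractions
toyProfiles <- rbind(
  P1 = c(0.7, 0.2, 0.1, 0.0),
  P2 = c(0.1, 0.6, 0.2, 0.1),
  P3 = c(0.0, 0.1, 0.3, 0.6)
)

# Write to a temporary directory instead of calling setwd()
pdfFile <- file.path(tempdir(), "toyProfilePlots.pdf")
pdf(file = pdfFile, width = 7, height = 10)

# seq_len() gives an empty sequence when there are no rows, unlike 1:n
for (i in seq_len(nrow(toyProfiles))) {
  protName <- rownames(toyProfiles)[i]
  tryCatch(
    plot(toyProfiles[i, ], type = "b", main = protName,
         xlab = "Fraction", ylab = "Relative Specific Amount"),
    # Report the problem and continue with the next protein
    error = function(e) message("Skipping ", protName, ": ",
                                conditionMessage(e))
  )
}

# Close the device so the file is written completely
dev.off()
```

The same pattern could be wrapped around the protPepPlotfun call in the loop above, so that one problematic protein does not prevent the remaining pages from being written.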