In ComputationalProteomicsUnit/Pbase: Manipulating and exploring protein and proteomics data

BiocStyle::markdown()

Package: Pbase
Authors: Laurent Gatto and Sebastian Gibb
Last compiled: r date()
Last modified: r file.info("Pbase-data.Rmd")$mtime

library("Pbase")

Introduction

This vignette briefly introduces the central data object of the Pbase package, namely Proteins instances, as depicted below. A Proteins object contains a set of proteins (10 in the figure below), each composed of a protein sequence (grey boxes) and annotation data (table on the left). Each protein links to a set of ranges of interest, such as protein domains or experimentally observed peptides (also in grey), which are decorated with their own annotation data. The figure also shows the accessors for the different data slots, which are detailed in ?Proteins.

Pbase:::pplot()

Proteins objects are populated with protein sequences read from a fasta file; the peptides typically originate from an LC-MS/MS experiment.

The original data used below is a 10 fmol Peptide Retention Time Calibration Mixture spiked into 50 ng HeLa background, acquired on a Thermo Orbitrap Q Exactive instrument. The spectra were searched with the MSGF+ search engine against a restricted set of high-scoring human proteins from UniProt release 2015_02.

The fasta database

library("Biostrings")
fafile <- system.file("extdata/HUMAN_2015_02_selected.fasta",
                      package = "Pbase")
fa <- readAAStringSet(fafile)
fa
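As a quick sanity check on the imported database, the AAStringSet can be summarised with standard Biostrings accessors. This is only a sketch of the kind of inspection one might do; the variable names `n_proteins` and `seq_lengths` are introduced here for illustration.

```r
library("Biostrings")

## Same fasta file as above, shipped with the Pbase package
fafile <- system.file("extdata/HUMAN_2015_02_selected.fasta",
                      package = "Pbase")
fa <- readAAStringSet(fafile)

n_proteins <- length(fa)   ## number of protein sequences
seq_lengths <- width(fa)   ## sequence lengths, in amino acids

n_proteins
summary(seq_lengths)
head(names(fa), 3)         ## fasta headers of the first entries

## Amino acid composition of the first protein
alphabetFrequency(fa[[1]])
```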

The PSM data

library("mzID")
idfile <- system.file("extdata/Thermo_Hela_PRTC_selected.mzid",
                      package = "Pbase")
id <- flatten(mzID(idfile))
dim(id)
head(id)
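The flattened identification data is a plain data.frame, so it can be explored with base R. The sketch below assumes that the table contains columns named `accession` (protein accession) and `pepseq` (peptide sequence); these names are an assumption about the mzID output rather than something stated in this vignette, so the code checks for them before using them.

```r
library("mzID")

## Same identification file as above, shipped with the Pbase package
idfile <- system.file("extdata/Thermo_Hela_PRTC_selected.mzid",
                      package = "Pbase")
id <- flatten(mzID(idfile))

## Inspect the available columns first
names(id)

## Assumed column names; skipped if they are absent
if (all(c("accession", "pepseq") %in% names(id))) {
    ## Number of PSMs per protein accession
    psms_per_protein <- sort(table(id$accession), decreasing = TRUE)
    print(head(psms_per_protein))
    ## Number of distinct peptide sequences
    print(length(unique(id$pepseq)))
}
```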

The Proteins object

library("Pbase")
p <- Proteins(fafile)
p <- addIdentificationData(p, idfile)
p

A Proteins object is composed of a set of protein sequences, accessible with the aa accessor, as well as an optional set of peptide features that are mapped as coordinates along the proteins, available with pranges. The actual peptide sequences can be extracted with pfeatures. The names of the protein sequences can be extracted with seqnames.

aa(p)
seqnames(p)
pranges(p)
pfeatures(p)
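To see how these accessors relate to each other, one can compare the number of proteins with the number of peptide ranges mapped to each of them. This is a minimal sketch that uses the `p` object shipped with the package (see `data(p)`); it assumes that `pranges` returns a list-like object with one element per protein, which is how the accessor is described above but is not verified here.

```r
library("Pbase")

## Pre-built Proteins object distributed with the package
data(p)

sequences <- aa(p)      ## protein sequences
length(sequences)       ## number of proteins
width(sequences)        ## protein lengths, in amino acids

## Assumption: one set of ranges per protein, so lengths()
## gives the number of mapped peptides for each protein
lengths(pranges(p))
```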

A Proteins instance is further described by a general metadata list. Protein sequence and peptide feature annotations can be accessed with acols and pcols respectively, which return DataFrame instances.

metadata(p)
acols(p)
pcols(p)

Specific proteins can be extracted by index or name using [. Proteins and their peptide features can be plotted with the default plot method.

seqnames(p)
plot(p[c(1,9)])
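Subsetting by name should select the same proteins as subsetting by the corresponding indices. The sketch below illustrates this with the `p` object shipped with the package; the variable names `p_by_index` and `p_by_name` are introduced here for illustration, and the example assumes that the names accepted by `[` are those returned by `seqnames`.

```r
library("Pbase")

## Pre-built Proteins object distributed with the package
data(p)

nms <- seqnames(p)

## Select the first and ninth proteins by position ...
p_by_index <- p[c(1, 9)]
## ... and, assuming names match seqnames(p), by name
p_by_name <- p[nms[c(1, 9)]]

seqnames(p_by_index)
seqnames(p_by_name)
```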

More details can be found in ?Proteins. The object generated above is also directly available as data(p).

Session information

sessionInfo()

ComputationalProteomicsUnit/Pbase documentation built on Aug. 10, 2019, 1:25 a.m.
