Description Objects from the Class Details Developement notes Slots Extends Methods Functions Author(s) References See Also Examples
The Proteins class encapsulates data and metadata for proteomics
experiments. The class stores the protein sequences as well as specific
subsets of interest, typically peptides, as ranges. The Proteins
instance itself, its sequences and its peptide features are each
described by their respective metadata attributes.
Objects can be created using the constructor Proteins. The constructor
takes either a fasta file name as first argument, an EnsDb object, or a
named uniprotIds argument with valid UniProt accession numbers (not yet
implemented).
When called with an EnsDb object, the Proteins constructor loads protein
data directly from that object. The additional arguments filter,
loadProteinDomains, columns and fetchLRG specify, respectively, whether
only proteins matching certain criteria should be fetched, whether all
protein domains should be added as pranges, which optional additional
annotation columns should be retrieved, and whether proteins from Locus
Reference Genes (LRG) should also be retrieved from the database.
An instance of class Proteins is characterised by one or multiple
protein sequences that can be accessed as an AAStringSet with the aa
accessor. Sequence-specific annotation, such as accession numbers,
protein and gene names, ... is available with the acols method. General
metadata, such as the date of creation of the instance, are stored as a
list returned by the metadata accessor. This list would typically
contain a created character that documents when the object was created,
a reference genome descriptor, a UniProtRelease with the release date of
the UniProt database, and possibly others.
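For fasta input, sequence-level annotation of this kind is commonly carried in the UniProt-style header line of each record. The following base R sketch shows how such a header could be split into fields. parse_uniprot_header is a hypothetical helper written for illustration only; it is not Pbase's parser, and the example header is illustrative. The header layout follows the UniProt fasta header documentation listed under References.

```r
## Hypothetical helper (not part of Pbase): split one UniProt-style
## fasta header into a few named fields using base R only.
parse_uniprot_header <- function(header) {
    header <- sub("^>", "", header)                    # drop the leading '>'
    parts <- strsplit(header, "|", fixed = TRUE)[[1]]  # db | accession | rest
    rest <- parts[3]
    ## Return the first match of a Perl regular expression, or NA.
    grab <- function(pattern) {
        m <- regmatches(rest, regexpr(pattern, rest, perl = TRUE))
        if (length(m) == 1L) m else NA_character_
    }
    list(db = parts[1],
         accession = parts[2],
         entryName = sub(" .*$", "", rest),
         proteinName = grab("(?<= ).*?(?= OS=)"),
         organism = grab("(?<= OS=).*?(?= [A-Z]{2}=|$)"),
         geneName = grab("(?<= GN=)\\S+"))
}

hdr <- paste(">sp|P04637|P53_HUMAN Cellular tumor antigen p53",
             "OS=Homo sapiens OX=9606 GN=TP53 PE=1 SV=4")
str(parse_uniprot_header(hdr))
```

Fields that are absent from a header, such as a missing GN entry, come back as NA rather than raising an error.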
Each sequence of a Proteins instance can also be characterised by a set
of specific ranges describing peptides of interest. These peptide
features can be extracted as an AAStringSetList, where each protein
sequence contains 0 or more peptide features. The peptide features are
encoded as ranges along the original protein sequences (a list of
IRanges) that can be extracted with the pranges function. The peptide
features have their own metadata describing, for example, peptide
identification scores, number of missed cleavages, ... available with
the pcols method.
See also the Pbase-data vignette.
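The relation between sequences, ranges and peptide features described above can be illustrated with the underlying Bioconductor containers alone. This is a sketch under stated assumptions: the sequences, names (P1, P2) and ranges are made up, and the code only uses Biostrings and IRanges rather than Pbase itself, so it shows the data model and not necessarily how Pbase implements it.

```r
library(Biostrings)
library(IRanges)

## Two toy protein sequences (made up for illustration).
prots <- AAStringSet(c(P1 = "MKWVTFRPSLLKAYSR", P2 = "MTEYKLVVVG"))

## Peptides encoded as ranges along each protein: three for P1, none for P2.
peps <- IRangesList(P1 = IRanges(start = c(1, 3, 13), end = c(2, 12, 16)),
                    P2 = IRanges())

## Per-peptide metadata can travel with the ranges.
mcols(peps[["P1"]])$missedCleavages <- c(0L, 0L, 0L)

## Extracting the ranges yields one set of peptide sequences per protein,
## i.e. an AAStringSetList with 0 or more peptides per element.
feat <- extractAt(prots, peps)
feat
elementNROWS(feat)
```

Storing ranges instead of peptide strings keeps the peptide positions available for plotting and coverage calculations, while the peptide sequences can always be recomputed from the protein sequences.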
The Proteins constructor with argument file being an EnsDb object
retrieves protein sequences along with all their related protein domains
from an EnsDb annotation database. The optional filter argument can be
used to fetch only proteins matching the defined filtering criteria from
the database. The filter argument takes an object extending the
AnnotationFilter class, an AnnotationFilterList combining such objects,
or a filter expression in the form of a formula. See the
AnnotationFilter and proteins documentation for more details.
Additional annotation columns that should be retrieved from the database
and included in acols can be specified with the columns argument. The
listColumns method can be used to list all available annotation columns
from the database.
Ensembl protein IDs will be used as the names of the returned Proteins
object. See the vignette from the ensembldb package for an overview of
supported filters or below for some examples.
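As a sketch of the three accepted forms of the filter argument, the snippet below builds a single filter, an AnnotationFilterList and an equivalent formula-based filter using only the AnnotationFilter package. The combination of chromosome and transcript biotype is an arbitrary example, and the final call is left commented out because it assumes an EnsDb object named edb is available.

```r
library(AnnotationFilter)

## 1. A single filter object.
sqnf <- SeqNameFilter("Y")

## 2. Several filters combined with a logical AND.
flt <- AnnotationFilterList(sqnf,
                            TxBiotypeFilter("protein_coding"),
                            logicOp = "&")

## 3. The same criteria written as a filter expression (formula).
flt2 <- AnnotationFilter(~ seq_name == "Y" & tx_biotype == "protein_coding")

flt
flt2

## Any of the three forms could then be passed on, assuming 'edb' is an
## EnsDb object (not created here):
## prts <- Proteins(edb, filter = flt)
```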
Since version 0.2.0, addIdentificationData supports multiple
identification file names to be added to a Proteins instance (argument
renamed filenames) using either mzID or mzR. Added new Pparams
parametrisation infrastructure.
See news(package = "Pbase") for a description of all changes.
Other possible metadata fields: UniProt.ws, biomaRt instances.
metadata: Object of class "list" containing global metadata, accessed
with metadata.
aa: Object of class "AAStringSet" storing the protein sequences,
accessed with aa.
.__classVersion__: Object of class "Versions" documenting the class
versions. Intended for developer use and debugging.
Class "Versioned", directly.
signature(x = "Proteins"): Returns an AAStringSet instance representing
the sequences of the proteins.
signature(x = "Proteins"): ...
signature(x = "Proteins"): ...
signature(x = "Proteins"): Returns a list of global metadata of the
instance x, including the date of instance creation or, if created from
a set of UniProt identifiers (see constructors above), the UniProt
version and UniProt.ws version number.
signature(x = "Proteins"): Returns a DataFrame of protein metadata.
signature(x = "Proteins"): Returns a list of feature metadata.
signature(x = "Proteins"): Returns the names of the sequence metadata.
signature(x = "Proteins"): Returns the names of the peptide feature
metadata.
signature(x = "Proteins"): Returns the protein sequence names defined as
UniProt accession numbers.
signature(x = "Proteins"): Returns the protein sequence names defined as
UniProt accession numbers. It is just a synonym for seqnames.
signature(x = "Proteins"): Returns the number of proteins.
signature(x = "Proteins", i = "ANY", j = "missing"): Creates a subset of
the Proteins instance.
signature(x = "Proteins", i = "ANY", j = "missing"): Returns an AAString
instance representing the sequence of the selected protein.
signature(x = "Proteins", mass = "numeric", len = "numeric", ...): ...
signature(x = "Proteins", enzym = "character", missedCleavages =
"numeric"): Cleaves all proteins using the enzym rule while allowing
missedCleavages missed cleavages. Please see cleave for details.
signature(object = "Proteins", id = "character", rmEmptyRanges =
"logical", par = "Pparams"): Adds identification data from an mzIdentML
file (id) to the Proteins object. If rmEmptyRanges is TRUE, proteins
without any identification data are removed. See Pparams for further
settings.
signature(object = "Proteins", filenames = "character", rmEmptyRanges =
"logical", par = "Pparams"): Adds identification data from a fasta file
(filenames) to the Proteins object. Please note that both fasta files
(the origin of the Proteins object and the ones given in filenames) must
share the same UniProt accession numbers. If rmEmptyRanges is TRUE,
proteins without any identification data are removed. See Pparams for
further settings.
signature(x = "Proteins", y = "missing"): Plots all proteins and
associated peptides using the Gviz/Pviz infrastructure.
signature(object = "Proteins"): Displays object summary as text.
signature(x = "Proteins"): Removes proteins with empty peptide ranges.
signature(x = "Proteins"): Returns a modified Proteins object. pcols(x)
gains a "Proteotypic" logical column, indicating whether the peptide is
proteotypic or not.
signature(pattern = "Proteins"): Calculates the coverage of proteins.
pcols(x) gains a "Coverage" numeric column.
signature(x = "Proteins", missedCleavages = "numeric"): Tests whether a
Proteins object was cleaved already.
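To make the cleaving and coverage entries above more concrete, here is a minimal base R sketch. It is not the implementation used by cleave or by Pbase: tryptic_ranges and coverage_fraction are hypothetical helpers, the trypsin rule is simplified to "cleave after K or R unless followed by P", and coverage is assumed to mean the fraction of residues covered by at least one peptide range.

```r
## Hypothetical helper: peptide ranges from a simplified tryptic digest.
## 'missedCleavages' may be a vector, e.g. 0:1.
tryptic_ranges <- function(sequence, missedCleavages = 0) {
    n <- nchar(sequence)
    ## Positions of K/R not followed by P; gregexpr gives -1 if none.
    sites <- as.integer(gregexpr("[KR](?!P)", sequence, perl = TRUE)[[1]])
    sites <- sites[sites > 0 & sites < n]  # the last residue needs no cut
    starts <- c(1L, sites + 1L)
    ends <- c(sites, n)
    out <- list()
    for (k in missedCleavages) {
        ## k missed cleavages: join k + 1 consecutive fragments.
        if (k < length(starts)) {
            idx <- seq_len(length(starts) - k)
            out[[length(out) + 1L]] <-
                data.frame(start = starts[idx], end = ends[idx + k],
                           missedCleavages = k)
        }
    }
    do.call(rbind, out)
}

## Hypothetical helper: fraction of a protein of length n that is covered
## by at least one of the given ranges.
coverage_fraction <- function(n, start, end) {
    covered <- logical(n)
    for (i in seq_along(start))
        covered[start[i]:end[i]] <- TRUE
    mean(covered)
}

prot <- "MKWVTFRPSLLKAYSR"
r <- tryptic_ranges(prot, missedCleavages = 0:1)
cbind(r, peptide = substring(prot, r$start, r$end))

## Coverage if only the first and last fully cleaved peptides were observed.
coverage_fraction(nchar(prot), start = c(1, 13), end = c(2, 16))
```

Note that the R at position 7 is not a cleavage site in this toy sequence because it is followed by P.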
Laurent Gatto <lg390@cam.ac.uk>, Sebastian Gibb <mail@sebastiangibb.de> and Johannes Rainer <johannes.rainer@eurac.edu>
Definition of the UniProt fasta comment format: http://www.uniprot.org/help/fasta-headers
## Create a Proteins object reading all proteins from a fasta file.
fastaFiles <- list.files(system.file("extdata", package = "Pbase"),
                         pattern = "fasta", full.names = TRUE)
p <- Proteins(fastaFiles)
p
metadata(p)
## Adding custom metadata
metadata(p, "Comment") <- "I love R"
metadata(p)
## Plotting
plot(p[1:5], from = 1, to = 30)
## Cleaving
pp <- cleave(p[1:100])
pp <- proteotypic(pp)
pp
pcols(pp[1:2])
plot(pp[1:2], from = 20, to = 30)
## Protein coverage
pp <- proteinCoverage(pp)
avarLabels(pp)
acols(pp)$Coverage
pp
## Add identification data
idfile <- system.file("extdata/Thermo_Hela_PRTC_selected.mzid",
package = "Pbase")
p <- addIdentificationData(p, idfile)
pranges(p)
pfeatures(p)
plot(p[1])
plot(p[1], # the first protein has 36 peptides
fill = c(rep("orange", 13), rep("steelblue", 13)))
## Retrieve a Proteins object from an EnsDb object: first load the annotation
## database for all human genes defined in Ensembl version 86.
library(ensembldb)
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
## Define a filter to retrieve all genes from chromosome Y
sqnf <- SeqNameFilter("Y")
## Retrieve the proteins without protein domains but specify to retrieve in
## addition the transcript biotype for the encoding transcripts and the gene
## names
prts <- Proteins(edb, filter = sqnf, loadProteinDomains = FALSE,
columns = c("tx_biotype", "gene_name"))
aa(prts)
acols(prts)
## The listColumns method lists all available columns from the database.
listColumns(edb)
## Load all proteins from the gene ZBTB16 including all protein domains from
## the database. Here we pass the filter criteria as a formula to the method
prts <- Proteins(edb, filter = ~ gene_name == "ZBTB16")
## List available pranges
pcols(prts)
## Access the protein domains
pcols(prts)$ProteinDomains