clustDist: Pairwise Distance Computation for Protein Information Sets
In lgatto/pRoloc: A unifying bioinformatics framework for spatial proteomics

clustDist

R Documentation

Pairwise Distance Computation for Protein Information Sets

Description

This function computes the mean (normalised) pairwise distances for pre-defined sets of proteins.

Usage

clustDist(object, k = 1:5, fcol = "GOAnnotations", n = 5, verbose = TRUE, seed)

Arguments

`object`	An instance of class `"MSnSet"`.
`k`	The number of clusters to try fitting to the protein set. Default is `k = 1:5`.
`fcol`	The feature meta-data containing matrix of protein sets/ marker definitions. Default is `GOAnnotations`.
`n`	The minimum number of proteins per set. If protein sets contain less than `n` instances they will be ignored. Defualt is 5.
`verbose`	A logical defining whether a progress bar is displayed.
`seed`	An optional seed for the random number generator.

Details

The input to the function is a MSnSet dataset containing a matrix appended to the feature data slot identifying the membership of protein instances to a pre-defined set(s) e.g. a specific Gene Ontology term etc.

For each protein set, the clustDist function (i) extracts all instances belonging to the set, (ii) using the kmeans algorithm fits and tests k = c(1:5) (default) cluster components to each set, (iii) calculates the mean pairwise distance for each k tested.

Note: currently distances are calcualted in Euclidean space, but other distance metrics will be supported in the future).

The output is a list of ClustDist objects, one per information cluster. The ClustDist class summarises the algorithm information such as the number of k's tested for the kmeans, and mean and normalised pairwise Euclidean distances per numer of component clusters tested. See ?ClustDist for more details.

Value

An instance of "ClustDistList" containing a "ClustDist" instance for every protein set, which summarises the algorithm information such as the number of k's tested for the kmeans, and mean and normalised pairwise Euclidean distances per numer of component clusters tested.

Author(s)

Lisa Breckels

Examples

library(pRolocdata)
data(dunkley2006)
par <- setAnnotationParams(inputs =
                   c("Arabidopsis thaliana genes",
                   "Gene stable ID"))
## add protein sets/annotation information
xx <- addGoAnnotations(dunkley2006, par)
## filter
xx <- filterMinMarkers(xx, n = 50)
xx <- filterMaxMarkers(xx, p = .25)
## get distances for protein sets 
dd <- clustDist(xx)
## plot clusters for first 'ClustDist' object 
## in the 'ClustDistList'
plot(dd[[1]], xx)
## plot distances for all protein sets 
plot(dd)
## Extract normalised distances
## Normalise by n^1/3
minDist <- getNormDist(dd, p = 1/3)
## Get new order according to lowest distance
o <- order(minDist)
## Re-order GOAnnotations 
fData(xx)$GOAnnotations <- fData(xx)$GOAnnotations[, o]
if (interactive()) {
pRolocVis(xx, fcol = "GOAnnotations")
}

lgatto/pRoloc documentation built on Feb. 20, 2025, 10:45 p.m.