Classifies and aligns the peaks stored in a GRanges object. The method applies the k-mean alignment algorithm, using shifts of the peaks as warping functions and a distance defined as a convex combination of the L^p distances between the spline-smoothed peaks and between their derivatives. The order p can be 1, 2 or ∞.
object |
GRanges object of length N. It must contain the metadata columns
|
parallel |
logical. If |
num.cores |
integer. If |
n.clust |
integer vector (or scalar). Number of clusters in which the data set
is divided (possibly one, if |
seeds |
vector. Indices of the initial centers of the clusters, needed to initialize the k-mean procedure. Like all k-mean-like algorithms, k-mean alignment depends on the choice of the initial centers of the clusters, and each initialization of the seeds can generate slightly different results. The values must lie in 1, …, N. The length of the vector must be equal to
the maximum number of clusters analyzed ( |
shift.peak |
logical. It indicates whether the alignment via a translation of the abscissae
is performed ( |
weight |
real. Weight w of the distance function (see Details for the
definitions of the distance function), needed to
make the distance between splines and derivatives comparable.
If no value is provided (default is w = median d0(i,j)/d1(i,j) with i, j = 1: … N. |
subsample.weight |
integer value. Number of data points used
to define the |
alpha |
real value between 0 and 1. Value of the convex weight α of the distance to balance the distance between data and derivatives. See details for the definition. Default is 1. |
p |
integer value in {0, 1, 2}. Order of the L^p distance
used. In particular |
t.max |
real value. It tunes the maximum shift allowed. In particular the maximum shift at each iteration is computed as max_shift = t.max * range(object) and the optimum registration coefficient will be identified between - max_shift and
+ max_shift. range( |
plot.graph.k |
logical. If |
verbose |
logical. If |
rescale |
logical. If |
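As a small illustration of the constraints on seeds and n.clust described above, the following base-R sketch builds a seed vector of the required length with values in 1, …, N. The values of N and n.clust are made up for the example, and drawing the indices at random without replacement is only one possible choice; it is not necessarily what the function does when seeds is not supplied.

```r
# Hypothetical sizes: N peaks, clusterings with 1 to 5 clusters
N <- 100
n.clust <- 1:5

# One seed per cluster of the largest clustering requested,
# drawn without replacement from the peak indices 1, ..., N
# (distinct seeds are an assumption of this sketch)
set.seed(42)
seeds <- sample(N, max(n.clust))
```

Fixing the random seed with set.seed makes the initialization, and therefore the clustering, reproducible across runs.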
See [Sangalli et al., 2010] and the package vignette for the complete description of the algorithm. The algorithm is completely defined once the family of warping functions for the alignment and the distance function are fixed. This function focuses on the following specific case:
warping functions: shifts with integer coefficients,
h(t) = t + c,
with c an integer value;
distance: convex combination of the L^p distances between the data and between their derivatives. The distance between f and g is
d(f, g) = (1 - α) || f - g ||_p + α w || f' - g' ||_p
The choice of || . ||_p corresponds to the value of p given in input: p = 0 stands for || . ||_L^∞, p = 1 for || . ||_L^1 and p = 2 for || . ||_L^2.
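The distance and the integer shift can be sketched in a few lines of base R. This is only a toy illustration under stated assumptions, not the package's implementation: curves are plain numeric vectors on an integer grid, spline smoothing is omitted, derivatives are approximated by finite differences, shifted curves are padded with zeros, and the helper names (lp_norm, curve_dist, shift_curve, best_shift) are hypothetical.

```r
# Norm of a sampled curve; p follows the convention of the text:
# 0 -> L^infinity, 1 -> L^1, 2 -> L^2 (unit grid spacing assumed)
lp_norm <- function(x, p) {
  switch(as.character(p),
         "0" = max(abs(x)),
         "1" = sum(abs(x)),
         "2" = sqrt(sum(x^2)),
         stop("p must be 0, 1 or 2"))
}

# d(f, g) = (1 - alpha) * ||f - g||_p + alpha * w * ||f' - g'||_p
# f, g: curve values; df, dg: derivative values on the same grid
curve_dist <- function(f, g, df, dg, alpha, w, p) {
  (1 - alpha) * lp_norm(f - g, p) + alpha * w * lp_norm(df - dg, p)
}

# Evaluate g(t + s) for an integer shift s, padding with zeros
# outside the grid (the padding is an assumption of this sketch)
shift_curve <- function(g, s) {
  n <- length(g)
  out <- numeric(n)
  idx <- seq_len(n) + s
  ok <- idx >= 1 & idx <= n
  out[ok] <- g[idx[ok]]
  out
}

# Exhaustive search of the integer shift in [-max_shift, max_shift]
# that minimizes the distance between f and the shifted g
best_shift <- function(f, g, max_shift, alpha, w, p) {
  shifts <- seq(-max_shift, max_shift)
  d <- vapply(shifts, function(s) {
    gs <- shift_curve(g, s)
    # forward differences stand in for the spline derivatives
    curve_dist(f, gs, c(diff(f), 0), c(diff(gs), 0), alpha, w, p)
  }, numeric(1))
  list(shift = shifts[which.min(d)], dist = min(d))
}

# Two bell-shaped toy peaks whose summits are 5 grid points apart
grid <- 1:50
f <- dnorm(grid, mean = 20, sd = 3)
g <- dnorm(grid, mean = 25, sd = 3)
res <- best_shift(f, g, max_shift = 10, alpha = 0.5, w = 1, p = 2)
res$shift
```

In this toy setting the search recovers the offset between the two summits. Setting alpha = 0 uses only the curves, and alpha = 1 only their derivatives.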
The GRanges object with new metadata columns.
If shift.peak is TRUE or NULL, i.e. clustering with alignment is performed, the following metadata columns are added:
cluster_shift: for each peak, a vector of length equal to the maximum number of chosen clusters, containing at each position k the label of the cluster the peak is assigned to, when the total number of clusters is k and alignment is performed during clustering. If k is not present in the vector n.clust, the corresponding value is NA.
coef_shift: for each peak, a vector of length equal to the maximum number of chosen clusters, containing at each position k the shift coefficient assigned to the peak, when the total number of clusters is k and alignment is performed during clustering. If k is not present in the vector n.clust, the corresponding value is NA.
dist_shift: for each peak, a vector of length equal to the maximum number of chosen clusters, containing at each position k the distance of the peak from the corresponding center of the cluster, when the total number of clusters is k and alignment is performed during clustering. If k is not present in the vector n.clust, the corresponding value is NA.
If shift.peak is FALSE or NULL, i.e. clustering is performed without alignment, the following metadata columns are added:
cluster_NOshift: for each peak, a vector of length equal to the maximum number of chosen clusters, containing at each position k the label of the cluster the peak is assigned to, when the total number of clusters is k and no alignment is performed during clustering. If k is not present in the vector n.clust, the corresponding value is NA.
dist_NOshift: for each peak, a vector of length equal to the maximum number of chosen clusters, containing at each position k the distance of the peak from the corresponding center of the cluster, when the total number of clusters is k and no alignment is performed during clustering. If k is not present in the vector n.clust, the corresponding value is NA.
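The per-peak layout described above, with position k holding the result for k clusters and NA elsewhere, can be illustrated with a toy base-R sketch. The labels are made up and the code is not output of cluster_peak; it only mimics the indexing convention for a single peak when n.clust = c(2, 4).

```r
# Requested numbers of clusters (hypothetical)
n.clust <- c(2L, 4L)

# Made-up cluster labels of one peak for k = 2 and k = 4
labels_by_k <- list("2" = 1L, "4" = 3L)

# Vector of length max(n.clust); positions k not in n.clust stay NA
cluster_vec <- rep(NA_integer_, max(n.clust))
for (k in n.clust) {
  cluster_vec[k] <- labels_by_k[[as.character(k)]]
}
cluster_vec
```

Positions 2 and 4 hold the labels for the two requested clusterings, while positions 1 and 3 remain NA because those values of k were not requested.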
Alice Parodi, Marco J. Morelli, Laura M. Sangalli, Piercesare Secchi, Simone Vantini
Sangalli, L. M., Secchi, P., Vantini, S. and Vitelli, V. (2010). K-mean alignment for curve clustering. Computational Statistics and Data Analysis, 54, 1219-1233.
choose_k
# load the package (cluster_peak is provided by FunChIP) and the data
library(FunChIP)
data(peaks)
# Cluster and align the data as a function of the number of
# clusters (from 1 to 5), with and without alignment.
# The automatically generated plot can be used to detect the
# optimal number of clusters and the classification method
# to be used (with or without alignment).
clustered_peaks <- cluster_peak(peaks.data.summit, parallel = FALSE,
                                n.clust = 1:5, shift.peak = NULL,
                                weight = 1, alpha = 1, p = 2,
                                plot.graph.k = TRUE, verbose = TRUE)