Approximates the read counts associated with each peak by a suitable B-spline function, yielding a smooth representation of the peaks. The first derivative of the spline is also computed. To obtain a smooth representation, the peak is extended and new initial and final points are identified. See the vignette of the FunChIP package for a graphical representation of the spline approximation.
object |
GRanges object. It must contain the metadata column |
n.breaks |
integer. Number of breaks, or knots, for the B-spline basis domain definition.
Default is |
subsample |
logical. If |
subsample.data |
integer. Number of data points used for the cross-validation (if |
order |
integer. Order of the B-spline basis used for the smoothing. The order is one higher than the degree of the spline. Default is 4 (cubic splines). |
lambda |
vector (or single value). Contains all the candidate values of the smoothing parameter to be considered for the final choice. If a single value is provided, it is used directly for the smoothing. Default value is 10^seq(-5, 5, by = 0.5), to analyze a sufficiently wide set of values. See details below. |
GCV.derivatives |
logical. If |
plot.GCV |
logical. If |
rescale |
logical. If |
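To make the roles of n.breaks and order more concrete, here is a minimal sketch that builds a B-spline basis with the base R splines package. It does not reproduce the internals of smooth_peak, which are not shown on this page; the domain [0, 100] and all variable names are hypothetical. For a B-spline basis with K simple breaks and order m, the number of basis functions is K + m - 2.

```r
library(splines)

n_breaks <- 10    # hypothetical number of breaks
ord <- 4          # order 4 = cubic splines (degree 3)

# Equally spaced breaks on a hypothetical peak domain [0, 100]
breaks <- seq(0, 100, length.out = n_breaks)

# Repeat the boundary breaks so that the basis is complete on the domain
knots <- c(rep(breaks[1], ord - 1), breaks, rep(breaks[n_breaks], ord - 1))

# Evaluate every basis function on a grid strictly inside the domain
x <- seq(0.5, 99.5, by = 1)
basis <- splineDesign(knots, x, ord = ord)

n_basis <- ncol(basis)        # K + m - 2 basis functions
row_sums <- rowSums(basis)    # B-splines sum to 1 at every point
```

Raising n_breaks adds basis functions and therefore flexibility; the smoothing parameter lambda then controls how much of that flexibility is actually used.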
It creates a piecewise polynomial s of fixed order approximating the data (B-spline expansion, Ramsay and Silverman, 2005). Given the pointwise defined function f: (x, f(x)), the smooth_peak method returns the evaluation of s on the x grid, s(x), where s minimizes, for a fixed λ,
ERR(λ) = || f - s ||^2_{L^2} + λ || s'' ||^2_{L^2},
with s'' the second derivative of s and || s ||^2_{L^2} the squared L^2 norm of s, i.e. the integral of s^2 over its domain.
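The trade-off between the two terms of ERR(λ) can be illustrated with a small sketch. It uses smooth.spline from base R as a stand-in for the B-spline smoother, not the code of smooth_peak; the simulated peak and the helper functions sse and roughness are hypothetical. Note that smooth.spline rescales its lambda internally, so the values below are not comparable with the lambda argument of smooth_peak.

```r
set.seed(1)

# Hypothetical peak: a noisy bump sampled on an integer grid
x <- 1:200
f <- 50 * exp(-((x - 100) / 30)^2) + rnorm(length(x), sd = 3)

# Fit cubic smoothing splines with a small and a large penalty
fit_small <- smooth.spline(x, f, lambda = 1e-6)
fit_large <- smooth.spline(x, f, lambda = 1e-1)

# Residual term || f - s ||^2, approximated by a sum over the grid
sse <- function(fit) sum((f - predict(fit, x)$y)^2)

# Roughness term || s'' ||^2, approximated by a sum over the grid
roughness <- function(fit) sum(predict(fit, x, deriv = 2)$y^2)

sse_values <- c(small = sse(fit_small), large = sse(fit_large))
roughness_values <- c(small = roughness(fit_small), large = roughness(fit_large))
```

A larger penalty gives a smoother curve (smaller roughness) at the price of a larger residual, which is why the value of λ has to be chosen with care.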
The choice of λ is crucial for the definition of the spline. It can be selected by minimizing the Generalized Cross-Validation index
GCV(λ) = (n SSE) / (n - df(λ))^2,
where n is the number of data points and SSE is the error, computed as
SSE = || f - s ||^2_{L^2}
if GCV.derivatives = FALSE, or as
SSE = || grad(f) - s' ||^2_{L^2}
if GCV.derivatives = TRUE, and df(λ) is the number of degrees of freedom of the basis expansion, computed automatically from s. For further details on the cross-validation procedure and on the computation of the number of degrees of freedom, see Ramsay and Silverman (2005).
If plot.GCV is TRUE, the GCV index is plotted as a function of λ, which can be used to identify the optimal value of the parameter. If the curve is still decreasing at the largest value of λ, consider extending the range of allowed values of λ to find the minimum of the curve.
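The selection of λ by GCV, together with the check on the edge of the grid, can be sketched as follows. This is an illustration under assumptions, not the implementation used by smooth_peak: smooth.spline from base R stands in for the B-spline smoother, the simulated peak and the grid of candidate values are hypothetical, and the SSE is computed on the data (the GCV.derivatives = FALSE case).

```r
set.seed(1)

# Hypothetical peak: a noisy bump sampled on an integer grid
x <- 1:200
f <- 50 * exp(-((x - 100) / 30)^2) + rnorm(length(x), sd = 3)
n <- length(x)

# Hypothetical grid of candidates, on the internal scale of smooth.spline
lambdas <- 10^seq(-8, 0, by = 0.5)

# GCV(lambda) = n * SSE / (n - df(lambda))^2, with SSE computed on the data
gcv <- sapply(lambdas, function(l) {
  fit <- smooth.spline(x, f, lambda = l)
  sse <- sum((f - predict(fit, x)$y)^2)
  n * sse / (n - fit$df)^2
})

best <- which.min(gcv)
lambda_best <- lambdas[best]

# A minimum on the edge of the grid suggests that the grid is too narrow
at_boundary <- best == 1 || best == length(lambdas)
```

Calling plot(log10(lambdas), gcv, type = "b") on these values draws a GCV curve of the kind described above; when at_boundary is TRUE, the grid should be widened in the corresponding direction before trusting lambda_best.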
The GRanges object with new metadata columns:
width_spline
integer. Value containing the width of the smoothed peak, i.e. the number of non-zero values of the spline approximation. This value is not necessarily equal to the original width of the peak, as the approximation can stretch outside the original width of the peak: to ensure smoothness, some 0 values can be introduced at the edges of the region.
spline
vector. Evaluation of the spline on the grid of size width_spline.
spline_der
vector. Evaluation of the derivatives of the spline on the grid of size width_spline.
start_spline
integer. Genomic coordinate of the initial point of the spline approximation.
end_spline
integer. Genomic coordinate of the final point of the spline approximation.
If rescale is TRUE, two more metadata columns are added:
spline_rescaled
vector. Evaluation of the scaled peak functions on a grid of width equal to the minimum of width_spline.
spline_der_rescaled
vector. Evaluation of the derivatives of the scaled peaks on a grid of width equal to the minimum of width_spline.
Alice Parodi, Marco J. Morelli, Laura M. Sangalli, Piercesare Secchi, Simone Vantini
Ramsay, J.O., Silverman, B.W., 2005. Functional Data Analysis, 2nd ed. Springer, New York, NY.
library(FunChIP)

# load the data
data(peaks)

# Compute the spline approximation of the peaks, given the
# GRanges object with the metadata column counts
# (obtained with the pileup_peak method).
# The remaining parameters keep their default values;
# GCV is computed on the derivatives.
peaks.spline <- smooth_peak(peaks.data, lambda = 10^(-4:6),
                            subsample.data = 50, GCV.derivatives = TRUE)

# Same smoothing, with the rescaled peaks added to the output
peaks.spline.scaled <- smooth_peak(peaks.data, lambda = 10^(-4:6),
                                   subsample.data = 50, GCV.derivatives = TRUE,
                                   rescale = TRUE)