nclass: Compute the Number of Classes for a Histogram

nclass    R Documentation

Compute the Number of Classes for a Histogram

Description

Compute the number of classes for a histogram.

Usage

nclass.Sturges(x)
nclass.scott(x)
nclass.FD(x)

Arguments

x

a data vector.

Details

nclass.Sturges uses Sturges' formula, implicitly basing bin sizes on the range of the data.
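As a minimal sketch of this rule, assuming the usual statement of Sturges' formula as ceiling(log2(n) + 1) for n observations (the helper name sturges_sketch is hypothetical and is not part of any package):

```r
# Hypothetical re-implementation of Sturges' rule, for illustration only.
sturges_sketch <- function(x) {
  n <- length(x)
  ceiling(log2(n) + 1)    # depends only on the number of observations
}

set.seed(1)
x <- stats::rnorm(1111)
k <- sturges_sketch(x)
k                         # 12, because 2^10 < 1111 < 2^11
diff(range(x)) / k        # implied bin width: the data range split into k classes
```

The data enter the formula only through n. The range matters only once the k classes are spread across it, which is the sense in which the bin size is implicit.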

nclass.scott uses Scott's choice for a normal distribution, based on the estimated standard deviation of the data; if that estimate is zero, it returns 1.
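The rule can be sketched as follows, assuming the common normal-reference bin width of 3.5 * s * n^(-1/3), where s is the sample standard deviation. The helper scott_sketch is hypothetical and is not claimed to match the package source line for line.

```r
# Hypothetical sketch of Scott's normal-reference rule; not the package source.
scott_sketch <- function(x) {
  n <- length(x)
  h <- 3.5 * stats::sd(x) * n^(-1/3)   # suggested bin width
  # Zero spread gives h == 0, so fall back to a single class.
  if (h > 0) max(1, ceiling(diff(range(x)) / h)) else 1
}

set.seed(1)
x <- stats::rnorm(1111)
c(sketch = scott_sketch(x), builtin = nclass.scott(x))
scott_sketch(rep(1, 11))               # constant data: returns 1
```

The number of classes is the data range divided by the suggested width, rounded up.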

nclass.FD uses the Freedman-Diaconis choice based on the inter-quartile range (IQR(signif(x, 5))). If that is zero, it tries increasingly extreme symmetric quantiles, up to c(1,511)/512; if that difference is still zero, it reverts to Scott's choice.
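For the ordinary case of a non-zero inter-quartile range, the rule can be sketched as below, assuming the standard Freedman-Diaconis bin width of 2 * IQR * n^(-1/3). The helper fd_sketch is hypothetical; it does not attempt the quantile-widening fallback and returns NA where the real function would continue.

```r
# Hypothetical sketch of the Freedman-Diaconis rule, ordinary case only.
# The fallback to more extreme quantiles is deliberately not reproduced.
fd_sketch <- function(x) {
  n <- length(x)
  h <- 2 * stats::IQR(signif(x, 5)) * n^(-1/3)   # suggested bin width
  if (h > 0) max(1, ceiling(diff(range(x)) / h)) else NA_real_
}

set.seed(1)
x <- stats::rnorm(1111)
c(sketch = fd_sketch(x), builtin = nclass.FD(x))
fd_sketch(rep(1, 11))   # zero IQR: NA here, whereas nclass.FD falls back
```

Because the IQR ignores the tails, this width is less sensitive to outliers than one based on the standard deviation.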

Value

The suggested number of classes.
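A short usage sketch, assuming the result is passed on to hist. The breaks argument of hist treats a single number as a suggestion only, since breakpoints are set to pretty values, so the number of cells actually drawn may differ from the value returned here.

```r
set.seed(1)
x <- stats::rnorm(1111)

k <- nclass.FD(x)
h <- graphics::hist(x, breaks = k, plot = FALSE)
c(suggested = k, used = length(h$counts))

# hist also accepts the rule by name.
h2 <- graphics::hist(x, breaks = "FD", plot = FALSE)
identical(h$breaks, h2$breaks)
```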

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S-PLUS. Springer, page 112.

Freedman, D. and Diaconis, P. (1981). On the histogram as a density estimator: L_2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 57, 453–476. doi:10.1007/BF01025868.

Scott, D. W. (1979). On optimal and data-based histograms. Biometrika, 66, 605–610. doi:10.2307/2335182.

Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice, and Visualization. Wiley.

Sturges, H. A. (1926). The choice of a class interval. Journal of the American Statistical Association, 21, 65–66. doi:10.1080/01621459.1926.10502161.

See Also

hist and truehist (package MASS); dpih (package KernSmooth) for a plug-in bandwidth proposed by Wand (1995).

Examples

set.seed(1)
x <- stats::rnorm(1111)
nclass.Sturges(x)

## Compare them:
NC <- function(x) c(Sturges = nclass.Sturges(x),
      Scott = nclass.scott(x), FD = nclass.FD(x))
NC(x)
onePt <- rep(1, 11)
NC(onePt) # constant data: no longer gives NaN