skKMeans: interface to sklearn.cluster.KMeans using basilisk discipline
In vjcitn/BiocSklearn: interface to python sklearn via Rstudio reticulate

skKMeans

R Documentation

interface to sklearn.cluster.KMeans using basilisk discipline

Description

interface to sklearn.cluster.KMeans using basilisk discipline

Usage

skKMeans(mat, ...)

Arguments

`mat`	a matrix-like datum or reference to such
`...`	arguments to sklearn.cluster.KMeans

Value

a list with cluster assignments (integers starting with zero) and asserted cluster centers.

Note

This is a demonstrative interface to the resources of sklearn.cluster. In this particular interface, we are using sklearn.cluster.k_means_.KMeans. There are many other possibilities in sklearn.cluster: _dbscan_inner, feature_agglomeration, hierarchical, k_means, k_means_elkan, affinity_propagation, bicluster, birch, dbscan, hierarchical, k_means, mean_shift, setup, spectral.

Basilisk discipline has not been used for this function, 1 June 2022.

Examples

irloc = system.file("csv/iris.csv", package="BiocSklearn")
np = reticulate::import("numpy", delay_load=TRUE, convert=FALSE)
h5py = reticulate::import("h5py", delay_load=TRUE)
irismat = np$genfromtxt(irloc, delimiter=',')
ans = skKMeans(irismat, n_clusters=2L)
names(ans) # names of available result components
table(iris$Species, ans$labels)
# now use an HDF5 reference
irh5 = system.file("hdf5/irmat.h5", package="BiocSklearn")
fref = h5py$File(irh5)
ds = fref$`__getitem__`("quants") 
ans2 = skKMeans(np$array(ds)$T, n_clusters=2L) # HDF5 matrix is transposed relative to python array layout!  Is the np$array conversion unduly costly?
table(ans$labels, ans2$labels)
ans3 = skKMeans(np$array(ds)$T, 
   n_clusters=8L, max_iter=200L, 
   algorithm="lloyd", random_state=20L)
dem = skKMeans(iris[,1:4], n_clusters=3L, max_iter=100L, algorithm="lloyd",
   random_state=20L)
str(dem)
tab = table(iris$Species, dem$labels)
tab
plot(iris[,1], iris[,3], col=as.numeric(factor(iris$Species)))
points(dem$centers[,1], dem$centers[,3], pch=19, col=apply(tab,2,which.max))

vjcitn/BiocSklearn documentation built on Aug. 30, 2024, 2:13 p.m.