skKMeans | R Documentation
interface to sklearn.cluster.KMeans using basilisk discipline
skKMeans(mat, ...)
mat | a matrix-like datum or reference to such
... | arguments to sklearn.cluster.KMeans
A list that includes labels (cluster assignments, integers starting at zero) and centers (the asserted cluster centers).
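Because the cluster assignments start at zero while R indexes from one, the labels usually need shifting before they can index R objects. The following is a minimal sketch of that step, not part of the package: `res` is a hand-built, hypothetical stand-in for a returned list (values made up), and it assumes the components are named labels and centers as in the examples on this page.

```r
# Hypothetical stand-in for a skKMeans() result: zero-based labels and a
# centers matrix with one row per cluster (values made up for illustration).
res = list(
    labels = c(0L, 1L, 1L, 0L, 1L),
    centers = matrix(c(5.0, 6.3, 3.4, 2.9), nrow=2,
        dimnames=list(NULL, c("x", "y")))
)

# Shift to one-based indices so the labels can index R objects.
rlabels = res$labels + 1L

# Look up the center assigned to each observation (one row per observation).
assigned = res$centers[rlabels, , drop=FALSE]

# Cluster sizes, tabulated over all clusters (including any empty ones).
sizes = tabulate(rlabels, nbins=nrow(res$centers))
```

Using tabulate with nbins, rather than table, keeps a zero count for any cluster that received no observations.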
This is a demonstrative interface to the resources of sklearn.cluster. In this particular interface, we are using sklearn.cluster.k_means_.KMeans. There are many other possibilities in sklearn.cluster: _dbscan_inner, feature_agglomeration, hierarchical, k_means, k_means_elkan, affinity_propagation, bicluster, birch, dbscan, hierarchical, k_means, mean_shift, setup, spectral.
As of 1 June 2022, basilisk discipline has not been used for this function.
irloc = system.file("csv/iris.csv", package="BiocSklearn")
np = reticulate::import("numpy", delay_load=TRUE, convert=FALSE)
h5py = reticulate::import("h5py", delay_load=TRUE)
irismat = np$genfromtxt(irloc, delimiter=',')
ans = skKMeans(irismat, n_clusters=2L)
names(ans) # names of available result components
table(iris$Species, ans$labels)
# now use an HDF5 reference
irh5 = system.file("hdf5/irmat.h5", package="BiocSklearn")
fref = h5py$File(irh5)
ds = fref$`__getitem__`("quants")
# The HDF5 matrix is transposed relative to the python array layout, hence $T.
# Open question: is the np$array conversion unduly costly?
ans2 = skKMeans(np$array(ds)$T, n_clusters=2L)
table(ans$labels, ans2$labels)
ans3 = skKMeans(np$array(ds)$T,
    n_clusters=8L, max_iter=200L,
    algorithm="lloyd", random_state=20L)
dem = skKMeans(iris[,1:4], n_clusters=3L, max_iter=100L, algorithm="lloyd",
    random_state=20L)
str(dem)
tab = table(iris$Species, dem$labels)
tab
plot(iris[,1], iris[,3], col=as.numeric(factor(iris$Species)))
points(dem$centers[,1], dem$centers[,3], pch=19, col=apply(tab,2,which.max))