kmeans | bellplay~

`kmeans`

kmeans(
    @numclusters 4
    @centroids null
    @encoding 0.25
    @maxiter 100
    @mode 0
    @useseed 0
) -> llll

Creates a K-Means clustering object for clustering and classification tasks. kmeans learns clusters from a dataset, so that each point in the data can be assigned a discrete membership in a group, or cluster. The algorithm works by partitioning points into discrete groups that ideally have equal variance. For a more technical explanation, visit https://scikit-learn.org/stable/modules/clustering.html#k-means.
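As a rough illustration of the partitioning described above, the following Python sketch runs the classic Lloyd iteration: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. This is not bellplay~'s implementation. The function names (`lloyd_kmeans`, `assign`, `update`) and the handling of empty clusters are assumptions made for this example; the `centroids` and `max_iter` parameters only loosely mirror `@centroids` and `@maxiter`.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption for this sketch: an empty cluster keeps its centroid.
            new_centroids.append(list(old))
        else:
            new_centroids.append(
                [sum(m[d] for m in members) / len(members) for d in range(len(old))]
            )
    return new_centroids


def lloyd_kmeans(points, num_clusters, centroids=None, max_iter=100, seed=0):
    """Hypothetical helper: returns (centroids, labels) after fitting."""
    if centroids is None:
        # Without initial centroids, start from randomly chosen data points.
        rng = random.Random(seed)
        centroids = rng.sample(points, num_clusters)
    centroids = [list(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments are stable, so the fit has converged
        labels = new_labels
    return centroids, labels


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = lloyd_kmeans(points, 2, centroids=[[0, 0], [0, 1]])
```

Even though both initial centroids here sit inside the same group, the iteration separates the two groups after a couple of passes.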

Arguments

@numclusters [int]: The number of clusters to classify data into. (default: 4).
@centroids [null/llll]: A list of cluster centroids used to initialize or seed the algorithm, given as an N x M matrix, where N is the number of clusters and M is the number of data dimensions. (default: null).
@encoding [float]: The encoding threshold (i.e., the alpha parameter). When used for feature learning, this can be used to produce sparser output features by setting the least active output dimensions to 0. Ignored if @mode is 0. (default: 0.25).
@maxiter [int]: The maximum number of iterations the algorithm will use whilst fitting. (default: 100).
@mode [int]: KMeans clustering mode. (default: 0).
- 0: Euclidean (standard) KMeans, minimizing squared distance between points and cluster centers.
- 1: Spherical, clustering by cosine similarity after normalizing all vectors to unit length.
@useseed [int]: Whether to use a random seed for parameter initialization. (default: 0).
- 0: Off
- 1: On
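To make the difference between the two `@mode` values concrete, here is a minimal Python sketch of the assignment step under each rule. It is illustrative only and is not bellplay~'s code. The helper names (`nearest_euclidean`, `nearest_spherical`) and the example vectors are made up for this example.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def nearest_euclidean(point, centroids):
    """Mode 0 idea: pick the centroid with the smallest squared distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def nearest_spherical(point, centroids):
    """Mode 1 idea: normalize, then pick the largest cosine similarity."""
    unit = normalize(point)
    return max(
        range(len(centroids)),
        key=lambda k: sum(p * c for p, c in zip(unit, normalize(centroids[k]))),
    )


centroids = [[1.0, 0.0], [0.0, 10.0]]
point = [2.0, 3.0]

euclidean_label = nearest_euclidean(point, centroids)
spherical_label = nearest_spherical(point, centroids)
```

With these values the point is closer to the first centroid in distance but closer to the second in direction, so the two rules disagree. That is why the choice of mode matters when the magnitude of a vector carries little meaning.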

Output

KMeans object [llll]

Usage

$numcentroids = 4; ## number of clusters
## dummy 2D centroids
$centroids = for $i in 1...4 collect [randn() randn()];
$dataset = dataset(
    ## data points near fake centroids
    for $i in 1...300 collect (
        $c = choose($centroids)::1;
        $point = $c + (randn() randn()) * 0.25;
        [$point] 
    ) 
);
## create kmeans instance
$model = kmeans(@numclusters $numcentroids);
## training loop
for $i in 1...10 do fit($model, $dataset);
## get predicted cluster label for each point in dataset
$labelset = predict($model, $dataset);
## convert dataset and labelset into SQL table for visualization
dataset2dbtable(
    @dataset $dataset
    @labelset $labelset
    @labelfield 'cluster' 
);
## trigger visualization
browsedbtable(
    @colorfield 'cluster' @shapefield 'cluster' 
)
