Kmeans from scratch
Kmeans is an incredibly simple algorithm, with concepts that can easily be explained to anyone. But to understand something better, it is sometimes nice to build it from scratch. R and Python both have some amazing libraries for clustering, but the beauty of Kmeans lies in the fact that it is so simple that its implementation can fit into a tutorial like this one.
Let’s refresh what Kmeans is
In An Introduction to Statistical Learning with Applications in R it is stated that:
K-means clustering is a simple and elegant approach for partitioning a data set into K distinct, non-overlapping clusters
Kmeans basically groups observations based on similarity. Here are the rules to understand it:
 Centroids are almost always values calculated based on the mean (except in the initial stage).
 There are K clusters to classify the data into; usually this requires some prior knowledge.
 There is some distance function that measures the difference between one observation and another (sometimes distance is referred to as similarity).
 There is no overlap between clusters, so if an observation is in cluster A it cannot be in cluster B at the same time.
The algorithm goes as follows (from ISLR):

1. Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments for the observations.

2. Iterate until the cluster assignments stop changing:

 (a) For each of the K clusters, compute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.

 (b) Assign each observation to the cluster whose centroid is closest (where closest is defined using Euclidean distance).

Looks doable, right? Well, let's code it.
The Code in R
The distance function definition is crucial; it is the core of Kmeans, along with the concept of centroids. For the purpose of this example, Euclidean distance will be used, but there is no reason not to use any other distance function; in fact, depending on the data, Euclidean distance may often not be suitable.
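A minimal sketch of that one-liner, assuming the function is called euclid (the name is hypothetical, and any other distance with the same signature could be swapped in):

```r
# Euclidean distance between two numeric vectors of equal length.
euclid <- function(a, b) sqrt(sum((a - b)^2))
```

For example, euclid(c(0, 0), c(3, 4)) returns 5.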
Super easy, a one-liner. Not much to see, move along.
Now the fun part: defining the kmeans function.
The signature reflects the three minimum requirements for kmeans:
 K, the number of clusters.
 The data to be clustered.
 The number of iterations to run the algorithm (this one could be optional, but we need to stop at some point, so it is wise to use it).
Initially the centroids are picked at random. This step is one of the most important ones; in this implementation the initial centroids will be chosen from among the elements of the data, but it does not necessarily have to be that way.
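That random initialization could be sketched like this, assuming the data is a numeric matrix and sampling K of its rows as starting centroids (the function and argument names are hypothetical):

```r
# Pick k initial centroids by sampling k distinct rows of the data.
# Other schemes (random points in feature space, k-means++) work too.
init_centroids <- function(k, data) {
  data[sample(nrow(data), k), , drop = FALSE]
}
```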
Then the first iteration initializes the observations and groups them based on the randomly selected centroids.
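That first grouping step might look like the following, assuming Euclidean distance and hypothetical names:

```r
# For every row of `data`, return the index (1..k) of the nearest
# centroid, where "nearest" means smallest Euclidean distance.
assign_clusters <- function(data, centroids) {
  apply(data, 1, function(obs) {
    which.min(apply(centroids, 1, function(ctr) sqrt(sum((obs - ctr)^2))))
  })
}
```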
By plotting the results we get the following. Blue dots represent the centroids_points. It is obvious how the distance function groups each observation based on its proximity to a centroid.
Now the rest of the algorithm is rather trivial: the updated centroid is the mean of each cluster, which allows the centroid to move towards the center of its grouped observations and then regroup new observations.
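The update step, as described, just takes the per-cluster mean (a sketch with hypothetical names, assuming a numeric matrix and a vector of cluster labels):

```r
# New centroid i = mean of all observations currently labeled i.
# (An empty cluster yields NaN means; a production version would
# re-seed such a centroid.)
update_centroids <- function(data, clusters, k) {
  t(sapply(1:k, function(i) colMeans(data[clusters == i, , drop = FALSE])))
}
```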
So there are two conditions for the algorithm to finally finish: either the centroids converge, meaning there is no change between the last centroids and the current ones, or the maximum number of iterations has been reached.
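Putting both stopping conditions together, the whole procedure can be sketched as one function. This is a minimal sketch under the assumptions above, not the post's exact code; my_kmeans and max_iter are hypothetical names:

```r
my_kmeans <- function(k, data, max_iter = 100) {
  # Initial centroids: k rows of the data picked at random.
  centroids <- data[sample(nrow(data), k), , drop = FALSE]
  clusters <- integer(nrow(data))
  for (iter in 1:max_iter) {
    # Assignment step: nearest centroid by Euclidean distance.
    clusters <- apply(data, 1, function(obs)
      which.min(apply(centroids, 1, function(ctr) sqrt(sum((obs - ctr)^2)))))
    # Update step: each centroid moves to the mean of its cluster.
    new_centroids <- t(sapply(1:k, function(i)
      colMeans(data[clusters == i, , drop = FALSE])))
    # Stop early if the centroids stopped moving (convergence).
    if (isTRUE(all.equal(new_centroids, centroids, check.attributes = FALSE)))
      break
    centroids <- new_centroids
  }
  list(clusters = clusters, centroids = centroids)
}
```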