K-means Clustering
K-means clustering is a popular unsupervised machine learning algorithm used to group similar data points together. It is widely used in applications such as document clustering, market segmentation, and image processing.
K-means clustering is essentially a vector quantization method: we aim to partition “n” observations into “k” clusters, where each observation belongs to the cluster with the nearest mean, and that mean serves as the prototype of the cluster. Before we dive into k-means, let us look at what clustering is.
What is Clustering?
Clustering is the grouping of objects into different groups so that objects in the same group have similar characteristics. We may say that it partitions a data set into subsets, or clusters, so that each subset (ideally) shares a common trait.
Types of Clustering
- Hierarchical Algorithms
- Partitional Clustering
K-means clustering is a type of Partitional Clustering.
Here we put “n” objects into “k” partitions, where k < n.
In practice this is formulated as an optimization problem: the grouping is done by minimizing the sum of squared distances between each data point and the centroid of its cluster. The clustering is subjective: the result depends on choices you make, such as the number of clusters k.
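To make the "sum of squares" idea concrete, here is a minimal sketch in plain Python. The function names and the toy data are my own, made up for illustration. It scores two different groupings of the same four points against the same two centroids, held fixed for the comparison.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_cluster_sum_of_squares(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        squared_distance(point, centroids[label])
        for point, label in zip(points, labels)
    )


# Hypothetical toy data: two tight groups along a line.
points = [(1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centroids = [(1.5, 0.0), (10.5, 0.0)]

# Grouping that matches the two visible groups.
good = within_cluster_sum_of_squares(points, [0, 0, 1, 1], centroids)
# Grouping that mixes the two groups.
bad = within_cluster_sum_of_squares(points, [0, 1, 0, 1], centroids)

print(good, bad)  # 1.0 145.0
```

The grouping that keeps nearby points together gets the much smaller score, and that is the quantity k-means tries to drive down.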
Methodology:
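- Select the number of clusters, k, that you want to find in your data. Say “3”.
- Randomly select “3” distinct data points to serve as the initial cluster centroids.
- Measure the distance between the 1st point and each of the 3 initial centroids.
- Assign the first point to the nearest cluster and proceed to do the same for the next points.
- Calculate the mean for each cluster. These means become the new centroids.
- Repeat the assignment and mean steps with the new centroids until the assignments stop changing.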
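The steps above can be sketched in a few lines of plain Python, with the assign-and-update steps repeated until the assignments stop changing. This is only a teaching sketch, not a production implementation. The names `k_means`, `mean_point`, `seed`, and `max_iterations` are my own choices, the toy data is made up, and I assume the input points are all different from each other.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Randomly pick k data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean_point(members)
    return labels, centroids


# Hypothetical toy data with three loose groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
    (1.0, 9.0), (0.5, 8.0), (1.5, 9.5),
]
labels, centroids = k_means(points, k=3)
print(labels)
print(centroids)
```

The exact labels you get depend on which points are picked as the starting centroids, so try a few different `seed` values and compare.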
This blog post was just a basic outline of k-means clustering. I don't know why, but I find this concept similar to the “Pigeonhole Principle”.
K-means is a simple and powerful algorithm: it is efficient, flexible, and scales well to large data sets, which is why it is often the first clustering algorithm people try.
However, K-means clustering has some drawbacks as well. It may not perform well when the data is high-dimensional or when the clusters overlap, because it implicitly assumes that the clusters are roughly spherical and similar in size. The algorithm might also reach a local rather than a global optimum, and the outcome can be sensitive to the initial placement of the centroids.
End note: Choose the algorithm tailored to your problem. Not every algorithm needs to be high-end or sophisticated to solve your problem, so it is better to pick the one that suits it. On that note, I will end my blog post for today. See you next Monday!
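Here is a small sketch of the local-optimum problem, again in plain Python with made-up data and a helper name (`run_from`) of my own. Four points sit on the corners of a wide rectangle, and the same assign-and-update loop is started from two different pairs of initial centroids. A common remedy, shown on the last line, is to try several starting points and keep the run with the smallest sum of squares.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def run_from(points, initial_centroids, max_iterations=100):
    """Run the assign/update loop from the given starting centroids.

    Returns (labels, centroids, sum of squared distances).
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    total = sum(
        squared_distance(p, centroids[label])
        for p, label in zip(points, labels)
    )
    return labels, centroids, total


# Corners of a rectangle that is 4 wide and 1 tall.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start with one centroid on each short side: ends with left vs right.
labels_a, _, total_a = run_from(points, [(0.0, 0.0), (4.0, 0.0)])
# Start with both centroids on the same short side: gets stuck on top vs bottom.
labels_b, _, total_b = run_from(points, [(0.0, 0.0), (0.0, 1.0)])

print(labels_a, total_a)  # [0, 0, 1, 1] 1.0
print(labels_b, total_b)  # [0, 1, 0, 1] 16.0

# Keep the better of the two runs.
best_total = min(total_a, total_b)
```

Both runs stop because no point changes cluster any more, yet the second one ends with a sum of squares sixteen times larger than the first.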
Till then take care❤