K-Means Algorithm: An Unsupervised Clustering Approach using Various Similarity/Dissimilarity Measures

Patel, SS and Kumar, N and Aswathy, J and Vaddadi, SK and Akbar, SA and Panchariya, PC (2021) K-Means Algorithm: An Unsupervised Clustering Approach using Various Similarity/Dissimilarity Measures. In: 4th International Conference on Intelligent Sustainable Systems (ICISS-2021), February 26-27, 2021, SCAD College of Engineering and Technology, Tirunelveli, India.

Clustering is an unsupervised method of grouping data objects according to features or properties, with closeness between objects quantified by similarity or dissimilarity measures. K-Means is one of the most popular clustering methods and falls under the category of hard clustering, in which each data object belongs to exactly one cluster. In soft clustering methods (e.g. fuzzy c-means clustering), by contrast, a data object can belong to more than one cluster, each to a degree specified by a membership value, subject to the constraint that these membership values sum to one. Although K-Means is a fairly old technique, it still enjoys immense popularity in data grouping applications and machine learning. In this paper, the K-Means approach is explored with five different distance measures, namely Euclidean, Squared Euclidean, Half Squared Euclidean, Cosine and City Block, and a comparative study is made of the performance of these similarity criteria on a real-time edible oil dataset acquired using MIR spectroscopy. The paper also investigates which similarity measure performs well for a particular set of data carrying a unique pattern. The K-Means algorithm with the various similarity/dissimilarity measures has been formulated and implemented in the MATLAB R2015b environment provided by MathWorks.
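To make the five distance measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' MATLAB implementation. The function names, the toy data, and the choice to update every centroid as the arithmetic mean of its members are assumptions made for illustration; some implementations use other centroid updates for non-Euclidean measures (for example, a component-wise median for city block distance).

```python
import math

def squared_euclidean(a, b):
    # Sum of squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(a, b))

def half_squared_euclidean(a, b):
    # Half of the squared Euclidean distance.
    return 0.5 * squared_euclidean(a, b)

def cosine(a, b):
    # One minus the cosine of the angle between a and b.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def city_block(a, b):
    # Sum of absolute coordinate differences (Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(points, init_centroids, distance, max_iter=100):
    # Hard clustering: each point is assigned to exactly one centroid.
    # Assumption: centroids are updated as the mean of their members
    # regardless of which distance function is used.
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, [points[0], points[3]], euclidean)
```

Passing `squared_euclidean`, `half_squared_euclidean` or `city_block` in place of `euclidean` gives the same grouping on this toy data. Cosine distance depends only on the direction of each vector and not on its magnitude, so it would not separate these two groups, which lie along nearly the same direction from the origin; this is one way the choice of measure can matter for a given data pattern.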

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Supervised/Unsupervised learning, Clustering, Classification, K-Means clustering, Similarity and Dissimilarity measures
Subjects: Electronic Systems > Digital Systems
Divisions: Electronic Systems
Depositing User: Mr. Jitendra Nath Bajpai
Date Deposited: 14 Sep 2021 07:14
Last Modified: 14 Sep 2021 07:14
URI: http://ceeri.csircentral.net/id/eprint/571
