
ENHANCING CLUSTER INTERPRETABILITY IN K-MEANS VIA PCA VISUALIZATION, FEATURE NORMALIZATION, AND ROBUST INITIALIZATION METHODS
Tuxtabayev Qudratillo Axmadjanovich, Ergasheva Shohsanam Elmurod qizi, National University of Uzbekistan
Abstract
The k-Means algorithm is one of the most widely used unsupervised learning methods for partitioning data into homogeneous groups. This study explores the efficacy of k-Means clustering on three datasets: a synthetic 2D blob dataset and two real-world datasets, one for customer segmentation based on spending behavior and one of global cities described by their geographic coordinates. We analyze the influence of feature scaling, initial centroid selection (random vs. k-means++), and the number of clusters (k), using the silhouette score, the elbow method, and PCA visualizations. The study also highlights limitations of k-Means, such as its sensitivity to outliers and its difficulty with non-convex cluster shapes, and offers guidelines for practical implementation.
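To make the workflow in the abstract concrete, the following is a minimal, standard-library-only Python sketch. It is not the authors' code and assumes small, in-memory numeric data. The helper names (`standardize`, `init_random`, `init_kmeans_pp`, `lloyd`, `silhouette`, `toy_blobs`) are hypothetical. The sketch z-score scales the features, compares random seeding with k-means++ seeding, runs Lloyd's iterations, and reports inertia (for the elbow method) and the mean silhouette score for k = 2 to 6. PCA projection for plotting is omitted here to avoid third-party dependencies. In practice it would typically be done with a library routine such as scikit-learn's `PCA`.

```python
import math
import random

def standardize(points):
    """Z-score each feature so that no feature dominates Euclidean distance."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [math.sqrt(sum((p[j] - means[j]) ** 2 for p in points) / n) or 1.0
            for j in range(d)]
    return [[(p[j] - means[j]) / stds[j] for j in range(d)] for p in points]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_random(points, k, rng):
    """Random seeding: k distinct data points chosen uniformly."""
    return [list(p) for p in rng.sample(points, k)]

def init_kmeans_pp(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids

def lloyd(points, centroids, max_iter=100):
    """Lloyd's iterations: assign points to the nearest centroid, then move each
    centroid to its cluster mean, until the assignment stops changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia (within-cluster sum of squares) is what the elbow method plots.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b): a is the mean distance to the point's
    own cluster, b the smallest mean distance to any other cluster."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        mean_dist = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            mean_dist[c] = sum(ds) / len(ds)
        a = mean_dist[own]
        b = min(v for c, v in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def toy_blobs(rng, centers, n_per=50, spread=0.5):
    """Hypothetical synthetic 2D blobs (not scikit-learn's make_blobs)."""
    return [[cx + rng.gauss(0, spread), cy + rng.gauss(0, spread)]
            for cx, cy in centers for _ in range(n_per)]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible runs
    X = standardize(toy_blobs(rng, [(0, 0), (6, 0), (3, 5)]))
    for k in range(2, 7):
        for name, init in (("random", init_random), ("k-means++", init_kmeans_pp)):
            labels, _, inertia = lloyd(X, init(X, k, rng))
            print(f"k={k} {name:<9} inertia={inertia:8.2f} "
                  f"silhouette={silhouette(X, labels):.3f}")
```

Run as a script, it prints one line per (k, seeding) pair. The candidate number of clusters is the k at which inertia stops dropping sharply and the silhouette score is highest. Drawing all randomness from one seeded `random.Random` instance keeps the comparison between seeding strategies reproducible.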
Keywords
k-Means, unsupervised learning, clustering, silhouette score, elbow method, k-means++, PCA visualization, customer segmentation, geospatial clustering.
Copyright License

This work is licensed under a Creative Commons Attribution 4.0 International License.