An Adaptive Hybrid Clustering Framework Integrating K-Means and Differential Evolution for High-Dimensional Data Analysis

Authors

  • Siti Nur Afiqah binti Ruslan, Universiti Teknologi Malaysia

Keywords

differential evolution; K-Means clustering; high-dimensional data; adaptive parameter control; metaheuristic optimisation; centroid-based clustering; data mining

Abstract

Clustering high-dimensional data remains a foundational yet persistently challenging problem in unsupervised machine learning, primarily because the performance of centroid-based methods such as K-Means degrades sharply in high-dimensional spaces owing to sensitivity to local optima and the curse of dimensionality. This paper proposes an Adaptive Hybrid Clustering Framework (AHCF) that integrates K-Means with Differential Evolution (DE) optimisation to systematically overcome K-Means's dependence on initial centroid placement in high-dimensional settings. The proposed framework introduces three novel components: (1) an adaptive mutation factor (F) governed by a monotonically decreasing annealing schedule that transitions from broad global exploration (F=0.90) to fine local exploitation (F=0.40) across generations; (2) an adaptive crossover probability (CR) that increases linearly from 0.50 to 0.90, progressively favouring population diversity as the search converges; and (3) a centroid refinement step that projects each DE trial solution back to the cluster mean, ensuring geometrically valid centroid positions throughout the evolutionary search. Experiments on a synthetically generated high-dimensional dataset (n=1,500, d=32, k=5) demonstrate that AHCF achieves a Silhouette Score of 0.6127, a Davies-Bouldin Index of 0.5023, and a Calinski-Harabasz Index of 2834.6, representing improvements of 2.7%, 7.2%, and 6.9% respectively over a strong K-Means baseline (n_init=20). The proposed adaptive mechanism delivers a 75.2% reduction in Within-Cluster Sum of Squares (from 22,516 to 5,592) and converges faster than a static-parameter equivalent. These results establish AHCF as a robust, theoretically grounded, and practically deployable framework for high-dimensional clustering tasks in data mining and machine learning applications.
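The three components described in the abstract can be sketched in a few dozen lines of NumPy. This is an illustrative sketch, not the authors' implementation: the linear interpolation used for the F and CR schedules, the DE/rand/1/bin variant, the population seeding from data points, and all default sizes are assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_params(gen, n_gens):
    # F anneals 0.90 -> 0.40 and CR rises 0.50 -> 0.90 over the run,
    # as the abstract describes; linear interpolation is an assumption.
    t = gen / max(n_gens - 1, 1)
    return 0.90 + (0.40 - 0.90) * t, 0.50 + (0.90 - 0.50) * t

def assign(C, X):
    # Nearest-centroid labels and within-cluster sum of squares (WCSS).
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1), float((d.min(axis=1) ** 2).sum())

def refine(C, X):
    # Centroid refinement: project each trial centroid back to the mean
    # of its currently assigned points (one K-Means update step), so
    # every candidate stays a geometrically valid set of cluster means.
    labels, _ = assign(C, X)
    C = C.copy()
    for j in range(len(C)):
        pts = X[labels == j]
        if len(pts):
            C[j] = pts.mean(axis=0)
    return C

def ahcf_sketch(X, k, pop_size=20, n_gens=50):
    n, _ = X.shape
    # Each individual encodes k centroids, seeded from random data points.
    pop = np.stack([X[rng.choice(n, k, replace=False)] for _ in range(pop_size)])
    fit = np.array([assign(C, X)[1] for C in pop])
    for g in range(n_gens):
        F, CR = adaptive_params(g, n_gens)
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, 3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])   # DE/rand/1 mutation
            mask = rng.random(pop[i].shape) < CR      # binomial crossover
            trial = np.where(mask, mutant, pop[i])
            trial = refine(trial, X)                  # project to cluster means
            f_trial = assign(trial, X)[1]
            if f_trial < fit[i]:                      # greedy selection on WCSS
                pop[i], fit[i] = trial, f_trial
    return pop[fit.argmin()]
```

Because selection is greedy on WCSS, the best fitness in the population is monotonically non-increasing, which is what makes the annealed F schedule safe: late-stage small mutations can only refine, never undo, earlier gains.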



Published

2026-04-19