Big data clustering through fusion of FCM, optimized encoder-decoder CNN, and BiLSTM

F. Belhabib; K. El Moutaouakil; S. Rbihou; A. Elafaar

Clustering is a fundamental step in processing and analyzing massive datasets. As an unsupervised learning method, it aims to organize large datasets into homogeneous clusters without relying on pre-existing labels. Our approach seeks to improve this process by combining three techniques: fuzzy C-means (FCM), an optimized encoder–decoder CNN, and a bidirectional long short-term memory network (BiLSTM), a combination that brings together supervised and unsupervised paradigms. The BiLSTM processes each data sequence in both directions, forward and backward, using LSTM cells, which improves the modeling of sequential structure in the data. FCM is improved by introducing a function that computes the separation between each cluster center and each instance, which increases clustering precision. To improve performance and reduce computation time, we use the optimized encoder–decoder CNN, whose refined architecture extracts data features more efficiently and thereby improves clustering quality. We evaluate the approach on the Fashion-MNIST dataset using accuracy, the adjusted Rand index (ARI), and normalized mutual information (NMI). In comparative analyses, our approach significantly outperforms existing models, demonstrating its effectiveness for Big Data clustering.
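For readers unfamiliar with FCM, the membership and center updates it relies on can be sketched as follows. This is a minimal illustration of the standard algorithm with plain Euclidean distance between instances and cluster centers, not the modified separation function proposed in the paper, and it leaves out the encoder–decoder CNN and the BiLSTM entirely. The function name `fuzzy_c_means`, its parameters, and the synthetic two-blob data are assumptions made for this example.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy C-means (illustrative sketch, not the paper's variant).

    X: array of shape (n_samples, n_features)
    m: fuzzifier (> 1); larger values give softer memberships
    Returns (centers, U), where U[i, j] is the membership of sample i in cluster j.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Center update: mean of the data weighted by memberships raised to m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance between every instance and every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero

        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U


# Synthetic demo data (hypothetical): two well-separated 2-D blobs.
demo_rng = np.random.default_rng(1)
X = np.vstack([
    demo_rng.normal(0.0, 0.3, size=(50, 2)),
    demo_rng.normal(5.0, 0.3, size=(50, 2)),
])
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment taken from the soft memberships
```

In a pipeline like the one described above, `X` would presumably hold features extracted by the encoder rather than raw inputs. When ground-truth labels are available for evaluation, the hard assignments in `labels` could be scored with scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score`.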

fuzzy C-Means (FCM)

clustering

optimized encoder-decoder

bidirectional recurrent neural network (BiLSTM)
