clustering

Numerical Optimization Method for Clustering in Content-Based Image Retrieval Systems

The object of the study is the process of organizing a descriptor repository in content-based image retrieval systems. The subject of the study is a method of numerical optimization of descriptor clustering in a multidimensional space. The aim of this work is to develop a clustering optimization method in the Multidimensional Cube model to improve search efficiency. The core idea is to ensure a more uniform distribution of descriptors across clusters by adjusting interval boundaries in each dimension, which reduces imbalance in cluster density and improves retrieval performance.
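The idea of adjusting interval boundaries per dimension can be illustrated with a small sketch. It is not the optimization method of the article: it swaps in plain quantile-based boundaries as a simple stand-in, and the function names (`equal_width_edges`, `quantile_edges`, `occupancy`) are hypothetical. It compares how many descriptor coordinates fall into each interval along one dimension when the boundaries are equally spaced and when they follow the data.

```python
from bisect import bisect_right

def equal_width_edges(values, bins):
    """Inner boundaries that split [min, max] into intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]

def quantile_edges(values, bins):
    """Inner boundaries placed at sample quantiles (assumed stand-in for
    the article's numerical optimization of boundaries)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]

def occupancy(values, edges):
    """Number of values that fall into each interval defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts

# Skewed coordinates along one dimension: most values are close to zero.
values = [(i / 100) ** 3 for i in range(100)]

uniform_counts = occupancy(values, equal_width_edges(values, 4))
balanced_counts = occupancy(values, quantile_edges(values, 4))
```

With equally spaced boundaries most values land in the first interval, while data-driven boundaries give every interval the same share. In a multidimensional cube the same adjustment would be applied to each dimension separately, which balances the marginal distributions but does not by itself guarantee equally filled cells.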

Development of an Automated Natural Language Text Analysis System Using Transformers

This article examines the development of an automated medical text analysis system based on modern artificial intelligence and natural language processing technologies. It analyzes the current state of and prospects for automated medical text analysis, and reviews the main methods and technologies used in this field, including machine learning, deep learning, and natural language processing.

Study of the Effectiveness of Applying the K-Means Method to Decompose Large-Scale Traveling Salesman Problems

The problem is decomposed by clustering the input set of points with the well-known k-means method, combined with an algorithm for extending partial solutions within clusters. The k-means clustering algorithm is examined as a way to partition the input data of large-scale TSP instances into smaller subproblems, and its efficiency in reducing problem size is substantiated. Based on experiments, a hierarchical version of the algorithm is proposed for problems with more than one million points.
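A minimal sketch of this kind of decomposition follows. It makes two assumptions that are not taken from the article: the k-means step is a plain Lloyd iteration written from scratch, and a nearest-neighbor heuristic stands in for the algorithm that extends partial solutions within clusters. All function names are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns a cluster label for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its closest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def nearest_neighbor_tour(points, indices):
    """Greedy path through the given point indices (illustrative stand-in)."""
    if not indices:
        return []
    tour = [indices[0]]
    remaining = set(indices[1:])
    while remaining:
        last = points[tour[-1]]
        nxt = min(remaining, key=lambda i: math.dist(last, points[i]))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour

def decomposed_tour(points, k):
    """Solve each cluster as a separate subproblem and concatenate the paths."""
    labels = kmeans(points, k)
    tour = []
    for c in range(k):
        indices = [i for i, lab in enumerate(labels) if lab == c]
        tour.extend(nearest_neighbor_tour(points, indices))
    return tour

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(200)]
tour = decomposed_tour(points, k=4)
```

Each subproblem here is roughly a quarter of the original size, which is what makes the decomposition attractive for large instances. The sketch visits clusters in label order, so the links between clusters are not optimized. A hierarchical variant would apply the same split again inside any cluster that is still too large.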

Comparison and Clustering of Textual Information Sources Based on the Cosine Similarity Algorithm

This article presents a study aimed at developing an optimal concept for analyzing and comparing information sources based on large volumes of text using natural language processing (NLP) methods. The object of the study was Telegram news channels, which were used as sources of text data. The texts were preprocessed (cleaning, tokenization, and lemmatization) to form a global dictionary of the unique words from all information sources.
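The comparison step can be sketched as follows, assuming each source is represented by a vector of word counts over the global dictionary. The sample texts and function names are hypothetical, and lemmatization is omitted: the tokenizer only lowercases the text and splits it into word characters.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (no lemmatization)."""
    return re.findall(r"\w+", text.lower())

def build_vocabulary(token_lists):
    """Global dictionary: the sorted unique words from all sources."""
    return sorted({tok for tokens in token_lists for tok in tokens})

def vectorize(tokens, vocabulary):
    """Word-count vector of one source over the global dictionary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical sample texts standing in for the content of three sources.
sources = {
    "channel_a": "markets rise as rates fall",
    "channel_b": "rates fall and markets rise",
    "channel_c": "team wins the final match",
}
tokens = {name: tokenize(text) for name, text in sources.items()}
vocabulary = build_vocabulary(tokens.values())
vectors = {name: vectorize(toks, vocabulary) for name, toks in tokens.items()}

sim_ab = cosine_similarity(vectors["channel_a"], vectors["channel_b"])
sim_ac = cosine_similarity(vectors["channel_a"], vectors["channel_c"])
```

Sources that share most of their vocabulary score close to 1, and sources with no words in common score 0. The resulting pairwise similarity matrix is the kind of input a clustering step could then work from.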

Methods of Building a Model of User Behavior

A number of clustering methods and algorithms were analyzed, and the peculiarities of their application were identified. The main advantages of density-based clustering methods are the ability to detect arbitrarily shaped clusters of different sizes and robustness to noise and outliers. Their disadvantages include high sensitivity to input parameters, poor class description, and unsuitability for large datasets. The analysis showed that the main problem of all clustering algorithms is their scalability as the amount of processed data grows.
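The trade-offs of density-based clustering can be seen in a compact sketch of DBSCAN, used here as a representative of the family; the article does not say which algorithms were examined. The parameter names `eps` and `min_pts` and the sample points are illustrative assumptions.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return a label per point: 0, 1, ... for clusters and NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

good = dbscan(points, eps=1.5, min_pts=3)
too_strict = dbscan(points, eps=0.5, min_pts=3)
```

With `eps=1.5` the two groups are recovered and the isolated point is marked as noise, while `eps=0.5` marks every point as noise, which shows the sensitivity to input parameters. The naive `region_query` scans all points on every call, so the sketch takes quadratic time, one reason such methods struggle on large datasets unless a spatial index is added.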