clustering

Development of an Automated Natural Language Text Analysis System Using Transformers

This article studies the development of an automated medical text analysis system based on modern artificial intelligence and natural language processing technologies. It analyzes the current state of, and prospects for, automated medical text analysis, and examines the main methods and technologies used in the field, including machine learning, deep learning, and natural language processing.

Study of the Effectiveness of Applying the K-Means Method to Decompose Large-Scale Traveling Salesman Problems

The problem is decomposed by clustering the input set of points with the well-known k-means method, combined with an algorithm for extending partial solutions within clusters. The k-means clustering algorithm is examined as a way to partition the input data of large-scale TSP instances into smaller subproblems, and its efficiency in reducing problem size is substantiated. Based on experiments, a hierarchical version of the algorithm is proposed for problems with more than one million points.
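The decomposition idea can be illustrated with a minimal sketch. It assumes 2-D Euclidean points and a plain Lloyd-style k-means. The greedy nearest-neighbour tour is only a stand-in for the article's algorithm for extending partial solutions, which the abstract does not specify. All function names and the sample data are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def nearest_neighbour_tour(points):
    """Greedy tour through one cluster (assumed stand-in for the real solver)."""
    if not points:
        return []
    remaining = list(points[1:])
    tour = [points[0]]
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(tour[-1], p))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def decompose_and_solve(points, k):
    """Split the points into k clusters and build one partial tour per cluster."""
    labels = kmeans(points, k)
    clusters = [[p for p, lab in zip(points, labels) if lab == c] for c in range(k)]
    return [nearest_neighbour_tour(cluster) for cluster in clusters]


# Hypothetical instance: 200 random cities in the unit square.
rng = random.Random(42)
cities = [(rng.random(), rng.random()) for _ in range(200)]
partial_tours = decompose_and_solve(cities, k=5)
```

Each subproblem here holds only a fraction of the points, which is where the reduction in problem size comes from. Joining the partial tours into one full tour is left out of the sketch.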

Comparison and Clustering of Textual Information Sources Based on the Cosine Similarity Algorithm

This article presents a study aimed at developing an optimal approach to analyzing and comparing information sources based on large volumes of text, using natural language processing (NLP) methods. The object of the study was Telegram news channels, which serve as the sources of text data. The texts were pre-processed (cleaning, tokenization, and lemmatization) to form a global dictionary consisting of the unique words from all information sources.
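A minimal sketch of the comparison step follows. It assumes simple term-count vectors over the global dictionary and omits lemmatization, which would need a language-specific tool. The channel names and texts are invented for illustration, and the abstract does not say how the article weights its vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (lemmatization is omitted)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_vocabulary(documents):
    """Global dictionary: sorted unique tokens across all sources."""
    return sorted({tok for doc in documents.values() for tok in tokenize(doc)})


def vectorize(text, vocabulary):
    """Term-count vector of one text over the shared vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sources standing in for Telegram news channels.
channels = {
    "channel_a": "Markets rise as central bank holds rates",
    "channel_b": "Central bank holds rates, markets rise",
    "channel_c": "Local team wins the football final",
}
vocabulary = build_vocabulary(channels)
vectors = {name: vectorize(text, vocabulary) for name, text in channels.items()}
sim_ab = cosine_similarity(vectors["channel_a"], vectors["channel_b"])
sim_ac = cosine_similarity(vectors["channel_a"], vectors["channel_c"])
```

Sources that share most of their vocabulary score close to 1, while sources with no words in common score 0, so the pairwise scores can serve as input to a clustering step.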

Methods of Building a Model of User Behavior

A number of clustering methods and algorithms were analysed, and the peculiarities of their application were identified. The main advantages of density-based clustering methods are the ability to detect arbitrarily shaped clusters of different sizes and robustness to noise and outliers; the disadvantages include high sensitivity to input parameters, poor class description, and unsuitability for large data sets. The analysis showed that the main problem of all clustering algorithms is scalability as the amount of processed data grows.
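The trade-offs of density-based clustering can be seen in a small sketch. It uses DBSCAN as a representative method, which is an assumption because the abstract does not name a specific algorithm. The parameter names `eps` and `min_pts` and the sample points are illustrative only.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (brute force, O(n) per call)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Two dense groups and one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)
labels_small_eps = dbscan(points, eps=0.5, min_pts=3)
```

With `eps=1.5` the two groups are found and the isolated point is marked as noise. With `eps=0.5` every point is marked as noise, which shows the sensitivity to input parameters. The brute-force neighbour search makes the whole run quadratic in the number of points, one reason such methods struggle on large data sets.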