Natural Language Processing

STUDIES OF REPETITIVENESS FOR THE SIMPLEST RANDOM NATURAL LANGUAGE MODELS

The article addresses a currently important problem in natural language processing: the development of methods for assessing repetitiveness in textual documents and the empirical study of how well these methods can detect the presence of semantic load in texts. Until now, this task has mainly been approached through the laws of statistical linguistics, such as Zipf's, Pareto's, and Heaps' laws, as well as through the analysis of word-clustering phenomena and long-range correlations of word tokens.
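For illustration, a minimal Python sketch of the two statistics the abstract refers to: a Zipf rank-frequency profile and a Heaps'-law vocabulary-growth curve computed over word tokens (the toy input text and function names are assumptions, not the article's implementation).

```python
from collections import Counter

def zipf_profile(tokens):
    """Rank-frequency pairs: Zipf's law predicts freq ~ C / rank**alpha."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return [(rank, freq) for rank, (_, freq) in enumerate(ranked, start=1)]

def heaps_curve(tokens):
    """Vocabulary size V(n) after n tokens: Heaps' law predicts V ~ K * n**beta."""
    seen, curve = set(), []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog the fox"  # toy example
    tokens = text.lower().split()
    print(zipf_profile(tokens)[:5])   # top-ranked word frequencies
    print(heaps_curve(tokens)[-1])    # (total tokens, vocabulary size)
```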

A Pattern Search Algorithm in Graph Representations of Textual Data for an Ontology Construction System

The article presents the development and formalization of an algorithm for pattern matching in graph representations of textual data as a core component of syntactic-semantic transformations for ontology construction from text documents. The study aims to bridge the gap between natural language processing and formal logic by introducing a universal SPARQL-based approach for executing transformation rules directly on graph database servers.
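A minimal sketch of the general idea using rdflib: the pattern part of a transformation rule expressed as a SPARQL query over a parsed sentence stored as RDF. The vocabulary (ex:head, ex:deprel, ex:lemma) and the toy parse are assumptions for illustration, not the article's schema.

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/parse#")  # hypothetical parse vocabulary

g = Graph()
# Toy dependency parse of "cats eat fish": eat --nsubj--> cat, eat --obj--> fish
g.add((EX.t1, EX.lemma, Literal("cat")))
g.add((EX.t2, EX.lemma, Literal("eat")))
g.add((EX.t3, EX.lemma, Literal("fish")))
g.add((EX.t1, EX.head, EX.t2))
g.add((EX.t1, EX.deprel, Literal("nsubj")))
g.add((EX.t3, EX.head, EX.t2))
g.add((EX.t3, EX.deprel, Literal("obj")))

# Pattern: subject-verb-object configurations, a typical precondition
# of a syntactic-semantic transformation rule.
PATTERN = """
PREFIX ex: <http://example.org/parse#>
SELECT ?s ?v ?o WHERE {
  ?subj ex:head ?verb ; ex:deprel "nsubj" ; ex:lemma ?s .
  ?obj  ex:head ?verb ; ex:deprel "obj"   ; ex:lemma ?o .
  ?verb ex:lemma ?v .
}
"""
for row in g.query(PATTERN):
    print(row.s, row.v, row.o)   # cat eat fish
```

The same query text could in principle be sent to a SPARQL endpoint on a graph database server instead of an in-memory graph, which is the setting the abstract describes.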

SED-UA-Small: Ukrainian Synthetic Dataset for Text Embedding Models

This paper presents the Small Synthetic Embedding Dataset (SED-UA-Small), a fully synthetic Ukrainian dataset designed for training, fine-tuning, and evaluating text embedding models. The use of large language models (LLMs) makes it possible to control the diversity of the generated data along aspects such as NLP tasks, asymmetry between queries and documents, the presence of instructions, support for various languages, and avoidance of social biases. A zero-shot generation approach was used to create a set of Ukrainian query-document pairs with corresponding similarity scores.
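One plausible record layout for such a query-document pair with a similarity score is sketched below; the field names are illustrative assumptions, not the published SED-UA-Small schema.

```python
# Illustrative record structure for a synthetic query-document pair.
sample = {
    "task": "retrieval",                                        # NLP task the pair targets
    "instruction": "Знайти відповідний документ для запиту",    # optional instruction text
    "query": "Які документи потрібні для закордонного паспорта?",
    "document": "Для оформлення закордонного паспорта громадянин подає ...",
    "similarity": 0.87,                                         # target score for training/evaluation
    "language": "uk",
}

# Records like this can be consumed by an embedding trainer either as
# (query, document, score) regression targets or as positive/negative pairs.
```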

Development of an Automated Natural Language Text Analysis System Using Transformers

The article studies the development of an automated medical text analysis system based on modern artificial intelligence and natural language processing technologies. The current state of and prospects for automated medical text analysis are analyzed. The main methods and technologies used in this field, including machine learning, deep learning, and natural language processing, are examined.
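As a minimal sketch of transformer-based text analysis, the Hugging Face transformers pipeline API can be used as shown below; the model name is a generic placeholder checkpoint, not the system or model described in the article.

```python
from transformers import pipeline

# Placeholder checkpoint for illustration only; a domain-specific (e.g. clinical)
# model would be substituted in a real medical text analysis system.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

report = "Patient reports persistent chest pain and shortness of breath."
print(classifier(report))  # e.g. [{'label': ..., 'score': ...}]
```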

Development of a Unified Output Format for Text Parsers in the Ontology Construction System From Text Documents

The challenge of effectively constructing ontologies from text documents remains unresolved, posing a critical gap in modern knowledge extraction methodologies. One of the primary obstacles is the lack of a standardized output format across different NLP tools, particularly text parsers, which serve as the foundational step in multi-stage knowledge extraction processes. While several widely used text parsers exist, each excels in specific functions, making it beneficial to leverage multiple parsers for more comprehensive ontology construction.
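To make the idea of a unified parser output concrete, here is an illustrative sketch of one possible unified token record and an adapter that maps a spaCy parse into it. The field set is an assumption for illustration, not the format defined in the article.

```python
from typing import TypedDict, List

class UToken(TypedDict):
    id: int        # 1-based token index in the sentence
    form: str      # surface form
    lemma: str
    upos: str      # universal POS tag
    head: int      # index of the syntactic head, 0 for the root
    deprel: str    # dependency relation label

def from_spacy(doc) -> List[UToken]:
    """Convert a spaCy Doc into the unified token list."""
    out: List[UToken] = []
    for tok in doc:
        out.append(UToken(
            id=tok.i + 1,
            form=tok.text,
            lemma=tok.lemma_,
            upos=tok.pos_,
            head=0 if tok.head is tok else tok.head.i + 1,
            deprel=tok.dep_,
        ))
    return out
```

Analogous adapters for other parsers would emit the same record type, which is the point of a unified format: downstream ontology-construction stages consume one schema regardless of which parser produced the analysis.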

Comparison and Clustering of Textual Information Sources Based on the Cosine Similarity Algorithm

This article presents a study aimed at developing an optimal approach to analyzing and comparing information sources containing large amounts of text, using natural language processing (NLP) methods. The objects of the study were Telegram news channels, used as sources of text data. Text pre-processing, including cleaning, tokenization, and lemmatization, was carried out to form a global dictionary of unique words from all information sources.
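A minimal scikit-learn sketch of the comparison step, assuming pre-processing is already done: sources are turned into word-count vectors over a shared vocabulary, compared by cosine similarity, and clustered on the resulting distances. The toy strings and the choice of raw counts (rather than a weighted scheme) are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import AgglomerativeClustering

# Each string stands for the concatenated, preprocessed text of one source
# (cleaning, tokenization and lemmatization assumed already applied).
channels = [
    "economy inflation budget tax",
    "budget tax economy prices",
    "football match goal league",
]

vectors = CountVectorizer().fit_transform(channels)  # shared "global dictionary"
sim = cosine_similarity(vectors)                     # pairwise source similarity
print(sim.round(2))

# Cluster sources on cosine distance (1 - similarity); sklearn >= 1.2 uses `metric`.
labels = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(1 - sim)
print(labels)
```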

DECISION SUPPORT SYSTEM FOR DISINFORMATION, FAKES AND PROPAGANDA DETECTION BASED ON MACHINE LEARNING

Because creating and distributing news via the Internet has become so easy, and because it is physically impossible to verify the large volumes of information circulating online, the spread of disinformation and fake news has increased significantly. A decision support system for identifying disinformation, fakes, and propaganda based on machine learning has been built. A method of news text analysis for identifying fakes and detecting disinformation in news texts has been studied.
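A minimal sketch of the kind of supervised text classifier such a system could be built around, using scikit-learn; the toy corpus, labels, and model choice are assumptions for illustration, not the article's actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled corpus: 1 = suspected fake/disinformation, 0 = regular news.
texts = [
    "Shocking secret cure hidden by doctors, share before it is deleted",
    "City council approves the 2024 budget after public hearings",
    "Anonymous sources claim the election was secretly cancelled",
    "The national bank kept its key interest rate unchanged",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Probability that a new headline is disinformation, usable as one signal
# inside a decision support system.
print(model.predict_proba(["Officials confirm new road repairs schedule"])[:, 1])
```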

Intelligent Fake News Prediction System Based on NLP and Machine Learning Technologies

The article describes a study of fake news identification based on natural language processing, big data analysis, and deep learning technologies. The developed system automatically checks news items for signs of fakery, such as manipulative language, unverified sources, and unreliable information. Data visualization is implemented through a user-friendly interface that displays the results of news analysis in a convenient and understandable format.

Intelligent System for Complex Military Information Analysis Based on Machine Learning and NLP to Assist Tactical Links Commanders

The article describes the results of research into the complex analysis of military information based on machine learning and natural language processing to assist commanders of tactical units. The system should provide users with the following capabilities: combining the dictionary with information materials, adding terms and abbreviations to the dictionary, classifying objects for radio-technical intelligence, visualizing aerial objects, classifying aerial objects, using information materials, and organizing information materials.

Data Set Formation Method for Checking the Quality of Learning Language Models of the Transitive Relation in the Logical Conclusion Problem Context

A method for data set formation has been developed to verify the ability of pre-trained models to learn transitivity dependencies. The generated data set was used to assess how well transitivity dependencies are learned in the natural language inference (NLI) task. A data set of 10,000 samples (MultiNLI) was used to test the RoBERTa model.
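A minimal sketch of the kind of transitivity check involved, using the publicly available MNLI-tuned checkpoint roberta-large-mnli from the Hugging Face Hub; the article's own data set construction and evaluation procedure are not reproduced here, and the example sentences are assumptions.

```python
from transformers import pipeline

# Publicly available RoBERTa checkpoint fine-tuned on MultiNLI, used for illustration.
nli = pipeline("text-classification", model="roberta-large-mnli")

def entails(premise: str, hypothesis: str) -> bool:
    """True if the model labels the premise-hypothesis pair as entailment."""
    result = nli([{"text": premise, "text_pair": hypothesis}])[0]
    return result["label"] == "ENTAILMENT"

a = "All dogs are mammals."
b = "All mammals are animals."
c = "All dogs are animals."

# Transitivity check: if A entails B and B entails C, A should entail C.
print(entails(a, b), entails(b, c), entails(a, c))
```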