natural language processing

Controlled Synthesis and Hierarchical Structuring of Ukrainian Datasets

The article addresses the urgent scientific and practical problem of overcoming the "cold start" effect in the development and deployment of Natural Language Processing (NLP) systems aimed at monitoring public opinion and sentiment analysis of Ukrainian-language content. The critical shortage of representative, balanced, and pre-labeled datasets that accurately reflect the specifics of social, economic, and political processes in modern Ukrainian society is identified as the main barrier to integrating advanced neural network solutions.

Analysis of Repetitiveness Parameters for Natural Language and Random Texts

This article presents a new approach to the analysis of text documents based on the quantitative patterns of their repetitiveness. To this end, software was developed for analyzing the repetitiveness characteristic $v(t)$, and a number of calculation modes were proposed. The behavior of the $v(t)$ function for natural language texts was determined in all the calculation modes, and the potential of this approach for identifying ‘excessive’ repetitions of fragments in text documents was assessed.

Words Matter – Using Machine Learning to Verify How Words Absolve or Condemn Defendants

Legal prediction is one of the most critical subfields in Natural Language Processing. Researchers use state-of-the-art machine learning and artificial intelligence methodologies to predict specific judicial facets, such as the judicial outcome. For this research, we built a web text crawler to extract data on homicide cases from Brazilian electronic legal systems.

Data Protection in the Utilization of Natural Language Processors for Trend Analysis and Public Opinion: Cryptographic Aspect

In the digital age, the significant increase in information generation and processing is accompanied by a growing threat of unauthorized access to information and of its illegal distribution and use. One of the most promising strategies for protecting information from various cyber threats and malicious attacks is the use of natural language processors. This article focuses on the methodology of data protection in the context of utilizing Natural Language Processing (NLP) for sentiment analysis and trend detection.

PREDICTION OF AN INDIVIDUAL’S EMOTIONAL STATE BASED ON TEXTUAL DATA USING BERT AND PAD MODELS

This paper examines the problem of predicting a user’s multidimensional emotional state from textual records under conditions where most existing text-based approaches emphasize either categorical emotion recognition or coarse sentiment polarity, which limits the interpretability of broader affective assessment.

MATHEMATICAL MODEL OF ERRORS IDENTIFICATION IN TEXTS OF UKRAINIAN CONTENT

The problem of automated error detection in Ukrainian texts is becoming particularly relevant in the context of the growth of digital content. A mathematical model of a decision support system for detecting errors in Ukrainian-language texts has been developed. Error identification is treated as a multi-class classification task at the token level that takes the context of the text into account. Probabilistic models are proposed for determining the type of an error from the tokens surrounding it.
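As a rough illustration of token-level multi-class classification conditioned on neighboring tokens, the sketch below estimates the probability of each label for a token from counts of its (previous, current, next) context with add-one smoothing. This is not the model from the article: the label set `LABELS`, the context window, the smoothing scheme, and the toy English training data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical label set, invented for illustration.
LABELS = ["ok", "spelling", "grammar"]


def context_of(tokens, i):
    """Context of token i: (previous token, token itself, next token)."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return (prev_tok, tokens[i], next_tok)


def train(sentences):
    """Count how often each label occurs in each context."""
    counts = defaultdict(Counter)
    for tokens, labels in sentences:
        for i, label in enumerate(labels):
            counts[context_of(tokens, i)][label] += 1
    return counts


def predict_proba(counts, tokens, i):
    """Estimate P(label | context) with add-one smoothing over LABELS."""
    seen = counts.get(context_of(tokens, i), Counter())
    total = sum(seen.values()) + len(LABELS)
    return {label: (seen[label] + 1) / total for label in LABELS}


# Toy training data, invented for illustration only.
training = [(["she", "go", "home"], ["ok", "grammar", "ok"])]
counts = train(training)

probs = predict_proba(counts, ["she", "go", "home"], 1)
best = max(probs, key=probs.get)
print(best, probs)
```

A context that never occurred in training falls back to a uniform distribution over the labels, which is the usual effect of add-one smoothing. A practical system would need a far richer context representation than exact token triples.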

STUDIES OF REPETITIVENESS FOR THE SIMPLEST RANDOM NATURAL LANGUAGE MODELS

The article addresses a currently important problem in natural language processing: developing methods for assessing repetitiveness in text documents and empirically determining how well these methods can reveal the presence of semantic load in texts. So far, approaches based on the laws of statistical linguistics, such as Zipf’s, Pareto’s, and Heaps’ laws, have mainly been used for this purpose, along with the analysis of word-clustering phenomena and long-term correlations of word tokens.
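For readers unfamiliar with the statistical laws mentioned here, the sketch below computes the two quantities they describe. Zipf's law concerns the rank-frequency list, where the frequency of the word of rank r falls off roughly as a power of r. Heaps' law concerns vocabulary growth, where the number of distinct words among the first n tokens grows roughly as a power of n. The function names and the toy sentence are assumptions made for the example; the code is not taken from the article and does not compute its repetitiveness measure.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in descending order (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """Return V(n): the number of distinct words among the first n tokens."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


# Toy input, invented for illustration only.
text = "the cat sat on the mat and the dog sat on the log"
tokens = text.split()

freqs = rank_frequency(tokens)      # input for a Zipf-style rank-frequency plot
growth = vocabulary_growth(tokens)  # input for a Heaps-style growth curve

print(freqs)
print(growth)
```

On a real corpus, both sequences are usually plotted on log-log axes, where an approximately straight line indicates power-law behavior. A text this short is far too small to show either law.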

A Pattern Search Algorithm in Graph Representations of Textual Data for an Ontology Construction System

The article presents the development and formalization of an algorithm for pattern matching in graph representations of textual data as a core component of syntactic-semantic transformations for ontology construction from text documents. The study aims to bridge the gap between natural language processing and formal logic by introducing a universal SPARQL-based approach for executing transformation rules directly on graph database servers.

SED-UA-Small: Ukrainian Synthetic Dataset for Text Embedding Models

This paper presents the Small Synthetic Embedding Dataset, a fully synthetic dataset in Ukrainian designed for training, fine-tuning, and evaluating text embedding models. The use of large language models (LLMs) allows for controlling the diversity of generated data in aspects such as NLP tasks, asymmetry between queries and documents, the presence of instructions, support for various languages, and avoidance of social biases. A zero-shot generation approach was used to create a set of Ukrainian query-document pairs with corresponding similarity scores.

Development of an Automated Natural Language Text Analysis System Using Transformers

The article examines the development of an automated medical text analysis system based on modern artificial intelligence and natural language processing technologies. The current state and prospects for the development of automated medical text analysis are analyzed. The main methods and technologies used in this field, including machine learning, deep learning, and natural language processing, are examined.