TF-IDF

Information Technology for Text Classification Tasks Using Large Language Models

The article addresses the problem of text classification amid growing information flows and the need for automated content analysis. A universal information technology is proposed that combines classical machine learning methods with the capabilities of Large Language Models for processing news, scientific, literary, journalistic, and legal texts. On the BBC News corpus (2225 texts), k-means clustering over TF-IDF features produced clearly separated thematic groups.
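A minimal sketch of the TF-IDF + k-means step reported above, assuming scikit-learn; the toy documents stand in for the BBC News corpus, and the cluster count of 5 reflects the five BBC topic categories rather than a detail stated in the abstract.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy stand-in for the BBC News corpus (2225 articles in the study)
documents = [
    "stock markets rallied as bank profits rose",
    "the team won the league after a late goal",
    "parliament debated the new election bill",
    "the band released a new album this week",
    "a faster mobile chip was unveiled today",
    "shares fell after the rate decision",
]

# TF-IDF features over the raw texts
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# k = 5 mirrors the five BBC topic categories
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Read off each cluster's theme from the top-weighted centroid terms
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(kmeans.cluster_centers_):
    top = centroid.argsort()[::-1][:5]
    print(f"cluster {i}:", ", ".join(terms[j] for j in top))
```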

Topic Modeling for News Recommendations: Evaluating the Performance of LDA and BERTopic

Text analysis is an important component in the evolution of recommender systems, as it enables meaningful information to be extracted from vast amounts of textual data. This study presents a comparative analysis of two prominent topic modeling techniques, Latent Dirichlet Allocation (LDA) and BERTopic, in the context of news recommender systems. Using a dataset of Moroccan news articles, we evaluate the ability of these models to generate coherent and interpretable topics. Our results demonstrate that BERTopic outperforms LDA in terms of topic consistency and semantic richness.
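For the LDA side of the comparison, a minimal sketch using gensim, with topic coherence (c_v) as the kind of consistency metric such evaluations report; the tokenized toy documents and topic count are illustrative, and BERTopic would be fit analogously on the raw texts via bertopic.BERTopic().fit_transform.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Toy tokenized articles; the study uses Moroccan news texts
docs = [
    ["economy", "growth", "market", "inflation"],
    ["football", "match", "league", "goal"],
    ["election", "government", "policy", "vote"],
    ["economy", "market", "bank", "rate"],
    ["league", "team", "season", "goal"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Fit LDA on the bag-of-words corpus
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               random_state=42, passes=10)

# c_v topic coherence: one common way to score "topic consistency"
coherence = CoherenceModel(model=lda, texts=docs,
                           dictionary=dictionary, coherence="c_v")
print("LDA c_v coherence:", coherence.get_coherence())
```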

Method for Detection of Disinformation Based on Text Data Analysis Using TF-IDF and Contextual Vector Representations

The article presents an approach to detecting fake news in the digital environment through text analysis with machine learning and natural language processing methods. The proposed method rests on a hybrid text representation that combines frequency features (TF-IDF) with contextual embeddings obtained from the IBM Granite model. A complete data processing pipeline was developed, covering exploratory data analysis (EDA), text preprocessing and tokenization, construction of vector representations, training of a logistic regression model, and computation of key evaluation metrics.
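A minimal sketch of the hybrid representation, assuming scikit-learn and SciPy: TF-IDF features are concatenated with dense contextual vectors, and a logistic regression classifier is trained on the combined matrix. The embed() stub is a placeholder for the IBM Granite embedding call, whose exact interface the article does not spell out; texts and labels are toy data.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["shocking miracle cure found", "parliament passes budget bill",
         "aliens endorse candidate", "central bank holds interest rates"]
labels = [1, 0, 1, 0]  # 1 = fake, 0 = real (toy labels)

def embed(batch):
    # Stand-in for contextual embeddings from the IBM Granite model;
    # returns one dense vector per text
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(batch), 384))

tfidf = TfidfVectorizer().fit_transform(texts)   # frequency features
dense = csr_matrix(embed(texts))                 # contextual features
X = hstack([tfidf, dense])                       # hybrid representation

clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```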

Automated Formation of a Specialist's Professional Curriculum Based on the Analysis of Vacancies and Text Data

The article explores the development and implementation of an automated system for constructing professionograms of artificial intelligence (AI) specialists using modern information technologies. Against the backdrop of an ever-evolving digital economy and rising demand for qualified AI professionals, the study highlights the need for accurate and scalable tools to identify and assess relevant competencies.
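One plausible building block of such a system, sketched below, is mining competency mentions from vacancy texts and ranking them by demand; the skill lexicon, matching rule, and sample vacancies are illustrative assumptions rather than details taken from the article.

```python
from collections import Counter
import re

# Hypothetical competency lexicon for AI roles
SKILLS = ["python", "pytorch", "tensorflow", "nlp", "sql", "mlops"]

# Toy vacancy texts; a real system would scrape these from job boards
vacancies = [
    "AI engineer: Python, PyTorch, strong NLP background",
    "ML specialist with Python, SQL and MLOps experience",
    "Researcher: TensorFlow or PyTorch, NLP a plus",
]

counts = Counter()
for text in vacancies:
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    counts.update(skill for skill in SKILLS if skill in tokens)

# Ranked competency profile: most frequently demanded skills first
for skill, n in counts.most_common():
    print(f"{skill}: mentioned in {n} of {len(vacancies)} vacancies")
```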

Information System for Extraction of Information from Open Web Resources

The purpose of the work is to develop an information and reference system that finds answers to questions phrased in the superlative degree of comparison, using text content from open English-language web resources. Examples of such questions are: “What is the best book ever?” and “What is the most popular IDE for Python?”. The system outputs a ranked list of answers ordered by the frequency with which each answer option appears.
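A minimal sketch of the ranking step described above: candidate answers already extracted from web pages are tallied, and the options are ordered by frequency of appearance. The upstream retrieval and answer extraction are assumed to happen elsewhere; the candidate list is toy data.

```python
from collections import Counter

# Candidate answers already pulled from open web resources for the question
# "What is the most popular IDE for Python?"
candidates = ["PyCharm", "VS Code", "PyCharm", "Jupyter", "VS Code", "PyCharm"]

# Rank answer options by how often each one appears
ranking = Counter(candidates).most_common()
for rank, (answer, freq) in enumerate(ranking, start=1):
    print(f"{rank}. {answer} (appeared {freq} times)")
```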