NLP | Academic Journals and Conferences

IT Project Concept for Detecting and Automatically Correcting Spelling Errors in German-Language Texts

This paper presents the concept and technical justification of a software product for automatically detecting and correcting spelling errors in German-language texts. The topic is relevant because German grammar, spelling and word formation are complex, which creates significant difficulties for non-native speakers. This is especially true for those who are learning the language or use it professionally, where accuracy is crucial.
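
As a rough illustration of the detect-then-correct idea, the following Python sketch uses one common baseline: a word is flagged when it is missing from a lexicon, and candidate corrections are ranked by Levenshtein edit distance. The toy lexicon and the `levenshtein` and `suggest` helpers are hypothetical; the abstract does not describe the paper's own technique, and a practical German checker would also need to handle compounds and noun capitalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Hypothetical toy lexicon; a real system would load a full German word list.
LEXICON = {"Straße", "Haus", "schön", "Fahrrad", "Mädchen", "wir", "fahren"}


def suggest(word: str, max_distance: int = 2) -> list[str]:
    """Return lexicon words within max_distance edits, closest first.

    An empty list for a word found in the lexicon means no error was detected.
    """
    if word in LEXICON:
        return []
    scored = sorted((levenshtein(word, w), w) for w in LEXICON)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    for token in ["Haus", "Strasse", "Farrad"]:
        print(token, suggest(token))
```

Edit distance alone cannot catch real-word errors, where a typo produces another valid word, so context-aware models are usually layered on top of such a baseline.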

Information Technologies for Errors Correction in Ukrainian-Language Texts Based on Machine Learning

The research is motivated by the growing need to automate text analysis and correction, in particular for Ukrainian-language content, which has a rich morphological and syntactic structure. Because texts can contain a wide range of errors, from spelling errors to contextual ones, there is an urgent need for systems that accurately identify errors and suggest appropriate corrections.

Intelligent test case generation from textual security requirements in SCRUM: an NLP-driven approach

This paper presents a method for automatically generating security-oriented test cases from textual requirements in SCRUM environments using Natural Language Processing. The proposed approach combines transformer-based semantic analysis with behavior-driven development test templates to extract and translate functional, non-functional, and misuse-case security requirements. The solution was tested on 30 real-world requirements derived from agile software projects.
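
The abstract does not give the template format, so the following is only a sketch of the general idea of filling a behavior-driven development (Gherkin-style Given/When/Then) template from a requirement. It deliberately swaps the transformer-based semantic analysis for a naive keyword classifier; the keyword lists, category labels, and the `classify` and `to_scenario` functions are all hypothetical.

```python
# Hypothetical keyword cues standing in for the transformer-based classifier.
# Order matters: misuse-case cues are checked before non-functional ones.
CATEGORY_CUES = {
    "misuse-case": ("attacker", "malicious", "unauthorized"),
    "non-functional": ("within", "seconds", "encrypted", "availability"),
}


def classify(requirement: str) -> str:
    """Assign a coarse requirement category by keyword matching (naive stand-in)."""
    text = requirement.lower()
    for category, cues in CATEGORY_CUES.items():
        if any(cue in text for cue in cues):
            return category
    return "functional"


def to_scenario(req_id: str, requirement: str) -> str:
    """Render a requirement as a Gherkin-style BDD scenario skeleton."""
    category = classify(requirement)
    if category == "misuse-case":
        then = "Then the system should reject the action and log the attempt"
    else:
        then = "Then the observed behavior should satisfy the requirement"
    return "\n".join([
        f"Scenario: {req_id} ({category})",
        "  Given the system under test is running",
        f'  When the behavior described by "{requirement}" is exercised',
        f"  {then}",
    ])


if __name__ == "__main__":
    print(to_scenario("SEC-1", "An attacker must not be able to reuse an expired session token."))
```

In a real pipeline the keyword-based `classify` would be replaced by a trained model, and the generic `When`/`Then` steps by steps extracted from the requirement text itself.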

Improving Amazigh POS tagging using machine learning

Tamazight, Berber, and Amazigh are different names for the same language.

Evaluating machine learning models efficacy in sentiment analysis for Moroccan Darija: An exploration with MAC dataset

Sentiment analysis is an essential technique for classifying and extracting emotions from various kinds of data. While many basic methods distinguish only between negative and positive emotions, more advanced approaches may consider additional categories, such as neutral. This becomes both important and difficult for less-studied languages and dialects such as Moroccan Darija. Our study highlights the nuances of conducting sentiment analysis on the MAC dataset, which includes comments in Moroccan Darija. Our main goal is to conduct a comparative …

Data Set Formation Method for Checking the Quality of Learning Language Models of the Transitive Relation in the Logical Conclusion Problem Context

A method for forming a data set has been developed to verify the ability of pre-trained language models to learn transitivity dependencies. The generated data set was used to test how well these dependencies are learned in the natural language inference (NLI) task. A data set of 10,000 samples (MultiNLI) was used to test the RoBERTa model.
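
The abstract does not specify the generation procedure. One plausible sketch, assuming a single strict-order relation ("is taller than") and hypothetical names, builds a premise chain a > b, b > c and labels the hypothesis a > c as entailment, while the reversed hypothesis c > a is a contradiction. The template sentences, names, and the use of MultiNLI-style labels (entailment, neutral, contradiction) are illustrative assumptions, not the authors' method.

```python
import random

# Hypothetical vocabulary; any strict-order relation (taller, older, ...) would do.
NAMES = ["Anna", "Boris", "Chloe", "Dmytro", "Eva", "Farid"]
RELATION = "is taller than"


def make_sample(rng: random.Random) -> dict:
    """Create one NLI premise/hypothesis pair probing transitivity."""
    a, b, c = rng.sample(NAMES, 3)  # three distinct people
    label = rng.choice(["entailment", "contradiction", "neutral"])
    if label == "neutral":
        # a > b and a > c say nothing about how b and c compare.
        premise = f"{a} {RELATION} {b}. {a} {RELATION} {c}."
        hypothesis = f"{b} {RELATION} {c}."
    else:
        premise = f"{a} {RELATION} {b}. {b} {RELATION} {c}."
        # a > b > c entails a > c; the reversed claim c > a contradicts it.
        x, y = (a, c) if label == "entailment" else (c, a)
        hypothesis = f"{x} {RELATION} {y}."
    return {"premise": premise, "hypothesis": hypothesis, "label": label}


def make_dataset(n: int, seed: int = 0) -> list[dict]:
    """Generate n samples; the fixed seed makes the data set reproducible."""
    rng = random.Random(seed)
    return [make_sample(rng) for _ in range(n)]


if __name__ == "__main__":
    for sample in make_dataset(3):
        print(sample)
```

Including the neutral case matters: without it, a model could score well on entailment and contradiction by matching name order rather than actually reasoning about transitivity.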

Information System for Ukrainian Text Voiceover Based on NLP and Machine Learning Methods

During the research, an information system for voicing Ukrainian-language text was developed based on NLP and machine learning methods. The system is implemented as a desktop application that converts Ukrainian-language text into speech. Its development covered all stages of the software development process: design, implementation, and testing.

Overview of the Ukrainian language resources within the multilingual European MULTEXT-East project, v. 4

The article presents an overview of computational resources for the Ukrainian language within the multilingual European MULTEXT-East project (MTE, http://nl.ijs.si/ME/V4), freely available to researchers since May 2010. These include a formal representation of morphosyntactic specifications consisting of 1,239 unique grammatical tags in a TEI-5-compatible XML format, and a morphosyntactic lexicon covering over 200,000 word-forms with lemmas and morphosyntactic codes.
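
To show how such a lexicon of word-forms, lemmas and morphosyntactic codes might be consumed, here is a minimal Python sketch. It assumes one tab-separated "word-form, lemma, MSD" triple per line and decodes only the first MSD character (the part of speech) using an illustrative subset of MULTEXT-East category codes. The sample entries and their MSD strings are invented for illustration, not taken from the resource, and `load_lexicon` and `analyse` are hypothetical helpers.

```python
from collections import defaultdict

# Illustrative subset of MULTEXT-East category codes (first MSD character).
CATEGORIES = {"N": "Noun", "V": "Verb", "A": "Adjective", "P": "Pronoun",
              "R": "Adverb", "S": "Adposition", "C": "Conjunction",
              "M": "Numeral", "Q": "Particle", "I": "Interjection"}

# Hypothetical entries in an assumed "word-form<TAB>lemma<TAB>MSD" layout;
# the MSD strings are invented for this example.
SAMPLE = "книги\tкнига\tNcfsg\nкниги\tкнига\tNcfpn\nчитати\tчитати\tVmn\n"


def load_lexicon(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each word-form to all of its (lemma, MSD) analyses."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        form, lemma, msd = line.split("\t")
        lexicon[form].append((lemma, msd))
    return dict(lexicon)


def analyse(lexicon: dict[str, list[tuple[str, str]]], form: str) -> list[str]:
    """Describe each analysis by lemma and the part of speech decoded from the MSD."""
    return [f"{lemma} ({CATEGORIES.get(msd[0], 'Unknown')}, {msd})"
            for lemma, msd in lexicon.get(form, [])]


if __name__ == "__main__":
    print(analyse(load_lexicon(SAMPLE), "книги"))
```

Keeping a list of analyses per word-form reflects the fact that an inflected form such as "книги" can correspond to several grammatical readings, which is exactly the ambiguity a morphosyntactic lexicon records.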