transcoding

Standardizing Arabic Dialects for NLP: A BERT-Based Transcoding Approach with a Focus on Moroccan Darija

Processing Arabic dialects in Natural Language Processing (NLP) presents significant challenges due to linguistic diversity and the lack of standardized resources.  While Modern Standard Arabic (MSA) benefits from advanced NLP tools and extensive annotated datasets, dialects such as Moroccan Darija remain underrepresented.  This study introduces a BERT-based transcoding framework that bridges the gap between dialectal Arabic and MSA, enabling the use of pre-trained models optimized for MSA, such as AraBERT.  By integrating contextual multilingual embeddings, the propose