Fine-tuning AraGPT2 for Hierarchical Arabic Text Classification
DOI:
https://doi.org/10.54327/set2025/v5.i1.224
Keywords:
Arabic Text Classification, Hierarchical Classification, AraGPT2, GPT-2 Fine-tuning, Generative Pre-trained Transformer, Large Language Model
Abstract
Text classification is the task of assigning a text to its corresponding category. It is a crucial task in natural language processing (NLP), with applications spanning content recommendation, spam detection, sentiment analysis, and topic categorization. While significant advances have been made in text classification for widely spoken languages, Arabic remains underrepresented despite its large and diverse speaker base. A further challenge is that, unlike flat classification, hierarchical text classification assigns texts to categories in a multi-level taxonomy. This adds complexity, particularly in distinguishing between closely related categories within the same super-class. To tackle these challenges, we propose a novel approach using AraGPT2, a variant of the Generative Pre-trained Transformer 2 (GPT-2) model adapted specifically for Arabic. Fine-tuning, in this context, means further training a pre-trained model on a specific task or dataset to improve its performance on that task. Fine-tuning AraGPT2 for hierarchical text classification leverages the model's pre-existing linguistic knowledge and adapts it to classify Arabic text according to a hierarchical structure. Our experiments and comparative study demonstrate the effectiveness of our solution: the fine-tuned AraGPT2 classifier achieves a hierarchical F-score (hF) of 80.64%, outperforming the machine learning-based classifier, which scores 41.90%.
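The hierarchical score reported in the abstract gives partial credit when a prediction falls in the correct super-class but the wrong sub-class. The sketch below illustrates one commonly used definition, in which each label is expanded with its ancestors before precision and recall are computed. The abstract does not spell out the exact formula the authors used, so this definition is an assumption, and the small taxonomy and function names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical two-level taxonomy: child label -> parent label.
# None marks a top-level class. These labels are illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "Science": None,
    "Sports": None,
    "Physics": "Science",
    "Chemistry": "Science",
    "Football": "Sports",
}


def with_ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the label together with all of its ancestors in the taxonomy."""
    path: Set[str] = set()
    current: Optional[str] = label
    while current is not None:
        path.add(current)
        current = parent[current]
    return path


def hierarchical_f_score(
    y_true: List[str],
    y_pred: List[str],
    parent: Dict[str, Optional[str]],
) -> float:
    """Micro-averaged hierarchical F-score over ancestor-augmented label sets.

    Assumed definition (not confirmed by the abstract):
      hP = sum |T_i & P_i| / sum |P_i|
      hR = sum |T_i & P_i| / sum |T_i|
      hF = 2 * hP * hR / (hP + hR)
    where T_i and P_i are the true and predicted labels plus their ancestors.
    """
    overlap = pred_total = true_total = 0
    for true_label, pred_label in zip(y_true, y_pred):
        true_set = with_ancestors(true_label, parent)
        pred_set = with_ancestors(pred_label, parent)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Example: the first prediction has the right super-class but the wrong sub-class.
y_true = ["Physics", "Football"]
y_pred = ["Chemistry", "Football"]
score = hierarchical_f_score(y_true, y_pred, PARENT)
print(f"hF = {score:.2f}")
```

In this toy example, flat accuracy would be 50% because one of the two leaf labels is wrong, while the hierarchical score is higher because the wrong prediction still shares its super-class with the true label.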
Data Availability Statement
To evaluate our proposed model, we used the WiHArD dataset (Wikipedia-based Hierarchical Arabic Dataset). This dataset is open-access and available on the Elsevier Mendeley Data platform at https://data.mendeley.com/datasets/kdkryh5rs2/2.
License
Copyright (c) 2025 Djelloul BOUCHIHA, Abdelghani BOUZIANE, Noureddine DOUMI, Benamar HAMZAOUI

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories, on their websites, or on social networking sites).