Fine-tuning AraGPT2 for Hierarchical Arabic Text Classification

DOI:

https://doi.org/10.54327/set2025/v5.i1.224

Keywords:

Arabic Text Classification, Hierarchical Classification, AraGPT2, GPT-2 Fine-tuning, Generative Pre-trained Transformer, Large Language Model

Abstract

Text classification is the task of assigning a text to its corresponding category. It is a crucial task in natural language processing (NLP), with applications spanning content recommendation, spam detection, sentiment analysis, and topic categorization. While significant advances have been made in text classification for widely spoken languages, Arabic remains underrepresented despite its large and diverse speaker base. A further challenge is that hierarchical text classification, unlike flat classification, assigns texts to categories in a multi-level taxonomy. This adds complexity, particularly when distinguishing between closely related categories within the same super-class. To tackle these challenges, we propose a novel approach using AraGPT2, a variant of the Generative Pre-trained Transformer 2 (GPT-2) model adapted specifically for Arabic. Fine-tuning, in this context, refers to further training a pre-trained model on a specific task or dataset to improve its performance on that task. Fine-tuning AraGPT2 for hierarchical text classification leverages the model's pre-existing linguistic knowledge and adapts it to classify Arabic text according to a hierarchical structure. Our experiments and comparative study demonstrate the effectiveness of our solution: the fine-tuned AraGPT2 classifier achieves a hierarchical F-score (HF) of 80.64%, outperforming the machine learning-based classifier, which scores 41.90%.
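The abstract reports a hierarchical F-score but does not define it. As an illustration only, the sketch below assumes the commonly used hierarchical F-measure, in which each predicted and true label is extended with its ancestors in the taxonomy before precision and recall are computed over the pooled label sets. The function names, the `parent` mapping, and the toy category names are hypothetical and are not taken from the paper or from the WiHArD dataset.

```python
from typing import Dict, List, Optional, Set


def with_ancestors(label: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the label together with all of its ancestors in the taxonomy."""
    nodes: Set[str] = set()
    current: Optional[str] = label
    while current is not None:
        nodes.add(current)
        current = parent.get(current)  # None once a top-level category is reached
    return nodes


def hierarchical_f(
    true_labels: List[str],
    pred_labels: List[str],
    parent: Dict[str, Optional[str]],
) -> float:
    """Micro-averaged hierarchical F-measure over ancestor-augmented label sets.

    Assumption: this is the usual hierarchical precision/recall definition;
    the paper may compute its HF score differently.
    """
    overlap = pred_total = true_total = 0
    for true_label, pred_label in zip(true_labels, pred_labels):
        true_set = with_ancestors(true_label, parent)
        pred_set = with_ancestors(pred_label, parent)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / true_total if true_total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-level taxonomy: child category -> parent (None for top level).
toy_parent: Dict[str, Optional[str]] = {
    "Science": None,
    "Physics": "Science",
    "Biology": "Science",
    "Sports": None,
    "Football": "Sports",
}

toy_true = ["Physics", "Football", "Biology"]
toy_pred = ["Physics", "Football", "Physics"]  # last one: wrong leaf, right super-class

toy_score = hierarchical_f(toy_true, toy_pred, toy_parent)
print(f"hierarchical F-measure: {toy_score:.4f}")
```

In this toy example the third prediction picks the wrong leaf category but the correct super-class, so it receives partial credit instead of counting as a complete miss. This is the property that makes a hierarchical measure more informative than flat accuracy when sibling categories are closely related.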

Published

17.03.2025

Data Availability Statement

To evaluate our proposed model, we used the WiHArD dataset (Wikipedia-based Hierarchical Arabic Dataset). This dataset is open access and available on the Elsevier Mendeley Data platform at https://data.mendeley.com/datasets/kdkryh5rs2/2.

Section

Research Article

How to Cite

[1]
D. BOUCHIHA, A. BOUZIANE, N. DOUMI, and B. HAMZAOUI, “Fine-tuning AraGPT2 for Hierarchical Arabic Text Classification”, Sci. Eng. Technol., vol. 5, no. 1, Mar. 2025, doi: 10.54327/set2025/v5.i1.224.
