Detection of Offensive and Abusive Marathi Comments Using MBERT and Trie Algorithm
DOI:
https://doi.org/10.21467/proceedings.7.6.58
Keywords:
Multilingual, Hybrid Approach, Linguistic Nuances
Abstract
The rapid growth of online platforms has led to a surge in offensive and harmful speech, particularly on social media and comment sections. Regional languages such as Marathi face unique challenges in this area due to limited Natural Language Processing (NLP) resources and small annotated datasets. This paper presents a hybrid approach for detecting and moderating abusive and offensive Marathi comments by integrating Multilingual BERT (MBERT) with a Trie-based algorithm. While MBERT provides deep contextual understanding of linguistic nuances such as sarcasm, code-mixing, and polysemy, the Trie data structure enables efficient real-time detection of explicit offensive words. The proposed system was implemented in Python and trained on a labeled dataset of offensive and non-offensive Marathi comments. Experimental evaluation demonstrates that the hybrid model achieved an accuracy of 92.4%, with a precision of 90.8%, recall of 91.6%, and an F1-score of 91.2%. This combination of deep learning and data structure–based methods ensures both contextual accuracy and computational efficiency, offering a scalable solution for real-time content moderation in low-resource languages. The proposed framework contributes to safer and more inclusive online spaces by effectively identifying explicit and implicit offensive language in Marathi.
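The hybrid idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Trie`, `is_offensive`, and `threshold`, the whitespace tokenization, and the "lexicon first, classifier second" ordering are all assumptions made for illustration, and the `classifier` argument is a stand-in for a fine-tuned MBERT model that returns an offensiveness probability.

```python
from typing import Callable, Dict, Iterable, Optional


class TrieNode:
    """One node per character; is_word marks the end of a lexicon entry."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Trie:
    """Prefix tree holding a lexicon of explicit offensive terms."""

    def __init__(self, words: Optional[Iterable[str]] = None) -> None:
        self.root = TrieNode()
        for word in words or []:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word: str) -> bool:
        # Lookup cost is proportional to len(word), not to lexicon size.
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word


def is_offensive(
    comment: str,
    lexicon: Trie,
    classifier: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    # Fast path: an exact lexicon hit on any whitespace-separated token.
    if any(lexicon.contains(token) for token in comment.split()):
        return True
    # Slow path: defer to the contextual model for implicit cases.
    return classifier(comment) >= threshold


# Placeholder lexicon entries; a real list would hold the curated terms.
lexicon = Trie(["badword", "वाईटशब्द"])
# Stub scorer standing in for the MBERT classifier (hypothetical).
stub_classifier = lambda text: 0.1
print(is_offensive("हा वाईटशब्द आहे", lexicon, stub_classifier))
print(is_offensive("hello there", lexicon, stub_classifier))
```

In this sketch the Trie only matches whole tokens exactly, so punctuation attached to a word or spelling variants would fall through to the classifier; a production system would likely need text normalization before lookup.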
License
Copyright (c) 2025 Deepali Jadhav, Sakshi Sonawane, Shreaysh Chudiwal, Chaitanya Dhotre, Ghansham Pawar, Renuka Raut (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.