Sentiment Analysis for Low Resource Language Using Machine Learning and Deep Learning
DOI:
https://doi.org/10.21467/proceedings.7.6.21
Keywords:
Punjabi sentiment analysis, low-resource languages, annotated datasets
Abstract
This work investigates sentiment analysis for Punjabi, a language with limited digital resources and tooling. Although sentiment analysis is a prominent topic in NLP, low-resource languages such as Punjabi lack digital resources, annotated datasets, and language-specific tools, and have therefore received little research attention. This paper addresses that gap. Online news and social media posts were compiled into a Punjabi dataset annotated with sentiment polarity. Sentiment was extracted and analysed using rule-based methods, supervised machine learning classifiers, deep learning models, and hybrid models. Preprocessing was designed to handle Punjabi transliterations, dialectal variation, and culture-specific terms. Model performance was evaluated using accuracy, precision, recall, and F1-score. Despite the limited resources, deep learning models captured more complex sentiment signals than the machine learning methods, while rule-based sentiment analysis performed well on slang. We highlight open problems in Punjabi sentiment analysis, including the scarcity of high-quality annotated data and the lack of a single script shared across platforms. Advanced natural language processing techniques are useful, but better models for low-resource languages will require more data and further research. This work contributes a Punjabi sentiment analysis method, a foundation for further research on low-resource language processing, and a description of the cultural background needed to interpret sentiment correctly.
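The four evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a three-way polarity scheme (pos, neg, neu) and uses a handful of made-up labels; neither the label set nor the data comes from the paper, and the function name `classification_report` is hypothetical.

```python
from collections import Counter


def classification_report(gold, predicted):
    """Return accuracy, macro-averaged F1, and per-label precision/recall/F1."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true label but was missed

    per_label = {}
    for label in labels:
        predicted_count = tp[label] + fp[label]
        actual_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / actual_count if actual_count else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(gold) if gold else 0.0
    macro_f1 = (
        sum(m["f1"] for m in per_label.values()) / len(per_label)
        if per_label else 0.0
    )
    return {"accuracy": accuracy, "macro_f1": macro_f1, "per_label": per_label}


# Toy labels for illustration only; not data from the paper.
gold = ["pos", "neg", "neu", "pos", "neg", "pos"]
predicted = ["pos", "neg", "pos", "pos", "neu", "neg"]
report = classification_report(gold, predicted)
print(report["accuracy"])
print(report["per_label"]["pos"])
```

Reporting per-label scores alongside accuracy matters when polarity classes are imbalanced, since a model can reach high accuracy while rarely predicting a minority class.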
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.