Sentiment Analysis for Roman Urdu Text over Social Media, a Comparative Study

Abstract

In the present century, the volume of data is increasing enormously. This data can take the form of images, text, voice, and video. One factor in this huge growth is the use of social media, where people post data daily while chatting, exchanging information, and uploading their personal and official credentials. Sentiment analysis seeks to uncover abstract knowledge in published texts, such as blogs, news, and social networks, in which users communicate their emotions and thoughts about shared content. Roman Urdu is one of the most dominant languages on social networks in Pakistan and India. Although Roman Urdu is a variety of Urdu, the world's third-largest language, insufficient work has been done on it. In this article, we review the concepts and strategies previously used to analyze the sentiment of Roman Urdu text and report their results.

Published in: IJCSN Journal, Volume 9, Issue 5

Date of Publication: October 2020

Pages: 217-224

Figures: 12

Tables: 1

Authors

Irfan Qutab is an MSCS research scholar in the Department of Computer Science and Information Technology at The University of Lahore. He is working on sentiment analysis of Roman Urdu as his final research work.

Khawar Iqbal Malik completed his MSCS in 2015 and is currently a PhD Computer Science student at the University of Sargodha. He works as a Lecturer in the Department of Computer Science and Information Technology at The University of Lahore, Sargodha Campus. His research areas are artificial neural networks and information retrieval techniques.

Hira Arooj completed her MPhil in Statistics in 2016 and currently works as a Lecturer in Statistics in the Department of Mathematics and Statistics at The University of Lahore, Sargodha Campus. She has been teaching in the Department of Computer Science for 4 years. Her research area is statistical models applied in NLP and information retrieval techniques.

Keywords

Lexicon, Urdu Sentiments, Pre-processing, Corpus, Datasets, Sentiment Classification

Conclusion

In this paper, we have discussed the methods and techniques that have been used to perform sentiment analysis of Roman Urdu text and comments. We have tried to include all research done so far on Roman Urdu sentiment classification. Across the reviewed studies, classification was performed into two or three classes (positive, negative, and optionally neutral). Most researchers used Naïve Bayes, Logistic Regression, and Support Vector Machines. We also conclude that very limited work has been done on Roman Urdu: only a few researchers have proposed models for Roman Urdu sentiment analysis, while most sentiment analysis research targets English. Since Roman Urdu is a variant of the world's third-largest language, much more work is needed on it.
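Naïve Bayes is one of the classifiers most often used in the reviewed studies. As an illustration only, the following is a minimal from-scratch multinomial Naïve Bayes sketch in Python (standard library only) for two-class Roman Urdu sentiment. The tokenizer, the add-one (Laplace) smoothing, the class names, and the six training sentences are all hypothetical assumptions chosen for the example; they are not taken from any surveyed paper or dataset. Roman Urdu has no standard spelling, so a real pipeline would also need a normalization step before tokenization, which this sketch omits.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed minimal pre-processing: lowercase, keep only runs of a-z."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class (priors)
        self.word_counts = defaultdict(Counter)    # class -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)   # log prior
        denom = self.total_words[label] + len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy examples for illustration only (not from any surveyed dataset).
train = [
    ("ye phone bohat acha hai", "positive"),
    ("service bohat zabardast thi", "positive"),
    ("mujhe ye product pasand aya", "positive"),
    ("ye phone bilkul bekar hai", "negative"),
    ("service bohat buri thi", "negative"),
    ("mujhe ye product pasand nahi aya", "negative"),
]
texts, labels = zip(*train)
model = MultinomialNB().fit(texts, labels)

print(model.predict("phone acha hai"))        # positive
print(model.predict("ye service bekar hai"))  # negative
```

Because the classifier simply takes the arg-max over whatever labels appear in training, adding "neutral" examples would extend this sketch to the three-class setting without code changes.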