Pre-processing Techniques in Sentiment Analysis through FRN: A Review

The objective of this paper is to demonstrate the viability of analyzing online data. It presents a framework for sentiment pattern analysis whose results are reported in segments labelled positive, negative and neutral. Summarizing opinions about products is a challenging task because of the diversity and volume of the available reviews, and mining online opinions is a difficult text classification task within sentiment analysis. The paper reviews a multivariate text feature selection technique called the Feature Relation Network (FRN), which considers semantic information and leverages the syntactic relationships between n-gram features. By incorporating syntactic information about n-gram relations, FRN enables the inclusion of heterogeneous n-gram features for improved opinion classification, and it selects features in a more computationally efficient manner than many multivariate and hybrid methods. The role of text pre-processing in sentiment analysis is also explored, together with the accuracies obtained using a support vector machine (SVM) with appropriate feature selection and representation.
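As a rough illustration of how relations between n-gram features can be used to prune redundant ones, the sketch below extracts unigrams and bigrams and keeps a higher-order n-gram only when its weight exceeds that of every lower-order n-gram it contains by some margin. This is a simplified, hypothetical rule written for this summary, not the FRN algorithm itself; the function names, the weights and the `threshold` parameter are all assumptions.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def extract_ngrams(tokens: List[str], max_n: int = 2) -> List[NGram]:
    """Return all word n-grams of order 1..max_n, lower orders first."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def prune_subsumed(weights: Dict[NGram, float], threshold: float = 0.05) -> Dict[NGram, float]:
    """Keep an n-gram only if it outweighs each contiguous sub-n-gram by more than `threshold`.

    Assumed, simplified rule: unigrams have no sub-n-grams and are always kept.
    """
    kept: Dict[NGram, float] = {}
    for gram, weight in weights.items():
        # All contiguous lower-order n-grams contained in `gram`.
        parts = [
            gram[i:i + n]
            for n in range(1, len(gram))
            for i in range(len(gram) - n + 1)
        ]
        if all(weight > weights.get(part, 0.0) + threshold for part in parts):
            kept[gram] = weight
    return kept


# Hypothetical feature weights (e.g. from some univariate scoring step).
example_weights: Dict[NGram, float] = {
    ("not",): 0.10,
    ("very",): 0.05,
    ("good",): 0.40,
    ("not", "good"): 0.60,   # adds information beyond "good": kept
    ("very", "good"): 0.42,  # barely better than "good": pruned
}
selected = prune_subsumed(example_weights)
```

With these made-up weights, `("not", "good")` survives because it is clearly more informative than either of its words, while `("very", "good")` is dropped as redundant with `("good",)`.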

Published In : IJCSN Journal Volume 5, Issue 2

Date of Publication : April 2016

Pages : --

Figures : 01

Tables : 03

Publication Link : Pre-processing Techniques in Sentiment Analysis through FRN: A Review

Ashwini M. Baikerikar : Department of Computer Science and Technology, Department of Technology, Shivaji University, Kolhapur, Maharashtra, India.

P. C. Bhaskar : Department of Electronics and Communication Technology, Department of Technology, Shivaji University, Kolhapur, Maharashtra, India.

Sentiment analysis; Text pre-processing; Feature Relation Network (FRN); Support Vector Machine (SVM)

Feature selection in sentiment analysis has emerged as a challenging area with many obstacles, since it involves natural language processing. The challenge in this field is to develop a machine's ability to understand text as human readers do. In this paper, we analyzed the role of text pre-processing in sentiment analysis. The experimental results demonstrate that, with appropriate feature selection and representation, the accuracy of sentiment analysis using SVM may be increased to the level achieved in topic classification. Various pre-processing methods are used to reduce noise in the text, and the chi-squared method is used to remove unwanted features that do not affect its orientation. The level of accuracy achieved on the two data sets is comparable to that achievable in topic categorization. We conclude that hybrid methods for feature selection are a promising future direction for feature selection in sentiment analysis.
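The pre-processing and chi-squared filtering steps mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the stop-word list, the tokenization rule, the toy reviews and the function names are all hypothetical, and the SVM training step that would consume the selected features is omitted.

```python
import re
from typing import Dict, List, Set

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "is", "a", "and", "but"}


def preprocess(text: str) -> Set[str]:
    """Lower-case, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def chi_squared_scores(docs: List[Set[str]], labels: List[int]) -> Dict[str, float]:
    """Chi-squared statistic of each term against a binary label (1 = positive)."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    scores: Dict[str, float] = {}
    for term in set().union(*docs):
        a = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)  # present, positive
        b = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)  # present, negative
        c = n_pos - a  # absent, positive
        d_ = n_neg - b  # absent, negative
        denom = (a + b) * (c + d_) * (a + c) * (b + d_)
        scores[term] = n * (a * d_ - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Made-up reviews: two positive (1) and two negative (0).
reviews = [
    "The phone is GREAT!",
    "Great battery",
    "The phone is poor",
    "Poor battery",
]
labels = [1, 1, 0, 0]
docs = [preprocess(r) for r in reviews]
scores = chi_squared_scores(docs, labels)
top_terms = select_top(scores, 2)
```

In this toy corpus the opinion words "great" and "poor" receive the highest scores, whereas "phone" and "battery" occur equally in both classes and score zero, so a filter of this kind would discard them before classification.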
