Automatic Tweet Classification Using Keyword Strapping


Recently, a new form of blogging known as microblogging has emerged. Microblogging is a simplified form of blogging in which entries are restricted in length, typically to around 140 characters or fewer. Microblog usage has grown dramatically, thanks in part to Twitter, the leading provider of microblogs, and to the integration of microblogging services. In this dissertation, we attempt to address some of the opportunities and challenges of automatically processing microblogging data by considering two specific problems. First, we present an automatic Keyword Strap classifier that assigns a single Twitter post to one of a set of high-level categories using a Naïve Bayes classifier. While such tasks have been performed before on traditional blogs, to our knowledge no research has applied this technique to microblogging data. Our research indicates that even though an average Twitter post is only 11 words long, posts can be categorized into one of ten categories with an F1-measure of up to 78%. Second, we automatically summarize a large number of Twitter messages and calculate the happy index of a user.
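As a rough illustration of how a Naïve Bayes classifier can assign a short post to a category, the following minimal sketch uses word counts with add-one smoothing. It is not the Keyword Strap classifier itself: the function names, the two categories, and the toy posts are all hypothetical and chosen only to keep the example small.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real tweets would need more cleanup."""
    return text.lower().split()


def train_naive_bayes(labeled_posts):
    """Collect per-category word counts and category frequencies."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    category_counts = Counter()          # category -> number of posts
    vocabulary = set()
    for text, category in labeled_posts:
        category_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return word_counts, category_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-posterior (add-one smoothing)."""
    word_counts, category_counts, vocabulary = model
    total_posts = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_posts in category_counts.items():
        score = math.log(n_posts / total_posts)            # log prior
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy data; the categories and posts are invented for illustration.
TRAINING_POSTS = [
    ("great goal in the match tonight", "sports"),
    ("the team won the final game", "sports"),
    ("new phone release has a faster chip", "technology"),
    ("software update fixes the phone bug", "technology"),
]

model = train_naive_bayes(TRAINING_POSTS)
print(classify("the game tonight", model))
print(classify("phone software", model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word that is absent from one category from driving that category's probability to zero.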

Published In : IJCSN Journal Volume 5, Issue 2

Date of Publication : April 2016

Pages : --

Figures : 03

Tables : --

Publication Link : Automatic Tweet Classification Using Keyword Strapping

RAHUL CHATURVEDI : He is currently pursuing an M.Tech. at the state university RGPV, Bhopal (Madhya Pradesh). His research interests include mining of social media data. (School of Information Technology, RGPV University, Bhopal).

NISHCHOL MISHRA : He received a Ph.D. degree in computer science and engineering. His research interests include mining of social media data. He is currently an Assistant Professor at the state technical university of Madhya Pradesh, RGPV Bhopal. (School of Information Technology, RGPV University, Bhopal).

Learning Analytics, LAK, EDM, Twitter APIs, Naïve Bayes classifier, SVM classifier

In sentiment analysis, feature selection emerges as a challenging area with many obstacles, as it involves natural language processing. The challenge of this field is to develop the machine's ability to understand text as human readers do. In this paper, we analyzed the role of text pre-processing in sentiment analysis. Experimental results demonstrate that, with appropriate feature selection and representation, the accuracy of sentiment analysis using SVM may be increased up to the level achieved in topic classification. Various pre-processing methods are used to reduce the noise in the text, in addition to the chi-squared method, which removes unwanted features that do not affect the text's orientation. The level of accuracy achieved on the two data sets is comparable to the accuracy that can be achieved in topic categorization. We conclude that a hybrid method for feature selection can be the future direction of feature selection in sentiment analysis.
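The chi-squared feature scoring mentioned above can be sketched as follows. This is a minimal illustration that assumes the standard chi-squared statistic over a 2x2 table of term presence against class membership; the helper names and the four toy documents are hypothetical and are not taken from the paper's data sets.

```python
def chi_squared(term, target_label, labeled_docs):
    """Chi-squared statistic of the 2x2 table (term present?) x (label == target?)."""
    a = b = c = d = 0
    for text, label in labeled_docs:
        has_term = term in text.lower().split()
        in_class = label == target_label
        if has_term and in_class:
            a += 1      # term present, target class
        elif has_term:
            b += 1      # term present, other class
        elif in_class:
            c += 1      # term absent, target class
        else:
            d += 1      # term absent, other class
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0      # term or class never varies, so it carries no signal
    return n * (a * d - b * c) ** 2 / denominator


def rank_terms(labeled_docs, target_label):
    """Score every vocabulary term and sort by descending chi-squared value."""
    vocabulary = sorted({w for text, _ in labeled_docs for w in text.lower().split()})
    scores = {t: chi_squared(t, target_label, labeled_docs) for t in vocabulary}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy documents, invented for illustration only.
DOCS = [
    ("good movie", "pos"),
    ("good plot", "pos"),
    ("bad movie", "neg"),
    ("bad plot", "neg"),
]

for term, score in rank_terms(DOCS, "pos"):
    print(term, score)
```

In this toy set, "good" and "bad" score highest because their presence lines up with the label, while "movie" and "plot" occur equally in both classes and score zero, so a threshold on the score would discard them as features that do not affect orientation.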
