Feature selection is the process of selecting relevant features from the available dataset. It is used to remove or reduce redundant and irrelevant features. Various feature selection algorithms, such as CFS (Correlation-based Feature Selection), FCBF (Fast Correlation-Based Filter), and CMIM (Conditional Mutual Information Maximization), are used for this purpose. The aim of a feature selection algorithm is both efficiency and effectiveness: efficiency refers to the time required to find a subset of features, while effectiveness refers to the quality of that subset. The problems with existing feature selection algorithms are that accuracy is not guaranteed, computational complexity is large, and they are ineffective at removing redundant features. To overcome these problems, the Fast Clustering-based Feature Selection Algorithm (FAST) is used. The FAST algorithm works in three steps: removal of irrelevant features, construction of a Minimum Spanning Tree (MST) from the remaining relevant features using Kruskal's method, and partitioning of the MST and selection of representative features.
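To make these three steps concrete, the following is a minimal Python sketch, assuming (as in the FAST paper by Song et al., reference [1]) that symmetric uncertainty (SU) is the correlation measure. The threshold theta, the function names, and the use of the networkx library for the Kruskal MST are illustrative choices, not part of the published algorithm.

import numpy as np
import networkx as nx

def entropy(values):
    # Shannon entropy of a discrete sequence, in bits.
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where IG is the
    # information gain (mutual information) H(X) + H(Y) - H(X, Y).
    h_joint = entropy([f"{a}|{b}" for a, b in zip(x, y)])
    hx, hy = entropy(x), entropy(y)
    ig = hx + hy - h_joint
    return 2.0 * ig / (hx + hy) if hx + hy > 0 else 0.0

def fast_select(X, y, theta=0.1):
    n_features = X.shape[1]
    su_class = {i: symmetric_uncertainty(X[:, i], y) for i in range(n_features)}
    # Step 1: remove irrelevant features (low SU with the target class).
    relevant = [i for i in range(n_features) if su_class[i] > theta]
    # Step 2: build a complete graph over the relevant features, weighted
    # by 1 - SU so that strongly correlated pairs get light edges, then
    # take its minimum spanning tree with Kruskal's method.
    g = nx.Graph()
    g.add_nodes_from(relevant)
    for pos, i in enumerate(relevant):
        for j in relevant[pos + 1:]:
            g.add_edge(i, j, weight=1.0 - symmetric_uncertainty(X[:, i], X[:, j]))
    mst = nx.minimum_spanning_tree(g, algorithm="kruskal")
    # Step 3: partition the MST by deleting edges whose feature-feature SU
    # is smaller than both features' SU with the class, then keep one
    # representative per cluster: the feature most correlated with the class.
    for i, j in list(mst.edges):
        su_ij = 1.0 - mst[i][j]["weight"]
        if su_ij < su_class[i] and su_ij < su_class[j]:
            mst.remove_edge(i, j)
    return sorted(max(cluster, key=su_class.get)
                  for cluster in nx.connected_components(mst))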
Published In: IJCSN Journal Volume 5, Issue 4
Date of Publication: August 2016
Pages: 609-614
Figures: 07
Tables: --
Priyanka Patil: received her B.E. degree in Computer Engineering from Mumbai University in 2009. She received a Best Paper Award certificate at the International Conference on Advances in Science and Technology in 2015. She currently works as a lecturer at Vishwatmak Om Gurudev College of Engineering, Atgaon, Mumbai.
Babita Bhagat: received her M.E. degree in Computer Engineering. She currently works as a professor at Pillai HOC College of Engineering, Rasayani, Mumbai.
Keywords: Feature Clustering, Feature Subset Selection, Minimum Spanning Tree
Existing feature selection algorithms cannot eliminate both irrelevant and redundant features. Hence, it becomes difficult to select the relevant features from the entire feature set, which reduces both the efficiency and the effectiveness of the selection. To overcome the difficulties of irrelevant and redundant feature removal, a new algorithm, the FAST algorithm, has been implemented. It works in two steps. In the first step, graph-theoretic clustering methods are used to partition the features into clusters. In the second step, the final subset is formed by combining the representative feature of each cluster, chosen for its relevance to the target class. This has improved the performance of the search algorithm on a large scale. In future work, different types of correlation measures and properties of the feature space can be studied.
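As a check that the two-step behaviour works as claimed, the toy run below applies the fast_select sketch from above to a small synthetic dataset containing a redundant copy and a pure-noise feature. The data generation and the expected output are hypothetical, chosen only to exercise the algorithm.

import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
f0 = np.where(rng.random(n) < 0.9, y, 1 - y)   # relevant (10% label noise)
f1 = f0.copy()                                  # redundant copy of f0
f2 = rng.integers(0, 2, size=n)                 # irrelevant noise
f3 = np.where(rng.random(n) < 0.8, y, 1 - y)    # relevant, independent noise
X = np.stack([f0, f1, f2, f3], axis=1)

# Step 1 should drop f2; clustering should merge f0 and f1 and keep
# only one of them, alongside f3.
print(fast_select(X, y, theta=0.1))   # e.g. [0, 3]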
[1] Qinbao Song, Jingjie Ni and Guangtao Wang, A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data, IEEE Transactions on Knowledge and Data Engineering, 25(1), pp 1-14, 2013.
[2] Fayyad U. and Irani K., Multi-interval discretization of continuous-valued attributes for classification learning, In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp 1022-1027, 1993.
[3] Mitchell T.M., Generalization as Search, Artificial Intelligence, 18(2), pp 203-226, 1982.
[4] Das S., Filters, wrappers and a boosting-based hybrid for feature selection, In Proceedings of the Eighteenth International Conference on Machine Learning, pp 74-81, 2001.
[5] Dash M. and Liu H., Feature Selection for Classification, Intelligent Data Analysis, 1(3), pp 131-156, 1997.
[6] Hall M.A., Correlation-Based Feature Subset Selection for Machine Learning, Ph.D. dissertation, University of Waikato, Hamilton, New Zealand, 1999.
[7] Yu L. and Liu H., Feature selection for high-dimensional data: a fast correlation-based filter solution, In Proceedings of the 20th International Conference on Machine Learning, pp 856-863, 2003.
[8] Fleuret F., Fast binary feature selection with conditional mutual information, Journal of Machine Learning Research, 5, pp 1531-1555, 2004.
[9] Kira K. and Rendell L.A., The feature selection problem: Traditional methods and a new algorithm, In Proceedings of the Ninth National Conference on Artificial Intelligence, pp 129-134, 1992.
[10] Van Dijk G. and Van Hulle M.M., Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis, In Proceedings of the International Conference on Artificial Neural Networks, 2006.