Feature selection is the process of selecting relevant features from the available dataset. It is used to remove or reduce redundant and irrelevant features. Various feature selection algorithms, such as CFS (Correlation-based Feature Selection), FCBF (Fast Correlation-Based Filter), and CMIM (Conditional Mutual Information Maximization), are used for this purpose. The aim of a feature selection algorithm is both efficiency and effectiveness: efficiency refers to the time required to find a subset of features, while effectiveness refers to the quality of that subset. The problems with existing feature selection algorithms are that accuracy is not guaranteed, computational complexity is large, and they are ineffective at removing redundant features. To overcome these problems, the Fast Clustering-based Feature Selection Algorithm (FAST) is used. The FAST algorithm works in three steps: removal of irrelevant features, construction of a Minimum Spanning Tree (MST) from the remaining relevant features using Kruskal's method, and partitioning of the MST and selection of representative features.
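To make these three steps concrete, the following is a minimal Python sketch, assuming (as in the FAST paper by Song et al., reference [1]) that symmetric uncertainty (SU) is the correlation measure. The threshold theta, the function names, and the use of the networkx library for the Kruskal MST are illustrative choices, not part of the published algorithm.

import numpy as np
import networkx as nx

def entropy(values):
    # Shannon entropy of a discrete sequence, in bits.
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where IG is the
    # information gain (mutual information) H(X) + H(Y) - H(X, Y).
    h_joint = entropy([f"{a}|{b}" for a, b in zip(x, y)])
    hx, hy = entropy(x), entropy(y)
    ig = hx + hy - h_joint
    return 2.0 * ig / (hx + hy) if hx + hy > 0 else 0.0

def fast_select(X, y, theta=0.1):
    n_features = X.shape[1]
    su_class = {i: symmetric_uncertainty(X[:, i], y) for i in range(n_features)}
    # Step 1: remove irrelevant features (low SU with the target class).
    relevant = [i for i in range(n_features) if su_class[i] > theta]
    # Step 2: build a complete graph over the relevant features, weighted
    # by 1 - SU so that strongly correlated pairs get light edges, then
    # take its minimum spanning tree with Kruskal's method.
    g = nx.Graph()
    g.add_nodes_from(relevant)
    for pos, i in enumerate(relevant):
        for j in relevant[pos + 1:]:
            g.add_edge(i, j, weight=1.0 - symmetric_uncertainty(X[:, i], X[:, j]))
    mst = nx.minimum_spanning_tree(g, algorithm="kruskal")
    # Step 3: partition the MST by deleting edges whose feature-feature SU
    # is smaller than both features' SU with the class, then keep one
    # representative per cluster: the feature most correlated with the class.
    for i, j in list(mst.edges):
        su_ij = 1.0 - mst[i][j]["weight"]
        if su_ij < su_class[i] and su_ij < su_class[j]:
            mst.remove_edge(i, j)
    return sorted(max(cluster, key=su_class.get)
                  for cluster in nx.connected_components(mst))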
Published In: IJCSN Journal Volume 5, Issue 4
Date of Publication: August 2016
Pages: 609-614
Figures: 07
Tables: --
Priyanka Patil: received her B.E. degree in Computer Engineering from Mumbai University in 2009. She received a Best Paper Award certificate at the International Conference on Advances in Science and Technology in 2015. She currently works as a lecturer at Vishwatmak Om Gurudev College of Engineering, Atgaon, Mumbai.
Babita Bhagat: received her M.E. degree in Computer Engineering. She currently works as a professor at Pillai HOC College of Engineering, Rasayani, Mumbai.
Keywords: Feature Clustering, Feature Subset Selection, Minimum Spanning Tree
Existing feature selection algorithms cannot eliminate both irrelevant and redundant features. Hence, it becomes difficult to select the relevant features from the entire feature set, which reduces both the efficiency and the effectiveness of the selection. To overcome the difficulties of irrelevant and redundant feature removal, a new algorithm, the FAST algorithm, has been implemented. It works in two steps. In the first step, graph-theoretic clustering methods are used to partition the features into clusters. In the second step, the final subset is formed by combining the representative feature of each cluster, chosen for its relevance to the target class. This has improved the performance of the search algorithm on a large scale. In future work, different types of correlation measures and properties of the feature space can be studied.
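As a check that the two-step behaviour works as claimed, the toy run below applies the fast_select sketch from above to a small synthetic dataset containing a redundant copy and a pure-noise feature. The data generation and the expected output are hypothetical, chosen only to exercise the algorithm.

import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
f0 = np.where(rng.random(n) < 0.9, y, 1 - y)   # relevant (10% label noise)
f1 = f0.copy()                                  # redundant copy of f0
f2 = rng.integers(0, 2, size=n)                 # irrelevant noise
f3 = np.where(rng.random(n) < 0.8, y, 1 - y)    # relevant, independent noise
X = np.stack([f0, f1, f2, f3], axis=1)

# Step 1 should drop f2; clustering should merge f0 and f1 and keep
# only one of them, alongside f3.
print(fast_select(X, y, theta=0.1))   # e.g. [0, 3]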
[1] Qinbao Song, Jingjie Ni and Guangtao Wang, A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data, IEEE Transactions on Knowledge and Data Engineering, 25(1), pp 1-14, 2013.
[2] Fayyad U. and Irani K., Multi-interval discretization of continuous-valued attributes for classification learning, In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp 1022-1027, 1993.
[3] Mitchell T.M., Generalization as Search, Artificial Intelligence, 18(2), pp 203-226, 1982.
[4] Das S., Filters, wrappers and a boosting-based hybrid for feature selection, In Proceedings of the Eighteenth International Conference on Machine Learning, pp 74-81, 2001.
[5] Dash M. and Liu H., Feature Selection for Classification, Intelligent Data Analysis, 1(3), pp 131-156, 1997.
[6] Hall M.A., Correlation-Based Feature Subset Selection for Machine Learning, Ph.D. dissertation, University of Waikato, Hamilton, New Zealand, 1999.
[7] Yu L. and Liu H., Feature selection for high-dimensional data: a fast correlation-based filter solution, In Proceedings of the 20th International Conference on Machine Learning, pp 856-863, 2003.
[8] Fleuret F., Fast binary feature selection with conditional mutual information, Journal of Machine Learning Research, 5, pp 1531-1555, 2004.
[9] Kira K. and Rendell L.A., The feature selection problem: Traditional methods and a new algorithm, In Proceedings of the Ninth National Conference on Artificial Intelligence, pp 129-134, 1992.
[10] Van Dijk G. and Van Hulle M.M., Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis, In Proceedings of the International Conference on Artificial Neural Networks, 2006.