In recent years, class imbalance has been a challenging problem in the data mining community. Class imbalance occurs when
one class in the data contains far more observations than the others. This condition makes classification
suboptimal because the larger class has more influence on the classifier. Handling class imbalance is especially
important in applications such as detecting fraud in banking operations, network faults, cancer diagnosis, and predicting technical
failures. This study applies bagging-based ensemble methods to the class imbalance problem on 14 datasets. The purpose
of this research is to assess how well several bagging-based ensemble methods overcome the class imbalance problem. The results
obtained with the OverBagging method are more stable across the datasets than those of the other bagging-based methods.
Published In: IJCSN Journal Volume 6, Issue 6
Date of Publication: December 2017
Pages: 670-676
Figures: 04
Tables: 05
Mr. L. Hakim: Master's student in the Department of Statistics, Bogor
Agricultural University. His main interests are data mining and
bioinformatics.
Dr. B. Sartono: Currently works as a lecturer in the Department of
Statistics, Bogor Agricultural University. His main interests are
data mining and experimental design.
A. Saefuddin: Received the M.Sc. and Ph.D. from the University of
Guelph, Canada. He is a professor in the Department of Statistics,
Bogor Agricultural University. He also serves as the Rector of
Al-Azhar University Indonesia in Jakarta. His expertise is in
genetics and biostatistics.
Ensemble, Boosting, Bagging, Class Imbalance, Classification
Overall, bagging-based methods improve results on the
minority class, as evidenced by their higher sensitivity
values compared with the CART method, although the overall specificity of the CART method is superior
to that of the bagging-based methods. This illustrates that the
CART method is not able to predict the minority class well.
The OverBagging method is stable across various
datasets, with both extreme and non-extreme class imbalance.
However, the OverBagging method takes a long time
to compute. Another stable method is Roughly
Balanced Bagging, which overall predicts the minority
class better than the other methods, except on the
extremely imbalanced data, where Bagging Ensemble Variation
performs better than Roughly Balanced Bagging.
However, Bagging Ensemble Variation is not capable of
predicting trees with an equal number of opportunities.
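To make the comparison above concrete, the following is a minimal sketch of one common formulation of OverBagging, together with the sensitivity and specificity measures used to judge minority-class performance. It is not the authors' implementation. The function names, the toy data, and the nearest-mean base learner (a stand-in for a CART tree, chosen to keep the example dependency-free) are all assumptions made for illustration.

```python
import random
from collections import Counter


def overbagging_bag(X, y, minority, rng):
    """Build one balanced bag: bootstrap the majority class at its own size
    and oversample the minority class with replacement up to that size."""
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n = len(maj_idx)
    chosen = rng.choices(maj_idx, k=n) + rng.choices(min_idx, k=n)
    return [X[i] for i in chosen], [y[i] for i in chosen]


def fit_nearest_mean(X, y):
    """Hypothetical stand-in base learner (not a CART tree): store the mean
    of a single numeric feature for each class."""
    sums, counts = Counter(), Counter()
    for x, label in zip(X, y):
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def overbagging_fit(X, y, minority, n_bags=11, seed=0):
    """Train one base learner per balanced bag."""
    rng = random.Random(seed)
    return [fit_nearest_mean(*overbagging_bag(X, y, minority, rng))
            for _ in range(n_bags)]


def overbagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_nearest_mean(m, x) for m in models)
    return votes.most_common(1)[0][0]


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy imbalanced data (assumed): 10 majority points, 3 minority points.
X = list(range(10)) + [100, 101, 102]
y = [0] * 10 + [1] * 3

models = overbagging_fit(X, y, minority=1)
y_pred = [overbagging_predict(models, x) for x in X]
sens, spec = sensitivity_specificity(y, y_pred, positive=1)
```

Because every bag is grown to twice the majority-class size, each base learner trains on more rows than in plain bagging, which is consistent with the longer computing time noted above for OverBagging.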