Air pollution is a major concern for society. Technological advances have improved living standards, but they have also
increased greenhouse gas (GHG) emissions. Sources of these emissions include industry, agriculture and construction sites. The
resulting pollutants include nitrogen oxides, sulphur dioxide, carbon oxides, methane, CFCs and ozone (O3). Several of these are
GHGs, and all of them degrade air quality. GHGs contribute to global warming, and some of these pollutants, such as CFCs, also
deplete the ozone layer. These pollutants can cause a range of health problems. Air quality can be assessed in terms of air quality
levels (AQL). An air quality index (AQI) can be computed from readings taken by sensors or monitoring stations, and health concerns
related to air pollution can be predicted from it. In this paper, a dataset containing the AQI values of the pollutants NO2, O3, CO
and SO2 is analyzed. The Random Forest algorithm achieves an accuracy of 93.467%, while the Multiclass classifier achieves 94.61%.
The results show that the Multiclass classifier performs better than the Random Forest algorithm on this dataset.
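Per-pollutant AQI values such as those in the dataset are commonly derived from measured concentrations by piecewise-linear interpolation between published breakpoints, I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo, and the overall AQI is usually the highest of the sub-indices. The paper does not say how its AQI values were computed, so the Python sketch below only illustrates that common convention. The carbon monoxide breakpoints and category names are assumptions taken from the US EPA scheme as commonly published, not from this paper, and CO_BREAKPOINTS, sub_index, overall_aqi and category are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# Breakpoint rows (C_lo, C_hi, I_lo, I_hi) for 8-hour carbon monoxide in ppm.
# ASSUMPTION: values follow the US EPA AQI table as commonly published; they are
# not taken from the paper, so check them against the current official table.
CO_BREAKPOINTS: List[Tuple[float, float, int, int]] = [
    (0.0, 4.4, 0, 50),
    (4.5, 9.4, 51, 100),
    (9.5, 12.4, 101, 150),
    (12.5, 15.4, 151, 200),
    (15.5, 30.4, 201, 300),
    (30.5, 40.4, 301, 400),
    (40.5, 50.4, 401, 500),
]

# Upper AQI bound of each category (ASSUMPTION: EPA category names).
CATEGORIES: List[Tuple[int, str]] = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def sub_index(conc: float, breakpoints: List[Tuple[float, float, int, int]]) -> int:
    """Interpolate linearly inside the band that contains conc.

    conc is assumed to be already truncated to the table's precision
    (one decimal place for CO), so it never falls between two bands.
    """
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= conc <= c_hi:
            value = (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
            return math.floor(value + 0.5)  # round half up to an integer index
    raise ValueError(f"concentration {conc} is outside the breakpoint table")


def overall_aqi(sub_indices: Dict[str, int]) -> Tuple[int, str]:
    """Return the highest pollutant sub-index and the pollutant that produced it."""
    dominant = max(sub_indices, key=sub_indices.__getitem__)
    return sub_indices[dominant], dominant


def category(aqi: int) -> str:
    """Map an AQI value to its category label."""
    for upper, label in CATEGORIES:
        if aqi <= upper:
            return label
    return "Beyond the index"


co_aqi = sub_index(6.0, CO_BREAKPOINTS)  # 66
aqi, pollutant = overall_aqi({"NO2": 40, "O3": 72, "CO": co_aqi, "SO2": 12})
print(aqi, pollutant, category(aqi))  # 72 O3 Moderate
```

Category labels like these are one way to turn AQI values into the discrete classes a classifier can predict; the paper does not list the class labels it used.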
Published in: IJCSN Journal, Volume 7, Issue 1
Date of publication: February 2018
Pages: 36-39
Figures: 3
Tables: 3
Ranjana Waman Gore received her BE (Computer Science &
Engineering) from BAMU in 2006 and completed her
postgraduate degree at GECA in 2011. She has worked as a
lecturer at Government College of Engineering, Aurangabad,
and as head of department and Assistant Professor at SPWEC
College. She is currently an Assistant Professor at
Marathwada Institute of Technology, Aurangabad. She has
published two papers in IEEE international conferences and
four papers in international journals.
Deepa S. Deshpande received her BE (Computer) from Pune
University in 1995 and her MTech (Computer) from Pune
University in 2006. She completed her PhD at SRTMU in 2015.
She has 23 publications at the national and international
levels, 21 years of teaching experience and 1 year of
industrial experience.
Keywords: Data Mining; Random Forest; Multiclass Classifier; Air Quality Index.
Analyzing air quality is essential to maintaining a good
quality of life. Such analysis also plays an important role
in developing smart cities and in devising environmental
policies. The collected data were analyzed with the
Multiclass classifier and Random Forest (RF) classification
techniques. The RF algorithm achieved an accuracy of 93.467%,
and the Multiclass classifier achieved 94.61%. The Multiclass
classifier therefore gives better accuracy than the RF
classifier on this dataset.
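The paper does not state how its Multiclass classifier decomposes the problem or which base learner it wraps. One common scheme is one-vs-rest: train one binary scorer per class and predict the class whose scorer is most confident. The Python sketch below assumes that scheme and swaps in a simple nearest-centroid binary scorer for brevity; it is neither Random Forest nor the authors' configuration. It also computes accuracy, the metric the paper reports, as the fraction of correct predictions on held-out data. The classes NearestCentroidBinary and OneVsRest, the function accuracy and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def _mean(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroidBinary:
    """Binary scorer: a positive score means x is nearer the positive-class centroid."""

    def fit(self, X: List[Sequence[float]], is_pos: List[bool]) -> "NearestCentroidBinary":
        self.pos = _mean([x for x, t in zip(X, is_pos) if t])
        self.neg = _mean([x for x, t in zip(X, is_pos) if not t])
        return self

    def score(self, x: Sequence[float]) -> float:
        return _sq_dist(x, self.neg) - _sq_dist(x, self.pos)


class OneVsRest:
    """One binary scorer per class; predict the class whose scorer is most confident."""

    def fit(self, X: List[Sequence[float]], labels: List[str]) -> "OneVsRest":
        self.classes = sorted(set(labels))
        # Each model learns "this class" (True) versus "every other class" (False).
        self.models: Dict[str, NearestCentroidBinary] = {
            c: NearestCentroidBinary().fit(X, [lab == c for lab in labels])
            for c in self.classes
        }
        return self

    def predict(self, X: List[Sequence[float]]) -> List[str]:
        return [max(self.classes, key=lambda c: self.models[c].score(x)) for x in X]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up two-feature points with AQI-style class labels (hypothetical data).
X_train = [(0, 0), (1, 1), (10, 0), (9, 1), (0, 10), (1, 9)]
y_train = ["Good", "Good", "Moderate", "Moderate", "Unhealthy", "Unhealthy"]
X_test = [(0.5, 0.2), (8, 2), (2, 8)]
y_test = ["Good", "Moderate", "Unhealthy"]

model = OneVsRest().fit(X_train, y_train)
print(accuracy(y_test, model.predict(X_test)))  # 1.0
```

Applied to the reported figures, the Multiclass classifier's margin over Random Forest is 94.61 - 93.467 = 1.143 percentage points. Accuracy alone does not reveal which classes account for the remaining errors; a per-class confusion matrix would.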