
  Air Data Analysis for Predicting Health Risks  
  Authors : Ranjana Gore; Deepa Deshpande


Air pollution is a major concern for society. Technological advancement has improved living standards, but it is also responsible for greenhouse gas (GHG) emissions. GHGs come from sources such as industry, agriculture, and construction sites. The GHGs considered are nitrous oxides, sulphur dioxide, carbon oxides, methane, CFCs, and O3. These greenhouse gases are pollutants that degrade air quality. GHGs are responsible for global warming, due to which there is ozone layer depletion. These pollutants may cause various health problems. Air quality can be assessed using air quality levels (AQL). The air quality index (AQI) can be obtained from sensors or monitoring stations, and air-pollution-related health concerns can be predicted from it. In this paper, a dataset containing the AQI of the air pollutants NO2, O3, CO, and SO2 was analyzed. The Random Forest algorithm achieved an accuracy of 93.467%, while the Multiclass classifier algorithm achieved 94.61%. The results show that the Multiclass classifier performs better than the Random Forest algorithm on this dataset.
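To illustrate how per-pollutant AQI values might be turned into a health-risk label before classification, the following is a minimal sketch. It assumes the standard US EPA AQI category bands and the convention that the overall AQI is the highest of the individual pollutant AQIs; the abstract does not state which class boundaries the authors used, and the names `aqi_category`, `overall_category`, and the sample record are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the US EPA AQI category bands (an assumption here;
# the paper's own class boundaries are not given in the abstract).
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi):
    """Map a single AQI value to its category name."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # bisect_left finds the first band whose upper bound is >= aqi;
    # values above 300 fall through to the last category.
    return AQI_CATEGORIES[bisect_left(AQI_UPPER_BOUNDS, aqi)]


def overall_category(record):
    """Label a record (pollutant name -> AQI) by its worst pollutant."""
    return aqi_category(max(record.values()))


# Hypothetical record with the four pollutants named in the abstract.
sample = {"NO2": 34, "O3": 112, "CO": 9, "SO2": 21}
label = overall_category(sample)
print(label)
```

Labels produced this way could serve as the target classes for a classifier such as Random Forest, with the four pollutant AQI values as input features.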


Published In : IJCSN Journal Volume 7, Issue 1

Date of Publication : February 2018

Pages : 36-39

Figures : 03

Tables : 03


Ranjana Waman Gore : received her BE (Computer Science & Engineering) from BAMU University in 2006 and completed her postgraduate degree at GECA in 2011. She has worked as a lecturer at Government College of Engineering, Aurangabad, and as Head of Department and Assistant Professor at SPWEC College. She is currently an Assistant Professor at Marathwada Institute of Technology, Aurangabad. She has published two papers in IEEE international conferences and four papers in international journals.

Deepa S. Deshpande : received her BE (Computer) from Pune University in 1995 and her MTech (Computer) from Pune University in 2006. She completed her PhD at SRTMU in 2015. She has a total of 23 publications at the national and international level, with 21 years of teaching experience and 1 year of industrial experience.


Data Mining; Random Forest; Multiclass Classifier; Air Quality Index.

Analyzing air quality is important for maintaining a good quality of life. Such analysis plays an important role in developing smart cities and devising environmental policies. The collected data was analyzed with the Multiclass label classifier and Random Forest (RF) classification techniques. The RF algorithm achieved an accuracy of 93.467%, and the Multiclass label algorithm achieved 94.61%. The Multiclass label algorithm therefore gives better accuracy than the RF classifier.
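For reference, the accuracy figures compared above are the share of records whose predicted class matches the true class. A minimal sketch of that calculation for multiclass labels follows; the function name and the example labels are made up for illustration and are not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Return the percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Illustrative labels only (hypothetical, not taken from the paper).
true_labels = ["Good", "Moderate", "Good", "Unhealthy"]
predicted = ["Good", "Moderate", "Moderate", "Unhealthy"]
score = accuracy(true_labels, predicted)
print(f"{score:.2f}%")
```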

