Wani Tamas1, Gilles Notton1, Christophe Paoli1,2, Marie-Laure Nivet1, Cyril Voyant1,3

  • 1 University of Corsica - Pasquale Paoli, UMR CNRS 6134 SPE, 20250 Corte, France
  • 2 Galatasaray University, Department of Computer Engineering, TR-34357 Istanbul, Turkey
  • 3 CHD Castelluccio, radiophysics unit, BP85 20177 Ajaccio, France

Received: April 10, 2015
Revised: July 2, 2015
Accepted: August 21, 2015
DOI: https://doi.org/10.4209/aaqr.2015.03.0193


Cite this article:
Tamas, W., Notton, G., Paoli, C., Nivet, M.L. and Voyant, C. (2016). Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks. Aerosol Air Qual. Res. 16: 405-416. https://doi.org/10.4209/aaqr.2015.03.0193


  • High accuracy forecasting of air pollution peaks with machine learning methods.
  • 3 methods: simple MLP, hybridized MLP with hierarchical and k-means clustering.
  • Robustness verified by multi-location and multi-pollutant (PM10, O3, NO2) study.
  • ROC curves used to produce a complete sensitivity analysis.
  • Combination of clustering and MLP improves forecasting results.



This paper presents an original approach combining Artificial Neural Networks (ANNs) and clustering to detect pollutant peaks. We developed air quality forecasting models using machine learning methods to predict hourly concentrations of ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10) 24 hours ahead. A MultiLayer Perceptron (MLP) was used alone, then hybridized successively with hierarchical clustering and with a combination of self-organizing map and k-means clustering. Clustering methods were used to subdivide the dataset, and an MLP was then trained on each subset. Two urban sites on the island of Corsica, in the western Mediterranean Sea, were investigated. These models showed good overall precision (Index of Agreement reaching 0.87 for O3, 0.80 for NO2 and 0.74 for PM10). Because it is particularly important that forecasting models used on an operational basis correctly predict pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that for PM10 and O3, hybrid models combining clustering and MLP outperform the classical MLP most of the time for high-concentration prediction. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
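The two evaluation tools named in the abstract can be illustrated with a minimal sketch. It assumes that the Index of Agreement is Willmott's d, and that each ROC point comes from sweeping an alarm threshold over the forecast while a fixed level defines an observed peak. The abstract does not spell out either detail, so the function names, the `peak_level` and `thresholds` parameters, and the sample values used in the note after the code are all hypothetical.

```python
from typing import List, Sequence, Tuple


def index_of_agreement(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Willmott's Index of Agreement (assumed definition): 1.0 is a perfect match."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equally long")
    mean_obs = sum(observed) / len(observed)
    # Numerator: squared forecast errors.
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Denominator: "potential error" relative to the observed mean.
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 if den == 0 else 1.0 - num / den


def roc_points(observed: Sequence[float],
               predicted: Sequence[float],
               peak_level: float,
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Return one (false positive rate, true positive rate) pair per alarm threshold.

    An observation counts as a peak when it reaches peak_level (hypothetical
    parameter); an alarm is raised when the forecast reaches the threshold.
    """
    is_peak = [o >= peak_level for o in observed]
    n_pos = sum(is_peak)
    n_neg = len(is_peak) - n_pos
    points = []
    for t in thresholds:
        alarms = [p >= t for p in predicted]
        true_pos = sum(1 for a, k in zip(alarms, is_peak) if a and k)
        false_pos = sum(1 for a, k in zip(alarms, is_peak) if a and not k)
        tpr = true_pos / n_pos if n_pos else 0.0
        fpr = false_pos / n_neg if n_neg else 0.0
        points.append((fpr, tpr))
    return points
```

For example, calling `roc_points([20, 35, 55, 60, 30, 80], [25, 30, 45, 65, 40, 70], peak_level=50, thresholds=[40, 50, 60])` on these made-up values yields one point per threshold. Lowering the alarm threshold catches more peaks at the cost of more false alarms, which is the trade-off a ROC curve makes visible.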

Keywords: Air quality forecasting; ROC curve; Multilayer perceptron; Clustering
