Cite this article: Tamas, W., Notton, G., Paoli, C., Nivet, M.L. and Voyant, C. (2016). Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks. Aerosol Air Qual. Res. 16: 405-416. https://doi.org/10.4209/aaqr.2015.03.0193
High-accuracy forecasting of air pollution peaks with machine learning methods.
Three methods: a simple MLP, and an MLP hybridized with hierarchical or k-means clustering.
Robustness verified by multi-location and multi-pollutant (PM10, O3, NO2) study.
ROC curves used to produce a complete sensitivity analysis.
Combining clustering and MLP improves forecasting results.
This paper presents an original approach that combines Artificial Neural Networks (ANNs) and clustering to detect pollutant peaks. We developed air quality forecasting models using machine learning methods applied to hourly concentrations of ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10), forecast 24 hours ahead. A MultiLayer Perceptron (MLP) was first used alone, then hybridized successively with hierarchical clustering and with a combination of a self-organizing map and k-means clustering. In the hybrid models, the clustering method was used to subdivide the dataset, and an MLP was then trained on each subset. Two urban sites on the island of Corsica, in the western Mediterranean Sea, were investigated. The models showed good overall precision (Index of Agreement reaching 0.87 for O3, 0.80 for NO2 and 0.74 for PM10). Because it is particularly important that a forecasting model used on an operational basis correctly predicts pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This analysis made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that, for PM10 and O3, hybrid models combining clustering and an MLP outperform the classical MLP most of the time for high-concentration prediction. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
Keywords: Air quality forecasting; ROC curve; Multilayer perceptron; Clustering
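
The two evaluation tools named in the abstract, the Index of Agreement and ROC curves, can be illustrated with a minimal sketch. It assumes that the Index of Agreement takes Willmott's standard form, d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2), and that a "peak" is an observed concentration at or above a chosen threshold; the abstract states neither detail, and the function names, parameters, and sample values below are hypothetical and are not data from the study.

```python
from typing import List, Sequence, Tuple


def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement d (assumed form; 1.0 means a perfect match)."""
    if len(obs) != len(pred) or len(obs) == 0:
        raise ValueError("obs and pred must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    # Sum of squared forecast errors.
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    # "Potential error": spread of forecasts and observations around the observed mean.
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    # den is zero only when every value equals the observed mean, i.e. a perfect match.
    return 1.0 if den == 0 else 1.0 - num / den


def roc_points(
    obs: Sequence[float],
    pred: Sequence[float],
    peak_threshold: float,
    decision_thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return one (false positive rate, true positive rate) pair per decision threshold.

    An observed peak is obs >= peak_threshold (an assumption of this sketch).
    An alarm is raised when the forecast is >= the decision threshold.
    """
    if len(obs) != len(pred):
        raise ValueError("obs and pred must be of equal length")
    is_peak = [o >= peak_threshold for o in obs]
    n_pos = sum(is_peak)
    n_neg = len(is_peak) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one peak and one non-peak observation")
    points = []
    for t in decision_thresholds:
        tp = sum(1 for peak, p in zip(is_peak, pred) if peak and p >= t)
        fp = sum(1 for peak, p in zip(is_peak, pred) if not peak and p >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Synthetic illustrative values only, not measurements from the paper.
    observed = [10.0, 60.0, 20.0, 80.0]
    forecast = [15.0, 55.0, 50.0, 70.0]
    print(index_of_agreement(observed, forecast))
    print(roc_points(observed, forecast, peak_threshold=50.0,
                     decision_thresholds=[40.0, 60.0]))
```

Sweeping the decision threshold traces out the ROC curve: a lower threshold catches more observed peaks at the cost of more false alarms, which is the trade-off a sensitivity analysis of this kind examines.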