Bartosz Czernecki1, Michał Marosz2, Joanna Jędruszkiewicz3

1 Department of Meteorology and Climatology, Adam Mickiewicz University in Poznań, Poznań, Poland
2 Institute of Meteorology and Water Management - National Research Institute, Warszawa, Poland
3 Institute of Geography, Pedagogical University of Cracow, Kraków, Poland

Received: October 5, 2020
Revised: February 17, 2021
Accepted: March 20, 2021

Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.

Cite this article:

Czernecki, B., Marosz, M., Jędruszkiewicz, J. (2021). Assessment of Machine Learning Algorithms in Short-term Forecasting of PM10 and PM2.5 Concentrations in Selected Polish Agglomerations. Aerosol Air Qual. Res.


  • PM standards are frequently exceeded in Polish cities in the winter season.
  • Air quality forecasting systems can be maintained by using machine learning tools.
  • Key variables for high accuracy: wind speed, boundary layer, and lagged air quality.
  • XGBoost and RF approaches outperform neural nets and linear techniques.
  • RF and XGBoost models have comparable RMSE for PMx with some regional variations.


Air pollution continues to have a significant impact on Europeans living in urban areas. Each year, episodes of elevated PMx concentrations are responsible for a large number of premature deaths (mostly due to heart disease and stroke). According to annual EEA reports, Poland is one of the most polluted countries in Europe. High winter PMx concentrations mostly result from high emissions and unfavourable weather conditions combined with environmental features. Accurate PMx concentration forecasts are therefore crucial, both for alerting the public in time and for triggering the necessary municipal mitigation schemes.

This research assesses the feasibility of short-term forecasting of PMx concentrations with machine learning tools and identifies the primary meteorological covariates. The data comprise 10 years of hourly winter PM10 and PM2.5 concentrations in four large Polish agglomerations: Poznań, Kraków, Łódź, and Gdańsk. The research covers a total of 11 urban air quality monitoring stations, including background, traffic, and industrial types. The selected cities represent areas of high population density and diverse environments, stretching from the Baltic Sea coast (Gdańsk, in the Tricity), through lowlands (Łódź, Poznań), to highlands (Kraków).

We applied four ML models: AIC-based stepwise regression, two tree-based algorithms (Random Forest and XGBoost), and a neural network. A cross-validation scheme allowed a clear assessment of the optimal algorithm. The study confirms the high applicability of ML tools to short-term air quality prediction under the perfect prognosis (perfect prog) approach. The algorithms rank clearly: the linear method performs worst, followed in order of increasing skill by the neural network and Random Forest, with XGBoost providing the best results. This ranking holds both for the regression approach and for binary forecasts of threshold exceedance.
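
The two evaluation views mentioned above (continuous regression error and binary threshold exceedance) can be illustrated with a minimal sketch. It is not the authors' code. The function names, the sample concentrations, and the 50 µg/m³ threshold (the EU daily limit value for PM10) are assumptions chosen for illustration, and the abstract does not state which threshold or verification scores the study used.

```python
from math import sqrt


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and forecasts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def exceedance_scores(observed, predicted, threshold):
    """Contingency-table scores for binary forecasts of threshold exceedance.

    An "event" is a concentration strictly above the threshold.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for o, p in zip(observed, predicted):
        observed_event = o > threshold
        forecast_event = p > threshold
        if observed_event and forecast_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
        else:
            correct_negatives += 1

    def ratio(numerator, denominator):
        # Return NaN when a score is undefined (no events in the denominator).
        return numerator / denominator if denominator else float("nan")

    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "correct_negatives": correct_negatives,
        # Probability of detection: share of observed events that were forecast.
        "pod": ratio(hits, hits + misses),
        # False alarm ratio: share of forecast events that did not occur.
        "far": ratio(false_alarms, hits + false_alarms),
        # Critical success index: hits relative to all forecast or observed events.
        "csi": ratio(hits, hits + misses + false_alarms),
    }


# Hypothetical PM10 concentrations in µg/m³ (illustrative values, not study data).
observed = [32.0, 48.0, 55.0, 71.0, 64.0, 41.0]
predicted = [30.0, 52.0, 58.0, 60.0, 45.0, 39.0]
THRESHOLD = 50.0  # assumed threshold: EU daily limit value for PM10

if __name__ == "__main__":
    print(f"RMSE: {rmse(observed, predicted):.2f} µg/m³")
    print(exceedance_scores(observed, predicted, THRESHOLD))
```

Reporting both views is useful because a model with a low RMSE can still miss the relatively rare high-concentration episodes that matter most for public alerts.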

Keywords: PM10, PM2.5, Air quality, Machine learning, Short-term forecasting
