Application of the XGBoost Machine Learning Method in PM2.5 Prediction: A Case Study of Shanghai

Category: Air Pollution Modeling

Volume: 20 | Issue: 1 | Pages: 128-138
DOI: 10.4209/aaqr.2019.08.0408

To cite this article:
Ma, J., Yu, Z., Qu, Y., Xu, J. and Cao, Y. (2020). Application of the XGBoost Machine Learning Method in PM2.5 Prediction: A Case Study of Shanghai. Aerosol Air Qual. Res. 20: 128-138. doi: 10.4209/aaqr.2019.08.0408.

Jinghui Ma (1,2,3), Zhongqi Yu (2,3), Yuanhao Qu (2,3), Jianming Xu (2,3,4), Yu Cao (2,3)

  • 1 Fudan University, Shanghai 200433, China
  • 2 Shanghai Typhoon Institute, Shanghai Meteorological Service, Shanghai 200030, China
  • 3 Shanghai Key Laboratory of Meteorology and Health, Shanghai Meteorological Service, Shanghai 200030, China
  • 4 Anhui Province Key Laboratory of Atmospheric Science and Satellite Remote Sensing, Hefei 230000, China


  • XGBoost can improve the accuracy of WRF-Chem prediction of PM2.5.
  • The XGBoost model can accurately predict heavy winter pollution.
  • It provides a new method to enhance the capacity of air quality forecasting in China.


Air quality forecasting is crucial to reducing air pollution in China, which has detrimental effects on human health. Atmospheric chemical-transport models can provide air pollutant forecasts with high temporal and spatial resolution and are widely used for routine air quality predictions (e.g., 1–3 days in advance). However, the performance of these models is limited by uncertainties in the emission inventory and biases in the initial and boundary conditions, as well as deficiencies in the current chemical and physical schemes. As a result, several new methods, such as machine learning, are being explored in the field of air quality forecasting. This study combined hourly PM2.5 mass concentration forecasts from an operational air quality numerical prediction system (WRF-Chem) at the Shanghai Meteorological Service (SMS) with comprehensive near-surface measurements of air pollutants and meteorological conditions to develop a machine learning model that estimates the daily PM2.5 mass concentration in Shanghai, China. With correlation coefficients that are higher by 50–100% and a standard deviation that is lower by 14–24 µg m⁻³, the machine learning model provides significantly better daily forecasting of PM2.5 than the WRF-Chem model. Thus, this research offers a new technique for enhancing air quality forecasting in China.
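The two comparison metrics named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the abstract does not say exactly how the standard deviation was computed (it is assumed here to be the sample standard deviation of forecast-minus-observation errors), and the numbers in the usage lines are invented for demonstration, not taken from the study.

```python
import math


def pearson_r(forecast, observed):
    """Pearson correlation coefficient between forecasts and observations."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


def error_std(forecast, observed):
    """Sample standard deviation of forecast errors (assumed definition)."""
    errors = [f - o for f, o in zip(forecast, observed)]
    mean_err = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (len(errors) - 1))


# Invented daily PM2.5 values in µg m⁻³, for illustration only.
observed = [35.0, 60.0, 82.0, 48.0, 110.0]
raw_forecast = [50.0, 40.0, 95.0, 70.0, 80.0]        # stand-in for a raw model forecast
corrected_forecast = [38.0, 55.0, 85.0, 52.0, 101.0]  # stand-in for a corrected forecast

for name, fc in [("raw", raw_forecast), ("corrected", corrected_forecast)]:
    print(name, pearson_r(fc, observed), error_std(fc, observed))
```

A higher correlation coefficient and a lower error standard deviation for one forecast series than for another is the kind of comparison the abstract reports between the machine learning model and WRF-Chem.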


XGBoost algorithm; PM2.5; WRF-Chem; Machine learning
