Pawan Gupta1,2, Shanshan Zhan3, Vikalp Mishra3,7, Aekkapol Aekakkararungroj4, Amanda Markert3,7, Sarawut Paibong5, Farrukh Chishtie4,6

1 Universities Space Research Association (USRA), Huntsville, USA
2 Marshall Space Flight Center, Huntsville, AL, USA
3 Earth System Science Center, The University of Alabama in Huntsville, Huntsville, AL, USA
4 Asian Disaster Preparedness Center, Bangkok, Thailand
5 Thai Pollution Control Department, Bangkok, Thailand
6 Spatial Informatics Group, Pleasanton, CA, USA
7 SERVIR Science Coordination Office, NASA Marshall Space Flight Center, Huntsville, AL, USA

Received: May 7, 2021
Revised: August 15, 2021
Accepted: September 10, 2021

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


Cite this article:

Gupta, P., Zhan, S., Mishra, V., Aekakkararungroj, A., Markert, A., Paibong, S., Chishtie, F. (2021). Machine Learning Algorithm for Estimating Surface PM2.5 in Thailand. Aerosol Air Qual. Res.


  • Local ground measurements can help calibrate outputs from global models.
  • The MERRA2 PM2.5 data are evaluated and bias-corrected using a machine learning algorithm.
  • The bias-corrected PM2.5 represents the diurnal and seasonal variability in Thailand.


We used aerosol and meteorology reanalysis data from NASA's Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA2) as input to a machine learning algorithm (MLA) to estimate surface PM2.5 concentrations in Thailand. One year of hourly data from 51 ground monitoring stations in Thailand was spatiotemporally collocated with MERRA2 fields. The integrated data were then used to train and validate a supervised MLA, a random forest, to estimate hourly and daily PM2.5 concentrations. The MLA is cross-validated using a 10-fold random sampling approach. The trained MLA can estimate PM2.5 with close to zero mean bias across the country. A correlation coefficient of 0.95, with slope and intercept values of 0.95 and 0.88, respectively, is achieved between observed and estimated PM2.5. The MLA also shows underestimation at the hourly scale under very clean conditions (PM2.5 < 10 µg m⁻³) and overestimation during high loading (PM2.5 > 80 µg m⁻³). The hourly estimates also demonstrate high skill in following the diurnal cycle during different seasons of the year. The daily mean (24-hour) PM2.5 values follow day-to-day variability very well, showing high values during the winter months (November to February) and lower values during other seasons. The trained MLA has the potential to reprocess the MERRA2 time series for the region, and the bias-corrected data can be used in other applications such as long-term trend analysis and health exposure studies. The MLA can also be applied to GEOS forecast fields to generate bias-corrected air quality forecasts for the region.
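The validation procedure described above (10-fold random sampling, then mean bias, correlation, slope, and intercept between observed and estimated PM2.5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, all function and variable names are hypothetical, and a simple least-squares linear correction stands in for the random forest so that the example needs only the Python standard library.

```python
import math
import random


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k=10, seed=0):
    """Randomly partition sample indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_estimates(model_pm25, observed_pm25, k=10, seed=0):
    """Estimate each sample from a model trained on the other k-1 folds.

    Stand-in model (assumption): a linear bias correction of the model
    PM2.5. The paper uses a random forest in this role instead.
    """
    n = len(observed_pm25)
    estimates = [None] * n
    for test_idx in kfold_indices(n, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = linear_fit(
            [model_pm25[i] for i in train],
            [observed_pm25[i] for i in train],
        )
        for i in test_idx:
            estimates[i] = slope * model_pm25[i] + intercept
    return estimates


def evaluation_stats(observed, estimated):
    """Mean bias, correlation, and regression line of estimated vs observed."""
    n = len(observed)
    mean_bias = sum(e - o for e, o in zip(estimated, observed)) / n
    slope, intercept = linear_fit(observed, estimated)
    return {
        "mean_bias": mean_bias,
        "r": pearson_r(observed, estimated),
        "slope": slope,
        "intercept": intercept,
    }


# Synthetic data (assumption): a biased, noisy "model" PM2.5 in µg m⁻³.
rng = random.Random(42)
observed = [rng.uniform(5, 100) for _ in range(500)]
model = [0.6 * o + 4 + rng.gauss(0, 3) for o in observed]

estimates = cross_validated_estimates(model, observed, k=10)
stats = evaluation_stats(observed, estimates)
print(stats)
```

Because every sample is estimated only by a model that never saw it during training, the resulting statistics reflect out-of-sample skill. Swapping the stand-in for a random forest regressor would change only the fit-and-predict step inside `cross_validated_estimates`.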

Keywords: Thailand, MERRA2, PM2.5, Air quality, Machine learning
