Forecasting Surface O3 in Texas Urban Areas Using Random Forest and Generalized Additive Models

Category: Air Pollution Modeling

Volume: 19 | Issue: 12 | Pages: 2815-2826
DOI: 10.4209/aaqr.2018.12.0464

To cite this article:
Pernak, R., Alvarado, M., Lonsdale, C., Mountain, M., Hegarty, J. and Nehrkorn, T. (2019). Forecasting Surface O3 in Texas Urban Areas Using Random Forest and Generalized Additive Models. Aerosol Air Qual. Res. 19: 2815-2826. doi: 10.4209/aaqr.2018.12.0464.

Rick Pernak, Matthew Alvarado, Chantelle Lonsdale, Marikate Mountain, Jennifer Hegarty, Thomas Nehrkorn

  • Atmospheric and Environmental Research, Lexington, MA 02421, USA


  • GAM quantitative and probabilistic models were built for six Texas urban areas.
  • A random forest machine learning algorithm was applied for classification.
  • The probabilistic models show no skill.
  • Quantitative and classification models exhibit some degree of success.
  • Predictive success increased in urban areas with less extreme ozone events.


We developed and evaluated three types of statistical forecasting models (quantitative, probabilistic, and classification) for predicting the maximum daily 8-hour average concentration of ozone based on meteorological and ozone monitoring data for six Texas urban areas from 2009 to 2015. The quantitative and probabilistic forecasting models were generalized additive models (GAMs), whereas the classification forecast used the random forest machine learning method. We found that for the quantitative forecasting models, five of the eight predictors (the day of week, day of the year, water vapor density, wind speed, and previous day’s ozone measurement) were significant at the α = 0.001 level for all urban areas, whereas the other three varied in significance according to the location. The quantitative forecasting for the 2016 ozone season agreed well with the associated measurements (R² of ~0.70), but it tended to under-predict the ozone level for the days with the highest concentrations. By contrast, the probabilistic forecasting models showed little accuracy in determining the probability of concentrations exceeding policy-relevant thresholds during this season. The success rate for the random forest classification models typically exceeded 75% and would likely increase if the training data sets contained more extreme events.
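The forecast target named in the abstract, the maximum daily 8-hour average (MDA8), can be illustrated with a short sketch. This is not the authors' code. It assumes a single day of 24 complete hourly values in ppb and considers only windows that lie fully within that day, which is simpler than regulatory procedures. The function names `mda8` and `exceeds`, the synthetic data, and the 70 ppb threshold in the usage lines are all illustrative assumptions.

```python
from typing import Sequence


def mda8(hourly_ppb: Sequence[float], window: int = 8) -> float:
    """Return the largest running `window`-hour mean within one day of hourly data."""
    if len(hourly_ppb) < window:
        raise ValueError("need at least `window` hourly values")
    # Slide the window across the day and average each block of hours.
    means = [
        sum(hourly_ppb[start:start + window]) / window
        for start in range(len(hourly_ppb) - window + 1)
    ]
    return max(means)


def exceeds(mda8_ppb: float, threshold_ppb: float) -> bool:
    """Binary label of the kind a classification forecast would predict."""
    return mda8_ppb > threshold_ppb


# Synthetic day (assumed values): 30 ppb background with an 8-hour 80 ppb peak.
day = [30.0] * 10 + [80.0] * 8 + [30.0] * 6
peak = mda8(day)
is_exceedance = exceeds(peak, threshold_ppb=70.0)  # threshold chosen for illustration
```

In this framing, a quantitative model would predict the value returned by `mda8`, while a classification model would predict a label like the one returned by `exceeds` for a chosen threshold.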


Keywords: Ozone; MDA8; Ozone prediction; Generalized additive models; Random forest
