Rick Pernak, Matthew Alvarado, Chantelle Lonsdale, Marikate Mountain, Jennifer Hegarty, Thomas Nehrkorn

Atmospheric and Environmental Research, Lexington, MA 02421, USA

Received: December 12, 2018
Revised: June 6, 2019
Accepted: October 6, 2019
DOI: https://doi.org/10.4209/aaqr.2018.12.0464

Cite this article:
Pernak, R., Alvarado, M., Lonsdale, C., Mountain, M., Hegarty, J. and Nehrkorn, T. (2019). Forecasting Surface O3 in Texas Urban Areas Using Random Forest and Generalized Additive Models. Aerosol Air Qual. Res. 19: 2815-2826. https://doi.org/10.4209/aaqr.2018.12.0464


  • GAM quantitative and probabilistic models were built for six Texas urban areas.
  • A random forest machine learning algorithm was applied for classification.
  • The probabilistic models show no skill.
  • Quantitative and classification models exhibit some degree of success.
  • Predictive success increased in urban areas with less extreme ozone events.


We developed and evaluated three types of statistical forecasting models (quantitative, probabilistic, and classification) for predicting the maximum daily 8-hour average (MDA8) ozone concentration from meteorological and ozone monitoring data for six Texas urban areas from 2009 to 2015. The quantitative and probabilistic forecasting models were generalized additive models (GAMs), whereas the classification forecasts used the random forest machine learning method. For the quantitative forecasting models, five of the eight predictors (day of the week, day of the year, water vapor density, wind speed, and the previous day's ozone measurement) were significant at the α = 0.001 level in all urban areas, whereas the significance of the other three varied by location. The quantitative forecasts for the 2016 ozone season agreed well with the corresponding measurements (R² ≈ 0.70) but tended to underpredict ozone on the days with the highest concentrations. By contrast, the probabilistic forecasting models showed little accuracy in estimating the probability that concentrations would exceed policy-relevant thresholds during this season. The success rate of the random forest classification models typically exceeded 75% and would likely increase if the training data sets contained more extreme events.
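Two quantities in the abstract, the MDA8 forecast target and the classification success rate, can be illustrated with a short Python sketch. This is not the authors' code. It assumes hourly ozone values in ppb and computes a simplified MDA8 as the maximum over all complete 8-hour running means; regulatory MDA8 definitions add window-start restrictions and data-completeness rules that are omitted here. It also assumes that "success rate" means the fraction of days whose predicted category matches the observed one. The function names are hypothetical.

```python
"""Simplified sketch of MDA8 ozone and a classification success rate.

Assumptions (not from the article): hourly ozone in ppb; to form all 24
windows starting at hours 0-23, pass the day's 24 values plus the first
7 hours of the next day; missing-data rules are ignored.
"""
from typing import Sequence


def running_8h_means(hourly: Sequence[float]) -> list[float]:
    """Return the mean of every complete 8-hour window in `hourly`."""
    if len(hourly) < 8:
        raise ValueError("need at least 8 hourly values")
    return [sum(hourly[i:i + 8]) / 8 for i in range(len(hourly) - 7)]


def mda8(hourly: Sequence[float]) -> float:
    """Maximum of the running 8-hour means (simplified MDA8)."""
    return max(running_8h_means(hourly))


def success_rate(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of days whose predicted category equals the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


if __name__ == "__main__":
    # 31 hypothetical hourly values: an 8-hour plateau at 80 ppb.
    day = [30.0] * 10 + [80.0] * 8 + [30.0] * 13
    print(mda8(day))  # 80.0
    print(success_rate(["low", "high", "low", "low"],
                       ["low", "high", "high", "low"]))  # 0.75
```

In the usage example, 31 hourly values yield 24 running windows, and the window covering the 80 ppb plateau gives the MDA8 of 80.0; three of four category predictions match, giving a success rate of 0.75.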

Keywords: Ozone MDA8; Ozone prediction; Generalized additive models; Random forest.
