Xinpeng Wang, Wenbin Sun, Zhen Wang, Yahui Wang, Hongkang Ren

College of Geoscience and Surveying Engineering, China University of Mining and Technology-Beijing, Beijing 100083, China


Received: December 5, 2018
Revised: March 9, 2019
Accepted: April 15, 2019

DOI: https://doi.org/10.4209/aaqr.2018.12.0449


Cite this article:

Wang, X., Sun, W., Wang, Z., Wang, Y. and Ren, H. (2019). Meteorological Parameters and Gaseous Pollutant Concentrations as Predictors of Ground-level PM2.5 Concentrations in the Beijing-Tianjin-Hebei Region, China. Aerosol Air Qual. Res. 19: 1844-1855. https://doi.org/10.4209/aaqr.2018.12.0449


HIGHLIGHTS

  • The feasibility of gaseous pollutants as predictors for PM2.5 estimation was confirmed.
  • The underestimation of high PM2.5 concentrations can be reduced.
  • Highly accurate maps of the PM2.5 distribution can be produced.
 

ABSTRACT


Ground-level PM2.5 concentrations, especially those during episodes of heavy pollution, are severely underestimated by mixed-effects models that ignore the effects of primary pollutant emissions and secondary pollutant conversion. In this work, meteorological parameters and NO2, SO2, CO, and O3 concentrations are introduced as predictors into a mixed-effects model to improve the estimation of PM2.5 concentrations from Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol optical depth (AOD). The Beijing-Tianjin-Hebei (JingJinJi) region is used as the study area. The model yields an overall cross-validation (CV) R2 of 0.84 and a root-mean-square prediction error (RMSE) of 33.91 µg m–3. Compared with a model lacking gaseous pollutants as predictors, the CV R2 is higher by 0.11 and the RMSE is lower by 9.16 µg m–3. When PM2.5 concentrations exceed 75 µg m–3, the R2 increases by 0.14 and the RMSE decreases by 13.37 µg m–3. The prediction of high PM2.5 concentrations is thus markedly improved, and the spatial distribution generated by the model for periods of heavy pollution is highly consistent with that inferred from monitoring data. The proposed model can therefore be used to generate highly accurate maps of the PM2.5 distribution for long-term and short-term exposure studies and to correctly classify exposure in heavily polluted areas.


Keywords: PM2.5; Aerosol optical depth; Gaseous pollutant; Heavy pollution; Mixed-effects model.


INTRODUCTION


Epidemiological studies have shown that PM2.5, defined as particulate matter with an aerodynamic diameter of < 2.5 µm, has adverse effects on human health (Song et al., 2014). Long-term exposure studies have reported that the risks of ischemic heart disease (Crouse et al., 2012), lung cancer (Pope III et al., 2002), and cardiovascular mortality (Pope III et al., 2004) increase by 16%, 8%, and 10%, respectively, with every 10 µg m–3 increment in the PM2.5 concentration. Short-term exposure studies have found that every 10 µg m–3 increase in the 2-day mean PM2.5 concentration raises the incidence of respiratory disease (Kloog et al., 2012) and myocardial infarction (Zanobetti et al., 2009) by 0.7% and 2.25%, respectively. Each 10 µg m–3 increment in the daily PM2.5 concentration is also associated with elevated hospital admission rates, increasing the incidence of heart failure (Dominici et al., 2006), acute coronary syndrome (Belleudi et al., 2010), and pediatric asthma (Strickland et al., 2015) by 1.28%, 2.3%, and 1.3%, respectively. Accurate estimation of PM2.5 concentrations is therefore a prerequisite for studies of the health effects of long-term and short-term PM2.5 exposure.

The PM2.5 exposure level of a population has traditionally been estimated from PM2.5 concentrations measured at ground monitoring sites within a certain distance (Laden et al., 2006). However, the sparse and uneven spatial distribution of monitoring sites introduces uncertainty into exposure estimates and can lead to underestimated health risks (Hu et al., 2014). Because of their spatial and temporal continuity, remotely sensed aerosol optical depth (AOD) data have been widely used to estimate PM2.5 concentrations (Donkelaar et al., 2011). Various statistical models have been developed to estimate PM2.5 concentrations from satellite-derived AOD, including linear regression models (Gupta et al., 2006), generalized linear regression models (Liu et al., 2007; You et al., 2015), linear mixed models (Lee et al., 2011; Xie et al., 2015), geographically weighted regression models (Hu et al., 2014; You et al., 2016), generalized additive models (Paciorek et al., 2008; Strawa et al., 2013), and Bayesian statistical models (Chang et al., 2014; Lv et al., 2016). Nevertheless, AOD-based methods severely underestimate high PM2.5 concentrations; for example, concentrations exceeding 40 µg m–3 in the United States and 60 µg m–3 in China are severely underestimated (Liu et al., 2007; Gupta and Christopher, 2009a; Ma et al., 2014; Li et al., 2017).

Underestimating PM2.5 concentrations introduces uncertainty into long-term and short-term exposure studies. The Beijing-Tianjin-Hebei (JingJinJi) region is highly urbanized and densely populated and is characterized by high energy consumption (Wang et al., 2016), so its residents are exposed to high concentrations of pollutants. In recent years, several episodes in which PM2.5 concentrations exceeded 500 µg m–3 and persisted for several days have been reported in the North China Plain, of which the JingJinJi region is a representative part (Andersson et al., 2015). Underestimating high PM2.5 concentrations in such heavily polluted areas leads to underestimated exposure and a severe miscalculation of the long-term and short-term health effects of PM2.5. Improving the accuracy of PM2.5 estimation is therefore crucial for reducing the misclassification of exposure levels and advancing epidemiological research in heavily polluted areas.

Early studies used AOD as the sole predictor of surface PM2.5 concentrations. However, AOD is the aerosol extinction of light (by scattering and absorption) integrated over the entire vertical column, whereas the PM2.5 concentration is the mass concentration of dry particles measured near the surface. AOD and PM2.5 concentrations are therefore not strictly correlated (Chudnovsky et al., 2013; Li et al., 2015), and AOD-based estimates of PM2.5 may be inaccurate (Saunders et al., 2014). Meteorological parameters (MET), such as wind direction, wind speed, temperature, humidity, and boundary layer height, have been used as additional predictors to improve the accuracy of surface PM2.5 estimation (Paciorek et al., 2008; Liu et al., 2009). Surface weather, however, is not the only determinant of air pollution development in the JingJinJi region (Miao et al., 2015). Besides unfavorable weather conditions, excessive primary pollutant emissions and the secondary conversion of gaseous pollutants to particles are the main causes of heavy-pollution episodes in the region (Sun et al., 2014; Wang et al., 2014). Additional predictors are thus needed to resolve the underestimation of high PM2.5 concentrations.

PM2.5 is a complex mixture of particles composed primarily of SO42–, NO3–, NH4+, elemental C, and organic C. Direct emissions of atmospheric pollutants and the conversion of gaseous pollutants are important sources of PM2.5 (Kloog et al., 2012; Meng et al., 2014). Several studies have improved the accuracy of PM2.5 estimation in North America by adding point-source and area-source emission data to their models (Kloog et al., 2012; Strawa et al., 2013; Lee et al., 2016). However, PM2.5 concentrations in North America are lower than those in the JingJinJi region, so the benefit of such emission data for PM2.5 estimation in this region remains to be determined. NO2 and SO2, the gaseous precursors of water-soluble inorganic salts, are co-emitted with primary pollutants and can be oxidized to SO42– and NO3– in the atmosphere; these ions, in turn, are major contributors to fine particulates in the JingJinJi region (Zhang et al., 2007). Song et al. (2015) developed a statistical model for PM2.5 in Xi’an, China, which showed that PM2.5 concentrations are strongly correlated with the concentrations of the gaseous pollutants (GASs) NO2, SO2, CO, and O3, suggesting that these pollutants can serve as auxiliary variables for PM2.5 prediction. Zheng et al. (2016) improved the estimation of annual average PM2.5 concentrations in the JingJinJi, Yangtze River Delta, and Pearl River Delta regions by introducing annual average NO2 values. Their approach, however, is suitable only for studying the long-term effects of PM2.5 exposure.

The mixed-effects model proposed by Lee et al. (2011) greatly improved the accuracy of PM2.5 estimation by accounting for the temporal heterogeneity of the relationship between PM2.5 concentrations and AOD values. In this work, we use AOD, MET, and GASs as predictors in a linear mixed model to resolve the underestimation of high PM2.5 concentrations, taking the JingJinJi region as the study area. Our approach increases the accuracy of PM2.5 estimation and reduces exposure misclassification by accounting for the spatial and temporal distribution of PM2.5 concentrations. The rest of this paper is organized as follows. The study area, input datasets, and model structure are described in “Materials and Methods.” The results of model fitting, cross-validation (CV), and the PM2.5 spatiotemporal distribution are presented in “Results,” and a comparison with other models is given in “Discussion.”


MATERIALS AND METHODS



Study Area

The JingJinJi region lies in the northern part of the North China Plain, east of the Taihang Mountains and south of the Yanshan Mountains. Its terrain is high in the northwest and low in the southeast (Fig. 1(b)). The region covers an area of 218,000 km2, has a population of 110 million, and is a core economic area of northern China. Energy consumption in the region has gradually intensified with national economic development, and the region has experienced several persistent heavy-pollution episodes during which PM2.5 concentrations exceeded 500 µg m–3.


Fig. 1. Study area and the locations of PM2.5 monitoring sites and meteorological stations. (a) Location of JingJinJi in China, (b) Locations of meteorological stations and elevation of JingJinJi, and (c) Locations of PM2.5 monitoring sites.


Input Data


MODIS AOD Data

MODIS data were retrieved from Terra and Aqua, the Earth Observing System satellites of the National Aeronautics and Space Administration (NASA). Terra and Aqua cross the equator at approximately 10:30 and 13:30 local time, respectively. Collection 6 (C6) was the latest version of the MODIS aerosol products as of 2014, and its Deep Blue (DB) algorithm provides better AOD coverage and higher data quality than Collection 5.1 (Lee et al., 2016). Therefore, Aqua C6 DB AOD data from January 1 to December 31, 2014, were used in this work. The data have a spatial resolution of 10 km (MODIS parameter name: Deep_Blue_Aerosol_Optical_Depth_550_Land).


Ground Monitoring Data

The ground monitoring data used in this work comprised PM2.5, NO2, SO2, CO, and O3 concentrations. Hourly data from January 1 to December 31, 2014, were retrieved from the official websites of the China Environmental Monitoring Center (CEMC; http://113.108.142.147:20035/emcpublish/) and the Beijing Environmental Monitoring Center (BJEMC; http://zx.bjmemc.com.cn/). From January 1 to April 30, 2014, CEMC provided data from 80 monitoring sites; from May 1 onward, 23 monitoring sites in Beijing were added, so CEMC and BJEMC together provided 103 monitoring sites in 2014. The distribution of the PM2.5 monitoring sites is shown in Fig. 1(c).

PM2.5, NO2, SO2, O3, and CO data recorded at 13:00 and 14:00 were extracted to match the Aqua overpass time and averaged for use in PM2.5 prediction.
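The extraction and averaging step can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: the tuple layout, the site IDs, and the choice to keep a day when only one of the two hours is available are our assumptions.

```python
from collections import defaultdict
from statistics import mean

# Local-time hours bracketing the Aqua overpass (~13:30), as in the text.
OVERPASS_HOURS = (13, 14)


def overpass_daily_mean(records):
    """Average the 13:00 and 14:00 observations for each (site, day).

    records: iterable of (site_id, day, hour, value) tuples; value is None
    when the hour is missing. Assumption: a day is kept if at least one
    of the two hours is available.
    """
    buckets = defaultdict(list)
    for site, day, hour, value in records:
        if hour in OVERPASS_HOURS and value is not None:
            buckets[(site, day)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}


# Hypothetical hourly PM2.5 records (µg m-3) for two sites.
hourly = [
    ("site_A", "2014-01-05", 12, 150.0),  # outside the window, ignored
    ("site_A", "2014-01-05", 13, 180.0),
    ("site_A", "2014-01-05", 14, 200.0),
    ("site_B", "2014-01-05", 13, 90.0),
    ("site_B", "2014-01-05", 14, None),   # missing hour
]
daily = overpass_daily_mean(hourly)
print(daily)  # {('site_A', '2014-01-05'): 190.0, ('site_B', '2014-01-05'): 90.0}
```

The same function applies unchanged to the NO2, SO2, CO, O3, and meteorological series.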


Meteorological Data

Hourly MET data from January 1 to December 31, 2014, recorded at 171 meteorological stations (Fig. 1(b)), were retrieved from the Public Weather Service Center of the China Meteorological Administration. The data included wind direction, wind speed, pressure, temperature, and humidity. As with the pollutant data, the daily 13:00 and 14:00 records were extracted to match the satellite overpass time and averaged for PM2.5 prediction.


Data Processing and Integration

For data integration, a grid of 10 km cells covering the entire JingJinJi region was first constructed. The 103 PM2.5 monitoring sites were assigned to grid cells by latitude and longitude, and each cell containing at least one site was designated a grid monitoring site. Where several sites fell in the same cell, their data were averaged to give the grid monitoring value, yielding 65 grid monitoring sites in total. To match AOD with PM2.5, the AOD values greater than 0 within a 3 × 3 pixel window centered on each PM2.5 monitoring site were averaged. Meteorological data were assigned to each grid cell by the nearest-neighbor method to represent the meteorological conditions of the grid monitoring site. NO2, SO2, CO, and O3 concentrations were interpolated onto the 10 km grid by Kriging. A 30 m DEM was resampled to 10 km and matched to the PM2.5 data of the grid monitoring sites. Finally, days with fewer than three matched records were discarded.
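Three of these steps (assigning sites to cells and averaging co-located sites, the 3 × 3 AOD window mean over values > 0, and dropping sparse days) can be sketched with NumPy as follows. This is a simplified illustration under assumptions not stated in the text: a regular 0.1° latitude/longitude grid with a hypothetical north-west origin stands in for the 10 km grid, and the AOD raster is assumed to be already on that grid with NaN for missing retrievals. The Kriging and nearest-neighbor steps are not shown.

```python
import math

import numpy as np


def cell_index(lat, lon, lat0, lon0, cell_deg=0.1):
    """(row, col) of the cell containing (lat, lon) on a regular lat/lon grid.

    Assumption: the grid's north-west corner is (lat0, lon0) and cells are
    `cell_deg` degrees wide (0.1 deg is roughly 10 km).
    """
    row = math.floor((lat0 - lat) / cell_deg)
    col = math.floor((lon - lon0) / cell_deg)
    return row, col


def grid_site_means(sites, lat0, lon0, cell_deg=0.1):
    """Average the values of all monitoring sites that fall in the same cell.

    sites: list of (lat, lon, value). Returns {(row, col): mean value}.
    """
    groups = {}
    for lat, lon, value in sites:
        key = cell_index(lat, lon, lat0, lon0, cell_deg)
        groups.setdefault(key, []).append(value)
    return {key: float(np.mean(vals)) for key, vals in groups.items()}


def aod_window_mean(aod, row, col, half=1):
    """Mean of the AOD values > 0 in a (2*half+1)^2 window centred on (row, col).

    The window is clipped at the array edge; returns NaN if no pixel is valid.
    """
    r0, r1 = max(row - half, 0), min(row + half + 1, aod.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, aod.shape[1])
    window = aod[r0:r1, c0:c1]
    valid = window[np.isfinite(window) & (window > 0)]
    return float(valid.mean()) if valid.size else math.nan


def drop_sparse_days(matched, min_records=3):
    """Discard days with fewer than `min_records` matched grid records."""
    return {day: recs for day, recs in matched.items() if len(recs) >= min_records}


# Two hypothetical sites share one cell and are averaged; the third is separate.
sites = [(39.93, 116.41, 100.0), (39.96, 116.44, 120.0), (38.04, 114.51, 150.0)]
cells = grid_site_means(sites, lat0=42.0, lon0=113.0)
print(cells)  # {(20, 34): 110.0, (39, 15): 150.0}

# Hypothetical AOD raster: NaN = no retrieval; negative and zero values are skipped.
aod = np.full((5, 5), np.nan)
aod[1:4, 1:4] = [[0.5, 0.7, -0.05],
                 [0.9, np.nan, 0.0],
                 [1.1, 0.8, 0.6]]
print(round(aod_window_mean(aod, 2, 2), 3))  # 0.767
```

Clipping the window at the raster edge is a design choice of this sketch; an implementation could instead require a full 3 × 3 window.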


Method

Given the temporal heterogeneity of PM2.5 concentrations, AOD values, and GAS concentrations, a linear mixed model was developed to account for the daily variability in the relationships of PM2.5 with AOD and GASs. Furthermore, given the limited geographical extent of the JingJinJi region, the PM2.5–(AOD, GAS) relationships were assumed to vary little in space on a given day, and spatial nonstationarity at the regional scale was ignored (Lee et al., 2011; Zheng et al., 2016). The linear mixed model that uses AOD, GASs, and MET as predictors, designated the “AGM model,” is described below:

 

where PMi,j is the PM2.5 concentration at grid monitoring site i on day j; AODi,j, NO2i,j, SO2i,j, COi,j, O3i,j, PRSi,j, WDi,j, WSi,j, TMPi,j, RHi,j, and ALTi,j are the AOD, NO2, SO2, CO, O3, pressure, wind direction, wind speed, temperature, humidity, and elevation at grid i on day j, respectively; β0 and β0,j are the fixed and random intercepts, respectively; βk and βk,j (k = 1, 2, …, 5) are the fixed and day-specific random slopes for the predictors, respectively; εi,j is the error term at grid i on day j; and Ψ is the variance-covariance matrix of the day-specific random effects. Fixed effects represent the average effects of the predictors on PM2.5 over the entire study period, whereas random effects account for the daily variability in the relationships between the independent and dependent variables (Hu et al., 2014; Zheng et al., 2016). To assess the improvement gained by introducing GASs as predictors, the model was also fitted with AOD and MET only (the “AM model”).
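Written out with the symbols defined above, the AGM model can be sketched as follows. This form is a reconstruction under stated assumptions: the intercept and the slopes of AOD, NO2, SO2, CO, and O3 (k = 1, …, 5) carry day-specific random effects, while the pressure, wind, temperature, humidity, and elevation terms enter as fixed effects only, with coefficients labeled γ1, …, γ6 here purely for illustration because the text does not name them. Dropping the four GAS terms gives the corresponding AM model.

```latex
\begin{aligned}
\mathrm{PM}_{i,j} ={}& (\beta_0 + \beta_{0,j}) + (\beta_1 + \beta_{1,j})\,\mathrm{AOD}_{i,j}
  + (\beta_2 + \beta_{2,j})\,\mathrm{NO_2}_{i,j} + (\beta_3 + \beta_{3,j})\,\mathrm{SO_2}_{i,j} \\
&+ (\beta_4 + \beta_{4,j})\,\mathrm{CO}_{i,j} + (\beta_5 + \beta_{5,j})\,\mathrm{O_3}_{i,j}
  + \gamma_1\,\mathrm{PRS}_{i,j} + \gamma_2\,\mathrm{WD}_{i,j} + \gamma_3\,\mathrm{WS}_{i,j} \\
&+ \gamma_4\,\mathrm{TMP}_{i,j} + \gamma_5\,\mathrm{RH}_{i,j} + \gamma_6\,\mathrm{ALT}_{i,j} + \varepsilon_{i,j}, \\
&\left(\beta_{0,j}, \beta_{1,j}, \ldots, \beta_{5,j}\right)^{\top} \sim N\!\left(\mathbf{0}, \Psi\right).
\end{aligned}
```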


Model Validation

A 10-fold CV based on grid monitoring sites was performed to evaluate how well the model predicts the grid PM2.5 concentration. In each fold, data from 90% of the grid monitoring sites were used for training and data from the remaining 10% for validation. This process was repeated 10 times so that every grid monitoring site was tested once. To avoid information leakage, the GASs were spatially interpolated from the training sites only; that is, the GAS values at the validation sites were interpolated from training data. The RMSE, coefficient of determination (R2), slope, and intercept were used to evaluate the CV performance of the model.
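A site-grouped 10-fold CV of this kind can be sketched with NumPy as follows. The `fit_predict` callback is a hypothetical placeholder for fitting the mixed model (and interpolating GASs from the training sites only) and predicting the held-out rows. We also assume, since the text does not say, that R2 is the squared Pearson correlation between observed and predicted values and that the slope and intercept come from regressing predictions on observations.

```python
import numpy as np


def site_folds(site_ids, k=10, seed=0):
    """Shuffle the unique grid-site IDs and split them into k folds."""
    rng = np.random.default_rng(seed)
    sites = np.unique(site_ids)
    rng.shuffle(sites)
    return np.array_split(sites, k)


def cross_validate(site_ids, fit_predict, k=10, seed=0):
    """Site-based k-fold CV: every site is held out exactly once.

    fit_predict(train_mask, test_mask) must return predictions for the
    test rows (hypothetical callback; GAS interpolation for the test rows
    should use training sites only).
    """
    site_ids = np.asarray(site_ids)
    pred = np.full(site_ids.shape, np.nan)
    for fold in site_folds(site_ids, k, seed):
        test = np.isin(site_ids, fold)
        pred[test] = fit_predict(~test, test)
    return pred


def cv_metrics(obs, pred):
    """R2 (squared Pearson r), RMSE, and slope/intercept of pred vs. obs."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    slope, intercept = np.polyfit(obs, pred, 1)
    r = np.corrcoef(obs, pred)[0, 1]
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
    return {"R2": float(r ** 2), "RMSE": rmse,
            "slope": float(slope), "intercept": float(intercept)}


# Toy usage: 20 hypothetical sites with 5 records each; the "model" simply
# predicts the training mean, so only the mechanics are illustrated.
ids = np.repeat(np.arange(20), 5)
y = np.arange(100, dtype=float)
pred = cross_validate(ids, lambda tr, te: np.full(te.sum(), y[tr].mean()))
print(cv_metrics(y, pred))
```

Grouping folds by site rather than by record keeps all records of a held-out site out of training, which is what makes the CV a test of prediction at unmonitored locations.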


RESULTS



Descriptive Statistics

Table 1 shows the descriptive statistics of the independent and dependent variables used in model fitting and validation. In 2014, the maximum PM2.5 concentration in the JingJinJi region was 858 µg m–3, with an annual mean of 77.3 µg m–3 and a standard deviation of 83.54 µg m–3. The mean and standard deviation of AOD were 0.82 and 0.87, respectively. The average PM2.5 concentration at each monitoring site and the annual mean AOD are shown in Fig. S2. Average PM2.5 concentrations increased from northwest to southeast under the influence of topography and land-use type; the lowest (29 µg m–3) and highest (139 µg m–3) values were recorded at Zhangjiakou in the northwest and Xingtai in the south, respectively. The annual spatial distribution of AOD was consistent with that of PM2.5: PM2.5 concentrations were high where AOD values were high, and vice versa. In individual seasons, however, the spatial patterns of PM2.5 and AOD did not always agree (Fig. S2), indicating that the relationship between PM2.5 and AOD changed over time.


Table 1. Descriptive statistics of dependent and independent variables (N = 12,955).

The results of the Pearson correlation analysis between PM2.5 and the independent variables are shown in Table 2. NO2 and CO had the strongest correlations with PM2.5 concentrations (r = 0.748 and 0.697, respectively), followed by AOD and SO2 (r = 0.6 and 0.565, respectively). These results support the feasibility of introducing GASs as predictors in PM2.5 prediction models. Multicollinearity among the GASs was not addressed, because NO2, SO2, CO, and O3 represent different pollutant constituents (Zhang et al., 2007) and the focus of this study is PM2.5 prediction rather than the interpretation of individual coefficients (Reid et al., 2015).


Table 2. Variable correlation statistics (N = 12,955).


Model Fitting and Validation

Fig. 2 shows scatter plots of model fitting and CV for the AM and AGM models. The CV R2 values of the AM and AGM models were 0.73 and 0.84, respectively, and the corresponding CV RMSE values were 43.07 and 33.91 µg m–3. For both models, the CV R2 was lower and the CV RMSE higher than the corresponding model-fitting values, indicating slight overfitting. When GASs were introduced, the R2 of model fitting and CV increased by 0.12 and 0.11, respectively, and the RMSE of model fitting and CV decreased by 12.54 and 9.16 µg m–3, respectively, indicating that model performance was greatly improved. The scatter plots also show that when PM2.5 concentrations exceeded 300 µg m–3, the AM model severely underestimated them, whereas the AGM model estimated them considerably better. This improvement at high values is reflected in the increase in slope from 0.74 to 0.86 and the decrease in intercept from 20.11 to 12.15.


Fig. 2. Results of model fitting and CV. The dashed line is the 1:1 reference line. (a) and (c) are the fitting results of the AM and AGM models, respectively; (b) and (d) are the CV results of the AM and AGM models, respectively.

Table 3 compares the performance of the AM and AGM models at different PM2.5 concentrations. For observed PM2.5 concentrations exceeding 0, 35, 75, and 100 µg m–3, the R2 of the AGM model was higher than that of the AM model by 0.11, 0.12, 0.14, and 0.15, respectively, and its RMSE was lower by 9.16, 11.17, 13.37, and 15.8 µg m–3, respectively. The higher the concentration threshold, the larger the gain in R2 and the reduction in RMSE, indicating that the AGM model improves most at high PM2.5 concentrations.
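The threshold-stratified comparison in Table 3 can be sketched as below, assuming R2 is the squared Pearson correlation within each subset (the text does not state its definition). The arrays in the usage lines are synthetic, not the study's data.

```python
import numpy as np


def metrics_above(obs, pred, threshold):
    """R2 (squared Pearson r) and RMSE for records with observed PM2.5 > threshold."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    keep = obs > threshold
    o, p = obs[keep], pred[keep]
    r = np.corrcoef(o, p)[0, 1]
    return float(r ** 2), float(np.sqrt(np.mean((p - o) ** 2)))


def compare_by_threshold(obs, pred_am, pred_agm, thresholds=(0, 35, 75, 100)):
    """Return (threshold, R2 gain, RMSE reduction) of AGM relative to AM."""
    rows = []
    for t in thresholds:
        r2_am, rmse_am = metrics_above(obs, pred_am, t)
        r2_agm, rmse_agm = metrics_above(obs, pred_agm, t)
        rows.append((t, r2_agm - r2_am, rmse_am - rmse_agm))
    return rows


# Synthetic example: the "AM" predictions flatten high values more than "AGM".
obs = np.array([20, 40, 60, 80, 100, 150, 200, 300], dtype=float)
pred_am = 0.7 * obs + 20
pred_agm = 0.9 * obs + 5
for t, d_r2, d_rmse in compare_by_threshold(obs, pred_am, pred_agm):
    print(t, round(d_r2, 3), round(d_rmse, 1))
```

Because the synthetic predictions are exact linear functions of the observations, the R2 gain here is zero and only the RMSE reduction varies, growing with the threshold much as in Table 3.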


Table 3. Comparison of the performance of the two models at different PM2.5 concentrations.

Seasonal performance statistics are shown in Table 4. The AGM model outperformed the AM model in all seasons. The improvement was greatest in the heavily polluted winter, with RMSE reduced by 15.28 µg m–3 and R2 increased by 0.14, and smallest in summer, with RMSE reduced by 4.3 µg m–3 and R2 increased by 0.09.
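A seasonal breakdown like Table 4 can be sketched as follows. The month-to-season mapping (spring March to May, summer June to August, autumn September to November, winter December to February) is our assumption, since the text does not define its seasons; with a single year of data, "winter" then combines January, February, and December 2014.

```python
import numpy as np

# Assumed meteorological seasons (not specified in the text).
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def rmse(obs, pred):
    """Root-mean-square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))


def seasonal_rmse_gain(months, obs, pred_am, pred_agm):
    """RMSE(AM) - RMSE(AGM) per season; positive values favour AGM."""
    seasons = np.array([SEASON_OF_MONTH[int(m)] for m in months])
    obs, pred_am, pred_agm = (np.asarray(a, dtype=float) for a in (obs, pred_am, pred_agm))
    gains = {}
    for season in ("spring", "summer", "autumn", "winter"):
        mask = seasons == season
        if mask.any():
            gains[season] = rmse(obs[mask], pred_am[mask]) - rmse(obs[mask], pred_agm[mask])
    return gains


# Hypothetical records: one January and one July observation.
gains = seasonal_rmse_gain([1, 7], [200.0, 50.0], [150.0, 45.0], [180.0, 47.0])
print(gains)  # {'summer': 2.0, 'winter': 30.0}
```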


Table 4. Performance statistics of the two models in different seasons.


Signs and Magnitudes of the Model Coefficients

Table S2 and Fig. S1 show the fixed effects and the daily variations in the slopes and intercepts of the AGM model. The fixed coefficients of AOD and the GASs are positive, but Fig. S1 shows that their slopes vary from day to day. The slope of AOD varies the most, from –41.35 to 118.41, followed by that of CO, which varies from –15.23 to 77.26. Nevertheless, the slopes are positive on most days, indicating that PM2.5 is positively correlated with these variables most of the time. The daily variation in slopes and intercepts shows that the relationship between PM2.5 and the predictors changes over time, underscoring the importance of accounting for temporal heterogeneity in PM2.5 prediction.
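Under the model structure described in "Method," the slope of predictor k on day j is the sum of its fixed and random components, βk + βk,j. The sketch below summarizes such daily slopes for one predictor. The numbers are illustrative only, not the estimates in Table S2; in practice the random components would come from the fitted model (for example, its predicted random effects).

```python
import numpy as np


def day_specific_slopes(beta_fixed, beta_random):
    """Total daily slope beta_k + beta_{k,j} for one predictor across days j."""
    return beta_fixed + np.asarray(beta_random, dtype=float)


def summarize_slopes(slopes):
    """Range of the daily slopes and the share of days with a positive slope."""
    return {"min": float(slopes.min()),
            "max": float(slopes.max()),
            "share_positive": float(np.mean(slopes > 0))}


# Illustrative values only: fixed slope 30 and four days of random deviations.
slopes = day_specific_slopes(30.0, [-40.0, -5.0, 10.0, 60.0])
print(summarize_slopes(slopes))  # {'min': -10.0, 'max': 90.0, 'share_positive': 0.75}
```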


Variation in the Spatial Distribution of PM2.5 during a Typical Heavy Pollution Period

Fig. 3 shows how the spatial distribution of PM2.5 varied during a typical heavy pollution episode from October 6 to October 12, 2014; the evolution of the episode is described in Text S1. During this episode, the spatial distribution of PM2.5 predicted by the AGM model was consistent with that inferred from monitoring data. The AGM model can thus capture the occurrence, spread, and dissipation of PM2.5 pollution and provides strong data support for studies of PM2.5 diffusion during periods of heavy pollution.


Fig. 3. Grid site monitoring values and prediction results of the AGM model for a period of heavy pollution from October 6 to October 12, 2014. (a1)–(g1) Prediction results of the AGM model. (a2)–(g2) Grid site monitoring results. Given the lack of AOD data for October 12 and the similarity between site monitoring data for October 12 and October 13, the data from October 13 were used instead of the October 12 data for prediction.


Prediction of Seasonal and Annual Average PM2.5 Concentrations

The spatial distributions of seasonal and annual average PM2.5 concentrations are shown in Fig. 4. The JingJinJi region experienced its heaviest pollution in winter, when the regional average PM2.5 concentration exceeded 120 µg m–3. Average PM2.5 concentrations in southeastern Handan and in eastern Shijiazhuang, Xingtai, and Handan were higher than elsewhere; besides industrial emissions and motor vehicle exhaust, residential coal-fired heating contributes to the high PM2.5 concentrations in these areas. In autumn, heavily polluted areas were located mainly in southern Beijing, southeastern Baoding, and central Shijiazhuang, Xingtai, and Handan, among other areas, where average PM2.5 concentrations were 90–120 µg m–3 and exceeded 120 µg m–3 in places. The lowest average PM2.5 concentrations occurred in summer, when they were below 75 µg m–3 in most areas. In most of Zhangjiakou, the summer average was below 30 µg m–3, meeting the secondary PM2.5 concentration standard set by the Ministry of Environmental Protection.


Fig. 4. Comparisons of seasonal and annual average predicted and grid monitoring PM2.5 concentrations. (a)–(e) Prediction results of the AGM model and (f)–(j) Grid site monitoring results.

The annual average PM2.5 concentration in most of the JingJinJi region exceeded the secondary standard, and in Baoding, Shijiazhuang, Xingtai, Handan, and other areas it even exceeded 90 µg m–3 in places. Overall, air quality in the JingJinJi region therefore remains a serious concern.
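Annual averages and exceedance maps like those in Fig. 4 can be derived from a stack of daily predicted grids, as sketched below. We assume that the "secondary standard" is the Grade II annual PM2.5 limit of 35 µg m–3 in China's Ambient Air Quality Standards (GB 3095-2012), and that days without AOD (hence without a prediction) are stored as NaN and skipped in the average.

```python
import numpy as np

# Assumption: Grade II annual PM2.5 limit of GB 3095-2012, in µg m-3.
GRADE_II_ANNUAL_LIMIT = 35.0


def annual_mean_and_exceedance(daily_stack, limit=GRADE_II_ANNUAL_LIMIT):
    """Per-cell annual mean of daily predictions and a boolean exceedance map.

    daily_stack: array of shape (n_days, n_rows, n_cols); NaN marks days
    without a prediction and is ignored by the mean. A cell with no valid day
    at all would yield NaN (with a RuntimeWarning) and count as not exceeding.
    """
    annual = np.nanmean(daily_stack, axis=0)
    return annual, annual > limit


# Hypothetical 3-day stack for a 1 x 2 grid.
stack = np.array([[[30.0, 80.0]],
                  [[40.0, 100.0]],
                  [[np.nan, 90.0]]])
annual, exceeds = annual_mean_and_exceedance(stack)
print(annual, exceeds)
```

Seasonal maps follow the same pattern, with the stack first restricted to the days of each season.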


DISCUSSION


In this study, gaseous pollutant data were introduced into an AOD- and MET-based mixed-effects model to reduce the underestimation of PM2.5 in the JingJinJi region. The underestimation of PM2.5 may have several causes. One may be that fewer matched data are available at high concentrations, as discussed by Liu et al. (2005); another, shown by Gupta and Christopher (2009b), is that a narrow range of high PM2.5 mass concentrations corresponds to a wide range of AOT (i.e., AOD) values. To analyze this further, Table S1 presents the correlations between PM2.5 and the predictors at different PM2.5 concentrations. The correlation between PM2.5 and AOD decreases as the PM2.5 concentration increases. AOD therefore cannot adequately represent the relationship between high PM2.5 concentrations and the independent variables (Gupta and Christopher, 2009b; Liu et al., 2005), resulting in the underestimation of high PM2.5 concentrations.
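A stratified correlation of the kind summarized in Table S1 can be sketched as follows; the bin edges and the toy data are hypothetical. Correlations computed within narrow concentration bins are also attenuated by the restricted range of PM2.5 in each bin, so such comparisons are indicative rather than conclusive.

```python
import numpy as np


def correlation_by_stratum(pm25, predictor, edges):
    """Pearson r between PM2.5 and one predictor within each bin [lo, hi).

    Bins with fewer than three records return NaN.
    """
    pm25 = np.asarray(pm25, dtype=float)
    x = np.asarray(predictor, dtype=float)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (pm25 >= lo) & (pm25 < hi)
        r = np.corrcoef(pm25[mask], x[mask])[0, 1] if mask.sum() >= 3 else np.nan
        out.append(((lo, hi), float(r)))
    return out


# Toy data: AOD tracks PM2.5 closely at low concentrations but not at high ones.
pm = [10, 20, 30, 40, 200, 250, 300, 350]
aod = [0.1, 0.2, 0.3, 0.4, 1.0, 0.8, 1.2, 0.9]
for (lo, hi), r in correlation_by_stratum(pm, aod, [0, 75, 500]):
    print(f"{lo}-{hi}: r = {r:.2f}")
```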

The development of heavy pollution in the JingJinJi region is closely related to atmospheric pollutant concentrations, atmospheric oxidation capacity, and MET (Wang et al., 2015). Atmospheric pollutants comprise primary pollutant emissions and converted secondary pollutants. Primary pollutants originate mainly from coal combustion and traffic emissions, and GASs such as NO2 and SO2, which are co-emitted with primary pollutants, can represent primary emissions to a certain extent (Wang et al., 2014). Secondary pollutants consist mainly of particulate secondary inorganic salts (SO42– and NO3–) (Sun et al., 2014). These important components of PM2.5 are formed mainly through the atmospheric oxidation of SO2 and NO2, the efficiency of which depends on the atmospheric oxidant Ox (NO2 + O3). As an indicator of atmospheric oxidation capacity, a high Ox concentration promotes the secondary conversion of SO2 and NO2 to SO42– and NO3–, thereby increasing PM2.5 concentrations. Thus, NO2, SO2, CO, and O3 not only represent primary pollutant emissions during the formation of heavy pollution but also act as precursors and oxidants in secondary pollutant formation. In addition, GASs correlate well with high PM2.5 concentrations (Table S1), indicating that they can better represent the relationship between high PM2.5 concentrations and the independent variables. Moreover, Table 3 shows that higher PM2.5 concentrations are associated with larger increases in R2 and larger decreases in RMSE. NO2, SO2, CO, and O3 can therefore be used as auxiliary variables to resolve the underestimation of high PM2.5 concentrations and improve the accuracy of AOD-based PM2.5 estimation.

A comparison of the seasonal RMSE differences between the AGM and AM models shows that the AGM model improved most in winter, with an RMSE difference of 15.28 µg m–3, and least in summer, with an RMSE difference of only 4.3 µg m–3. In winter, coal-fired heating increases the emission of PM2.5 and its precursors, such as NO2 and SO2, and the formation of fine particulates through the secondary conversion of NO2 and SO2 to NO3– and SO42– is the main cause of heavy winter pollution in the JingJinJi region (Wang et al., 2016). Introducing NO2 and the other GASs as independent variables enabled the AGM model to better capture the relationship between high PM2.5 concentrations and the independent variables in winter, improving its estimation of winter PM2.5 concentrations.

By introducing GASs, the AGM model achieves a higher CV R2 (0.84) than previously published models for the JingJinJi region, e.g., the mixed-effects models for Beijing (R2 = 0.79) (Xie et al., 2015) and JingJinJi (R2 = 0.77) (Zheng et al., 2016) and the Bayesian model for northern China (R2 = 0.68) (Lv et al., 2016). However, the RMSE in this study is higher than in those studies, possibly because PM2.5 concentrations in the earlier studies were lower and spread over a narrower range, which yields smaller RMSE values. For comparison, we also tested three commonly used machine learning models (Table 5). Introducing GASs greatly improved the performance of all the models, further supporting the use of GASs as predictors. With GASs included, the gradient boosting decision tree (GBDT) performed best among the three machine learning models, with a CV R2 of 0.79 and an RMSE of 38.08 µg m–3. These values are, however, 0.05 lower and 4.17 µg m–3 higher, respectively, than those of the AGM model, possibly because the GBDT ignores the temporal heterogeneity of the relationship between PM2.5 and the predictors.
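The dependence of RMSE on the spread of the data can be made explicit. If R2 is taken as 1 − SSE/SST (an assumption; for a least-squares fit it equals the squared correlation), then RMSE = SD_obs × sqrt(1 − R2), where SD_obs is the standard deviation of the observations. At equal R2, a dataset with a wider concentration range therefore has a proportionally larger RMSE. The sketch below applies this relation to the PM2.5 standard deviation in Table 1 (83.54 µg m–3) and the CV R2 values reported in this study.

```python
import math


def implied_rmse(sd_obs, r2):
    """RMSE implied by R2 = 1 - SSE/SST, i.e. SD_obs * sqrt(1 - R2).

    Uses the population standard deviation of the observations, so this is
    only an approximation for CV statistics computed in other ways.
    """
    return sd_obs * math.sqrt(1.0 - r2)


SD_PM25 = 83.54  # standard deviation of PM2.5 in Table 1 (µg m-3)
for name, r2 in [("AM", 0.73), ("AGM", 0.84), ("GBDT", 0.79)]:
    print(f"{name}: {implied_rmse(SD_PM25, r2):.1f} µg m-3")
# AM: 43.4 µg m-3
# AGM: 33.4 µg m-3
# GBDT: 38.3 µg m-3
```

These implied values are close to the reported CV RMSEs of 43.07, 33.91, and 38.08 µg m–3. With the same R2 of 0.84, a hypothetical dataset with a standard deviation of 50 µg m–3 would give an RMSE of only about 20 µg m–3.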


Table 5. Performance comparison of the AGM with machine learning models.

Although our model markedly improved PM2.5 prediction, it still underestimated high concentrations to some degree, for the following reasons. First, Kriging interpolation was used to obtain the spatial distribution of GASs in the JingJinJi region, and interpolation carries a certain degree of uncertainty (Oliver and Webster, 1990). Second, the relationship between high PM2.5 concentrations and AOD is complex and nonlinear (Liu et al., 2005; Ma et al., 2014; Zheng et al., 2016), whereas the linear mixed model presupposes a linear relationship between PM2.5 and the predictors (Lee et al., 2011), which oversimplifies their actual relationship (Reid et al., 2015). Third, besides meteorological conditions, primary emissions, and secondary conversion, dust and construction dust also contribute to PM2.5 in the JingJinJi region, but their effects on the development of heavy pollution were not considered. Finally, PM2.5 concentrations were predicted only for grid cells with available AOD values. AOD data are often missing because of clouds, high surface reflectance, and high PM2.5 concentrations (Levy et al., 2010; Tao et al., 2012); if the missing values are associated with high PM2.5 concentrations, using only the retrieved AOD fields may underestimate PM2.5 concentrations (Lv et al., 2016).
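The last point can be illustrated with a toy example: if retrievals fail preferentially on the haziest days, the mean over days with AOD (and hence with predictions) is biased low relative to the true mean. The daily values and the missingness rule below are invented for illustration only.

```python
import numpy as np


def retrieved_mean_bias(pm25, aod_missing):
    """Mean PM2.5 over days with an AOD retrieval minus the mean over all days."""
    pm25 = np.asarray(pm25, dtype=float)
    aod_missing = np.asarray(aod_missing, dtype=bool)
    return float(pm25[~aod_missing].mean() - pm25.mean())


# Hypothetical daily PM2.5 (µg m-3) for one grid cell.
pm = np.array([40.0, 60.0, 80.0, 300.0, 350.0, 50.0])

# Assume AOD is missing on the two most polluted days (e.g., thick haze).
bias = retrieved_mean_bias(pm, pm > 250)
print(round(bias, 1))  # -89.2
```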


CONCLUSION


AOD and MET alone cannot fully capture the relationship between high PM2.5 concentrations and the predictive variables, so estimation models that use only AOD and MET as predictors may significantly underestimate PM2.5 concentrations. We improved the accuracy of AOD-based PM2.5 prediction by introducing MET parameters and NO2, SO2, CO, and O3 concentrations as predictors into a mixed-effects model. The model achieved a 10-fold CV R2 of 0.84 and an RMSE of 33.91 µg m–3. The PM2.5 concentrations it mapped for heavy-pollution episodes and for each season are consistent with those inferred from monitoring data and accurately reflect the appearance, spread, and dissipation of PM2.5 during such periods. Introducing GASs as independent variables therefore improves the prediction of the spatial distribution of PM2.5.


ACKNOWLEDGEMENTS


This work was supported by the National Key Research and Development Program of China (2018YFB0505301) and the National Natural Science Foundation of China (41671383). 


