Comparison of Satellite-based PM2.5 Estimation from Aerosol Optical Depth and Top-of-atmosphere Reflectance

Aerosol optical depth (AOD) and top-of-atmosphere (TOA) reflectance are two useful sources of satellite data for estimating surface PM2.5 concentrations, but PM2.5 estimates from these two approaches have rarely been compared. In this study, satellite observations of TOA reflectance and AOD from the Advanced Himawari Imager (AHI) onboard the Himawari-8 geostationary satellite over the Yangtze River Delta (YRD) in 2016, together with meteorological data, are used to estimate hourly PM2.5 with four machine learning algorithms (random forest, extreme gradient boosting, gradient boosting regression, and support vector regression). For both the reflectance-based and AOD-based approaches, our cross-validated results show that the random forest algorithm achieves the best performance, with a coefficient of determination (R²) of 0.75 and a root-mean-square error (RMSE) of 18.71 µg m⁻³ for the former, and R² = 0.65 and RMSE = 15.69 µg m⁻³ for the latter. Additionally, we find a large discrepancy in PM2.5 estimates between the reflectance-based and AOD-based approaches in terms of the annual mean and its spatial distribution, which is mainly due to the sampling difference, especially over northern YRD in winter. Overall, the reflectance-based approach can provide robust PM2.5 estimates for both annual mean values and the probability density function of hourly PM2.5. Our results further show that, according to annual mean PM2.5 from the reflectance-based approach, almost the entire population of YRD lives in non-attainment areas. This study suggests that the reflectance-based approach is a valuable way of providing robust PM2.5 estimates and of constraining health impact assessments.


INTRODUCTION
Ambient fine particulate matter air pollution (≤ 2.5 µm in aerodynamic diameter; PM2.5) has numerous negative effects on human health, including heart disease, stroke, respiratory diseases, and lung cancer (Burnett et al., 2018). Therefore, it is crucial to obtain accurate ground-level PM2.5 concentrations to address many environmental and social problems, especially over rapidly growing and energy-intensive regions such as China. Satellite data are an important tool for providing PM2.5 estimates with continuous temporal and spatial coverage (Ford and Heald, 2016), and they have been widely used in health impact assessments (e.g., Apte et al., 2015; Chowdhury et al., 2018; Wang et al., 2019).
Most satellite studies of PM2.5 estimation are based on aerosol optical depth (AOD) products (Shin et al., 2019 and references therein). These studies mainly fall into three groups of approaches: statistical methods including machine learning (e.g., Lee et al., 2011; Hu et al., 2017; He and Huang, 2018; Wei et al., 2019a; Xue et al., 2019), chemical transport models (Geng et al., 2015; Di et al., 2016; van Donkelaar et al., 2016), and vertical correction models (Zhang and Li, 2015; Gong et al., 2017; Li et al., 2018; Toth et al., 2019). The performance of each approach is affected by the study area and period, as well as by the spatial and temporal resolutions of the data (Shin et al., 2019). A common limitation of satellite data is that AOD is unavailable or unreliable in some situations, such as over bright surfaces (Levy et al., 2013). This limitation can be addressed by combining satellite data with chemical transport models (Di et al., 2016; Hu et al., 2017), but a non-negligible discrepancy in aerosol loading between satellite observations and chemical transport models still exists (Liu, 2005; Carnevale et al., 2011; Crippa et al., 2019).
Recently, some satellite studies directly used top-of-atmosphere (TOA) reflectance to estimate PM2.5 and obtained a larger spatial coverage than AOD-based estimates (Shen et al., 2018; Liu et al., 2019). These studies relied on machine learning because the relationship between TOA reflectance and PM2.5 is nonlinear and complex. They typically used a single machine learning algorithm and lacked a systematic comparison of the applicability and performance of different algorithms. Here, we employ four machine learning algorithms to build the relationship between TOA reflectance and PM2.5, and compare their performance. Furthermore, previous studies have rarely investigated the discrepancy in PM2.5 estimates between AOD-based and reflectance-based approaches (Shen et al., 2018; Liu et al., 2019). Thus, another focus of this study is to examine the PM2.5 discrepancy between these two approaches in terms of the sampling effect and the probability density function.
In this study, we adopt the reflectance-based approach to estimate PM2.5 using four different machine learning algorithms, and compare the performance of each algorithm. Additionally, we examine the discrepancy in PM2.5 estimates between the AOD-based and reflectance-based approaches in terms of the sampling effect and the probability density function. Finally, the difference in population-weighted PM2.5 concentrations between these two approaches is analyzed.

Study Area
The study region is the Yangtze River Delta (YRD), which is located in eastern China and geographically includes Shanghai municipality and the three provinces of Jiangsu, Zhejiang and Anhui. We choose YRD as the study area because this region alone accounted for 2.2% of the national land area, 11.0% of the national population, and 18.5% of the national gross domestic product (GDP) in 2014 (Hu et al., 2018). Additionally, YRD is one of the most heavily populated regions in China (Hong et al., 2019).

Data Sets
Ground-based PM2.5 measurements
Hourly PM2.5 concentration data over YRD (137 stations) in 2016 are obtained from the website of the China National Environmental Monitoring Center (CNEMC, http://www.cnemc.cn). PM2.5 mass concentrations are measured with a tapered element oscillating microbalance with an accuracy of ±1.5 µg m⁻³ for hourly averages. The hourly PM2.5 data are quality assured by CNEMC based on the national industry standard. Hourly measurements < 1 µg m⁻³ are removed because they are below the instruments' limit of detection (Xiao et al., 2017).

Satellite products
This study uses Level 1B calibrated reflectance and Level 2 aerosol products from the Advanced Himawari Imager (AHI) onboard the Himawari-8 geostationary satellite. These two products, with 5 km spatial and 10 min temporal resolutions, are obtained from the Japan Aerospace Exploration Agency P-Tree system (ftp://ftp.ptree.jaxa.jp/). The TOA reflectances at three channels (0.47, 0.64 and 2.3 µm) and the observation angles (sensor azimuth, sensor zenith, solar azimuth, and solar zenith) are generally used to retrieve AOD based on the dark target algorithm. We extract these TOA reflectances and the four observation angles as the main input predictors to estimate surface PM2.5 concentrations. Additionally, AOD retrievals with the highest confidence from the AHI Level 2 aerosol product are used to estimate PM2.5 for comparison. Note that the cloud mask from the AHI Level 2 aerosol product is applied to select cloud-free conditions. About 58% of the total satellite data are lost due to the cloud-free restriction.
We also select the normalized difference vegetation index (NDVI) as an input predictor since it reflects land cover information (Shen et al., 2018). The calculation of NDVI is based on the following formula (Liu et al.,
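The formula itself is not preserved in this text. For orientation only, the conventional NDVI definition is the normalized difference between near-infrared and red reflectance, (NIR − Red) / (NIR + Red). The sketch below implements that conventional definition; which AHI channels supply the red and near-infrared reflectances is an assumption here and is not stated in the surviving text, and the sample values are hypothetical.

```python
import math


def ndvi(red, nir):
    """Conventional NDVI from red and near-infrared reflectances (unitless)."""
    total = nir + red
    if total == 0:
        return math.nan  # undefined when both reflectances are zero
    return (nir - red) / total


# Hypothetical reflectances: a densely vegetated pixel and a sparsely vegetated pixel
print(round(ndvi(red=0.05, nir=0.45), 3))
print(round(ndvi(red=0.20, nir=0.25), 3))
```

Values close to 1 indicate dense vegetation, while values near 0 indicate bare or built-up surfaces, which is why the index can serve as a compact land cover predictor.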

Meteorological data
Following previous studies (Xiao et al., 2017; Shen et al., 2018; Liu et al., 2019), meteorological parameters associated with surface PM2.5 are selected from the ERA-Interim reanalysis. These parameters include the surface atmospheric pressure (P, hPa), total column water (TCW, kg m⁻²), 10 m u-wind (U10) and v-wind (V10) components, air temperature at an altitude of 2 m (T, K), total column ozone (TCO, kg m⁻²), relative humidity (RH, %), and planetary boundary layer height (PBLH, m). These meteorological data have a spatial resolution of 0.75° × 0.75° and are available at six-hourly intervals, except PBLH, which is produced twice daily.

Population data
Population data come from the Gridded Population of the World, Version 4 (Doxsey-Whitfield et al., 2015), which is available from the Socioeconomic Data and Applications Center. This data set has a spatial resolution of 5 km, and we use its population estimates for 2000 and 2010 to linearly extrapolate estimates for 2016. More details can be found at https://sedac.ciesin.columbia.edu/data/collection/gpw-v4.
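Projecting the 2000 and 2010 counts to 2016 amounts to extending the 2000-2010 linear trend by six years. A minimal sketch for a single grid cell follows; the function name is hypothetical, and clipping negative results to zero is an assumption added here, not a step described in the text.

```python
def project_population(pop_2000, pop_2010, target_year=2016):
    """Extend the 2000-2010 linear trend of one grid cell to target_year."""
    slope = (pop_2010 - pop_2000) / (2010 - 2000)  # persons per year
    estimate = pop_2010 + slope * (target_year - 2010)
    return max(estimate, 0.0)  # assumption: a shrinking cell cannot go below zero


# Hypothetical grid cell that grew from 1000 to 1200 inhabitants
print(project_population(1000, 1200))
```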

Methods
Based on AHI measurements under cloud-free conditions in the daytime, reflectances, NDVI, observation angles and AOD are averaged to hourly mean values for each 5 km grid cell in YRD. For a given grid cell, the cloud-free values within the hour are arithmetically averaged to obtain the hourly mean; the number of these values ranges from 1 to 6. Cloud-free grid cells account for 42% of the total satellite product. All other data are integrated to these hourly AHI grids. Specifically, meteorological data are spatially interpolated to 5 km gridded values using a linear method, in which the interpolated value at a given point is obtained by linear interpolation of the values at neighboring grid points in each respective dimension. Each hourly grid cell is subsequently assigned the temporally nearest meteorological value. Ground-based PM2.5 data are collocated to the corresponding AHI grid cells.
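The spatial and temporal matching of the reanalysis fields can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function names and the tiny 2 × 2 field are hypothetical, coordinates are assumed to be in ascending order, and the time matching ignores wrap-around at midnight.

```python
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Interpolate field[i][j] (defined on ascending lats/lons) linearly in each dimension."""
    # Locate the cell that contains the target point; indices are clipped so that
    # points outside the grid are extrapolated from the nearest edge cell.
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty


def nearest_time_index(analysis_hours, hour):
    """Index of the analysis time closest to the satellite hour (ties go to the earlier time)."""
    return min(range(len(analysis_hours)), key=lambda k: abs(analysis_hours[k] - hour))


# Hypothetical 0.75-degree relative humidity field (%), sampled at a 5 km grid cell centre
lats, lons = [30.0, 30.75], [120.0, 120.75]
rh = [[10.0, 20.0], [30.0, 40.0]]
print(bilinear(lats, lons, rh, 30.375, 120.375))
print(nearest_time_index([0, 6, 12, 18], 8))
```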
This study estimates PM2.5 concentration in two ways: the first is based on TOA reflectance, NDVI, observation angles, and meteorological parameters (referred to as the reflectance-based model); the second, for comparison purposes, uses AOD and meteorological parameters (referred to as the AOD-based model). Input predictors for these two models also include geographical locations (latitudes and longitudes) and dummy variables (month, day, and hour of observations) (Hu et al., 2017; Liu et al., 2019). The structures of these two models are given by the following equations:
\[ \mathrm{PM}_{2.5} = f_{\mathrm{ref}}(R_1, R_3, R_6, \mathrm{Lat}, \mathrm{Lon}, \mathrm{Time}, \mathrm{Angles}, \mathrm{NDVI}, \mathrm{RH}, P, \mathrm{TCW}, U_{10}, V_{10}, T, \mathrm{TCO}, \mathrm{PBLH}) \tag{1} \]
\[ \mathrm{PM}_{2.5} = f_{\mathrm{AOD}}(\mathrm{AOD}, \mathrm{Lat}, \mathrm{Lon}, \mathrm{Time}, \mathrm{RH}, P, \mathrm{TCW}, U_{10}, V_{10}, T, \mathrm{TCO}, \mathrm{PBLH}) \tag{2} \]
where f_ref() and f_AOD() represent the estimation functions for the reflectance-based model and the AOD-based model, respectively. Input predictors for the reflectance-based model include TOA reflectances (R1, R3 and R6), geographical locations (Lat and Lon), observation time (Time), observation angles (Angles), NDVI and meteorological parameters (RH, P, TCW, U10, V10, T, TCO and PBLH). Input predictors for the AOD-based model include AOD, geographical locations, observation time and meteorological parameters. For both models, the relationship between PM2.5 and the input predictors is nonlinear and very complex. Therefore, machine learning, which is well suited to fitting nonlinear and complex relationships, is adopted to represent the estimation functions f_ref() and f_AOD(). In this study, we apply four different machine learning algorithms, which are introduced below. The detailed structure of these algorithms is beyond the scope of this study and can be found in the corresponding references (Ho, 1995; Breiman, 2001; Friedman, 2001; Smola and Schölkopf, 2004; Natekin and Knoll, 2013). For a given machine learning algorithm, a grid search approach is performed to find the optimized parameters of the algorithm.
Specifically, a grid of parameter values is first specified, and this approach then exhaustively considers all parameter combinations. The prediction accuracy of each parameter setting is compared based on the coefficient of determination (R²), and the optimized parameters are those that yield the best prediction accuracy.
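The grid search described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the tuning code used in the study: a toy one-dimensional nearest-neighbour regressor stands in for the actual algorithms, its parameters `k` and `power` are hypothetical, the data are synthetic, and scoring on a single hold-out set is an assumption (scikit-learn's `GridSearchCV` offers the cross-validated equivalent).

```python
from itertools import product


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(params, x_train, y_train, x_new):
    """Toy 1-D k-nearest-neighbour regressor standing in for RF, GBR, XGBoost or SVR."""
    k, power = params["k"], params["power"]
    preds = []
    for x in x_new:
        nearest = sorted(zip(x_train, y_train), key=lambda xy: abs(xy[0] - x))[:k]
        # power = 0 gives uniform weights; power = 1 gives inverse-distance weights
        weights = [1.0 / (abs(xt - x) + 1e-9) ** power for xt, _ in nearest]
        preds.append(sum(w * yt for w, (_, yt) in zip(weights, nearest)) / sum(weights))
    return preds


def grid_search(param_grid, predict, x_train, y_train, x_val, y_val):
    """Score every parameter combination; return (best_params, best_score, results)."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = r2_score(y_val, predict(params, x_train, y_train, x_val))
        results.append((params, score))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


# Synthetic training and validation data
x_train = [float(i) for i in range(10)]
y_train = [x ** 2 for x in x_train]
x_val = [0.5, 2.5, 4.5, 6.5, 8.5]
y_val = [x ** 2 for x in x_val]

grid = {"k": [1, 3, 5], "power": [0, 1]}
best_params, best_score, results = grid_search(grid, knn_predict, x_train, y_train, x_val, y_val)
print(best_params, round(best_score, 3))
```

The cost grows with the product of the grid sizes (here 3 × 2 = 6 fits), which is why grids for models with many tunable parameters are usually kept coarse.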

Random forest
Random forest (RF) is an ensemble of decision trees and can be used for both classification and regression (Ho, 1995; Breiman, 2001). As a very user-friendly algorithm, RF has only two important parameters to tune: the number of trees to grow (ntree) and the number of predictors randomly sampled as candidates at each split (mtry). Based on the grid search approach described in the previous section, the best prediction is achieved with ntree = 800 and mtry = 9 for the reflectance-based model, and with ntree = 900 and mtry = 3 for the AOD-based model.

Gradient boosting regression
Gradient Boosting Regression (GBR) is another common ensemble technique for regression. The principal idea of this algorithm is to construct new base learners that are maximally correlated with the negative gradient of the loss function associated with the whole ensemble (Friedman, 2001; Natekin and Knoll, 2013). In this study, we select least squares as the loss function. Four important parameters of GBR are tuned by the grid search: the number of boosting stages to perform (n_estimators), the fraction of samples used for fitting the individual base learners (subsample), the maximum depth of the individual regression estimators (max_depth), and the learning rate. These four parameters are set to 800, 0.5, 7 and 0.1 for the reflectance-based model, and to 800, 0.5, 7 and 0.05 for the AOD-based model.

Extreme gradient boosting
The principle of Extreme Gradient Boosting (XGBoost) is the same as that of GBR. However, XGBoost controls overfitting better by using a more regularized model formulation. We use the tree-based model as the booster. In addition to the four parameters of GBR, two further parameters are tuned for XGBoost: the subsample ratio of columns when constructing each tree (colsample_bytree) and the minimum sum of instance weight needed in a child (min_child_weight). Based on the grid search, n_estimators, subsample, max_depth, learning rate, colsample_bytree and min_child_weight are set to 500, 0.95, 9, 0.15, 0.7 and 5, respectively, for the reflectance-based model. For the AOD-based model, these parameters are 500, 0.7, 9, 0.05, 0.7 and 1.

Support vector regression
As a popular supervised-learning approach, Support Vector Regression (SVR) is widely used in regression estimation, pattern recognition, time series prediction, and probability density function estimation (Liu et al., 2020). This algorithm is based on Vapnik-Chervonenkis (VC) theory, and more details can be found elsewhere (Drucker et al., 1997; Smola and Schölkopf, 2004). One of the main advantages of SVR is that its computational complexity does not depend on the dimensionality of the input space (Awad and Khanna, 2015). In this study, based on the grid search approach, two important parameters of SVR, i.e., the kernel coefficient (gamma) and the regularization parameter (C), are adjusted to their optimized values, which are 0.4 and 1 for the reflectance-based model and 0.6 and 3 for the AOD-based model. For the RF, GBR and SVR methods, we use the Python implementation in the Scikit-learn package, and for XGBoost we use its dedicated Python package.

Model Performance of Different Machine Learning Algorithms
The model performance for different machine learning algorithms is shown in Table 1. For both the reflectance-based and AOD-based models, the RF method shows the highest model performance, whereas the SVR method shows the lowest. For the reflectance-based model, R² ranges from about 0.7 to 0.75 across methods, and is generally 0.1 higher than that for the AOD-based model.
For a given machine learning algorithm, the number of samples has an important influence on model performance (Harari et al., 2017). Note that the sample size of the AOD-based model is only about one-fifth of that of the reflectance-based model, which partially explains the relatively low R² of the AOD-based model (Table 1). There are several possible reasons for the low sample size of the AOD-based model, such as the selection of only AOD retrievals with the highest confidence and failed AOD retrievals for some clear-sky pixels (Yoshida et al., 2018). Further analysis of the sampling issue is presented later.
Using the RF method, cross-validation (CV) results of the reflectance-based model and the AOD-based model are shown in Fig. 1. Note that the reflectance-based model shows higher values of both R² and RMSE than the AOD-based model, and that the former suffers from a relatively large bias at high surface PM2.5 concentrations (Fig. 1). One possible reason for this large bias is that the spatial resolution of the satellite observations (i.e., 5 km) may not be high enough to capture the spatial variability of PM2.5 under severe pollution (Mei et al., 2019). The two models in Fig. 1 generally underestimate PM2.5 concentrations, but both overestimate PM2.5 concentrations at low values (< 50 µg m⁻³). Site-specific performances of the reflectance-based model and the AOD-based model are shown in Fig. 2. For the reflectance-based model, R² across all sites ranges from 0.41 to 0.95, and R² is generally close to or higher than 0.8 over northern and eastern YRD. For other regions, R² is lower than 0.7 (Fig. 2(a)). These relatively low values of R² are unlikely to be due to the sample size of the corresponding sites, since the discrepancy in sample size across all sites in YRD is small overall (not shown). The sparseness of sites over southern and western YRD may partly explain the low R² in these regions. Note that the lowest R² of 0.41 corresponds to a special site located on the edge of a lake. The land cover of the corresponding satellite grid cell is complex, which may cause a large uncertainty in extracting aerosol information from satellite TOA observations. The spatial variation of RMSE shows the opposite pattern to that of R², and most sites have RMSE values lower than 20 µg m⁻³ for the reflectance-based model (Fig. 2(b)). The spatial patterns of R² and RMSE for the AOD-based model are consistent with those of the reflectance-based model (Fig. 2). 
Overall, with the RF method, the reflectance-based model performs well in predicting hourly PM2.5 at most ground-based sites in YRD.
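The cross-validation behind these scores can be sketched as follows. The exact CV design is not detailed in this section, so the random k-fold split, the fold count, and the seed are assumptions; the placeholder model simply predicts the training mean and stands in for any of the four algorithms, and the data are synthetic.

```python
import math
import random


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle sample indices and deal them into n_folds nearly equal test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cross_val_predict(x, y, fit_predict, n_folds=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    preds = [None] * len(y)
    for test_idx in kfold_indices(len(y), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        fold_preds = fit_predict(
            [x[i] for i in train_idx],
            [y[i] for i in train_idx],
            [x[i] for i in test_idx],
        )
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p
    return preds


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mean_model(x_train, y_train, x_test):
    """Placeholder model: predicts the training mean for every test sample."""
    mean = sum(y_train) / len(y_train)
    return [mean] * len(x_test)


x = list(range(25))
y = [20.0 + 2.0 * v for v in x]  # synthetic PM2.5-like values
oof = cross_val_predict(x, y, mean_model, n_folds=5)
print(round(rmse(y, oof), 2))
```

Scoring the pooled out-of-fold predictions against the observations gives a single CV R² or RMSE per model; grouping the same predictions by station would give site-specific scores.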

Spatio-Temporal Variations of Model Estimated PM2.5 Concentration
Because of its good performance, the RF method is applied to predict PM2.5 concentrations in the rest of the paper, unless otherwise stated. Fig. 3 shows the spatial distributions of annual and seasonal mean PM2.5 concentrations from the reflectance-based model over YRD. The spatial distribution of annual mean PM2.5 concentrations from surface observations is well captured by the model estimates (Fig. 3(a) vs. 3(b)), with higher PM2.5 concentrations over the northern region and lower values over the southern region. This spatial pattern also holds for the seasonal mean values (Figs. 3(c)-3(f)). The annual mean PM2.5 concentration over YRD is 52.6 µg m⁻³. The highest seasonal mean is found in winter (72.83 µg m⁻³), followed by spring (54.9 µg m⁻³) and autumn (47.2 µg m⁻³), and the lowest (31.1 µg m⁻³) is found in summer. This spatial pattern and seasonal cycle are consistent with previous studies (Zheng et al., 2016; Xiao et al., 2017; Liu et al., 2019). Although not shown here, the spatial distributions of seasonal PM2.5 from the AOD-based model are similar to those from the reflectance-based model except in winter (December-January-February, DJF); the reasons are given later in the section on the sampling effect.
Taking advantage of the geostationary AHI observations, the diurnal cycle of regional mean PM2.5 is shown in Fig. 4. Hourly mean PM2.5 from the reflectance-based model generally increases slightly in the early morning (8:00-10:00) and then decreases continuously toward evening, in terms of both annual mean and seasonal mean values (Fig. 4(a)). This diurnal variation is also found in surface-measurement studies for Shanghai, Nanjing, Hangzhou and Hefei (provincial capital cities in YRD) (San Martini et al., 2015; Zhao et al., 2016). We further compare these results with those from the AOD-based model and find little change in the diurnal variation. The magnitude of hourly mean PM2.5 in DJF from the AOD-based model, however, is much lower than that from the reflectance-based model (Fig. 4). This DJF PM2.5 difference is likely due to the difference in the number of pixels between AOD and reflectance observations in YRD.
Furthermore, we compare regional exposure estimates in YRD from reflectance-based and AOD-based annual mean PM2.5 concentrations. Fig. 5 shows the cumulative distribution of annual mean PM2.5, which is calculated as the sum of the population in all pixels with an annual average concentration at or above each concentration level. For reflectance-based estimates, the population-weighted mean PM2.5 concentration is 54 µg m⁻³, and almost the entire population lives in non-attainment areas in YRD (blue lines in Fig. 5); for AOD-based estimates, the population-weighted mean is 42 µg m⁻³, and a small proportion of the population (16%) lives in attainment areas scattered across the eastern YRD (not shown). Interestingly, the population-weighted mean PM2.5 concentration is slightly higher than the annual mean value for reflectance-based estimates, and the reverse is true for AOD-based estimates. These results reflect the combined effect of the spatial distributions of PM2.5 concentration and population density. Specifically, for reflectance-based estimates, PM2.5 tends to be higher over areas of high population density (not shown), which is consistent with previous studies (Liu et al., 2016; Bai et al., 2019; Lu et al., 2019). In contrast, lower PM2.5 values from the AOD-based model are found over areas of high population density, such as the eastern YRD (not shown). These low PM2.5 concentrations from the AOD-based model are likely due to the sampling effect, which is discussed in the next section. Caution must therefore be taken when assessing the health impact of PM2.5 from the AOD-based model in YRD.
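The two exposure metrics used here can be written compactly. The sketch below is illustrative only: the three-pixel data set is hypothetical, and the 35 µg m⁻³ attainment threshold is an assumption (the threshold is not stated in this section).

```python
def population_weighted_mean(pm, pop):
    """Mean concentration weighted by the population of each pixel."""
    return sum(c * p for c, p in zip(pm, pop)) / sum(pop)


def fraction_at_or_above(pm, pop, level):
    """Share of the population living in pixels at or above a concentration level."""
    exposed = sum(p for c, p in zip(pm, pop) if c >= level)
    return exposed / sum(pop)


# Hypothetical annual means (ug m-3) and populations for three pixels,
# with the most polluted pixel also being the most populous
pm = [30, 50, 70]
pop = [100, 300, 600]

print(sum(pm) / len(pm))                   # unweighted (area) mean
print(population_weighted_mean(pm, pop))   # population-weighted mean
print(fraction_at_or_above(pm, pop, 35))   # share above the assumed threshold
```

When concentrations are higher where more people live, as in this toy case, the population-weighted mean exceeds the unweighted mean; the opposite ordering arises when populous pixels have lower estimates.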

The Sampling Effect
Given the large discrepancy in PM2.5 concentration estimates in YRD between the reflectance-based and AOD-based models, we further examine the potential sampling effect on this discrepancy. Fig. 6(a) shows the annual PM2.5 difference between the two models: annual PM2.5 from reflectance-based estimates is generally higher than that from AOD-based estimates throughout YRD, especially over the eastern YRD (by more than 10 µg m⁻³). After replacing the hourly PM2.5 estimates for those AOD pixels with reflectance-based estimates, we recalculate the annual PM2.5 for the AOD-based model and the corresponding PM2.5 difference between the two models, as shown in Fig. 6(b). This PM2.5 difference is due only to the sample difference, and it is almost unchanged compared to Fig. 6(a), which indicates that the sampling effect alone can explain most of the annual PM2.5 difference between the two models. It also suggests that the choice between TOA reflectance and AOD to represent aerosol information has only a small influence on the annual PM2.5 estimate.
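The substitution test can be illustrated for a single pixel. This is a minimal sketch with hypothetical values and names: the reflectance-based estimates are averaged once over all of their samples and once over only the hours in which an AOD retrieval exists, so that the difference between the two means isolates the effect of sampling.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sampling_effect(ref_estimates, aod_sample_keys):
    """Compare the mean of all reflectance-based estimates with the mean of the
    same estimates restricted to hours that also have an AOD retrieval.
    Assumes at least one matched hour."""
    full_mean = mean(ref_estimates.values())
    matched = [v for key, v in ref_estimates.items() if key in aod_sample_keys]
    matched_mean = mean(matched)
    return full_mean, matched_mean, full_mean - matched_mean


# Hypothetical hourly reflectance-based PM2.5 (ug m-3) for one pixel;
# AOD retrievals exist only for the two cleanest hours
ref_estimates = {"h1": 40.0, "h2": 60.0, "h3": 120.0, "h4": 100.0}
aod_sample_keys = {"h1", "h2"}
print(sampling_effect(ref_estimates, aod_sample_keys))
```

Because the same model supplies the values in both averages, any remaining difference cannot come from the choice of predictor and must come from which hours were sampled.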
The spatial distributions of sampling difference between two models are shown in Fig. 7. The number of pixels for reflectance-based model in YRD is more reliable since its spatial distribution and seasonal change (left panels in Fig. 7) show opposite patterns compared to that for cloud amount (Zhao et al., 2014). Comparing the number of pixels between the two models (right panels in Fig. 7), the southern YRD exhibits a relatively low difference, which may be due to more dark pixels for successfully retrieving AOD since a large proportion of forest exists in this region (Liu et al., 2003; Song et al., 2018). The annual difference of the number of pixels between the two models (Fig. 7(b)) mainly arises from the winter contribution (Fig. 7(j)); this sampling difference in winter also results in a relatively high discrepancy in magnitude of PM2.5 estimates between the two models (Fig. 4). The low number of AOD pixels in winter is partly because AOD may fail to retrieve or may be unreliable in the presence of extreme conditions, such as haze and fog events (He et al., 2010; Chen et al., 2017). The counts for AOD and reflectance observations are also low in autumn. This is partly because aerosols are likely misclassified as clouds under the heavy pollution due to extensive crop residue burning in autumn in YRD (Brennan et al., 2005; Zhu et al., 2012).
Fig. 6. Spatial distribution of (a) the difference of annual mean PM2.5 concentrations in YRD between reflectance-based and AOD-based model. Panel (b) is the same as panel (a), but PM2.5 estimates for these AOD samples are replaced by reflectance-based estimates.
We further compare the number of pixels for each hour between the reflectance-based and AOD-based models. As shown in Fig. 8, the number of pixels from the reflectance-based model is much higher than that from the AOD-based model, irrespective of hour or season. The discrepancy in the number of pixels between the two models is largest in winter. Interestingly, for the AOD-based model, the diurnal cycle of the number of pixels is consistent with that of regional mean PM2.5 (Fig. 8(b) vs. Fig. 4(b)). The reason for this consistency is still unclear and warrants further investigation.
Fig. 7. Spatial distributions of (left panels) annual and seasonal number of successful PM2.5 retrievals from reflectance-based model. Right panels show the difference between reflectance-based and AOD-based models.
Even though the number of reflectance-based PM2.5 estimates is relatively high, does the statistical distribution of these estimates represent the real regional distribution obtained from ground-based measurements? To answer this question, we first investigate the sampling effect on the probability density function (PDF) of hourly PM2.5 concentrations using ground-based measurements. Specifically, ground-based hourly PM2.5 samples are grouped into three subsets: all daytime samples, samples collocated with satellite AOD observations, and samples collocated with satellite reflectance observations. Compared to the PDF from all samples, Figs. 9(a) and 9(b) show that the PM2.5 PDF from samples collocated with AOD shifts to lower PM2.5 values, especially in the afternoon. This is likely because AOD retrievals may fail or be unreliable under extreme conditions (e.g., haze and fog) (He et al., 2010; Chen et al., 2017), which generally correspond to high PM2.5 values. Although beyond the scope of this study, it would be highly interesting to explore the representativeness of AHI AOD products under different air quality conditions in the future. Fig. 9(b) shows that the PM2.5 PDF from samples collocated with reflectance is very similar to that from all surface-measured samples in the afternoon. Furthermore, when we use PM2.5 predictions, the PDF of PM2.5 estimates from the AOD-based model shifts further to lower values (Figs. 9(c) and 9(d)) because the model generally underestimates PM2.5 (Fig. 1), whereas the PM2.5 PDF from the reflectance-based model is generally close to that from surface measurements. Overall, the reflectance-based model is a reliable way to capture the regional PDF of PM2.5 concentrations in YRD.
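The PDF comparison can be sketched with a normalized histogram. The snippet below is an illustration under stated assumptions: the samples, the bin edges, and the way the collocated subset is chosen (all samples below 100 µg m⁻³) are hypothetical and only mimic a retrieval that tends to miss high-pollution hours.

```python
def histogram_density(values, bin_edges):
    """Empirical PDF: counts per bin divided by (sample size x bin width)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            # Bins are half-open [lo, hi), except the last, which includes its upper edge
            if bin_edges[b] <= v < bin_edges[b + 1] or (is_last and v == bin_edges[-1]):
                counts[b] += 1
                break
    n = sum(counts)
    if n == 0:
        raise ValueError("no samples fall inside the bin range")
    return [c / (n * (bin_edges[b + 1] - bin_edges[b])) for b, c in enumerate(counts)]


# Hypothetical hourly PM2.5 samples (ug m-3) and a subset that misses the polluted hours
all_samples = [10, 20, 30, 40, 60, 90, 130, 160]
collocated = [v for v in all_samples if v < 100]
edges = [0, 50, 100, 150, 200]

print(histogram_density(all_samples, edges))
print(histogram_density(collocated, edges))
```

Dropping the high values moves probability mass into the lowest bins, which is the kind of shift toward lower PM2.5 described for the AOD-collocated samples.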

Discussion
Many factors affect the relationship between TOA satellite observations (e.g., AOD or reflectance) and surface PM2.5. The most important one is usually the vertical distribution of aerosols (van Donkelaar et al., 2006), which can be provided by lidar retrievals and then used to adjust the relationship (van Donkelaar et al., 2013; Gong et al., 2017). However, space-based lidar (i.e., the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP)) has very limited horizontal coverage because of CALIOP's near-zero swath and 16-day repeat cycle (Omar et al., 2009). In spite of this limitation, Toth et al. (2019) found that CALIOP has some representative skill in estimating PM2.5 within a few hundred kilometers of an observation over the United States (US). Because of the complexity and severity of PM2.5 pollution in YRD (Hu et al., 2014), the spatial representativeness of an observation site there is relatively low compared to the US; thus, whether PM2.5 with broad spatial coverage can be reasonably derived over YRD from CALIOP measurements needs further examination. Additionally, including more factors does not always improve the performance of a PM2.5 retrieval model (Hu et al., 2017). For instance, although not shown here, the performance of the AOD-based model worsens after adding the Ångström coefficient, an indicator of aerosol particle size, as an input predictor (Yoshida et al., 2018). This counterintuitive result may be due to the low retrieval accuracy of this parameter over land (Wei et al., 2019b). Furthermore, following the method of Bi et al. (2019), we expand the study area to cover YRD and its surrounding cities; the model performance based on more ground PM2.5 stations changes little (not shown), which suggests that the number of ground PM2.5 stations is sufficient to retrieve PM2.5 for the YRD region.
The model prediction accuracy varies with season. For the reflectance-based model with the RF method, the CV R² value is highest in winter (0.75), followed by fall (0.69) and spring (0.60), and lowest in summer (0.52). By contrast, the highest and lowest RMSE values are found in winter and summer, with values of 22.19 and 11.63 µg m⁻³, respectively. The high RMSE in winter is partly because PM2.5 concentration is highest in this season (Fig. 4). Additionally, we analyze variable importance for the RF method (not shown); the five most important variables are T, TCW, Day, Month and V10, and all meteorological parameters are among the ten most important variables. These results indicate that meteorological parameters are more important predictors than satellite parameters for predicting hourly PM2.5. Previous satellite studies mainly used AOD as the main input predictor to estimate PM2.5 concentrations (Shin et al., 2019 and references therein), and they mainly focused on uncertainties associated with AOD, such as differences in AOD among instruments and product versions. However, our results show that the AOD-based model clearly underestimates PM2.5 at high values in YRD (Fig. 1(b)); even PM2.5 estimates from a model with ideal performance would still fail to capture the regional PM2.5 PDF (Figs. 9(a) and 9(b)). This underestimation is closely related to the sampling issue, which is also confirmed by Ford and Heald (2016). In contrast, the reflectance-based model obtains a relatively large sample size and improves the prediction performance. PM2.5 estimates from this model, however, are still unavailable under cloudy conditions and during nighttime. To estimate full-coverage PM2.5, it is likely feasible to combine satellite observations and aerosol reanalysis data (Xiao et al., 2017; Bi et al., 2019), which we plan to do in the future. 
Additionally, to obtain more statistically robust results, we also plan to expand our analysis to additional years.
Measurements from polar-orbiting platforms have been widely used to estimate satellite-based PM2.5 (e.g., Ma et al., 2016; Li et al., 2018; Bai et al., 2019). These estimates, however, are retrieved at a fixed observation time and thus have a low temporal resolution (~ days), so they cannot provide detailed information for daily monitoring and tracking of air quality. In contrast, geostationary satellite sensors provide measurements for a specific area with a high temporal resolution, at intervals of minutes or hours, which can be very helpful for monitoring areas with a large variation over time, such as the YRD area in Fig. 4. Furthermore, these geostationary measurements offer a good opportunity to advance health risk assessments for epidemiological studies.

CONCLUSIONS
In this study, aerosol optical depth (AOD)-based and reflectance-based models are used to estimate ground-level PM2.5 in the Yangtze River Delta (YRD), and the performance of these two models is compared using four machine learning algorithms, i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost), Gradient Boosting Regression (GBR) and Support Vector Regression (SVR). Additionally, we analyze the spatio-temporal variation of PM2.5 in YRD and the population exposure. Finally, the sampling effect is examined to explain the discrepancy in PM2.5 estimates between the AOD-based and reflectance-based models.
Results from the AOD-based and reflectance-based models both show a clear diurnal cycle of regional mean PM2.5 in YRD in the daytime: hourly PM2.5 increases slightly in the early morning (8:00-10:00) and then decreases continuously toward evening, in terms of both annual mean and seasonal mean values. PM2.5 estimates from the reflectance-based model capture well the spatial distribution of annual mean PM2.5 from surface observations, showing higher PM2.5 over the northern region and lower values over the southern region. Our results show that the regional population-weighted annual average PM2.5 concentration is 54 µg m⁻³ using reflectance-based estimates, and that almost the entire population lives in non-attainment areas.
The results presented here show that the discrepancy in annual PM2.5 between AOD-based and reflectance-based models is mainly due to the sampling difference, especially over northern YRD in winter. Additionally, by comparing the regional probability density function (PDF) of hourly PM2.5 concentrations from ground, AOD-based and reflectance-based estimates, the PDF of PM2.5 from AOD-based model obviously shifts to lower values and produces a large bias, whereas reflectance-based model can provide a more reliable PM2.5 PDF. This suggests that PM2.5 estimates from reflectance-based model are valuable for evaluating statistical distribution of PM2.5 and further for constraining health impact assessments.