Compact Algorithms for Predicting Atmospheric Visibility Using PM2.5, Relative Humidity and NO2

Visibility is a key parameter of the atmospheric environment that has attracted increasing public attention. Despite its importance, the literature contains very few descriptions of methods for predicting visibility from widely available information. In this paper, we derive and evaluate two compact algorithms (Models I and II) for estimating and predicting visibility using records of PM2.5, relative humidity (RH) and NO2 from 16 cities around the world. Models I and II are simplified algorithms derived from Pitchford's algorithm. Our analysis shows that Model I is more consistent with the observations and can accurately predict changes in visibility. In a separate part of the study, the two algorithms are trained using data sets from individual cities. Better results are obtained when the models are trained with the data from London, Sydney and the Chinese mainland cities. Model II displays broader applicability when it is trained using a single city's data set. This study indicates that atmospheric visibility can be well quantified based on measurements of PM2.5, RH and NO2 concentrations.


INTRODUCTION
Atmospheric visibility is closely related to daily life. Low visibility can lead to traffic accidents, flight delays and visual impairment, and it has attracted increasing public attention. Visibility is easily measured using laser radar, the photograph processing method and the aerosol sampling method (Luo et al., 2005), but theoretical algorithms for quantifying and predicting visibility have rarely received attention. Increased awareness of the negative impacts of low visibility on daily human activity has motivated the international community to develop new tools for visibility prediction.
Using Koschmieder's formula (Larson and Cass, 1989; Che et al., 2006), σext = −ln 0.02/V, horizontal visibility (V, km) can be calculated from the atmospheric extinction coefficient (σext, km⁻¹). Theoretical descriptions of σext have been derived by many authors. A simple algorithm for estimating σext from measured species concentrations was developed by Malm (1994). However, the accuracy and precision of that algorithm were very low. Later, based on a continuous monitoring campaign at 160 sites through the Interagency Monitoring of Protected Visual Environments (IMPROVE) particle monitoring network, a new algorithm for estimating light extinction was developed by Pitchford et al. (2007). The algorithm includes 15 variables (such as small sulfate, large sulfate, small nitrate, large nitrate, small organic mass, large organic mass, soil dust and sea salt). The Pitchford model is more consistent with the atmospheric aerosol literature and reduces bias at the extremes of high and low light extinction. The improved performance of this model demonstrates that visibility can be predicted from monitoring data on meteorological conditions and airborne pollutants. However, the large number of variables in the model limits its applicability.
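The conversion between visibility and the extinction coefficient described above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this example; the 0.02 contrast threshold is the one quoted in the text.

```python
import math

# Contrast threshold used in the form of Koschmieder's formula quoted in the text.
CONTRAST_THRESHOLD = 0.02


def extinction_from_visibility(visibility_km: float) -> float:
    """Return sigma_ext (km^-1) for a horizontal visibility V (km)."""
    if visibility_km <= 0:
        raise ValueError("visibility must be positive")
    return -math.log(CONTRAST_THRESHOLD) / visibility_km


def visibility_from_extinction(sigma_ext: float) -> float:
    """Return the horizontal visibility V (km) for sigma_ext (km^-1)."""
    if sigma_ext <= 0:
        raise ValueError("extinction coefficient must be positive")
    return -math.log(CONTRAST_THRESHOLD) / sigma_ext
```

Since −ln 0.02 is about 3.912, a visibility of 10 km corresponds to an extinction coefficient of roughly 0.39 km⁻¹.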
In this paper, we present two compact algorithms (Model I and Model II) to predict visibility based on data for PM2.5, relative humidity and NO2 concentration in 16 cities around the world. Because PM2.5, relative humidity and NO2 are easy to measure and predict, visibility can in turn be predicted from these three quantities using the constructed algorithms. This is of great significance for reducing the cost of measuring visibility in regions where only conventional pollutants and meteorological parameters are monitored.

Model Assumption
Koschmieder (Che et al., 2006) established an algebraic relationship between visibility and the light extinction coefficient (σext), which can be described as σext = −ln 0.02/V = σsp + σap + σsg + σag. The atmospheric light extinction coefficient is the sum of the particle scattering coefficient (σsp), particle absorption coefficient (σap), Rayleigh scattering coefficient (σsg) and gas absorption coefficient (σag). Light scattering and absorption by particles are the main cause of visibility degradation; they are directly affected by meteorological factors and airborne contaminants such as nitrate, sulfate, EC, OC, ammonium salt and secondary organic aerosol (Li et al., 2018; Yu et al., 2019). The contribution of particles to the atmospheric extinction coefficient exceeds 95% (Cao et al., 2012). Rayleigh scattering refers to the scattering of light by air molecules, and it depends on the density of the atmosphere. It is generally assumed to be a constant value of 0.01 km⁻¹ at sea level (Watson et al., 2002). Gas absorption is mainly contributed by NO2, which has a small effect on the light extinction coefficient. Pitchford et al. (2007) developed a precise algorithm (Eq. (1)) for calculating the light extinction caused by different processes: −ln 0.02/V = 2.2f … (Taylor and McLennan, 1985; Cao et al., 2005, 2012). The large and small fractions of species such as sulfate, nitrate and OM indicate different formation processes through dry and aqueous mechanisms (John et al., 1990), and they are calculated according to Cao et al. (2012). Component concentrations shown in brackets are in µg m⁻³. Despite the complexity of the parameters and terms of this equation, it rests on two basic assumptions: (1) the total extinction effect of the atmosphere can be approximated as a combination of the extinction effects of the chemical components in dry air and their hygroscopic growth; (2) the extinction effect of each component follows the Beer-Lambert law. However, the algorithm requires many parameters, which hampers its applicability. Therefore, the main purpose of this paper is to simplify Pitchford's algorithm. In Pitchford's algorithm, the light extinction coefficient is the sum of the particle scattering/absorption coefficient, the Rayleigh scattering coefficient and the gas absorption coefficient. Sulfate, nitrate, organic mass, elemental carbon, sea salt and coarse mass are the main hygroscopic aerosol components contributing to the particle scattering/absorption coefficient, and the contribution of each species is calculated separately. In the simplified algorithms, these components, which make up a portion of PM2.5, are replaced by the PM2.5 mass concentration. The water growth function f(RH) is replaced by the aerosol humidification factor (1 − RH/100)^b. In Model I, PM2.5 is directly multiplied by (1 − RH/100)^b. Regression analysis between PM2.5 and visibility has indicated that exponential equations can describe their algebraic relationship (Chan et al., 1999; Zhang et al., 2019), so in Model II PM2.5 is transformed exponentially as exp(PM2.5/e2). Rayleigh scattering is a site-specific parameter that contributes less than 2% to the value of the extinction coefficient (Cao et al., 2012); therefore, this term is omitted. The last term, ρ(NO2), represents the gas absorption effect and is retained in the simplified algorithms. Based on these assumptions, two models were constructed and are presented in Eqs.
(2) and (3):

$$\text{Model I:}\quad -\frac{\ln 0.02}{V} = a_1\,\mathrm{PM}_{2.5}\left(1-\frac{\mathrm{RH}}{100}\right)^{b_1} + c_1\,\rho(\mathrm{NO_2}) + d_1 \tag{2}$$

$$\text{Model II:}\quad -\frac{\ln 0.02}{V} = a_2\,\exp\!\left(\frac{\mathrm{PM}_{2.5}}{e_2}\right)\left(1-\frac{\mathrm{RH}}{100}\right)^{b_2} + c_2\,\rho(\mathrm{NO_2}) + d_2 \tag{3}$$

where PM2.5 represents the mass concentration of fine particles (in µg m⁻³); ρ(NO2) represents the atmospheric mass concentration of NO2 (in µg m⁻³); a1 and a2 are the coefficients of the PM2.5 term; b1 and b2 are the coefficients (exponents) of the humidification factor; c1 and c2 are the coefficients of NO2; e2 is the correction factor of PM2.5; and d1 and d2 are the error terms. The unknown coefficients of Model I (a1, b1, c1 and d1) and Model II (a2, b2, c2, d2 and e2) are determined using iterative regression to fit the data from the 16 cities.
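The structure of the two models can be sketched in Python as follows. The functional forms are an assumption based on the verbal description in this section: PM2.5 (Model I) or exp(PM2.5/e2) (Model II) multiplied by the humidification factor, plus a NO2 term and an error term. All function names are hypothetical, and any coefficient values passed in are placeholders rather than the fitted values reported in Table 3.

```python
import math


def humidification_factor(rh: float, b: float) -> float:
    """Return (1 - RH/100)**b for a relative humidity given in percent."""
    if not 0.0 <= rh < 100.0:
        raise ValueError("RH must be in the range [0, 100)")
    return (1.0 - rh / 100.0) ** b


def model_one_extinction(pm25, rh, no2, a1, b1, c1, d1):
    """Assumed Model I form: PM2.5 multiplied directly by the humidification factor."""
    return a1 * pm25 * humidification_factor(rh, b1) + c1 * no2 + d1


def model_two_extinction(pm25, rh, no2, a2, b2, c2, d2, e2):
    """Assumed Model II form: PM2.5 enters through exp(PM2.5 / e2)."""
    return a2 * math.exp(pm25 / e2) * humidification_factor(rh, b2) + c2 * no2 + d2


def visibility_km(sigma_ext: float) -> float:
    """Convert an extinction coefficient (km^-1) to visibility via Koschmieder's formula."""
    if sigma_ext <= 0:
        raise ValueError("extinction coefficient must be positive")
    return -math.log(0.02) / sigma_ext
```

With any positive PM2.5 coefficient, raising PM2.5 at fixed RH and NO2 increases the extinction coefficient and therefore lowers the predicted visibility.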

Data Collection and Data Processing
To derive a robust compact model, data for visibility, PM2.5, relative humidity and NO2 concentrations from 16 large cities around the world (Beijing, Guangzhou, Hangzhou, Ningbo, Xiamen, Shijiazhuang, Chongqing, Shanghai, Pinzhen, Xinbei, London, Sacramento, Toronto, New York, Coyhaique and Sydney) were downloaded from the Center of National Ministry of Environmental Protection of China (MEPC) (http://datacenter.mep.gov.cn/) and the OpenAQ air quality database (https://openaq.org). These data were processed by standardized procedures (Doyle et al., 2002; Che et al., 2008; Ratto et al., 2012): (1) all variable units were converted into standard units; (2) outlying data points were removed before regression; (3) visibility values were required to be greater than 0.3 km and less than the 99th percentile of the data; (4) data obtained during abnormal weather episodes such as precipitation, mist and dust storms were excluded. After the screening process, a total of 10,107 data points were retained for regression.
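Screening steps (3) and (4) can be sketched as below. The record layout (a list of dictionaries with `visibility_km` and `weather` keys), the weather labels, and the function name are hypothetical; the paper does not state whether the 99th percentile is computed before or after the other filters, so this sketch assumes it is computed on the unfiltered visibility values. The outlier rule of step (2) is not specified in the text and is therefore not implemented.

```python
import statistics

# Hypothetical labels for the abnormal weather episodes named in the text.
EXCLUDED_WEATHER = {"precipitation", "mist", "dust storm"}


def screen_records(records, min_visibility_km=0.3, upper_percentile=99):
    """Keep records that pass the visibility-range and weather filters."""
    visibilities = [r["visibility_km"] for r in records]
    # statistics.quantiles with n=100 returns the 1st to 99th percentile cut points.
    cut_points = statistics.quantiles(visibilities, n=100, method="inclusive")
    upper_limit = cut_points[upper_percentile - 1]
    return [
        r for r in records
        if min_visibility_km < r["visibility_km"] < upper_limit
        and r.get("weather") not in EXCLUDED_WEATHER
    ]
```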

Model Regression
The original data were divided into two parts: the main part of the data set was used for model simulation and the remainder was used to validate the model. First, the unknown coefficients of Model I and Model II were obtained through iterative regression with the data from all 16 cities. The regression analyses were performed using the 1stOpt1.5 software package (Liu et al., 2018; Marques et al., 2019). 1stOpt1.5 is an automated optimization and regression software package developed by 7D Soft High Technology Inc. (China). When an equation with undetermined coefficients is input to the program, the software automatically performs iterative regression until a converged solution is obtained, thereby determining the best coefficient values. After the model parameters were determined, sensitivity analysis of the atmospheric visibility estimation was conducted using the Monte Carlo random simulation method.
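The idea of fitting the undetermined coefficients can be illustrated without the 1stOpt software. The sketch below is not the optimization method used in this study; it substitutes a simple grid search over the humidification exponent b combined with ordinary linear least squares for the remaining coefficients, and it assumes the Model I form a·PM2.5·(1 − RH/100)^b + c·ρ(NO2) + d. All function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_model_one(pm25, rh, no2, sigma_ext, b_grid):
    """Return the coefficients with the smallest residual sum of squares over b_grid."""
    best = None
    for b in b_grid:
        # For a fixed exponent b the assumed model is linear in (a, c, d).
        features = [[p * (1.0 - r / 100.0) ** b, n, 1.0]
                    for p, r, n in zip(pm25, rh, no2)]
        normal = [[sum(f[i] * f[j] for f in features) for j in range(3)]
                  for i in range(3)]
        moment = [sum(f[i] * y for f, y in zip(features, sigma_ext))
                  for i in range(3)]
        a, c, d = solve_linear_system(normal, moment)
        rss = sum((a * f[0] + c * f[1] + d - y) ** 2
                  for f, y in zip(features, sigma_ext))
        if best is None or rss < best["rss"]:
            best = {"a": a, "b": b, "c": c, "d": d, "rss": rss}
    return best
```

Here `sigma_ext` is the observed extinction coefficient, −ln 0.02/V, computed from the measured visibility. A finer grid or a gradient-based optimizer would be needed for a production fit; the grid search is used here only because it keeps the example short and dependency-free.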
To compare the two algorithms, the values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were calculated using Origin 2017. In statistics, the AIC is an estimator of the relative quality of statistical models for a given set of data. It is based on the likelihood function. Given a collection of models for the data, the AIC estimates the quality of each model relative to each of the other models and thus provides a means for model selection. Similarly, the BIC is a criterion for model selection among a set of models. The model with the lowest AIC and BIC is preferred (Akaike et al., 1998; McDonald et al., 2016). The formulas for AIC and BIC are shown in Eqs. (4) and (5) (Vrieze, 2012):

$$k_{\mathrm{AIC}} = -2\,l(\hat{\tau} \mid Y) + 2\kappa \tag{4}$$

$$k_{\mathrm{BIC}} = -2\,l(\hat{\tau} \mid Y) + \kappa \ln N \tag{5}$$

where k_AIC and k_BIC represent the AIC and BIC values (unitless), respectively; κ represents the number of estimated model parameters (unitless); τ̂ represents the optimized model parameters; l(τ̂ | Y) is the log of the likelihood of τ̂ given the data Y; and N represents the number of observations (unitless).
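As a rough illustration of how these criteria can be obtained from a fitted model, the sketch below computes AIC and BIC in their standard forms (−2 times the log-likelihood plus 2κ, or plus κ ln N). The Gaussian log-likelihood derived from the residual sum of squares is an assumption made for this example; Origin 2017 may use a different convention, so the numbers need not match those reported in this study. Function names are hypothetical.

```python
import math


def gaussian_log_likelihood(rss: float, n_obs: int) -> float:
    """Maximized log-likelihood assuming independent Gaussian residuals."""
    return -0.5 * n_obs * (math.log(2.0 * math.pi) + math.log(rss / n_obs) + 1.0)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 l + 2 kappa."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 l + kappa ln N."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

Because ln N exceeds 2 once N is larger than about 7, the BIC penalizes each additional parameter more heavily than the AIC for data sets of the size used here.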

Visibility Distribution
The aerosol contribution to "global dimming" was first reported as a strong decrease in visibility up to the mid-1980s. Since that time, visibility has increased over Europe, consistent with the reported European "brightening", but has decreased substantially over south and east Asia, South America, Australia and Africa, resulting in net global dimming over land (Wang et al., 2009; Vautard et al., 2009).
Statistical summaries of the atmospheric visibility, PM2.5, RH and ρ(NO2) are shown in Table 1. During the study period, low visibility (less than 10 km) was observed in Hangzhou (7.86 km), Guangzhou (8.89 km), Chongqing (7.27 km), Shanghai (7.12 km) and Xinbei (6.16 km), together with high PM2.5 levels (36.39-58.73 µg m⁻³). It is well known that PM2.5 is the main contributor to visibility degradation. The visibility in Beijing (11.0 km), Shijiazhuang (10.7 km) and Coyhaique (12.2 km) was above 10 km, even though these cities exhibited higher PM2.5 concentrations than the low-visibility cities. This indicates that visibility is also affected by other conditions. Visibility varies with the degree of air pollution. Most Chinese cities are facing a downward trend in visibility due to high PM2.5 levels (Che et al., 2008; Molnar et al., 2008). Some cities, such as Pinzhen, Sacramento, New York and Sydney, exhibited relatively good visibility (above 15 km); their mean concentrations of PM2.5 (7.33-16.14 µg m⁻³) and ρ(NO2) (7.2-23.9 µg m⁻³) were also very low. In London, a relatively high NO2 concentration of 88.9 µg m⁻³ was observed, owing to vehicle emissions that include a greater fraction of diesel engines. However, NO2 has a very small effect on visibility; the PM2.5 concentration of 15.1 µg m⁻³ was low, and the visibility was also above 15 km.

Model Simulation with Combined Data
The coefficients of the two models were determined from the combined data of all 16 cities using the Levenberg-Marquardt and Universal Global Optimization methods (convergence criterion: 1.00E-10; maximum number of iterations: 1000; repeat number: 30; control iterations number: 20). The running codes for each model are shown in Table 2. Model I and Model II achieved convergence after 17 and 40 iterations, respectively. The obtained equations are listed in Table 3. The adjusted R² of Model I (0.58) is slightly higher than that of Model II (0.57), and its AIC value of 26,909 and BIC value of 26,945 are smaller than the corresponding values for Model II. This preliminary inspection indicates that Model I is the better algorithm for visibility prediction.
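For readers who wish to reproduce the goodness-of-fit metric, a minimal sketch of the adjusted R² calculation is given below. It uses the common definition 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors; whether the software used in this study counts predictors or all fitted coefficients is not stated, so the choice of p is an assumption. Function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(observed, predicted, n_predictors):
    """Adjusted R^2, penalizing the number of predictors."""
    n = len(observed)
    if n - n_predictors - 1 <= 0:
        raise ValueError("not enough observations for this many predictors")
    r2 = r_squared(observed, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

With roughly ten thousand observations and three predictors, the adjustment changes R² only marginally, so the difference between R² and adjusted R² matters mainly for the cities with small data sets.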
Sensitivity analysis of the atmospheric visibility estimation was conducted using the Monte Carlo random simulation method (Fig. 1). It shows that the two most significant contributors, PM2.5 and RH, together account for 97-98% of the variance of visibility. In particular, the contributions of PM2.5 to the variance of visibility were 72% and 66% in the Model I and Model II predictions, respectively. In contrast, the concentration of NO2 showed a negligible influence on the variance of visibility, with uncertainty contributions of 2% and 3% from Model I and Model II, respectively. The sensitivity analysis indicates that decreasing the PM2.5 concentration and RH can significantly improve atmospheric visibility.
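One common way to obtain such contribution-to-variance percentages is to sample the inputs at random, evaluate the model, and normalize the squared correlation between each input and the output. The sketch below follows that approach for the assumed Model I form; it is not necessarily the procedure used in this study. The coefficient values and input ranges are placeholders chosen only for illustration, and the function names are hypothetical.

```python
import math
import random


def model_one_visibility(pm25, rh, no2, a, b, c, d):
    """Visibility (km) from the assumed Model I form of the extinction coefficient."""
    sigma_ext = a * pm25 * (1.0 - rh / 100.0) ** b + c * no2 + d
    return -math.log(0.02) / sigma_ext


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def variance_shares(n_samples=5000, seed=0):
    """Return the percentage share of each input, from normalized squared correlations."""
    rng = random.Random(seed)
    # Placeholder coefficients, NOT the fitted values of this study.
    coeffs = {"a": 0.01, "b": -0.5, "c": 0.0005, "d": 0.05}
    inputs = {"pm25": [], "rh": [], "no2": []}
    visibility = []
    for _ in range(n_samples):
        # Placeholder sampling ranges for PM2.5 (ug m^-3), RH (%) and NO2 (ug m^-3).
        pm25 = rng.uniform(5.0, 150.0)
        rh = rng.uniform(20.0, 90.0)
        no2 = rng.uniform(5.0, 100.0)
        inputs["pm25"].append(pm25)
        inputs["rh"].append(rh)
        inputs["no2"].append(no2)
        visibility.append(model_one_visibility(pm25, rh, no2, **coeffs))
    squared = {name: pearson(values, visibility) ** 2
               for name, values in inputs.items()}
    total = sum(squared.values())
    return {name: 100.0 * value / total for name, value in squared.items()}
```

The resulting shares depend strongly on the sampling ranges and coefficients, so they should be read as a demonstration of the method rather than a reproduction of Fig. 1.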
Models I and II are four-dimensional functions, so it is hard to draw the function surface directly. However, NO2 has a limited influence on visibility; the main contributors to visibility degradation are PM2.5 components (Tao et al., 2007). For the purpose of discussion, ρ(NO2) was omitted and the models become three-dimensional. The three-dimensional surfaces of Models I and II are shown in Fig. 2. The surfaces extend upwards, with visibility increasing as PM2.5 and humidity decrease. The Model I surface lies in the middle of the scattered data points, which indicates a satisfactory simulation result. However, it should be noted that the uncertainty contribution of ρ(NO2) from Model II (3%) is slightly higher than that from Model I (2%). Without considering the contribution of ρ(NO2), the surface of Model II deviates slightly more from the dense area of data points than the surface of Model I. Both Model I and Model II give good predictions for visibilities less than 20 km; large deviations appear when visibility is higher than 20 km. The models derived from the combined data of 16 cities therefore have a certain generality, especially given the different atmospheric environments represented within the data set. The predictive models obtained with the combined data were also evaluated against the single-city data, and the result (Table S1) showed that the adjusted R² values of the two models in the Chinese mainland cities (Beijing, Guangzhou, Hangzhou, Ningbo, Xiamen, Shijiazhuang, Chongqing, Shanghai) and London are higher than those obtained in other cities. The prediction results in most cities are satisfactory, which suggests the adaptability of these two models.
Table 2 footnotes: a The 1st column is visibility. b The 2nd column is relative humidity. c The 3rd column is PM2.5. d The 4th column is ρ(NO2). V: visibility, km; R: relative humidity, %; P: PM2.5, µg m⁻³; N: ρ(NO2), µg m⁻³.
Table 3. The functions of the two models in this study and that used in Pitchford et al. (2007). a The R² correlation coefficient of 0.86 is obtained from the reference using IMPROVE particle speciation data.
In comparison, the R² of the Pitchford algorithm (Table 3), 0.86, which was obtained from the IMPROVE particle speciation data, is higher than the values obtained in this study. The Pitchford algorithm is a theoretical prediction of visibility with 15 entangled variables, which may over-describe the system (Kelly et al., 2013). In contrast, the algorithms derived in this study have only 3 variables (PM2.5, RH and ρ(NO2)); their advantage is that the variables are fewer and easier to obtain. The estimated visibility value is acceptable for many purposes.

Simulation with Single-city Data
The equations of Models I and II were also derived from single-city data using the 1stOpt software, following the same protocol. The values of the coefficients of the two models for each city are shown in Tables S2 and S3. The adjusted R² values (Table 4) of the two models in Beijing, Guangzhou, Hangzhou, Ningbo, Xiamen, Shijiazhuang, Chongqing, Shanghai and London are higher than those in other cities (Model I: 0.62-0.86; Model II: 0.65-0.87; p < 0.01). These results are comparable to those obtained by Pitchford et al. (2007). Compared with Model I, Model II has a higher R² in many cities, such as Guangzhou, Xiamen, Coyhaique, Xinbei, Pinzhen, Sacramento, Toronto, New York and Sydney. This indicates that Model II has broader applicability when it is simulated with a single-city data set. Both Model I and Model II exhibited lower adjusted R² values (0.19-0.41) in Coyhaique, Xinbei, Pinzhen, Sacramento, Toronto and New York, so their visibility predictions for these cities are not very accurate. The visibility in these cities might be related to other factors (e.g., unique aerosol composition and the fraction of PM2.5 in PM10) in addition to PM2.5, relative humidity and NO2 concentration (Park et al., 2018). In addition, the two models derived from the single-city data show higher adjusted R² values than those obtained with the combined data.
Comparisons between model-predicted and observed visibility for these cities are shown in Fig. 3. The results for London and the mainland China cities were better than those for other cities. Xinbei and Pinzhen are located in Taiwan, near mainland China, but their predicted results were not satisfactory. The predictions for Sacramento, Coyhaique and Toronto show large deviations due to the small data volume.

CONCLUSIONS
This study proposes two algorithms that use PM2.5, RH and NO2 as independent variables for simulating visibility. According to the simulation based on the combined data of 16 cities, Model I exhibits slightly better applicability. Model II displays broader applicability when it is simulated using a single city's data set. The model predictions are satisfactory for Beijing, Guangzhou, Hangzhou, Ningbo, Xiamen, Shijiazhuang, Chongqing, Shanghai and London (Model I: R² = 0.62-0.86; Model II: R² = 0.65-0.87). Lower adjusted R² values (0.19-0.41) are obtained for Coyhaique, Xinbei, Pinzhen, Sacramento, Toronto and New York.
The simulation results confirm that flexible and simpler algorithms can generally produce reliable predictions based on measurements of PM2.5, RH and ρ(NO2). As far as we know, this is the first study to propose simplified algorithms with only 3 variables for visibility prediction. Some limitations in performance should be noted, partly because the amount of data available for some of the cities was insufficient. Increasing the volume of the data set is necessary to improve the models' adaptability and correlation in future work.

Fig. 1. Sensitivity analyses of PM2.5, RH and NO2 on atmospheric visibility predicted by Model I and Model II.

Fig. 2. Three-dimensional surface diagrams of Model I and II based on fits to the data from the 16 cities: (a) Model I, (b) counterclockwise rotation of the Model I surface by 180°, (c) Model II, (d) counterclockwise rotation of the Model II surface by 180°. Scatter points in different colors represent the observed data from different cities.
This work was supported by the National Natural Science Foundation of China (No. U1405235), Science and Technology Plan Project of Ningbo City (No. 2015C110001) and Natural Science Foundation of Ningbo City (No. 2015A610247).

Table 1. Site information and the values of atmospheric visibility, PM2.5

Table 2. The running codes for model regression by 1stOpt1.5 software.

Table 4. The correlation coefficients and RSS results of the two models based on city-specific data with observations for each city. RSS: residual sum of squares; N: number of data set samples.