Bu-Yo Kim¹, Joo Wan Cha¹, Ki-Ho Chang¹, Chulkyu Lee²

1 Research Applications Department, National Institute of Meteorological Sciences, Seogwipo, Jeju 63568, Korea
2 Observation Research Department, National Institute of Meteorological Sciences, Seogwipo, Jeju 63568, Korea


Received: March 10, 2022
Revised: July 19, 2022
Accepted: July 29, 2022

 Copyright The Author's institutions. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited. 


https://doi.org/10.4209/aaqr.220125



Cite this article:

Kim, B.Y., Cha, J.W., Chang, K.H., Lee, C. (2022). Estimation of the Visibility in Seoul, South Korea, Based on Particulate Matter and Weather Data, Using Machine-learning Algorithm. Aerosol Air Qual. Res. 22, 220125. https://doi.org/10.4209/aaqr.220125


HIGHLIGHTS

  • Visibility estimation using weather and PM data and machine-learning algorithms.
  • Comparison analysis of visibility estimated by each machine-learning algorithm.
  • Adoption of optimal visibility estimation algorithm and hyperparameter settings.
  • Comparison of the estimated and observed visibility.
 

ABSTRACT


Visibility is an important indicator of air quality and of the associated meteorological and climate changes. In this study, visibility in Seoul, the most polluted city in South Korea, was estimated using machine learning (ML) algorithms. The inputs were meteorological data (temperature, relative humidity, and precipitation) acquired from an automatic weather station and particulate matter data (PM10 and PM2.5) acquired from an air-quality measurement station. The estimated visibility was then compared with the observed visibility. Data observed at 1-h intervals between 2018 and 2020 were used. After training and validating each ML algorithm, the extreme gradient boosting (XGB) algorithm was found to be the most suitable for visibility estimation (bias = 0 km, root mean square error (RMSE) = 0.08 km, and r = 1 for the training dataset). Among the meteorological and particulate matter variables used to train the XGB algorithm, PM2.5 and relative humidity had the highest relative importance (51% and 19%, respectively), whereas precipitation and wind speed had low relative importance (approximately 1%). The estimation accuracy for the test dataset was good (bias = –0.11 km, RMSE = 2.08 km, and r = 0.94). Accuracy was higher in the dry season (bias = –0.06 km, RMSE = 1.79 km, and r = 0.96) than in the rainy season (bias = –0.17 km, RMSE = 2.34 km, and r = 0.91). The correlation obtained in this study was higher than those reported in previous visibility estimation studies. The proposed method enables accurate estimation of visibility in areas with poor visibility and can therefore support public health assessments in areas with poor air quality.


Keywords: Seoul, Meteorological data, PM10, PM2.5, Visibility estimation, Machine learning, Extreme gradient boosting algorithm


1 INTRODUCTION


Visibility is a measure of the distance at which an object or light source can be identified. It is defined as the distance at which the light intensity is reduced to 5% of its original value (WMO, 2014). Depending on precipitation and suspended atmospheric matter, visibility can drop to a few kilometers or less. Low visibility can cause economic losses through road, marine, and air traffic accidents, and it can negatively affect public health and property (Huang and Zhang, 2017; Wu et al., 2020). In particular, particulate matter (PM) released by urban energy use, industrial activities, and population growth degrades visibility. A 6–8 km decrease in visibility increases the mortality rate associated with heart disease and bronchitis by 2–3% (Ozer et al., 2007; Lee et al., 2015; Jeong et al., 2017). Changes in visibility are also associated with changes in meteorological parameters and in the climate in general (Peterson et al., 2019; Li et al., 2020; Zong et al., 2020). Visibility can therefore serve as an indicator of past, present, and future air quality improvements (Lee et al., 2014).
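The 5% threshold can be tied to the extinction coefficient σ of the air. The derivation below assumes Beer–Lambert attenuation along a uniform horizontal path, an idealization not stated in the text. Under that assumption, a short derivation gives a Koschmieder-type relation:

```latex
\frac{F(x)}{F_0} = e^{-\sigma x}, \qquad
\frac{F(\mathrm{VIS})}{F_0} = 0.05
\;\Longrightarrow\;
\mathrm{VIS} = \frac{\ln 20}{\sigma} \approx \frac{3.0}{\sigma}
```

For example, σ = 0.3 km⁻¹ gives VIS ≈ 10 km. Anything that raises σ shortens visibility in inverse proportion, such as more PM or hygroscopic growth of PM at high RH.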

Visibility was formerly observed manually by human observers, but most countries now use visibility sensors. Visibility sensors measure visibility with high precision and accuracy. However, economic and geographic constraints make it difficult to build a dense visibility observation network (Kim et al., 2021b). Numerical prediction models are therefore used to estimate or predict regional visibility and overcome these limitations. Fita et al. (2019) predicted visibility using the K94, RUC, and FRAM-L models and compared the predictions with observed visibility. Zong et al. (2020) analyzed the accuracy of visibility predicted using WRF-Chem. These numerical prediction models are suitable for calculating the spatiotemporal distribution of visibility, but their accuracy is low (Singh et al., 2018). Therefore, alongside numerical prediction models, newer methods are now being used that estimate or predict visibility from the correlation between observed visibility and meteorological variables (Bari, 2018; Fita et al., 2019).

Previous studies have shown that among atmospheric variables, PM, relative humidity (RH), and wind speed (WS) significantly affect visibility (Lee et al., 2015; Qu et al., 2015; Kim, 2019). However, visibility does not vary linearly with these parameters. The accuracy of correlation-based visibility estimates can also change significantly with meteorological conditions such as radiation, turbulence, microphysics, chemistry, and surface conditions (Won et al., 2020). Therefore, besides calculating visibility from linear (Du et al., 2013) or exponential (Ozer et al., 2007; Qu et al., 2015) relationships, researchers are actively estimating visibility with non-linear machine learning (ML) methods (Cornejo-Bueno et al., 2017; Cornejo-Bueno et al., 2020). These methods offer high computational speed and accuracy (Kim et al., 2021b).
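The sketch below makes the contrast concrete. It fits a single-predictor exponential baseline of the form VIS = a·exp(b·PM), the kind of fixed-form relationship that the ML methods replace. The functional form, the function names, and the synthetic data are assumptions for illustration. They are not the equations of Ozer et al. (2007) or Qu et al. (2015).

```python
import math

def fit_exponential(pm, vis):
    """Fit VIS = a * exp(b * PM) by ordinary least squares on ln(VIS).

    Hypothetical baseline for illustration; the cited studies may use
    other functional forms or fitting methods.
    """
    n = len(pm)
    logv = [math.log(v) for v in vis]  # visibility must be > 0
    mx = sum(pm) / n
    my = sum(logv) / n
    sxx = sum((x - mx) ** 2 for x in pm)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pm, logv))
    b = sxy / sxx                      # slope of ln(VIS) against PM
    a = math.exp(my - b * mx)          # back-transformed intercept
    return a, b

def predict_exponential(a, b, pm):
    return a * math.exp(b * pm)

# Synthetic example: visibility decaying with PM2.5 (made-up values).
pm25 = [5, 15, 30, 50, 80]
vis = [predict_exponential(25.0, -0.02, x) for x in pm25]
a, b = fit_exponential(pm25, vis)
print(round(a, 3), round(b, 4))
```

A single fixed curve like this cannot adapt when the same PM level coincides with very different RH or precipitation. That limitation motivates the non-linear ML approach.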

In this study, visibility in Seoul, the most polluted area in South Korea, was estimated from meteorological data (temperature, RH, and precipitation) and particulate matter data (PM10 and PM2.5) using ML algorithms. Seoul covers only 0.6% of the total area of South Korea but is home to approximately 18% of the national population. Its energy consumption for domestic and industrial applications is therefore higher than that of other cities (Lee et al., 2014). Besides local air pollution, Seoul experiences high concentrations of dust and air pollutants generated in deserts and cities in China and Mongolia and transported by prevailing weather patterns (Peterson et al., 2019). These conditions make Seoul an important site for research on air quality and data utilization (Yum and Cha, 2010; Lee et al., 2014; Kim and Lee, 2018). This study aimed to determine visibility in Seoul by adopting an ML algorithm optimized for visibility estimation from meteorological and particulate matter data. The proposed method allows visibility to be determined accurately without installing a visibility sensor.

 
2 METHODS


 
2.1 Research Data

To estimate visibility, meteorological and particulate matter data were collected at 1-h intervals from January 1, 2018 to December 31, 2020, at two stations in Seoul, South Korea:

  • an automatic weather station (AWS; station No. 108, 37.57°N, 126.97°E) of the Korea Meteorological Administration (KMA), and
  • an air-quality measurement station (AMS; station No. 111121, 37.56°N, 126.96°E) of the Ministry of Environment (MOE).

The straight-line distance between the two stations is approximately 1.1 km. PM10 and PM2.5 concentrations (µg m–3) were measured at the AMS using continuous ambient particulate matter monitors (FH62C14, Thermo Fisher Scientific Inc., USA; BAM 1020, Met One Instruments Inc., USA). Quality-controlled PM data were obtained from Air Korea (www.airkorea.or.kr) (MOE, 2021). Air temperature (Ta, °C), dew point temperature (Td, °C), atmospheric pressure (Pa, hPa), RH (%), wind direction (WD, °), WS (m s–1), and precipitation (mm h–1, accumulated over 1 h) were collected at the AWS. Visibility (km) measured by the automated synoptic observing system (ASOS) was used to evaluate the accuracy of the estimated visibility. KMA switched from manual to sensor-based visibility observation in the second half of 2017, after a pilot operation of automatic observation in 2017. To ensure a consistent and objective observation method, only data from the three years starting in 2018 were used. PM10, PM2.5, Ta, Td, Pa, RH, WD, WS, and precipitation data were used as input data for the ML algorithms. The observed visibility data were used to evaluate the accuracy of the estimated visibility. In previous studies (Jung et al., 2009; Thach et al., 2010; Wu et al., 2012; Guo et al., 2020), visibility was estimated only under dry conditions (RH below 60–70%) on days without precipitation. This excluded the deliquescence or hygroscopic growth of PM that occurs at high RH (Guo et al., 2020). In this study, however, all data except erroneous records were used, so visibility was estimated under all weather conditions.

The collected meteorological and particulate matter data were randomly partitioned, without replacement, into training, validation, and test datasets at a ratio of 5:3:2 of the entire dataset (Xiong et al., 2020; Kim et al., 2021a). Each algorithm was trained on the training dataset, and its optimal hyperparameters were selected using the validation dataset. The visibility estimated by each ML algorithm was evaluated against the observed visibility data. The ML algorithm with the highest estimation accuracy was then adopted. Its estimates for the test dataset were compared with the observed visibility and analyzed comprehensively. The estimated visibility (VISest) was compared with the visibility observed by the visibility sensor (VISobs) using three measures given in Eqs. (1–3): bias, root mean square error (RMSE), and correlation coefficient (r).

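$$\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{VIS}_{\mathrm{est},i} - \mathrm{VIS}_{\mathrm{obs},i}\right) \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{VIS}_{\mathrm{est},i} - \mathrm{VIS}_{\mathrm{obs},i}\right)^{2}} \tag{2}$$

$$r = \frac{\sum_{i=1}^{N}\left(\mathrm{VIS}_{\mathrm{est},i} - \overline{\mathrm{VIS}}_{\mathrm{est}}\right)\left(\mathrm{VIS}_{\mathrm{obs},i} - \overline{\mathrm{VIS}}_{\mathrm{obs}}\right)}{\sqrt{\sum_{i=1}^{N}\left(\mathrm{VIS}_{\mathrm{est},i} - \overline{\mathrm{VIS}}_{\mathrm{est}}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(\mathrm{VIS}_{\mathrm{obs},i} - \overline{\mathrm{VIS}}_{\mathrm{obs}}\right)^{2}}} \tag{3}$$

Here, N is the number of samples, and the overbars denote mean values.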
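The sketch below implements the 5:3:2 random partition and the three accuracy measures in plain Python. It assumes that bias is defined as estimate minus observation, so negative values indicate underestimation. The seed and the function names are hypothetical.

```python
import math
import random

def split_532(records, seed=42):
    """Randomly partition records, without replacement, into 50/30/20% subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(0.5 * n)
    n_val = round(0.3 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]   # remainder (about 20%)
    return train, val, test_set

def bias(est, obs):
    """Mean of (estimate - observation); assumed sign convention."""
    return sum(e - o for e, o in zip(est, obs)) / len(obs)

def rmse(est, obs):
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))

def pearson_r(est, obs):
    n = len(obs)
    me, mo = sum(est) / n, sum(obs) / n
    cov = sum((e - me) * (o - mo) for e, o in zip(est, obs))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (se * so)

train, val, test_set = split_532(range(10))
print(len(train), len(val), len(test_set))
```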

 
2.2 Machine Learning Algorithms

Six supervised regression methods were used in this study: artificial neural network (ANN), extreme learning machine (ELM), k-nearest neighbor (kNN), random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGB). The hyperparameters of each algorithm were tuned by repeated fine-resolution grid searches (Bergstra and Bengio, 2012; Kim et al., 2021a). The hyperparameter settings for each ML algorithm are briefly described below.
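A grid search is an exhaustive loop over every combination of candidate hyperparameter values. The loop keeps the combination with the lowest validation error. In the sketch below, `train_and_score` is a hypothetical stand-in for fitting a model on the training set and returning its validation RMSE. The toy score surface is invented.

```python
from itertools import product

def grid_search(param_grid, train_and_score):
    """Evaluate every hyperparameter combination; keep the lowest validation score.

    `train_and_score` is a hypothetical callable: it trains a model with the
    given keyword parameters and returns the validation RMSE.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score < best_score:  # lower validation RMSE is better
            best_params, best_score = params, score
    return best_params, best_score

# Toy usage: a made-up score surface with its minimum at k = 12.
grid = {"k": range(1, 31)}
best, score = grid_search(grid, lambda k: (k - 12) ** 2 / 10 + 2.0)
print(best, score)
```

Adding more keys to `param_grid` (for example `size` and `decay` for the ANN) multiplies the number of fits. This is why fine-resolution grids become expensive for algorithms with several hyperparameters.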

 
2.2.1 Artificial neural network

The ANN used here is a feed-forward network with an input layer, an output layer, and a single hidden layer between them (Fig. 1(a)) (Rosa et al., 2020). In this study, the R “nnet” package (Ripley and Venables, 2021a) was used with the following hyperparameters: size (number of hidden nodes) = 10, maxit (maximum number of iterations) = 900, and decay (weight decay parameter) = 0.5.
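The sketch below shows what the nnet settings control. It writes out the forward pass of a single-hidden-layer network with logistic hidden units and a linear output. It also computes the weight-decay-penalized sum of squared errors that such a network minimizes. Two details are assumptions: the penalty is applied to all weights including biases, and the output unit is linear (nnet's `linout = TRUE`). The iterative optimizer, which nnet runs for up to `maxit` iterations, is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ann_predict(x, W, b, v, c):
    """Single hidden layer: logistic hidden units, linear output unit.

    W: one input-weight vector per hidden node, b: hidden biases,
    v: hidden-to-output weights, c: output bias.
    """
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bj)
              for w, bj in zip(W, b)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + c

def penalized_sse(X, y, W, b, v, c, decay):
    """Sum of squared errors plus decay * (sum of all squared weights)."""
    sse = sum((ann_predict(x, W, b, v, c) - yi) ** 2 for x, yi in zip(X, y))
    weights = [wi for w in W for wi in w] + list(b) + list(v) + [c]
    return sse + decay * sum(w ** 2 for w in weights)

# One hidden node with zero input weights: output = 2 * sigmoid(0) + 1 = 2.
W, b, v, c = [[0.0, 0.0]], [0.0], [2.0], 1.0
print(ann_predict([1.0, 1.0], W, b, v, c))
```

A larger `decay` pulls the weights toward zero and smooths the fitted function. The relatively large value of 0.5 therefore trades some training fit for robustness.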

Fig. 1. Schematic diagram of each machine learning algorithm: (a) ANN, (b) ELM, (c) kNN, (d) SVR, (e) RF, and (f) XGB (Kim et al., 2021a).

 
2.2.2 Extreme learning machine

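ELM is a feed-forward network composed of input, hidden, and output nodes (Fig. 1(b)). It predicts by applying a weight (w) and bias (b) between the input and hidden nodes and a weight (β) between the hidden and output nodes (Huang et al., 2006). The input weights and biases are assigned randomly and kept fixed, so only β must be learned. The ELM is thus trained as a single-hidden-layer feed-forward neural network in a single linear solve, as shown in Eq. (4) (Wang et al., 2021). The number of hidden nodes, the hyperparameter of the ELM algorithm, was set to 1000.

$$f(\mathbf{x}) = \sum_{i=1}^{L} \beta_i \, g\!\left(\mathbf{w}_i \cdot \mathbf{x} + b_i\right), \qquad \boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T} \tag{4}$$

Here, L is the number of hidden nodes, g is the activation function, H is the hidden-layer output matrix, H† is its Moore–Penrose generalized inverse, and T is the vector of training targets.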
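The NumPy sketch below follows the ELM training step described above. The input weights w and biases b are drawn at random and never updated. The output weights β come from a single linear least-squares solve. Three choices are assumptions for illustration: the tanh activation, the standard-normal weight distribution, and the truncation of tiny singular values. This is not the authors' implementation.

```python
import numpy as np

def elm_fit(X, y, n_hidden=1000, seed=0):
    """Extreme learning machine: random fixed input weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # input-to-hidden weights w (random, fixed)
    b = rng.normal(size=n_hidden)                # hidden biases b (random, fixed)
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    # Least-squares solution for beta (pseudoinverse-style); singular values
    # below 1e-10 of the largest are truncated for numerical stability.
    beta, *_ = np.linalg.lstsq(H, y, rcond=1e-10)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Usage: fit a smooth 1-D function with 50 hidden nodes.
X = np.linspace(0, 3, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
model = elm_fit(X, y, n_hidden=50)
err = np.sqrt(np.mean((elm_predict(model, X) - y) ** 2))
print(err)
```

Only a linear system is solved, never an iterative backpropagation loop. This is the source of ELM's speed, and it is why a large hidden layer (1000 nodes in this study) is affordable.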

 


2.2.3 k-nearest neighbor

kNN finds the k neighbors closest to the query point in the feature space (Fig. 1(c)) and predicts the value at the query point using distance-based weights (Zhang et al., 2018). In this study, the R “class” package (Ripley and Venables, 2021b) was used, and the hyperparameter k was set to 12.
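The sketch below shows distance-weighted kNN regression. It uses inverse-distance weighting, which is one common scheme. The text does not specify the weighting function, so this choice and the function name are assumptions. Because distances mix variables with different units (µg m–3, %, hPa), inputs are usually standardized first. Whether the authors did so is not stated.

```python
import math

def knn_predict(X_train, y_train, query, k=12):
    """Inverse-distance-weighted k-nearest-neighbour regression."""
    nearest = sorted(
        (math.dist(x, query), yi) for x, yi in zip(X_train, y_train)
    )[:k]
    # An exact match receives all the weight (avoids division by zero).
    exact = [yi for d, yi in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)

X_train = [[0.0], [1.0], [3.0]]
y_train = [0.0, 10.0, 30.0]
print(knn_predict(X_train, y_train, [2.0], k=2))
```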

 
2.2.4 Support vector regression

SVR fits a regression function (a hyperplane in the kernel-induced feature space) that is as flat as possible while keeping deviations from the training data within a margin ε. The training points that lie on or outside this margin are the support vectors (Fig. 1(d)). Deviations larger than ε are penalized through the ε-insensitive loss function (Taghizadeh-Mehrjardi et al., 2017). In this study, the R “e1071” package (Meyer et al., 2021) was used with the radial basis function (RBF) kernel of SVR. The hyperparameters were set as follows: epsilon (ε) = 0.1, gamma (γ) = 0.2, and cost (C) = 3.
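The sketch below defines the building blocks named above: the ε-insensitive loss, the RBF kernel exp(−γ‖u − v‖²) (the form used by e1071's radial kernel), and the kernel-expansion prediction of a fitted SVR. Fitting the dual coefficients is a quadratic program and is omitted. The coefficients in the usage line are made up, and the function names are hypothetical.

```python
import math

def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def rbf_kernel(u, v, gamma=0.2):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svr_predict(support_vectors, dual_coefs, intercept, x, gamma=0.2):
    """f(x) = sum_i coef_i * K(sv_i, x) + intercept (coefs from a fitted SVR)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + intercept

print(eps_insensitive_loss(1.0, 1.05), eps_insensitive_loss(1.0, 1.3))
```

Errors smaller than ε = 0.1 km cost nothing. The cost C = 3 sets how strongly larger errors are penalized relative to the flatness of the function.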


2.2.5 Random forest

RF constructs N decision trees, each grown by splitting nodes on randomly selected subsets of variables (Fig. 1(e)). It predicts by aggregating (for regression, averaging) the outputs of the individual trees (Wright and Ziegler, 2017). In this study, the R “ranger” package (Wright et al., 2020) was used with the following hyperparameters: num.trees (number of trees) = 790, mtry (number of variables randomly sampled as split candidates at each node) = 8, and min.node.size (minimal node size) = 4.
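The bootstrap-and-average idea behind RF can be sketched in plain Python. To stay short, this hypothetical sketch grows each tree to a single split (a stump). With one node per tree, the per-tree draw of `mtry` variables coincides with the per-node draw of a full RF. ranger grows much deeper trees (down to min.node.size). The sketch therefore illustrates the ensembling only, not the package's algorithm.

```python
import random
from statistics import mean

def fit_stump(X, y, features):
    """Least-squares single split, searching only the given feature indices."""
    best = None
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # threshold between adjacent distinct values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no usable split: predict the sample mean everywhere
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=50, mtry=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(p), mtry)       # random variable subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    return mean(ml if row[j] <= t else mr for j, t, ml, mr in forest)

# Toy data: a step from 0 to 10 between x = 4 and x = 5.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
forest = fit_forest(X, y, n_trees=50, mtry=1, seed=0)
print(predict_forest(forest, [1.0]), predict_forest(forest, [9.0]))
```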

 
2.2.6 Extreme gradient boosting

XGB improves predictive power through boosting: decision trees are added sequentially, each fitted to correct the errors of the current ensemble (Fig. 1(f)). In this study, the R “xgboost” package (Chen et al., 2022) was used with a squared-error (Gaussian) regression objective. The hyperparameters were set as follows: nrounds (maximum number of boosting iterations) = 1080, max_depth (maximum tree depth) = 8, and eta (learning rate) = 0.1.
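The boosting loop can be sketched as plain gradient boosting with squared-error loss. Start from the mean, then repeatedly fit a small tree to the current residuals and add it scaled by the learning rate eta. This is a simplified, hypothetical illustration. It uses single-split stumps instead of depth-8 trees, and it omits the regularization terms and second-order (Hessian-weighted) leaf values that distinguish XGBoost.

```python
from statistics import mean

def fit_stump(X, r):
    """Least-squares single split of residuals r over all features.

    Assumes at least one feature has two or more distinct values.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_boosting(X, y, n_rounds=100, eta=0.1):
    base = mean(y)                       # initial prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the loss (1/2)(y - F)^2.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        pred = [pi + eta * (ml if row[j] <= t else mr) for pi, row in zip(pred, X)]
    return base, eta, stumps

def predict_boosting(model, row):
    base, eta, stumps = model
    return base + eta * sum(ml if row[j] <= t else mr for j, t, ml, mr in stumps)

X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_boosting(X, y, n_rounds=100, eta=0.1)
print(predict_boosting(model, [1.0]), predict_boosting(model, [9.0]))
```

A small eta means each tree corrects only part of the remaining error. This is why a learning rate of 0.1 is paired with many rounds (nrounds = 1080 in this study).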

 
2.3 Time Series of Meteorological and Particulate Matter Variables in Seoul

The daily mean time series of the collected data (2018–2020) show distinct variations (Fig. 2). PM10 and PM2.5 concentrations in Seoul increase in the dry season (December–May) (Hur et al., 2016). In this season, the westerlies carry yellow sand into Seoul, together with air pollutants from fossil fuel use and industrial activities in China, Mongolia, and surrounding urban areas (Ghim et al., 2015; Oh et al., 2015; Jeong et al., 2017; Peterson et al., 2019; Oh et al., 2020; Hur et al., 2021). Depending on the synoptic pressure pattern, high PM concentrations can persist for long periods, which degrades visibility (Lee et al., 2013; Kim and Chun, 2013; Park et al., 2019). For PM2.5 in particular, the higher the concentration, the greater the decrease in visibility, owing to strong light scattering in the atmosphere (Ma et al., 2020). PM10 and PM2.5 concentrations increase with surface temperature and WS (Kim et al., 2017; Kim, 2019; Plocoste and Galif, 2021). They decrease because of precipitation in the rainy season (June–November) (Lee et al., 2013; Kim and Kim, 2020). At high RH, hygroscopically grown PM scatters more light, which further reduces visibility (Jung et al., 2009; Lee et al., 2014; Qu et al., 2015; Ma et al., 2020). Ta–Td and RH are strongly negatively correlated (r = –0.97). As Ta–Td approaches 0 K, the air approaches saturation and condensation occurs, an important factor in visibility deterioration (Yu et al., 2019). Visibility is therefore impaired during periods of high PM concentrations (especially PM2.5) and high RH (low Ta–Td) (Zhang et al., 2010). Precipitation reduces visibility while it falls, but it also improves visibility by washing suspended matter out of the atmosphere (Founda et al., 2016; Kim et al., 2021b). The monthly occurrence frequency of low visibility in Seoul is shown in Fig. 3.
Low visibility occurred frequently in the dry season, when PM concentrations were high (< 10 km: 19–37%; < 5 km: 6–15%). It occurred less often in the rainy season, when PM10 concentrations were low (< 10 km: 7–24%; < 5 km: 3–9%). This pattern is similar to the frequency distribution of low visibility observed in Hebei, China, where SO2 emissions and the occurrence frequency of low visibility were proportionally related (Fu et al., 2014). The occurrence frequency of low visibility was lowest in August, owing to frequent precipitation events (approximately 38 cases) with high rainfall intensity (approximately 2.28 mm h–1).
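The strong negative correlation between Ta–Td and RH follows from how RH is computed from the two temperatures. The sketch below uses a Magnus-type approximation for saturation vapour pressure over water. The coefficients (6.112 hPa, 17.62, 243.12 °C) are a widely used set and are an assumption here, not taken from the text.

```python
import math

def saturation_vapour_pressure(t_celsius):
    """Magnus-type approximation over water, in hPa."""
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

def relative_humidity(ta, td):
    """RH (%) from air temperature and dew point temperature (both in deg C)."""
    return 100.0 * saturation_vapour_pressure(td) / saturation_vapour_pressure(ta)

# RH falls as the dew point depression Ta - Td grows (Ta fixed at 20 deg C).
for depression in (0, 2, 5, 10):
    print(depression, round(relative_humidity(20.0, 20.0 - depression), 1))
```

RH reaches 100% exactly when Ta–Td = 0. This is the saturated, condensation-prone state that the text links to visibility deterioration.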

Fig. 2. Time series of daily mean meteorological and particulate matter data in Seoul: (a) visibility (black), PM10 (red), and PM2.5 (blue); (b) Ta (red), Ta–Td (blue), and RH (black); and (c) precipitation (red), wind speed (blue), and wind direction (black arrows).

Fig. 3. Occurrence frequency of monthly mean low visibility in Seoul (< 10 km: black, < 5 km: gray).


3 RESULTS AND DISCUSSION


 
3.1 Training and Validation Results of Machine Learning Algorithms

Table 1 shows the visibility estimation results of each ML algorithm with hyperparameters optimized using the training and validation datasets. The decision-tree-based XGB and RF algorithms performed better than the neural-network-based ANN and ELM. The kNN algorithm, which relies on vector distances between data points, performed worst. XGB performed best on both the training and validation datasets and was the most suitable algorithm for estimating visibility from these data. XGB also trained and predicted very quickly compared with the other algorithms, requiring only a few seconds. The XGB estimates agreed closely with the observed visibility and the 1:1 line (Fig. 4), with small differences (bias = 0 km, RMSE = 0.08 km, and r = 1). XGB, with its low computational cost and high accuracy, was therefore selected as the visibility estimation algorithm for this study.

 Table 1. Visibility estimation results of the training and validation datasets for each machine learning (ML) algorithm.

Fig. 4. Scatter plots of the observed visibility (VISobs) and the estimated visibility (VISXGB) by the XGB algorithm for the training dataset. The red line is the 1:1 line.

The relative importance of the input variables in training the XGB algorithm is shown in Fig. 5. Relative importance measures each feature's contribution to the prediction. It is computed from the reduction in impurity (loss) achieved by the splits on that feature as each tree is grown, expressed relative to the total reduction across all features. PM2.5 (51.05%), RH (18.82%), and Ta–Td (12.18%) had high relative importance, whereas precipitation (1.17%) and WS (0.96%) had low relative importance. PM concentration and RH strongly influence changes in visibility (Maurer et al., 2019; Kim et al., 2021b). PM2.5 was the most important variable for estimating visibility. Fine particles scatter light more efficiently than coarser particles and more frequently cause visibility impairment such as haze (Cheng et al., 2017; Ma et al., 2020). RH and Ta–Td reduce visibility by changing the characteristics of atmospheric aerosols and by causing condensation in humid air (Yu et al., 2019). Julian day reflects the monthly and seasonal periodicity of visibility variations, and temperature and pressure are related to the overall weather (Kim et al., 2021b). The hour variable can also reflect the daily periodicity of visibility or of the weather variables that affect it. However, it showed relatively low importance because visibility responds directly to variables such as PM2.5 and RH. Wind direction affects visibility through the inflow of air pollutants and of dry or humid air, and visibility can also improve with changes in wind speed (Ma et al., 2020). In this study, however, the variation in wind speed was small, and other variables such as PM2.5 and RH contributed more to the visibility estimation. This explains the relatively low importance of wind speed.
Precipitation was accompanied by other changes in weather conditions, such as high RH (low Ta–Td) and decreased PM concentrations. During precipitation, visibility either deteriorated or improved, which reduced the relative importance of precipitation itself.
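Gain-based relative importance amounts to simple bookkeeping. Sum the loss reduction (gain) of every split that uses a feature, across all trees, and divide by the total. The sketch below assumes the per-split gains are already available as (feature, gain) pairs. The input format and the function name are hypothetical. In the R xgboost package, `xgb.importance()` reports a comparable normalized `Gain` column.

```python
from collections import defaultdict

def relative_importance(splits):
    """Percent share of total split gain per feature, largest first.

    `splits` is an iterable of (feature_name, gain) pairs, one per split
    across all trees (hypothetical input format).
    """
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: -kv[1])
    return {feature: 100.0 * gain / grand_total for feature, gain in ranked}

imp = relative_importance([("PM2.5", 5.0), ("RH", 2.0), ("PM2.5", 3.0)])
print(imp)
```

A feature that is split on rarely, or whose splits barely reduce the loss, ends up with a small share. This happens even if the feature matters physically, as with precipitation here, because correlated variables (RH, PM) capture its effect.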

Fig. 5. Relative importance of the input variables of the XGB algorithm during training.

 
3.2 Analysis of the Visibility Estimation Results

Fig. 6 shows a scatter plot of the visibility estimated by the XGB algorithm for the test dataset (VISXGB) against the observed visibility (VISobs) (bias = –0.11 km, RMSE = 2.08 km, and r = 0.94). This correlation is stronger than those reported in several earlier studies:

  • Du et al. (2013), who assessed the linear relationship between visibility and meteorological variables in metropolitan areas of China (r = 0.62);
  • Qu et al. (2015) and Won et al. (2020), who assessed the exponential relationships between visibility and PM10 (r = 0.79) and PM2.5 (r = 0.87);
  • Zong et al. (2020), who used WRF-Chem and ML algorithms (r = 0.42).

It is also higher than the correlation reported by Sohn and Kim (2015) (r = 0.71), who estimated visibility in Seoul. These previous studies estimated visibility on sunny days or during periods of low RH (< 60%). Fig. 7 shows the results of Fig. 6 as a daily mean time series. In it, VISXGB closely follows VISobs, with a small difference and a high correlation coefficient (bias = –0.11 km, RMSE = 0.68 km, and r = 0.98). The daily mean visibility estimated in this study therefore explains approximately 96% of the variance in the daily mean observed visibility (r2 = 0.96).
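The move from hourly statistics (Fig. 6) to daily mean statistics (Fig. 7) can be sketched as grouping the hourly pairs by calendar day before computing bias, RMSE, and r. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def daily_means(timestamps, obs, est):
    """Average hourly observed/estimated visibility pairs by calendar day."""
    groups = defaultdict(list)
    for ts, o, e in zip(timestamps, obs, est):
        groups[ts.date()].append((o, e))
    result = {}
    for day, pairs in sorted(groups.items()):
        result[day] = (
            sum(o for o, _ in pairs) / len(pairs),  # daily mean observed
            sum(e for _, e in pairs) / len(pairs),  # daily mean estimated
        )
    return result

ts = [datetime(2019, 1, 1, 0), datetime(2019, 1, 1, 1), datetime(2019, 1, 2, 0)]
res = daily_means(ts, [10.0, 20.0, 5.0], [12.0, 18.0, 6.0])
print(res)
```

Averaging tends to cancel random hour-to-hour errors. This is one likely reason the daily RMSE (0.68 km) is much smaller than the hourly RMSE (2.08 km), while the bias, itself a mean of the errors, is essentially unchanged.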

Fig. 6. Scatter plots of observed visibility (VISobs) and estimated visibility (VISXGB) for the test dataset. The red line is the 1:1 line.

Fig. 7. Daily mean time series of observed visibility (VISobs) (black) and estimated visibility (VISXGB) (red).

Table 2 shows the monthly mean meteorological and particulate matter variables and the visibility estimation accuracy for the test dataset. The estimation accuracy was higher in the dry season (bias = –0.06 km, RMSE = 1.79 km, and r = 0.96) than in the rainy season (bias = –0.17 km, RMSE = 2.34 km, and r = 0.91). Visibility in Seoul was low during the dry season. This season was characterized by higher PM10 and PM2.5 concentrations, higher Ta–Td, lower RH, and less precipitation than the rainy season (Kim et al., 2021b). Accuracy was lower in summer (June–August; bias = –0.17 km, RMSE = 2.43 km, and r = 0.91) than in the other seasons. The cause was the many precipitation days and strong precipitation intensity, as in August. Fig. 8 compares VISobs and VISXGB within intervals of each meteorological and particulate matter variable. For PM10 and PM2.5, the RMSE decreased and r increased as the PM concentration increased. High PM concentrations (especially PM2.5) strongly reduce visibility, so the estimates in these intervals were highly accurate (Won et al., 2020). For RH and Ta–Td, the estimation accuracy was relatively high except under very dry or very humid conditions. In particular, the scattering characteristics of atmospheric aerosols change as the air becomes more humid, so large estimation errors may occur (Jung et al., 2009). For wind speed, the estimation error was relatively large at 6 m s–1 or more. Compared with the other intervals, this interval had the highest mean precipitation (0.45 mm h–1) and the lowest PM concentrations (PM10: 26.92 µg m–3, PM2.5: 15.23 µg m–3).
For precipitation, the error increased with the occurrence and intensity of precipitation, and the correlation coefficient decreased significantly. Visibility can improve or deteriorate depending on the precipitation characteristics (type and intensity), which makes it difficult to estimate (Gultepe and Milbrandt, 2010). Nevertheless, the monthly mean visibility estimation accuracy was high (bias = –0.11 km, RMSE = 2.05 km, and r = 0.93). These results showed lower variability and higher accuracy than previous studies that used ML algorithms to estimate precipitation quantitatively from satellite-based (Nguyen et al., 2021), radar-based (Shin et al., 2019), and numerical model-based (Ko et al., 2020) data. Visibility estimation using ML with meteorological and particulate matter data is therefore expected to be highly applicable.
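The interval-wise comparison in Fig. 8 can be sketched as binning each test sample by a conditioning variable, such as PM2.5 or WS, and computing the error statistics within each bin. The bin edges and the function name are hypothetical. Only RMSE and the sample count are shown; r would be computed per bin in the same way.

```python
import math
from bisect import bisect_right

def binned_rmse(values, obs, est, edges):
    """RMSE of (est - obs) within each interval of a conditioning variable.

    Bin i holds values in [edges[i-1], edges[i]); bin 0 is below edges[0].
    Returns {bin_index: (rmse, sample_count)}.
    """
    bins = {}
    for v, o, e in zip(values, obs, est):
        i = bisect_right(edges, v)
        bins.setdefault(i, []).append((e - o) ** 2)
    return {i: (math.sqrt(sum(sq) / len(sq)), len(sq)) for i, sq in sorted(bins.items())}

# Hypothetical PM2.5 edges at 10 and 20 ug m-3.
result = binned_rmse([5, 15, 25, 26], [10, 10, 10, 10], [11, 8, 10, 12], [10, 20])
print(result)
```

Reporting the sample count alongside each RMSE matters because sparse intervals, such as very strong winds, yield noisier statistics.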

Table 2. Monthly mean VISobs, PM10, PM2.5, Ta–Td, RH, WS, and precipitation data and monthly estimation accuracy of VISXGB. Figures in parentheses indicate standard deviations.

Fig. 8. Visibility estimation accuracy within each interval of each meteorological and particulate matter variable: (a) PM10, (b) PM2.5, (c) RH, (d) Ta–Td, (e) WS, and (f) precipitation. The number in parentheses below each interval is the percentage of the total data in that interval.

 
4 CONCLUSIONS


In this study, visibility in Seoul, South Korea, was estimated with an ML algorithm using meteorological and particulate matter data acquired from the AWS of KMA and the AMS of MOE. The estimated visibility was compared with the visibility observed by the visibility sensor and analyzed. Weather data (temperature, RH, and precipitation) observed at the AWS and PM10 and PM2.5 data observed at the AMS were used. The visibility estimation performance of the XGB algorithm was superior to that of RF, the neural networks (ANN and ELM), and the vector distance-based algorithms (kNN and SVR). Among the input variables, PM2.5 and RH had relative importances of approximately 51% and 19%, respectively, whereas precipitation and WS had low relative importance (approximately 1%). Visibility estimated for the test dataset (bias = –0.11 km, RMSE = 2.08 km, and r = 0.94) was more accurate than the results of previous studies. Moreover, this study did not restrict the meteorological conditions (low RH, no precipitation, and sunny days), so visibility was estimated under all weather conditions. The accuracy of the estimated visibility varied by month and season. It was high during the dry season (December–May), when low visibility occurred frequently.

The monthly mean visibility was estimated with high accuracy (bias = –0.12 km, RMSE = 2.05 km, and r = 0.93). Visibility can therefore be estimated accurately from meteorological and particulate matter variables using ML algorithms. Large metropolitan cities with large transient populations, such as Seoul, are extremely sensitive to public health issues related to air quality (Kim and Lee, 2018). However, dense high-rise development and high real estate prices in these areas make it difficult to establish observation stations to measure visibility. The method proposed in this study can therefore help estimate visibility, using only the available meteorological and particulate matter data, in areas where it cannot be observed.

 
ACKNOWLEDGMENTS


This work was funded by the Korea Meteorological Administration Research and Development Program “Research on Weather Modification and Cloud Physics” under Grant (KMA2018-00224).

 
DISCLAIMER


The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


REFERENCES


  1. Bari, D. (2018). Visibility Prediction Based on Kilometric NWP Model Outputs Using Machine-Learning Regression. In Proceedings of the 2018 IEEE 14th International Conference on E-Science (E-Science), Amsterdam, The Netherlands, 29 October–1 November 2018, pp. 1–278. https://doi.org/10.1007/s42452-020-2327-x

  2. Bergstra, J., Bengio, Y. (2012). Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305. https://doi.org/10.5555/2188385.2188395

  3. Chen, T., He, T., Benesty, M., XGBoost contributors (2022). Package ‘xgboost’. R Reference Document, pp. 1–66. (accessed 1 March 2022).

  4. Cheng, Z., Ma, X., He, Y., Jiang, J., Wang, X., Wang, Y., Sheng, L., Hu, J., Yan, N. (2017). Mass extinction efficiency and extinction hygroscopicity of ambient PM2.5 in urban China. Environ. Res. 156, 239–246. https://doi.org/10.1016/j.envres.2017.03.022

  5. Cornejo-Bueno, L., Casanova-Mateo, C., Sanz-Justo, J., Cerro-Prada, E., Salcedo-Sanz, S. (2017). Efficient prediction of low-visibility events at airports using machine-learning regression. Boundary Layer Meteorol. 165, 349–370. https://doi.org/10.1007/s10546-017-0276-8

  6. Cornejo-Bueno, S., Casillas-Pérez, D., Cornejo-Bueno, L., Chidean, M.I., Caamaño, A.J., Sanz-Justo, J., Casanova-Mateo, C., Salcedo-Sanz, S. (2020). Persistence analysis and prediction of low-visibility events at Valladolid airport, Spain. Symmetry 12, 1045. https://doi.org/10.3390/sym12061045

  7. Du, K., Mu, C., Deng, J., Yuan, F. (2013). Study on atmospheric visibility variations and the impacts of meteorological parameters using high temporal resolution data: An application of Environmental Internet of Things in China. Int. J. Sustain. Dev. World Ecol. 20, 238–247. https://doi.org/10.1080/13504509.2013.783886

  8. Fita, L., Polcher, J., Giannaros, T.M., Lorenz, T., Milovac, J., Sofiadis, G., Katragkou, E., Bastin, S. (2019). CORDEX-WRF v1.3: Development of a module for the Weather Research and Forecasting (WRF) model to support the CORDEX community. Geosci. Model Dev. 12, 1029–1066. https://doi.org/10.5194/gmd-12-1029-2019

  9. Founda, D., Kazadzis, S., Mihalopoulos, N., Gerasopoulos, E., Lianou, M., Raptis, P.I. (2016). Long-term visibility variation in Athens (1931–2013): A proxy for local and regional atmospheric aerosol loads. Atmos. Chem. Phys. 16, 11219–11236. https://doi.org/10.5194/acp-16-11219-2016

  10. Fu, G.Q., Xu, W., Yang, R.F., Li, J.B., Zhao, C.S. (2014). The distribution and trends of fog and haze in the North China Plain over the past 30 years. Atmos. Chem. Phys. 14, 11949–11958. https://doi.org/10.5194/acp-14-11949-2014

  11. Ghim, Y.S., Chang, Y.S., Jung, K. (2015). Temporal and spatial variations in fine and coarse particles in Seoul, Korea. Aerosol Air Qual. Res. 15, 842–852. https://doi.org/10.4209/aaqr.2013.12.0362

  12. Gultepe, I., Milbrandt, J.A. (2010). Probabilistic parameterizations of visibility using observations of rain precipitation rate, relative humidity, and visibility. J. Appl. Meteorol. Climatol. 49, 36–46. https://doi.org/10.1175/2009JAMC1927.1

  13. Guo, B., Wang, Y., Zhang, X., Che, H., Zhong, J., Chu, Y., Cheng, L. (2020). Temporal and spatial variations of haze and fog and the characteristics of PM2.5 during heavy pollution episodes in China from 2013 to 2018. Atmos. Pollut. Res. 11, 1847–1856. https://doi.org/10.1016/j.apr.2020.07.019

  14. Huang, G.B., Zhu, Q.Y., Siew, C.K. (2006). Extreme learning machine: Theory and applications. Neurocomputing 70, 489–501. https://doi.org/10.1016/j.neucom.2005.12.126

  15. Huang, H., Zhang, G. (2017). Case studies of low‐visibility forecasting in falling snow with WRF model. J. Geophys. Res. 122, 12–862. https://doi.org/10.1002/2017JD026459

  16. Hur, S.K., Oh, H.R., Ho, C.H., Kim, J., Song, C.K., Chang, L.S., Lee, J.B. (2016). Evaluating the predictability of PM10 grades in Seoul, Korea using a neural network model based on synoptic patterns. Environ. Pollut. 218, 1324–1333. https://doi.org/10.1016/j.envpol.2016.08.090

  17. Hur, S.K., Ho, C.H., Kim, J., Oh, H.R., Koo, Y.S. (2021). Systematic bias of WRF-CMAQ PM10 simulations for Seoul, Korea. Atmos. Environ. 244, 117904. https://doi.org/10.1016/j.atmosenv.2020.117904

  18. Jeong, U., Kim, J., Lee, H., Lee, Y.G. (2017). Assessing the effect of long-range pollutant transportation on air quality in Seoul using the conditional potential source contribution function method. Atmos. Environ. 150, 33–44. https://doi.org/10.1016/j.atmosenv.2016.11.017

  19. Jung, J., Lee, H., Kim, Y.J., Liu, X., Zhang, Y., Hu, M., Sugimoto, N. (2009). Optical properties of atmospheric aerosols obtained by in situ and remote measurements during 2006 Campaign of Air Quality Research in Beijing (CAREBeijing‐2006). J. Geophys. Res. 114, D00G02. https://doi.org/10.1029/2008JD010337

  20. Kim, B.Y., Cha, J.W., Chang, K.H. (2021a). Twenty-four-hour cloud cover calculation using a ground-based imager with machine learning. Atmos. Meas. Tech. 14, 6695–6710. https://doi.org/10.5194/amt-14-6695-2021

  21. Kim, B.Y., Cha, J.W., Chang, K.H., Lee, C. (2021b). Visibility prediction over South Korea based on random forest. Atmosphere 12, 552. https://doi.org/10.3390/atmos12050552

  22. Kim, H.C., Kim, S., Kim, B.U., Jin, C.S., Hong, S., Park, R., Son, S.W., Bae, C., Bae, M., Song, C.K., Stein, A. (2017). Recent increase of surface particulate matter concentrations in the Seoul Metropolitan Area, Korea. Sci. Rep. 7, 1–7. https://doi.org/10.1038/s41598-017-05092-8

  23. Kim, M.J. (2019). Changes in the relationship between particulate matter and surface temperature in Seoul from 2002–2017. Atmosphere 10, 238. https://doi.org/10.3390/atmos10050238

  24. Kim, S., Chun, Y. (2013). Physical and chemical features of Asian dust aerosol mixed with haze during 14–19 March 2009. Asia-Pac. J. Atmos. Sci. 49, 543–550. https://doi.org/10.1007/s13143-013-0048-4

  25. Kim, S.U., Kim, K.Y. (2020). Physical and chemical mechanisms of the daily-to-seasonal variation of PM10 in Korea. Sci. Total Environ. 712, 136429. https://doi.org/10.1016/j.scitotenv.2019.136429

  26. Kim, Y.P., Lee, G. (2018). Trend of air quality in Seoul: Policy and science. Aerosol Air Qual. Res. 18, 2141–2156. https://doi.org/10.4209/aaqr.2018.03.0081

  27. Ko, C.M., Jeong, Y.Y., Lee, Y.M., Kim, B.S. (2020). The development of a quantitative precipitation forecast correction technique based on machine learning for hydrological applications. Atmosphere 11, 111. https://doi.org/10.3390/atmos11010111

  28. Lee, J.Y., Jo, W.K., Chun, H.H. (2014). Characteristics of atmospheric visibility and its relationship with air pollution in Korea. J. Environ. Qual. 43, 1519–1526. https://doi.org/10.2134/jeq2014.02.0066

  29. Lee, J.Y., Jo, W.K., Chun, H.H. (2015). Long-term trends in visibility and its relationship with mortality, air-quality index, and meteorological factors in selected areas of Korea. Aerosol Air Qual. Res. 15, 673–681. https://doi.org/10.4209/aaqr.2014.02.0036

  30. Lee, S., Ho, C.H., Lee, Y.G., Choi, H.J., Song, C.K. (2013). Influence of transboundary air pollutants from China on the high-PM10 episode in Seoul, Korea for the period October 16–20, 2008. Atmos. Environ. 77, 430–439. https://doi.org/10.1016/j.atmosenv.2013.05.006

  31. Li, L., Zhao, Z., Wang, H., Wang, Y., Liu, N., Li, X., Ma, Y. (2020). Concentrations of four major air pollutants among ecological functional zones in Shenyang, Northeast China. Atmosphere 11, 1070. https://doi.org/10.3390/atmos11101070

  32. Ma, C.J., Lim, C.S., Kang, G.U., Jung, S.A., Jo, M.R. (2020). Visibility degradation and its contributors at an urban site in Korea. Asian J. Atmos. Environ. 14, 335–334. https://doi.org/10.5572/ajae.2020.14.4.335

  33. Maurer, M., Klemm, O., Lokys, H.L., Lin, N.H. (2019). Trends of fog and visibility in Taiwan: Climate change or air quality improvement? Aerosol Air Qual. Res. 19, 896–910. https://doi.org/10.4209/aaqr.2018.04.0152

  34. Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., Chang, C.C., Lin, C.C. (2021). Package ‘e1071’. R Reference Document, pp. 1–66. (accessed 1 March 2022).

  35. Ministry of Environment (MOE) (2021). Air pollution monitoring network installation and operation manual. Ministry of Environment, pp. 1–676. (accessed 16 May 2022). (in Korean)

  36. Nguyen, G.V., Le, X.H., Van, L.N., Jung, S., Yeon, M., Lee, G. (2021). Application of random forest algorithm for merging multiple satellite precipitation products across South Korea. Remote Sens. 13, 4033. https://doi.org/10.3390/rs13204033

  37. Oh, H.R., Ho, C.H., Kim, J., Chen, D., Lee, S., Choi, Y.S., Chang, L.S., Song, C.K. (2015). Long-range transport of air pollutants originating in China: A possible major cause of multi-day high-PM10 episodes during cold season in Seoul, Korea. Atmos. Environ. 109, 23–30. https://doi.org/10.1016/j.atmosenv.2015.03.005

  38. Oh, H.R., Ho, C.H., Koo, Y.S., Baek, K.G., Yun, H.Y., Hur, S.K., Choi, D.R., Jhun, J.G., Shim, J.S. (2020). Impact of Chinese air pollutants on a record-breaking PMs episode in the Republic of Korea for 11–15 January 2019. Atmos. Environ. 223, 117262. https://doi.org/10.1016/j.atmosenv.2020.117262

  39. Ozer, P., Laghdaf, M.B.O.M., Lemine, S.O.M., Gassani, J. (2007). Estimation of air quality degradation due to Saharan dust at Nouakchott, Mauritania, from horizontal visibility data. Water Air Soil Pollut. 178, 79–87. https://doi.org/10.1007/s11270-006-9152-8

  40. Park, I.S., Kim, H.K., Song, C.K., Jang, Y.W., Kim, S.H., Cho, C.R., Owen, J.S., Kim, C.H., Chung, K.W., Park, M.S. (2019). Meteorological characteristics and assessment of the effect of local emissions during high PM10 concentration in the Seoul Metropolitan Area. Asian J. Atmos. Environ. 13, 117–135. https://doi.org/10.5572/ajae.2019.13.2.117

  41. Peterson, D.A., Hyer, E.J., Han, S.O., Crawford, J.H., Park, R.J., Holz, R., Kuehn, R.E., Eloranta, E., Knote, C., Jordan, C.E., Lefer, B.L. (2019). Meteorology influencing springtime air quality, pollution transport, and visibility in Korea. Elem. Sci. Anth. 7, 57. https://doi.org/10.1525/elementa.395

  42. Plocoste, T., Calif, R. (2021). Is there a causal relationship between Particulate Matter (PM10) and air Temperature data? An analysis based on the Liang–Kleeman information transfer theory. Atmos. Pollut. Res. 12, 101177. https://doi.org/10.1016/j.apr.2021.101177

  43. Qu, W.J., Wang, J., Zhang, X.Y., Wang, D., Sheng, L.F. (2015). Influence of relative humidity on aerosol composition: Impacts on light extinction and visibility impairment at two sites in coastal area of China. Atmos. Res. 153, 500–511. https://doi.org/10.1016/j.atmosres.2014.10.009

  44. Ripley, B., Venables, W. (2021a). Package ‘class’. R Reference Document, pp. 1–19. (accessed 1 March 2022).

  45. Ripley, B., Venables, W. (2021b). Package ‘nnet’. R Reference Document, pp. 1–11. (accessed 1 March 2022).

  46. Rosa, J.P.S., Guerra, D.J.D., Horta, N.C.G., Martins, R.M.F., Lourenço, N.C.C. (2020). Overview of Artificial Neural Networks, in: Rosa, J.P.S., Guerra, D.J.D., Horta, N.C.G., Martins, R.M.F., Lourenço, N.C.C. (Eds.), Using Artificial Neural Networks for Analog Integrated Circuit Design Automation, Springer International Publishing, Cham, pp. 21–44. https://doi.org/10.1007/978-3-030-35743-6_3

  47. Shin, J.Y., Ro, Y., Cha, J.W., Kim, K.R., Ha, J.C. (2019). Assessing the applicability of random forest, stochastic gradient boosted model, and extreme learning machine methods to the quantitative precipitation estimation of the radar data: A case study to Gwangdeoksan Radar, South Korea, in 2018. Adv. Meteorol. 2019, e6542410. https://doi.org/10.1155/2019/6542410

  48. Singh, A., George, J.P., Iyengar, G.R. (2018). Prediction of fog/visibility over India using NWP Model. J. Earth Syst. Sci. 127, 1–13. https://doi.org/10.1007/s12040-018-0927-2

  49. Sohn, K.T., Kim, D. (2015). Development of statistical forecast model for PM10 concentration over Seoul. J. Korean Data Inf. Sci. Soc. 26, 289–299. https://doi.org/10.7465/jkdi.2015.26.2.289

  50. Taghizadeh-Mehrjardi, R., Neupane, R., Sood, K., Kumar, S. (2017). Artificial bee colony feature selection algorithm combined with machine learning algorithms to predict vertical and lateral distribution of soil organic matter in South Dakota, USA. Carbon Manage. 8, 277–291. https://doi.org/10.1080/17583004.2017.1330593

  51. Thach, T.Q., Wong, C.M., Chan, K.P., Chau, Y.K., Chung, Y.N., Ou, C.Q., Yang, L., Hedley, A.J. (2010). Daily visibility and mortality: Assessment of health benefits from improved visibility in Hong Kong. Environ. Res. 110, 617–623. https://doi.org/10.1016/j.envres.2010.05.005

  52. Wang, J., Lu, S., Wang, S.H., Zhang, Y.D. (2021). A review on extreme learning machine. Multimed. Tools Appl. https://doi.org/10.1007/s11042-021-11007-7

  53. Won, W.S., Oh, R., Lee, W., Kim, K.Y., Ku, S., Su, P.C., Yoon, Y.J. (2020). Impact of fine particulate matter on visibility at Incheon International Airport, South Korea. Aerosol Air Qual. Res. 20, 1048–1061. https://doi.org/10.4209/aaqr.2019.03.0106

  54. World Meteorological Organization (WMO) (2014). Guide to Meteorological Instruments and Methods of Observation. World Meteorological Organization, Geneva, Switzerland, pp. 1–1128.

  55. Wright, M.N., Ziegler, A. (2017). Ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17. https://doi.org/10.18637/jss.v077.i01

  56. Wright, M.N., Wager, S., Probst, P. (2020). Package ‘ranger’. R Reference Document, pp. 1–25. (accessed 1 March 2022).

  57. Wu, J., Fu, C., Zhang, L., Tang, J. (2012). Trends of visibility on sunny days in China in the recent 50 years. Atmos. Environ. 55, 339–346. https://doi.org/10.1016/j.atmosenv.2012.03.037

  58. Wu, X., Wang, Y., He, S., Wu, Z. (2020). PM2.5/PM10 ratio prediction based on a long short-term memory neural network in Wuhan, China. Geosci. Model Dev. 13, 1499–1511. https://doi.org/10.5194/gmd-13-1499-2020

  59. Xiong, Z., Cui, Y., Liu, Z., Zhao, Y., Hu, M., Hu, J. (2020). Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation. Comput. Mater. Sci. 171, 109203. https://doi.org/10.1016/j.commatsci.2019.109203

  60. Yu, H., Li, T., Liu, P. (2019). Influence of ENSO on frequency of wintertime fog days in Eastern China. Clim. Dyn. 52, 5099–5113. https://doi.org/10.1007/s00382-018-4437-3

  61. Yum, S.S., Cha, J.W. (2010). Suppression of very low intensity precipitation in Korea. Atmos. Res. 98, 118–124. https://doi.org/10.1016/j.atmosres.2010.06.006

  62. Zhang, Q.H., Zhang, J.P., Xue, H.W. (2010). The challenge of improving visibility in Beijing. Atmos. Chem. Phys. 10, 7821–7827. https://doi.org/10.5194/acp-10-7821-2010

  63. Zhang, S., Cheng, D., Deng, Z., Zong, M., Deng, X. (2018). A novel kNN algorithm with data-driven k parameter computation. Pattern Recognit. Lett. 109, 44–54. https://doi.org/10.1016/j.patrec.2017.09.036

  64. Zong, P., Zhu, Y., Wang, H., Liu, D. (2020). WRF-Chem simulation of winter visibility in Jiangsu, China, and the application of a neural network algorithm. Atmosphere 11, 520. https://doi.org/10.3390/atmos11050520