Impact of Climate Change and Air Pollution Forecasting Using Machine Learning Techniques in Bishkek

During recent years, severe air-pollution problems have garnered worldwide attention due to their effects on human health and the environment. Air pollution in Bishkek, Kyrgyz Republic, is an ever-increasing problem, yet little research has been conducted on the impact of air pollutants on public health. We evaluate the performance of several machine learning algorithms applied to air quality and meteorology datasets and compare their accuracy in predicting Bishkek air quality, a matter of significant public importance. Data on 16 synoptic atmospheric processes were collected by Kyrgyzhydromet from 2016 to 2018 and used to train and build a forecasting model. The model was then tested using data collected in 2020. Climate change in Bishkek and its impact on air pollution were assessed via the frequency of days characterized by daytime temperature inversions and air stagnation. Atmospheric stability increased from 2015 to 2020, with ongoing climate change leading to more temperature inversions. About 80%–90% of days with temperature inversions are associated with winter heating seasons, and these numbers increased two-fold during the past 5 years. The impact of the COVID-19 lockdown (22 March–11 May 2020) on air quality in Bishkek is also shown. During the lockdown period, CO, NO, NO2, SO2, and PM2.5 decreased by 64%, 1.5%, 75%, 24%, and 54%, respectively, compared to concentrations of these pollutants in 2019. Where identified, emissions from vehicles make up a significant part of the air pollution.


INTRODUCTION
Air pollution is one of the key environmental problems of densely populated and large cities. To identify the major factors contributing to air pollution and counteract these in a timely manner, there is a need for continuous monitoring of the air environment. Atmospheric pollution in Bishkek has become one of the most significant environmental threats in Kyrgyzstan. Even with the absence of large-scale industrial pollution, Bishkek is at the top of world rankings for air pollution. For example, in November 2019, according to "Empowering the World to Breathe Cleaner Air" (IQAir, 2021), Bishkek was ranked as having the worst air quality worldwide. According to the World Air Quality Index (AQI), as of January 14, 2020, Bishkek ranked ninth in the world for air pollution with an AQI indicator of 179 (Isakova, 2020).
The Government of Kyrgyzstan (KyrgyzHydroMet, 2021) is therefore paying increased attention to monitoring of PM2.5 concentrations and has installed about 50 air quality monitoring sensors throughout Bishkek to assess PM2.5. Additionally, the Government plans to establish four air quality monitoring stations to measure related atmospheric pollutants, e.g., CO, SO2, NO2, and NO.
Nitrogen oxides (NOx), including nitrogen dioxide (NO2) and nitric oxide (NO), act on the circulatory system, leading to oxygen deficiency and directly affecting the central nervous system, resulting in a constant decrease in immunity (Ahmetshina, 2015). The main anthropogenic sources of NOx are high-temperature combustion of fossil fuels (natural gas, coal, gasoline, fuel oil) at thermal power plants and industrial plants, and in automobiles (Potehina, 1984; Rabinovich, 1978).
Carbon monoxide (CO) and sulfur dioxide (SO2) are highly toxic. SO2 and CO enter the atmosphere mainly from the incineration of solid waste, exhaust gases from vehicles, emissions from industrial enterprises, and fossil fuel power stations (Ahmetshina, 2015; Potehina, 1984).
Formaldehyde (HCOH) exerts general toxic effects (damage to the central nervous system, liver, and kidneys), is a strong irritant/allergen, and induces mutagenic effects. It is formed by incomplete combustion of liquid fuel (primarily from vehicles) and as a result of a chain of chemical reactions of hydrocarbons with NOx. Formaldehyde concentrations increase significantly near highways in the summer months, when solar radiation intensity is high (Ahmetshina, 2015; Potehina, 1984).
The maximum permissible concentrations of these air pollutants in Kyrgyzstan are given in Table 1 (KyrgyzHydroMet, 2021). In Table 1, PDK (daily average) is the average daily allowable concentration limit (the concentration of a pollutant beyond which it is considered dangerous to health).
The main sources of air pollutants in Bishkek include emissions from motor vehicles, coal-fired power plants, and heating in private homes by burning coal and other solids. Unfortunately, it is impossible to accurately determine the volume of emissions from these sources due to limited observations and data. In this study, attempts were made to estimate the contribution of motor vehicle emissions to air pollution in Bishkek.
Additionally, climate change and urban growth impact air quality. Conditions in modern industrial cities create their own urban climate, which, by the nature of its impact on the environment, is classed as a mesoscale climate phenomenon. The release of large amounts of heat in cities alters the gas and aerosol composition of the air; these and other factors lead to increased air temperature and the formation of so-called heat islands (Landsberg, 1981). Kuznetsova et al. (2004) showed that the urban influence on the formation of "heat islands" is strongest in winter. In winter, large vertical temperature gradients develop more slowly, so isothermal or inversion conditions at relatively high altitudes can exist throughout the day (Moore, 1975).
The formation of temperature inversions is attributed to thermal and dynamic processes. Thermal processes include radiation, orographic effects, advective influences, and spring snow inversions. Dynamic processes include subsidence inversions, turbulent inversions, and frontal inversions. In Bishkek, all types of inversions are observed, but orographic ones prevail.
The synoptic situation reflects the entire spectrum of atmospheric processes and comprises the complex characteristics of various meteorological parameters. Data on the frequency of occurrence, power, and intensity of inversions indicate that their formation is largely associated with large-scale atmospheric processes (Bezuglaya et al., 1984). Assessing the relationship between the synoptic atmospheric situation and the concentration of pollutants enables the prediction of environmental impacts in urban areas (Bezuglaya, 1991). For Bishkek City, this is the first investigation to assess the impact of climate change by analyzing the trend of the increasing number of days with temperature inversions and changes in the frequency of synoptic situations. The only air pollution investigation conducted in Bishkek has been a quantitative assessment and prediction of the degree of ambient air benzo(a)pyrene pollution (Vasil'kova et al., 2007). Thus, little progress has been made in comprehensively assessing air pollution in Bishkek. Nevertheless, new research and technologies for analyzing and predicting air quality can be used to address air pollution problems in Bishkek. For example, in developed countries machine learning (ML) technologies have been employed to predict air pollution in cities (Boonphun et al., 2019; Chang et al., 2020; Junuz, 2018; Nabavi et al., 2019). Czernecki et al. (2021) mentioned that ML-based models can be cheaper and more accurate than the presently used, computationally demanding numerical weather prediction models with chemical modules. Baklanov et al. (2017) used the Environment-High Resolution Limited Area Model to simulate air pollutants over Denmark, and Nerobelov et al. (2021) obtained good simulations using the Weather Research and Forecasting-Chemistry (WRF-Chem) model over Saint Petersburg, Russia. Zakarin et al. (2021) applied a similar modelling approach in Almaty, Kazakhstan.
To use deterministic physicochemical models, a large amount of input data on sources and emissions is needed, which are not recorded in Bishkek. Therefore, it was decided to create an air quality prediction system based on ML technologies.
Currently, there are many studies on the use of ML technology for predicting air pollutants. Examples include the PM2.5 and PM10 forecasting system for Polish agglomerations (Czernecki et al., 2021), the hourly PM2.5 forecast in Santiago de Chile (Perez and Menares, 2018), and the PM2.5 forecasting system in Tehran (Karimian et al., 2019); in Shanghai, China, ML technology is used to improve PM2.5 forecasts from WRF-Chem model simulations (Ma et al., 2020). Most of these studies focus on PM2.5 and PM10 particles, but our research evaluates the possibility of creating a prediction system for all measurable air pollutants in Bishkek. This is the first study devoted to the problem of forecasting and monitoring of air quality in Bishkek.
We analyzed unique cases in Bishkek when there were no motor vehicles except for ambulances in the city. Lockdown during the COVID-19 pandemic made it possible to estimate the contribution of emissions from motor vehicles in the city. Additionally, other investigations have assessed impacts of the recent lockdowns related to the COVID-19 pandemic on air quality (Bacak et al., 2020; Bidleman et al., 2012; Brauer et al., 2021; Ho et al., 2021; Hörmann et al., 2021). Based on these knowledge gaps in Bishkek, as well as the new technology opportunities, our aim is to analyze the impacts of climate change and the effect of the COVID-19 lockdown on air pollution in Bishkek and build a predictive model using ML technologies.

Study Area
The study region is Bishkek city, the capital of Kyrgyzstan, with a total area of 127 km2 (Fig. 1); Kyrgyzstan has a total area of 199,951 km2. Because 94% of Kyrgyzstan is mountainous, substantial topographic variability exists throughout the country. Among countries of Eastern Europe and Central Asia, Kyrgyzstan is the third most vulnerable to effects of climate change (Clare et al., 2018).
Bishkek is located at the foot of the Kyrgyz Ala-Too mountains (one of the mountain ranges of the inner Tien Shan), at an altitude of 700-900 m a.s.l. Because the city is in a basin, there are few windy days, which tends to restrict air circulation throughout the Bishkek region. Another complicating factor is that the air temperature in and above the city is about 5°C higher than in the surrounding atmosphere, creating a heat island effect. This effect blocks weak winds from outside, which prevail in the cold season. The climate of the city, being sharply continental, is determined by latitude and altitude, the considerable distance from oceans, local orographic effects, and the circulation of the atmosphere.
The observation network of Kyrgyzhydromet in Bishkek includes one automated/manual meteorological station, seven manual air quality monitoring stations, one automated air quality monitoring station installed in 2016, and 50 PM2.5 sensors installed in January 2021.

Materials
In this research, we used ground meteorological station data from 1981 to 2020, automated air quality monitoring station data from 2016 to 2020, manual air quality monitoring station data from 2015 to 2020, and synoptic process data from 2015 to 2020. All data were collected by Kyrgyzhydromet.

Meteorological Station Data
Long-term datasets from Bishkek meteorological stations, including monthly average air temperature, were obtained from the archives of Kyrgyzhydromet. Datasets to train and test machine learning algorithms were obtained from the automated meteorological station from 2016 to 2020. These data include 10-min wind speed (WS) and direction (WD) at 10 m above ground level, pressure (PS), air humidity (RH), air temperature at 2 m above ground level (T2m), surface temperature (TS), UV-A, UV-B, net radiation (NR), precipitation (PR), and wind gust direction (GD).

Air quality monitoring station data
The data from 2016 to 2020 used to train and test the machine learning algorithms were obtained from the automatic air quality monitoring station (KyrgyzHydroMet, 2021). These data include 20-min NO, CO, NO2, SO2, PM2.5, and PM10. Observations of PM2.5 were corrected with data obtained from AirNow (2021) for the period 2019-2020. To analyze air quality changes from 2015 to 2020, data from seven manual measurement points were used. These measurements included NO, NO2, SO2, HCOH, and NH3.

Data on temperature inversions and synoptic processes
Vertical sounding of the atmosphere has not been conducted in Kyrgyzstan. However, since 2015 Kyrgyzhydromet has used vertical sounding data from Taraz and Almaty in Kazakhstan to interpolate and record the temperature of the atmosphere at the 850 hPa pressure level (T850). The difference between the temperature observed at a height of 2 m and T850 was calculated, and temperature inversions were identified for each day from 2015 to 2020.
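The daily identification step can be sketched as follows. This is a minimal illustration, not the procedure used by Kyrgyzhydromet: the text does not state the exact criterion, so the rule used here (a day is flagged when T850 is warmer than T2m) is an assumption, as are the function name `flag_inversions` and all sample values.

```python
from datetime import date


def flag_inversions(t2m_by_day, t850_by_day):
    """Return {day: bool}; True where T850 exceeds T2m (assumed criterion)."""
    flags = {}
    for day, t2m in t2m_by_day.items():
        t850 = t850_by_day.get(day)
        if t850 is None:
            continue  # skip days without an interpolated upper-air value
        flags[day] = (t850 - t2m) > 0.0
    return flags


# Hypothetical temperatures in degrees Celsius, for illustration only.
t2m = {date(2020, 1, 1): -6.0, date(2020, 1, 2): 1.5, date(2020, 1, 3): -3.0}
t850 = {date(2020, 1, 1): -2.0, date(2020, 1, 2): -4.0}

flags = flag_inversions(t2m, t850)
inversion_days = sum(flags.values())  # number of flagged days
```

Days lacking a T850 value are left out rather than counted as non-inversion days, so that gaps in the sounding data do not bias the annual totals.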
To analyze the impact of the synoptic situation on air quality in Bishkek, synoptic conditions were identified, and daily data were digitized from the Kyrgyzhydromet archives (2015-2020).

Methods
The study used statistical analyses and machine learning technologies, a type of artificial intelligence methodology, to build a predictive model for air quality. Synoptic methods (Savichev, 1980) were used to analyze atmospheric processes. A Mann-Kendall Trend Test (Mann, 1945) was used to determine whether significant trends were present in various time series data. Specifically, annual air temperature at Bishkek meteorological station from 1981 to 2020 was analyzed.
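For readers unfamiliar with the Mann-Kendall test, a minimal sketch is given below. It uses the normal approximation without a correction for tied values, which is a simplification; the function name `mann_kendall` and the sample temperatures are hypothetical and are not the Bishkek record.

```python
import math


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test without tie correction.

    Returns (S, Z, p), where S is the sum of signs of all pairwise
    differences, Z the normal-approximation statistic, p the p-value.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p


# Hypothetical annual mean temperatures (degrees Celsius).
temps = [10.8, 11.1, 10.9, 11.4, 11.3, 11.8, 11.6, 12.0]
s, z, p = mann_kendall(temps)
```

A positive S with a small p indicates a significant increasing trend; for short series or series with many ties, an exact or tie-corrected variant would be preferable.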

Time series analysis
For PM2.5, PM10, NO, NO2, SO2, HCOH, and NH3, the number of days with daily averages exceeding the maximum permissible value (PDK) (Government decree on approval of the National report on the state of the environment of the Kyrgyz Republic for 2011-2014, 2016) was calculated from 2015 to 2020. Additionally, the number of days with temperature inversions in this period was assessed.
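The exceedance count can be illustrated with a short sketch, assuming sub-daily readings that are first averaged per calendar day. The function name `days_exceeding_limit`, the sample readings, and the limit value are hypothetical; actual PDK values are those given in Table 1.

```python
from collections import defaultdict
from datetime import datetime


def days_exceeding_limit(samples, daily_limit):
    """Count calendar days whose mean concentration exceeds daily_limit.

    samples: iterable of (datetime, concentration) pairs.
    """
    by_day = defaultdict(list)
    for timestamp, value in samples:
        by_day[timestamp.date()].append(value)
    return sum(
        1 for values in by_day.values()
        if sum(values) / len(values) > daily_limit
    )


# Hypothetical readings and a hypothetical limit, for illustration only.
samples = [
    (datetime(2020, 1, 1, 0, 0), 30.0),
    (datetime(2020, 1, 1, 0, 20), 50.0),
    (datetime(2020, 1, 2, 0, 0), 10.0),
    (datetime(2020, 1, 2, 0, 20), 20.0),
]
exceed_days = days_exceeding_limit(samples, daily_limit=35.0)
```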
To analyze the effect of synoptic situations on air quality in Bishkek, the frequency of combinations of synoptic processes was calculated. The main combinations and definitions are described by Isaev et al. (2020) and are shown in Table 2.

Implementation of Machine learning algorithms
Complete deterministic physicochemical models exist to predict air pollution. However, applying these models requires large amounts of initial input data on sources and emissions, which are not recorded in Bishkek. Therefore, it was decided to use data from one station in Bishkek that measured air pollutants and apply machine learning technologies to build a forecasting model.

Synoptic process code   Synoptic process type
1     South Caspian cyclone
2     Murghab cyclone
3     Upper Amu river cyclone
4     Wide discharge of warm air
5     Northwest invasion
6     Northern cold invasion
7     Wave activity
8     Sedentary cyclone over Central Asia
9     Southwestern periphery of the anticyclone
9a    Southeastern periphery of the anticyclone
9b    Southern periphery of the anticyclone
10    Western invasion
11    Summer thermal depression
12    Low gradient field of the high pressure
12a   Low gradient field of the low pressure
13    Warm sector of the front
13a   Pre-frontal position
14    Western cyclone
15    Diving cyclone
16    Cyclone over Central Asia, Cyclone over Kazakhstan

Lasso Regression (LaR), Linear Regression (LR), which are described in detail by Bolourani et al. (2021).

Forecasting system
When developing the air quality forecasting system in Bishkek, outputs from the Kyrgyzhydromet hydrodynamic atmospheric model were used as the forecast meteorological parameters. This model was adapted for the territory of Kyrgyzstan (Isaev et al., 2015, 2017).

Performance analysis
The frequency (P) of synoptic processes was calculated using:

$$P = \frac{m}{N} \times 100\%$$

where m is the number of occurrences of a given synoptic situation within a given observation period and N is the total number of all synoptic situations during this period.
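As an illustration of this calculation, the sketch below counts each synoptic code in a daily record and divides by the total number of recorded situations. Expressing P as a percentage is an assumption, and the function name `synoptic_frequency` and the sample codes are hypothetical.

```python
from collections import Counter


def synoptic_frequency(daily_codes):
    """Return {code: P}, with P = 100 * m / N for each synoptic code."""
    counts = Counter(daily_codes)   # m for each code
    total = len(daily_codes)        # N, all recorded situations
    return {code: 100.0 * m / total for code, m in counts.items()}


# Hypothetical daily synoptic codes, for illustration only.
codes = ["9", "9b", "9", "12", "9b", "9", "5", "9"]
freq = synoptic_frequency(codes)
```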

Climate Change and Air Quality in Bishkek
Based on the Mann-Kendall Trend Test, a significant (p = 0.03) increase in mean annual air temperature was observed from 1981 to 2020 at the Bishkek meteorological station (Fig. 2). These long-term data indicate that the average annual temperature in Bishkek is increasing, which affects the local atmospheric circulation.
We assess the impact of climate change by analyzing the increase in the number of days with temperature inversions in the year (Table 3) and for the heating period November-February (Table 4). The increase in the number of days with temperature inversions led to an increase in the number of days exceeding PDK for the year (Table 3) and for the period November-February (Table 4).
The number of days with inversions during the past 6 years is increasing (Table 3), more than doubling from 2016 to 2020. Additionally, the number of days exceeding the average daily allowable rates of various air pollution indicators has increased. However, in 2020, there was a decrease in days in which the average daily PDK was exceeded for NO2, NO, and NH3 (Table 3).

Year, followed by eight data columns (column headers not preserved):
2015   22    0   73   87   78    1    -    -
2016   23    0   96   87   97    0   79   73
2017   26    0   95   91   93    0  100   87
2018   44    0   92   86   92    2   54   52
2019   39    3   95   90   95   10    -   61
2020   43   18   90   90   85   12    -   93

The number of days on which air contaminants exceeded PDK levels is shown for the heating period from November to February from 2015 to 2020 (Table 4). These results indicate that approximately 80%-90% of the heating period exceeded PDK values, particularly for NO2, NO, and HCOH. Two explanations for this exceedance are: (1) temperature inversions typically occur in these months; and (2) this is the height of the heating season, when most fuel is burned. The residential neighbourhoods of Bishkek are heated, and the thermal power station is fired, mainly with coal, which is a major source of air pollution.

COVID-19 lockdown effects on air quality in Bishkek
As previously mentioned, in 2020 there was a decrease in days in which the average daily PDK was exceeded for NO2, NO, and NH3 (Table 3). Given that the main emission source for these pollutants is automobiles, this decline in 2020 is attributed to the lockdown during the COVID-19 pandemic. During this lockdown, the use of cars was limited and there were practically no cars in the city except for rescue services. The average daily concentrations of these atmospheric pollutants during the lockdown are compared with values in 2019 for the same period (Fig. 3). The lockdown was officially in force from March 22 to May 11, 2020. Based on daily values of air pollution indicators, there was a decrease in concentrations of CO, NO2, and PM2.5 during lockdown (Figs. 3(a), 3(c), and 3(e)). In Bishkek, on February 7 and 9, 2019, extreme values of PM2.5 were observed, with average daily values reaching 181-188 µg m-3. These days have been replaced in the graph with a monthly average value to analyze the trend of PM2.5 during the lockdown (Fig. 3).
Based on data from Kyrgyzhydromet, the temperature regime was similar during March to May in 2019 and 2020. Average monthly temperatures in Bishkek in 2019 were: March +10°C; April +13°C; and May +18°C, and in spring 2020 these average monthly temperatures were +8°C, +14°C, and +19°C, respectively. Hence, the emissions from private homes and the heating plant were similar in 2019 and 2020 from March through May. According to the air pollution data during lockdown (March 22-May 11, 2020), the average daily concentrations of CO, NO, NO2, SO2, and PM2.5 decreased by 64%, 1.5%, 75%, 24%, and 54%, respectively, compared to 2019 (Fig. 3). This also highlights the significant contribution of emissions from motor vehicles to CO, NO2, and PM2.5 concentrations.
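The percentage decreases quoted above compare mean concentrations over the same calendar window in two years. A minimal sketch of that arithmetic follows; the helper names and the daily values are hypothetical and do not reproduce the measured data.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def percent_decrease(baseline_mean, lockdown_mean):
    """Decrease of lockdown_mean relative to baseline_mean, in percent."""
    return 100.0 * (baseline_mean - lockdown_mean) / baseline_mean


# Hypothetical daily mean CO concentrations for the same calendar days.
co_2019 = [1.2, 1.0, 0.8]   # reference year
co_2020 = [0.5, 0.4, 0.3]   # lockdown year
decrease = percent_decrease(mean(co_2019), mean(co_2020))  # about 60
```

A negative result would indicate that the concentration rose rather than fell relative to the reference year.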

Analysis of Synoptic Processes
The frequency of occurrence of synoptic processes for 2015-2020 during the heating season (November-February) was calculated (Table 5). During these months, days with temperature inversions and synoptic situations were noted and recurrences of synoptic situations were calculated. The largest frequency of processes was associated with codes 9 and 9b, i.e., the anticyclone periphery, in which almost 50% of all temperature inversions occurred and where low gradient fields of high pressure are characteristic conditions for the formation of these temperature inversions. The combination of a "heat island" in this growing metropolis, the increase in average annual air temperature, and changes in climate and the general circulation of the atmosphere creates favorable conditions for the formation of inversion layers in the atmosphere. All these processes together contribute to the deterioration of air quality in Bishkek.

Performance Comparison of ML Approaches
To analyze the relationship amongst the atmospheric variables, Spearman's correlation coefficient was computed (Table 6).
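Spearman's coefficient is the Pearson correlation of the ranks of two variables. The sketch below shows the calculation for one pollutant-meteorology pair, with tied values assigned their average rank; the function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical daily values: air temperature (T2m) and PM2.5.
t2m = [-8.0, -3.0, 2.0, 9.0, 15.0]
pm25 = [120.0, 95.0, 60.0, 40.0, 22.0]
rho = spearman(t2m, pm25)  # strongly negative for this monotonic sample
```

Because it works on ranks, the coefficient captures monotonic but non-linear relationships, which suits pollutant concentrations that respond non-linearly to temperature.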
Table 6. Correlation coefficients for comparisons between various pollutants and atmospheric variables. Coefficients in bold are significant at the p-level 0.05. All atmospheric pollutants are rather strongly correlated with T2m and TS. Pollutants with the overall highest correlations with the range of atmospheric variables are PM2.5, CO, NO2, and SO2, implying that these pollutants originate from the same sources.

To build a system for predicting air pollution in Bishkek, ML approaches were evaluated to describe the relationships between air pollutants and meteorological parameters. ML algorithms can analyze incoming information and look for explicit and hidden patterns in these data; they therefore represent a powerful modeling approach that can reproduce highly complex dependencies. In the ML approach employed, for the parameters outlined in Table 7, 30% of the data were allocated for testing and 70% for training. The calculated evaluation metrics indicate that the best results were obtained for RFR with HPT and XgbR with HPT. Also noteworthy is that XgbR with HPT requires more computing time than RFR with HPT.
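The 70/30 hold-out evaluation can be sketched as follows. Several details are assumptions rather than the study's procedure: the split is random with a fixed seed, the metrics are RMSE and MAE (the text does not name the metrics used), and a mean-value baseline stands in for the RFR and XgbR models. The helper names and the synthetic data are hypothetical.

```python
import math
import random


def holdout_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root-mean-square error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Synthetic (T2m, PM2.5) pairs, for illustration only.
rows = [(float(t), 100.0 - 3.0 * t) for t in range(-10, 10)]
train, test = holdout_split(rows)

# Baseline stand-in for a trained model: predict the training mean.
baseline = sum(y for _, y in train) / len(train)
observed = [y for _, y in test]
predicted = [baseline] * len(test)
scores = {"rmse": rmse(observed, predicted), "mae": mae(observed, predicted)}
```

For time series such as these, a chronological split (training on earlier years, testing on later ones) is a common alternative that avoids leakage between adjacent days.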

Developing a 24 h Air Pollution Forecasting Model
Using the Random Forest Regressor (RFR) ML approach with Hyperparameter Tuning (HPT), we created a system for forecasting air pollution with a lead-time of 24 h. This new system has been introduced into the operational practice of Kyrgyzhydromet. A forecasting example for PM2.5 is shown in Fig. 4. Due to the limited computing resources of Kyrgyzhydromet, currently only PM2.5 is forecasted.

CONCLUSIONS
Air pollution is a typical problem accompanying rapid urbanization and Bishkek is a prime example of this problem. In recent years, urban air quality has deteriorated and the government has begun to pay special attention to this problem. Therefore, this study focused on practical solutions by analyzing the impacts of climate change and the COVID-19 lockdown on air quality in Bishkek, and by addressing the problem of forecasting.
We found a significant increase in the average annual temperature in Bishkek during the past 40 years, together with the related formation of heat islands and an increase in the number of days with temperature inversions. About 80%-90% of days with temperature inversions are associated with heating seasons and days with excess levels of PDK.
During the lockdown period, CO, NO, NO2, SO2, and PM2.5 decreased by 64%, 1.5%, 75%, 24%, and 54%, respectively, compared to concentrations of these pollutants in 2019. These results point to the significant contribution of motor vehicle emissions to air pollution.
An increase in synoptic processes favourable for the formation of temperature inversion layers is shown. Anticyclone and anticyclone periphery, in which almost 50% of all temperature inversions occurred, and low gradient fields of high pressure are characteristic conditions for the formation of these temperature inversions.
Of the ML models tested, RFR with HPT and XgbR with HPT performed best. Based on RFR with HPT, an operational air quality forecasting system was created in Bishkek and introduced into the standard practice of Kyrgyzhydromet.
Monitoring is also important for assessing the air quality situation and developing forecasts; such activities were limited in Bishkek until 2021. Notably, the service life of the 50 sensors is only 2 years, and it is necessary to increase the monitoring capacity of Kyrgyzhydromet. Additionally, Bishkek does not conduct vertical measurements of the atmospheric profile, and the temperature inversion estimates determined using measurements from neighbouring countries may not be accurate.
In the future, we plan to conduct numerical analyses using other ML and deep learning approaches to increase the accuracy of predictions. It is also advisable to resume vertical sounding of the atmosphere to calibrate existing hydrodynamic models of the atmosphere. Such measurements will lead to increased accuracy in air quality forecasts.