Fires Impact on Air Quality: Extensive Analysis of Practical Indicators

The work aimed to build the best possible machine learning model predicting the concentration of selected air pollutants and to evaluate model accuracy on days with fires in the vicinity of a station. An underestimation of a pollutant concentration that coincides with a fire would indicate the fire's impact on air quality. Over 1353 thousand cases of fires in Poland and data from 410 air quality stations were analyzed (2012–2021). Models predicting NO2, NOx, PM10, SO2, and BTEX (benzene, toluene, ethylbenzene, m,p-xylene, and o-xylene) concentrations were built for a carefully selected station (rural background; Borówiec). The accuracy of the models was checked as a function of distance from the fire and validated with the dispersion of plumes emitted during big landfill fires in 2018. The share of underpredicted concentrations of PM10, benzene, toluene, and ethylbenzene on days when a fire appeared within 30 km of the air quality station was significantly higher than the underprediction rate expected from model performance. The concentrations of PM10, SO2, and BTEX during plumes from landfill fires were underestimated for at least 30% of the duration of exposure. Hence, it was shown that the contribution of fires to air pollution can very likely be evaluated using machine learning model misclassification.

Waste fires are a significant problem in developing countries (Sharma et al., 2022) as well as in developed countries (Bihałowicz et al., 2021a). In Poland, the problem of landfill fires was growing in the 2010s, reaching 79 fires covering more than 301 m² or 1501 m³ in 2018 (KG PSP, 2020). The most generic pollutant, emitted in fires of all types of waste, was PM10 (Bihałowicz et al., 2021a), as presented in Bihałowicz et al. (2021c). Seventy-nine fires a year means that a landfill was burning more often than once every 5 days. It is estimated that the landfill fires emitted an additional 5.3 ± 0.6 Gg of PM10, i.e., 2.17% of the national emission. A summary of the emission of air pollutants in Poland is provided yearly in the Informative Inventory Reports (KOBIZE, 2020). Although these reports divide emissions into over 120 sectors, including even specialized industry contributions, fires are not mentioned as one of the sources of air pollution.
In this context, it is important to identify periods when air quality is affected by fires. One way of evaluating the contribution of sources of air pollution is air quality modeling, ranging from the source-receptor approach (Weil et al., 1992) through time-series modeling (Shahriar et al., 2021) and machine learning (ML) models of the concentration of PM2.5 (Bihałowicz, 2022; Shahriar et al., 2021), of the next-day concentration of PM10 based on random forest (Jeong et al., 2020), and of toluene, ethylbenzene, and xylene wet deposition based on extreme gradient boosting (Stojić et al., 2019), to source apportionment evaluation and air quality management (Mach et al., 2021; Thunis et al., 2019). The model of PM10 concentration (Jeong et al., 2020), based on four meteorological variables (air temperature, relative humidity, wind speed, boundary layer height), gave very good results except in the coastal zone, which implies that models should be developed locally. The model of PM2.5 (Shahriar et al., 2021) used four meteorological variables (relative humidity, rainfall, temperature, and wind speed) but also used air quality data (SO2, NOx) to predict PM2.5 concentrations. The model of toluene, ethylbenzene, and xylene (Stojić et al., 2019) was based on rain intensity, wind speed and direction, pressure, humidity, and temperature.
The quantitative contribution of fires to air pollution is difficult to evaluate; however, there is research in which its impact is quantified (Kollanus et al., 2016; Lehtomäki et al., 2018; Soares and Sofiev, 2014). It was shown that the problem of air pollution caused by landfill fires in Poland is significant on the sub-continental scale (Bihałowicz, 2021; Bihałowicz et al., 2021c), while the impact of all fires on air quality has not been evaluated. This study aims to identify whether there are periods when the impact of fires can be observed at air quality stations. Simply "looking for the peak" is not enough in the case of Poland, since the emission structure is complex (Bun et al., 2019; Pyta et al., 2020; Rogula-Kozłowska et al., 2014, 2013a, 2019a). The concentrations in air depend mainly on two sources: transport and individual heating. The impact of transport on air quality dominates in warm periods, while in cold periods the impact of individual heating overwhelms other sources (Dzikuć et al., 2017; Nidzgorska-Lencewicz and Czarnecka, 2015; Podstawczyńska and Chambers, 2019; Rogula-Kozłowska et al., 2013b; Rozbicka et al., 2020). Apart from the sources of origin, air quality also depends on the meteorological conditions, which determine plume transport and dilution (Jeong and Park, 2013; Megaritis et al., 2014; Oleniacz et al., 2016). The investigation of concentration trends should therefore account for the interference of other parameters. The point is to determine whether the concentration at a given moment is typical for a given set of conditions, even if it is not a "peak" concentration. In order to include meteorological variables, an ML approach based on Gradient Boosting Machine (GBM) models (Malohlava and Candel, 2020) in H2O AutoML (H2O.ai, 2021; LeDell and Poirier, 2020) and Python (H2O.ai, 2020; Stetsenko, 2020) is applied. The concentration of the modelled pollutant can be predicted using GBM. The impact of sources of pollution on concentration can be observed if the predictions of the GBM do not match the measured concentration. The impact of a fire can be identified if the GBM prediction is lower than the actual concentration and there is a spatiotemporal coincidence with the fire occurrence.

Selection of the Location
The current standard of fire inventory in Poland was fully introduced in the early 2010s. Between the 1st April and 30th October 2021 the database has 1,353,475 fire records, which are analyzed in this work. Each entry contains, among others, information about the size of the fire, its duration, and its location. Prior to the choice of the modelling location, this database was analyzed using the geographic information system (GIS) software QGIS (QGIS, 2021) together with the database of 410 air quality stations (GIOŚ, 2021). These stations are classified, according to EC (2008), into four types: urban, suburban, rural, and rural background. The area of representativeness of each type of station is not strictly defined and is a few km², some tens of km², some hundreds of km², and 1,000 to 10,000 km², respectively. Thus, for every station we created circular buffers, assuming a radius (i.e., distance from the station) of 1.5 km, 5 km, 17 km, and 50 km, respectively. In these buffers we analyzed the total number and temporal distribution of fires for each station. Since the air pollution model using machine learning has to be developed at an air quality station where the impact of some fires was investigated, we examined the dispersion of PM10 from all big and very big waste fires in 2018 (Bihałowicz et al., 2021c) simulated using HYSPLIT (Stein et al., 2015). These fires are about 15% of all fires in Poland (KG PSP, 2020), but if we consider only the biggest fires in Poland, this share increases to more than 23%. Thus, the sample of landfill fires in Poland is representative of all fires, and the dispersion from these fires can be used for the selection of the air quality station. The map of the polygons with the increase of PM10 concentrations was included in the GIS analysis. The aim was to find the stations which were exposed to a 1-h average increase in concentration of > 100 µg m⁻³ or > 10 µg m⁻³. We found that only 5 stations were in the range of impact of concentrations > 100 µg m⁻³, i.e., stations PL0529A Sosnowiec, Lubelska Street; PL0567A Katowice, Plebiscytowa/A4; PL0273A Skawina, os. Ogrody; PL0184A Częstochowa, Baczyńskiego Street; PL0008A Katowice, Kossutha; and PL0552A Trzebinia, os. Związku Walki Młodych. The locations of the stations, together with the map of dispersion of PM10 from landfill fires, are provided in Supplementary Fig. S1. All of these stations measure the air pollutants presented in Supplementary Table S2. One could expect that PL0567A is the best station for the evaluation of landfill fire impact on air quality, since according to Bihałowicz et al. (2021c) two fires induced a concentration increase of more than 100 µg m⁻³ and four fires of more than 10 µg m⁻³. Nonetheless, the station is located 10 m from the lanes of the European route E40, which at that point is a motorway used for international transit and local transport in the Silesia agglomeration; hence, it was not considered because of the high impact of transport emissions (GIOŚ, 2021). The remaining three stations are also urban stations (GIOŚ, 2021) according to the European Commission Directive on ambient air quality (EC, 2008), and are representative of a few km². Focusing on stations where the increase of concentration caused by landfill fires (a sample representing around a quarter of all fires in 2018) was higher than 10 µg m⁻³, we found that no stations were affected five times or more, 9 stations were in the range of impact four times each, 20 stations three times, 61 stations two times, and 87 stations once. Among the 9 stations impacted four times, all were urban except PL0573A Borówiec, Drapałka Street, which is a rural station, representative of some hundreds of km² (EC, 2008; GIOŚ, 2021). Apart from the concentrations of NOx, NO2, PM10, and SO2 (further in this work called basic air pollutants), additional pollutants are also measured at the station, i.e., benzene, toluene, ethylbenzene, m,p-xylene, and o-xylene (BTEX). The basic pollutants are emitted in many situations, such as individual heating, transport, and industrial production, while the emission of BTEX is not so common, and fires of waste are mentioned among the possible sources of BTEX in air (U.S. EPA, 1998, 1993a, 1993b). Therefore, the air quality station in Borówiec was chosen as the test location.
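The buffer analysis described above was carried out in QGIS; purely as an illustration, the same selection of fires within a station's buffer can be sketched in plain Python. The function names, the dictionary layout of stations and fires, and the use of the haversine great-circle distance are assumptions made for this sketch and are not taken from the study.

```python
import math

# Buffer radii (km) per station type, as given in the text above.
BUFFER_RADIUS_KM = {
    "urban": 1.5,
    "suburban": 5.0,
    "rural": 17.0,
    "rural background": 50.0,
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def fires_in_buffer(station, fires):
    """Return the fires that lie inside the circular buffer of a station.

    `station` and `fires` are hypothetical dictionaries with 'lat' and 'lon'
    keys (degrees); `station` also carries its 'type'.
    """
    radius = BUFFER_RADIUS_KM[station["type"]]
    return [
        fire for fire in fires
        if haversine_km(station["lat"], station["lon"],
                        fire["lat"], fire["lon"]) <= radius
    ]
```

Counting the returned records per station and grouping them by date would give the total number and temporal distribution of fires per buffer.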

Data
The data measured at the Borówiec station are available in the database of the Chief Inspectorate for Environmental Protection in Poland (GIOŚ) (GIOŚ, 2021). BTEX (except ethylbenzene and m,p-xylene) have been measured at Borówiec since 1st January 2016; for this reason, five years, from 1st Jan. 2016 to 31st Dec. 2020, are analyzed in this work. The air pollutant concentrations were measured with a 1-h averaging period; hence, for the purpose of the study, daily average concentrations were calculated. A daily average was calculated from the values provided by GIOŚ only if the daily time coverage was > 75%; otherwise, the value was left as NA (not available). The summary of parameters measured at the Borówiec station, with annual time coverages of the data, is presented in Table 1.
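The daily averaging rule can be illustrated with a minimal sketch. It assumes that one day is passed in as a list of 24 hourly values with `None` marking a missing hour; the function name and this data layout are hypothetical. With a strict "> 75%" threshold, at least 19 of the 24 hourly values must be present.

```python
def daily_average(hourly_values, min_coverage=0.75, hours_per_day=24):
    """Mean of the available hourly values, or None (NA) if coverage is too low.

    `hourly_values` holds one entry per hour; missing hours are None.
    The average is returned only if the share of available hours is
    strictly greater than `min_coverage`.
    """
    available = [v for v in hourly_values if v is not None]
    coverage = len(available) / hours_per_day
    if coverage > min_coverage:
        return sum(available) / len(available)
    return None
```

For example, a day with 18 valid hours (exactly 75%) yields NA, while a day with 19 valid hours yields an average.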
The meteorological data in Poland are gathered and analyzed by the Institute of Meteorology and Water Management - National Research Institute (IMGW-PIB) (IMGW-PIB, 2021). The system of meteorological observations is based on three types of stations: synoptic, climate, and precipitation. The highest number of parameters is measured at synoptic stations. The synoptic station closest to Borówiec is the Poznań synoptic station, located 20 km northwest. In this work, 37 daily parameters observed between 1st Jan. 2016 and 31st Dec. 2020 at the Poznań station were used; their descriptions and units are provided in Supplementary Table S1. The data about fires in Poland were taken from the National Headquarters of the State Fire Service of Poland, which keeps a record of all incidents (SWD-EWID) in which State Fire Service units take part (KG PSP, 2019). A more detailed description of the data structure and collected parameters is provided in Bihałowicz et al. (2021a). This database, as in the case of air quality, was also limited to the period from 1st Jan. 2016 to 31st Dec. 2020. Moreover, only data about fires bigger than 71 m² or 351 m³ were used in this work (according to SWD-EWID nomenclature, small fires were excluded). As a part of the validation of the models, the quantitative data about the spatial dispersion of PM10 emitted in the biggest fires of waste reported in SWD-EWID in 2018 were used (Bihałowicz et al., 2021c).

Machine Learning
The air pollution assessment can be treated as a classification problem. The concentrations, the target variables in Table 1, are continuous; hence, data points were divided into three classes. Since the concentrations are site-specific, we decided to divide days according to the local distribution of concentrations. We used a boxplot-like division, i.e., a day was assigned to Class A if the concentration on this day was smaller than the first quartile of the whole range of data, cA < Q1. Similarly, a day was assigned to Class B if the concentration was between the first and third quartile, Q1 ≤ cB < Q3, and to Class C if the concentration was equal to or higher than the third quartile, cC ≥ Q3. This procedure was repeated for each target variable separately (with a separate determination of quartiles for each pollutant). If the concentration of a pollutant was not measured on a given day, the concentration class was left as NA. The ML models were created for each target variable separately, i.e., the classification model of NOx concentration has only the 37 meteorological parameters as inputs; none of the air quality variables are input variables.
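The quartile-based class assignment can be sketched as follows. The function names are hypothetical, and the quartile estimator (`statistics.quantiles` with its default method) is an assumption of this sketch, since the text does not state which quartile definition was used.

```python
import statistics


def classify(concentration, q1, q3):
    """Assign a concentration to Class A, B, or C; None (NA) stays None."""
    if concentration is None:
        return None
    if concentration < q1:
        return "A"          # c < Q1
    if concentration < q3:
        return "B"          # Q1 <= c < Q3
    return "C"              # c >= Q3


def assign_classes(daily_concentrations):
    """Classify every day of one pollutant using that pollutant's own quartiles."""
    measured = [c for c in daily_concentrations if c is not None]
    # Assumption: default quartile method of the statistics module.
    q1, _median, q3 = statistics.quantiles(measured, n=4)
    return [classify(c, q1, q3) for c in daily_concentrations]
```

Calling `assign_classes` once per pollutant reproduces the separate determination of quartiles for each target variable.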
The choice of the ML package was preceded by tests and an evaluation of the possibilities of modeling air quality using Orange (Bihałowicz, 2022; Demšar et al., 2013), scikit-learn (Feurer et al., 2020; Pedregosa et al., 2011), and MLJAR (Płońska and Płoński, 2021). The H2O software package (H2O.ai, 2021) was chosen as the ML suite. H2O provides automated building of machine learning models (AutoML), which was used in this work. Based on the results of the tests with all the abovementioned software, the gradient boosting model for multinomial classification was chosen. Tree-based gradient boosting was used in this work (H2O.ai, 2022). The principle of the gradient boosting algorithm is to improve a classification tree by creating new trees which classify better at the points where the initial one was not performing well (Hastie et al., 2009). The metric assessing the quality of the models was the logarithmic loss function. For all models, the boosting had a learning rate (shrinkage parameter) of 0.1. To determine the best performing model parameters, trees of depth from 3 to 17 were analyzed. The boosting algorithm creates new trees based on the previously created trees; the number of trees was limited to 10000, but new trees were not created if the improvement from the next tree was smaller than the reciprocal of the square root of the number of non-empty values. Models created for each target variable used multinomial distributions. The models in this work were validated using 5-fold cross-validation. Gradient boosting models are successful in modeling atmosphere-related phenomena (Cheng et al., 2021; Liu et al., 2020, 2021).
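The stopping criterion described above can be illustrated with a simplified sketch. It is not the H2O implementation: the function names are hypothetical, the improvement is assumed here to be the relative decrease of logloss from one tree to the next, and any smoothing over several scoring rounds that a real library may apply is left out.

```python
import math


def stopping_tolerance(n_non_empty):
    """Reciprocal of the square root of the number of non-empty values."""
    return 1.0 / math.sqrt(n_non_empty)


def trees_until_stop(logloss_per_tree, n_non_empty, max_trees=10000):
    """Number of trees kept before boosting stops.

    `logloss_per_tree[k]` is the logloss of the ensemble with k + 1 trees.
    Boosting stops when the relative improvement brought by the next tree
    (an assumption of this sketch) falls below the tolerance, or when
    `max_trees` is reached.
    """
    tolerance = stopping_tolerance(n_non_empty)
    limit = min(len(logloss_per_tree), max_trees)
    for k in range(1, limit):
        previous, current = logloss_per_tree[k - 1], logloss_per_tree[k]
        relative_improvement = (previous - current) / previous
        if relative_improvement < tolerance:
            return k  # the (k + 1)-th tree did not improve enough
    return limit
```

For a data set with 1769 non-empty values, the tolerance under this rule is about 0.024, i.e., a new tree would have to improve the logloss by roughly 2.4%.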

General Description of the Results
The data about the meteorological conditions were used to build ML classification models of the concentration of air pollutants. AutoML searched for the best model according to the logarithmic loss value (logloss) (H2O.ai, 2021). The higher the value of the loss, and hence its logarithm, the worse the model. The procedure was conducted separately for all investigated pollutants. All models share the same variables: 37 meteorological parameters. The values of the parameters indicating the performance of the best-logloss models are provided in Table 2. The model with the lowest mean-per-class error was that of PM10, close to 0.25, while the logloss and RMSE were smallest for the classification of C6H6. The model with the poorest values of all three parameters was the classification of the concentration of o-xylene.
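For reference, the multinomial logloss used to rank the models is the mean negative logarithm of the probability that the model assigns to the true class. A minimal sketch follows; the function name and the input layout (one dictionary of class probabilities per day) are assumptions of this example.

```python
import math


def multinomial_logloss(true_classes, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    `true_classes` is a list of labels ('A', 'B', 'C');
    `predicted_probs` is a list of dicts mapping each label to its
    predicted probability. Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(true_classes, predicted_probs):
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_classes)
```

A model that always assigns probability close to 1 to the true class has a logloss close to 0, while confident wrong predictions are penalized heavily.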
In Fig. 1 the heatmap of the relative importance of variables, normalized to the most important variable for each model, is provided. The number of variables is limited so that the four most important variables are shown for each model. The heatmap including all variables is provided as Supplementary Fig. S1. Since the concentration of pollutants is related to wind and, more generally, plume dispersion, one could anticipate that FWS (average wind speed) has the highest importance. In fact, for NO2, NOx, C7H8, C8H10, and m,p-C8H10 the most important variable was FWS (Clarkson et al., 1996; Grundström et al., 2015; Kelessis et al., 2006; Kourtidis et al., 2002; Zhang et al., 2018). FWS was also an important variable for o-C8H10, although not as important as SUN. In the case of PM10 the most important variable was MIST, and for C6H6 it was TAVG. All these models have a few most important variables, while for SO2 many variables are relatively important, the most important being the temperature at ground level, TSOI. The models of C8H10, m,p-C8H10, and o-C8H10 have similar most important variables, while those of PM10, SO2, and C6H6 differ from both the BTEX and the nitrogen oxide models.

Basic Air Pollutants
Among the basic air pollutants, the best performing GBM classification model was that of PM10, followed by NO2 and NOx, with the SO2 model significantly worse. The performance of these models is expressed by the confusion matrices in Table 3. The concentration classes are ordered A < B < C, as described in Section 2.3. The error rate of the PM10 model is only 11/1769 = 0.62%, while for the remaining models it is of the order of a few percent, with the highest rate, 141/1808 = 7.8%, for the NOx model. The precision of the model in each class is the number of points correctly assigned to this class over the total number of points assigned to this class. The recall of the model in each class is one minus the error rate in this class.
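The quantities read from a confusion matrix can be computed as in the sketch below. It assumes that rows are actual classes and columns are predicted classes, both ordered A, B, C; with this orientation, entries below the diagonal are underestimations (predicted class lower than actual) and entries above it are overestimations. The function name and the returned dictionary layout are hypothetical.

```python
def confusion_metrics(matrix):
    """Error rate, per-class precision and recall, and under/over counts.

    `matrix[i][j]` is the number of days with actual class i and
    predicted class j (assumed orientation), classes ordered A < B < C.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precision, recall = [], []
    for k in range(n):
        assigned = sum(matrix[i][k] for i in range(n))  # predicted as class k
        actual = sum(matrix[k])                         # truly class k
        precision.append(matrix[k][k] / assigned if assigned else None)
        recall.append(matrix[k][k] / actual if actual else None)
    under = sum(matrix[i][j] for i in range(n) for j in range(n) if j < i)
    over = sum(matrix[i][j] for i in range(n) for j in range(n) if j > i)
    return {
        "error_rate": (total - correct) / total,
        "precision": precision,
        "recall": recall,
        "underestimated": under,
        "overestimated": over,
    }
```

The ratio of the underestimated to the overestimated count gives the baseline against which the misclassification on days with fires can be compared.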
In the models of NO2, NOx, and PM10 a similar set of variables is among the four most important ones. For NO2 they are FWS, TSOI, MIST, and RH; for NOx FWS, MIST, TSOI, and CLCV; and for PM10 MIST, FWS, TAVG, and RH, while for the model of SO2 they are TSOI, TAVG, IZL, and WVP. The partial dependence plots of the models' classes on these variables are provided in the Supplementary material. In the case of FWS (Supplementary Fig. S3(a) for NO2, Supplementary Fig. S4(a) for NOx), the highest mean response in the class of highest concentrations, C, is observed for FWS < 2 m s⁻¹, i.e., days without wind; the highest response in Class B (intermediate concentrations) is observed for FWS around 4 m s⁻¹, while Class A (lowest concentrations) has the highest mean responses for FWS > 6 m s⁻¹. The same pattern of dependence was observed for the PM10 model (Supplementary Fig. S4(b)); however, the slopes of the mean response plots are lower and the interval endpoints are not sharp. The reported dependences of NOx and PM10 concentrations on wind speed lead to similar conclusions (Grundström et al., 2015; Jones et al., 2010). The second variable is TSOI (Supplementary Fig. S3(b) for NO2, Supplementary Fig. S4(c) for NOx, Supplementary Fig. S6(a) for SO2). The dependence of the SO2 model has a slightly different character, with intervals for temperatures between -5°C and 15°C (SO2: < 15°C) and, for the class of high concentrations (C), for TSOI < -5°C (SO2: < 5°C). This can be explained by the increase in demand for individual heating on cold days. Residential buildings are located at ground level; hence, the most crucial variable for the thermal comfort of the people living there is not the average air temperature, TAVG, but the lowest ground temperature, TSOI. TAVG is not representative of individual heating, since the daily temperature can affect the average, while TSOI is the daily lowest reading of temperature at 5 cm above ground. The data about fuel consumption for individual heating could be evaluated based on household electricity consumption and natural gas consumption.
Electricity consumption is affected by the numerous electric devices which are also powered from this source, while household natural gas consumption is affected only by gas cookers, whose consumption is much lower than that of boilers (Aras and Aras, 2004; Błaszczak et al., 2016; Jedlikowski and Englart, 2018; Majewski et al., 2018). There is a negative correlation between ambient temperature and natural gas consumption (Hribar et al., 2019; Sabo et al., 2011; Szoplik, 2015), and a similar dependence of consumption on temperature is observed for solid fuels such as coal (Li et al., 2016). Thus, heating-induced emissions to the atmosphere are also negatively correlated with temperature. The dependence on the duration of mist is not so clear for NO2 and NOx (Supplementary Fig. S3(c) and Fig. S4(b), respectively): the highest responses for Classes A and B are for MIST < 7.5 h, while Class C has the highest mean response for MIST > 7.5 h. This can be interpreted as follows: for longer durations of mist Class C has higher responses than for shorter ones, while both Classes A and B are sensitive to shorter durations of mist. In the case of PM10 the variability of the mean response is much higher (Supplementary Fig. S5(a)), as MIST is the most important variable for the best performing model of basic air pollutants. As MIST → 24 h the mean response for Class C tends to 0.8, while the mean responses for Classes B and A tend to their maxima as MIST → 0 h, however, at MIST < 10 h for Class B and MIST < 5 h for Class A.
This dependence shows that the meteorological phenomenon classified at the station as mist is in fact a drop in visibility (Li et al., 2019; Tsai et al., 2007), which is proven to be dependent on the concentration of PM10 (Huang et al., 2016; Majewski et al., 2021, 2015). The partial dependence of the models on RH and CLCV shows very small changes in the mean response curve for each class. The curves are rather constant, without slopes and extrema; therefore, the impact of these variables cannot be explained in the same way as in the case of MIST.
To evaluate the impact of fires on the measurements at the Borówiec station, we compared the actual class of concentration at the station with the class predicted by the model on days with fires reported in SWD-EWID and plotted the result against the distance from the fire. The resulting plot shows the share of under- and overestimated concentrations on days with fires as a function of distance from the fire. The distance between the fire and Borówiec was calculated using QGIS. The plot is shown in Fig. 2. The share of underestimated and overestimated concentrations reaches the model performance at a distance of around 25 km for NO2, 18 km for NOx, 70 km for PM10, and 40 km for SO2. In the range where the station can be representative, the models for all pollutants except PM10 both under- and overestimate the concentration. This may suggest that the concentrations of NO2, NOx, and SO2 at this site are also affected by other factors which influence the concentrations more strongly than the emission from fires. Such sources can be emissions from transport and individual heating. Both NO and NO2 are produced during combustion, and one of the primary sources of these oxides is combustion in car engines (Klejnowski et al., 2013; Rogula-Kozłowska et al., 2019b). There were no cases of overestimation of PM10 concentration on days with fires in the neighborhood of the air quality station. This may suggest that the model of the concentration class of PM10, which according to its training confusion matrix is the best, properly captured the influence of fire as an underpredicted concentration. Hence, the next aim is to find whether there are other air pollutants that can be treated in the same way.
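The share of under- and overestimated days within a given distance from the station can be computed as in the following sketch. The record layout (one dictionary per day with a fire, holding the distance to the fire and the actual and predicted classes) and the function name are assumptions of this example; the cumulative "within a radius" selection is also an assumption about how the curves were built.

```python
CLASS_ORDER = {"A": 0, "B": 1, "C": 2}


def misclassification_shares(fire_days, max_distance_km):
    """Share of under- and overestimated days among fire days within a radius.

    Each element of `fire_days` is a hypothetical record with keys
    'distance_km', 'actual', and 'predicted' (classes 'A', 'B', 'C').
    Returns None if no fire day falls within the radius.
    """
    selected = [d for d in fire_days if d["distance_km"] <= max_distance_km]
    if not selected:
        return None
    under = sum(CLASS_ORDER[d["predicted"]] < CLASS_ORDER[d["actual"]]
                for d in selected)
    over = sum(CLASS_ORDER[d["predicted"]] > CLASS_ORDER[d["actual"]]
               for d in selected)
    n = len(selected)
    return {"n": n, "under": under / n, "over": over / n}
```

Evaluating the function for increasing radii and comparing the shares with the rates implied by the confusion matrix gives the kind of curve discussed above.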

BTEX
The GBM model of the concentration of benzene at the air quality station in Borówiec is well-performing. The confusion matrix shown in Table 4 (top left) indicates that the error rate of the model is only 8/1422 = 0.56%. The precision and recall of the model are also provided in Table 4. The classification of concentration mostly depends on four variables: TAVG, MIST, TSOI, and FWS. The main difference between the benzene model and the other models is the fact that the most important variable is TAVG. The dependence on the value of TAVG for particular classes is not as clear as the dependence on TSOI, which exhibits the same pattern related to individual heating as for basic air pollutants. The mean response for the class of highest concentrations, C, is highest for low TAVG, while the pattern of dependence for the class of intermediate concentrations, B, and low concentrations, A, varies at high TAVG values, as do the intervals where the mean response is highest. In general, the partial dependence (Supplementary Fig. S7) on MIST, TSOI, and FWS presents similar patterns for Classes A, B, and C as for basic air pollutants; only the values and intervals where the mean response is highest are slightly shifted. The point which makes the benzene model different from the models of NOx, NO2, and SO2 is the dependence on fire occurrence. The share of under- and overpredicted concentrations as a function of distance from fire occurrence is shown in Fig. 3(a). There are no concentrations overpredicted by the model on days with fires which appeared even 60 km from the air quality station. There are cases of underprediction of concentration, which are much more frequent than the underestimation rate expected from model performance. This may be evidence that fires in the representative area of the air quality station have an impact on the concentration of benzene. The analysis of all fires in the range where station representativeness is decreasing shows that, up to 160 km from the station, it is 22 times more probable that on a day with fire the concentration is underestimated than overestimated (Supplementary Fig. S12). The confusion matrix suggests that this ratio should be more than 3 times lower. We can conclude that fires cause errors in modeling concentrations of benzene at the Borówiec air quality station. This may reflect the fact that the possible sources of benzene in the atmosphere (Rogula-Kozłowska et al., 2020; U.S. EPA, 1998) are fires of tire shreds, biomass burning, agricultural film fires, and forest wildfires.
The toluene model confusion matrix (Table 4, top right) shows that this model is less precise than the model of benzene, especially in the classification of Class B. As in the other models, the four most important parameters of the toluene model include FWS, MIST, and TSOI, and this time also PATM (Supplementary Fig. S8). The patterns of FWS and MIST are similar to those already discussed, while the pattern of TSOI differs from all previously discussed cases: although the general pattern is roughly similar, the variability of the mean response is much lower and allows only for a rough identification of the intervals with the highest mean response for each class. The atmospheric pressure is a parameter related to other meteorological phenomena, like the passing of weather fronts, changes of temperature, and relative humidity. The partial dependence on PATM shows a complex relation (Supplementary Fig. S8).
One of the potential sources of toluene emission to the atmosphere is firefighting at solid waste disposal sites (Łukawski, 2019; Rogula-Kozłowska et al., 2020; U.S. EPA, 1993a); hence, we expected that fires in general also affect the concentration of toluene. The shares of under- and overestimated concentrations of toluene are shown in Fig. 3(b). The model underestimates the concentration if the fire appears within approx. 16-18 km of the air quality station. Additionally, overestimation does not take place if fires are within 12 km. The performance of the model suggests that the number of overestimations should exceed the number of underestimations, while in the range of almost 30 km the opposite is true. The analysis of fires up to 160 km from Borówiec shows that for days with fires up to 130 km away the ratio of underestimated to overestimated concentrations is greater than one, and up to 60 km it is significantly greater than one (Supplementary Fig. S13).
The performance of the ethylbenzene model was slightly worse than that of benzene and toluene according to Table 1; however, the values presented in Table 4 (bottom), i.e., the confusion matrix, precision, and recall, show that the performance is good. The model's confusion matrix shows that it tends to underestimate, since the sum above the diagonal of the confusion matrix is lower than the sum below the diagonal. The four most important variables for this model are FWS, SUN, WVP, and TMAX (Supplementary Fig. S9). This is the first discussed model where the dependence on the duration of sunshine plays an important role in the classification of concentration. Ethylbenzene photodegrades in the air (Banton, 2014), which explains the dependence on SUN. SUN is also one of the most important variables for the models of m,p-xylene and o-xylene. Xylene is also subject to photolysis into less toxic compounds (Kandyala et al., 2010). The model of ethylbenzene underpredicts the concentration for fires up to 12 km from the Borówiec air quality station (Fig. 3(b)). Among the sources of ethylbenzene in atmospheric air are incomplete combustion and burning of waste (Banton, 2014). Hence, it is justified to attribute the errors in the prediction of the ethylbenzene concentration class to changes in air quality induced by fire.
The models of xylene concentrations were the worst performing among the BTEX models. Moreover, there were no cases of underestimation of concentration on days with fires in the range of representativeness of the Borówiec station. A possible explanation is that the set of meteorological variables is not appropriate for predicting the concentration of xylene, and that other factors, which are not included in the models, govern the concentration of xylene.

Concentration during Landfill Fires -Validation
We validated the predictions of the models of PM10, C6H6, C7H8, and C8H10 based on the calculated dispersion of PM10 from landfill fires in Poland in 2018 (Bihałowicz et al., 2021c). The Borówiec air quality station was four times in the range of impact of a plume increasing the 1-h average concentration of PM10 by > 10 µg m⁻³: on 24 to 28 April, 25 to 31 May, 27 to 30 May, and 16 to 18 December 2018. The regions where the 1-h average concentration increased by more than 10 µg m⁻³ and the location of the Borówiec station are shown in Fig. 4. In the validation we wanted to include PM10, benzene, toluene, and ethylbenzene; nevertheless, during the landfill fire in Zgierz on 25-31 May, the Central Contamination Analysis Center of the Polish Armed Forces detected benzene, toluene, styrene, methylstyrene, sulfur dioxide, ethylbenzene, and o-xylene (Cygańczuk et al., 2020). For this reason, we also included the validation of SO2 and xylene concentrations in the analysis. The rates of days with underpredicted concentrations among the days when the plume from a given fire was at Borówiec are provided in Fig. 4. The fire on 24-28 April was a fire of general waste at a landfill. The plume with an increased 1-h average concentration of PM10 from this fire was at the Borówiec station for 5 days (Bihałowicz et al., 2021a, 2021c). The concentration classes of PM10 and SO2 were underestimated. The fire on 25-31 May induced underestimation of PM10, C6H6, C8H10, and m,p- and o-C8H10; however, the concentration class of PM10 was overestimated once during this fire. The third fire was on 27-30 May, a subset of the previous days; thus, the same misclassification of pollutants was observed. The last landfill fire which impacted Borówiec was in December, when the concentrations of pollutants are high due to individual heating demand (Błaszczak et al., 2016; Rogula-Kozłowska et al., 2014, 2013a, 2013b). This may be the reason why only the concentrations of C7H8 and m,p- and o-C8H10 were underestimated, each only once; moreover, the concentration of m,p-C8H10 was overestimated once. It is important that the concentrations of xylenes are underpredicted during every landfill fire. It is plausible that landfill fires are characterized by the underestimation of both m,p-xylene and o-xylene concentrations. All the rates presented in Fig. 4 significantly exceed the underestimation rates which would be caused by the performance of the models. The landfill fires are the reason for the underestimation of the concentration class on these days.
Fig. 4. The location of the Borówiec air quality station and the plumes of four landfill fires which caused an increase in the 1-h average concentration of PM10 by more than 10 µg m⁻³, with the underprediction rate on days when plumes from landfill fires were at the Borówiec air quality station.

CONCLUSIONS
The impact of fires on air quality can be evaluated with machine learning models. The GBM classification models based on 37 meteorological variables work well for NO2, NOx, PM10, and BTEX. More variables were introduced than in previous studies of ML modeling of air quality, and their appearance among the four most important variables shows that synoptic station data are important for air quality modeling. Although the precision of the models was around 0.95 in general, reaching 1.0 in some cases and 0.89 for the worst model, the models still misclassify some cases. The analysis of the distance and time of underestimation against fire incident data (spatiotemporal coincidence) showed that fires within the range of representativeness of the air quality station cause underestimations. For the models of PM10 and C6H6, there were no overestimations of the concentration class for fires within 30 km of the air quality station, while the models of C7H8 and C8H10 underestimated for fires up to 12 km from the station. The landfill fires, which constituted a quarter of the very big fires in Poland in 2018, emitted plumes which could be measured four times at the Borówiec air quality station. The concentrations of PM10 and SO2 were underestimated for more than 60% of the time when the plume from the first validated landfill fire could be measured. For the next three analyzed landfill fires, PM10 and C6H6 were underestimated during the non-winter fires, and the remaining BTEX were underestimated at least once during these fires. Since commonly measured air pollutants were shown to be subject to significant local impact as well, BTEX, especially benzene, seem to be a marker of the impact of fire on air quality. This work provides an example which can be adapted to other rural background stations. The approach has the potential to be applied worldwide to identify the impact of fires on air quality stations, after tuning to the local meteorological conditions. Such an adaptation requires integration with a database of fires and a validation set for which dispersion was evaluated.
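The spatiotemporal coincidence analysis summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used in this work: the station coordinates are approximate, the fire records and daily model outcomes are hypothetical, and great-circle (haversine) distance is assumed as the distance measure.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def fire_days_within(fires, station, radius_km):
    """Dates with at least one fire within radius_km of the station."""
    lat0, lon0 = station
    return {
        day
        for day, lat, lon in fires
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    }


def underestimation_share(outcomes, fire_days):
    """Share of fire days on which the model underestimated the class.

    outcomes maps a date to 'under', 'over', or 'ok'; returns None when
    no fire day has a model outcome.
    """
    days = [d for d in fire_days if d in outcomes]
    if not days:
        return None
    return sum(outcomes[d] == "under" for d in days) / len(days)


# Approximate station location (assumed) and hypothetical fire records.
station = (52.28, 17.07)
fires = [
    ("2018-05-25", 52.30, 17.10),  # a few km away
    ("2018-05-26", 53.50, 19.00),  # far outside the radius
    ("2018-05-27", 52.40, 17.20),  # roughly 16 km away
]
# Hypothetical daily outcomes of one pollutant model.
outcomes = {"2018-05-25": "under", "2018-05-26": "under", "2018-05-27": "ok"}

near = fire_days_within(fires, station, radius_km=30.0)
share = underestimation_share(outcomes, near)
print(sorted(near), share)
```

Repeating the last step for a sequence of radii gives the share of underestimated days as a function of distance from the station, which is the quantity plotted in Figs. 2 and 3.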

Fig. 1. The heatmap of variable importance in the best performing GBM models of concentration classification. The values are normalized for each model (row) and variables are in columns. The names of variables are FWS - average wind speed, MIST - mist duration, TSOI - minimal ground temperature, SUN - sun duration, TAVG - average temperature, PATM - atmospheric pressure, RH - relative humidity, WVP - water vapor pressure, TMAX - maximum temperature, TMIN - minimum temperature, CLCV - cloud cover, RAIN - precipitation, IZL - isotherm low, according to Supplementary Table S1.

Fig. 2. The share of under- and overestimated concentrations on days with fire as a function of distance from the Borówiec air quality station: top left (a) NO2, top right (b) NOx, bottom left (c) PM10, bottom right (d) SO2.

Fig. 3. The share of under- and overestimated concentrations on days with fire as a function of distance from the Borówiec air quality station: top left (a) benzene, top (b) toluene, middle (c) ethylbenzene, bottom.

Table 1. The air quality parameters measured at the Borówiec station which were used in this work, with the annual time coverage of data.

Table 2. The mean per class error, logarithmic loss value (logloss), and root mean square error (RMSE) for the best performing models of classification of pollutant concentration.

Table 3. Confusion matrices of the models with the classification error rate, recall, and precision for each class: NO2 model (top left), NOx model (top right), PM10 model (bottom left), SO2 model (bottom right).
highest responses for different classes overlap, which may be caused by the high relative importance of the first few variables. The highest mean response for low concentrations (A) can be observed for the highest temperatures, over 15°C (for SO2, 5°C), for intermediate concentrations (B)

Table 4. Confusion matrices of the models of C6H6 (top), C7H8 (middle), and C8H10 (bottom) with the classification error rate, recall, and precision for each class.