Analysis of the Effect of the Truck Strike and COVID-19 on the Concentration of NOx and O3 in the Metropolitan Region of the Vale do Paraiba, São Paulo, Brazil

The diurnal patterns of nitrogen oxides (NO, NO2) and ozone (O3), together with temperature, relative humidity, pressure, wind direction and speed and solar radiation, were studied from 2017 to 2020 within a period of 21 days in two towns in Paraiba Valley: São José dos Campos (SJC) and Guaratinguetá (GRT). In 2018, there was a truckers' strike in Brazil and in 2020 a partial lockdown was imposed in response to the coronavirus pandemic; in this study, Machine Learning techniques and a multivariate statistical analysis were used to compare these different periods. During both 2018 and 2020, there was a reduction in the NO and NO2 concentrations, particularly NO, which is a primary pollutant during peak hours of vehicular traffic; this was notably the case in 2018 owing to the truckers' strike. Applying the Tukey test to the NO, NO2 and O3 data showed a similarity in each element of the dataset on a decreasing scale, although the differences remain statistically significant. In the Principal Component Analysis (PCA), the first principal component for both towns over the entire study period explained around 42% of the data and captured the interconnections between the variables, with a strong positive influence of O3 concentrations, temperature (T), wind speed (WS) and solar radiation (SR). In addition, the Boruta algorithm revealed a considerable difference in the variables that influence O3 concentrations: NO2 and relative humidity were the most important variables for feature selection in GRT, whereas NO2 and global solar radiation were the most important in SJC.


INTRODUCTION
Brazil faces a number of regional environmental problems, including the concentration of industrial plants, high traffic density, wildfires and densely populated areas (Gonçalves et al., 2017; Souza et al., 2017; Souza et al., 2014; Souza et al., 2015). The indices for these factors in the South-East region, in particular the State of São Paulo, are the highest in the country (IBGE, 2019). The towns of São José dos Campos (SJC) and Guaratinguetá (GRT), located in the metropolitan region of Paraiba Valley (MRPV), were chosen for a case study.
According to the U.S. EPA, the presence of NOx in the atmosphere is the main indicator of anthropogenic sources of air pollution, such as motor vehicles and energy generators (U.S. EPA, 2018). On a mass basis, mobile sources account for 59.0 and 76.1% of the pollutants emitted per year in GRT and SJC, respectively (CETESB, 2018). Tropospheric O3, on the other hand, is a toxic secondary pollutant that is harmful to human and plant health and contributes to the greenhouse effect. It is formed as a result of photochemical reactions between NOx and volatile organic compounds (VOCs) in the presence of sunlight (Chiquetto et al., 2021; Seinfeld and Pandis, 2016).
Previous studies on changes in air quality and vehicular activity have reached different findings regarding their effects on pollutant concentrations. Decreases in the concentrations of pollutants such as NO, CO, BC, PM and O3 were observed during periods of reduced vehicle traffic in Spain, Italy, Israel and India (Sharma et al., 2020; Basagaña et al., 2018; Meinardi et al., 2008; Levy, 2013). Increases in O3 concentration were found during the COVID-19 lockdown periods in Brazil, Spain, Chile, Indonesia and India (Mahato et al., 2020; Nakada and Urban, 2020; Siciliano et al., 2020; Tobías et al., 2020; Morales-Solís et al., 2021; Rendana, 2021). A recent study by Naqvi et al. (2021) showed a moderate positive correlation between COVID-19 cases and mortality and NO2 concentrations, while O3 showed a weak to moderate positive correlation. This suggests that populations living in adverse conditions are more susceptible to infection and mortality due to COVID-19 (Coccia, 2021). Thus, emission patterns and concentrations of pollutants are complex, and further studies are needed to understand these relationships.
Atmospheric chemistry is a complex scientific area and the relationship between the variables involved is highly non-linear. Machine Learning and multivariate statistical techniques have been employed to discover potential hidden relationships in big data sets, as paired correlations and time series graphs do not provide visual or behavioral patterns. New spatio-temporal knowledge about the environment obtained from complex relationships can be used in the workflow for forecasting and monitoring atmospheric pollutants. This knowledge can also be used as input data for other processes and models in order to improve the accuracy of the results and to decrease the overall computational cost, for example by enabling streaming analyses (Schultz et al., 2021; Amuthadevi et al., 2021). Machine learning models have been used to understand the chemical composition and spatiotemporal characteristics of the variation of air pollutants (Xiao et al., 2020; Oduber et al., 2021; Govender and Sivakumar, 2020; Binaku and Schmeling, 2017; Khatri and Hayasaka, 2021). Interest in artificial intelligence and machine learning has been renewed, raising discussions about relevant applications to solve specific problems of data analysis, numerical modeling and post-processing, mainly due to the statistical and probabilistic approach that these methodologies provide (Schultz et al., 2021).
From May 21 to June 1, 2018, there was a national truck drivers' strike in Brazil, caused by demands by the SCC (truck drivers' union) for a reduction in fuel prices and improvements in working conditions for the sector. This industrial action paralyzed almost the entire circulation of heavy vehicles throughout Brazil. The year 2020 was characterized by the implementation of social distancing measures in response to the COVID-19 pandemic. In the State of São Paulo, where this study was carried out, the suspension of public services in commercial establishments began on March 16 and only began to be eased at the beginning of June; the adoption of these measures naturally led to a reduction in road traffic similar to that of 2018.
Our aim in this study is to focus on air quality by adopting a multivariate statistical approach as a means of analyzing meteorological and gaseous pollutant data between May and June, in each year from 2017 to 2020, with 2018 and 2020 being years with the reduction in vehicular traffic, due to the truck drivers' strike and the partial lockdown due to the pandemic caused by the COVID-19 virus, respectively.

Description of the Sites
The MRPV, where this study was carried out, is an important industrial and technological zone, which is a key factor in the growth of the economy in Brazil (contributing approximately 5% of the Brazilian GNP in 2016). It is located on the axis between the two largest cities of Brazil, São Paulo and Rio de Janeiro, and is thus densely populated.
Hourly averages of NO, NO2 and O3, T, SR, RH, WS and WD were obtained from May 21 to June 1 in the years 2017, 2018, 2019 and 2020, making a total of 48 days with 29,184 hourly averages, for the two towns under study (SJC and GRT).
The SJC station (23k 408858 7431443 UTM) is located in an "urban district" with an extensive commercial area, although most of it is still residential. The station is located 1000 m from the Presidente Dutra Highway, at a site with low impact of primary pollutants, as is the GRT station (23k 480385 7478395 UTM), which is located within the State University of São Paulo. The campus comprises a total area of 175 thousand m², with 14 thousand m² being built up and the rest a green area (Fig. 1).

Description of Local Meteorological Conditions
According to the Technical Bulletin of Synoptic Analysis of the National Institute for Space Research (INPE, http://tempo.cptec.inpe.br/boletimtecnico/pt), South America was under the influence of the South Pacific Subtropical High (HSPS), the South Atlantic Subtropical High (HSAS) and the Intertropical Convergence Zone (ITCZ) during the study period. The Subtropical Jet (JST) was also present in parts of the study period, with the exception of 2017, when it was not observed. The JST favored the passage of low convection cloud bands over part of Brazil, including the study region. The years 2018, 2019 and 2020 were also marked by the presence of anticyclonic circulations close to the study region, which hinder the formation of clouds and contribute to the low RH (on average 3% lower than in 2017), as can be seen in Table 1.

Dataset Treatment
The first step was to confirm that there was no rainfall in the dataset, which was also confirmed by the local synoptic situation of the periods. The daytime (approximately 6 AM to 6 PM) and nighttime (approximately 6 PM to 6 AM) data were handled in the same way.
After the collected data had been filtered and validated, their descriptive statistics were analyzed, recording the minimum, maximum, mean, median and quartiles, as well as the amount of missing data.
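The descriptive step can be illustrated with a minimal sketch using only the Python standard library. The function name `describe`, the quartile method and the sample values are assumptions made for illustration; they are not the authors' code or data.

```python
import math
import statistics

def describe(values):
    """Summarise one variable; None or NaN entries count as missing."""
    valid = [v for v in values if v is not None and not math.isnan(v)]
    # Quartiles with linear interpolation between data points (assumed method)
    q1, median, q3 = statistics.quantiles(valid, n=4, method="inclusive")
    return {
        "n_missing": len(values) - len(valid),
        "min": min(valid),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(valid),
        "q3": q3,
        "max": max(valid),
    }

# Hypothetical hourly O3 readings (µg m⁻³) with one gap
o3 = [12.0, 18.0, None, 35.0, 50.0, 41.0, 22.0, 15.0]
summary = describe(o3)
print(summary)
```

Counting the missing entries separately, rather than dropping them silently, keeps the data gaps visible before any later step removes incomplete rows.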
After an initial overview had been obtained by means of descriptive statistics, the Pearson correlation coefficient (Friendly, 2002) was calculated to determine the correlation between all the variables (Asuero et al., 2006). The results are displayed as correlation matrices. All the data were used, without removing outliers, so that the extreme values could be determined precisely.
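As an illustration of the correlation step, the sketch below computes the Pearson coefficient from its definition and builds a small correlation matrix with NumPy. The variable values are invented and the helper `pearson_r` is a hypothetical name, not part of the original analysis.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson coefficient: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical hourly values for three variables (not measured data)
o3 = [20.0, 35.0, 50.0, 42.0, 25.0]
sr = [100.0, 400.0, 700.0, 550.0, 150.0]   # solar radiation
no = [30.0, 18.0, 6.0, 10.0, 26.0]

data = np.array([o3, sr, no])      # one row per variable
corr_matrix = np.corrcoef(data)    # 3 x 3 matrix of pairwise coefficients
print(np.round(corr_matrix, 3))
```

In this toy example O3 rises with solar radiation and falls with NO, which mirrors the sign pattern discussed in the results.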
The Tukey HSD test is a useful tool for comparing observations across datasets: it tests the significance of differences between sample means. All pairwise differences are tested while the probability of making one or more Type I errors (also called false positives) is controlled (Härdle and Simar, 2015; Montgomery and George, 2002). It was applied between the two towns for the 3 pollutants and showed statistically significant differences between the datasets for each of the years, meaning that the differences between the GRT and SJC data are not random and that each set of observations has characteristics of its own.
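A minimal sketch of a Tukey HSD comparison is given below, assuming SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd`. The three groups are synthetic samples with invented means; the source does not state which software the authors used.

```python
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(0)
# Synthetic NO concentrations per year; means and spread are invented
no_2017 = rng.normal(30.0, 5.0, size=200)
no_2018 = rng.normal(12.0, 5.0, size=200)
no_2019 = rng.normal(30.0, 5.0, size=200)

result = tukey_hsd(no_2017, no_2018, no_2019)
ci = result.confidence_interval(confidence_level=0.95)

labels = ["2017", "2018", "2019"]
for i in range(3):
    for j in range(i + 1, 3):
        # A pair differs significantly when its interval excludes zero
        significant = not (ci.low[i, j] <= 0.0 <= ci.high[i, j])
        print(f"{labels[i]}-{labels[j]}: diff={result.statistic[i, j]:.2f}, "
              f"CI=[{ci.low[i, j]:.2f}, {ci.high[i, j]:.2f}], "
              f"significant={significant}")
```

Reading the intervals this way corresponds to reading a Tukey plot: a pair whose interval does not cross zero is significantly different at the family-wise 95% level.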
The Principal Component Analysis (PCA) reduces the complexity of high-dimensional datasets by projecting the variables geometrically onto a smaller number of dimensions, called principal components (PCs), chosen to retain the highest variance. It is also able to summarize the results and the relationships between the variables over the years (Lever et al., 2017). SJC and GRT can be described differently on the basis of their PCs.
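The projection described above can be sketched with NumPy as an eigen-decomposition of the correlation matrix of standardised variables. This is one common way to carry out PCA, assumed here for illustration; the data are synthetic and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X):
    """PCA on standardised columns via the correlation matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-scores per variable
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # returned in ascending order
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                               # observations projected on the PCs
    explained = eigvals / eigvals.sum()                # fraction of variance per PC
    return eigvals, eigvecs, scores, explained

rng = np.random.default_rng(1)
sr = rng.uniform(0.0, 800.0, size=300)                 # synthetic solar radiation
t = 15.0 + 0.015 * sr + rng.normal(0.0, 1.0, 300)      # temperature follows radiation
o3 = 10.0 + 0.05 * sr + rng.normal(0.0, 4.0, 300)      # ozone follows radiation
ws = rng.normal(2.0, 0.5, 300)                         # independent wind speed

eigvals, eigvecs, scores, explained = pca(np.column_stack([o3, t, sr, ws]))
print(np.round(explained, 3))
```

Because three of the four synthetic variables move together, the first component captures most of the variance, which is the kind of grouping the PCs are meant to reveal.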
Finally, the Boruta algorithm was used, which is a feature selection method built around the Random Forest methodology. Its objective is to assess the importance of the variables in the dataset with respect to the target variable, which in this study was O3, so that a selection of features can be made for use by a prediction algorithm, for example (Kursa and Rudnicki, 2010). Initially, the method duplicates the dataset and shuffles the values within each copied column, creating "shadow" variables that have no real relationship with the target. The method then repeatedly trains Random Forests on the extended dataset and compares the importance of each original variable with the highest importance reached by the shadow variables; variables that consistently outperform the shadows are confirmed as relevant, and it is thus possible to rank the importance of all the variables in the O3 formation.
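The shadow-variable comparison at the core of Boruta can be sketched as follows. Note the deliberate simplification: Boruta itself uses Random Forest importance scores, whereas this sketch swaps in the absolute Pearson correlation with the target as a stand-in so that it depends only on NumPy. The function name, the data and the number of rounds are assumptions.

```python
import numpy as np

def shadow_feature_hits(X, y, n_rounds=50, seed=0):
    """Count how often each feature beats the best shuffled (shadow) feature.

    Stand-in importance: |Pearson r| with the target. Boruta itself uses
    Random Forest importance; the swap keeps this sketch dependency-free.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def importance(col):
        return abs(np.corrcoef(col, y)[0, 1])

    real = np.array([importance(X[:, j]) for j in range(X.shape[1])])
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        # Shuffle each column independently to break any link with the target
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        best_shadow = max(importance(shadows[:, j]) for j in range(shadows.shape[1]))
        hits += real > best_shadow
    return hits

rng = np.random.default_rng(42)
n = 500
no2 = rng.normal(20.0, 5.0, n)
sr = rng.uniform(0.0, 800.0, n)
noise = rng.normal(0.0, 1.0, n)                                # unrelated variable
o3 = 40.0 - 1.0 * no2 + 0.03 * sr + rng.normal(0.0, 3.0, n)    # synthetic target

hits = shadow_feature_hits(np.column_stack([no2, sr, noise]), o3)
print(dict(zip(["NO2", "SR", "noise"], hits.tolist())))
```

Variables that really drive the synthetic target beat the best shadow in every round, while an unrelated variable does so only by chance; Boruta formalises this idea with a statistical test on the hit counts.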

RESULTS AND DISCUSSION
Descriptive statistics for SJC and GRT, with values for the lower and upper outliers, the first and third quartiles, and the median and mean of NO, NO2 and O3, can be analyzed from different perspectives, as shown in Fig. 2. Nitrogen oxides are characterized by distributions more strongly concentrated around lower median values for the years 2018 and 2020 in GRT and SJC, which can be regarded as a chemical response to the decrease in emissions in both periods, due to the truck drivers' strike and the partial lockdown imposed during the COVID-19 pandemic, respectively.
The years 2017 and 2019, on the other hand, present a more unequal distribution of values, that is, a greater spread of the variance around the mean. NO has a similar distribution for the years 2017 and 2019 in the violin chart for SJC and GRT, and similar characteristics for the years 2018 (the truck drivers' strike) and 2020 (the partial lockdown).
NO2 values vary for GRT and SJC. In the case of GRT, the distribution is approximately similar for each of the four years studied; they all have a median close to 10 µg m⁻³, but with well distributed data. For SJC, in comparison, the distribution profile is in general more restricted, with a median around 20 µg m⁻³ for all years. However, there is a clear disparity in 2020 compared to the other years, with a more pronounced volume of the distribution around the mean, which indicates a decrease in the values above it that is more pronounced than in 2018. This difference may have occurred because the truck drivers' strike brought a decrease in heavy vehicle traffic and later a gradual decrease in light vehicle traffic, in contrast to the lockdown, which brought traffic to an abrupt halt.
Fig. 2. Violin plot for O3, NO and NO2 for GRT and SJC. The plot shows the data distribution and gives a holistic view of outliers, averages and dispersion. All SJC distributions present a greater dispersion and consequently reach higher values. This may be due to the variation in the size of the city and, consequently, greater exposure to quantitative variation in values.
O3, however, had a median between 20 and 40 µg m⁻³ for GRT and between 25 and 50 µg m⁻³ for SJC. O3 also showed a more uniform distribution between the lower and upper outliers, with more accentuated bimodality for SJC, and did not show such marked differences as those mentioned above. Below, the Tukey test is performed to verify whether there is a statistically significant difference between these data, with particular attention to O3, which does not present a visual difference as marked as the other compounds.
The average hourly concentrations are shown in Fig. 3. In general, the behavior of NO and NO2 is remarkable, since there are two pronounced peaks at times characterized by heavy traffic, between 7 and 10 h and between 20 and 22 h. Between 11 and 16 h there is a decrease in NOx owing to the reduction in vehicular traffic and also because this is the time when these pollutants are consumed most in the formation of O3, which has higher concentrations between 12 and 16 h, with a maximum at 15 h as a result of its close relationship with solar radiation. This corroborates the behavior observed in the previously mentioned studies of several cities around the world, each one with its specific variations.
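A diurnal profile of this kind is obtained by averaging all readings that share the same hour of the day. The sketch below shows the idea with the Python standard library; the readings and the function name `diurnal_profile` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def diurnal_profile(records):
    """Average concentration for each hour of the day (0-23)."""
    by_hour = defaultdict(list)
    for hour, value in records:
        if value is not None:          # skip missing readings
            by_hour[hour].append(value)
    return {hour: fmean(vals) for hour, vals in sorted(by_hour.items())}

# Hypothetical (hour, O3 in µg m⁻³) readings from two days
records = [(3, 10.0), (9, 22.0), (15, 60.0), (21, 18.0),
           (3, 14.0), (9, 26.0), (15, 70.0), (21, None)]
profile = diurnal_profile(records)
peak_hour = max(profile, key=profile.get)
print(profile, peak_hour)
```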
Therefore, in Fig. 2 and Fig. 3 for 2018 (the truck drivers' strike) and 2020 (the partial lockdown), there is a reduction in NO and NO2 concentrations, particularly for NO. NO is a primary pollutant, emitted most heavily during peak periods of heavy traffic, and its reduction is more accentuated in 2018 because of the truckers' strike. An increase in O3 concentrations is also observed, due to the decrease in the concentrations of primary pollutants, which are the main pathways for its chemical consumption. Such a decrease in NO2 and increase in O3 was also observed in similar studies carried out during the period of COVID-19 lockdown (Dantas et al., 2021; Morales-Solís et al., 2021; Naqvi et al., 2021). In the study by Morales-Solís et al. (2021) of 16 cities in central and southern Chile, comparing the period between March and May 2020 with the corresponding months during 2017-2019, significant decreases were observed in the 4 cities where NO2 data were available, between -27% and -5% for this pollutant, while significant increases in O3, between 1% and 4%, were found in 4 of the 5 cities. Local meteorological variables did not show significant changes between the two periods for this work. For Brazil, a decrease of around 1% in the optical density of atmospheric aerosols was also observed (Naqvi et al., 2021).
The Tukey test was conducted with a confidence interval of 95%, as shown in Fig. 4, so that the concentrations of the key pollutants in the four years of study could be compared. In these graphs, the further the horizontal line comparing a pair of years lies from the dotted vertical line, the greater the difference between the datasets. In the case of O3, the biggest differences can be observed for the two towns between the years 2017-2018, followed by the years 2018-2019 and 2017-2020, which suggests that the events in 2018 (the truck drivers' strike) and 2020 (the partial lockdown due to COVID-19) had a statistically significant effect on the O3 concentrations. In the case of NO, the differences between the two towns were not so sharp (i.e., the differences in the coordinate axis), and only the pairs 2018-2019 and 2019-2020 can be considered different; again, each pair involves an atypical year. In the case of NO2, the two towns also behave in a similar way, with differences between the datasets for 2017-2020, 2018-2019 and 2019-2020, once again showing the significance that can be attached to the aforementioned events. An interesting observation for all the pollutants in the two towns is that the comparison between 2018 (the truck drivers' strike) and 2020 (the partial lockdown due to COVID-19), the years of the events under study, shows that the datasets are similar, on a decreasing/increasing scale of similarity: NO, NO2 and O3.
Fig. 4. Tukey plot with 95% family-wise confidence level for SJC and GRT. When the value zero lies within the interval for a pair of years, the difference between them is not significant. The years 2018 and 2020 have the highest number of pairs with significant values.
By grouping the entire dataset, including pollutants and meteorological data, it was possible to create a Pearson's correlation matrix coupled with a hierarchical cluster analysis (dendrogram) based on Euclidean distances (Berthold and Höppner, 2016), as shown in Fig. 5. In the case of both towns, positive correlations can be observed between T, O3 and SR and between NO and NO2, which is also corroborated by the dendrograms at the right of the matrices. This indicates that the O3 formation results from photochemistry and that there is little possibility that this pollutant has been transported over long distances. Ozone, on the other hand, shows negative correlations with RH, NO and NO2, and this once again corroborates the fact that the formation of ozone is the result of reactions of NOx with VOCs (not measured here) in the presence of sunlight. The wind direction has little influence on the formation of pollutants, which suggests there is little likelihood of NO being transported in SJC (correlation of 0.24).
We observed significant improvements in air quality, considering the reductions in the monitored air pollutants that are highly influenced by vehicular traffic (NO and NO2). Intense reductions in the concentration of air pollutants were found during 2018 (the truck drivers' strike) and 2020 (the partial lockdown due to COVID-19). During these periods, vehicle traffic decreased considerably in all areas analyzed, improving air quality.
Whereas the previous tests analyzed the complete dataset, a reduced dataset was considered for the Principal Component Analysis (PCA). Every observation (row) with missing data in any variable was removed. Out of a total of 1824 observations and 11 variables from each town, 1426 and 1445 observations remained for SJC and GRT, respectively. PCA was applied and the number of components was chosen according to the Kaiser Rule: 3, 2, 3 and 2 PCs were retained for 2017, 2018, 2019 and 2020, respectively, for SJC; for GRT, 3 PCs were retained for all the years. Together, the retained PCs were responsible for about 70% of the accumulated variance of the original data. The main loading values of the components are shown in Tables S1 to S4, in which the eigenvalue and the cumulative percentage of variance explained for each PC are also shown. Only loading values with an absolute value of 0.500 or more were interpreted.
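The selection steps in this paragraph (removing incomplete rows, retaining components by the Kaiser Rule and interpreting loadings of at least 0.5 in absolute value) can be sketched with NumPy. The data are synthetic, the function name `kaiser_rule` is hypothetical, and the loading definition used (eigenvector scaled by the square root of its eigenvalue) is a common convention assumed here rather than one stated in the text.

```python
import numpy as np

def kaiser_rule(X, loading_cutoff=0.5):
    """Drop incomplete rows, then keep PCs whose eigenvalue exceeds 1."""
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]             # listwise deletion
    corr = np.corrcoef(complete, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                               # Kaiser criterion
    # Loadings: eigenvectors scaled by sqrt(eigenvalue), i.e. variable-PC correlations
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    interpreted = np.abs(loadings) >= loading_cutoff
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return complete.shape[0], eigvals, loadings, interpreted, cumulative

rng = np.random.default_rng(7)
sr = rng.uniform(0.0, 800.0, 400)                      # synthetic solar radiation
t = 15.0 + 0.015 * sr + rng.normal(0.0, 1.0, 400)      # temperature follows radiation
no = rng.normal(20.0, 6.0, 400)
no2 = 5.0 + 0.8 * no + rng.normal(0.0, 2.0, 400)       # NO2 follows NO
X = np.column_stack([sr, t, no, no2])
X[::20, 0] = np.nan                                    # simulate 20 missing readings

n_rows, eigvals, loadings, interpreted, cumulative = kaiser_rule(X)
print(n_rows, np.round(eigvals, 2), np.round(cumulative, 2))
```

In this toy dataset two pairs of variables move together, so two eigenvalues exceed 1 and two components are retained.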
The years 2018 and 2020 are characterized by a decrease in NO concentrations of 70% in SJC and 58% in GRT, caused by the truckers' strike in 2018 and the partial lockdown imposed by the coronavirus pandemic in 2020, when compared with the average for the years 2017 and 2019, when there was neither a strike nor a pandemic. There was also a decrease of 20% in NO2 in SJC and 22% in GRT and an increase of 24% in O3 in SJC and 2% in GRT, since NO is a primary pollutant and NO2 is either primary or secondary, but mainly secondary. In a study by Morales-Solís et al. (2021), a 55% increase in NOx was evidenced during the COVID-19 pandemic lockdown period for the central and southern urban regions of Chile, as well as an increase of between 18 and 43% in O3 levels. Overall, the impacts of the strike on NO2 are more complex than on primary pollutants (CO and NO), as demonstrated by recent studies that investigated changes in pollutant emissions during the shutdowns caused by the COVID-19 pandemic (Kanniah et al., 2020; Muhammad et al., 2020; Nakada and Urban, 2020). Further studies must be carried out to understand this pollutant, taking account of the meteorology and atmospheric chemistry.
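Percentages of this kind are relative changes of the event-period mean against the average of the baseline years. The arithmetic is sketched below with hypothetical concentrations, chosen only to make the calculation easy to follow; they are not the measured values.

```python
from statistics import fmean

def percent_change(event_mean, baseline_means):
    """Relative change (%) of an event-period mean against the baseline average."""
    baseline = fmean(baseline_means)
    return 100.0 * (event_mean - baseline) / baseline

# Hypothetical mean NO concentrations (µg m⁻³), not the measured values
baseline_years = {"2017": 22.0, "2019": 18.0}
event_mean = 6.0                    # e.g. average over the 2018 and 2020 periods

change = percent_change(event_mean, baseline_years.values())
print(f"{change:.1f}%")
```

A negative result denotes a reduction; with these invented inputs the baseline average is 20 µg m⁻³ and the change is -70%.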
Increases in O3 were also found in a study conducted in Rio de Janeiro during the 2018 truckers' strike (Dantas et al., 2019). Similar situations were also encountered during the COVID-19 lockdown periods in Brazil, Spain and India, which altered vehicle emissions in a comparable way (Mahato et al., 2020; Nakada and Urban, 2020; Siciliano et al., 2020; Tobías et al., 2020). The processes for the formation and consumption of O3 are highly non-linear, since its concentrations are dependent on the availability of sunlight, the NOx/VOCs ratio and the speciation of VOCs (mixture of reactivity). Although we do not have access to VOCs data, NOx emissions clearly decreased, which certainly changed the NOx/VOCs ratio. Since the MRPV is a NOx-saturated environment, a reduction in the O3 concentration will depend on a decrease in the VOCs concentration. In such an environment, a reduction in NOx concentrations leads to an increase in O3 concentrations, as was seen for the town of SJC during the strike and pandemic periods, when there was a decrease in NOx and an increase in O3, as was also found by Morales-Solís et al. (2021) for the lockdown period in Chile, for example. Thus, the decrease in NO, which reacts more quickly with O3, together with the increased availability of sunlight, may have played a decisive role in the increase in O3 observed in the events (Alvim et al., 2018). This was also studied to explain the high O3 levels during the weekends in Rio de Janeiro (Geraldino et al., 2020).
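The coupling between NOx and O3 discussed here rests on a textbook cycle of three reactions (see, for example, Seinfeld and Pandis, 2016). The notation below is a summary added for illustration and does not come from the measurements of this study; M denotes a third body such as N2 or O2.

```latex
\begin{align}
\mathrm{NO_2} + h\nu &\rightarrow \mathrm{NO} + \mathrm{O(^3P)} \\
\mathrm{O(^3P)} + \mathrm{O_2} + \mathrm{M} &\rightarrow \mathrm{O_3} + \mathrm{M} \\
\mathrm{NO} + \mathrm{O_3} &\rightarrow \mathrm{NO_2} + \mathrm{O_2}
\end{align}
```

With less NO available, the third reaction removes less O3, which is consistent with the O3 increase described for SJC.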
In general, based on the PC analyses between the years and between both cities, it is possible to notice that all the first components show a strong positive influence of O3 concentrations, temperature (T), wind speed (WS) and solar radiation (SR), and a negative influence of the concentrations of NO2, NO and relative humidity (RH) (Fig. 6). WS has already been associated with greater dispersion and consequently a lower rate of COVID-19 infections in a previous study by Coccia (2021). That same study also demonstrated how the increase in the number of individuals infected by COVID-19 is directly related to high rates of pollution, which interact with viral agents.
The second component is marked by the positive influence of the concentrations of NO and NO2 and the negative influence of pressure, except for the years 2017 and 2020 in GRT only, which presented a positive/negative variation in wind speed (WS)/RH for 2017 and RH/WS for 2020. The third component is mainly marked by the influence of the wind direction (WD).
The first PC (PC1) for SJC (Fig. 6(a1) and Table S1) for 2017 explains 58.2% of the original data and mainly corresponds to O3 (0.897), T (0.839), RH (-0.824) and SR (0.715), as these variables have the highest loading values. The positive sign of the loading values suggests a similarity in behavior, with an inverse pattern with regard to humidity. As the main pathway for O3 production is NO2 photolysis, PC1 can be taken as a measure of the air processed during the middle of the day and in the afternoon. With regard to the hours when there are the highest concentrations of O3, this characteristic can be validated through the negative loading values of NO2 and NO (-0.452 and -0.584, respectively). From 12:00 to 16:00, the local temperatures increase (large positive loading); O3 is then formed soon after the emission of primary pollutants (VOCs, NOx), given the availability of sunlight, and NOx concentrations fall at this time owing to the formation of O3. The loading values for PC2 suggest there is a relationship between NO (0.583) and NO2 (0.652), which describes their behavior, while O3 (0.067) tends to zero and shows no influence. PC2 can thus be regarded as a description of the nighttime behavior of the atmosphere. It is also the time when O3 tends to have the lowest values because of the absence of sunlight; this prevents the photolysis process from taking place, and the atmospheric boundary layer is smaller and more stable, as well as having a greater pressure owing to lower temperatures. At night, O3 is no longer formed as there is a lack of sunlight; this is the time when O3 is consumed by reaction with NO2 to form NO3, which in turn forms N2O5 (Abdul-Wahab et al., 2005). The other components can be analyzed in a similar way and a more detailed description can be found in the Supplementary Material.
When all the years are considered, Fig. 7 shows the individual contribution made by the main variables to the first two dimensions. In the case of SJC, O3, RH, T and P make the main contribution, while in the case of GRT it is made by O3, RH, T, P and SR.
Given the importance of O3, as discussed earlier, the classification provided by Boruta's algorithm with respect to the other variables is shown in Fig. 8, with O3 as the target variable. SJC and GRT show significant differences, but NO2 is important for both. In the case of SJC, there is a notable level of importance for NO2 and SR of around 50; for GRT, RH is approximately 60, whereas NO2 is equivalent to SJC. This variation may be due to the different geographical locations of the towns, and the wind direction factor (not shown) may explain why there is a considerable difference.

CONCLUSIONS
This study allowed us to compare the behavior of the NO, NO2 and O3 pollutants during periods with and without strikes and periods with and without a pandemic. It was also possible to correlate the variables measured by two automatic air quality monitoring stations in the towns of SJC and GRT, which belong to the MRPV, and thus assist in understanding the interrelationship of the variables, in a synergy between the chemistry of the atmosphere and statistical tools.
During the years 2018 (the truck drivers' strike) and 2020 (the partial lockdown due to COVID-19), there was a reduction in NO concentrations of 70% and 58% for SJC and GRT, respectively. Owing to both (a) the truckers' strike in 2018 and (b) the partial lockdown imposed by the coronavirus pandemic in 2020, there was also a 20% reduction in NO2 in SJC and 22% in GRT and a 24% increase in O3 in SJC and 2% in GRT, when these periods are compared with the years 2017 and 2019. The meteorological variables showed little variation during the years of study that could have affected the studied concentrations.
The PCA provides an effective reduction in the amount of data and ensures a secure relationship between the variables. In the case of both towns during the 4 years under study, PC1 accounts for around 42% of the original data. PC1 is a measure of air masses during the day that are under the influence of O3 concentrations (with T, RH and SR in all cases), whereas PC2 is influenced by the concentrations of NO and NO2, without interference from the O3 loading values. PC2 also shows signs of the influence of P and residual relations with the WD, especially when describing the behavior of NO and NO2. This is because O3 tends towards zero in PC2. PC2 can thus be a description of nocturnal behavior, which is a time that has lower O3 values because of the lack of photolysis and higher P values (i.e., a lower absolute value). PC3 in general only showed a positive correlation with P or a negative correlation with WS. It did not show a close relationship with the other components, considering the other loading values. Some variations can be observed, but they are linked to specific processes that caused changes in the values, such as meteorological events. However, in general, the loading values for the towns of SJC and GRT were close and had significant patterns, with values clustered around a central average. Several relationships between pollutants and meteorology have been found; they are useful in understanding how the behavioral pattern of pollution evolves over time and make it possible to forecast the variability of air pollution.

Fig. 1. Location of the two studied sites, GRT and SJC, in the MRPV.

Fig. 3. Diurnal pattern of O3, NO and NO2 for GRT and SJC. O3, NO and NO2 have different diurnal patterns that are well established in the literature. The increase in O3 concentrations and the decrease in NO and NO2 can be observed for 2018 and 2020.

Fig. 5. Pearson correlation matrices for SJC and GRT during 2017-2020. Positive correlations were observed between O3, T and SR and between NO and NO2, and negative correlations between T and RH and between O3, NO and NO2.

Fig. 6. Main PCA dimensions for (a) 2017, (b) 2018, (c) 2019, and (d) 2020, for SJC and GRT. Positive and negative influences can be observed in the first three dimensions of the first PC. Both cities show similar behavior for all years.

Fig. 7. The main contributions made by the top 10 variables to dimensions 1 and 2 over all the years.

Fig. 8. Importance of all the variables for the O3 formation estimated by the Boruta algorithm, for the two cities. Accounting for the differences between the sites is important if computational models are to produce values closer to reality.

Table 1. Average values for the variables during 2017-2020 for SJC and GRT.