Spring Aerosol in the Urban Atmosphere of a Megacity: Analytical and Statistical Assessment for Source Impacts

In the complex situation with the plurality of emissions, the important research task of assessing the air quality and potential sources through aerosol composition analyses remains for Moscow’s megacity environment. The light absorption, PM10 mass concentration, aerosol composition, and meteorological parameters in this urban background were measured during spring 2017, a period characterized by significant changes in the air temperature, mass advection, and solar radiation. The organic and elemental carbon (OC and EC) and 76 organic compounds, e.g., alkanes, polycyclic aromatic hydrocarbons (PAHs), oxidized PAHs, hopanes, anhydrosugars, polyols, primary and secondary saccharides, and HULIS, as well as 13 ions, including K+, a marker of biomass burning, have been quantified to determine the carbonaceous and inorganic chemical profiles of the aerosol. The correlation between the absorption Ångström exponent (AAE) and the levoglucosan concentration reveals the relative contributions of agricultural fires and residential biomass burning (BB) nearby to the urban aerosol composition. Combining detailed analytical and statistical approaches, we have identified and analyzed the specific chemical compounds that most accurately represent the variability of the aerosol composition. Principal component analysis (PCA) highlights the main factors for marker species related to gasoline/diesel traffic, BB, biogenic activity, and secondary formation in the atmosphere. Distinguishing the BB-affected periods allows us to evaluate daily changes in the aerosol composition in relation to the transported air masses and detected fires in the areas surrounding Moscow.


INTRODUCTION
The physical and chemical processes of particulate matter (PM) accumulation in the atmosphere depend on the environment, anthropogenic and natural emissions, and the transport of air masses. To take action to reduce exposure to air pollution, it is essential to know the sources and activities contributing to the pollution level. Following a meta-analysis of recent studies, six major source categories for PM can be defined in the urban environment: traffic, resuspension of crustal/mineral dust, industrial sources, sea/road salt, biomass burning (BB), and atmospheric formation of secondary inorganic aerosols (Belis et al., 2013).
Chemical characterization of aerosol composition is a powerful tool for source apportionment (Gu et al., 2013; Chen et al., 2014; Li et al., 2018). However, due to the molecular complexity of PM, the chemical speciation of classes and individual organic compounds associated with the various sources is challenging. Identified chemical compounds can act as molecular markers in the polluted atmosphere (Gu et al., 2013; Li et al., 2018). This approach is based on composition profiles for urban fossil fuel (FF) sources including gasoline- and diesel-powered vehicles (Kotianová et al., 2008), motor oil (Schauer et al., 2002), and heating plants (Bi et al., 2008). The relevance of BB emissions from residential heating in European cities (Pietrogrande et al., 2011; Nava et al., 2015) and of long-range transport from wildfires (Diapouli et al., 2014) to urban aerosol composition is supported by organic and inorganic characterization. Humic-like substances (HULIS), which originate in secondary reactions of volatile organic compounds (VOCs) emitted from both anthropogenic and biogenic sources (Kuang et al., 2015), attract attention due to their effects on the radiation budget (Park and Yu, 2016).
Optical aerosol characteristics such as spectral absorption are useful for identifying the source impact on aerosol composition (Kirchstetter et al., 2004; Diapouli et al., 2017; Healy et al., 2017; Popovicheva et al., 2017). Biomass burning can produce light-absorbing aerosols that exhibit a much stronger spectral dependence than those from high-temperature combustion of fossil fuels, such as the diesel and gasoline burned in transport.
Numerous and changeable aerosol sources, as well as the diversity of molecular constituents in the organic and inorganic fractions of PM, call for chemometric tools to support the statistical analyses. Multivariate principal component analysis (PCA) was recently applied to analyze the contribution of FF and BB sources to urban particle-associated organic compounds (Pietrogrande et al., 2011; Li et al., 2018). However, due to the limited number of organic compounds characterized in those studies, a series of uncertainties remains and stimulates further development of the combined analytical and statistical approach. Extending PCA by including light absorption data, with the absorption Ångström exponent (AAE) as an optical marker for identifying the BB impact on aerosol composition, can be a promising approach.
The high population and wide range of activities in a megacity lead to a large-scale ecological impact, which requires assessment by advanced aerosol characterization and statistical approaches. At present, PM composition is being evaluated in many megacities (Cheng et al., 2016). Moscow is the largest megacity, and generally represents a typical urban area. In Moscow, mass concentrations of particulate matter with a diameter of less than 10 µm (PM10) were found to be comparable to or slightly higher than in other big European cities (around 20-30 µg m⁻³ as a yearly average) and lower than in Asian megacities (50-100 µg m⁻³) (Cheng et al., 2016). The temporal variability of PM2.5, the aerosol optical depth (AOD), temperature, humidity, and wind speed has been considered at the urban background site (Gubanova et al., 2018). However, no aerosol characterization approaches or source assessment analyses have yet been applied to Moscow urban aerosols. The only recorded case is an extreme smoke event, when a huge increase in PM and black carbon (BC) mass concentrations as well as in the AOD, carbon-containing compounds, and BB markers demonstrated the impact of wildfire smoke pollution on the urban environment (Chubarova et al., 2012; Popovicheva et al., 2014).
This work develops an advanced analytical and statistical approach for the comprehensive characterization of multicomponent aerosols in the Moscow urban background, in the complex situation of the plurality of emissions in a megacity. Spring is considered a multisource season, when the impact of both urban combustion sources and agricultural/residential fires in the surrounding areas could be significant in combination with biogenic activity. A large set of organic and ionic compounds characterizes the variability and source-dependent composition of the aerosols. Both optical and chemical markers as well as meteorological data are applied for the identification of BB-affected periods. PCA identifies a number of emission sources that impacted the aerosol composition in the Moscow background.

Measurement Campaign and Air Mass Transportation
The PM sampling setup was installed on the roof of the Meteorological Observatory of Moscow State University (MO MSU), located on the MSU campus, southwest of the Moscow city center (Fig. 1). MO MSU is located on Vorobievy Gory ("Sparrow Hills"), in a well-ventilated area with no industrial facilities or major roads nearby. Therefore, MO MSU can be characterized as an urban background site. Sampling of PM10 was performed on 47 mm quartz fiber filters (preheated at 600°C in advance) in 24 h intervals from 8 p.m. Measurements of meteorological parameters were performed by the MO MSU meteorological service. At the same time, PM10 mass concentrations were continuously monitored by the State Environmental Protection Institution, "Mosecomonitoring," using a tapered element oscillating microbalance (TEOM 1400a; Thermo Environmental Instruments Inc., USA).
Measurements were performed between 17 April and 25 May of 2017. For this period, backward air mass trajectories (BWTs) were generated using the NOAA HYbrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model of the Air Resources Laboratory (ARL) (Stein et al., 2015) with a coordinate resolution of 1° × 1° in latitude and longitude. The potential source areas were investigated using 24 h BWTs for air masses at a height of 500 m above sea level (a.s.l.). Fire information was obtained from the Fire Information for Resource Management System (FIRMS), operated by the NASA/GSFC Earth Science Data Information System (ESDIS). The daily fire maps were related to the computed trajectories. The number of fires that could affect the air masses was estimated as the sum of all fires located at the BWT points or in their neighborhood, no further than 0.5° from them in both latitude and longitude.
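The fire-counting rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `count_fires_near_trajectory`, the coordinates, and the choice to count each fire once (even when several trajectory points pass near it) are hypothetical and are not taken from the study; longitude wrap-around at ±180° is ignored.

```python
import numpy as np

def count_fires_near_trajectory(traj_lat, traj_lon, fire_lat, fire_lon, box_deg=0.5):
    """Count fires within +/- box_deg in both latitude and longitude
    of at least one trajectory point (hypothetical helper)."""
    # Shape (n_points, 1) for trajectory points, (1, n_fires) for fires,
    # so that broadcasting compares every point with every fire.
    t_lat = np.asarray(traj_lat, dtype=float)[:, None]
    t_lon = np.asarray(traj_lon, dtype=float)[:, None]
    f_lat = np.asarray(fire_lat, dtype=float)[None, :]
    f_lon = np.asarray(fire_lon, dtype=float)[None, :]
    near = (np.abs(t_lat - f_lat) <= box_deg) & (np.abs(t_lon - f_lon) <= box_deg)
    # Assumption: a fire is counted once, however many points pass close to it.
    return int(near.any(axis=0).sum())

# Hypothetical 3-point trajectory and 4 fire detections (degrees).
traj_lat = [55.7, 55.0, 54.0]
traj_lon = [37.5, 38.0, 39.0]
fire_lat = [55.2, 54.4, 50.0, 55.4]
fire_lon = [38.3, 38.6, 40.0, 37.8]

n_fires = count_fires_near_trajectory(traj_lat, traj_lon, fire_lat, fire_lon)
print(n_fires)
```

If fires should instead be summed per trajectory point, `near.sum()` would replace `near.any(axis=0).sum()`.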

Optical Measurements
An off-line examination of light attenuation on quartz filter samples was performed using the multiple-wavelength light transmission instrument (transmissometer) described in Kirchstetter et al. (2004) and Popovicheva et al. (2017). The intensity of light transmitted through the quartz filters was measured at seven wavelengths from the near-ultraviolet to the near-infrared spectral region. At least five different areas of each sample filter were analyzed in order to assess the possible heterogeneity of the sample. The light absorption was approximated by the light attenuation (ATN) caused by the particle deposit, defined as follows:
$$\mathrm{ATN} = \ln(I_0 / I) \qquad (1)$$
where $I_0$ and $I$ are the light intensities transmitted through the unexposed and exposed parts of the filter, respectively. The dependence of ATN on the wavelength λ was parameterized using a power law relationship:
$$\mathrm{ATN} = k \lambda^{-\mathrm{AAE}} \qquad (2)$$
where $k$ is a fitting constant and the AAE (absorption Ångström exponent) is a measure of the spectral variation of aerosol light absorption.
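As an illustration of the power-law parameterization, the AAE can be estimated by a straight-line fit in log-log space. The sketch below is not the authors' processing code: the function names, the seven wavelengths, and the synthetic attenuation values are hypothetical, and attenuation is assumed to be ln(I0/I).

```python
import numpy as np

def attenuation(i_unexposed, i_exposed):
    """Attenuation from transmitted intensities, assumed here as ln(I0 / I)."""
    return np.log(np.asarray(i_unexposed, float) / np.asarray(i_exposed, float))

def fit_aae(wavelengths_nm, atn):
    """Fit ATN = k * wavelength**(-AAE) by linear regression of
    ln(ATN) on ln(wavelength); returns (AAE, k)."""
    log_wl = np.log(np.asarray(wavelengths_nm, float))
    log_atn = np.log(np.asarray(atn, float))
    slope, intercept = np.polyfit(log_wl, log_atn, 1)
    return -slope, np.exp(intercept)

# Hypothetical wavelengths (nm) spanning near-UV to near-IR.
wavelengths = np.array([370.0, 470.0, 520.0, 590.0, 660.0, 880.0, 950.0])
# Synthetic attenuation spectrum generated with AAE = 1.4.
atn = 5.0e3 * wavelengths ** -1.4

aae, k = fit_aae(wavelengths, atn)
print(round(aae, 3))
```

With real filter data the fit would be applied to the mean attenuation over the analyzed filter areas, and the scatter between areas gives a rough uncertainty on the AAE.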

Analytical Chemistry Methods
A DRI 2001 Analyzer was used for the determination of the organic carbon (OC) and elemental carbon (EC), according to the IMPROVE_A protocol (Chow et al., 2007). In the first heating stage, OC was thermally desorbed from the filter under a flow of helium with controlled temperature ramps. In the second stage, 2% O₂ was introduced with the carrier gas, and the original EC component plus any remaining pyrolyzed OC formed during the first heating stage were oxidized and desorbed. Correction for pyrolytic charring of OC was made by continuously monitoring the filter reflectance with a He-Ne laser (632.8 nm) throughout the analysis cycle.
In situ derivatization thermal desorption-gas chromatography and time-of-flight mass spectrometry (IDTD-GC-TOFMS) was used, as developed by Orasche et al. (2011). Hydroxyl and carboxyl groups of compounds such as anhydrous sugars were the targets of the silylation derivatization procedure. Punches of quartz fiber filters were spiked with two isotope-labeled internal standard mixtures consisting of 1) fifteen deuterated PAHs, two deuterated oxy-PAHs and four deuterated alkanes and 2) ¹³C₆-levoglucosan (Omicron Biochemicals, USA), ¹³C₆-vanillin (Larodan, Sweden) and D₃₁-palmitic acid (CIL, USA). The analytical precision for all studied analytes was below 17% within a calibration range from 22 pg for abietic acid up to 342 ng for levoglucosan. Limits of quantification (LOQs) for PAHs were between 1 pg (for fluoranthene) and 8 pg; the LOQ for levoglucosan was 17 pg. The following organic species were quantified: thirteen PAHs, eight oxy-PAHs, eight hopanes, fifteen alkanes, three anhydrosugars, dehydroabietic acid methyl ester and dehydroabietic acid, 1,8-naphthalic anhydride and 1,8-naphthalaldehydic acid, and nicotine.
Humic-like-substance-bound OC (HULIS-C) was quantified by simultaneous analysis of two fractions with different (pH-dependent) solubility and thus with different molecular weight ranges (Limbeck et al., 2005; Feczko et al., 2007). Extracted samples were injected into the flow system of the total organic carbon measurement (FI-SPE-TOC). Eluent 1 (0.01 M HNO₃) forwarded the sample to the anion exchanger (SAX) micro-column. HULIS were eluted from the SAX by Eluent 2 (0.06 M NH₄OH) and were introduced into a catalytic oven (900°C). Humic acid (HA; Fluka, ~20% of ash) was used for calibration. The water-soluble HULIS-C (HULIS-C-WS) and alkali-soluble HULIS-C (HULIS-C-AS) were measured separately for each sample. The limit of detection (LOD) was 17 ng m⁻³, mainly due to the complex isolation procedure.

suppressor and a conductivity detector. Calibration for each ion was performed using external standards diluted from stock solutions (Merck). Filter blanks were measured and subtracted. Nitrite and nitrate (NO₂⁻ and NO₃⁻) were measured, but because nitric acid was used during the sample preparation, they were not quantified. The limits of detection varied between 2 and 20 ng m⁻³ and were lowest for lithium, sodium, and chloride and highest for organic ions, due to poor separation of those species.
Fourteen water-soluble saccharides (including polyols, anhydrosaccharides, primary and secondary saccharides) were analyzed from the effluents collected after the C18 extraction step of HULIS by high-performance anion exchange chromatography with pulsed amperometric detection (HPAE-PAD), as described by Iinuma et al. (2009). The effluents were injected directly into a Dionex ICS-3000 system equipped with a CarboPac MA1 column. Quantification was done using external standards (Fluka; Merck). The LOD was 1 ng m⁻³ for most compounds. Galactosan, sucrose, and cellobiose showed poorer resolution, resulting in higher LODs (3-6 ng m⁻³). Because the GC-MS method was more sensitive than HPAE-PAD, data for the three anhydrosugars were taken from GC-MS. On average, the results for anhydrosugars obtained with HPAE-PAD and GC-MS showed a 20% deviation for levoglucosan and 30% for mannosan.

Correlation and Principal Component Analyses
Principal component analysis is one of the most common multivariate explorative techniques; it decomposes the data matrix and concentrates the sources of variability into the first few principal components (PCs; Wold et al., 1987). By autoscaling, all data are mean-centered and then divided by the standard deviation of the variables. PCA yields score and loading plots, which allow an easy visualization of samples and variables. Hotelling analysis calculates the covariance ellipsoid corresponding to the 95% confidence level. Data outside of the ellipsoid are considered outliers and discarded from further analyses. The PCA correlation loading plot contains two ellipses that indicate how much variance is taken into account, that is, the quantity of information each variable can explain. The outer ellipse is the unit circle and indicates 100% explained variance. The inner ellipse indicates 50% explained variance.
PCA decomposes the data matrix of a set of observation variables (chemical compounds) into a set of values of linearly independent PCs, which represent the compounds with the largest explained variability. The PCs are subsequently interpreted as potential source factors. In our approach we partially reduce the total number of variables in the dataset of chemical compounds in order to minimize redundant information and to maintain only the information significant for the optimal description of the samples. For that purpose, we restrict the number of variables by choosing, for each chemical class, only the compounds that are representative of the entire dataset of that class and describe it in the most appropriate way. PCA is performed with all classes of organic and inorganic compounds found in spring aerosols, and we take into account the variables with the highest loadings. These are the most informative variables to explain the dataset. The considered variables have been evaluated together with the analytical validity of the corresponding chemical compounds, giving preference to the highest.
A number of statistical techniques, such as LDA (linear discriminant analysis) and PMF (positive matrix factorization), require an exact ratio between the number of samples (or observations) and variables. In the case of PCA there is no strict rule about the number of samples and variables to be taken into account. However, in order to show the differences in the calculation results, we vary the number of variables in the PCA calculations further.
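The autoscaling and correlation-loading steps can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the software used in the study; the function names and the random test matrix are hypothetical, and the Hotelling outlier test is not included. For autoscaled data the correlation loading of variable j on component k equals v_jk · s_k / sqrt(n − 1), where s_k is the singular value, so every variable lies within the unit circle, and a variable on the inner ellipse has 50% of its variance explained by the two plotted components.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_correlation_loadings(X, n_components=2):
    """PCA of autoscaled data via SVD.
    Returns scores, correlation loadings, the fraction of total variance
    per component, and the variance of each variable explained by the
    retained components."""
    Z = autoscale(X)
    n = Z.shape[0]
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    # Correlation between each autoscaled variable and each PC score.
    corr_loadings = Vt[:n_components].T * s[:n_components] / np.sqrt(n - 1)
    explained_per_variable = np.sum(corr_loadings ** 2, axis=1)
    return scores, corr_loadings, explained_ratio, explained_per_variable

# Synthetic data: two variables share a common factor, the third is noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(30, 1))
X = np.hstack([
    factor + 0.1 * rng.normal(size=(30, 1)),
    2.0 * factor + 0.1 * rng.normal(size=(30, 1)),
    rng.normal(size=(30, 1)),
])

scores, corr_loadings, explained_ratio, explained_per_variable = pca_correlation_loadings(X)
print(np.round(explained_per_variable, 2))
```

Variables whose explained variance falls between 0.5 and 1.0 would plot between the two ellipses of a correlation loading plot.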

Meteorology, Light Absorption, and Air Mass Transportation
April and May in the Moscow area are typical spring months, when temperature and biological and vegetation activity are increasing. From 29 April until 2 May the ambient temperature approached an abnormally high level for this season, +24°C (Fig. S1). During three days in April (25, 29, and 30) and twelve days in May (3, 5-7, 13-15, 18, 22, 24, 26, and 27) the solar shortwave radiation was very high and exceeded the 90% quantile of the maximum hourly doses in summer 2017, which is 24 MJ m⁻² (Fig. S1). During the indicated days we expect the most favorable conditions for photochemical aerosol generation and agricultural activities in the south of Russia and Moscow's surrounding areas.
The spectral dependence of the aerosol light attenuation is well approximated by a power law equation (Eq. (2)), providing an estimate of the absorption Ångström exponent, as shown in Fig. 2. Over the whole sampling period the AAE ranges from 1.03 to 1.95 (Fig. 3). From 18 to 22 April, air masses were transported from the north and passed numerous agricultural fires close to Moscow (Fig. S2), and an AAE higher than 1.4 was observed (Fig. 3). On 23 and 24 April the maximum precipitation occurred, up to 14 mm, and the AAE dropped. From 25 April it became higher again, correlating well with the large number of fires observed to the south of Moscow (Fig. S2). Fig. 4 shows the BWTs that arrived on 26 April and 2 May from areas of intensive agricultural fires in the south of Russia and western Europe. The high AAE values obtained from the light absorption measurements also support the effect of BB (Fig. 1). We should note that the period from 30 April to 5 May coincides with a holiday period in Russia, when warm weather stimulated intensive residential activity around Moscow, such as garden cleaning, grass burning, and barbecues. After 4 May the direction of air mass transport changed back to the Arctic region, accompanied by a temperature decrease (Fig. S1). On 5 May the AAE dropped to 0.97 and did not exceed 1.3 anymore.
Light absorption by particulates emitted from fossil fuel combustion sources exhibits a relatively weak wavelength dependence with an AAE close to 1, indicating that black carbon is the dominant absorbing aerosol component (Kirchstetter et al., 2004; Popovicheva et al., 2017). Biomass burning aerosols, in contrast, are distinguished by a stronger wavelength dependence, showing an AAE of about 2.5 for wood smoke (Kirchstetter et al., 2004). This indicates that brown carbon (BrC) associated with high-weight OC, in addition to BC, contributes significantly to the measured light absorption in the ultraviolet and visible spectral regions. Observations performed in the urban environment show a variation of the AAE that indicates the impact of BB on urban aerosol; an AAE level above 1.3 was suggested to identify the periods most affected by biomass burning, termed further "BB-affected periods."
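The threshold rule can be illustrated with a short sketch. The day labels and AAE values below are hypothetical, and only the 1.3 threshold comes from the text; the sketch uses the optical marker alone, whereas any final classification may also rely on chemical markers and fire data.

```python
AAE_THRESHOLD = 1.3  # AAE level suggested in the text for BB-affected periods

def split_by_aae(daily_aae, threshold=AAE_THRESHOLD):
    """Split days into those with AAE above the threshold and the rest.
    `daily_aae` maps a day label to its AAE value (hypothetical input format)."""
    bb_days = [day for day, aae in daily_aae.items() if aae > threshold]
    other_days = [day for day, aae in daily_aae.items() if aae <= threshold]
    return bb_days, other_days

# Hypothetical daily AAE values, not measured data.
daily_aae = {"day 1": 1.45, "day 2": 1.10, "day 3": 1.32, "day 4": 0.97}

bb_days, other_days = split_by_aae(daily_aae)
print(bb_days)
print(other_days)
```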

PM Mass and Aerosol Composition
Daily PM10 mass concentrations vary strongly, from the lowest value of 8 µg m⁻³ to the highest of 63 µg m⁻³, with an average of 22 ± 16 µg m⁻³ (Fig. 5). The maximum was recorded on 30 April due to air mass advection from the south and southeast regions of Russia. The main constituents of spring aerosols are OC and EC, followed by inorganic ions (Fig. 5). OC concentrations follow the PM10 trend, with an average OC concentration of 5.6 µg m⁻³. The average EC is 1.9 µg m⁻³, ranging from 0.3 to 3.9 µg m⁻³. In the urban environment the EC/TC ratio characterizes the impact of combustion emissions on the carbonaceous aerosol composition. During the sampling period of our study, it approaches 20%. In an urban environment dominated by vehicle emissions, OC can be comparable to EC (Samara et al., 2014). Increased OC/EC ratios between 4 and 15 are associated with wildfires in different regions of the world (Jung et al., 2016) due to the high OC content of BB emissions, especially from the smoldering combustion phase (Kalogridis et al., 2018). In our study the average OC/EC is 2.1 at low AAE and increases by 30% during days of higher AAE. The maximal OC/EC reaches 3.8 (Fig. 5) on the days of air mass transport from regions affected by agricultural fires. Moreover, meteorological conditions such as high temperature and solar radiation (Fig. S1) favor the formation of secondary organic aerosols through condensation of heavy hydrocarbons, acid-catalyzed reactions, and oxidation in smoke plumes (Alves et al., 2010), which influences the OC/EC ratio as well.
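The two carbon ratios used above follow from simple arithmetic on daily OC and EC concentrations. In the sketch below, total carbon (TC) is assumed to be OC + EC, and the three daily values are hypothetical, chosen only to span the range of ratios discussed in the text.

```python
import numpy as np

def carbon_ratios(oc, ec):
    """Return daily OC/EC and EC/TC, with TC assumed equal to OC + EC."""
    oc = np.asarray(oc, dtype=float)
    ec = np.asarray(ec, dtype=float)
    tc = oc + ec
    return oc / ec, ec / tc

# Hypothetical daily concentrations in µg m^-3 (not measured data).
oc = [4.0, 6.0, 9.5]
ec = [2.0, 2.0, 2.5]

oc_ec, ec_tc = carbon_ratios(oc, ec)
print(np.round(oc_ec, 2))
print(np.round(ec_tc, 2))
```

Note that the mean of daily ratios generally differs from the ratio of mean concentrations, so the averaging convention should be stated alongside any reported OC/EC value.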
Daily variability of individual organic compounds is shown in Fig. 6 (32abS) and 22R-17α(H),21β(H)-bishomohopane (32abR). They show the concentration minimum from 6 to 12 May. Maximum concentrations for n-alkanes with carbon number from 20 to 34 are found for the compounds C20 and C29. Nicotine is a marker of tobacco smoking; it shows relatively high variability.
Levoglucosan (Lev), which results from the thermal decomposition of cellulose, is widely used as a BB marker for both near-source smoke (Engling et al., 2006) and long-range transport from fires (Fu et al., 2010). During smoldering, Lev can reach up to 30% of the PM mass (Kalogridis et al., 2018). In our study Lev varies from 16 to 281 ng m⁻³, with maxima on 21 and 25-29 April and on 5 May, the same days when maxima of the fire numbers and the AAE are observed (Fig. S2). Thus, the AAE and Lev can act as optical and chemical markers, respectively, identifying the BB-affected days as 20, 21, 25, and 29 April and 1, 3, and 4 May, as shown in Fig. 3. Dehydroabietic acid (DEH-AC) is a tracer of coniferous wood burning (Simoneit et al., 1999); its concentration is not increased in the BB-affected period. The aerosol composition during the other days was influenced mostly by urban sources of fossil fuel combustion; we further term these days the "FF period." The Lev/Man ratio during BB-affected days is significantly higher than during the FF period (24 and 16, respectively). During the FF period the Lev concentration was lower than in the BB-affected period, in the range from 20 to 75 ng m⁻³, and once up to 120 ng m⁻³ (Fig. 3). Here we note that Lev concentrations larger than 30 ng m⁻³ are reported for European sites affected by wood burning in summer and winter (Yttri et al., 2011).
The time trends of anhydrosaccharides and polyols are related to each other and show three characteristic periods (Fig. 7). Until 25 April the concentrations of anhydrosugars showed strong variability with high peaks from day to day (up to a factor of 5), whereas polyols remained constantly low, indicating a clear BB influence. This situation changed in the period from 27 April until 5 May, when both anhydrosugars and polyols followed the same rising trend. This reflects the co-existence of BB and bioaerosols, which was already observed during springtime in other studies. It can be explained by the abnormal temperature increase on these days together with intensive biological activity (Fig. S1). Moreover, the co-existence of polyols and primary and secondary sugars with anhydrosugars may be explained by the suspension of soil and plant debris in the heat wave of wildfires (Medeiros et al., 2006). From 6 May, Lev and Man gradually decreased, pointing to a reduction of the BB influence. Polyols remained at the same level, indicating the spring impact of biological sources such as fungal spores, bacteria, and plant pollen, which seems plausible given the enhanced air temperature (Fig. S1).
The total ionic concentration is on average 2.2 µg m⁻³ and follows the trend of OC and EC concentrations (Fig. 5). The concentration of ions varied strongly from day to day (Fig. 6); the dominant ionic species is SO₄²⁻, followed by Ca²⁺, NH₄⁺, and Na⁺.
Analyses of the aerosol concentrations averaged over the FF and BB-affected periods show that the main difference is visible for OC concentrations, which are 2 µg m⁻³ higher during the days of BB impact (Fig. S4(a)). The absolute EC values are almost identical, but in relation to the sum of the main constituents, the contribution of EC is higher during the FF period. Most inorganic ions are slightly higher for BB-affected periods: the total sum concentration is 2.4 µg m⁻³ in comparison to 2.0 µg m⁻³. The Na⁺ concentration shows the highest relative increase during the BB-affected period (by 50%, or by 0.12 µg m⁻³ in terms of mass concentration). Additionally, K⁺ correlates with saccharides (e.g., trehalose and polyols) and with other ions, which indicates a multisource character of K⁺ (e.g., soil, biogenic aerosol, and BB). A spread between the BB-affected and FF periods can be observed for the median Lev, which ranges from 71% to 45%. The contribution of inositol is higher during the BB period (14%) than during the FF period (10%). Against the expectation that HULIS can act as a BB marker, higher HULIS-C concentrations are observed during the FF period. The share of HULIS-C in OC was much higher during the FF period (Fig. S4(b)), although recent studies (Kuang et al., 2015) reject a relation of HULIS to vehicle emissions.

Source-related Composition
At present, twenty-six gaseous and particulate pollutants (including PM10 and PM2.5) are continuously measured in the Moscow megacity (Mosecomonitoring, 2017). Around 630 industrial enterprises of various branches of mechanical engineering and metal working, power engineering, chemistry and petrochemistry, light and food industry, and production of construction materials (including 30,000 stationary emission sources) are registered. Around 50% of all pollutant emissions from industrial sources are emitted by enterprises producing and redistributing energy, gas, and water. Gaseous exhaust from automobile transport accounts for 95% of total city emissions (Mosecomonitoring, 2017). Aerosol emissions are produced by FF combustion (of gas in industry and energy production as well as of diesel and gasoline by transport systems). The absence of BB-based residential heating (due to city-wide central heating systems) distinguishes Moscow from other European megacities.
PAHs and their derivatives are produced by incomplete combustion of organic material, mostly arising from anthropogenic emissions and wildfires. PAHs vary significantly in the urban environment and are mainly influenced by transport-related gasoline, diesel, and fuel oil combustion as well as by domestic emissions (Ravindra et al., 2008; Ladino et al., 2018). Because PAHs differ in their stability against degradation, atmospheric aging can influence their ratios. BaP, sumB, IND, and BaA are prominent in emissions from non-traffic sources (natural gas combustion and domestic heating plants), while BgP is specific to gasoline and diesel exhausts (Pietrogrande et al., 2011). Several of these compounds have proven to be mutagenic and/or carcinogenic (Pedersen et al., 2009). The correlation matrix for PAHs highlights that PYR, FLU, BaA, CRY, sumB, ACE, BeP, BaP, IND, and BgP are well correlated with each other (R > 0.8), confirming their common source. PER, DiBaA, ANT, COR, and RET are not well correlated with the rest of the PAHs, which indicates different sources.
Oxy-PAHs are emitted from primary sources or formed by atmospheric reactions between PAHs and atmospheric oxidants such as NOx and O₃ (Bandowe et al., 2014; Lee et al., 2018). Naphthalic anhydride (NAP-AN) and xanthone indicate the increased reactivity of PAHs adsorbed on particles exposed to atmospheric oxidants (O₃, OH, and NO₂/O₃ mixture) (Ringuet et al., 2012), probably stimulated by photochemical activity leading to oxidation of PAHs. PCA showed that the major sources of oxy-PAHs and PAHs are vehicle emissions and biomass burning (Lee et al., 2018), which can be taken to represent secondary organic aerosol (SOA) formed by photochemical reactions in the atmosphere.
Hopanes are used as markers of traffic emissions because engines that use lubricating oil emit hopanes (Lin et al., 2010). n-Alkanes are emitted by both biogenic and anthropogenic sources, which can be differentiated according to the n-alkane carbon numbers (He et al., 2006; Ladji et al., 2009). The most abundant n-alkanes in traffic emissions are C20-C30, mainly from lubricating oil and fuel, with a maximum at C25 for gasoline-powered vehicles and at C20 for heavy-duty diesel trucks (Kotianová et al., 2008). The heavier n-alkanes (> C27) and sugars, in contrast, are mainly emitted from biogenic sources or through incomplete biomass combustion (Rissanen et al., 2006; Fu et al., 2009). A good correlation between anhydrosugars is observed in wood combustion as well as in ambient air affected by BB (Caseiro et al., 2009; Lee et al., 2010; Reche et al., 2012).
The amount of ions is driven not only by primary gaseous emissions, but also by changes in air temperature, humidity, and air mass transport processes (Stelson and Seinfeld, 1982). Organic ions (formate and lactate) can be associated with natural sources like soils and natural forest emissions but also with traffic (Khare et al., 1999, and literature cited therein). Sea salt (or de-icing salts) and soil mineral dust are represented by ions like Na⁺, Cl⁻, Ca²⁺, Mg²⁺, and K⁺. The latter is also known to be present in BB emissions and was used as a marker in different studies (e.g., Pachon et al., 2013, and literature cited therein).

Correlation Loading Analyses of PM Compounds
For a deeper understanding of the role of PM compounds as source markers in Moscow urban background in spring, the combined analytical and statistical approach is applied. For the analytical chemistry part, it is suggested to take into account the compounds with the lower analytical uncertainty in the quantification. Correlation analyses estimates how strong they correlate with others and highlight the highest correlations. The correlation matrix uses the Pearson's correlation coefficient as a measure of the linear relation between two variables. It is suggested to take in account the variables with the highest explained variance. Fig. S5 show the correlation loadings for all classes of quantified compounds. For PAHs it shows three different positions of variables: In the upper right part close to the outer ellipse, that represents 100% of the total explained variance. In the middle of the two ellipses, and out of the inner ellipse, that represents 50% of the total explained variance. From the first group of variables with the highest explained variance, the following analytes with lower analytical uncertainty have been chosen: BaP, BgP, SumB, IND and ACE. The same approach has been used to choose FLU from the second group and RET from the third group. As well COR has been chosen as marker for possible different sources.
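The screening of a chemical class by its Pearson correlation matrix can be sketched as follows. This is an illustration only: the variable names `var_a`, `var_b`, `var_c` and the synthetic concentrations are hypothetical, and the 0.8 threshold simply mirrors the R > 0.8 criterion quoted in the text.

```python
import numpy as np

def correlated_pairs(X, names, r_min=0.8):
    """Pearson correlation matrix of the columns of X and the list of
    variable pairs whose correlation coefficient exceeds r_min."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if R[i, j] > r_min:
                pairs.append((names[i], names[j], float(R[i, j])))
    return R, pairs

# Synthetic daily concentrations for three hypothetical compounds.
base = np.arange(10.0)
alternating = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
X = np.column_stack([
    base,                             # var_a
    2.0 * base + 0.5 * alternating,   # var_b, closely follows var_a
    alternating,                      # var_c, unrelated day-to-day pattern
])
names = ["var_a", "var_b", "var_c"]

R, pairs = correlated_pairs(X, names)
for first, second, r in pairs:
    print(first, second, round(r, 3))
```

Within each group of mutually correlated compounds, one representative with the lowest analytical uncertainty would then be retained for the final PCA.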
The correlation matrix for oxy-PAHs highlights well a correlation (R > 0.8) between 11HBaFone, 11HBbFone, 11HBcFone, and 7HBdeAone while a lower correlation factor is found for 9HFLUone, xanthone, ANQ-DO, CPPH-O, NAP-AC and NAP-AN. The correlation loading of o-PAHs shows a cluster of chemical compounds quite close to the outer ellipse that represents 100% of the total explained variance containing 11HBaFone, 11HBbFone, 11HBcFone, 7HBdeAone and NAP-AN. To perform the final PCA from the first group of variables with the highest explained variance, the analyte with the lower analytical uncertainty has been chosen: NAP-AN. By the same approach CPPH-O and xanthone have been chosen.
For hopanes, the correlation matrix shows a strong correlation (R > 0.8) among 29ab, 30ab, 31abS, and 31abR. The correlation loadings of PC1 vs. PC2 show three clusters containing 31abS, 31abR, 30ab, and 29ab; 32abR and 32abS; and Tm and Ts, respectively. 29ab and 30ab are chosen as representative compounds for hopanes because the concentrations of the others are negligible. They are also the analytes with the lower analytical uncertainty among the groups of variables with the highest explained variance, and are therefore used in the final PCA.
The correlation matrix shows a strong correlation (R > 0.7) among alkanes from C22 to C33, which is confirmed by the correlation loadings. A significant correlation is also found between C20 and C21 and between C21 and C22, whereas the correlation between C20 and C22 is low (R < 0.4). C34 does not correlate with any other alkane. To perform the final PCA, C20, C23, C24, C25, C27, C29, C32, and C33 have been chosen as the representative alkanes.
Correlations are observed among Aol, Mol, Fru, Glu, Tre, and Suc. Cellobiose (Cel), Eryth, xylitol (Xol), and galactose (Gse) are present in very low concentrations and do not correlate significantly with other sugars, apart from a significant relation between xylitol and fructose. The strong and multiple correlations between polyols and primary and secondary saccharides indicate that these sugars are related to each other. These sugars have been reported to originate from biological aerosols such as fungal spores, bacteria, and plant debris, as compiled by Caseiro et al. (2007), and are often observed in rising amounts during the spring season (Yttri et al., 2007). This points to a rising activity of biological sources, such as plant pollen, bacteria, and fungal spores, which may be due to the vicinity of the MO MSU location to the botanic garden of the MSU campus. To perform the final PCA, Xol, Aol, Mol, Glu, Suc, and Ino have been chosen. Lev, Man, and Gal lie quite distant in the correlation loading plot, indicating a clear difference in source. For the final PCA we take into account all BB markers, namely Lev, Man, Gal, and DEH-AC.
Because many values of both HULIS-C fractions are below the LOD, no significant correlation between HULIS-C and other compounds is recognized. In the correlation loading analysis, the HULIS-C fractions are situated inside the inner ellipse, which means that they do not contribute significantly to the explanation of the variance. For this reason, HULIS is not used in the final PCA.
The correlation loadings of ions show a first cluster composed of SO₄²⁻ and NH₄⁺ and a second cluster composed of K⁺, Mg²⁺, Ca²⁺, For, and Na⁺ (Fig. S5(g)). These two clusters are close to the outer ellipse, which represents 100% of the total explained variance. BO₃³⁻ and Lac lie between the two ellipses, at different positions in the plot. Cl⁻, Li⁺, and PO₄³⁻ lie on or inside the inner ellipse, which represents 50% of the total explained variance, also at different positions in the plot. To perform the final PCA, analytes with the lower analytical uncertainty have been chosen from the groups of variables with the highest explained variance: SO₄²⁻, Cl⁻, K⁺, Na⁺, Ca²⁺, For, and BO₃³⁻.
For a more detailed description of the correlation loadings for the quantified compounds, see the Supplementary Material.

PCA for Aerosol Composition and Source Identification
PCA is a common method used to describe the impact of source categories in the urban environment (Pietrogrande et al., 2011; Lee et al., 2018). The approach developed in this study formally considers not only the number of compounds but also their relations to sources, which differ among compounds of the same class. Thirty-four representative compounds of various classes of organics (PAHs, oxy-PAHs, hopanes, alkanes, and sugars including Lev, Man, and Gal) and seven inorganic ions have been chosen for the final PCA, based on their relevance as molecular markers and with the purpose of identifying the various sources. Nicotine is added as well.
Four significant PCs describe 78% of the explained variance (EV) contained in the dataset. The loading factors with associated variance are reported in Fig. 8(a). PC1 (EV = 50%) corresponds to loading Factor 1. It contains the variables of highest variability (absolute values of positive and negative loadings > 0.6): almost all PAHs except COR and RET, oxy-PAHs except CPPH-O, hopanes, alkanes except C20, ions except Na⁺, Cl⁻, and PO₄³⁻, and sugars except Man and Lev. The PC1 loading contains almost all the variables describing different emission sources (combustion, natural, and biological sources), all presenting strong positive loadings (≥ 0.5). PAHs in Factor 1 indicate emissions from various FF, natural gas combustion, and heating plants. C21-C28 are associated with a combination of 4-5-ring PAHs, which are predominant in the emissions from gasoline- and diesel-powered vehicles. C25 confirms the high impact of gasoline-powered vehicles, which is also supported by 29ab and 30ab from traffic emissions. This observation corresponds well to the present situation in the megacity, where gasoline direct injection vehicles make up an increasing fraction of the modern vehicle fleet (Zimmerman et al., 2016). Oxy-PAHs such as 1,8-naphthalic anhydride were found in southern European cities to be associated with secondary organic aerosol (SOA) formation (Alves et al., 2017). NAP-AN and xanthone in Factor 1 can therefore indicate the presence of SOA in the Moscow atmosphere.
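The explained variance attributed to each PC follows from the eigenvalues of the correlation matrix. The Python sketch below shows this under the assumption of autoscaled variables; the function names are hypothetical, and only the EV values in the usage lines (50%, 14%, 8%, and 6%) are taken from this study.

```python
import numpy as np


def pca_explained_variance(X):
    """Return the explained-variance ratio per PC and the loading vectors."""
    X = np.asarray(X, dtype=float)
    # Assumption: variables are autoscaled before PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2 / (Z.shape[0] - 1)
    return eigenvalues / eigenvalues.sum(), Vt


def n_components_for(ev_ratio, target):
    """Smallest number of leading PCs whose cumulative EV reaches target."""
    idx = int(np.searchsorted(np.cumsum(ev_ratio), target))
    return min(idx + 1, len(ev_ratio))


# EV of PC1-PC4 as reported in the text.
ev_reported = np.array([0.50, 0.14, 0.08, 0.06])
cumulative = np.cumsum(ev_reported)  # final entry is about 0.78
```

With these values, two components cover more than 60% of the variance, and all four are needed to exceed 75%.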
K⁺ in Factor 1 exhibits the highest variability, in contrast to the other BB markers (Lev, Man, Gal) and similar to DEH-AC, whose presence in the air is an indicator of forest fires and domestic burning of conifer wood. SO₄²⁻ and NH₄⁺ represent the secondary inorganic aerosols well. The high input of Ca²⁺ and Mg²⁺, as well as the presence of K⁺, hints that PC1 also describes crustal dust sources, such as soil abrasion or construction activities. All sugars show similarly high variability, pointing to a common biogenic emission source. This is also confirmed by the high loadings of the higher alkanes C29 and C31, which are known constituents of plant waxes (Kotianova et al., 2008). Thus, Factor 1 is interpreted as a mixed factor identifying the sources of traffic, biomass burning, secondary inorganic aerosol, and crustal dust, together with intensive photochemical and biogenic activity.
PC2, explaining 14% of the total variance, shows strong positive loadings (≥ 0.4) for most of the PAHs, oxy-PAHs, hopanes, and alkanes (Fig. 8(b)), explaining emissions from FF combustion. Loading Factor 2 presents the variables of highest variability: RET, CPPH-O, and C20. Retene is derived from the degradation of specific diterpenoids in conifer trees and is therefore a major product of the pyrolysis of conifer wood (Vicente et al., 2011). Eicosane (C20, an n-alkane with a lower carbon number) characterizes heavy-duty diesel truck emissions. In Moscow, diesel trucks are allowed to enter the city only after midnight, which presently limits their contribution to the total emissions. Thus, the markers in Factor 2 indicate the impacts of wood combustion/forest fires and heavy-duty transport. The strong negative loadings (≤ −0.4) of PC2 for all ions and sugars explain the natural source.
PC3 explains 8% of the total variance (Fig. 8(c)). Loading Factor 3 presents the main markers of BB emissions, Lev, Man, Gal, DEH-AC, and K⁺, all showing strong positive loadings (≥ 0.3). Other variables with the highest positive variabilities are benz[a]anthracene, nicotine, and coronene. COR is produced in the petroleum-refining process of hydrocracking and shows the highest negative variability. Na⁺ and Cl⁻ from aged marine aerosols, as well as Mg²⁺ and K⁺, show enhanced loadings in Factor 3. The molar ratio of Cl⁻ to Na⁺ is on average 0.3, indicating a strong depletion of Cl⁻ (Li et al., 2016), which is in accordance with the fact that Moscow lies around 1000 km from the sea coast. Some NaCl may originate from the resuspension of de-icing salt, which is used in the Moscow megacity, but the current factor does not point to the other components that would be expected along with the resuspension of soil. PO₄³⁻ is a specific tracer for plastics burning (Simoneit et al., 2005). Thus, Factor 3 indicates the impact of smoking, local waste burning, and strongly aged marine aerosols that are heavily influenced by transport over land. In PC4 (loading Factor 4), which explains only 6% of the variance, the highest variability is shown by C20, from sources whose impact is the lowest (Fig. 8(d)).
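The Cl⁻/Na⁺ molar ratio and the corresponding chloride depletion can be computed from ion mass concentrations, as sketched below in Python. The reference molar ratio of about 1.16 for fresh sea salt is an assumption derived from bulk seawater composition, not a value from this study, and the function names are hypothetical.

```python
M_CL = 35.45  # molar mass of Cl, g/mol
M_NA = 22.99  # molar mass of Na, g/mol

# Assumption: Cl-/Na+ molar ratio of fresh sea salt (bulk seawater), about 1.16.
SEA_SALT_CL_NA = 1.16


def molar_ratio_cl_na(cl_mass, na_mass):
    """Cl-/Na+ molar ratio from mass concentrations in the same units."""
    return (cl_mass / M_CL) / (na_mass / M_NA)


def chloride_depletion_percent(ratio, reference=SEA_SALT_CL_NA):
    """Share of chloride lost relative to the fresh sea-salt reference."""
    return 100.0 * (reference - ratio) / reference


# Average molar ratio reported in the text.
depletion = chloride_depletion_percent(0.3)
```

Under this assumed reference, a molar ratio of 0.3 corresponds to a loss of roughly three quarters of the sea-salt chloride.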
The scores plot of PC1 (EV = 50%) vs. PC3 (EV = 8%) shows the variation of the aerosol chemistry during the sampling period (Fig. 9). 29 April and 1 May present the highest concentrations of all chemical compounds, confirming the strong impact of various sources during holidays. The days of 20, 21, 25, and 29 April and 1, 3, and 4 May, identified by both the AAE and the Lev marker, stand out with positive PC3 scores, clearly indicating the similarity of the aerosol composition during the BB-affected periods.
Since some statistical techniques assume that PCA is performed with fewer variables than observations, we carried out additional calculations with a reduced number of variables; the results are given in the Supplementary Material.
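One reason for this requirement can be seen from the rank of the data matrix: after mean-centering, n observations of p variables yield at most min(n − 1, p) components with nonzero variance. The Python sketch below illustrates this, taking the final PCA as 42 variables (34 organic compounds, seven ions, and nicotine) and a hypothetical number of 20 samples; the sample count and the random data are assumptions for illustration only.

```python
import numpy as np


def max_nonzero_components(n_observations, n_variables):
    """Upper bound on PCs with nonzero variance after mean-centering."""
    return min(n_observations - 1, n_variables)


# Hypothetical dimensions: 20 samples, 42 variables (random placeholder data).
rng = np.random.default_rng(0)
X_wide = rng.normal(size=(20, 42))
Z_wide = (X_wide - X_wide.mean(axis=0)) / X_wide.std(axis=0, ddof=1)

# Rank of the autoscaled matrix equals the number of nonzero-variance PCs.
rank_wide = np.linalg.matrix_rank(Z_wide)
bound = max_nonzero_components(20, 42)
```

In this wide case the correlation matrix is singular, so reducing the number of variables below the number of observations restores a full-rank problem.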

CONCLUSIONS
A comprehensive approach has been developed for air quality assessment in the environment of a megacity. First, we characterize the aerosol composition in the megacity of Moscow using data on the carbon content, organic compounds, BB molecular tracers, and inorganic ions obtained from April till May, a season exhibiting significant changes in the air temperature, mass advection, and biological activity. The OC, EC, and ions accounted for a large percentage of the total PM10 concentrations. A wide range of optical and analytical tools is applied to quantify organic and inorganic compounds in the aerosol. The optical and chemical markers of the absorption Ångström exponent (AAE) and the levoglucosan concentration, which are strongly correlated, indicate the relative contributions of agricultural fires and residential BB near the city to the urban aerosol composition.
The aerosol compounds of the lowest analytical uncertainty and the highest explained variance are the preferred variables in our suggested analytical approach, which enhances the data quality from an analytical point of view in addition to indicating the applicability of principal component analysis (PCA). Such an approach permits us to select variables that describe the impact of various sources in Moscow during spring. Statistical analysis of the correlation loadings reveals a number of compounds that represent the aerosol in terms of chemical composition and suggest different emission sources. PCA highlights possible emissions from gas combustion, gasoline-powered and heavy-duty transport, agricultural and residential fires, and biogenic activity. It also emphasizes the formation of secondary organic and inorganic aerosol and the occurrence of photochemical processes during periods of increased biogenic activity. The daily changes in the chemistry, which is mostly affected by air masses transported from agricultural fires in southern Russia, in addition to residential activity, are prominent. The computed PCA factors, distinguishable by their unique loadings, are interpreted as marker species that represent different sources, thus providing one step in the process of source apportionment for a megacity.

ACKNOWLEDGEMENT
Analyses of meteorological and chemical data were performed with the support of the Russian Science Foundation (RSF), Project No. 18-17-00149. HICE (www.hice-vi.eu) is acknowledged for the analytical measurements. OP acknowledges support from RSF Project No. 19-77-30004 for data interpretation related to source apportionment.

SUPPLEMENTARY MATERIAL
Supplementary data associated with this article can be found in the online version at http://www.aaqr.org.