Assessing Regional Model Predictions of Wintertime SOA from Aromatic Compounds and Monoterpenes with Precursor-specific Tracers

The Community Multiscale Air Quality (CMAQ) model, with modifications to track precursor-specific SOA, was applied to model SOA formation from aromatic compounds and monoterpenes in Shanghai in November 2018. The modeled total aromatic SOA showed a strong correlation with measured 2,3-dihydroxy-4-oxopentanoic acid (DHOPA) concentrations in the ambient aerosols (R > 0.5 for hourly data and R > 0.75 for daily average data). The ratio of observed DHOPA to modeled aromatic SOA with all components included is around (0.5–1.6) × 10⁻³, lower than the commonly used ratio of 4 × 10⁻³ determined for toluene in a series of smog chamber experiments. This suggests that aromatic SOA could be underestimated when directly using the chamber-derived ratios. The predicted monoterpene SOA shows a stronger correlation with the sum of two α-pinene tracers (α-pinT), pinic acid and 3-MBTCA, with R > 0.6 and R > 0.8 for hourly and daily data, respectively. The α-pinT to modeled monoterpene SOA ratios are 0.13–0.25, which generally match the ratio of 0.168 ± 0.081 reported in chamber studies. However, since the current model does not treat α-pinene and its SOA explicitly, future modeling studies should include a more detailed treatment of monoterpene emissions and reactions to predict SOA from these important precursors and compare with the ambient precursor-specific SOA tracers.


INTRODUCTION
Carbonaceous aerosols generally account for 20-50% of total ambient aerosols globally (Novakov et al., 1997; Kanakidou et al., 2005; Putaud et al., 2010; Contini et al., 2018). Cao et al. (2007) reported that elemental carbon (EC) and organic matter combined accounted for 39-44% of PM2.5 in 14 Chinese cities, and in Shanghai, 40% of the PM2.5 was carbonaceous aerosol (Ye et al., 2003). Carbonaceous aerosols can have significant impacts on atmospheric visibility (Hand et al., 2011), regional and global climate (Goldstein et al., 2009), and human health (Highwood and Kinnersley, 2006). While EC aerosols are directly emitted from fuel combustion sources, organic aerosols are generated from direct emissions (i.e., primary organic aerosol, or POA) and from gas-to-particle partitioning of the semi-volatile oxidation products of parent volatile organic compounds (VOCs). Secondary organic aerosol (SOA) contributions to total organic aerosol loading vary from 20% to 80% with significant spatial and seasonal variations (Dechapanya et al., 2004; Yu et al., 2007; Carlton et al., 2009; Hu et al., 2017).
Chemical transport models have been widely used to quantitatively study the regional and global impacts of carbonaceous aerosols (Liousse et al., 1996; Bessagnet et al., 2008; Zhang et al., 2009; Huang et al., 2013). However, correctly predicting SOA in these models is challenging, especially in polluted urban areas, as many precursors contribute to SOA formation, and the ability of each precursor to form SOA is different. For example, aromatic compounds (Nakao et al., 2012; Emanuelsson et al., 2013; Al-Naiema and Stone, 2017), isoprene (Henze and Seinfeld, 2006; Kroll et al., 2006; Carlton et al., 2009; Surratt et al., 2010), monoterpenes (Mutzel et al., 2016; Friedman and Farmer, 2018), and sesquiterpenes (Varutbangkul et al., 2006; Xu et al., 2018) are some of the major contributors to SOA. In addition, aqueous (Carlton et al., 2007) and heterogeneous processes (Ervens and Volkamer, 2010) of dicarbonyls such as glyoxal (GLY) and methylglyoxal (MGLY) have been shown to contribute significantly to SOA formation. Predicted SOA concentrations are affected by the model representation of the emission, photochemical oxidation, gas-to-particle partitioning, and the multiphase reaction processes (Lannuque et al., 2016; Li et al., 2017; Jo et al., 2019). Many of these physical and chemical processes remain uncertain due to an incomplete understanding of the SOA formation mechanisms and large differences between the atmospheric conditions in the ambient environment and the chamber conditions under which the SOA formation experiments were conducted to determine parameters used in the models (Deng et al., 2017; Jorga et al., 2019).
Techniques to apportion the observed total organic aerosol (OA) concentrations to POA and SOA include the minimum OC/EC ratio method (Turpin and Huntzicker, 1991; Wu and Yu, 2016) and its extensions (Castro et al., 1999; Cao et al., 2004), and the positive matrix factorization (PMF) analysis of aerosol mass spectra (Yuan et al., 2006; Feng et al., 2013). The estimated SOA concentrations have been used to evaluate model predictions of bulk total SOA concentrations. Although PMF has been applied to quantify the fraction of specific SOA (e.g., IEPOX-SOA and monoterpene-derived SOA) (Budisulistiorini et al., 2013; Hu et al., 2015b; Massoli et al., 2018; Claeys and Maenhaut, 2021), no studies were found in the literature that evaluate precursor-resolved SOA predictions against the corresponding observations.
In 2007, a tracer method to estimate the contributions of different precursors to ambient SOA concentrations was established by Kleindienst et al. (2007). In this technique, the mass fraction of the identified precursor-specific tracers to SOA formed from the precursor was determined in smog chamber experiments (Kleindienst et al., 2007; Al-Naiema et al., 2020). Among the tracers identified, 2,3-dihydroxy-4-oxopentanoic acid (DHOPA) is used for SOA from monoaromatic hydrocarbons. Pinic acid, pinonic acid, and 3-methyl-1,2,3-butanetricarboxylic acid (3-MBTCA) are tracers for SOA from α-pinene. DHOPA is well accepted to be formed in the gas-phase oxidation reaction of toluene and other aromatic compounds with OH (Yuan et al., 2018; Gao et al., 2019; Al-Naiema et al., 2020). Lau et al. (2021) proposed the possible formation scheme of DHOPA from the oxidation of unsaturated carbonyl compounds, which are the first-generation products of toluene photooxidation. Pinic acid and pinonic acid are formed in the reactions of α-pinene with oxidants including O3 and OH (Jenkin et al., 2000; Ma et al., 2007; Mutzel et al., 2016), while they have also been detected in the chamber experiments of β-pinene and other monoterpenes (Yu et al., 1999; Mutzel et al., 2016; Lau et al., 2021). In addition, 3-MBTCA has been identified as a product of gas-phase oxidation of pinonic acid (Müller et al., 2012). All these marker compounds are formed in the gas-phase chemistry and partition into the organic aerosol phase along with other SOA components.
The concentrations of the tracers in ambient particles are measured and used to calculate the total SOA formed from the corresponding precursor with chamber-determined mass fractions. Since then, ambient concentrations of these tracers have been quantified and applied to estimate the precursor contributions to SOA under various atmospheric environments, assuming that these ratios are applicable under atmospheric conditions (Kleindienst et al., 2007; Kleindienst et al., 2010; Al-Naiema and Stone, 2017; Lyu et al., 2017; Al-Naiema et al., 2020; He et al., 2020). However, since the ambient conditions are different from those in the chamber experiments, it remains unclear whether the tracer concentrations correlate with their target SOA concentrations and whether the SOA estimated using chamber-based tracer mass fractions can match air quality model predictions.
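
The tracer-based estimate described above amounts to dividing the measured tracer concentration by a chamber-derived mass fraction. A minimal Python sketch follows; the function name and the example DHOPA concentration are illustrative assumptions, while the two mass fractions are the values quoted in this paper (4 × 10⁻³ for DHOPA from toluene and 0.168 for the α-pinene tracers).

```python
def soa_from_tracer(tracer_conc, mass_fraction):
    """Estimate precursor-specific SOA (ug m-3) from a tracer concentration.

    tracer_conc   : ambient tracer concentration, ug m-3
    mass_fraction : chamber-derived tracer-to-SOA mass fraction (dimensionless)
    """
    if mass_fraction <= 0:
        raise ValueError("mass_fraction must be positive")
    return tracer_conc / mass_fraction


# Chamber-derived mass fractions quoted in the text.
F_DHOPA_TOLUENE = 4.0e-3
F_ALPHA_PINENE_TRACERS = 0.168

# Hypothetical ambient DHOPA concentration (ug m-3), for illustration only.
aromatic_soa = soa_from_tracer(0.015, F_DHOPA_TOLUENE)
```

Because the estimate scales inversely with the mass fraction, any mismatch between chamber and ambient conditions propagates directly into the inferred SOA.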
In this study, we applied a regional air quality model with precursor-resolved SOA representation to simulate SOA in Shanghai, China, in November 2018. We compared the model predicted SOA derived from aromatic compounds and monoterpenes with the hourly and daily average concentrations of the specific SOA tracers and estimated the tracer-to-SOA ratio based on two different sets of model results using two different emission inventories. This study is the first evaluation of model-predicted precursor-resolved SOA with source-specific tracers, to the best of the authors' knowledge.

Modeling Precursor-specific SOA
A precursor-specific SOA scheme was implemented in the Community Multiscale Air Quality (CMAQ) model v5.0.1 (Byun and Schere, 2006). A complete description of the scheme is available in Ying et al. (2015), so only a brief description of the model is provided below.
The gas phase photochemical mechanism is based on SAPRC-11 (Carter and Heo, 2013). In SAPRC-11, the aromatic compounds are lumped into two species, ARO1 and ARO2. ARO1 represents aromatics with lower OH reactivity, and ARO2 represents aromatics with higher OH reactivity. Biogenic emissions are represented in the SAPRC-11 mechanism by monoterpenes (TERP), sesquiterpenes (SESQ), and isoprene (ISOP). Since GLY and MGLY are formed from the oxidation reactions of multiple precursors in SAPRC-11, the gas phase mechanism was modified to track the GLY and MGLY from different precursors with extra tagged species. For example, GLY generated from ARO1 and ARO2 is tracked with GLY_A1 and GLY_A2, respectively.
The SOA module is based on the aerosol module version 6 (AERO6). In the module, SOA is formed through three pathways:
(1) Equilibrium gas-to-particle partitioning of semi-volatile products from the oxidation of precursors, represented by the classical Odum two-product model (Odum et al., 1996). The SOA yields are the same as those used in Hu et al. (2017).
(2) Oligomerization of the particle phase semi-volatile products. The semi-volatile products are assumed to form oligomers through first-order decay reactions with a half-life of 20 hours (Morris et al., 2006). The original CMAQ AERO6 was modified to track the oligomers from each specific precursor by introducing extra precursor-tagged species.
(3) Irreversible surface uptake of isoprene epoxides, GLY, and MGLY on wet aerosols or cloud droplets, with uptake coefficients parameterized according to Li et al. (2015). As tagged GLY and MGLY were used in the gas phase, the SOA module was modified to track the contribution of each precursor to the SOA formed from GLY and MGLY with tagged aerosol species.
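
The first two pathways can be illustrated with a short Python sketch. It uses the standard Odum two-product yield expression and a first-order decay with the 20-hour half-life stated above. The stoichiometric coefficients and partitioning coefficients in the example are hypothetical placeholders, not the values of Hu et al. (2017), and the function names are assumptions made for illustration.

```python
import math


def odum_two_product_yield(m_o, alphas, k_oms):
    """SOA mass yield from the Odum two-product model.

    m_o    : absorbing organic aerosol mass, ug m-3
    alphas : mass-based stoichiometric coefficients of the two products
    k_oms  : partitioning coefficients of the two products, m3 ug-1
    """
    return sum(a * k * m_o / (1.0 + k * m_o) for a, k in zip(alphas, k_oms))


def oligomer_fraction(hours, half_life_h=20.0):
    """Fraction of particle-phase semi-volatile mass converted to oligomers
    after `hours`, assuming first-order decay with the given half-life."""
    rate = math.log(2.0) / half_life_h  # first-order rate constant, h-1
    return 1.0 - math.exp(-rate * hours)


# Hypothetical parameters, for illustration only.
example_yield = odum_two_product_yield(10.0, alphas=[0.1, 0.2], k_oms=[0.05, 0.005])
```

With a 20-hour half-life, about half of the particle-phase semi-volatile mass becomes oligomers within a day, which is far longer than the residence time of a typical chamber experiment.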

SOA Tracer Observations
The observation campaign was conducted from November 9 through December 3, 2018, at an urban site at the Shanghai Academy of Environmental Sciences (SAES, 31.17°N, 121.43°E), located in the southwest of the central urban area of Shanghai, China. Hourly PM2.5 mass concentrations were measured with an online beta attenuation particulate monitor (FH 72 C14 series, Thermo Fisher Scientific). Water-soluble inorganic ions (SO4²⁻, NO3⁻, NH4⁺, Cl⁻, and K⁺) were measured with an online Monitor for AeRosols and Gases in the ambient Air (MARGA, Model ADI 2080, Applikon Analytical B.V.). Organic and elemental carbon (OC and EC) were monitored by a semicontinuous OC-EC analyzer (model RT-4, Sunset Laboratory, Tigard, OR, USA).
The molecular tracers were measured through the Thermal desorption Aerosol Gas chromatography (TAG) system operated on a 2-h basis for each sampling and analysis cycle. Each sampling started at the odd hour and lasted for 1 hour. A total of 270 valid samples and 11 blank samples were collected and analyzed. The system integrates online sampling, thermal desorption, and separation and quantification of individual organic compounds by gas chromatography-mass spectrometry (GC/MS) analysis. More detailed descriptions of observation methods were provided by He et al. (2020) and Wang et al. (2020).

Model Application
The modified CMAQ model was used to simulate air quality in China during November 2018. The predictions were compared with the observations from the Shanghai Academy of Environmental Sciences (SAES, 31.17°N, 121.43°E), located in the southwest of the central urban area of Shanghai, China. Details of the measurements and the observation data analyses have been documented elsewhere (Wang et al., 2020). The CMAQ model has 197 × 127 grid cells in each layer and 18 vertical layers reaching a model top of approximately 20 km. The horizontal grid resolution is 36 × 36 km². The model uses stretching vertical layers with a first layer height of approximately 35 m. Initial and boundary conditions for the CMAQ model were generated using the CMAQ default vertical profiles that represent clean continental conditions. Simulation results from the first five days were treated as spin-up and were not included in the analyses reported below.
The meteorological inputs were generated using WRFv4.2 with initial and boundary conditions from the NCEP GDAS/FNL 0.25 Degree Global Tropospheric Analyses and Forecast Grids (available at https://rda.ucar.edu/datasets/ds083.3/). The land use/land cover and topographical data were based on the 30 s resolution default WRF input dataset. In addition, reanalysis nudging was enabled to improve the agreement between predicted and observed meteorological parameters (Hu et al., 2010). The major physics options for the WRF simulations were described by .
Two sets of emissions were applied in this study to represent the anthropogenic emissions from China: the Regional Emission inventory in Asia v3.1 (REAS3) (Ohara et al., 2007) and the 2017 Multi-resolution Emission Inventory for China (MEIC). Emissions from other countries were always based on the REAS3 inventory. The MEIC and REAS3 inventories were in 0.25° × 0.25° grids and were reprojected to the Lambert conformal coordinates. Windblown dust emissions in the entire domain were generated inline (Tong et al., 2015). The MEIC inventory already includes speciated nonmethane hydrocarbons (NMHC). For the REAS3 emissions, selected speciation profiles from the SPECIATE database developed by the US EPA were used to estimate emissions of model-ready VOCs (Wang et al., 2018a). Biogenic emissions were generated by the Model of Emissions of Gases and Aerosols from Nature (MEGAN) v2.10.

Evaluation of meteorology inputs
The WRF-predicted meteorological inputs significantly affect the accuracy of the chemical transport model predictions. Predicted temperature and relative humidity at 2 m above the surface, and wind speed and wind direction at 10 m above the surface, were compared with the observation data from the National Climate Data Center within the Yangtze River Delta (YRD) area. There were 54 observation sites available in the YRD region. The average observation, average prediction, mean bias (MB), gross error (GE), and root mean square error (RMSE) were calculated for temperature, relative humidity, wind speed, and wind direction, as shown in Table 1. Temperature is overpredicted with an MB of 0.95 K, which is slightly higher than the recommended benchmark (≤ ±0.5 K) by Emery et al. (2001), but the GE of 2.1 K is close to the benchmark value of < 2.0 K. Wind speed is well predicted with the MB and RMSE lower than the benchmarks (MB ≤ ±0.5 m s⁻¹, RMSE < 2 m s⁻¹), but the MB and GE of wind direction are approximately 22% and 16% higher than their respective benchmarks (MB ≤ ±10°, GE < 30°), likely because the grid resolution used in this study (36 km) is coarser than the 12 km resolution used in developing the benchmarks. In addition, the benchmarks were developed using observation nudging, which was not applied in this study. In general, the WRF performance statistics in this study are comparable to those of other studies using WRF in China simulations (Wang et al., 2010; Zhang et al., 2012; Hu et al., 2015a; Hu et al., 2016).
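
For reference, the three statistics used in this evaluation can be computed as in the following Python sketch. The function name is an assumption, and so is the `circular` option, which wraps wind direction differences into [-180°, 180°); the text does not state how wind direction differences were handled.

```python
import math


def met_stats(pred, obs, circular=False):
    """Return (MB, GE, RMSE) for paired predictions and observations.

    circular=True wraps differences into [-180, 180) degrees, a common
    convention for wind direction (an assumption, not taken from the text).
    """
    diffs = [p - o for p, o in zip(pred, obs)]
    if circular:
        diffs = [(d + 180.0) % 360.0 - 180.0 for d in diffs]
    n = len(diffs)
    mb = sum(diffs) / n                              # mean bias
    ge = sum(abs(d) for d in diffs) / n              # gross error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # root mean square error
    return mb, ge, rmse
```

For example, a predicted wind direction of 350° against an observed 10° gives a wrapped difference of -20° rather than 340°.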

Evaluation of predicted PM2.5 and its chemical components
Hourly model predictions with the MEIC and REAS3 emission inventories were evaluated by comparing with the observations, as shown in Fig. 1. Generally, the predicted PM2.5 and its components (at the grid cells where the monitors are located) with either emission inventory agree with the observations, with mean fractional bias (MFB) values of 0.00 and 0.34 for the MEIC and REAS3, respectively. The MEIC predictions also have a slightly lower mean fractional error (MFE) than REAS3 (0.41 vs. 0.52). In addition, the high concentrations of total PM2.5 over 100 µg m⁻³ on November 20, 25, and 27-30 are well captured. However, the model overpredicts the PM2.5 concentrations on November 30 for both simulations. The overprediction is associated with the underprediction of wind speeds during calm conditions around Shanghai, which causes the overaccumulation of pollutants.
The sulfate aerosols are well predicted using the MEIC emission inventory without a significant bias (MFB = 0.11) but are over-predicted with the REAS3 inventory (MFB = 0.48). This is because the REAS3 SO2 emissions are 20-30% higher than those in the MEIC inventory in the Shanghai urban area, and the SO2 emissions in the surrounding areas are 1.4-2 times those estimated in the MEIC inventory. The higher SO2 emissions from REAS in China were reported in previous studies (Saikawa et al., 2017; Wang et al., 2018b). The observed concentrations of nitrate and ammonium secondary aerosols are well reproduced by both MEIC- and REAS3-based emissions.
Observed PM2.5 EC concentrations are between 0 and 4 µg m⁻³. Both MEIC and REAS3 inventories lead to over-predictions of EC with MFB larger than 0.6. EC emissions were likely overestimated in both inventories, and further investigations are needed to understand the cause of the overestimation. Predicted EC concentrations show strong spatial gradients, as indicated by the large ranges of the predictions within the 3 × 3 grid cells centered on the monitor grid cell, because EC is primarily from vehicle emissions in urban areas. Uncertainties in the predicted wind speed and direction could also cause large errors in the predicted concentrations. The PM2.5 OC predictions also compare well with the observations, with MFB values of -0.20 and -0.08 based on the MEIC and REAS3 emissions, respectively, and MFE values of less than 0.5 for both predictions. Over-predictions of EC are not expected to affect model predictions of SOA because semi-volatile SOA is assumed to form through absorption partitioning in the organic phase.
The positive matrix factorization (PMF) analysis of the AMS data collected during this period provided an estimation of the primary organic carbon (POC) and secondary organic carbon (SOC). The details of the sampling and data analysis were reported by He et al. (2020). As shown in Fig. 2, both MEIC and REAS3 emissions lead to reasonable predictions of POC, as indicated by the MFB (-0.33 for MEIC and 0.17 for REAS3) and MFE (0.58 for MEIC and 0.54 for REAS3) values. Predicted SOC using MEIC does not have an overall bias using data for the entire month (MFB = -0.06). However, it significantly overpredicts the SOC by more than 25 µg m⁻³ at the end of November. The model with REAS3 emissions leads to a lower predicted SOC with an MFB of -0.52 but better captures the high concentrations at the end of November.
In summary, PM2.5 predictions from both inventories reasonably agree with observations. According to the benchmarks derived from regional modeling studies in China by Huang et al. (2021), the MFBs of inorganic aerosols are all within the model performance criteria, and most of the MFEs reach the more stringent performance goals. Although sulfate is over-predicted with REAS3 emissions, it still meets the criteria of MFB < ±50% and MFE < 75% (Huang et al., 2021). The good agreement between the predicted and observed secondary inorganic aerosols suggests that the model can reproduce the oxidation capacity of the urban atmosphere in this region (Feng et al., 2021). The better agreement between the PMF-resolved SOC and the predicted SOC with the MEIC inventory could imply that the MEIC inventory is more appropriate for SOA predictions, but it is necessary to compare the precursor predictions to confirm this.
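
The MFB and MFE values quoted in this evaluation follow the usual fractional definitions, sketched below in Python. The function names are assumptions, and the default limits in `meets_criteria` are the criteria quoted above (|MFB| < 50% and MFE < 75%). The sketch assumes that every predicted-plus-observed sum is positive.

```python
def mean_fractional_bias(pred, obs):
    """MFB = (2/N) * sum((P - O) / (P + O)); bounded between -2 and 2."""
    pairs = list(zip(pred, obs))
    return 2.0 / len(pairs) * sum((p - o) / (p + o) for p, o in pairs)


def mean_fractional_error(pred, obs):
    """MFE = (2/N) * sum(|P - O| / (P + O)); bounded between 0 and 2."""
    pairs = list(zip(pred, obs))
    return 2.0 / len(pairs) * sum(abs(p - o) / (p + o) for p, o in pairs)


def meets_criteria(mfb, mfe, mfb_limit=0.50, mfe_limit=0.75):
    """Check the performance criteria quoted in the text."""
    return abs(mfb) < mfb_limit and mfe < mfe_limit
```

Because both metrics normalize by the sum of prediction and observation, they weight over- and underpredictions symmetrically, which is useful when concentrations span a wide range.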

VOCs model performance
Volatile organic compounds are direct precursors to SOA and affect the OH radical concentrations. Hourly concentrations of 115 individual VOCs monitored at the SAES station for November 11-20 were obtained to evaluate the model performance. The measured VOC species were lumped to match the species in SAPRC-11 and compared with predictions using REAS3 and MEIC emissions, as shown in Fig. 3. While the predicted VOC concentrations using both inventories have some success in matching the observations and reproducing the day-to-day variations of the concentrations, neither emission inventory gives satisfactory results for all species.
The ALK1-5 species represent the alkanes and other non-aromatic compounds with increasingly higher OH reaction rate constants. The predicted ALK2, which mostly includes the less reactive short-chain alkane species, shows relatively good agreement with observations. Both MEIC and REAS3 lead to similar overpredictions of ALK4-5 (mainly long-chain alkanes), by a factor of 2-5 relative to the observations. This overprediction might partially be caused by the fact that the measurements did not include all the species in the emission inventories. For the other two groups of ALK species, REAS3 predicts better for ALK1 (ethane), but MEIC predicts better for ALK3. Ethene (ETHE) and OLE1-2 (olefin species with increasingly higher OH reactivities) are better predicted with REAS3 emissions, but all are significantly overpredicted with MEIC.
For the lumped aromatic species ARO1 (mostly toluene) and ARO2 (mostly xylene), the predicted concentrations with REAS3 show much better agreement with observations (between 0.7 and 2 times the observed values) than those from MEIC, which are several times higher than the observations. The large overpredictions of aromatics with MEIC were not expected, as previous modeling studies using an earlier version of MEIC showed relatively good agreement with observations in Nanjing in August 2013 (MFB = -0.63-0.77) and from June to August 2014 (NMB = 0.2) (Huang et al., 2016). However, since previous studies were for the summer months, it is possible that the seasonal variations in the emissions were not properly captured in the MEIC inventory.
At the grid cell containing the SAES site, isoprene (ISOP) is mostly from anthropogenic emissions, as shown in Table S1, and is well predicted with the MEIC inventory. The VOC profiles used to speciate REAS3 emissions might have used lower isoprene emission factors. Consequently, methacrolein (MACR) and methyl vinyl ketone (MVK), which are major oxidation products of isoprene, are better predicted with MEIC. Methyl ethyl ketone (MEK) is also better predicted with MEIC as it is an oxidation product of several VOCs, including MVK and MACR. The other oxygenated species, acetone (ACET), acetaldehyde (CCHO), and higher aldehydes (RCHO), which have both primary emissions and secondary formation, are reasonably predicted with both inventories.
Although neither inventory generates perfect VOC estimations, the better predictions of the gas phase aromatics with the REAS3 inventory provide more confidence in the REAS3-based SOA predictions than in those based on the MEIC inventory. Therefore, the MEIC-predicted SOA should be considered an upper limit of the SOA from the aromatic compounds.

Compare Observed SOA Tracers with Modeled SOA
The concentrations of SOA tracers (i.e., pinic acid, 3-MBTCA, and DHOPA) were measured hourly at SAES from November 9 through December 1, 2018. The total concentrations of the two α-pinene tracers (α-pinT) were compared with the predicted monoterpene SOA, and the DHOPA concentrations were compared with the total SOA (including semi-volatile components, oligomers, and surface uptake products from GLY and MGLY) from ARO1 and ARO2, as shown in Fig. 4. Generally, the predicted SOA has day-to-day variations similar to those of the SOA tracers. High concentrations of DHOPA in the range of 0.015-0.02 µg m⁻³ occurred on November 20 and the last several days of November. High concentrations of the model-predicted aromatic SOA (~10-30 µg m⁻³) were also predicted for these days. The α-pinT concentrations were in the range of 0.001-0.08 µg m⁻³, while the modeled monoterpene SOA concentrations reached 0.35 µg m⁻³ and 0.6 µg m⁻³ with REAS3 and MEIC emissions, respectively.
The precursor-specific SOA concentrations were much higher and likely overestimated on November 29 and 30, along with other PM species such as SOC and POC (see Figs. 1 and 2). Because these data points may significantly affect the linear correlations between observed SOA tracers and modeled specific SOA, they were excluded when estimating the relationship between modeled SOA and observed tracer concentrations shown in Fig. 5. The predictions with REAS3 and MEIC emissions show strong correlations with the detected SOA tracers. The correlations between hourly predicted monoterpene SOA concentrations and the measured concentrations of the corresponding tracer α-pinT (R = 0.6-0.65) are slightly higher than those between monoaromatic SOA and DHOPA (R ~0.6). The correlations increase for daily averaged SOA predictions and the corresponding SOA tracers, with R ~0.8 for aromatic SOA and R > 0.8 for monoterpene SOA. The daily average correlations are likely improved because averaging smooths out the difference in the formation timescales of the tracers and other major SOA components.
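
The effect of daily averaging on the correlation can be reproduced with a short Python sketch. The record layout (timestamp, modeled SOA, observed tracer) and the function names are assumptions made for illustration; the synthetic data in the usage example are not measurements.

```python
import math
from collections import defaultdict
from datetime import datetime


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def daily_means(records):
    """Average paired (datetime, modeled, observed) records by calendar day."""
    buckets = defaultdict(list)
    for stamp, modeled, observed in records:
        buckets[stamp.date()].append((modeled, observed))
    mod_daily, obs_daily = [], []
    for day in sorted(buckets):
        values = buckets[day]
        mod_daily.append(sum(v[0] for v in values) / len(values))
        obs_daily.append(sum(v[1] for v in values) / len(values))
    return mod_daily, obs_daily


# Synthetic example: within-day scatter cancels out in the daily means.
records = []
for day in range(3):
    base = day + 1.0
    records.append((datetime(2018, 11, 10 + day, 1), base + 0.5, 2.0 * base - 1.0))
    records.append((datetime(2018, 11, 10 + day, 3), base - 0.5, 2.0 * base + 1.0))

r_hourly = pearson_r([r[1] for r in records], [r[2] for r in records])
r_daily = pearson_r(*daily_means(records))
```

In this synthetic case the sub-daily scatter is anti-correlated within each day, so the hourly correlation is weak while the daily means are perfectly correlated, which mirrors the smoothing argument made above.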
The tracer-to-SOA mass ratios, representing the mass fraction of the precursor-specific SOA tracers in the SOA derived from the precursors, were determined using linear regression with forced zero intercepts (Fig. 5). A robust linear regression method was used to reduce the impact of outliers, and the bootstrap technique (Hesterberg, 2011) was used to determine the uncertainties in the slopes. For aromatic SOA, the tracer-to-SOA mass ratio is 0.00140 ± 0.00006 based on hourly SOA from REAS3. The slope derived from the data with MEIC emissions is significantly lower (0.00053 ± 0.00003) because of the higher SOA predictions caused by the overestimation of ARO1 and ARO2 concentrations. The linear regression slopes between the measured α-pinT tracers and the model-predicted monoterpene SOA with REAS3 and MEIC emissions are 0.2042 ± 0.0193 and 0.1345 ± 0.0095, respectively, which are both close to the mass fraction of α-pinT in α-pinene SOA of 0.168 ± 0.081 suggested by the previous chamber study (Kleindienst et al., 2007) and applied by He et al. (2020). The detailed regression slopes and uncertainties are summarized in Table 2.
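
The regression procedure can be sketched in Python as follows. The text does not name the specific robust regression method, so this sketch substitutes a simple stand-in: the median of the tracer-to-SOA ratios, which is an outlier-resistant slope estimate for a line through the origin. The bootstrap resamples data pairs with replacement. Function names, the number of resamples, and the example data are all assumptions.

```python
import random
import statistics


def zero_intercept_slope(soa, tracer):
    """Median of tracer/SOA ratios: a robust slope for a line through the origin.

    Pairs with non-positive modeled SOA are skipped; at least one pair with
    positive SOA is required.
    """
    ratios = [t / s for s, t in zip(soa, tracer) if s > 0]
    return statistics.median(ratios)


def bootstrap_slope_sd(soa, tracer, n_boot=1000, seed=0):
    """Standard deviation of the slope over bootstrap resamples of the pairs."""
    rng = random.Random(seed)
    n = len(soa)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(zero_intercept_slope([soa[i] for i in idx],
                                           [tracer[i] for i in idx]))
    return statistics.stdev(slopes)


# Hypothetical data with one outlier in the last pair.
soa = [1.0, 2.0, 3.0, 4.0, 5.0]
tracer = [0.2, 0.4, 0.6, 0.8, 5.0]
slope = zero_intercept_slope(soa, tracer)
slope_sd = bootstrap_slope_sd(soa, tracer, n_boot=200)
```

An ordinary least-squares fit through the origin would be pulled upward by the last pair in this example, whereas the median-based estimate stays at the ratio shared by the other four points.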

Impact of Non-volatile SOA Components to Tracer-to-SOA Ratio
The predicted SOA used in the previous analyses includes semi-volatile components based on equilibrium gas-to-particle partitioning, oligomers from semi-volatile products, and SOA from irreversible surface uptake of GLY and MGLY. The significance of the surface uptake of GLY and MGLY on the SOA formation was discussed in previous studies (Fu et al., 2008; Ying et al., 2014; Qiu et al., 2020). As the relative humidity was quite high in winter, significant contributions of GLY and MGLY to SOA were predicted. Fig. 6 shows the high monthly averaged total aromatic SOA concentrations in China for November 2018. The model with MEIC emissions predicts total aromatic SOA concentrations to be approximately 10-15 µg m⁻³ in central and eastern China, and the REAS3 emission inventory, with lower emissions of aromatics, predicts lower aromatic SOA, approximately 5-10 µg m⁻³. However, the fraction of GLY and MGLY SOA in total SOA predicted by the two emission inventories is similar. Fig. 6 also shows that GLY and MGLY SOA has the highest contributions to total aromatic SOA. At the grid cell where the SAES monitor is located, semi-volatile SOA and its oligomers combined have concentrations of 1.48 and 4.01 µg m⁻³ for REAS3 and MEIC emissions, respectively. The GLY and MGLY SOA at the same grid cell is 1.51 µg m⁻³ based on REAS3 and 3.62 µg m⁻³ based on MEIC, as high as the SOA predicted from the traditional pathways. However, the chamber experiments used to measure the tracer-to-SOA ratio were typically operated under much lower RH (e.g., Al-Naiema et al., 2020), and the GLY and MGLY contributions to SOA in these chambers were expected to be very small. In addition to GLY and MGLY contributions, the oligomers formed from semi-volatile products contribute as much as the semi-volatile products to the aromatic SOA. However, oligomer formation in the chamber experiments was usually small due to a short residence time of several hours. Thus, the chamber-determined ratio might only be good for the estimation of the semi-volatile aromatic SOA components.
Notes to Table 2: * SOA formed from ARO1 and ARO2, including the semi-volatile SOA, oligomers, and glyoxal and methylglyoxal SOA products. £ SOA from ARO1 and ARO2, only including the semi-volatile SOA. ^ Based on the average DHOPA-to-SOA ratio of benzene, toluene, ethylbenzene, o/m/p-xylenes, and 1,3,5- and 1,2,4-trimethylbenzene with NOx, as reported in Al-Naiema et al. (2020). # SOA formed from monoterpenes, including SSOA, oligomers, and glyoxal and methylglyoxal SOA products.
The DHOPA tracer-to-SOA ratios were recalculated using predicted aromatic SOA without the oligomers and GLY and MGLY components to evaluate the predicted semi-volatile aromatic SOA with DHOPA. As shown in Fig. 7, excluding the non-volatile SOA components does not significantly influence the correlation coefficients between the predicted aromatic SOA and measured DHOPA concentrations. However, the mass ratios of DHOPA to aromatic SOA with MEIC and REAS3 emissions increase by a factor of 3-4 (to 0.00161 ± 0.00014 and 0.00553 ± 0.00032, respectively), indicating that the SOA formed from GLY and MGLY accounts for a very significant fraction of the total SOA derived from aromatics and that this fraction remains relatively constant. The REAS3 ratio is closer to the mass fraction reported by Kleindienst et al. (2007), which has been more broadly used in most aromatic SOA estimations (Ding et al., 2014; Shen et al., 2015; Gao et al., 2019). It is also in better agreement with the values reported by Al-Naiema et al. (2020) for toluene. Since the predicted precursor ARO1 and ARO2 concentrations with the REAS3 inventory also generally agree with the observations, this suggests that the semi-volatile aromatic SOA can be reasonably predicted by the SOA mechanism in the regional model if precursor emissions are estimated correctly.

Uncertainties in the Tracer-to-terpene SOA Ratio
The AERO6 module in CMAQ uses the traditional Odum 2-product model for monoterpene SOA predictions (Odum et al., 1996). The newly released AERO7 module replaced the original monoterpene-SOA yield parameters with the volatility basis set (VBS) fit based on the recent experimental study by Saha and Grieshop (2016). The semi-volatile products were lumped into seven log10-spaced bins based on saturation mass concentration ($C^*$) from 10⁻² to 10⁴ µg m⁻³. The enthalpy of the VBS products ($\Delta H_{\mathrm{vap},i}$) was estimated using $\Delta H_{\mathrm{vap},i} = 80 - 100\,\log_{10} C_i^*$, which is based on linear regression of the chamber data. As seen in Fig. S2, at the standard temperature of 298 K, the monoterpene-SOA yield is higher than the 2-product representation when the total organic aerosol concentration ($C_{\mathrm{OA}}$) is less than ~27 µg m⁻³, and it is lower than the 2-product yield at higher $C_{\mathrm{OA}}$.
To check if this new model representation of monoterpene SOA can lead to significant changes in the estimated SOA and the tracer-to-SOA ratios, the CMAQ model was modified to include this new representation, and an additional simulation was conducted. The results were compared with the results from the original AERO6 module. As shown in Fig. S3, the updated VBS-style monoterpene SOA parameterization led to slightly higher SOA under low concentrations but lower SOA under high concentrations. As a result, the difference in the ratio of α-pinT to predicted TERP SOA is negligible.
While the α-pinT to predicted monoterpene SOA ratio is in good agreement with the reported tracer-to-α-pinene-SOA ratio, two additional factors should be further discussed. First, the tracer mass fraction reported in the previous chamber study of Kleindienst et al. (2007) was calculated based on the sum of seven α-pinene SOA tracers relative to α-pinene SOA, but only two tracers (pinic acid and 3-MBTCA) were measured in this study. Thus, the calculated α-pinT to SOA ratio in this study should be increased for a direct comparison with the data from Kleindienst et al. (2007), but the exact amount of adjustment is difficult to determine because the seven tracers have different concentrations and their concentrations in the chamber are different from those under ambient conditions, as reported by Kleindienst et al. (2007). Second, only a fraction of the monoterpene SOA is α-pinene SOA, as α-pinene is lumped with other monoterpenes in the model and the SOA yields of these individual monoterpene species are not the same. Considering these two factors, the α-pinT to modeled α-pinene SOA ratio should be higher than the α-pinT to modeled monoterpene SOA ratio, suggesting that the SOA from monoterpenes was likely underestimated in the model, due to underestimation of either the emissions or the SOA yields.

Separate DHOPA from ARO1 and ARO2
In a recent chamber study on the aromatic SOA tracers, Al-Naiema et al. (2020) reported the mass fraction of DHOPA to the SOA from major aromatic precursors. The mass fractions of DHOPA to the toluene SOA under high and low NOx conditions are $f_{\mathrm{tol-NOx}}$ = 0.0032 ± 0.0004 and $f_{\mathrm{tol-HOx}}$ = 0.0068 ± 0.0008, respectively. The isomers of xylene (o/m/p-xylene) were tested individually in the chamber with NOx present, and the average DHOPA mass fraction for the three xylene isomers is $f_{\mathrm{xyl}}$ = 0.0033 ± 0.00024. Using predicted high-NOx and low-NOx SOA from ARO1 and total SOA from ARO2, and the literature-reported DHOPA to SOA ratios for toluene and xylene, which are the two most abundant species in ARO1 and ARO2, respectively, the amounts of DHOPA from ARO1 and ARO2 were estimated, as shown in Fig. 8. The estimated DHOPA based on predictions with REAS3 emissions generally agrees with the observed hourly data from the SAES site (Fig. 8(b)), while the predictions using MEIC emissions are significantly higher (Fig. 8(a)). Approximately half of the predicted DHOPA is from ARO1 under high-NOx conditions, and the remainder is from ARO2.
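
The estimate described above is a weighted sum of the modeled SOA components, sketched below in Python with the mass fractions quoted from Al-Naiema et al. (2020). The function name, argument names, and example SOA concentrations are hypothetical, and the sketch assumes the toluene and xylene fractions apply to all of ARO1 and ARO2, respectively, as the text does.

```python
# DHOPA-to-SOA mass fractions quoted in the text (Al-Naiema et al., 2020).
F_TOL_HIGH_NOX = 0.0032  # toluene SOA formed under high-NOx conditions
F_TOL_LOW_NOX = 0.0068   # toluene SOA formed under low-NOx conditions
F_XYL = 0.0033           # average over the three xylene isomers, with NOx


def estimate_dhopa(soa_aro1_high_nox, soa_aro1_low_nox, soa_aro2):
    """Return (DHOPA from ARO1, DHOPA from ARO2) in the units of the inputs."""
    from_aro1 = (F_TOL_HIGH_NOX * soa_aro1_high_nox
                 + F_TOL_LOW_NOX * soa_aro1_low_nox)
    from_aro2 = F_XYL * soa_aro2
    return from_aro1, from_aro2


# Hypothetical modeled SOA concentrations (ug m-3), for illustration only.
dhopa_aro1, dhopa_aro2 = estimate_dhopa(1.0, 0.5, 2.0)
```

Summing the two returned values gives the total estimated DHOPA that can be compared with the observed hourly concentrations.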

CONCLUSIONS
The predicted hourly aromatic and monoterpene SOA concentrations strongly correlate with the hourly tracers DHOPA (R ~0.6) and α-pinT (R ~0.6-0.65), respectively. The correlations become stronger when daily average concentrations are considered, with R ~0.8 for aromatic SOA and R > 0.8 for monoterpene SOA. The mass fraction of hourly and daily DHOPA is in the range of 5-6 × 10⁻³ when SOA components from oligomers, glyoxal (GLY), and methylglyoxal (MGLY) are excluded. This ratio is close to the mass fraction of DHOPA in toluene SOA reported in the literature. This suggests that the CMAQ model can predict the semi-volatile aromatic SOA reasonably well. The mass fractions of hourly and daily α-pinT to the monoterpene SOA with REAS3 and MEIC emissions fall in a range of 0.13-0.25, similar to the reported α-pinT to α-pinene SOA mass fraction of 0.168. However, since α-pinene is only one of the lumped monoterpenes, and the α-pinT used in this study did not include all tracer species measured in the chamber experiments from which the ratio was determined, a future study should individually track the emissions of major monoterpene species and the SOA formation from them so that a more detailed evaluation of the modeled biogenic SOA can be performed.