A Method for Automated Estimation of Parameters Controlling Aerosol New Particle Formation

We use a previously published simulator of aerosol new particle formation (NPF) events as the core component of the new method for automated estimation of its input parameters. The simulator has a large number of physical input parameters (e.g., nucleation and growth rates), and it yields results about the dynamics of particle concentrations and their size distributions as its output. Our new method automatically solves the inverse problem. It estimates the physical input parameters by recurrent simulations according to the criterion of best fit of the simulated output parameters with the known data. The method is implemented as a computer program that includes the previous simulator as a subroutine. The method enables the estimation of several physical parameters, e.g., the nucleation rate, particle growth rate, contribution from ion-induced and neutral nucleation, and particle charging state. The program was tested using the results of two NPF events measured at Hyytiälä, Finland. The measured time series of the concentration of atmospheric ions and nanoparticles in the diameter interval of 2.8–8.6 nm were well reproduced by our method. The estimated values of the analyzed input parameters are in agreement with the results known from many other studies carried out at Hyytiälä. The proposed method of inverse simulator may appear useful in the analysis and interpretation of air ion and atmospheric aerosol measurements during new particle formation events.


INTRODUCTION
The bursts of charged atmospheric nanoparticles or intermediate air ions were discovered through measurements at Tahkuse Observatory in 1985 (Tammet et al., 1987;Hõrrak et al., 1988;Tammet et al., 1988).However, they remained unnoticed for the next ten years because at that time, air ion mobility spectrometers were the only devices able to measure intermediate air ions in the size range of 1.5-7.4nm.After 1995, the study of nucleation bursts of atmospheric nanoparticles got a fresh impetus because of their potential relationships with the Earth's climate (Mäkelä et al., 1997;Weber et al., 1997;Hõrrak et al., 1998;Kulmala et al., 2004;Hirsikko et al., 2011).The whole process that starts with nucleation burst and continues with particle evolution is usually called a new particle formation (NPF) event.Several hypotheses and models have been developed for the explanation of this phenomenon.Despite a large number of observed NPF and numerous hypotheses or theoretical models, the exact mechanisms of atmospheric new particle formation are not yet completely known (Kulmala, 2003;Iida et al., 2008;Vana et al., 2008;Leppä et al., 2009;Nieminen et al., 2011;Leppä et al., 2012;Wang et al., 2012;Zhang et al., 2012;Kulmala et al., 2013).
An overview of different aerosol evolution models can be found in the introduction of a paper by Leppä et al. (2009).We employ the NPF model by Tammet andKulmala (2005, 2007).The program that implements this model is hereafter referred to as "burst simulator" or "simulator".The burst simulator was designed for modeling the evolution of neutral and charged nanometer particle size distribution in the size range of about 1-80 nm.The simulator takes into account a number of various physical processes and relationships that influence the generation and growth of aerosol particles in the atmosphere.The simulator takes a list of input parameters (e.g., nucleation and growth rates) and yields the values of several measurable physical parameters characterizing the NPF event (output parameters, e.g., time evolution of particle size distributions).A relatively new and more complicated problem is the solution of inverse problem or inverse modeling or the fitting of simulator output to the actually measured values (time series) of aerosol parameters.The inverse modeling yields a set of (estimated) input parameters that are important for aerosol evolution but can not be known elsewhere.
The main goal of the present study is to develop and test a method for the inverse application of the simulator by Tammet andKulmala (2005, 2007).In this method, we search for the values of the simulator input parameters that result in the output of the simulator close enough to the known data (e.g., to the measured time series of the particle size distribution).The implementation of the method is called the NPF parameter estimator program software or briefly the estimator.The solution of the inverse problem is hereafter also called "estimated parameters" and the process of solution is also called "estimation".A preliminary attempt to apply the burst simulator in this way for a specific NPF event was presented by Salm (2007).Such a study inevitably needed numerous forward simulations, visual comparisons of diagrams and many manual interventions, which were rather time-consuming.In addition, the role of subjectivity is difficult to determine.Despite these shortcomings, several semi-visual techniques are in use (Vana et al., 2004;Dal Maso et al., 2005;Hirsikko et al., 2007a;Vana et al., 2008).Recently, certain algorithms for the automation of the calculations were developed (Salm et al., 2012); the algorithms are considerably more efficient than the former method with manual interventions.As demonstrated by Salm (2007) and Salm et al. (2012), a proper use of the burst simulator for solving the inverse problem can yield several essential parameters that characterize the simulated NPF events (nucleation and growth rates etc.).In case the solution of the inverse problem can be realized in the form of an automated estimator, the estimator can be extensively used to evaluate several essential characteristics of the NPF events that are difficult to measure (nucleation and growth rates, etc.).
The specific objectives of the present study are as follows: 1.To develop a new method and algorithms that enable automated estimation of the input parameters of the burst simulator by the criterion of the best similarity of the output parameters to the corresponding measured (known) parameters, and to integrate the existing burst simulator and these new algorithms into a NPF parameter estimator.
2. To obtain the estimated sets of the new particle formation process characteristics for two NPF events recorded at Hyytiälä from 7 April to 8 April 2000.
3. To analyze the results of the estimation for these two NPF events and to get information about the performance of the estimator.

Simulator
The method and tool for the simulation of NPF events were introduced by Tammet andKulmala (2005, 2007).The tool enables the modeling of nanoparticle evolution, e.g., the evolution of the particle size distributions during a diurnal cycle.The evolution is simulated using a sectional model with several thousands of sections.The model considers generation of small ions, ion-induced and homogeneous nucleation, depletion of nanoparticles and ions on the pre-existing aerosol.The growth of the fresh particles formed by nucleation is described as the coagulation of the particles with parametrized growth units, but the mutual coagulation of fresh particles is neglected.We call this particular tool the burst simulator or simulator.It has 61 physical input parameters (also called input values); details are available in Supplement, section "Description of the Simulator".Values for certain input parameters are usually available from measurements, and certain others can be estimated according to a priori available information.For the group of remaining parameters the default values, included in the simulator, can be used.The estimated values may be constrained to lie between a predetermined range of values, due to the reason that we can presume the range where these values are allowed to vary.The calculations will result in a time-dependent set (or a time series) of the values of selected output parameters.For example, the set of selected parameters can contain the mobility/size distribution within a specific size region and/or the concentration of cluster ions.In mathematical terms, such a set is a table or a matrix, where each particular row contains the full set of the parameters at a particular time point.

Algorithm for Calculating the Estimates of NPF Parameters
Formerly, the search for the appropriate input parameters was accomplished by a visual comparison of the graphs generated on the basis of the output parameters of the burst simulator with the graphs drawn on the basis of measurement results (Salm, 2007).For the present study, we developed an estimator that includes both the simulator and the new algorithms, and it executes automated estimation of simulator input parameters (or solves the inverse problem).The flowchart of the estimation algorithm is depicted in Fig. 1.
In the process of estimation, the simulated time series (output parameters) are compared to the corresponding measured time series (known data).For the numerical comparison of the series, we have to define a measure of similarity between these time series.In this study, we use the sum of the squares (SUM j and SUM c in Fig. 1).If the similarity is not sufficient, then the values of free input parameters are varied, and a next simulation trial is started etc., in order to reach a sufficient similarity.
Let us consider the physical input parameters p i of the simulation tool (61 parameters) and let us call the full set {p i } the input vector (this vector is the set of 61 parameters).The estimation algorithm starts with the initial values of the input parameters, p i , and with the initial values of the increments, Δ i .Then it enters the estimation loop that will be executed repeatedly until the estimation converges.Every single reiteration of the full loop is called an estimation step.
Every estimation step starts with the generation of the new input vectors {p i } by variation of the values of the selected input parameters p i by the values of the increments Δ i .A particular variation, Δ k , depends on the corresponding parameter and on the constraints described below.The value of Δ k can decrease within the calculations (details in Fig. 1).In every new input vector {p i } only one particular p k is modified and all other p i≠k are equal to their initial Set initial values of the input parameters p i and the values of the increments Δ i , i = 1…61.Also, compute the initial value of the difference between the model output and the reference, SUM 0 .Usually initial Δ i = 15% from particular p i .
Repeat for all input parameters p k , k = 1…61; results in J new modified input vectors: For every case from the J ones create a new vector of parameters p i , i = 1…61, where all p i are equal to their initial values.Modify just one particular parameter p k by Δ k .The number of vectors J can be up to 2 × 61 = 122, but only in the case when all Δ k > 0.
Simulate the set of all modified input vectors (J individual simulations), it results in corresponding time dependent tables of the output parameters, one output table per one input vector (altogether J output tables OUT j , j = 1…J).
If Δ k > 0, save the modified vector of parameters p i , i = 1…61.Compare all J output tables OUT j,m,n by the reference table REF m,n , using the sum of the squares SUM j = ∑ m,n (OUT j,m,n -REF m,n ) 2 , j = 1…J.
If SUM j > SUM 0 , reduce the increment Δ j of the corresponding input parameter.If SUM j ~SUM 0 , set Δ j = 0 (cancels the corresponding parameter).
Simulate the set of the modified input vectors, it results in corresponding tables of output parameters (altogether 31 output tables OUT c , c = 1…31).
Find five smaller SUM j and mark the five corresponding input vectors.
Compose the combinations of the five input vectors, marked just above; results in a new set of modified input vectors, altogether in 31 new input vectors.
Compare all 31 output tables OUT c,m,n by the reference table REF m,n , using the equation If SUM c < 0.99 × SUM 0 , mark the corresponding input vector as the new initial vector of input parameters and set SUM 0 = SUM c , otherwise finish.values.The total number of the new input vectors, J, is equal to the number of the applied variations.Next, the algorithm simulates all the J particular cases, and for every particular case computes the corresponding measure of similarity, SUM j .
After that the five best cases that are characterized by the smallest measures of difference are selected and all the possible combinations from the corresponding five input vectors are calculated (an example is provided in the Supplement, section "Algorithm for calculating the estimates of NPF parameters").Next, all the particular cases determined by these combinations of the input parameters are simulated.Finally, the combination, characterized by the smallest difference, SUM c , determines the new input vector and the next estimation step is ready to be executed again.Before each successive estimation step, the current best measure of similarity, SUM c , is compared with that of the previous step, SUM 0 .If SUM c has diminished by less than 1% in comparison with SUM 0 , then the solution is considered converged and the calculation ends.
From our computational experiments, the combinations of the five best cases are found to be optimal to achieve a rapid and reliable convergence of the simulations.Commonly, the solution converged within about ten estimation steps (about 1000 model runs).During numerous individual simulations within estimations (during solution of the inverse problem), the estimator always converged to a solution at a local optimum fit to measurement values that proves the stability of the algorithms.Our algorithm cannot guarantee that the solution corresponds to the global optimum.Observing the rules, the global optimum can be achieved by evaluating the results obtained by all possible combinations of all values of the input parameters but this is a practically unrealizable task.Nevertheless, our computational experiments demonstrated that when we applied certain constraints, the solution always converged close to certain values.The applied constraints are discussed in Section "Prior information and parameter constraints used in the estimations".The section "Results and discussion" contains an analysis of the results, obtained while testing the estimator.

Summary of the Data Used for Testing the Estimator
The site of the measurements was the SMEAR II station, Hyytiälä, Southern Finland (61°51′N, 24°17′E, 180 m above sea level) (Vesala et al., 1998;Hari and Kulmala, 2005).
The instrumentation and certain results from this measurement campaign have been published earlier (Mäkelä et al., 2003;Smirnov et al., 2004a, b).Of the stationary facilities of the SMEAR station, mainly atmospheric and aerosol instrumentation was used for this research (Vesala et al., 1998).The main novel apparatus was a set of two identical differential mobility particle sizers (DMPS) measuring the aerosol particle size distribution in the range of 2.8-500 nm with different pre-charging conditions.One of these DMPSs operated in the customary mode: the aerosol particles were charged in a bipolar charger (or neutralizer).The other DMPS was identical, but without the charger; thus it measured only naturally charged aerosol particles.In both DMPSs, negatively charged particles were analyzed according to their mobility by means of differential mobility analyzers (DMA).Later, this apparatus was modified and called Ion-DMPS (Gagné et al., 2008).The concentration of negative cluster ions was measured by means of 4 integral air ion counters UT8401 designed and built at the University of Tartu.Similar measurements of positive cluster ions were performed at the same place during the BIOFOR III campaign in 1999 (Aalto et al., 2001;Hõrrak et al., 2008).
The main measurement campaign lasted from 29 March to 14 April 2000.According to the classification introduced by Dal Maso et al. (2005), Class 1a, 1b, or Class 2 NPF events occurred on 7 days.
For testing the estimator, we selected measured data from two NPF events with different features, called Case 1 and Case 2.
Case 1: This case occurred on 7 April and it was a NPF event of Class 1a.Fig. S1(a) depicts the measured diurnal evolution of the particular fractions.Fig. S1(b) depicts the diurnal evolution of a particle size distribution.Figs.S1(a), S1(b) are presented as illustrations.The detailed data for the specific time interval, which was used to test the estimator, are presented in Fig. 2(a).This figure contains the simulation results and also a copy of the data, which are depicted in Fig. S1(a).The meteorological and background aerosol parameters were directly measured and they are employed as the fixed values of the corresponding input parameters of the simulator (Table S1(a)).The concentrations of nanoparticles and cluster ions are also known from the measurements, but these parameters are used as test data (Table S1(b)).Our additional study of air mass trajectories by the SILAM model of atmospheric composition and dispersion (http://silam.fmi.fi/)showed that the air measured at Hyytiälä during this event originated from the Arctic and traveled directly to the Hyytiälä site.The local wind direction during the event was approximately from the north.
Case 2: An event of Class 2 occurred around noon on 8 April 2000 (Figs.S2(a) and S2(b)).The corresponding input and test data are presented in Tables S2(a) and S2 (b).The detailed data for specific time interval, which was used for test purposes, are presented in Fig. 2(b).The air mass during the event originated from the Arctic, but passed via Karelia prior to arriving to the station.The local wind direction during the event turned from the northeast to the southeast.
The test concentrations of negatively charged nanoparticles in the size range of 2.8-8.6 nm and cluster ions were directly measured during the campaign, but the concentrations of neutral nanoparticles were calculated on the basis of the measured data of bipolar charged (or neutralized) particles.The particular algorithm of these calculations is presented in Appendix S1.Supplement section "Characteristics of test data" contains certain additional details.

Prior Information and Parameter Constraints Used in the Estimations
By default the initial values of input parameters could be established according to earlier attempts by Salm et al. (2012) and the algorithm could freely vary the parameters within the  The test concentration of negative cluster ions was directly measured, but the one for positive ions was not.However, there are weighty arguments to assume that the concentrations of negative and positive cluster ions are almost equal with a small excess of positive ions.First, the cluster ions are generated in pairs.Secondly, the mutual recombination occurs in pairs.Several measurements demonstrate that the concentrations are almost equal and well correlated with each other (Hõrrak et al., 2000;Hirsikko et al., 2005;Kulmala et al., 2005;Komppula et al., 2007).On the basis of these considerations, it was assumed that the test concentrations of positive small ions can be calculated from the measured concentrations of negative ions by n+ = 1.1•n-,where n+ and n-refer to the test concentrations of positive and negative cluster ions, respectively.The factor 1.1 is based on the long-term measurements at Tahkuse Observatory (Hõrrak et al., 2000).
The meteorological and background aerosol parameters were fixed at the values based on direct measurements.They were either constants or their time-variations were given by certain parameterizations (Tables S1(a) and S2(a)).
If there is weighty prior information about the probable values of certain input parameters, then it is rational to use such information, in order to avoid potential ambiguity of solution of the inverse problem.We fixed certain parameters according to previous studies by Tammet andKulmala (2005, 2007).Thus the recombination coefficient of cluster ions was assumed 1.6•10 -6 cm -3 /s.Certain microphysical parameters that determine the collision interaction between molecules, ions and particles as explained by Tammet and Kulmala (2005) were fixed in simulations: the density of cluster ions 2 g/cm 3 , the polarizability of the first growth units 149 Å 3 , the polarizability of the second growth units 32 Å 3 , the extra distance of Van der Waals capture 0.115 nm, the Nadykto-Yu enhancement factors of dipole moment on the coagulation coefficient at two fixed sizes of 1 and 2 nm are both equal to 1, the critical size of quantum rebound 2.75 nm, the temperature of quantum rebound 500 K, and the power of nano-Köhler approximation 2. The nano-Köhler model describes the initial growth of nucleated particles; in the first phase only by the condensation of sulphuric acid until the particles reach a diameter where low volatile organic vapors can start contributing to their growth.Polarizabilities could help to identify the substance that is responsible for the particular NPF.Because of that, these parameters should in principle be free.Nevertheless, according to our simulation experiments, the burst simulator is rather insensitive to the values of polarizabilities.Therefore, the estimator cannot be driven effectively by these parameters and we used the fixed values, proposed by Tammet and Kulmala (2005).
In the simulator, conifer forest is characterized by six independent input parameters (air residence distance in forest; initial, midpoint and final wind speed in forest; conifer needle diameter and conifer needle length in a unit volume).Variations in some parameters can be compensated by variations in others; therefore the estimated set of these parameters can become somewhat arbitrary.In order to avoid this, we fixed the parameters we could evaluate from elsewhere.On the basis of optimizations and direct measurements of the needles, we evaluated that the average conifer needle diameter could be 1.5 mm and then we fixed it.The preliminary estimations of the NPF event on 8 April resulted in a value for the conifer needle length in a unit volume (m/m 3 ) equal to 155.We regarded this value as plausible and further fixed it.We retained the parameter "air residence distance in forest" that characterizes how many trees an air parcel meets when moving through the forest.Following a moving air parcel, the features of the forest it passes can depend on the wind direction.Therefore, in case the wind blows from a particular direction, the values of the "air residence distance" can differ from the value for the case when the wind blows from another direction.The values of the parameter obtained for different days (different wind directions) can be compared and evaluated in respect to the features of the forest in the particular directions.In Tammet and Kulmala (2007) the term "air residence time in forest" is used.If we multiply the residence time by the wind speed, then we obtain the residence distance.The wind speed was measured at a certain point during the measurement campaign.However, the wind speed as a parameter in the simulator has a somewhat different meaning: it is rather an average wind speed within a large forested area.Therefore we kept it as a free parameter.
The burst simulator is sensitive to the values of ionization rate.In accordance with the preliminary calculations, the estimation can result in diverse values.However, the continental ionization rates are expected to be significantly higher than those found for the marine environment (about 2 cm -3 /s), even though several studies reported low values also for continental stations (about 2.5 cm -3 /s, Laakso et al., 2004).For that reason we set the constraint that the ionization rate cannot be below 4.0 cm -3 /s.This value matches well with the average values of 5.6 cm -3 /s and 3.9 cm -3 /s, found respectively for the same place at the 2 m and 14 m heights from the ground level by Tammet et al. (2006).

Summary of New Results
As a solution of the inverse problem, values of input parameters have been estimated (Table 1).Figs. 2(a) and 2(b) demonstrate that the shape, width and height of the time evolution curves obtained for charged and neutral particles Table 1.Results of the inverse solution of the NPF events recorded at SMEAR II Station, Hyytiälä, on 7 and 8 April 2000.The values of measured parameters are given in Tables S1(a), S2(a) and not listed here.The mark (*) at particular value means that this parameter value is fixed (the estimator cannot modify this parameter).The mark (**) means that the parameter values can be modified, but certain additional criteria are applied (details in text).Several parameters are given by the initial, midpoint and final values.The actual values of these parameters at particular time points in the simulation interval are set inside the burst simulator, according to the selected approximation function, details in Supplement.In Tammet andKulmala (2005, 2007) "midpoint" is referred as "halftime".Nucleation rates are given by their maximal values.The actual values of nucleation rate at particular time points in the simulation interval are set inside the burst simulator, according to the rules determined by the next 3 parameters (rise time etc.).The model contains two condensing substances, below denoted by words "first" and "second".The growth rates listed below are Knudsen growth rates and GR1 corresponds to the first substance and GR2 to the second substance.Details of the parameters are given in (Tammet andKulmala 2005, 2007)  in the diameter interval of 2.8-8.6 nm are comparable with those of the measured curves.This outcome proves the ability of the method to reproduce the general shapes of the measured curves.Since the burst simulator cannot take into account fast fluctuations in the concentration of aerosol particles due to atmospheric non-homogeneity and turbulent movements, these results cannot exactly match the measured curves.Therefore, the small fluctuating deviations of the simulated curves from the measured ones cannot be considered as a shortcoming of the estimator.The majority of estimated input parameters have time independent value for a particular NPF, whereas several parameters (ionization rates, nucleation rates, growth rates, wind speed in the forest) can vary during the event, and for these parameters the estimator yields the initial, midpoint and final values, which can be substantially different.In Tammet and Kulmala (2005) "midpoint" is referred as "halftime".
We can conclude that our methods could reproduce the general shapes of the measured curves.Next we discuss whether the solution of the inverse problem has resulted in reasonable values of the parameters that characterize the nucleation process.

Ionization Rate
We estimated 6 values of ionization rate for both particular NPF events.We obtained the initial, midpoint and final values for free air and the same three values for the forest environment.The values extend from 4.0 to 5.6 cm -3 /s (Table 1).These values coincide well with those found for the same place by Tammet et al. (2006).Alternative studies have shown that ion production rate at Hyytiälä, calculated by external radiation content, varied between 4.2 and 17.6 cm -3 /s while the lowest values were observed in spring (Hirsikko et al., 2007b;Franchin, 2009).Our test data were also obtained in spring when the soil was still frozen, and therefore, the proposed low values of ionization rate seem plausible.
Without the above-mentioned constraint that the ionization rate cannot be below 4.0 cm -3 /s, the estimation process sometimes resulted in the values of ionization rates of about 2.5 cm -3 /s, which are close to the values reported by Laakso et al. (2004), Hõrrak et al. (2008) and Schobesberger et al. (2009).However, several studies demonstrated that low ionization rates probably underestimate the effect of small ion sink.These arguments support the conclusion that the correct ionization rate should be higher (Tammet et al., 2006;Hirsikko et al., 2007b;Franchin, 2009).However, the problem of correct ionization rates is beyond this particular method and needs further independent study.

Nucleation Rate
The estimatied values of maximum nucleation rate (or the formation rate of the particles with the birth size, details in section "Birth size of particles") for charged particles were in the range from 0.05 to 0.19 cm -3 /s and the rate for neutral ones from 0.63 to 1.6 cm -3 /s (Table 1).Kulmala et al. (2013) reported that the nucleation rate is rather variable but during the days with active aerosol formation, the maximum formation rate of 1.5-nm particles approached roughly 4 cm -3 /s around noon, while the same rate for charged particles was about 50 times lower.This is in accordance with our simulations.However, we should take into account that our nucleation rate characterizes about 1 nm particles in diameter, the studies discussed next characterize about 3 nm particles.Dal Maso et al. (2005) analyzed the aerosol measurements at a boreal forest site at Hyytiälä, Finland, between 1996 and2003.The mean formation rate of larger than 3 nm particles was 0.8 cm -3 /s, varying from 0.06 to 5 cm -3 /s.Manninen et al. (2009) analyzed the measurements carried out during the EUCAARI field campaign at Hyytiälä in spring 2007.They evaluated that the median formation rates of all and charged particles with sizes about 2 nm were 0.65 cm -3 /s and 0.03 cm -3 /s, respectively.Vuollekoski et al. (2012) performed an analysis of a nucleation burst that was recorded at Hyytiälä on 1 April 2003.Different calculation methods gave the median formation rates of 3 nm particles 1.75 cm -3 /s and 1.13 cm -3 /s.In principle, the thermodynamically stable particles born with sizes of about 1 nm can vanish before they grow up to sizes of about 3 nm.Therefore, the nucleation rate, determined on the basis of the measurements of 3 nm particles, can be different from that for 1 nm particles.The survival percentage of 1 nm particles depends on several factors, primarily on growth rate and the parameters that determine the sink of the particles.Our evaluation demonstrated that at the conditions, characteristic for Case 1 and Case 2, the survival percentage of 1 nm particles should be at least 50%.Therefore, the nucleation rates obtained by the former studies, summarized above, are approximately comparable with those obtained in this study.
The parameters "rise time of nucleation activity", "time of steady nucleation activity", "time of dropping nucleation activity" in Table 1 characterize the time dependence of nucleation rate.In Case 1, nucleation activity increases slowly, has a constant maximal value for a short time, and decreases slowly.Case 2 is characterized by a rapid increase and decrease in nucleation activity and a long time while it has the top value.

Growth Rate
The burst simulator offers a possibility to take into account two kinds of growth units of condensing substances with different sizes, densities and polarizabilities.The growth unit is a molecule or cluster of a condensing substance as explained by Tammet and Kulmala (2005).The first one (corresponds to growth rate GR1) is considered as a nonevaporating substance responsible for the initial slow growth of particles (e.g., sulphuric acid).The second one (corresponds to GR2) is an evaporating substance (e.g., low volatile organic compounds) responsible for quicker growth of the particles that have passed the threshold size of the nano-Köhler model.The estimated threshold size of the nano-Köhler model (at this particle size GR2 becomes active) has values from 1.65 to 2.4 nm, which coincide well with the nano-Köhler threshold diameter of about 1.7 nm recently published by Kulmala et al. (2013).The estimated values of the initial, midpoint and final GR1 and GR2 extended from 1.2 to 6.4 nm/h (Table 1).
A comparison of the estimated GR1 and GR2 with certain results from previous studies is presented next.We compare growth rates for particles in diameter interval from 2. 8-8.6 nm. Dal Maso et al. (2005) analyzed the aerosol measurements at a boreal forest site at Hyytiälä, Finland, between 1996 and2003.They found that the mean growth rate of the nucleation mode particles was 3.0 nm/h, peaking in summer.Hirsikko et al. (2005) analyzed the measurements performed at Hyytiälä during April 2003-April 2004.They obtained a clear size-dependent growth rate: the rates for the size classes 1.3-3 nm, 3-7 nm and 7-20 nm were on average 1-2 nm/h, 2-4 nm/h and 4-5 nm/h, respectively.Manninen et al. (2009) analyzed the measurements carried out during the EUCAARI field campaign at Hyytiälä in spring 2007.The median growth rate of particles between 3 and 7 nm was 3.8 nm/h.Yli-Juuti et al. (2011) studied the diameter growth rates of nucleation mode particles during the years 2003-2009 at the SMEAR II station.They found that the median growth rates in the diameter ranges of 1.5-3 nm and 3-7 nm were 1.9 nm/h and 3.8 nm/h, respectively.Vuollekoski et al. (2012) performed an analysis of a NPF event that was recorded at Hyytiälä on 1 April 2003.The growth rate of nanometer particles was 3.4 nm/h in average.We may conclude that the growth rate values obtained by our estimator for the particle size interval 2.8-8.6 nm are well comparable to the considered results.
The parameters "diameter of a first/second growth unit" together with polarizabilities could, in principle, help to identify the substances, responsible for GR1 and GR2.Many studies demonstrated that certain trace gases (besides sulphuric acid) can induce NPF.Both the specifics and the magnitude of the effect depend on the particular substance (e.g., Luts et al., 2011a, b;Ortega et al., 2012).Our burst simulator can take into account two different substances, characterized by the size, density and polarizability.As discussed in section "Prior information and parameter constraints used in the estimations", polarizabilities are fixed.The estimated values of the diameters range from 0.49 nm to 0.6 nm, which is clearly not sufficient to identify the substances but implies that the growth units should not be very large molecules.
Former results imply that relatively small molecules (e.g., sulphuric acid) plausibly determine the first stages of growth, but at the later stages some larger molecules (e.g., various VOC molecules with masses above 300 Da) should become dominant (e.g., Kulmala et al., 2013).According to the results by Ehn et al. (2011), a particle with mass about 300 Da should have a diameter of about 1.5 nm.In relation to these results the diameters obtained by our estimator (from 0.49 nm to 0.6 nm) seem unreasonably low.Nevertheless, there is no standardized relationship between the mass and size; in the simulator by Tammet and Kulmala (2005) the particles with masses of 300 Da have diameters of about 0.8 nm, not about 1.5 nm.Therefore, the estimated diameters can fit well for the molecules that are responsible for the first stages of the growth but can be somewhat low for the molecules that determine the later stages.This discrepancy can be caused by our test data, which are limited with the sizes of 8.6 nm.Due to this relatively low upper size, larger molecules could perhaps not take the full effect but the growth was still mainly controlled by the first substance.This issue could be the subject of follow-on studies using a CI-ApI-TOF for measuring ELVOCs.

Birth Size of Particles
The parameter called birth size is linked to the parameter named nucleation rate, the latter is discussed in the previous section "Nucleation rate".Within the simulator, all the particles appear at a certain birth size and with a certain nucleation rate.According to the estimation, the birth size appeared to be between 0.95 and 1.0 nm (Table 1).This result is in agreement with the knowledge that a burst should start at about 1 nm.Indirect measurements and theoretical calculations suggest that from a thermodynamic point of view the critical nucleus has a diameter on the order of 1 nm (Kulmala et al., 2007;Zhang et al., 2012).Kulmala et al. (2013) reported slightly larger values of 1.3-1.7 nm but for mobility diameters.Our simulator yields the mass diameters (Tammet and Kulmala, 2005).The problem of precise comparison of mass and mobility diameter values needs a sophisticated theory and is not completely solved yet (Tammet, 2012).Broadly, our results do not contradict the results by Kulmala et al. (2013).
The birth size of particles is one important parameter that has not yet been thoroughly studied.Additional calculations by varying the birth size showed that burst simulator is very sensitive in this respect.Even variations by 3% resulted in drastic changes in simulated graphs.Formally, according to the estimation, moderate variations in the value of birth size can be compensated by large variations in the values of several other microphysical parameters of the burst simulator, first of all by large changes in the value of the extra temperature of quantum rebound (information about the quantum rebound can be found in Tammet and Kulmala (2005)).This implies that these parameters behave as correlated variables.Neither birth size nor extra temperature can be directly measured, that is why these variations in our estimated values are somewhat arbitrary.Still, such compensations are formally effective up to the values of the birth size of about 1.3 nm, even if we allow extremely large variations in the extra temperature.Above this value of birth size, the similarity with the measured results lessens substantially.Provided the value of extra temperature is fixed, the estimated values of birth size converge effectively and uniquely.Therefore we preferred the value of the extra temperature that is fixed at the guess by Tammet andKulmala (2005, 2007).As discussed above, in this case the values of birth size obtained by our estimator converge near the reasonable values.

Forest Parameters and Ion Mobility
As described above, we fixed certain forest-related parameters in simulations in order to get results that can better be related to real conditions.The estimated input parameter "air residence distance in forest" appeared from 244 m to 770 m.Different values of this parameter for different NPF events can be caused by differences in forest and terrain features when the air mass approaches the measurement site from different directions, or by the changing wind speed.The forest features were not explicitly studied and therefore the estimated values cannot be accurately assessed.Broadly, such air residence distances seem plausible.
The wind speed in the simulating tool by Tammet and Kulmala (2007) has a certain specific meaning: it rather characterizes an average wind speed in a large forested area than at a fixed geographical point.Therefore, the initial, midpoint and final wind speed in the forest were considered as free parameters in the current simulations.The estimations resulted in wind speed values in the interval from 0.74 to 3.7 m/s with the mean of about 1.6 m/s.It is remarkable that the actually measured wind speed at the routine measurement site of the SMEAR II station around noon on these days was 1.2 m/s, in average.Thus the estimated values are close to the measured values despite the fact that, in principle, the simulated values and the measured values do not have exactly the same physical meanings.
The estimated values for the mobility of cluster ions turned out to be close to the values known elsewhere, e.g. from the long-term measurements at Tahkuse Observatory (Hõrrak et al., 2000).

CONCLUSIONS
We developed and tested an NPF parameter estimator that includes the existing burst simulator (Tammet andKulmala, 2005, 2007).It makes possible an automated estimation of the input parameters (e.g., nucleation rates) by the criterion of the best numerical similarity of the output parameters (e.g., simulated size distributions) to the corresponding measured parameters (e.g., measured size distributions).In other words, it is possible to realize the inverse simulation of an NPF event.To our knowledge, this is the first estimator for such a task.Computational experiments demonstrated that the solution of the inverse problem converged within about ten estimation steps (about 1000 model runs) and sometimes even faster.Nevertheless, in the general case the solution cannot be considered as unique.Computational experiments revealed that the variations in the initial values of certain input parameters could in part be compensated by variations in certain other parameters.This is caused by the inherent nature of the existing burst simulator, not by our new algorithms.One of the parameters that appeared to be correlated with certain other parameters is the birth size of particles.To decrease such ambiguities, we fixed or constrained certain input parameters on the basis of available a priori information.For example, the values of the ionization rate could vary only above 4 cm -3 /s, in accordance with the presented discussion.The number of independent forest parameters was reduced to one parameter "air residence distance in forest".The values of this parameter, obtained for different conditions, can in principle be evaluated on the basis of the forest and terrain features.The air residence distance can depend on wind direction, as it probably was the case for the test events on 7 April and 8 April 2000.
We tested the estimator by NPF events recorded at Hyytiälä from 7 April to 8 April 2000.Provided the discussed constraints, the solution of the inverse problem reliably converged.The method resulted in values for a large number of physical parameters for both NPF events.Using the estimated values of the input parameters, the simulator yielded time-dependent curves that are close to those obtained by direct measurements, therefore, the method could reproduce the general shapes of the measured curves.
The estimated ionization rate values varied from 4.0 to 5.6 cm -3 /s.The nucleation rate for charged particles varied from 0.05 to 0.19 cm -3 /s and the rate for neutral particles from 0.63 to 1.6 cm -3 /s.The birth size of nanoparticles in terms of the mass diameter was between 0.95 nm and 1 nm.The critical size of particles for the condensation of presumably organic vapors by the nano-Köhler mechanism appeared to be around 2 nm: the estimated values varied from 1.65 nm to 2.4 nm.The estimated growth rates of nanoparticles spread from 1.2 to 6.4 nm/h.The estimated values for the mobility of cluster ions were between 1.37 and 1.55 cm 2 /V/s.All these estimated values are close to those known from the discussed studies; therefore we can conclude that the obtained solution of the inverse problem is realistic.
We conclude that the simulator by Tammet andKulmala (2005, 2007), when completed with the new algorithms, appears to be a productive tool for the analysis of NPF.We will continue developing the estimator by providing more possibilities for including complementary information, e.g., simultaneous meteorological measurements.A further study of the interdependencies between the input parameters of the burst simulator can yield new and valuable information about the possibilities how to improve the methods.The results of this study can be used to investigate which percentage of the smallest stable particles can survive the evolution up to sizes of about 3 nm.The future prospects include an analysis of the growth rate of nanoparticles and a statistical analysis of a large number of NPF events recorded during simultaneous atmospheric ion, aerosol, and meteorological measurements at different locations.
The actual case is similar: each from the five vectors contains one modified element and 60 elements have been not modified.In the actual case, we obtain 31 combinations.Next, the algorithm simulates all the particular cases, defined by the 31 combinations, and for every particular case computes the corresponding measure of difference.

Characteristics of test data
For the simulation of a NPF event, it is necessary to establish the onset time of the event.For this purpose we examined the graphs of the concentrations of measured nanoparticles and selected the time point, when a considerable rise in the concentrations of these nanoparticles began.As seen in the Fig S1(a), a considerable rise in the concentrations of "natural" and "bipolar" ions started at about 8:00, but this is not the proper onset time.A burst should start from particles with sizes of about 1 nm (Kulmala et al., 2013), but the measured low end of DMPS particle diameter range was 2.8 nm, therefore the instruments do not see the particles that emerge just at the onset of a burst.In case we take the growth rate of nanoparticles for this size region to be between 1 and 2 nm/h (as found by Kulmala et al., 2013), we can assume an about 1 hour earlier burst onset time for about 1 nm size particles.Thus we chose a 1-hour earlier onset times, 7:00 for case 1 and 8:00 for case 2.

Table S1(b).
Measured concentrations of nanoparticles and cluster ions (in cm -3 ) at the SMEAR II station, Hyytiälä, on 7 April 2000.Column "LST" contains the local standard time (standard time for Finland or UTC + 2 hours).Column "Time" presents times, in minutes, elapsed from the start of the NPF event.These time points are also given in Fig. 2a, which presents the detailed graphs.Column with notation "n-" refers to the concentrations of negative cluster ions in the size range of 0-1.9 nm, the actual values are divided by 100; "N-" is the concentrations of negative nanoparticles, the actual values are divided by 10; "No" is the concentrations of neutral nanoparticles, the actual values are divided by 500; all the notations are in line with the ones, used in the simulation tool.The size range of nanoparticles is 2.8-8.6 nm.Values in columns "n-", "N-" and "No" are used as test data.

LST
been set, enters the estimation loop Exits the estimation loop

Fig. 2
Fig. 2(a).Measured and simulated fraction concentrations of charged and neutral aerosol particles and of cluster ions for case 1 at the estimated values of the input parameters.n-is the concentration of negative cluster ions, N-and No are the concentrations of negative and neutral particles in the diameter interval of 2.8-8.6 nm.To draw the lines presented in the figure, the numerical concentrations are divided by the factors as given in the legend.Black lines represent the measured values and gray lines the simulations.Time on the x-axis is counted from the beginning of the expected nucleation onset at 07:00 LST, where LST is the local standard time (standard time for Finland or UTC + 2 hours).

Fig. 2
Fig. 2(b).Measured and simulated fraction concentrations of charged and neutral aerosol particles and of cluster ions for case 2 at the estimated values of the input parameters.The notations are the same as in Fig.2(a).Time on the xaxis is counted from the beginning of the expected nucleation onset at 08:00 LST.

(
2011).Growth Rates of Nucleation Mode Particles in Hyytiälä during 2003-2009: Variation with Particle Size, Season, Data Analysis Method and Ambient Conditions.
Figure captionsFig.S1(a).Diurnal evolution of the particular fractions (cm -3 ).These fractions are used as test data within the present study.Figure depicts the evolution of negatively charged aerosol particles and negative small ions at SMEAR II Station, Hyytiälä, Finland, on 7 April 2000 (case 1)."Natural" denotes naturally charged particles in a diameter range of 2.8-8.6 nm, "Bipolar" denotes particles in the same size range charged in the bipolar charger, and "n" cluster air ions measured by means of air ion counters.LST is the local standard time (standard time for Finland or UTC + 2 hours).

Fig
Fig. S1(b).Diurnal evolution of particle size distribution for case 1.

Fig
Fig.S2(a).Diurnal evolution of the particular fractions (cm -3 ).These fractions are used as test data within the present study.Figure depicts the evolution of negatively charged aerosol particles and negative small ions at SMEAR II Station, Hyytiälä, Finland, on 8 April 2000 (case 2).The notations are the same as in Fig S1(a).

Fig
Fig. S2(b).Diurnal evolution of particle size distribution for case 2.
and summarized in Supplement.

Table S2 (
b). Measured concentrations of nanoparticles and cluster ions (in cm -3 ) at the SMEAR II station, Hyytiälä, on 8 April 2000.Fig.2bpresents the detailed graphs.The notations are the same as in Table1b.Values in columns "n-", "N-" and "No" are used as test data.