Exceedance Analysis of PM10 Concentration in Central Indian City: Predicting Gap between Two Exceedances

In this study, the gap between two exceedances is analyzed using time series analysis. The time series of PM10 (particulate matter of size less than 10 micron) observed during 2005–2013 in two cities in central India, Nagpur and Chandrapur, is considered. Higher PM10 concentration is observed in Chandrapur than in Nagpur. An exponential relationship is observed between the average time between two exceedances and the annual average PM10 concentration. This information, along with a PM10 concentration prediction model, is utilized to predict the average number of observations between two exceedances for the following year. A k-nearest neighbor approach is used for forecasting PM10 concentration, which enables estimating the average number of observations between two exceedances using the exponential relationship. The approach can be used for estimating the average number of observations between two exceedances over a year, which can further be utilized to make appropriate decisions to control and manage high particulate matter pollution in an area.


INTRODUCTION
Air quality in an area is usually evaluated by comparing the measurements with the standard or threshold set by national regulatory agencies such as the Environmental Protection Agency (USEPA) in the US or the Central Pollution Control Board (CPCB) in India. Extreme values of air pollutant concentrations that exceed the threshold pose a serious threat to human health and the environment (Ercelebi and Toros, 2009; Lonati et al., 2011). Extreme concentrations have been studied extensively in the literature, considering their importance in air pollution management for source emission reduction (Sharma et al., 1999). Two aspects have mainly been considered while assessing extreme concentration events: probabilistic analysis of extreme events and exposure studies. The latter, i.e., exposure studies, have mainly focused on the pollutant concentration, duration and time pattern of exposure (Borrego et al., 2009). Probabilistic studies have focused on predicting the probability of exceedance of air quality thresholds and the return period for the highest concentration levels (Lonati et al., 2011). Extreme value analysis has usually been carried out on the time series of the number or total time of exceedances in a time interval and the duration of events continuously exceeding the threshold concentration (Lonati et al., 2011; Chelani, 2013a). The probability distributions are then used for further projections and to obtain the probability of exceedances (Ercelebi and Toros, 2009; Lonati et al., 2011).
The gap, or number of observations between two high concentration events, is a useful quantity in extreme value theory and can be utilized to study the flow of extreme values in a time series. A low gap suggests the influence of emissions and of meteorological conditions favorable to the build-up of high concentration events. A high gap between two exceedances, on the other hand, suggests that the air pollutant concentrations are well within the thresholds and may not pose serious health risks. The gap has been estimated in many studies using probability distributions (Ercelebi and Toros, 2009; Sharma et al., 1999). Predicting the gap between two exceedances is an important problem in health impact assessment studies. Assuming independent and identically distributed extreme concentrations, the type I asymptotic distribution has been used to obtain the mean, distributional properties and expected probability of the extreme values in a year (Larsen, 1973; Horowitz and Barakat, 1979; Lu and Fang, 2003). Other distributions, including type II and type III, are also used in the literature to estimate the expected probability of extreme values and the associated time between exceedances (Ercelebi and Toros, 2009). The use of these distributions, however, requires the assumption of an identically and independently distributed time series, whereas air pollutant concentrations are often non-stationary and auto-correlated. Data driven approaches provide a way to analyze and project the extreme values based on empirical relationships. Lonati et al. (2011) used an empirical relation of extremes with thresholds and provided a useful way to predict high concentrations. Another approach to obtain future projections is to characterize hidden extreme patterns in the data observed over time using time series analysis.
In this study, the gap or number of observations between two exceedances is analyzed using time series analysis. The time series of coarse particulate matter (particulate matter of size less than 10 micron, or PM10) observed during 2005–2013 in two cities in central India is considered for the study. The two cities, Nagpur and Chandrapur, are moderate sized with different source and land-use characteristics and have populations of about 2,900,000 and 2,000,000, respectively. Yearly projections are then obtained using the relationship between the number of observations between two exceedances and the annual average concentration. For forecasting, PM10 concentration is modeled using the k-nearest neighbor approach.

Location and Data
Nagpur (21°08′N, 79°10′E) and Chandrapur (19°57′N, 79°18′E) are urban cities in central India surrounded by rural, mainly agricultural, areas. Both lie on the Deccan plateau of the Indian Peninsula, at mean altitudes of 310 and 189 meters above sea level, respectively, with hot summer months, moderately cold winters and a southwest monsoon. The annual rainfall is about 1100 mm (http://www.worldweatheronline.com/Nagpur-weather-averages/Maharashtra/IN.aspx) and 1200 mm (http://www.worldweatheronline.com/Chandrapur-weather-averages/Maharashtra/IN.aspx) at Nagpur and Chandrapur, respectively. Both cities are located in the tropics. Extreme temperatures have been recorded, with 47.9°C in Nagpur during May 2013 (http://timesofindia.indiatimes.com/city/nagpur/Nagpur-records-all-time-high-temperature-at-47-9-C/articleshow/20216419.cms) and 49°C in Chandrapur during June 2007 (Jaswal et al., 2015). Nagpur is recognized as the second green and clean city in India (http://pdf.usaid.gov/pdf_docs/pdacw407.pdf). It has witnessed rapid economic expansion, resulting in an increase in environmental problems in recent years. The major air pollution sources are power plants and automobile exhausts. Two mega power plants with a total capacity of 1880 MW surround Nagpur (http://indianpowersector.com/home/power-station/thermal-power-plant). In Chandrapur, the predominant wind direction is from the south. The largest super thermal power plant in India, with a capacity of 2340 MW, is located in Chandrapur (http://indianpowersector.com/home/power-station/thermal-power-plant). Due to the reserves of coal, intense mining activity is carried out in the area, which elevates the coarse particulate matter concentration. The city is also termed the city of black gold due to intense coal mining. It has been listed in the Comprehensive Environmental Pollution Index (CEPI) assessment, which is carried out on 43 industrial clusters of India. A critical area is identified by an index of > 70 on a scale of 0 to 100. The CEPI of Chandrapur is calculated as 83.88. Further details on CEPI are given in CPCB (2009).
PM10 data observed during 2005-2013 at three sites in each city have been obtained from the website of the Maharashtra Pollution Control Board (www.mpcb.gov.in). The agency monitors ambient air quality at various locations with different land use activities. In Nagpur, Site NGP1 is located in Hingna, which is mainly an industrial area with approximately 900 small and medium scale industries, mainly automobile units, food manufacturing plants, casting units and a steel rolling plant. Site NGP2 is at Sadar, in an open and flat area located approximately 150 meters away from heavy traffic activity. Site NGP3 is located in the Institution of Engineers' premises, which can be considered a residential site with less traffic activity. The sampling sites can be classified as urban locations distanced from sources but representative of general population exposure. Site CPUR1, i.e., Maharashtra Industrial Development Corporation (MIDC) in Chandrapur, is an industrial area. Sites CPUR2 and CPUR3, located at the city council office premises (SRO) and the Gram Panchayat office, respectively, are mainly residential locations. The location of the sites is given in Fig. 1. The data are available with a frequency of twice a week. In order to evaluate the significance of PM10 pollution in the two cities, PM10 observed at an urban residential site in Delhi is also analyzed. According to a World Health Organization report released in 2014, Delhi has been recognized as the most polluted city in the world (http://www.who.int/phe/health_topics/outdoorair/databases/AAP_database_results_2014.pdf). The data at one of the sites in Delhi, namely 'Shadipur', over 2010 to 2013 are obtained from www.cpcb.nic.in.

Methods
The k-nearest neighbor method is a simple machine learning method in which the nearest neighbors of an object in question are determined from the historical data. The relationship between the object in question and its nearest neighbors is approximated using estimation techniques. The future projections are then obtained using the established relationships. Nearest neighbor forecasting models have been found to perform well for predicting complex nonlinear behavior due to the assumption that 'the object to be predicted has close neighbors in the historical set'. The method is shown to capture linear and nonlinear patterns inherent in the data (Yankov et al., 2006). With the help of nearest neighbors, one can predict the object using appropriate estimation techniques. The function to estimate the object may be well defined or can be estimated. Moreover, the algorithm does not require a prior assumption of the model and does not need data pre-processing. The usual practice is to divide the data observed over a period of time into two groups, of which the first group is used to obtain the estimates of the second group. The method is described below.
Let x(t) be the time series or sequence of observations over equal intervals of time t = 1, …, n. Let x(l) be the continuation of the time series x(t). The k-nearest neighbors of the object x(l) are obtained by computing a distance or norm D between x(l) and the historical values x(t_1). The next step is to rank the distance matrix D in ascending order and note the k values of x(t_1) with minimum distance. This gives the k-nearest neighbors of x(l) in x(t_1). The h-step ahead forecasts are obtained by utilizing the k-nearest neighbors of x(l): an appropriate function f of the k-nearest neighbors gives the estimate of x(l + h), where h = 1 for next step forecasting and x'_k(l) denotes the k-nearest neighbors of x(l). The function f can be the median, mean, linear combination or kernel function of the k-nearest neighbors. In the linear combination of the k-nearest neighbors, w is the coefficient matrix to be determined by the ordinary least squares technique (Atkeson et al., 1996). The kernel regression function f can be obtained with kernels such as Gaussian, radial basis function, polynomial or uniform. The Gaussian kernel is most widely used for kernel regression modelling (http://people.revoledu.com/kardi/tutorial/Regression/KernelRegression/KernelRegression.htm).
The kernel function is computed between the values x (which are to be estimated) and the input x'_k(l), and the forecasts are then obtained from the kernel, where σ is the bandwidth to be selected. Further details of kernel regression are given in Smola and Scholkopf (1998).
The details of the k-nearest neighbor method are given in Yankov et al. (2006).
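The forecasting step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes scalar observations (no delay embedding), the absolute difference as the distance D, and that f is applied to the values observed h steps after each neighbor. The Gaussian kernel option assumes a standard kernel-weighted average with bandwidth sigma. The function name knn_forecast and the sample series are hypothetical.

```python
import math
import statistics


def knn_forecast(series, k=3, h=1, method="mean", sigma=1.0):
    """Estimate x(l + h) from the k nearest neighbors of the last value x(l).

    Assumptions (not taken from the paper): scalar observations, absolute
    difference as distance D, and f applied to the successors x(t1 + h).
    """
    n = len(series)
    if n <= h or k < 1:
        raise ValueError("series too short or k < 1")
    query = series[-1]  # x(l), the most recent observation
    # Candidate neighbors x(t1) must have a known successor x(t1 + h).
    distances = sorted((abs(series[t] - query), t) for t in range(n - h))
    nearest = distances[:k]  # k smallest distances
    successors = [series[t + h] for _, t in nearest]

    if method == "mean":
        return sum(successors) / len(successors)
    if method == "median":
        return statistics.median(successors)
    if method == "gaussian":
        # Kernel weights decay with distance; sigma is the bandwidth.
        weights = [math.exp(-(d ** 2) / (2.0 * sigma ** 2)) for d, _ in nearest]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: fall back to the mean
            return sum(successors) / len(successors)
        return sum(w * s for w, s in zip(weights, successors)) / total
    raise ValueError("unknown method: " + method)


# Synthetic, purely illustrative series (not PM10 data from the study).
example = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
next_value = knn_forecast(example, k=2, h=1, method="mean")
```

In this toy series the two closest past values to the last observation are both followed by 3.0, so every choice of f returns 3.0. With real data, k and sigma would need to be tuned on a held-out part of the record.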

Seasonal Cycles
The time series of PM10 concentration observed at the study locations is given in Fig. 2. It can be seen that the PM10 time series at all the sites has seasonal cycles and no specific monotonic trend. At Nagpur, the highest average concentration of 116.6 ± 68.1 µg m⁻³ is observed at NGP1, followed by 93.9 ± 58.2 µg m⁻³ at NGP3 and 85.5 ± 48.8 µg m⁻³ at NGP2. Medians of 102.5, 77.0 and 87.0 µg m⁻³ with interquartile ranges of 77.2, 52.5 and 66.0 µg m⁻³ are observed at NGP1, NGP2 and NGP3, respectively. At the industrial site NGP1, the average PM10 concentration exceeds the standard concentration. At Chandrapur, the highest average concentration of 178.5 ± 107.2 µg m⁻³ is observed at CPUR3, followed by 123.6 ± 69.8 µg m⁻³ at CPUR1 and 103.1 ± 73.4 µg m⁻³ at CPUR2. Medians of 113.0, 84.0 and 161.0 µg m⁻³ with interquartile ranges of 95.0, 77.0 and 128.0 µg m⁻³ are observed at CPUR1, CPUR2 and CPUR3, respectively. Relatively high concentrations, exceeding the standard limit of 100 µg m⁻³ stipulated by CPCB, are observed at the three sites in Chandrapur as compared to Nagpur. The maximum PM10 concentration is observed in the winter months of 2007 in Nagpur, whereas at CPUR1 and CPUR3 the maximum concentration is observed in the summer months of 2008. Due to clear weather and hot climatic conditions in summer (Gawai et al., 2014), resuspension of dust from mining activities occurs at CPUR1 and CPUR3, resulting in high PM10 concentration.

Exceedance Analysis
In order to understand the exceedance pattern over time, the time series of exceedances of a threshold of 100 µg m⁻³ is formed. The CPCB guidelines state that "24 hourly or 8 hourly or 1 hourly monitored values, as applicable, shall be complied with 98% of the time in a year. 2% of the time, they may exceed the limits but not on two consecutive days of monitoring". These guidelines are provided because exceedance on two consecutive days of monitoring may raise serious health alarms, and control measures need to be initiated if it is observed (Chelani, 2013b). The yearly probability of exceedance of the standard limit during the study period is plotted in Fig. 3. The probability of exceedance increased initially. At the three sites in Nagpur, a decreasing trend is observed from 2008 to 2011 and an increase is observed thereafter. At Chandrapur, the lowest probability of exceedance is observed in 2013 at CPUR1 and CPUR2, with a decreasing trend from 2008 as in Nagpur. The probability of exceedance is above the limit of 2% during 2007-2013 at Nagpur and during 2005-2012 at Chandrapur. CPUR3 is observed to be the most critical site for PM10, as it has the highest probability of exceedance. It is even more critical than the residential site in Delhi, with a higher probability of exceedance during 2010-2013. In a study comparing PM10 in mining and urban areas, George et al. (2013) observed high PM10 concentration in Delhi as compared to Chandrapur.
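As a rough illustration of how the yearly probability of exceedance can be computed from twice-weekly observations, the following Python sketch counts the fraction of observations above the threshold in each year. The record layout of (year, concentration) pairs, the strict "greater than" comparison and the sample values are assumptions made for the example, not details given in the study.

```python
from collections import defaultdict

THRESHOLD = 100.0  # µg m^-3, the CPCB limit for PM10 cited in the text
ALLOWED_FRACTION = 0.02  # values may exceed the limit 2% of the time


def yearly_exceedance_probability(records, threshold=THRESHOLD):
    """Return {year: fraction of observations above threshold}.

    `records` is an iterable of (year, concentration) pairs (assumed layout).
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [exceedances, total]
    for year, concentration in records:
        counts[year][0] += 1 if concentration > threshold else 0
        counts[year][1] += 1
    return {year: exc / total for year, (exc, total) in sorted(counts.items())}


# Synthetic, purely illustrative records (not data from the study).
sample = [
    (2012, 80.0), (2012, 120.0), (2012, 101.0), (2012, 99.0),
    (2013, 50.0), (2013, 150.0),
]
probabilities = yearly_exceedance_probability(sample)
years_over_limit = [y for y, p in probabilities.items() if p > ALLOWED_FRACTION]
```

Each yearly fraction can then be compared with the 2% allowance, as done for Fig. 3.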
From the CPCB guidelines it is also clear that the concentration should not be exceeded on two consecutive days of monitoring. To remain within the standard limit, the average time between two exceedances therefore should not be equal to 1. To assess the applicability of the above guidelines, an exceedance time series is formed from the PM10 data observed at the six sites in the two cities. The number of observations between two exceedances is then computed. It can be observed from Fig. 3 that the time series of the number of observations between two exceedances has many spikes at all the sites. Further, Table 1 shows that it is quite low at NGP1, followed by NGP3 and NGP2. Frequent high concentrations result in a lower number of observations between two exceedances at the industrial site. Due to the homogeneous nature of emissions and meteorology, the annual average concentrations and the number of observations between two exceedances are quite similar at the two residential sites in Nagpur. In Chandrapur, CPUR3, which is mainly a residential site, exhibits a lower number of observations between two exceedances, which confirms the observation of the highest probability of exceedance at this site. Unpaved road dust, loose soil and nearby mining activities are the main reasons for the high annual average concentration and the lower number of observations between exceedances at this site. CPUR1, which is an industrial site, also has a lower number of observations between two exceedances than CPUR2. The linear trend analysis suggests an insignificant trend in the annual average time series of the number of observations at the three sites in Nagpur, whereas in Chandrapur a significant positive trend with a slope of 1.6 (p = 0.02) is observed at CPUR2, and significant negative trends with slopes of -1.82 (p = 0.009) and -0.731 (p = 0.016) are observed at CPUR1 and CPUR3, respectively. At CPUR1, the slope is computed for 2007-2012, as it was found insignificant for the whole study period. A significant negative slope suggests a decrease in the number of observations between exceedances, which is a matter of concern, as more values exceeding the standard may lead to higher health risks.
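The gap computation can be sketched in Python as follows. The sketch assumes that the gap is the difference in observation index between successive exceedances, so that two consecutive exceeding observations give a gap of 1, in line with the remark that the average should not equal 1. The function names and the sample series are hypothetical.

```python
def exceedance_gaps(concentrations, threshold=100.0):
    """Return the gaps (in observations) between successive exceedances.

    A gap is taken as the index difference between two successive values
    above `threshold` (an assumed convention): consecutive exceedances -> 1.
    """
    indices = [i for i, c in enumerate(concentrations) if c > threshold]
    return [later - earlier for earlier, later in zip(indices, indices[1:])]


def mean_gap(concentrations, threshold=100.0):
    """Average gap, or None when fewer than two exceedances occur."""
    gaps = exceedance_gaps(concentrations, threshold)
    return sum(gaps) / len(gaps) if gaps else None


def has_consecutive_exceedance(concentrations, threshold=100.0):
    """True if two successive monitoring values both exceed the threshold."""
    return 1 in exceedance_gaps(concentrations, threshold)


# Synthetic, purely illustrative series (not data from the study).
sample = [120.0, 90.0, 80.0, 130.0, 140.0, 70.0, 110.0]
gaps = exceedance_gaps(sample)        # exceedances at indices 0, 3, 4, 6
average_gap = mean_gap(sample)
```

Applying mean_gap to each calendar year of a site's record would give an annual series comparable in form to Table 1.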
The number of observations between two exceedances is also estimated for winter, summer, monsoon and post monsoon to gain insight into the seasonal variations in high concentration events and their time distribution patterns. The results are given in Table 2, which shows that winter is the most critical season, with the lowest number of observations between two exceedances at all the sites except NGP3, where post monsoon is observed to be critical. The highest number of observations between two exceedances is observed in monsoon at all the sites, due to the washout effect resulting in lower PM10 concentration.

Predicting Number of Observations between Two Exceedances
To predict the number of observations between two exceedances over a year, a simple yet effective method is adopted instead of a probabilistic approach. The relationship between the average PM10 concentration and the number of observations between two exceedances over a year is first examined. It can be observed from Fig. 4 that the annual average number of observations between exceedances decreases exponentially with the annual average PM10 concentration. The exponential fit is observed to be the best among straight line, logarithmic and power law fits with respect to R². For a year with high PM10 concentration, the number of observations between exceedances is low, and vice versa. For a low average concentration, the measured values are generally lower than the standard limit and the gap between two exceedances is larger. Based on this relationship, if one is able to reliably predict PM10 concentration, the [...] 4.
With the predicted PM10 concentration, the average number of observations between two exceedances is estimated for 2013 using the exponential equation given in Fig. 4. The observed and predicted number of observations between two exceedances is given in Table 5. It is seen that R² between the observed and predicted annual series is > 0.7 at all the sites. This suggests that the approach presented here is useful for predicting the average number of observations between two exceedances over a year, which is a useful quantity for policy makers and concerned authorities in making appropriate decisions to control and manage high particulate matter pollution in an area. Although the number of observations between two exceedances can be predicted using probabilistic approaches that are well applied in several studies, including Larsen (1973), Horowitz and Barakat (1979), Sharma et al. (1999), Lu and Fang (2003), Ercelebi and Toros (2009) and Lonati et al. (2011), the data driven nearest neighbor based approach, with its acceptable prediction error, provides an alternative to assumption based probabilistic approaches.
Fig. 5(a). Observed vs. predicted PM10 concentration using k-nn approach: for Nagpur.
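The prediction step can be sketched in Python under stated assumptions. The sketch assumes the exponential relationship has the form gap = a · exp(b · concentration) with b < 0, and fits it by ordinary least squares on the logarithm of the gap. The fitting procedure used for Fig. 4 is not described in the text, so this is only one plausible choice. The function names and all numbers below are hypothetical and synthetic.

```python
import math


def fit_exponential(concentrations, gaps):
    """Fit gap = a * exp(b * concentration) via least squares on ln(gap).

    Returns (a, b). Log-linear fitting is an assumption of this sketch.
    """
    n = len(concentrations)
    if n < 2 or n != len(gaps):
        raise ValueError("need at least two paired observations")
    log_gaps = [math.log(g) for g in gaps]
    mean_x = sum(concentrations) / n
    mean_y = sum(log_gaps) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, log_gaps))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_gap(a, b, concentration):
    """Average number of observations between exceedances for a given mean."""
    return a * math.exp(b * concentration)


# Synthetic annual averages generated from a known curve (not study data).
annual_mean = [80.0, 100.0, 120.0, 140.0]
annual_gap = [20.0 * math.exp(-0.02 * x) for x in annual_mean]

a_hat, b_hat = fit_exponential(annual_mean, annual_gap)
# A forecast annual mean (e.g. from a k-nn model) is then mapped to a gap.
forecast_gap = predict_gap(a_hat, b_hat, 110.0)
```

Because the synthetic points lie exactly on an exponential curve, the fit recovers its parameters. With observed data, the fitted curve would be compared with the alternatives by R², as described for Fig. 4.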

CONCLUSIONS
The number of observations between two exceedances in the PM10 time series at two cities in central India is analyzed using time series analysis. Higher PM10 concentration is observed in Chandrapur than in Nagpur. Seasonal analysis of the PM10 time series showed winter to be critical in Nagpur and summer in Chandrapur, where clear weather and a hot climate lead to resuspension of dust from mining activities. The exceedance analysis showed an exceedance probability higher than the 2% in a year prescribed by CPCB in both cities. The results are compared with the PM10 observed at a residential site in Delhi, which revealed that PM10 at one of the residential sites in Chandrapur is higher. An insignificant trend in the number of observations between two exceedances is observed in Nagpur, an increase at CPUR2 and a decrease at CPUR1 and CPUR3. CPUR1 and CPUR3 are observed to be critical locations with high PM10 pollution. It is observed that the number of observations between two exceedances over a year has an exponential relationship with the annual average particulate matter concentration. This information, along with a PM10 concentration prediction model, is utilized to predict the average number of observations between two exceedances for the following year. The approach is useful for predicting the average number of observations between two exceedances over a year, which is a useful quantity for policy makers and concerned authorities in making appropriate decisions to control and manage high particulate matter pollution in an area.

Fig. 4. Number of observations between two exceedances vs. annual average PM10 concentration.

Table 1. Annual average number of observations between two exceedances during 2005-2013.

Table 2. Average number of observations between two exceedances during different seasons over 2005-2013.

Table 3. Bandwidth σ as in equation 4 for different data sets.

Table 4. Model performance statistics of k-nn approach for different data sets.