Special issue in honor of Prof. David Y.H. Pui for his “50 Years of Contribution in Aerosol Science and Technology”

Vikas Kumar1, Vasudev Malyan2, Manoranjan Sahu2,1,3, Basudev Biswal4

1 Interdisciplinary Program in Climate Studies, Indian Institute of Technology Bombay, Mumbai 400076, India
2 Aerosol and Nanoparticle Technology Laboratory, Environmental Science and Engineering Department, Indian Institute of Technology Bombay, Mumbai 400076, India
3 Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India
4 Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India


Received: November 7, 2022
Revised: March 26, 2023
Accepted: April 8, 2023

 Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.


DOI: https://doi.org/10.4209/aaqr.220386


Cite this article:

Kumar, V., Malyan, V., Sahu, M., Biswal, B. (2023). Machine Learning Classification Model to Label Sources Derived from Factor Analysis Receptor Models for Source Apportionment. Aerosol Air Qual. Res. https://doi.org/10.4209/aaqr.220386


HIGHLIGHTS

  • A machine learning model labels the factors derived from factor analysis receptor models.
  • Train and test scores of the model are 0.85 and 0.79, respectively.
  • Overall weighted average precision, recall and F1 score are 0.79.
  • The model shows acceptable performance during validation.
  • The approach reduces the time and subjectivity involved in assigning a factor to a source.
 

ABSTRACT


Factor analysis (FA) receptor models are widely used for source apportionment (SA) because they can extract source contributions and profiles from the data. However, source identification and labelling rely on manual interpretation, which is subjective and time-consuming. This is a barrier to the development of a real-time SA process. In this study, a machine learning (ML) classification algorithm, k-nearest neighbour (kNN), is applied to the source profiles obtained from the United States Environmental Protection Agency’s (US EPA) SPECIATE database to develop a model that can automatically label the factors derived from FA receptor models. The train and test scores of the model are 0.85 and 0.79, respectively. The overall weighted average precision, recall and F1 score are 0.79. The performance of the model during validation is acceptable. Applying ML models to source profile labelling will reduce the time taken and the subjectivity that modeler bias introduces into the results. The approach can also serve as an additional layer of verification for the results of FA receptor models. This methodology advances the process towards real-time SA.


Keywords: Particulate matter, Source apportionment, Receptor models, Machine learning, Classification
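
The labelling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the species list, the three source classes, the profile values, the noise level, the Euclidean distance and k = 5 are all assumptions chosen for illustration, and none of the numbers come from SPECIATE. A small hand-written kNN is used so the example needs only the Python standard library.

```python
import math
import random
from collections import Counter

# Hypothetical reference profiles (species mass fractions per source class).
# Illustrative values only; they are NOT taken from the SPECIATE database.
SPECIES = ["OC", "EC", "SO4", "NO3", "Si", "Fe"]
BASE_PROFILES = {
    "biomass_burning": [0.55, 0.10, 0.05, 0.05, 0.02, 0.01],
    "vehicle_exhaust": [0.30, 0.40, 0.05, 0.05, 0.02, 0.03],
    "crustal_dust":    [0.05, 0.02, 0.03, 0.02, 0.45, 0.20],
}

def normalize(profile):
    """Scale a profile so that its species fractions sum to 1."""
    total = sum(profile)
    return [x / total for x in profile]

def make_dataset(n_per_class, noise, rng):
    """Create labelled synthetic profiles by perturbing each base profile."""
    data = []
    for label, base in BASE_PROFILES.items():
        for _ in range(n_per_class):
            noisy = [max(x + rng.gauss(0.0, noise), 0.0) for x in base]
            data.append((normalize(noisy), label))
    return data

def knn_predict(train, query, k=5):
    """Return the majority label among the k nearest training profiles."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=5):
    """Fraction of test profiles whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)                      # fixed seed for repeatability
data = make_dataset(n_per_class=40, noise=0.03, rng=rng)
rng.shuffle(data)
split = int(0.8 * len(data))                # 80/20 train/test split (assumed)
train, test = data[:split], data[split:]

test_score = accuracy(train, test)

# A made-up factor profile, standing in for one factor from an FA receptor model.
factor = normalize([0.06, 0.03, 0.02, 0.02, 0.40, 0.22])
factor_label = knn_predict(train, factor)
print(f"test accuracy: {test_score:.2f}, factor labelled as: {factor_label}")
```

Because the synthetic classes here are well separated, the sketch scores far higher than the 0.79 test score reported in the abstract; real source profiles overlap much more, which is where weighted precision, recall and F1 become informative.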




