Vikas Kumar1, Vasudev Malyan2, Manoranjan Sahu2,1,3, Basudev Biswal4
1 Interdisciplinary Program in Climate Studies, Indian Institute of Technology Bombay, Mumbai 400076, India
2 Aerosol and Nanoparticle Technology Laboratory, Environmental Science and Engineering Department, Indian Institute of Technology Bombay, Mumbai 400076, India
3 Centre for Machine Intelligence and Data Science, Indian Institute of Technology Bombay, Mumbai 400076, India
4 Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai 400076, India
Received:
November 7, 2022
Revised:
March 26, 2023
Accepted:
April 8, 2023
Copyright The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are cited.
Cite this article:
Kumar, V., Malyan, V., Sahu, M., Biswal, B. (2023). Machine Learning Classification Model to Label Sources Derived from Factor Analysis Receptor Models for Source Apportionment. Aerosol Air Qual. Res. https://doi.org/10.4209/aaqr.220386
ABSTRACT
Factor analysis (FA) receptor models are widely used for source apportionment (SA) because they can extract source contributions and profiles from the data. However, source identification and labelling rely on manual interpretation, which is subjective and time-consuming. This is a barrier to the development of a real-time SA process. In this study, a machine learning (ML) classification algorithm, k-nearest neighbour (kNN), is applied to the source profiles obtained from the United States Environmental Protection Agency’s (US EPA) SPECIATE database to develop a model that can automatically label the factors derived from FA receptor models. The training and test scores of the model are 0.85 and 0.79, respectively. The overall weighted-average precision, recall and F1 score are 0.79. The model performs acceptably during validation. Applying ML models to source profile labelling will reduce both the time taken and the subjectivity introduced by modeler bias. It can also serve as an additional layer of verification for the results of FA receptor models. This methodology is a step towards real-time SA.
Keywords:
Particulate matter, Source apportionment, Receptor models, Machine learning, Classification
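The labelling step summarised in the abstract can be illustrated with a minimal sketch of kNN classification, in which a factor profile receives the majority label of its nearest reference profiles. This is not the authors' implementation: the four-species profiles, the source labels, the sum-to-one normalisation, the Euclidean distance and the choice of k = 3 are all assumptions made for illustration, and the numbers are invented rather than taken from SPECIATE.

```python
import math
from collections import Counter

# Hypothetical reference profiles: mass fractions of four illustrative
# species. Values and labels are invented for this sketch and are NOT
# taken from the SPECIATE database.
REFERENCE_PROFILES = [
    ([0.45, 0.30, 0.05, 0.02], "vehicular"),
    ([0.50, 0.25, 0.04, 0.03], "vehicular"),
    ([0.40, 0.35, 0.06, 0.01], "vehicular"),
    ([0.60, 0.10, 0.03, 0.02], "biomass burning"),
    ([0.65, 0.08, 0.02, 0.03], "biomass burning"),
    ([0.55, 0.12, 0.04, 0.02], "biomass burning"),
    ([0.05, 0.01, 0.02, 0.30], "dust"),
    ([0.04, 0.02, 0.03, 0.35], "dust"),
    ([0.06, 0.01, 0.01, 0.28], "dust"),
]


def normalise(profile):
    """Scale a profile so its species fractions sum to one (assumed step)."""
    total = sum(profile)
    return [value / total for value in profile]


def knn_label(factor_profile, references, k=3):
    """Return the majority label among the k nearest reference profiles."""
    query = normalise(factor_profile)
    # Euclidean distance from the query to every normalised reference profile.
    distances = sorted(
        (math.dist(query, normalise(ref)), label) for ref, label in references
    )
    # Majority vote over the labels of the k closest references.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Calling `knn_label([0.05, 0.015, 0.02, 0.32], REFERENCE_PROFILES)` on this toy data returns the label of the silicon-rich group, since all three nearest neighbours belong to it. A real application would use many more species and profiles, and would tune k and the distance metric on held-out data.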