Evaluating machine learning model performance in a two-step colocation process for TVOC and BTEX sensor calibration

Caroline Frischmon; Jack Porter; Ethan Balagopalan; William Senga; Jill Johnston; Michael Hannigan

Journal Article, Open Access

Atmospheric Measurement Techniques (2026) 19(9) 2923-2939

DOI: 10.5194/amt-19-2923-2026

Abstract

Calibration of low-cost air quality sensors (LCSs) for total volatile organic compound (TVOC) and benzene, toluene, ethylbenzene, and xylenes (BTEX) quantification remains challenging due to the sensors' cross-sensitivity to temperature and humidity and their tendency to drift over time. In this study, we aimed to improve TVOC and BTEX metal oxide (Figaro TGS 2600, 2602, 2611) sensor calibration using a two-step colocation strategy. A two-step colocation places one LCS (the secondary standard) with a reference monitor while others operate in the field, then briefly colocates the field sensors with the secondary standard to address inter-sensor variability and drift. This strategy made it possible to develop the calibration model under environmental conditions closely matching those of the field, which is essential for model transferability from colocation to field conditions. In addition to TVOC and BTEX, we applied the two-step colocation process to NO2 electrochemical (Alphasense B-4) sensors to demonstrate the broader applicability of our approach beyond TVOC and BTEX quantification. Next, we compared the performance of multiple machine learning models, including ridge, lasso, random forest, gradient boosting, extreme gradient boosting, support vector regression, and linear regression, to investigate the optimal model choice for calibration. We found that no single model performed best across all pollutants. For example, gradient boosting excelled at capturing peak TVOC concentrations, while linear regression performed best for BTEX. Conversely, linear regression was the worst-performing model for NO2. Overall, the models showed satisfactory RMSE around 40–50 ppb for TVOC, 1.25–1.75 ppb for BTEX, and 4–6 ppb for NO2. However, all models also overestimated baseline concentrations and underestimated peaks. The severity of this bias depended on the reference concentration distribution, with the most severe peak underestimation occurring in the more heavily skewed TVOC and BTEX data. The systematic bias at baseline and peak concentrations was not evident in the overall mean bias error, which was near zero for all pollutants. This result underscores the need to evaluate model performance across the entire concentration distribution. Finally, we found that calibration performance was sensitive to the choice of training and testing data split. Future research could seek to optimize the training and testing split to ensure robust model transferability to field data.
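
The point about mean bias error hiding baseline and peak bias can be illustrated with a small sketch. This is not the authors' code or data: the right-skewed "reference" series is synthetic, the "calibrated" prediction is a hypothetical one that simply shrinks values toward the mean, and the helper names (`mean_bias_error`, `rmse`, `bias_by_quantile_bin`) are made up for illustration. It only shows how an overall mean bias near zero can coexist with overestimated baselines and underestimated peaks once bias is computed per reference-concentration bin.

```python
import random
import statistics


def mean_bias_error(pred, ref):
    """Average of (prediction - reference) over all samples."""
    return statistics.fmean(p - r for p, r in zip(pred, ref))


def rmse(pred, ref):
    """Root mean square error between predictions and reference."""
    return statistics.fmean((p - r) ** 2 for p, r in zip(pred, ref)) ** 0.5


def bias_by_quantile_bin(pred, ref, n_bins=4):
    """Mean bias inside equal-count bins of the reference, lowest bin first."""
    pairs = sorted(zip(ref, pred))  # sort samples by reference concentration
    size = len(pairs) // n_bins
    biases = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any remainder samples.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        biases.append(statistics.fmean(p - r for r, p in chunk))
    return biases


random.seed(0)

# Synthetic, right-skewed reference concentrations (illustrative only).
ref = [random.lognormvariate(3.0, 0.8) for _ in range(2000)]
ref_mean = statistics.fmean(ref)

# Hypothetical calibrated output that regresses toward the mean:
# low values are pulled up, high values are pulled down.
pred = [ref_mean + 0.6 * (r - ref_mean) for r in ref]

overall_mbe = mean_bias_error(pred, ref)
overall_rmse = rmse(pred, ref)
binned = bias_by_quantile_bin(pred, ref, n_bins=4)

print(f"overall MBE:  {overall_mbe:.3f}")
print(f"overall RMSE: {overall_rmse:.3f}")
for i, b in enumerate(binned, start=1):
    print(f"bias in reference quartile {i}: {b:+.3f}")
```

In this toy setup the overall mean bias is essentially zero by construction, while the lowest reference quartile shows positive bias and the highest shows negative bias, which is the pattern a single summary statistic would miss.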

Cite

Frischmon, C., Porter, J., Balagopalan, E., Senga, W., Johnston, J., & Hannigan, M. (2026). Evaluating machine learning model performance in a two-step colocation process for TVOC and BTEX sensor calibration. Atmospheric Measurement Techniques, 19(9), 2923–2939. https://doi.org/10.5194/amt-19-2923-2026
