A value-based approach for training of classifiers with high-throughput small molecule screening data

Natalia Khuri; Sarah Parsons

Conference Proceedings (Open Access)

Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021 (2021)

DOI: 10.1145/3459930.3469514

Abstract

In many practical applications of machine learning, models are built from experimental data that are noisy, biased, and of low quality. Binary classifiers trained on such data perform poorly in independent and prospective tests. This work builds upon techniques for estimating the value of training data and evaluates a batch-based data valuation. Comparative experiments on seven challenging benchmarks demonstrate that value-based training of classifiers can improve classification performance by 10% to 25% in independent tests. Additionally, between 97% and 100% of class labels can be detected among low-valued training samples. Finally, the results show that simpler and faster learning methods, such as generalized linear models, perform as well as complex gradient boosting trees when the training data comprise only the high-valued samples extracted from high-throughput small molecule screens.

Cite (APA)

Khuri, N., & Parsons, S. (2021). A value-based approach for training of classifiers with high-throughput small molecule screening data. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021. Association for Computing Machinery, Inc. https://doi.org/10.1145/3459930.3469514
