An innovative approach of determining the sample data size for machine learning models: a case study on health and safety management for infrastructure workers


Abstract

Numerical experiments are an essential part of academic studies in transportation management. Using an appropriate sample size for these experiments can reduce both data collection costs and computing time. However, few studies have examined how to determine the sample size. In this research, we use four typical machine learning regression models and a dataset from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 balances model performance against the cost of data collection. Our study can serve as a reference for deciding in advance how much data to collect.
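The learning-curve idea in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not name the four regression models, the error metric, or the dataset fields, so the example assumes a single one-feature least-squares model, synthetic data, and mean squared error. The helper names (`fit_line`, `learning_curve`, `smallest_adequate_size`) and the 1% tolerance are hypothetical choices made only for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    # Synthetic stand-in data (assumption): y = 2x + 1 + Gaussian noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys

def learning_curve(sizes, n_test=500, repeats=20, seed=0):
    """Average test MSE for each training size, over several random draws."""
    rng = random.Random(seed)
    test_x, test_y = make_data(n_test, rng)
    curve = []
    for n in sizes:
        errors = []
        for _ in range(repeats):
            train_x, train_y = make_data(n, rng)
            errors.append(mse(fit_line(train_x, train_y), test_x, test_y))
        curve.append((n, sum(errors) / repeats))
    return curve

def smallest_adequate_size(curve, tol=0.01):
    """First size whose error is within a relative tol of the best error on the curve."""
    best = min(err for _, err in curve)
    for n, err in curve:
        if err <= best * (1 + tol):
            return n
    return curve[-1][0]

sizes = [10, 25, 50, 100, 250, 500]
curve = learning_curve(sizes)
for n, err in curve:
    print(f"n={n:4d}  test MSE={err:.4f}")
print("smallest adequate size:", smallest_adequate_size(curve))
```

The size this sketch reports depends entirely on the synthetic data and the tolerance, so it says nothing about the paper's figure of 250. It only shows the mechanism: error falls quickly at small sample sizes and then flattens, and the point where extra data stops paying off is read from the curve.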

Cite

Wang, H., Yi, W., & Liu, Y. (2022). An innovative approach of determining the sample data size for machine learning models: a case study on health and safety management for infrastructure workers. Electronic Research Archive, 30(9), 3452–3462. https://doi.org/10.3934/era.2022176
