Abstract
Numerical experiments are an essential part of academic research in transportation management. Using an appropriate sample size for these experiments can save both data collection costs and computing time. However, few studies have paid attention to how the sample size should be determined. In this research, we use four typical machine learning regression models and a dataset collected from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 balances model performance against the cost of data collection. Our study can serve as a reference when deciding, in advance, how large a sample to collect.
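The abstract does not specify the four regression models, the dataset, or the exact rule used to read the learning curves. The following is therefore only a minimal sketch of how a learning curve can guide the choice of sample size, under explicit assumptions: synthetic data from a hypothetical linear relationship (`make_data`), ordinary least squares standing in for the authors' models, and a hypothetical rule (`smallest_adequate_size`) that picks the smallest training size whose test error is within 5% of the error at the largest size tried. None of these names or settings come from the paper.

```python
import random


def make_data(n, rng):
    """Hypothetical synthetic data (an assumption, not the paper's dataset):
    y = 3x + 2 plus Gaussian noise with standard deviation 2."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 2.0) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on a dataset."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(sizes, seed=0, test_size=1000, repeats=20):
    """Average held-out MSE for each training-set size, over several random draws."""
    rng = random.Random(seed)
    test_x, test_y = make_data(test_size, rng)  # fixed held-out test set
    curve = {}
    for n in sizes:
        errs = []
        for _ in range(repeats):
            train_x, train_y = make_data(n, rng)
            errs.append(mse(fit_ols(train_x, train_y), test_x, test_y))
        curve[n] = sum(errs) / repeats
    return curve


def smallest_adequate_size(curve, tolerance=0.05):
    """Hypothetical selection rule: the smallest size whose error is within
    `tolerance` (relative) of the error at the largest size tried."""
    reference = curve[max(curve)]
    for n in sorted(curve):
        if curve[n] <= reference * (1 + tolerance):
            return n
    return max(curve)


if __name__ == "__main__":
    sizes = [10, 25, 50, 100, 250, 500, 1000]
    curve = learning_curve(sizes)
    for n in sizes:
        print(f"n={n:5d}  test MSE={curve[n]:.3f}")
    print("smallest adequate size:", smallest_adequate_size(curve))
```

The tolerance encodes the trade-off the abstract describes: a looser tolerance favours cheaper data collection, while a tighter one favours model performance. With a real dataset, the same loop would subsample the collected records instead of drawing synthetic ones.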
Citation
Wang, H., Yi, W., & Liu, Y. (2022). An innovative approach of determining the sample data size for machine learning models: a case study on health and safety management for infrastructure workers. Electronic Research Archive, 30(9), 3452–3462. https://doi.org/10.3934/era.2022176