Abstract
Data sets used in machine learning usually contain missing values and text-type data, and it is sometimes necessary to combine attributes in the data set. The data set must be cleaned and converted before a machine learning model can be built. This preparation is frequently a chain of steps, and carrying out the entire procedure by hand is time-consuming and inconvenient. This article examines the data pipeline and recommends that it be used to process all data. We automate the processing and use k-fold cross-validation to evaluate the performance of the model. Experiments demonstrate that the pipeline can lower the root mean square error of the regression prediction model and enhance prediction accuracy.
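The chain of steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the toy records, the column names, and the nearest-neighbour regressor used as a stand-in model are all assumptions made for illustration. The sketch fills missing values, combines two attributes, encodes a text column, and reports the root mean square error for each fold of a k-fold cross-validation, fitting the pipeline on the training fold only.

```python
import math
import statistics


class ImputeMedian:
    """Fill missing (None) values in one column with the training median."""
    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        self.median = statistics.median(
            r[self.col] for r in rows if r[self.col] is not None)
        return self

    def transform(self, rows):
        return [{**r, self.col: self.median if r[self.col] is None else r[self.col]}
                for r in rows]


class AddRatio:
    """Combine two numeric attributes into a new ratio attribute."""
    def __init__(self, new, num, den):
        self.new, self.num, self.den = new, num, den

    def fit(self, rows):
        return self

    def transform(self, rows):
        return [{**r, self.new: r[self.num] / r[self.den]} for r in rows]


class OneHot:
    """Replace a text column with 0/1 indicator columns."""
    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        self.categories = sorted({r[self.col] for r in rows})
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            new = {k: v for k, v in r.items() if k != self.col}
            for c in self.categories:
                new[f"{self.col}={c}"] = 1.0 if r[self.col] == c else 0.0
            out.append(new)
        return out


class Pipeline:
    """Apply the steps in order; each step is fitted on the previous step's output."""
    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return rows

    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows


def predict_1nn(train_X, train_y, x):
    """Stand-in regressor (assumption): target of the closest training row."""
    dists = [sum((a[k] - x[k]) ** 2 for k in x) for a in train_X]
    return train_y[dists.index(min(dists))]


def kfold_rmse(rows, targets, make_pipeline, k=3):
    """Return one RMSE per fold; the pipeline is refitted on each training fold."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(rows)) if i % k == fold]
        train_idx = [i for i in range(len(rows)) if i % k != fold]
        pipe = make_pipeline()
        train_X = pipe.fit_transform([rows[i] for i in train_idx])
        test_X = pipe.transform([rows[i] for i in test_idx])
        train_y = [targets[i] for i in train_idx]
        errors = [predict_1nn(train_X, train_y, x) - targets[i]
                  for x, i in zip(test_X, test_idx)]
        scores.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    return scores


# Hypothetical toy records; None marks a missing value.
ROWS = [
    {"rooms": 6.0, "households": 2.0, "region": "coast"},
    {"rooms": None, "households": 3.0, "region": "inland"},
    {"rooms": 8.0, "households": 2.0, "region": "coast"},
    {"rooms": 5.0, "households": 5.0, "region": "inland"},
    {"rooms": 9.0, "households": 3.0, "region": "coast"},
    {"rooms": 4.0, "households": 4.0, "region": "inland"},
]
TARGETS = [30.0, 18.0, 35.0, 12.0, 33.0, 11.0]


def make_pipeline():
    return Pipeline([
        ImputeMedian("rooms"),
        AddRatio("rooms_per_household", "rooms", "households"),
        OneHot("region"),
    ])


scores = kfold_rmse(ROWS, TARGETS, make_pipeline, k=3)
print("RMSE per fold:", scores)
print("mean RMSE:", sum(scores) / len(scores))
```

Refitting the pipeline inside each fold keeps statistics such as the median from leaking from the held-out rows into training, which is one reason to wrap the cleaning steps and the evaluation in a single automated procedure.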
Cite
Zhang, H., Zheng, G., Xu, J., & Yao, X. (2022). Research on the Construction and Realization of Data Pipeline in Machine Learning Regression Prediction. Mathematical Problems in Engineering, 2022. https://doi.org/10.1155/2022/7924335