Production data from the cheesemaking process are used to monitor milk fat and protein recoveries in cheese, cheese yield, and composition, and eventually to predict these parameters. Because these factors strongly affect cheese quality and plant profitability, reliable data are essential for analysis, modeling, and control of the process. This paper tested six methods for detecting erroneous data in industrial cheesemaking databases. The data analyzed came from 4 yr of stirred-curd Cheddar cheese production in an industrial cheesemaking facility, comprising over 10,000 vats. Single-vat outliers were detected using a simple statistical criterion of x̄ ± 3.6 SD on single-variable distributions, Fourier series modeling of seasonal variables (fat, protein, lactose, and total solids in milk, and protein in whey), and multivariate Mahalanobis outlier analysis. Outlier productions (each corresponding to several vats) were detected by applying the x̄ ± 3.6 SD criterion to variables derived from the fat mass balance, fat retention coefficient, and yield efficiency. Data treatment enabled the detection of outlier data and also pinpointed variables with low reliability (manually registered times). Single-variable and multivariable methods proved complementary, and the use of both types of methods is recommended when validating an existing database.
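As a rough illustration of why the single-variable and multivariate checks can complement each other, the following minimal sketch applies an x̄ ± 3.6 SD flag and a squared Mahalanobis distance to synthetic data. The function names, the synthetic "fat" values, and the two correlated columns are assumptions made for this example; they are not taken from the paper, which does not give its implementation or its Mahalanobis cutoff here.

```python
import numpy as np

def sd_outliers(x, k=3.6):
    """Flag values lying outside mean ± k·SD of a single variable."""
    x = np.asarray(x, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)
    return np.abs(x - mean) > k * sd

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    # Row-wise quadratic form diff_i^T · cov_inv · diff_i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)

# Hypothetical single variable (e.g. milk fat, %) with one gross error.
fat = rng.normal(3.6, 0.1, size=1000)
fat[10] = 5.0
fat_flags = sd_outliers(fat)

# Two strongly correlated hypothetical variables; row 42 has ordinary
# values in each column but breaks the correlation between them.
a = rng.normal(0.0, 1.0, size=1000)
b = a + rng.normal(0.0, 0.1, size=1000)
X = np.column_stack([a, b])
X[42] = [2.0, -2.0]

univariate_hit = sd_outliers(X[:, 0])[42] or sd_outliers(X[:, 1])[42]
d2 = mahalanobis_sq(X)
most_unusual_row = int(np.argmax(d2))
```

In this sketch the gross error in `fat` is caught by the single-variable criterion, whereas row 42 of `X` passes both single-variable checks and only stands out through its Mahalanobis distance. A real application would still need a justified distance cutoff, which is left open here.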
Jimenez-Marquez, S. A., Lacroix, C., & Thibault, J. (2002). Statistical data validation methods for large cheese plant database. Journal of Dairy Science, 85(9), 2081–2097. https://doi.org/10.3168/jds.S0022-0302(02)74286-0