Comparison of supervised learning statistical methods for classifying commercial beers and identifying patterns

Abstract

In this study, 13 properties (alcohol, real extract, flavonoid, anthocyanin, glucose, fructose, maltose, and sucrose content; EBC [European Brewery Convention] and L*a*b* color; bitterness) of 21 beers (alcohol-free pale lagers, alcohol-free beer-based mixed drinks, beer-based mixed drinks, international lagers, wheat beers, stouts, fruit beers) were determined. In the first step, multiple factor analysis (MFA) was performed on the whole data set and five clusters (target classes) were determined; bootstrapping was then applied to build a balanced data set in which every cluster contains 100 samples, for a total sample size of 500. In the second step, 12 supervised learning algorithms (random trees [RND], Quinlan's C4.5 decision tree algorithm [C4.5], Iterative Dichotomiser 3 algorithm [ID3], cost-sensitive decision tree algorithm [CSMC4], cost-sensitive classification tree [CSCRT], k-nearest neighbors algorithm [KNN], radial basis function [RBF], multilayer perceptron neural network [MLP], prototype nearest neighbor [PNN], linear discriminant analysis [LDA], naïve Bayes with continuous variables [NBC], partial least squares discriminant analysis [PLS-DA]) were applied to classify each brand into the target classes. Furthermore, several error rates were calculated: re-substitution error rate (RER), cross-validated error rate (CV), bootstrap error rate (BOOT), leave-one-out error rate (LOO), and train-test error rate (TRAIN). The MFA could discriminate five groups, which can be characterized by some analytical parameters, and the other multivariate methods performed similarly. The methods can be distinguished best on the basis of BOOT, CV, and LOO. The best estimation methods are C4.5, CSMC4, and CSCRT; these performed best for flavonoid content and EBC color. NBC was identified as the method most sensitive to the properties. The classification ability fluctuated greatly for three properties (glucose, maltose, sucrose). With the NBC method, a remarkable fluctuation was observed for the L*a*b* color parameters, flavonoid content, EBC color, and bitterness.
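
The resampling and error-rate estimates named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the data are synthetic (three hypothetical classes "A", "B", "C" with two made-up features), the classifier is a plain k-nearest-neighbors vote with k = 3, and the fold count, number of bootstrap rounds, test fraction, and the out-of-bag form of the bootstrap error are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random
from collections import Counter

def balanced_bootstrap(X, y, per_class, rng):
    """Draw per_class rows with replacement from every class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    new_X, new_y = [], []
    for label in sorted(by_class):
        for _ in range(per_class):
            new_X.append(rng.choice(by_class[label]))
            new_y.append(label)
    return new_X, new_y

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(train_X)
    )[:k]
    return Counter(train_y[i] for _, i in nearest).most_common(1)[0][0]

def error_rate(X, y, train, test, k):
    """Share of test indices misclassified by a model built on train indices."""
    train_X = [X[i] for i in train]
    train_y = [y[i] for i in train]
    wrong = sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in test)
    return wrong / len(test)

def resubstitution(X, y, k):
    """RER: evaluate on the same rows used for training."""
    idx = list(range(len(X)))
    return error_rate(X, y, idx, idx, k)

def cross_validation(X, y, k, rng, folds=5):
    """CV: average error over disjoint held-out folds."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    errors = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        errors.append(error_rate(X, y, train, test, k))
    return sum(errors) / folds

def leave_one_out(X, y, k):
    """LOO: hold out one row at a time."""
    n = len(X)
    wrong = 0.0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        wrong += error_rate(X, y, train, [i], k)
    return wrong / n

def bootstrap_error(X, y, k, rng, rounds=20):
    """BOOT (out-of-bag variant): train on a resample, test on rows not drawn."""
    n = len(X)
    errors = []
    for _ in range(rounds):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        test = [i for i in range(n) if i not in drawn]
        if test:
            errors.append(error_rate(X, y, train, test, k))
    return sum(errors) / len(errors)

def train_test(X, y, k, rng, test_fraction=0.3):
    """TRAIN: a single random split into training and test rows."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * test_fraction)
    return error_rate(X, y, idx[cut:], idx[:cut], k)

# Synthetic, imbalanced toy data (hypothetical; not the beer measurements).
rng = random.Random(0)
centers = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 3.0)}
sizes = {"A": 4, "B": 6, "C": 11}
raw_X, raw_y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(sizes[label]):
        raw_X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        raw_y.append(label)

X, y = balanced_bootstrap(raw_X, raw_y, per_class=20, rng=rng)
estimates = {
    "RER": resubstitution(X, y, k=3),
    "CV": cross_validation(X, y, k=3, rng=rng),
    "LOO": leave_one_out(X, y, k=3),
    "BOOT": bootstrap_error(X, y, k=3, rng=rng),
    "TRAIN": train_test(X, y, k=3, rng=rng),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this sketch the balanced set is built by sampling with replacement, so copies of the same raw row can land on both sides of a split; the held-out estimates for the toy data are therefore likely optimistic and are meant only to show how the five quantities differ in construction.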

Cite

Koren, D., Lőrincz, L., Kovács, S., Kun-Farkas, G., Vecseriné Hegyes, B., & Sipos, L. (2020). Comparison of supervised learning statistical methods for classifying commercial beers and identifying patterns. Journal of Chemometrics, 34(4). https://doi.org/10.1002/cem.3216
