Neural networks can be used to fit complex models to high-dimensional data. High dimensionality often obscures the fact that a model overfits the data, and it arises frequently in the publishing industry because we are usually interested in a large number of concepts; for example, a moderate thesaurus contains thousands of concepts. In addition, the discovery of ideas, sentiments, tendencies, and context requires that our modelling algorithms be aware of many different features, such as the words themselves, sentence and paragraph lengths, word frequency counts, phrases, punctuation, number of references, and links. Overfitting can be counterbalanced by regularization, but regularization can also cause problems of its own. This paper attempts to clarify the concepts of 'overfitting' and 'regularization' using two-dimensional graphs that demonstrate overfitting and show how regularization can force a smoother fit to noisy data.
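The effect the abstract describes can be sketched in a few lines: fit a high-degree polynomial to noisy two-dimensional data, once without a penalty (which chases the noise) and once with an L2 (ridge) penalty (which forces a smoother fit). This is a minimal illustration of the general idea, not the paper's own code; the sine-plus-noise data, polynomial degree, and penalty strength are all assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy 2-D data: samples of a smooth curve plus Gaussian noise.
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

def fit_poly(x, y, degree, lam=0.0):
    """Polynomial least squares with an optional L2 (ridge) penalty lam."""
    A = np.vander(x, degree + 1)  # design matrix of monomials
    # Closed-form ridge solution: (A^T A + lam * I)^(-1) A^T y
    return np.linalg.solve(A.T @ A + lam * np.eye(degree + 1), A.T @ y)

# A high-degree fit with no penalty overfits: large, oscillating coefficients.
w_over = fit_poly(x, y, degree=9, lam=0.0)
# The same degree with a ridge penalty is pulled toward a smoother curve.
w_reg = fit_poly(x, y, degree=9, lam=1e-3)

# Regularization shrinks the coefficient norm, i.e. damps the oscillations.
print(np.linalg.norm(w_over), np.linalg.norm(w_reg))
```

Plotting both fitted curves against the noisy points reproduces the kind of two-dimensional graph the paper uses: the unpenalized curve wiggles through the noise, while the penalized one tracks the underlying trend.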
CITATION STYLE
Vasicek, D. (2020). Artificial intelligence and machine learning: Practical aspects of overfitting and regularization. Information Services and Use, 39(4), 281–289. https://doi.org/10.3233/ISU-190059