Abstract
This contribution compares statistical and deep learning approaches to textual data. Key passages are extracted with both statistics and deep learning using the Hyperbase software, and the underlying calculations are evaluated on examples from two languages, French and Latin. Our hypothesis is that deep learning is sensitive not only to word frequency but also to more complex phenomena involving linguistic features that pose problems for statistical approaches. These linguistic patterns, also known as motives (Mellet and Longrée, Belg J Linguist 23:161–173, 2009 [9]), are essential for highlighting key passages. If confirmed, this hypothesis would give us a better understanding of the deep learning black box and would open new ways of understanding and interpreting texts. This paper therefore introduces a novel approach to exploring the hidden layers of a convolutional neural network, with the aim of explaining which linguistic features the network relies on to perform the classification task. This explanation attempt is the major contribution of this work. Finally, to show the potential of our deep learning approach, we test it on the two corpora (French and Latin) and compare the linguistic features it identifies with those highlighted by a standard text-mining technique (z-score computation).
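The statistical baseline mentioned in the abstract, z-score computation, measures how far a word's observed frequency in one part of a corpus departs from the frequency expected given the whole corpus. The sketch below uses one common formulation, a binomial approximation: with K occurrences of a word in a corpus of N tokens, a part of n tokens is expected to contain n·K/N occurrences, with standard deviation sqrt(n·p·(1 − p)) where p = K/N. This is an assumption for illustration only; the exact formula implemented in Hyperbase may differ, and the function names `z_score` and `word_z_scores` are hypothetical.

```python
import math
from collections import Counter


def z_score(k: int, n: int, K: int, N: int) -> float:
    """Z-score of a word in a sub-corpus (binomial approximation, assumed formulation).

    k: occurrences of the word in the sub-corpus (e.g. one text or author)
    n: total number of tokens in the sub-corpus
    K: occurrences of the word in the whole corpus
    N: total number of tokens in the whole corpus
    """
    p = K / N                           # relative frequency of the word in the corpus
    expected = n * p                    # expected count in the sub-corpus
    std = math.sqrt(n * p * (1 - p))    # binomial standard deviation
    return (k - expected) / std


def word_z_scores(sub_tokens: list[str], corpus_tokens: list[str]) -> dict[str, float]:
    """Hypothetical helper: z-score of every word of a sub-corpus against the corpus.

    Assumes the sub-corpus is part of the corpus, so every word it contains
    has K >= 1, and that no word makes up the entire corpus (p < 1).
    """
    sub_counts = Counter(sub_tokens)
    corpus_counts = Counter(corpus_tokens)
    n, N = len(sub_tokens), len(corpus_tokens)
    return {
        word: z_score(k, n, corpus_counts[word], N)
        for word, k in sub_counts.items()
    }


if __name__ == "__main__":
    # A word seen 30 times in 1,000 tokens, but 100 times in 10,000 tokens overall,
    # is strongly over-represented (large positive z-score).
    print(round(z_score(30, 1000, 100, 10000), 3))
```

A positive z-score marks a word as over-represented in the sub-corpus and a negative one as under-represented; passages dense in high-scoring words are natural candidates for key passages under this statistical view, which is the kind of frequency-based signal the abstract contrasts with the features learned by the convolutional network.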
Vanni, L., Corneli, M., Longrée, D., Mayaffre, D., & Precioso, F. (2020). Key Passages: From Statistics to Deep Learning. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 41–53). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-52680-1_4