Studying Linguistic Changes over 200 Years of Newspapers through Resilient Words Analysis

  • Buntinx V
  • Bornet C
  • Kaplan F

Abstract

This paper presents a methodology for analyzing linguistic changes in a given textual corpus that overcomes two common problems in corpus linguistics studies. One is the monotonic increase of corpus size over time; the other is the presence of noise in the textual data. In addition, our method better targets the linguistic evolution of the corpus, rather than other aspects such as noise fluctuation or topic evolution. A corpus formed by two newspapers, "La Gazette de Lausanne" and "Le Journal de Genève", is used, providing 4 million articles from 200 years of archives. We first perform some classical measurements on this corpus in order to provide indicators and visualizations of linguistic evolution. We then define the concepts of a lexical kernel and word resilience to address the two challenges of noise and corpus size fluctuations. The paper ends with a discussion comparing the results of the linguistic change analyses and concludes with possible future work in that direction.

Cite

Buntinx, V., Bornet, C., & Kaplan, F. (2017). Studying Linguistic Changes over 200 Years of Newspapers through Resilient Words Analysis. Frontiers in Digital Humanities, 4. https://doi.org/10.3389/fdigh.2017.00002
