Automatic Dating of Documents and Temporal Text Classification

Angelo Dalli; Yorick Wilks

Conference Proceedings

COLING ACL 2006 - ARTE Annotating and Reasoning about Time and Events, Proceedings of the Workshop (2006) 17-22

DOI: 10.3115/1629235.1629238

Abstract

When analysed as a time series, the frequency of occurrence of words in natural languages exhibits both a periodic and a non-periodic component. This work presents an unsupervised method for extracting periodicity information from text, so that time series creation and filtering can be used to build language models that distinguish repetitive trends from non-repetitive writing patterns. The algorithm runs in O(n log n) time for input of length n. The temporal language model is used to create rules based on temporal-word associations inferred from the time series. These rules are then used to guess likely document creation dates automatically, on the assumption that natural languages have unique signatures of changing word distributions over time. Experimental results on news items spanning a nine-year period show that the proposed method and algorithms are accurate both in discovering periodicity patterns and in dating documents solely from their content.
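
The abstract does not say which technique extracts the periodicity, so the following is only an illustrative sketch, not the authors' algorithm. It assumes that a word's counts per time unit (for example, per day) are already available as a list, and it estimates the dominant period from the autocorrelation of that series. The autocorrelation is computed with a fast Fourier transform, which is one common way to reach the O(n log n) running time quoted above. The function names `fft`, `autocorrelation` and `dominant_period`, and the `min_lag` parameter, are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def autocorrelation(series):
    """Normalised autocorrelation of a count series, in O(n log n) time.

    Assumption: uses the Wiener-Khinchin relation (inverse transform of
    the power spectrum); the paper may compute periodicity differently.
    """
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    # Zero-pad to a power of two >= 2n so that lags do not wrap around.
    size = 1
    while size < 2 * n:
        size *= 2
    padded = [complex(v) for v in centred] + [0j] * (size - n)
    power = [complex(abs(c) ** 2) for c in fft(padded)]
    raw = fft(power, inverse=True)
    acf = [raw[k].real for k in range(n)]
    if acf[0] <= 0:  # constant series: no variance, no periodicity
        return [0.0] * n
    return [v / acf[0] for v in acf]  # scale so that lag 0 equals 1


def dominant_period(series, min_lag=2):
    """Return the lag (in time units) with the highest autocorrelation."""
    acf = autocorrelation(series)
    lags = range(min_lag, len(series) // 2 + 1)
    return max(lags, key=lambda k: acf[k])


# Toy data: a word that appears five times every seventh day for 8 weeks.
weekly_word = [0, 0, 0, 0, 0, 0, 5] * 8
print(dominant_period(weekly_word))  # 7
```

In this toy series the strongest autocorrelation falls at a lag of seven days, so the word would be treated as having a weekly cycle. What remains after such a periodic component is removed corresponds to the non-periodic component that the abstract mentions.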

Cite

Dalli, A., & Wilks, Y. (2006). Automatic Dating of Documents and Temporal Text Classification. In COLING ACL 2006 - ARTE Annotating and Reasoning about Time and Events, Proceedings of the Workshop (pp. 17–22). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1629235.1629238
