Using LDA and Time Series Analysis for Timestamping Documents

Costin-Gabriel Chiru; Bishnu Sarker

Book Chapter

DOI: 10.1007/978-3-319-55789-2_4

Abstract

Identifying when a book was published is an important problem: solving it might help with authorship identification and could also shed some light on the realities of human society during different periods of time. In this paper, we present an attempt to estimate the publication date of books based on a time series analysis of their content. The main assumption of this experiment is that the subject of a book is often specific to a time period. Therefore, it should be possible to use topic modeling to learn a model that can timestamp books, given many training books from similar periods of time. To validate this assumption, we built a corpus of 10,000 books and used LDA to extract the topics from them. Then, we extracted the time series of particular terms from each topic using the Google Books N-gram Corpus. By heuristically combining the words’ time series and the topics from a document, we built that document’s time series. Finally, we applied peak detection algorithms to timestamp the document.
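The last two steps of the abstract (combining term time series into a document time series, then detecting its peak) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the combination heuristic or the peak detection algorithm, so the sketch uses a weighted sum (document-topic weight times topic-term weight) and a smoothed maximum. The names `document_series`, `moving_average`, and `peak_year` are hypothetical, and the toy frequencies are invented rather than taken from the Google Books N-gram Corpus.

```python
from typing import Dict, List

# Toy inputs, invented for illustration only (not real n-gram frequencies).
years: List[int] = [1800, 1850, 1900, 1950, 2000]
term_series: Dict[str, List[float]] = {
    "telegraph": [0.0, 0.4, 1.0, 0.3, 0.1],
    "railway": [0.0, 0.6, 0.9, 0.5, 0.2],
    "internet": [0.0, 0.0, 0.0, 0.1, 1.0],
}
# Topic-term weights, standing in for the output of an LDA model.
topic_terms: Dict[str, Dict[str, float]] = {
    "transport": {"telegraph": 0.5, "railway": 0.5},
    "computing": {"internet": 1.0},
}


def document_series(
    doc_topics: Dict[str, float],
    topic_terms: Dict[str, Dict[str, float]],
    term_series: Dict[str, List[float]],
) -> List[float]:
    """Assumed heuristic: sum term series weighted by
    (document-topic weight) * (topic-term weight)."""
    length = len(next(iter(term_series.values())))
    combined = [0.0] * length
    for topic, topic_weight in doc_topics.items():
        for term, term_weight in topic_terms[topic].items():
            series = term_series.get(term)
            if series is None:
                continue  # term has no time series available
            for i, value in enumerate(series):
                combined[i] += topic_weight * term_weight * value
    return combined


def moving_average(series: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def peak_year(series: List[float], years: List[int], window: int = 3) -> int:
    """Assumed peak detector: year of the maximum of the smoothed series."""
    smoothed = moving_average(series, window)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return years[best]


# A document dominated by the hypothetical "transport" topic.
old_doc = {"transport": 0.9, "computing": 0.1}
estimate = peak_year(document_series(old_doc, topic_terms, term_series), years)
print(estimate)
```

In a fuller pipeline, the topic weights would come from a fitted LDA model and the term series from n-gram counts normalized per year; the smoothed maximum could be swapped for any other peak detector without changing the rest of the sketch.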

Cite

Chiru, C.-G., & Sarker, B. (2017). Using LDA and Time Series Analysis for Timestamping Documents (pp. 49–61). https://doi.org/10.1007/978-3-319-55789-2_4
