Using LDA and Time Series Analysis for Timestamping Documents

  • Chiru C
  • Sarker B

Abstract

Identifying when a book was published is an important problem: it may help with authorship identification and could also shed light on the realities of human society in different periods. In this paper, we present an attempt to estimate the publication date of books based on time series analysis of their content. The main assumption of this experiment is that the subject of a book is often specific to a time period. It should therefore be possible to use topic modeling to learn a model that can timestamp books, given a training set of many books from similar periods. To validate this assumption, we built a corpus of 10 thousand books and used LDA to extract topics from them. We then extracted the time series of particular terms from each topic using the Google Books N-gram Corpus. By heuristically combining the words’ time series with the topics of a document, we built that document’s time series. Finally, we applied peak detection algorithms to timestamp the document.
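
The last two steps of the pipeline described above (combining term time series into a document time series, then detecting its peak) can be illustrated with a minimal Python sketch. The abstract does not specify the combination heuristic or the peak detection algorithm, so everything here is an assumption made for illustration: the weighted sum, the use of the global maximum as the simplest possible peak detector, the function names `document_time_series` and `peak_year`, and all numbers, which are toy values and not data from the Google Books N-gram Corpus or from the paper.

```python
from typing import Dict, List

def document_time_series(
    topic_weights: Dict[str, float],
    topic_terms: Dict[str, Dict[str, float]],
    term_series: Dict[str, List[float]],
) -> List[float]:
    """Combine term time series into one series for a document.

    Assumed heuristic (not taken from the paper): each term's series is
    weighted by (topic weight in the document) * (term weight in the topic)
    and the weighted series are summed.
    """
    length = len(next(iter(term_series.values())))
    combined = [0.0] * length
    for topic, topic_weight in topic_weights.items():
        for term, term_weight in topic_terms[topic].items():
            series = term_series.get(term)
            if series is None:
                continue  # no time series available for this term
            for i, value in enumerate(series):
                combined[i] += topic_weight * term_weight * value
    return combined

def peak_year(years: List[int], series: List[float]) -> int:
    """Return the year at the global maximum (a stand-in for peak detection)."""
    best_index = max(range(len(series)), key=series.__getitem__)
    return years[best_index]

# Toy inputs, invented for illustration only.
years = [1800, 1850, 1900, 1950, 2000]
term_series = {
    "telegraph": [0.0, 0.4, 1.0, 0.3, 0.1],
    "railway": [0.0, 0.6, 0.9, 0.5, 0.2],
    "internet": [0.0, 0.0, 0.0, 0.1, 1.0],
}
# Hypothetical topics (term -> weight), as an LDA model might produce.
topic_terms = {
    "transport": {"telegraph": 0.5, "railway": 0.5},
    "computing": {"internet": 1.0},
}
# Hypothetical topic mixture of one document.
topic_weights = {"transport": 0.8, "computing": 0.2}

doc_series = document_time_series(topic_weights, topic_terms, term_series)
estimated_year = peak_year(years, doc_series)
print(estimated_year)
```

With these toy inputs the document is dominated by the hypothetical "transport" topic, so the estimate lands where the transport terms peak. A real pipeline would likely smooth the series and consider several local peaks instead of a single global maximum.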

Citation (APA)

Chiru, C.-G., & Sarker, B. (2017). Using LDA and Time Series Analysis for Timestamping Documents (pp. 49–61). https://doi.org/10.1007/978-3-319-55789-2_4
