Identification of lost or deserted written texts using Zipf's law with NLTK

Devanshi Gupta; Priyank Singh Hada; Deepankar Mitra; Niket Sharma

Conference Proceedings

Smart Innovation, Systems and Technologies (2014) 27(VOL 1) 511-518

DOI: 10.1007/978-3-319-07353-8_59

Abstract

It is sometimes very difficult to identify valuable texts written by great personalities, especially when a text is unsigned or its author is anonymous. Deserted manuscripts or documents without a title or heading add to the difficulty. The works of dignitaries may be lost, or only part of a valuable work may survive in libraries or on other storage media. Applying Zipf's law with the NLTK module available in Python can solve this problem to a great extent, helping to preserve the originality of valuable texts and keeping them from going unidentified. The approach can also help in real-time data analysis where frequency plays an important role; plagiarism detection in written texts is one such example. NLTK is a powerful toolkit that supports extracting, segmenting, parsing, tagging, and searching text in many natural languages through Python modules. This paper attempts to combine Zipf's law with NLTK to produce a tool for identifying anonymous or deserted valuable texts. © Springer International Publishing Switzerland 2014.
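
The abstract does not describe how the tool is built, so the following is only an illustrative sketch of the general idea, not the authors' method. It builds a rank-frequency table with NLTK's `FreqDist`, estimates the Zipf slope by a least-squares fit of log frequency against log rank, and compares two texts by the overlap of their most frequent words. The function names, the Jaccard overlap score, and the cutoff `k` are assumptions introduced here for illustration. The tokenizer used needs no corpus download, and a standard-library fallback is included in case NLTK is not installed.

```python
import math
import re
from collections import Counter

try:
    # FreqDist and the regex-based wordpunct_tokenize need no corpus downloads.
    from nltk import FreqDist
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback so the sketch still runs without NLTK (assumption: same behavior
    # is close enough for illustration).
    FreqDist = Counter

    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)


def rank_frequency(text):
    """Return [(word, count), ...] sorted by descending count."""
    words = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]
    return FreqDist(words).most_common()


def zipf_slope(ranked):
    """Least-squares slope of log(count) against log(rank).

    Zipf's law predicts a slope close to -1 for natural-language text.
    """
    if len(ranked) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def top_word_overlap(text_a, text_b, k=20):
    """Hypothetical similarity score: Jaccard overlap of the k most frequent words."""
    top_a = {w for w, _ in rank_frequency(text_a)[:k]}
    top_b = {w for w, _ in rank_frequency(text_b)[:k]}
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0
```

In such a sketch, an unattributed text would be scored against samples from each candidate author with `top_word_overlap`, and `zipf_slope(rank_frequency(text))` would serve as a rough check that the text follows a Zipf-like distribution at all. Short texts give unreliable slopes, so any real use would need far larger samples than a single paragraph.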

Cite

Gupta, D., Hada, P. S., Mitra, D., & Sharma, N. (2014). Identification of lost or deserted written texts using Zipf’s law with NLTK. In Smart Innovation, Systems and Technologies (Vol. 27, pp. 511–518). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-07353-8_59
