Identification of lost or deserted written texts using Zipf's law with NLTK

Abstract

Identifying valuable texts written by notable figures can be very difficult, especially when a text is unsigned or its author is anonymous. Deserted manuscripts, or documents without a title or heading, make the task harder still. The work of distinguished authors may be lost, or only part of it may survive in libraries or other storage media. By applying Zipf's law with the NLTK module available in Python, this problem can be solved to a great extent, helping preserve the originality of valuable texts so that they do not remain unidentified. The approach can also help in real-time data analysis where word frequency plays an important role; plagiarism detection in written texts is one such example. NLTK is a powerful toolkit of Python modules for extracting, segmenting, parsing, tagging, and searching text in many natural languages. This paper combines Zipf's law with NLTK to build a tool for identifying anonymous or deserted valuable texts. © Springer International Publishing Switzerland 2014.
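The abstract does not describe the tool's internals, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two steps: (1) rank words by frequency and fit Zipf's law, f(r) ∝ 1/r^s, by least squares on log f = log C − s·log r; (2) compare an unidentified text with a known author's text using the relative frequencies of that author's most common words (cosine similarity, a swapped-in choice). The helper names `tokenize`, `zipf_exponent`, and `profile_similarity`, and the `top_k=50` default, are hypothetical. The standard library's `collections.Counter` stands in for `nltk.FreqDist` (which subclasses `Counter`), and a regex stands in for an NLTK tokenizer such as `nltk.word_tokenize`, so the sketch runs without downloading NLTK data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a stand-in for an NLTK tokenizer (assumption).
    return re.findall(r"[a-z']+", text.lower())


def rank_frequency(tokens):
    # Counter plays the role of nltk.FreqDist; rank 1 = most frequent word.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(tokens):
    # Least-squares fit of log f = log C - s * log r; returns s.
    pairs = rank_frequency(tokens)
    if len(pairs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


def profile(tokens, vocab):
    # Relative frequency of each vocabulary word in the token list.
    if not tokens:
        return [0.0] * len(vocab)
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def profile_similarity(reference_tokens, candidate_tokens, top_k=50):
    # Vocabulary = head of the reference text's Zipf curve (hypothetical choice).
    vocab = [w for w, _ in Counter(reference_tokens).most_common(top_k)]
    a = profile(reference_tokens, vocab)
    b = profile(candidate_tokens, vocab)
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


if __name__ == "__main__":
    known = tokenize(open("known_author.txt").read())    # hypothetical file
    unknown = tokenize(open("unsigned_text.txt").read())  # hypothetical file
    print("Zipf exponent (known):", zipf_exponent(known))
    print("Profile similarity:", profile_similarity(known, unknown))
```

A similarity close to 1 means the unidentified text uses the reference author's most frequent words in similar proportions; the source gives no threshold for declaring a match, so any cut-off would have to be calibrated on texts of known authorship.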

Cite

Gupta, D., Hada, P. S., Mitra, D., & Sharma, N. (2014). Identification of lost or deserted written texts using Zipf’s law with NLTK. In Smart Innovation, Systems and Technologies (Vol. 27, pp. 511–518). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-07353-8_59