Sometimes it is very difficult to identify valuable texts written by great personalities, especially when a text bears no signature or its author is anonymous. Deserted manuscripts or documents without a title or heading add to the difficulty. The works of such dignitaries may be lost, or only part of them may survive in libraries or other storage media. By applying Zipf's law with the NLTK module available in Python, this problem can be largely solved, helping preserve the originality of valuable texts and keeping them from going unidentified. The approach can also help in real-time data analysis where word frequency plays an important role; plagiarism detection in written texts is one such example. NLTK is a powerful toolkit of Python modules for extracting, segmenting, parsing, tagging, and searching text in many natural languages. This paper combines Zipf's law with NLTK to build a tool for identifying anonymous or deserted valuable texts. © Springer International Publishing Switzerland 2014.
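The abstract does not spell out the procedure. The following is a minimal sketch under stated assumptions, not the authors' tool.

Zipf's law says a word's frequency is roughly proportional to $r^{-s}$, where $r$ is the word's frequency rank and $s$ is close to 1. As a consequence, a text's highest-ranked words dominate its frequency profile. The sketch has two parts:

* `zipf_exponent` estimates $s$ by least squares on the log-log rank-frequency data.
* `attribute` assigns an anonymous text to whichever candidate author's sample has the closest relative-frequency profile over the top `k` shared words.

All function names, the regex tokenizer, the L1 distance and the default `k=50` are illustrative assumptions. For the frequency counting, Python's `collections.Counter` stands in for NLTK so the sketch runs without corpus downloads. `nltk.FreqDist`, a `Counter` subclass, would yield the same counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic runs (a stand-in for nltk.word_tokenize).
    return re.findall(r"[a-z']+", text.lower())


def zipf_exponent(tokens):
    """Estimate s in f(r) ~ r**(-s) via least squares on log(rank) vs log(frequency)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the fitted slope is -s


def frequency_distance(a_tokens, b_tokens, k=50):
    """L1 distance between relative frequencies of the k most common words of both texts."""
    a_counts, b_counts = Counter(a_tokens), Counter(b_tokens)
    vocab = [w for w, _ in (a_counts + b_counts).most_common(k)]
    a_total, b_total = len(a_tokens), len(b_tokens)
    return sum(abs(a_counts[w] / a_total - b_counts[w] / b_total) for w in vocab)


def attribute(anonymous_text, candidates, k=50):
    """Return the candidate name whose sample text has the closest frequency profile."""
    anon = tokenize(anonymous_text)
    return min(candidates, key=lambda name: frequency_distance(anon, tokenize(candidates[name]), k))


if __name__ == "__main__":
    samples = {
        "A": "the cat sat on the mat the cat",
        "B": "quantum field theory renormalization group flow",
    }
    print(attribute("the cat on the mat", samples))
```

In practice each candidate sample would need to be far longer than these toy strings for the top-ranked words to give a stable profile. The exponent from `zipf_exponent` can serve as a quick sanity check that a text follows the expected rank-frequency shape before profiles are compared.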
Gupta, D., Hada, P. S., Mitra, D., & Sharma, N. (2014). Identification of lost or deserted written texts using Zipf’s law with NLTK. In Smart Innovation, Systems and Technologies (Vol. 27, pp. 511–518). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-07353-8_59