We address the automatic identification of similar documents using coherent chunks. The similarity between documents is determined from the coherent chunks they contain. We apply linguistic rules to identify the coherent chunks and use the Vector Space Model (VSM) to determine the similarity among documents. The documents used in this work are patent documents from the USPTO. Using coherent chunks to identify similar documents has shown encouraging results. © 2009 Springer-Verlag Berlin Heidelberg.
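As an illustration of the VSM step only, the following is a minimal sketch and not the authors' implementation. It assumes the coherent chunks have already been extracted as strings (the linguistic rules are not given in the abstract), it treats each distinct chunk as one dimension, and it uses raw chunk frequency as the weight. The function names `chunk_vector` and `cosine_similarity`, the weighting scheme, and the example chunks are all hypothetical.

```python
import math
from collections import Counter


def chunk_vector(chunks):
    """Build a sparse vector mapping each normalized chunk to its frequency.

    Assumption: raw frequency is the weight; the abstract does not state
    which weighting scheme is used.
    """
    return Counter(chunk.lower().strip() for chunk in chunks)


def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two sparse chunk vectors."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[c] * vec_b[c] for c in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction in the vector space.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical chunks standing in for the output of the chunk identifier.
doc_a = ["fuel injection system", "combustion chamber", "fuel injection system"]
doc_b = ["fuel injection system", "exhaust valve"]
doc_c = ["wireless antenna"]

score_ab = cosine_similarity(chunk_vector(doc_a), chunk_vector(doc_b))
score_ac = cosine_similarity(chunk_vector(doc_a), chunk_vector(doc_c))
print(f"a vs b: {score_ab:.3f}")
print(f"a vs c: {score_ac:.3f}")
```

Under these assumptions, documents that share chunks score above zero and documents with no chunks in common score exactly zero.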
CITATION STYLE
Lalitha Devi, S., Kuppan, S., Venkataswamy, K., & Rao, P. R. K. (2009). Identification of similar documents using coherent chunks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5847 LNAI, pp. 54–68). https://doi.org/10.1007/978-3-642-04975-0_5