Automatic segmentation of big data of patent texts

Abstract

Patent documents are abundant, lengthy, and written in highly technical language, so reading and analyzing them can be complex and time-consuming. Automatic patent segmentation can help. This work attempts to automatically segment the description part of patent texts into semantic sections. Our goal is to develop a robust and scalable segmentation tool that structures patent texts into pre-defined sections, serving as a pre-processing step for patent text information retrieval (IR) and information extraction (IE) tasks. To do so, we use an established set of guidelines to define the segments in the description part of the patent text. Based on those guidelines, we developed a segmentation tool called PatSeg that combines several text mining techniques: a rule-based algorithm identifies the headings inside the patent text, a machine learning technique classifies the headings into the pre-defined sections, and heuristics identify the sections of the patent text that do not contain headings. Our methods achieved up to 94% accuracy. In addition, we propose a big data approach based on Hadoop ecosystem modules to apply our methods to the huge number of patent documents.
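The three stages named in the abstract (rule-based heading detection, machine-learned heading classification, and heuristics for text without headings) can be illustrated with a minimal sketch. This is not the PatSeg implementation: the heading rule, the section labels, the training headings in `TRAIN`, the cue phrases in `guess_label`, and the choice of a small naive Bayes classifier are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical training headings and section labels (not taken from the paper).
TRAIN = [
    ("BACKGROUND OF THE INVENTION", "background"),
    ("TECHNICAL FIELD", "background"),
    ("SUMMARY OF THE INVENTION", "summary"),
    ("BRIEF SUMMARY", "summary"),
    ("BRIEF DESCRIPTION OF THE DRAWINGS", "drawings"),
    ("DESCRIPTION OF THE FIGURES", "drawings"),
    ("DETAILED DESCRIPTION OF THE EMBODIMENTS", "detailed"),
    ("DETAILED DESCRIPTION", "detailed"),
]


def is_heading(line):
    """Assumed rule: short, mostly upper-case, and not ending in a period."""
    text = line.strip()
    letters = [c for c in text if c.isalpha()]
    if not letters or len(text.split()) > 10 or text.endswith("."):
        return False
    return sum(c.isupper() for c in letters) / len(letters) > 0.8


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        for text, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}

    def predict(self, text):
        n_examples = sum(self.label_counts.values())

        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.label_counts[label] / n_examples)
            return prior + sum(math.log((counts[w] + 1) / total) for w in tokens(text))

        return max(self.label_counts, key=score)


def guess_label(text):
    """Placeholder heuristic for text with no heading (cue phrases are hypothetical)."""
    lowered = text.lower()
    if "relates to" in lowered:
        return "background"
    if lowered.startswith("fig"):
        return "drawings"
    return "unlabeled"


def _emit(segments, label, body):
    # Store the collected lines; fall back to the heuristic when no heading was seen.
    if body:
        text = "\n".join(body)
        segments.append((label or guess_label(text), text))


def segment(description, classifier):
    """Split a description into (label, text) pairs using the three stages."""
    segments, label, body = [], None, []
    for line in description.splitlines():
        if is_heading(line):
            _emit(segments, label, body)
            label, body = classifier.predict(line), []
        elif line.strip():
            body.append(line.strip())
    _emit(segments, label, body)
    return segments
```

Calling `segment(text, NaiveBayes(TRAIN))` on a description returns a list of labeled segments that a downstream IR or IE step could consume. Because the function handles one document at a time and keeps no state between calls, it could in principle be applied per document inside a distributed job, though how the paper maps its methods onto Hadoop modules is not described in the abstract.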

Citation

Sofean, M. (2017). Automatic segmentation of big data of patent texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10440 LNCS, pp. 343–351). Springer Verlag. https://doi.org/10.1007/978-3-319-64283-3_25
