Patent keyword extraction algorithm based on distributed representation for patent classification

Jie Hu; Shaobo Li; Yong Yao; Liya Yu; Guanci Yang; Jianjun Hu

Journal Article, Open Access


Entropy (2018) 20(2)

DOI: 10.3390/e20020104

82 Citations

132 Readers

Abstract

Many text mining tasks, such as text retrieval, text summarization, and text comparison, depend on extracting representative keywords from the main text. Most existing keyword extraction algorithms rely on a discrete, bag-of-words representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) for patent classification based on the distributed Skip-gram model. We also develop a set of quantitative performance measures for evaluating keyword extraction, based on information gain and on cross-validated Support Vector Machine (SVM) classification; these measures are valuable when human-annotated keywords are not available. We evaluated PKEA on a standard benchmark dataset and on a self-built patent dataset of 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that the proposed algorithm offers a promising way to extract keywords from patent texts for patent classification.
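The information-gain measure mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each patent is reduced to a set of tokens and scores a keyword w by how much knowing whether w occurs in a document reduces uncertainty about the document's class C: IG(w) = H(C) - [P(w) H(C|w) + P(not w) H(C|not w)]. The function names, the binary presence feature, and the averaging over a keyword list are illustrative assumptions; the paper's exact formulation may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(keyword, documents, labels):
    """Information gain of the binary feature 'keyword occurs in the document'.

    documents: list of token sets, one per patent (an assumed representation)
    labels: class labels aligned with documents
    """
    with_kw = [y for doc, y in zip(documents, labels) if keyword in doc]
    without_kw = [y for doc, y in zip(documents, labels) if keyword not in doc]
    n = len(labels)
    # Expected class entropy after splitting on presence/absence of the keyword.
    conditional = (len(with_kw) / n) * entropy(with_kw) + (
        len(without_kw) / n
    ) * entropy(without_kw)
    return entropy(labels) - conditional


def mean_information_gain(keywords, documents, labels):
    """Average IG over an extracted keyword list (hypothetical aggregate score)."""
    if not keywords:
        return 0.0
    return sum(information_gain(k, documents, labels) for k in keywords) / len(keywords)


if __name__ == "__main__":
    # Toy data, not from the paper: two lidar patents and two radar patents.
    docs = [{"lidar", "laser"}, {"lidar", "beam"}, {"radar", "doppler"}, {"radar", "antenna"}]
    labels = ["lidar", "lidar", "radar", "radar"]
    print(information_gain("lidar", docs, labels))            # 1.0
    print(round(information_gain("beam", docs, labels), 4))   # 0.3113
```

Under this sketch, a keyword that perfectly separates the classes ("lidar") reaches the maximum IG of 1 bit for two balanced classes, while a keyword that appears in only one document ("beam") scores much lower. Comparing mean IG across keyword lists produced by different extractors gives one label-free way to rank them, in the spirit of the evaluation the abstract describes.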


Citation (APA)

Hu, J., Li, S., Yao, Y., Yu, L., Yang, G., & Hu, J. (2018). Patent keyword extraction algorithm based on distributed representation for patent classification. Entropy, 20(2). https://doi.org/10.3390/e20020104

Readers' Seniority

PhD / Post grad / Masters / Doc: 46 (71%)
Lecturer / Post doc: 11 (17%)
Researcher: 5
Professor / Associate Prof.: 3

Readers' Discipline

Computer Science: 38 (51%)
Engineering: 23 (31%)
Business, Management and Accounting: 8 (11%)
Economics, Econometrics and Finance: 5

Article Metrics

Blog Mentions: 1
Social Media Shares, Likes & Comments: 7
