A tree kernel based on classification and citation data to analyse patent documents

Markus Arndt; Ulrich Arndt

Conference Proceedings


Studies in Classification, Data Analysis, and Knowledge Organization (2010) 571-578

DOI: 10.1007/978-3-642-10745-0_62


Abstract

We consider the problem of representing patent documents in such a way that a kernel matrix reflecting their similarities can be computed efficiently. The European classification system ECLA is a deep hierarchical taxonomy comprising about 130,000 classification symbols. Depending on its technical content, each patent document is assigned one or more ECLA classification symbols. In this study we represent the complete ECLA taxonomy as a tree labelled by the classification symbols, called the ECLA tree. Each node of the ECLA tree carries a positive value reflecting the technical specificity of the corresponding classification symbol. Based on the directly assigned symbols as well as on the symbols of the cited and citing documents, patent documents are mapped to subtrees of the ECLA tree. Taking the specificity of the tree nodes into account, we define an inner product on the subtrees representing the documents. We show that this inner product is a valid kernel function which can be used effectively to discover clusters in a set of patent documents. © Springer-Verlag Berlin Heidelberg 2010.
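The abstract does not state the inner product explicitly. The following is a minimal sketch of one natural construction, assuming that a document's subtree is the union of the root-to-node paths of its own symbols and of the symbols of its cited and citing documents, and that the kernel sums the specificity weights w_v of the nodes two subtrees share. Under that assumption K(d, d') = Σ_{v ∈ T_d ∩ T_d'} w_v = ⟨φ(d), φ(d')⟩ with φ_v(d) = √w_v · 1[v ∈ T_d], so the kernel matrix is positive semidefinite whenever every w_v is positive. The toy tree, its labels (`A1a`, `B1`, ...), the depth-based weights and the helper names (`document_subtree`, `tree_kernel`, `gram_matrix`) are hypothetical. They are not real ECLA symbols and not necessarily the authors' exact definitions.

```python
import numpy as np
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy taxonomy standing in for the ECLA tree: child -> parent.
# The labels are illustrative only, NOT real ECLA classification symbols.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root", "B": "root",
    "A1": "A", "A2": "A", "B1": "B",
    "A1a": "A1", "A1b": "A1",
}


def depth(node: str) -> int:
    """Number of edges between `node` and the root."""
    d = 0
    parent = PARENT[node]
    while parent is not None:
        d += 1
        parent = PARENT[parent]
    return d


# Hypothetical specificity weights: deeper (more specific) symbols weigh more.
# Any strictly positive weights keep the kernel valid.
WEIGHT: Dict[str, float] = {n: float(depth(n) + 1) for n in PARENT}


def path_to_root(symbol: str) -> Set[str]:
    """All nodes on the path from `symbol` up to the root, inclusive."""
    nodes: Set[str] = set()
    node: Optional[str] = symbol
    while node is not None:
        nodes.add(node)
        node = PARENT[node]
    return nodes


def document_subtree(own: Iterable[str], linked: Iterable[str]) -> Set[str]:
    """Subtree spanned by a document's own symbols and the symbols of its
    cited and citing documents (assumed here to be merged by plain union)."""
    nodes: Set[str] = set()
    for symbol in list(own) + list(linked):
        nodes |= path_to_root(symbol)
    return nodes


def tree_kernel(t1: Set[str], t2: Set[str]) -> float:
    """Sum of specificity weights over the nodes shared by both subtrees."""
    return sum((WEIGHT[v] for v in t1 & t2), 0.0)


def gram_matrix(subtrees: List[Set[str]]) -> np.ndarray:
    """Symmetric kernel matrix over a list of document subtrees."""
    n = len(subtrees)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = tree_kernel(subtrees[i], subtrees[j])
    return K


# Three toy documents: (own symbols, symbols of cited and citing documents).
docs = [
    document_subtree(["A1a"], ["A1b"]),
    document_subtree(["A1b"], []),
    document_subtree(["B1"], ["A2"]),
]
K = gram_matrix(docs)
print(K)
# The smallest eigenvalue should be non-negative (up to rounding): K is PSD.
print(np.linalg.eigvalsh(K).min())
```

The first two toy documents share the deep branch `A` → `A1` → `A1b` and therefore get a large kernel value, while the third shares only `root` and `A` with them. A matrix like `K` can then be passed to a kernel-based clustering method such as kernel k-means or spectral clustering; the abstract does not say which clustering algorithm the authors used.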

Citation (APA)

Arndt, M., & Arndt, U. (2010). A tree kernel based on classification and citation data to analyse patent documents. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 571–578). Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_62
