We consider the problem of representing patent documents in such a way that a kernel matrix reflecting the similarities of the documents can be computed efficiently. The European classification system ECLA is a deep hierarchical taxonomy comprising about 130,000 classification symbols. Depending on their technical content, patent documents are assigned one or more ECLA classification symbols. In this study we represent the complete ECLA taxonomy as a tree labelled by the classification symbols, called the ECLA tree. A positive value is attached to each node of the ECLA tree, reflecting the technical specificity of the corresponding classification symbol. Based on the directly assigned symbols as well as on the symbols of the cited and citing documents, patent documents are mapped to subtrees of the ECLA tree. Taking into account the specificity of the tree nodes, we define an inner product on the subtrees representing the documents. We show that this inner product is a valid kernel function which can be used effectively for discovering clusters in a set of patent documents. © Springer-Verlag Berlin Heidelberg 2010.
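The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' definition: the abstract does not give the exact formula for the inner product, so the sketch assumes one simple choice, namely the sum of the specificity values over the nodes that two subtrees share. The toy taxonomy `PARENT`, the weights `WEIGHT`, and the function names are all hypothetical.

```python
from itertools import chain

# Hypothetical toy taxonomy: child symbol -> parent symbol (None marks the root).
PARENT = {
    "ROOT": None,
    "A": "ROOT",
    "A01": "A",
    "A01B": "A01",
    "A01C": "A01",
    "B": "ROOT",
    "B60": "B",
}

# Hypothetical positive specificity values; deeper symbols weigh more here.
WEIGHT = {
    "ROOT": 0.1,
    "A": 0.5, "A01": 1.0, "A01B": 2.0, "A01C": 2.0,
    "B": 0.5, "B60": 1.0,
}


def root_paths(symbols):
    """Return all nodes on the paths from the root to the given symbols."""
    nodes = set()
    for symbol in symbols:
        # Walk upwards; stop early once we reach a node whose path is known.
        while symbol is not None and symbol not in nodes:
            nodes.add(symbol)
            symbol = PARENT[symbol]
    return nodes


def document_subtree(own, cited=(), citing=()):
    """Map a document to a subtree using its own, cited, and citing symbols."""
    return root_paths(chain(own, cited, citing))


def tree_kernel(s, t):
    """Assumed inner product: total specificity of the shared nodes."""
    return sum(WEIGHT[v] for v in s & t)


def gram_matrix(subtrees):
    """Kernel matrix over a list of document subtrees."""
    return [[tree_kernel(a, b) for b in subtrees] for a in subtrees]
```

Under this assumption the function is a valid kernel because it equals the ordinary dot product of feature vectors that hold the square root of a node's specificity value where the node belongs to the subtree and zero elsewhere. The resulting matrix from `gram_matrix` could then be passed to any clustering method that accepts a precomputed kernel.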
Arndt, M., & Arndt, U. (2010). A tree kernel based on classification and citation data to analyse patent documents. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 571–578). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-642-10745-0_62