An LSTM approach to patent classification based on fixed hierarchy vectors

Marawan Shalaby; Jan Stutzki; Matthias Schubert; Stephan Günnemann

Conference Proceedings

SIAM International Conference on Data Mining, SDM 2018 (2018) 495-503

DOI: 10.1137/1.9781611975321.56

37 Citations

28 Readers

Abstract

Recently, innovative techniques for text processing, such as Latent Dirichlet Allocation (LDA), and embedding algorithms, such as Paragraph Vectors (PV), have enabled improved text classification and retrieval methods. Although these methods can be adjusted to handle different text collections, they do not take advantage of the fixed document structure that is mandatory in many application areas. In this paper, we focus on patent data, which mandates a fixed structure. We propose a new classification method that represents documents as Fixed Hierarchy Vectors (FHVs), reflecting the document's structure. FHVs represent a document on multiple levels, where each level represents the complete document but with a different local context. Furthermore, we sequentialize this representation and classify documents using LSTM-based architectures. Our experiments show that FHVs provide a richer document representation and that sequential classification improves performance when classifying patents into the International Patent Classification (IPC) taxonomy.
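
The idea of a multi-level representation that is flattened into a sequence and fed to an LSTM can be illustrated with a small sketch. This is not the authors' implementation: the three levels chosen here (document, section, paragraph), the toy patent text, the hashed bag-of-words `embed` function (a stand-in for a learned embedding such as Paragraph Vectors), and the untrained, randomly initialised LSTM weights are all assumptions made for illustration only.

```python
import zlib
import numpy as np

DIM = 16  # embedding size, chosen arbitrarily for this sketch


def embed(text, dim=DIM):
    """Hypothetical stand-in for a learned embedding: hashed bag of words, L2-normalised."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def hierarchy_sequence(sections):
    """Map {section name: [paragraphs]} to an array of shape (T, DIM).

    Every level covers the complete document at a different granularity:
    one vector for the whole text, one per section, one per paragraph.
    The levels are then concatenated into a single sequence.
    """
    paragraphs = [p for paras in sections.values() for p in paras]
    level_doc = [embed(" ".join(paragraphs))]
    level_sec = [embed(" ".join(paras)) for paras in sections.values()]
    level_par = [embed(p) for p in paragraphs]
    return np.stack(level_doc + level_sec + level_par)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(dim, hidden, n_classes, seed=0):
    """Random (untrained) weights; a real model would learn these."""
    rng = np.random.default_rng(seed)
    return (
        rng.normal(0.0, 0.1, (4 * hidden, dim)),     # input weights
        rng.normal(0.0, 0.1, (4 * hidden, hidden)),  # recurrent weights
        np.zeros(4 * hidden),                        # gate biases
        rng.normal(0.0, 0.1, (n_classes, hidden)),   # output weights
        np.zeros(n_classes),                         # output biases
    )


def lstm_classify(seq, params):
    """Run a single-layer LSTM over the sequence and apply softmax to the last state."""
    W, U, b, W_out, b_out = params
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in seq:
        i, f, o, g = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    logits = W_out @ h + b_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Toy patent with a fixed section structure (invented text).
patent = {
    "title": ["Rotary valve for fluid control"],
    "abstract": ["A valve with a rotating disc regulates fluid flow."],
    "claims": ["A valve comprising a housing and a disc.",
               "The valve of claim 1 wherein the disc is ceramic."],
    "description": ["The housing encloses the rotating disc.",
                    "A seal prevents leakage around the shaft."],
}

seq = hierarchy_sequence(patent)  # 1 document + 4 section + 6 paragraph vectors
# The IPC has eight top-level sections (A to H), hence eight classes here.
probs = lstm_classify(seq, init_params(DIM, hidden=8, n_classes=8))
```

Because the weights are random, `probs` carries no meaning beyond demonstrating the data flow: each patent becomes a sequence whose length depends on its number of sections and paragraphs, and the LSTM consumes that sequence one vector at a time before emitting a distribution over classes.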

Citation (APA)

Shalaby, M., Stutzki, J., Schubert, M., & Günnemann, S. (2018). An LSTM approach to patent classification based on fixed hierarchy vectors. In SIAM International Conference on Data Mining, SDM 2018 (pp. 495–503). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975321.56
