Recently, innovative text-processing techniques such as Latent Dirichlet Allocation (LDA) and embedding algorithms such as Paragraph Vectors (PV) have enabled improved text classification and retrieval methods. Although these methods can be adapted to different text collections, they do not exploit the fixed document structure that is mandatory in many application areas. In this paper, we focus on patent data, which follows a mandatory fixed structure. We propose a new classification method that represents documents as Fixed Hierarchy Vectors (FHVs), which reflect the document's structure. FHVs represent a document on multiple levels, where each level covers the complete document but with a different local context. Furthermore, we convert this representation into a sequence and classify documents using LSTM-based architectures. Our experiments show that FHVs provide a richer document representation and that sequential classification improves performance when classifying patents into the International Patent Classification (IPC) taxonomy.
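The abstract does not specify how the FHV levels are built, so the following is only a toy sketch under our own assumptions, not the authors' method. A patent is modeled as hypothetical named sections split into paragraphs. Each text unit is embedded by averaging stand-in word vectors (a hash-seeded placeholder, not PV or LDA). Levels are emitted coarse to fine (whole document, then sections, then paragraphs), so each level covers the complete document with a different local context. The resulting sequence is fed to an untrained single-layer LSTM with a softmax over the eight IPC sections A to H. The names `word_vector`, `hierarchy_sequence`, and `TinyLSTM`, as well as all dimensions, are hypothetical.

```python
import hashlib
import math
import random

DIM = 8  # toy embedding size (assumption)


def word_vector(word: str) -> list[float]:
    """Hypothetical stand-in for a trained embedding: a deterministic
    pseudo-random vector seeded from the word's MD5 digest."""
    seed = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def mean_vector(words: list[str]) -> list[float]:
    """Average the word vectors of one text unit (zero vector if empty)."""
    if not words:
        return [0.0] * DIM
    vecs = [word_vector(w) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def hierarchy_sequence(doc: dict[str, list[str]]) -> list[list[float]]:
    """Sequentialize an assumed fixed hierarchy, coarse to fine.

    Level 0: one vector for the whole document.
    Level 1: one vector per section (sections together cover the document).
    Level 2: one vector per paragraph (paragraphs together cover it too).
    """
    paragraphs = [p.split() for section in doc.values() for p in section]
    level0 = [mean_vector([w for p in paragraphs for w in p])]
    level1 = [mean_vector([w for p in sec for w in p.split()]) for sec in doc.values()]
    level2 = [mean_vector(p) for p in paragraphs]
    return level0 + level1 + level2


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Untrained single-layer LSTM followed by a softmax classifier."""

    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate) over [x; h_prev].
        self.W = {gate: mat(hidden_dim, input_dim + hidden_dim) for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}
        self.W_out = mat(num_classes, hidden_dim)

    @staticmethod
    def _affine(W: list[list[float]], b: list[float], v: list[float]) -> list[float]:
        return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

    def forward(self, seq: list[list[float]]) -> list[float]:
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in seq:
            z = x + h  # concatenate input and previous hidden state
            i = [sigmoid(a) for a in self._affine(self.W["i"], self.b["i"], z)]
            f = [sigmoid(a) for a in self._affine(self.W["f"], self.b["f"], z)]
            o = [sigmoid(a) for a in self._affine(self.W["o"], self.b["o"], z)]
            g = [math.tanh(a) for a in self._affine(self.W["g"], self.b["g"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Softmax over class scores computed from the final hidden state.
        logits = [sum(w * hj for w, hj in zip(row, h)) for row in self.W_out]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


IPC_SECTIONS = "ABCDEFGH"
patent = {  # hypothetical toy patent
    "title": ["rotary engine seal"],
    "abstract": ["a seal for a rotary engine reduces leakage"],
    "claims": ["a seal comprising a spring", "the seal of claim 1 made of steel"],
}
seq = hierarchy_sequence(patent)
model = TinyLSTM(DIM, hidden_dim=6, num_classes=len(IPC_SECTIONS))
probs = model.forward(seq)
print(len(seq))  # 8 (1 document + 3 sections + 4 paragraphs)
print(IPC_SECTIONS[probs.index(max(probs))])  # untrained, so arbitrary
```

Because the weights are random, the predicted section carries no meaning here; a real pipeline would use trained embeddings and train the LSTM on labeled patents.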
Shalaby, M., Stutzki, J., Schubert, M., & Günnemann, S. (2018). An LSTM approach to patent classification based on fixed hierarchy vectors. In SIAM International Conference on Data Mining, SDM 2018 (pp. 495–503). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975321.56