Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping

Subhash Chandra Pujari; Jannik Strötgen; Mark Giereth; Michael Gertz; Annemarie Friedrich

Conference Proceedings, Open Access

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (2022) 11498-11513

DOI: 10.18653/v1/2022.emnlp-main.791

Abstract

Patent landscaping, one of the central tasks of intellectual property management, involves selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is as yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines, and propose a novel model that takes into account textual information from the patents' full texts as well as embeddings created from the patents' CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.

Citation (APA)

Pujari, S. C., Strötgen, J., Giereth, M., Gertz, M., & Friedrich, A. (2022). Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 11498–11513). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.791
