Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping

Abstract

Patent Landscaping, one of the central tasks of intellectual property management, includes selecting and grouping patents according to user-defined technical or application-oriented criteria. While recent transformer-based models have been shown to be effective for classifying patents into taxonomies such as CPC or IPC, there is as yet little research on how to support real-world Patent Landscape Studies (PLSs) using natural language processing methods. With this paper, we release three labeled datasets for PLS-oriented classification tasks covering two diverse domains. We provide a qualitative analysis and report detailed corpus statistics. Most research on neural models for patents has been restricted to leveraging titles and abstracts. We compare strong neural and non-neural baselines and propose a novel model that takes into account textual information from the patents' full texts as well as embeddings derived from the patents' CPC labels. We find that for PLS-oriented classification tasks, going beyond title and abstract is crucial, CPC labels are an effective source of information, and combining all features yields the best results.
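The idea of combining text features with CPC label embeddings can be illustrated with a minimal sketch. This is not the authors' architecture: the function names (`mean_vector`, `combine_features`, `linear_score`), the averaging of label embeddings, the simple concatenation, and all vectors and weights below are assumptions chosen only to show how two feature sources might be merged before classification.

```python
from typing import Dict, List


def mean_vector(vectors: List[List[float]], dim: int) -> List[float]:
    """Average equal-length vectors; return a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combine_features(
    text_vec: List[float],
    cpc_labels: List[str],
    cpc_table: Dict[str, List[float]],
    cpc_dim: int,
) -> List[float]:
    """Concatenate a text vector with the mean embedding of known CPC labels.

    Labels missing from the (hypothetical) embedding table are skipped.
    """
    known = [cpc_table[label] for label in cpc_labels if label in cpc_table]
    return list(text_vec) + mean_vector(known, cpc_dim)


def linear_score(features: List[float], weights: List[float], bias: float) -> float:
    """Score a feature vector with a linear layer (stand-in for a classifier head)."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy values, assumed for illustration only.
cpc_table = {"H01M": [1.0, 0.0], "B60L": [0.0, 1.0]}
text_vec = [0.2, 0.4, 0.6]  # stand-in for an encoded patent text

features = combine_features(text_vec, ["H01M", "B60L", "G06F"], cpc_table, cpc_dim=2)
score = linear_score(features, weights=[1.0] * 5, bias=0.0)
```

Here the unknown label `G06F` is ignored, so the CPC part of `features` is the average of the two known embeddings. In a real system the text vector would come from a trained encoder and the weights would be learned; both are placeholders in this sketch.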

Cite

Pujari, S. C., Strötgen, J., Giereth, M., Gertz, M., & Friedrich, A. (2022). Three Real-World Datasets and Neural Computational Models for Classification Tasks in Patent Landscaping. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 11498–11513). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.791
