Patent images are essential for patent examiners to understand the content of an invention. Therefore, there is a need for automatic labelling of patent images to support patent search tasks. Towards this goal, recent research has proposed classification-based approaches for patent image annotation. However, a main drawback of these methods is that they rely on large annotated patent image datasets, which require substantial manual effort to obtain. In this context, the proposed work extracts concepts from patent images using a supervised machine learning framework that is trained with limited annotated data and automatically generated synthetic data. The classification is realised with Random Forests (RF) and a combination of visual and textual features. First, we exploit RF's implicit outlier detection capability to remove noise from the data. Then, we generate new synthetic data cases by means of the Synthetic Minority Over-sampling Technique (SMOTE). We evaluate the different retrieval parts of the framework using a dataset from the footwear domain. The experimental results indicate the benefits of the proposed methodology.
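A minimal sketch of the pipeline outlined above, assuming a scikit-learn / imbalanced-learn setting: an RF is used to compute Breiman-style proximities for outlier removal, SMOTE generates synthetic samples, and a final RF classifier is trained on the resulting data. The feature matrix (standing in for the concatenated visual and textual features), the `keep_fraction` threshold and all hyper-parameters are illustrative assumptions, not the authors' actual settings or implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE  # assumed external dependency


def rf_proximities(forest, X):
    """Breiman-style proximities: fraction of trees in which two samples
    fall into the same leaf."""
    leaves = forest.apply(X)                      # (n_samples, n_trees) leaf indices
    prox = np.zeros((leaves.shape[0], leaves.shape[0]))
    for t in range(leaves.shape[1]):
        prox += leaves[:, t][:, None] == leaves[:, t][None, :]
    return prox / leaves.shape[1]


def remove_outliers(X, y, keep_fraction=0.9):
    """Drop the samples with the lowest average proximity to their own class,
    a simplified proxy for RF's implicit outlier detection."""
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    prox = rf_proximities(forest, X)
    scores = np.array([prox[i, y == y[i]].mean() for i in range(len(y))])
    keep = scores >= np.quantile(scores, 1.0 - keep_fraction)
    return X[keep], y[keep]


# Illustrative usage with random stand-in data (not the patent image dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))        # stand-in for visual + textual features
y = rng.integers(0, 2, size=300)      # stand-in concept labels

X_clean, y_clean = remove_outliers(X, y)                              # step 1: noise removal
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_clean, y_clean)   # step 2: synthetic data
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_bal, y_bal)                                                 # step 3: final RF classifier
```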
CITATION STYLE
Liparas, D., Moumtzidou, A., Vrochidis, S., & Kompatsiaris, I. (2014). Concept-oriented labelling of patent images based on Random Forests and proximity-driven generation of synthetic data. In V and L Net 2014 - 3rd Annual Meeting of the EPSRC Network on Vision and Language and 1st Technical Meeting of the European Network on Integrating Vision and Language, A Workshop of the 25th International Conference on Computational Linguistics, COLING 2014 - Proceedings (pp. 25–32). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-5404