Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Abstract

With the expansion of the internet, we encounter many types of big data, such as web documents and sensing data. Compared with traditional small data such as experimental samples, big data offer more opportunities to find hidden and novel patterns through analysis with statistics and machine learning algorithms. However, as the use of big data increases, problems also arise. One of them is the zero-inflated problem in structured data preprocessed from big data: most count values are zero because a specific word appears in only some of the documents. Patent data are particularly affected by this problem because most of them take the form of text documents. In this paper, we focus on patent keyword analysis as a form of text big data analysis, where the zero-inflated problem arises just as it does with other text data. To address it, we propose generating synthetic samples using statistical inference and a tree structure. We verify the performance and validity of the proposed method using patent documents and simulation data.
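To make the zero-inflated problem concrete, the following minimal sketch builds a document-keyword count matrix from a toy corpus and measures the share of zero entries. The corpus, the function names `document_keyword_matrix` and `zero_fraction`, and the whitespace tokenizer are all assumptions made for illustration; this is not the authors' preprocessing pipeline or their synthetic sample generation method.

```python
from collections import Counter

# Hypothetical toy corpus; these are NOT the patent documents used in the paper.
docs = [
    "battery electrode lithium cell",
    "antenna signal wireless receiver",
    "lithium battery charging circuit",
    "image sensor pixel array",
]


def document_keyword_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts word j in document i."""
    # Simple whitespace tokenization; real text preprocessing would be more involved.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


def zero_fraction(matrix):
    """Return the fraction of entries in the count matrix that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


vocabulary, matrix = document_keyword_matrix(docs)
print(f"{len(vocabulary)} keywords, zero fraction = {zero_fraction(matrix):.2f}")
```

Even in this small example most cells are zero, because each keyword occurs in only one or two documents. With thousands of documents and keywords, the share of zeros typically grows further, which is the situation the abstract describes.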

Uhm, D., & Jun, S. (2022). Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples. Future Internet, 14(7). https://doi.org/10.3390/fi14070211
