Synthetic Data and Its Evaluation Metrics for Machine Learning

Abstract

Artificial Intelligence (AI) has become the key driving force in industrial automation. Machine Learning (ML) and Deep Learning (DL) are components of AI that rely on data for model training. Data generation has increased due to the Internet, connected devices, mobile devices, and social networking, which in turn have given rise to cybercrime and cyber theft. To prevent these and protect the identity of individuals in public data, governments and policymakers have enacted stringent privacy-preserving laws. The economics of data collection, the quality of data in the public domain, and data bias have made data access and usage a challenge for AI/ML training, whether for research or for industrial purposes. This has forced researchers to look for alternatives. Synthetic data offers a promising solution to these data challenges. The last few years have seen many studies conducted to verify the utility and privacy-protection capability of synthetic data. However, all of these studies have been exploratory. This paper focuses on various methods of synthetic data generation and their validation metrics. It raises a few questions that need further study before concluding that synthetic data offers a universal solution for AI and ML.

Citation (APA)

Kiran, A., & Kumar, S. S. (2023). Synthetic Data and Its Evaluation Metrics for Machine Learning. In Smart Innovation, Systems and Technologies (Vol. 324, pp. 485–494). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-7447-2_43
