On Defining Rules for Cancer Data Fabrication

1Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Data is essential for machine learning projects, and data accuracy is crucial for being able to trust the results obtained from the associated machine learning models. Previously, we have developed machine learning models for predicting the treatment outcome for breast cancer patients that have undergone chemotherapy, and developed a monitoring system for their treatment timeline showing interactively the options and associated predictions. Available cancer datasets, such as the one used earlier, are often too small to obtain significant results, and make it difficult to explore ways to improve the predictive capability of the models further. In this paper, we explore an alternative to enhance our datasets through synthetic data generation. From our original dataset, we extract rules to generate fabricated data that capture the different characteristics inherent in the dataset. Additional rules can be used to capture general medical knowledge. We show how to formulate rules for our cancer treatment data, and use the IBM solver to obtain a corresponding synthetic dataset. We discuss challenges for future work.

Cite

CITATION STYLE

APA

Bowles, J. K. F., Silvina, A., Bin, E., & Vinov, M. (2020). On Defining Rules for Cancer Data Fabrication. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12173 LNCS, pp. 168–176). Springer. https://doi.org/10.1007/978-3-030-57977-7_13

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free