Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice


Abstract

As scientists start to adopt machine learning (ML) as a research tool, the security of ML and of the knowledge it generates becomes a concern. In this paper, I explain how supervised ML can be improved with better data ontology, that is, the way we make categories and turn information into data. More specifically, we should design data ontology in a way that is consistent with the knowledge we have about the target phenomenon, so that the ontology can help us make the inductive leap. I do so by thinking through a thought experiment, Goodman’s New Riddle of Induction (Fact, fiction, and forecast, Harvard University Press, 1955). Goodman’s riddle helps flesh out three problems of induction: (1) the problem of equal goodies, that there are often too many equally good inductive results given the same data; (2) the problem of diverging performance, that these equally good results can give opposite predictions in the future; and (3) the problem of mediocrity, that when averaged across all equally possible datasets and tasks, no inductive algorithm outperforms any other. I show that all three problems manifest as real obstacles in ML practice, namely, the Rashomon effect (Breiman in Stat Sci 16(3):199–231, 2001), the problem of underspecification (D’Amour et al. in J Mach Learn Res, 2020, https://doi.org/10.48550/arXiv.2011.03395), and the No Free Lunch theorem (Wolpert in Neural Comput 8(7):1341–90, 1996, https://doi.org/10.1162/neco.1996.8.7.1341). Lastly, I argue that proper data ontology can help mitigate these problems, and I demonstrate how with concrete examples from climate science. This research highlights the links between philosophers’ discussions of induction and their implications for ML practice.

Cite

Li, D. (2023). Machines Learn Better with Better Data Ontology: Lessons from Philosophy of Induction and Machine Learning Practice. Minds and Machines, 33(3), 429–450. https://doi.org/10.1007/s11023-023-09639-9
