Calculating error bars on inferences from web data

Abstract

In this work, we explore uncertainty in automated question answering over real-valued data from knowledge bases on the Internet. We argue that the coefficient of variation (cov) is an intuitive and general form in which to express this uncertainty, with the added advantage that it can be calculated exactly and efficiently. The large amounts of data on the Internet present a good opportunity to answer queries that go beyond simply looking up facts and returning them. However, such data is often vague and noisy. For discrete results, e.g. stating that a particular city is the capital of a particular country, probabilities are a natural way to assign uncertainty to answers. For continuous variables, or quantities that are typically treated as continuous (such as the populations of countries), probabilities are uninformative because the probability of any single exact value is infinitesimal. For instance, the probability that the population of India is exactly equal to the last census count is effectively zero. Our aim is to capture uncertainty in these estimates in an intuitive, uniform, and computationally efficient way. We present initial efforts at automating the inference process over real-valued web data while accounting for some of the typical sources of uncertainty: noisy data and errors from inference operations. Having considered several problem domains and query types, we adopt a uniform approach: approximating all continuous random variables with Gaussian distributions and communicating uncertainties to users as coefficients of variation. Our experiments show that the uncertainty estimates derived by our method are well-calibrated and correlate with the actual deviations from the true answer. An immediate benefit of our approach is that our inference framework can attach credible intervals to the real-valued answers it infers. This conveys to a user the plausible magnitude of the error in an answer, a more meaningful measure of uncertainty than the ranking scores provided by other question-answering systems.
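
The quantities named in the abstract can be illustrated with a minimal Python sketch, assuming every real-valued answer is approximated as an independent Gaussian. This is not the authors' implementation: the names GaussianEstimate, from_cov, from_samples and add are hypothetical, and using the sample standard deviation across sources as the spread of a noisy value is an assumption made only for illustration. The coefficient of variation is the standard deviation divided by the absolute mean, and a symmetric interval of mean ± 1.96 standard deviations covers about 95% of a Gaussian's probability mass.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianEstimate:
    """Hypothetical container: a real-valued answer approximated as N(mean, std**2)."""

    mean: float
    std: float

    @property
    def cov(self) -> float:
        # Coefficient of variation: standard deviation relative to the magnitude of the mean.
        if self.mean == 0:
            raise ValueError("cov is undefined for a zero mean")
        return self.std / abs(self.mean)

    def interval(self, z: float = 1.96) -> tuple[float, float]:
        # Symmetric interval mean +/- z*std; z = 1.96 covers ~95% of a Gaussian.
        return (self.mean - z * self.std, self.mean + z * self.std)


def from_cov(mean: float, cov: float) -> GaussianEstimate:
    # Recover the standard deviation from a stated coefficient of variation.
    return GaussianEstimate(mean, cov * abs(mean))


def from_samples(values: list[float]) -> GaussianEstimate:
    # Assumption: the spread across noisy sources is used directly as the std.
    # statistics.stdev needs at least two values.
    return GaussianEstimate(statistics.mean(values), statistics.stdev(values))


def add(a: GaussianEstimate, b: GaussianEstimate) -> GaussianEstimate:
    # The sum of independent Gaussians is Gaussian: means add and variances add.
    return GaussianEstimate(a.mean + b.mean, math.sqrt(a.std ** 2 + b.std ** 2))


if __name__ == "__main__":
    # Hypothetical inputs: two regional totals, each known to within a 5% cov.
    north = from_cov(100.0, 0.05)
    south = from_cov(300.0, 0.05)
    total = add(north, south)
    low, high = total.interval()
    print(f"total = {total.mean:.1f}, cov = {total.cov:.4f}, "
          f"95% interval = ({low:.1f}, {high:.1f})")
```

Summing independent Gaussians is exact (means and variances add), which is one case where the cov of a derived answer has a closed form; note that the summed answer's cov is smaller than that of either input. Products or ratios of Gaussians are not Gaussian and would need an approximation that this sketch does not cover.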

Nuamah, K., & Bundy, A. (2018). Calculating error bars on inferences from web data. In Advances in Intelligent Systems and Computing (Vol. 869, pp. 618–640). Springer Verlag. https://doi.org/10.1007/978-3-030-01057-7_48