
Artificial Intelligence for Artificial Artificial Intelligence

by Daniel S Weld
Artificial Intelligence (2011)

Abstract

This article discusses the replacement of computers with humans for certain tasks. Many tasks still baffle even the most powerful computers. AskForCents, a web service that provides internet users with various types of information, is an example of this replacement. Members of the website can ask and answer questions without the inflexibility of an algorithm-driven system. The premise is that humans are vastly superior to computers at tasks such as pattern recognition.


Available from citeseerx.ist.psu.edu

Artificial Intelligence for Artificial Artificial Intelligence
Peng Dai Mausam Daniel S. Weld
Dept of Computer Science and Engineering
University of Washington
Seattle, WA-98195
{daipeng,mausam,weld}@cs.washington.edu
Abstract
Crowdsourcing platforms such as Amazon Mechanical Turk
have become popular for a wide variety of human intelligence
tasks; however, quality control continues to be a significant
challenge. Recently, we proposed TURKONTROL, a theoretical
model based on POMDPs to optimize iterative, crowd-sourced
workflows. However, that work neither describes how to
learn the model parameters nor shows its effectiveness in a
real crowd-sourced setting. Learning is challenging due to
the scale of the model and noisy data: there are hundreds of
thousands of workers with high-variance abilities.
This paper presents an end-to-end system that first learns
TURKONTROL’s POMDP parameters from real Mechanical
Turk data, and then applies the model to dynamically opti-
mize live tasks. We validate the model and use it to control
a successive-improvement process on Mechanical Turk. By
modeling worker accuracy and voting patterns, our system
produces significantly superior artifacts compared to those
generated through nonadaptive workflows using the same
amount of money.
Introduction
Within just a few years of their introduction, crowdsourcing
marketplaces such as Amazon Mechanical Turk¹ have become an
integral component in the arsenal of an online application
designer. They have spawned several new companies, such as
CrowdFlower and CastingWords, and have led to creative
applications, e.g., helping blind people shop or localize in a
new environment [1]. The availability of hundreds of thousands
of workers allows a steady stream of output. Unfortunately,
the workers also come with hugely varied skill sets
and motivation levels. Thus, quality control of the worker
output continues to be a serious challenge.
To work around the variability in worker accuracy, people
design workflows, i.e., flowcharts connecting sequences of
primitive steps, where a step may be performed by multiple
workers, thus improving overall quality. For example,
CastingWords employs a proprietary workflow for the task
of audio transcription. Recently, Little et al. [8] achieved
impressive results using a workflow of iterative improvement
for several tasks such as handwriting recognition and writing
a text description for an image. In this workflow (see
Figure 1), the work by one worker goes through several
improvement iterations, each comprising an improvement
phase (where previous work is improved by a worker)
and an evaluation phase (where voters choose between the
improved work and the previous work through ballots; the
‘vote’ step in Figure 1). In essence, these workflows embody
a novel type of collaboration between workers, one that
generates high-quality work.

Figure 1: Flowchart for the iterative text improvement task,
reprinted from [8].

¹ Amazon uses the tagline “Artificial Artificial Intelligence”.

Copyright © 2011, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
These workflows typically have several decision points,
e.g., for iterative improvement one must decide how many
evaluation votes to obtain and whether to repeat the
improvement loop. From an AI perspective, this offers a new,
exciting, and impactful application area for intelligent
control. Recently, we [3] proposed a POMDP formulation of the
workflow control problem and showed that TURKONTROL, the
decision-theoretic controller, obtains higher-quality outputs
in a simulated environment. However, our previous work is
primarily theoretical and provides no methods to learn the
several distributions in the POMDP model. Nor does it provide
strong evidence that the approach actually works in a
real crowdsourced environment.
In this paper, we implement an end-to-end system that
first learns parameters for TURKONTROL using real data
from Mechanical Turk. This learning is challenging because
of a large number of parameters and sparse and noisy train-
ing data. To make our problem feasible we choose specific
parametric distributions to reduce the parameter space, and
learn the improvement and ballot parameters independently.
We validate the learned parameters in a simple voting task
and observe that the model needs only half the votes com-
pared to the commonly used majority baseline. This sug-
gests the effectiveness of the model and our parameters.
We then employ TURKONTROL with our learned param-
eters to control the iterative improvement workflow for the
image description task on Mechanical Turk. TURKONTROL
can exploit knowledge of individual worker accuracies;
however, it does not need such information and incrementally
updates its model of each worker as she completes
each job. We compare our AI-controlled, dynamic work-
flows with a nonadaptive workflow that spends the same
amount of money. The results demonstrate that our system
obtains an 11% improvement in the average quality of im-
age descriptions. Our results are statistically significant with
p < 0.01. More interestingly, to achieve the same quality, a
nonadaptive workflow spends 28.7% more money, because quality
does not improve linearly with cost.
Background
Iterative Improvement Workflow. Little et al. [8] designed
the iterative improvement workflow to get high-quality
results from the crowd. As shown in Figure 1, the work created
by the first worker goes through several improvement
iterations, each comprising an improvement phase and a ballot
phase. In the improvement phase, an improvement job solicits
$\alpha'$, an improvement of the current artifact $\alpha$ (e.g.,
the current image description). In the ballot phase, several
workers respond to a ballot job, in which they vote on the
better of the two artifacts (the current one and its
improvement). Based on majority vote, the better one is chosen
as the current artifact for the next iteration. This process
repeats until the total cost allocated to the particular task
is exhausted.
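The fixed loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `improve` and `vote` callables stand in for posting jobs to a crowdsourcing platform, and the costs (in integer cents), the number of ballots per round, and all names are hypothetical rather than values from the paper.

```python
from typing import Callable


def iterative_improvement(
    initial: str,
    improve: Callable[[str], str],
    vote: Callable[[str, str], bool],
    budget_cents: int,
    improve_cost_cents: int,
    ballot_cost_cents: int,
    ballots_per_round: int = 3,
) -> str:
    """Nonadaptive improve-then-vote loop; stops when the budget cannot pay for a round."""
    current = initial
    round_cost = improve_cost_cents + ballots_per_round * ballot_cost_cents
    while budget_cents >= round_cost:
        # Improvement phase: one worker proposes a new artifact.
        candidate = improve(current)
        # Ballot phase: vote(current, candidate) is True when a voter prefers the candidate.
        votes_for_candidate = sum(
            1 for _ in range(ballots_per_round) if vote(current, candidate)
        )
        # Majority vote decides which artifact is carried into the next iteration.
        if 2 * votes_for_candidate > ballots_per_round:
            current = candidate
        budget_cents -= round_cost
    return current
```

Every round here costs the same and asks for the same number of ballots, which is what makes the workflow nonadaptive; a decision-theoretic controller would instead choose the next action from its current belief.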
POMDP. A partially-observable Markov decision process
(POMDP) [7] is a widely-used formulation to represent se-
quential decision problems under partial information. An
agent, the decision maker, tracks the world state and faces
the decision task of picking an action. Performing the action
transitions the world to a new state. The transitions between
states are probabilistic and Markovian, i.e., the next state
depends only on the current state and action. The state
information is unknown to the agent, but she can infer a belief,
a probability distribution over possible states, from
observing the world.
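As a concrete illustration of how a belief is maintained, the following sketch applies the standard Bayesian belief update for a POMDP with a finite state set. It is a generic textbook update, not TURKONTROL's implementation (whose state is a continuous quality value); the function names and the dictionary representation are assumptions made for the example.

```python
def belief_update(belief, action, observation, transition, observe):
    """Return b'(s') proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).

    belief:      dict mapping each state to its probability
    transition:  function (s, a, s_next) -> P(s_next | s, a)
    observe:     function (o, s_next, a) -> P(o | s_next, a)
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of landing in s_next after the action.
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalized[s_next] = observe(observation, s_next, action) * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}
```

For example, with two states that never change, a uniform prior, and an observation that is four times as likely in one state as in the other, the updated belief puts probability 0.8 on the more likely state.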
Controlling A Crowd-Sourced Workflow. There are various
decision points in executing an iterative improvement
process, such as which artifact to select, when to start a new
improvement iteration, and when to terminate the job. We
recently [3] introduced TURKONTROL, a POMDP-based agent
that controls the workflow, i.e., makes these decisions
automatically. The world state includes the quality of the
current artifact, $q \in [0, 1]$, and $q'$ of the improved
artifact; the true $q$ and $q'$ are hidden, and the controller
can only track a belief about them. Intuitively, the extreme
value of 0 (or 1) represents the idealized condition that all
(or no) diligent workers will be able to improve the artifact.
We use $Q$ and $Q'$ to denote the random variables that
generate $q$ and $q'$.
Different workers may have different skills in improving
an artifact. A conditional distribution function, $f_{Q'_x \mid q}$,
expresses the probability density of the quality of a new
artifact when an artifact of quality $q$ is improved by worker $x$.
The worker-independent distribution function, $f_{Q' \mid q}$, acts as
a prior in cases where a previously unseen worker is
encountered. The ballot job compares the two artifacts; intuitively,
if the two artifacts have qualities close to each other, then the
ballot job is harder. We define the intrinsic difficulty of a ballot
job as $d = 1 - |q - q'|^{M}$, where $M$ is a trained constant.

Figure 2: A plate model of ballot jobs; $b$ represents the ballot
outcome; $\gamma$, a worker’s individual error parameter; $d$, the
difficulty of the job; and $w$, the truth value of the job. $\Gamma$
is the prior on workers’ errors. Shaded nodes represent observed
variables.

Given the difficulty $d$, the ballots of two workers are
conditionally independent of each other. We assume that the
accuracy of worker $x$ follows
$a(d, \gamma_x) = \frac{1}{2}\left[1 + (1 - d)^{\gamma_x}\right]$,
where $\gamma_x$ is $x$’s error parameter; a higher $\gamma_x$
signifies that $x$ makes more errors.
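The difficulty and accuracy formulas can be checked numerically with a short sketch. The two formulas follow the definitions in the text; the `posterior_first_better` helper is an added illustration of what conditional independence of ballots implies (a product of per-ballot likelihoods) and is not taken from the paper. The default prior of 0.5 is an assumption.

```python
def difficulty(q: float, q_prime: float, M: float) -> float:
    """Intrinsic ballot difficulty d = 1 - |q - q'|^M (close qualities give d near 1)."""
    return 1.0 - abs(q - q_prime) ** M


def accuracy(d: float, gamma: float) -> float:
    """Worker accuracy a(d, gamma) = 0.5 * (1 + (1 - d)^gamma); larger gamma means more errors."""
    return 0.5 * (1.0 + (1.0 - d) ** gamma)


def posterior_first_better(ballots, gammas, d, prior=0.5):
    """P(first artifact is better | ballots), treating ballots as independent given d.

    ballots: list of bools, True when the worker voted for the first artifact
    gammas:  error parameter of the worker who cast each ballot
    """
    like_true, like_false = prior, 1.0 - prior
    for voted_first, gamma in zip(ballots, gammas):
        a = accuracy(d, gamma)
        like_true *= a if voted_first else 1.0 - a
        like_false *= 1.0 - a if voted_first else a
    return like_true / (like_true + like_false)
```

When the two artifacts have equal quality, `difficulty` returns 1 and `accuracy` falls to 0.5, i.e., random guessing; when $d = 0$, every worker is perfectly accurate regardless of $\gamma_x$. The posterior helper does not guard against contradictory ballots at $d = 0$, where both likelihoods vanish.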
Previously, we discussed several POMDP algorithms to
control the workflow, including limited look-ahead, UCT, etc.
While simulation results suggested benefits of our model, we
did not discuss any approaches to learn these complex
distributions, nor did we implement our techniques on a real
platform to show that the simplifying assumptions, formulae,
and simulated gains hold in practice.
Model Learning
In order to estimate TURKONTROL’s POMDP model, one
must learn two probabilistic transition functions. The first
function is the probability of a worker $x$ answering a ballot
question correctly, which is controlled by the error
parameter $\gamma_x$ of the worker. The second function estimates
the quality of an improvement result, the new artifact returned
by a worker.
Learning the Ballot Model
Figure 2 presents our generative model of ballot jobs; shaded
variables are observed. We seek to learn the error parameters
$\vec{\gamma}$, where $\gamma_x$ is the parameter for the $x$th
worker, and use the mean $\bar{\gamma}$ as an estimate for future,
unseen workers. To generate training data for our task, we select
$m$ pairs of artifacts and post $n$ copies of a ballot job that
asks the workers to choose between these pairs. We use $b_{i,x}$
to denote the $x$th worker’s ballot on the $i$th question. Let
$w_i$ = true (false) if the first artifact of the $i$th pair is
(not) better than the second, and let $d_i$ denote the difficulty
of answering such a question.
We assume the error parameters are generated by a random
variable $\Gamma$. The ballot answer of each worker directly
depends on her error parameter, as well as the difficulty of
the job, $d$, and its real truth value, $w$. For our learning
problem, we collect $w$ and $d$ for the $m$ ballot questions from
the consensus of three human experts and treat these values as
observed. In our experiments we assume a uniform prior on
$\Gamma$, though our model can incorporate more informed priors².
Our aim is to estimate the $\gamma_x$ parameters; we use the
standard maximum likelihood approach. We use vector notation with

² We also tried priors that penalize extreme values, but that did
not help in our experiments.
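To make the estimation problem concrete, here is a minimal sketch of fitting one worker's error parameter by maximum likelihood when $w_i$ and $d_i$ are observed. The accuracy formula is the one defined earlier in the paper; the grid search, the grid bounds, and the function names are assumptions chosen for simplicity and are not the optimization procedure the authors used.

```python
import math


def accuracy(d, gamma):
    """Worker accuracy a(d, gamma) = 0.5 * (1 + (1 - d)^gamma)."""
    return 0.5 * (1.0 + (1.0 - d) ** gamma)


def log_likelihood(gamma, ballots, truths, difficulties):
    """Log-probability of one worker's ballots given the observed w_i and d_i."""
    total = 0.0
    for b, w, d in zip(ballots, truths, difficulties):
        a = accuracy(d, gamma)
        p = a if b == w else 1.0 - a          # ballot is correct when it matches w_i
        total += math.log(max(p, 1e-12))      # floor avoids log(0) on extreme values
    return total


def fit_gamma(ballots, truths, difficulties, grid=None):
    """Return the grid value of gamma that maximizes the log-likelihood."""
    if grid is None:
        # Hypothetical search range; the paper does not state one.
        grid = [0.05 * k for k in range(1, 201)]
    return max(grid, key=lambda g: log_likelihood(g, ballots, truths, difficulties))
```

With a uniform prior, the maximum likelihood estimate coincides with the posterior mode. For a worker who answers three of four questions of difficulty 0.5 correctly, the likelihood is maximized at an accuracy of 0.75, which corresponds to $\gamma_x = 1$.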
