Human Intelligence Needs Artificial Intelligence
Daniel S. Weld, Mausam, Peng Dai
Dept of Computer Science and Engineering
University of Washington
Seattle, WA-98195
{weld,mausam,daipeng}@cs.washington.edu
Abstract
Crowdsourcing platforms, such as Amazon Mechanical Turk,
have enabled the construction of scalable applications for
tasks ranging from product categorization and photo tagging
to audio transcription and translation. These vertical appli-
cations are typically realized with complex, self-managing
workflows that guarantee quality results. But constructing
such workflows is challenging, with a huge number of alter-
native decisions for the designer to consider.
We argue the thesis that “Artificial intelligence methods can
greatly simplify the process of creating and managing com-
plex crowdsourced workflows.” We present the design of
CLOWDER, which uses machine learning to continually re-
fine models of worker performance and task difficulty. Us-
ing these models, CLOWDER uses decision-theoretic opti-
mization to 1) choose between alternative workflows, 2) opti-
mize parameters for a workflow, 3) create personalized inter-
faces for individual workers, and 4) dynamically control the
workflow. Preliminary experience suggests that these opti-
mized workflows are significantly more economical (and re-
turn higher quality output) than those generated by humans.
Introduction
Crowd-sourcing marketplaces, such as Amazon Mechani-
cal Turk, have the potential to allow rapid construction of
complex applications which mix human computation with
AI and other automated techniques. Example applications
already span the range from product categorization [2],
photo tagging [24], business listing verifications [16] to au-
dio/video transcription [17; 23], proofreading [19] and trans-
lation [20].
In order to guarantee quality results from potentially
error-prone workers, most applications use complex, self-
managing workflows with independent production and re-
view stages. For example, iterative improvement [14] and
find-fix-verify workflows [1] are popular patterns. But de-
vising these patterns and adapting them to a new task is both
complex and time-consuming. Existing development environments,
e.g., TurKit [14], simplify important issues, such
as control flow and debugging, but many challenges remain.
For example, in order to craft an effective application, the
designer must:
• Choose between alternative workflows for accomplishing
  the task. For example, given the task of transcribing
  an MP3 file, one could ask a worker to do the transcription,
  or first use speech recognition and then ask workers
  to find and fix errors. Depending on the accuracy
  and costs associated with these primitive steps, one or the
  other workflow may be preferable.
• Optimize the parameters for a selected workflow. Suppose
  one has selected the workflow which uses a single
  worker to directly transcribe the file; before one can start
  execution, one must determine the value of continuous
  parameters, such as the price, the length of the audio file,
  etc. If the audio track is cut into snippets which are too
  long, then transcription speed may fall, since workers often
  prefer short jobs. But if the audio track is cut into
  many short files, then accuracy may fall because of lost
  context for the human workers. A computer can methodically
  try different parameter values to find the best setting.
• Create tuned interfaces for the expected workers. The
  precise wording, layout and even color of an interface can
  dramatically affect the performance of users. One can use
  Fitts's Law or alternative cost models to automatically
  design effective interfaces [7]. Comprehensive “A-B” testing
  of alternative designs, automated by computer, is also
  essential [12].
• Control execution of the final workflow. Some decisions,
  for example the number of cycles in an iterative
  improvement workflow and the number of voters used
  for verification, cannot be optimally determined a priori.
  Instead, decision-theoretic methods, which incorporate a
  model of worker accuracy, can dramatically improve on
  naive strategies such as majority vote [3].
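
To make the last point concrete, the following is a minimal sketch of how a model of worker accuracy can change the outcome of a vote. It is not the POMDP controller used by TURKONTROL; it swaps in a much simpler technique, a naive-Bayes log-odds aggregation with a fixed confidence threshold. The function names, the independence assumption, the accuracy values and the 0.9 threshold are all illustrative assumptions, not taken from the paper.

```python
import math


def majority_vote(ballots):
    """Naive baseline: True if strictly more than half the ballots are True."""
    return sum(ballots) * 2 > len(ballots)


def posterior_true(ballots, accuracies, prior=0.5):
    """Posterior probability that the correct answer is True.

    Assumes (hypothetically) that workers vote independently and that
    accuracies[i] is the probability that worker i answers correctly.
    """
    log_odds = math.log(prior / (1 - prior))
    for vote, acc in zip(ballots, accuracies):
        weight = math.log(acc / (1 - acc))  # evidence carried by one ballot
        log_odds += weight if vote else -weight
    return 1 / (1 + math.exp(-log_odds))


def needs_more_votes(p, threshold=0.9):
    """Hypothetical stopping rule: keep asking until one answer is confident."""
    return max(p, 1 - p) < threshold


# One reliable worker says True; two weak workers say False.
ballots = [True, False, False]
accuracies = [0.95, 0.6, 0.6]

p = posterior_true(ballots, accuracies)
print(majority_vote(ballots))   # the majority says False
print(round(p, 3))              # the accuracy-aware posterior favors True
print(needs_more_votes(p))      # but not yet confidently enough to stop
```

In this toy case the majority is outvoted by a single accurate worker, and the stopping rule decides at run time whether another ballot is worth buying, which is the kind of decision that cannot be fixed a priori.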
Our long-term goal is to prove the value of AI methods
on these problems and to build intelligent tools that fa-
cilitate the rapid construction of effective crowd-sourced
workflows. Our first system, TURKONTROL [3; 4], used a
partially-observable Markov decision process (POMDP) to
perform decision-theoretic optimization of iterative, crowd-
sourced workflows. This paper presents the design of our
second system, CLOWDER (footnote 1), which we are just starting to
implement. We start by summarizing the high-level architecture
of CLOWDER. Subsequent sections detail the AI reasoning
used in its major components. We end with a discussion
of related work and conclusions.
Footnote 1: It is said that nothing is as difficult as herding cats, but maybe
decision theory is up to the task? A clowder is a group of cats.
Figure 1: Architecture of the CLOWDER system. (Components shown:
HTN library, DT planner, user models, task models, worker
marketplace, renderer, rendered job, learner.)
Overview of CLOWDER
Figure 1 depicts the proposed architecture of CLOWDER.
At its core, CLOWDER has the capability to generate, se-
lect from, optimize, and control a variety of workflows and
also automatically render the best interfaces for a task. It
achieves this by accessing a library of pre-defined workflow
patterns expressed in a hierarchical task network (HTN)-
like representation [18]. Along with each HTN it maintains
the relevant planning and learning routines. The learning
routine learns task and user models. These parameters aid
in controlling the workflow dynamically using a decision-
theoretic planner. Finally, it optimizes the task interfaces
based on user performance. Overall, it aims to achieve a
better trade-off among quality, cost, and completion time by optimizing
each step of the process. CLOWDER proposes the following
novel features:
• A declarative language to specify workflows.
  CLOWDER’s language is inspired by the HTN representation.
  An HTN is a hierarchy of tasks, in which
  each parent task is broken into multiple child tasks.
  At the lowest level are the primitive actions that can be
  directly executed (in our case, jobs that are solved either
  by machines or are crowd-sourced). Thus, an HTN provides
  a systematic way to explore the possible ways to solve
  the larger problem. A workflow can be quite naturally
  represented in an HTN-like representation.
• Shared models for common task types. Most crowd-sourcing
  jobs can be captured with a relatively small number
  of job classes, such as jobs with discrete alternatives,
  creating/improving content, etc. By having a library of job
  types, CLOWDER will be able to share parameters across
  similar job types. Given a new task, CLOWDER can transfer
  the knowledge from similar prior tasks, speeding up
  the learning process. For example, it could use audio transcription
  parameters to seed those for the handwriting recognition
  task, as they are quite similar.
• Integrated modeling of workers. CLOWDER models
  and continually updates its workers’ quality parameters.
  This is especially necessary, since workers often perform
  poor quality work, so tracking their work and rewarding
  the good workers is imperative to a healthily functioning
  platform. While a worker’s quality could change based
  on the task (people not good at writing English descriptions
  could still be capable audio transcribers), we can seed
  their task-specific quality parameters based on their average
  parameters from similar prior tasks.
• Comprehensive Decision-Theoretic Control. A workflow
  has several choices to make, including pricing, bonus,
  number of iterations or voters, and interface layout. Our
  previous work, TURKONTROL, optimized a subset of
  these factors for a specific type of workflow. CLOWDER
  will extend TURKONTROL by allowing a large number of
  workflows and optimizing for all of these choices.
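
The HTN idea in the first feature above can be sketched in a few lines. This is only an illustration under stated assumptions, not CLOWDER's workflow language (which the text does not specify): the `Task` class, the `expand` function and the task names are hypothetical, and the example hierarchy reuses the audio transcription scenario from the Introduction.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Task:
    """A node in a toy HTN. A task with no methods is a primitive job."""
    name: str
    # Each method is one alternative decomposition: an ordered list of subtasks.
    methods: list = field(default_factory=list)

    @property
    def primitive(self):
        return not self.methods


def expand(task):
    """Yield every sequence of primitive job names that accomplishes `task`."""
    if task.primitive:
        yield [task.name]
        return
    for method in task.methods:
        # Combine every expansion of each subtask, preserving subtask order.
        for combo in product(*(list(expand(sub)) for sub in method)):
            yield [name for part in combo for name in part]


# Primitive jobs: solved either by a machine or by the crowd.
human_transcribe = Task("human_transcribe")
speech_recognition = Task("speech_recognition")
find_and_fix = Task("crowd_find_and_fix_errors")

# Two alternative ways to accomplish the parent task.
transcribe = Task(
    "transcribe_audio",
    methods=[[human_transcribe], [speech_recognition, find_and_fix]],
)

plans = list(expand(transcribe))
for plan in plans:
    print(plan)
```

Enumerating the decompositions this way gives a planner the set of candidate workflows; choosing among them would then depend on the learned cost and accuracy of each primitive job.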
We now discuss each of these components in detail.
Modeling Worker Performance
Poor quality workers present a major challenge for crowd-
sourced applications. Although early studies concluded
that the majority of workers on Mechanical Turk are dili-
gent [22], more recent investigations suggest a plethora of
spam workers. Moreover, the error rates are quite high for
open-ended tasks like improving an artifact or fixing gram-
matical errors [1].
Ipeirotis [9] has suggested several important improve-
ments to the Mechanical Turk marketplace platform, one of
which is a better reputation system for evaluating workers.
He argues that payment should be separated from evalua-
tion, employers should be allowed to rate workers, and the
platform should provide more visibility into a worker’s his-
tory. Worker quality should be reported as a function of job
type in addition to aggregate measures. By surfacing only limited
information, such as percentage acceptance and number
of completed HITs, Mechanical Turk makes it easy for
spam workers to pose as responsible by rank boosting [8;
6]. Yet even if Mechanical Turk is slow to improve its
platform, alternative marketplaces, such as eLance, guru,
oDesk, and vWorker, are doing so.
But even if Ipeirotis’ improved reputation system is
widely adopted, the best requesters will still overlay
their own models and perform proprietary reasoning about
worker quality. In a crowd-sourced environment, the spe-
cific workflow employed (along with algorithms to control
it) is likely to represent a large part of a requester’s com-
petitive advantage. The more an employer knows about the
detailed strengths and weaknesses of a worker, the better the
employer can apply the worker to appropriate jobs within
that workflow. Thus, knowledge about a worker provides
a proprietary advantage to an employer and is unlikely to
be fully shared. Just as today’s physically-based organiza-
tions spend considerable resources on monitoring employee
performance, we expect crowd-sourced worker modeling to
be an area of ongoing innovation. TURKONTROL devised a
novel approach to worker modeling, which CLOWDER ex-
tends.
Learning a Model of Simple Tasks: Let us focus on the
simplest tasks first – predicting the worker behavior when
answering a binary question. The learning problem is to
estimate the probability of a worker x answering a binary
ballot question correctly. While prior work has assumed all


