A Crowdsourcing Based Mobile Image Translation and Knowledge Sharing Service
- ISBN: 9781450304245
- DOI: 10.1145/1899475.1899481
Yefeng Liu, Vili Lehdonvirta†, Mieke Kleppe‡, Todorka Alexandrova,
Hiroaki Kimura, Tatsuo Nakajima
Department of Computer Science, Waseda University
†Helsinki Institute for Information Technology
‡Eindhoven University of Technology
{yefeng,toty,hiroaki,tatsuo}@dcl.info.waseda.ac.jp
vili.lehdonvirta@hiit.fi†, m.kleppe@student.tue.nl‡
ABSTRACT
Travelers in countries that use an unfamiliar script cannot
use pocket translators or online translation services to
understand menus, maps, signs and other important information,
because they are unable to write the text they see. Solutions
based on optical character recognition provide very limited
performance in real-world situations and for complex scripts
such as Chinese and Japanese. In this paper, we propose an
alternative image translation solution based on crowdsourcing.
A large number of human workers on mobile terminals are used
to carry out the tasks of image recognition, translation and
quality assurance. Compared to purely technical solutions,
this human computation approach is also able to account for
context and non-textual cues, and provide higher level
information to the end-user. The main research challenges we
aim to address through this concept are 1) motivation of human
microworkers and 2) the dynamic distribution and aggregation
of tasks with real-time requirements. In this paper, we
describe a preliminary user study to create a model of
end-user requirements.
Categories and Subject Descriptors
H.5.m [Information interfaces and presentation (e.g.,
HCI)]: Miscellaneous; H.1.2 [User/Machine Systems]:
Human Factors
General Terms
Design, Human Factors
Keywords
Crowdsourcing, mobile image translation, image-text
recognition, knowledge sharing
1. INTRODUCTION
The increasing market penetration of mobile phones with
Internet connectivity, high processing power and integrated
sensors such as cameras [1] has given rise to a new ubiquitous
computing platform. As is typical with new technologies,
entertainment applications such as mobile games are leading
the growth. The purpose of our research is to examine ways
in which the platform could be used for new kinds of human
computation or crowdsourcing applications: applications
that make use of the distributed and always-on nature of
the mobile phone, while borrowing ideas from gaming to
attract contributions from users.
The specific application area addressed in this paper is
mobile image translation [2], which refers to camera phone
applications that attempt to solve the problem of translating
text written in an unfamiliar script; for example, Chinese,
Japanese, Thai or Arabic. Traditional digital pocket
translators and online translation services are useless if the
user is unable to input the unfamiliar characters into the
device. Mobile image translators typically attempt to solve
this by recognizing the characters optically.
Image translation services (e.g., Ta With You1, Interlecta
Translator for BlackBerry smartphones2, RantNetwork
Communicator3, [2]) generally consist of two steps. The first
step involves extracting text from an image taken by the
mobile phone's camera. The second step involves translating
the text. The first step is often considered the more
difficult one, and the bottleneck of current image translation
applications. This is because text and handwriting recognition
are still very difficult and computationally expensive tasks
for computers. Thus mobile image translators are mostly
limited to languages written in regular Latin scripts on
simple backgrounds. Figure 1 shows examples4 of typical texts
that such applications usually cannot handle.
In this paper, we explore a crowdsourcing based solution
for mobile image translation. Crowdsourcing [3] is a way of
outsourcing tasks, traditionally performed by an employee
or contractor, to a large group of people (a crowd), through
an open call. Here, the image-text translation tasks are
outsourced to a translator community. A difference from
traditional crowdsourcing models is that the work assignments
also originate from a crowd of numerous end-users with
differing needs. The approach can also be described as a type
of human computation [4], which is understood as the notion
of solving difficult AI-related computation problems by
obtaining help from (ordinary) humans instead of applying
machine algorithms. Compared to a purely machine based
solution, it is harder for a human computation solution to
provide real-time or on-demand service, but the quality of
the outcome can be higher because humans are as yet much
better than machines at many tasks. Another significant
feature of human workers is that they can provide richer
interpretations and responses to tasks in addition to literal
answers. In mobile image translation, this can mean that the
system works not only as a translator, but also as a knowledge
broker that allows users to share higher level information
such as advice, instructions and suggestions pertinent to the
situation at hand. The result can be seen as a mixture of
social search engine [5] and mobile image translator
capabilities.
1http://www.tauyou.com/en/image.html
2http://home.interlecta.com/paypal-blackberry/
3http://www.rantnetwork.com/communilator.aspx
4These examples are the menu and signs of a Japanese
restaurant
Figure 1: [...] for machine to recognize
In this paper, we introduce the human mobile image translator
concept and report on the results of a preliminary user
study conducted to yield a model of end-user requirements,
a first step towards concrete implementation [6, 7]. The user
study took the form of a simulation or experiment, where
real potential end-users sought help with their actual
translation problems and real potential translators responded
to them. More generally, this research aims to address the
relative scarcity of real-world studies on mobile and
ubiquitous human computation and "human sensor" applications,
as opposed to purely machine-based solutions. The research
challenges we aim to address through the human mobile image
translation concept are 1) motivation of human microworkers
in a distributed setting and 2) the dynamic distribution
and aggregation of tasks with near-real-time requirements
when the solution involves humans. As argued above, the
chosen application area is also important in itself, and a
successful translation system would be a major contribution.
The main contributions of this paper are introducing the
concept of human mobile image translation and sketching
an architecture based on real user requirements and issues
identified in the user study.
The rest of the paper is organized as follows. In Section 2,
we introduce the proposed concept in detail. In Section 3,
we describe the results of the user experiment. In Section 4,
a number of interesting research questions and findings are
discussed. In Section 5, some important related research
is described. Finally, the future direction of our project
is discussed in Section 6.
2. HUMAN MOBILE IMAGE TRANSLATION
In this section we present the design of the crowdsourcing
model for mobile image translation and knowledge sharing.
We start with a scenario to illustrate the problem considered
in the paper.
2.1 Scenario
"Daniel is a western traveler who has just arrived in
Japan. He can't speak Japanese and of course he has
no idea how to write or read Kanji (the Japanese version
of Chinese characters). When traveling in Europe,
Daniel would not worry about language problems, as
his smart-phone can provide enough support for translating.
However, the situation in Japan is somewhat different.
Though he can still access on-line translators and
use off-line dictionaries easily, the real problem is how
to input the Japanese text. The following situation is
an example of one of the many cases when he would be
happy to use translation help.
Daniel notices an interesting sign (see Figure 2) on the
menu board in front of a Japanese restaurant. From the
image on it he understands it is related to ties but he
has no idea about the exact meaning. He gets really curious
whether or not you are allowed to wear a tie inside
this restaurant. He cannot use a digital dictionary as he
has no idea how to input the characters. He tries to ask
the waiter about the meaning of the sign but the waiter
cannot answer in English."
Figure 2: A sign on a menu board outside a Japanese
restaurant
The above scenario explicitly describes the problem of the
existing text-based translators. Input methods for logographic
scripts (e.g., Chinese and Japanese) are totally different
from those for Latin-based languages, since users cannot type
a character unless they know how to pronounce it. A
handwriting input method could be one solution; however, it
is admittedly time consuming, especially when the characters
are unfamiliar to the user. Furthermore, it also has many
hardware/software prerequisites, e.g., a handwriting
recognition tool, support for the foreign language, a touch
screen, and so on.
form
In this paper we propose a mobile crowdsourcing solution
to the problems described in the previous section. Since the
workload of each job in the mobile image translation service
is lightweight enough to be described as a micro-task, the
tasks are perfectly suitable to be outsourced to large groups
of casual workers.
The platform utilizes a server-client architecture. The
difference from the conventional server-client structure is
that there are two types of clients: the users who make
requests are called client users, and the translators are
named work users. On the other end, the server plays a proxy
role. It receives translation requests from client users,
assigns these tasks to appropriate work users, and forwards
translators' final answers to the original requesters.
Figure 3 illustrates the basic workflow of the proposed
translation model and a detailed description of it is given
below.
Figure 3: Basic workflow of the proposed model
1. A client user makes a translation request by taking a
picture of the text using a mobile phone's camera, and
submits the image to the server. Optionally, a short
text message can be attached to the photo to clarify
what exactly they want to know. Context information,
such as location and time, will also be automatically
attached to the request, although the availability of
the context information may depend on the functionality
of the client's terminal (e.g., whether a GPS module is
embedded or not). Such context information, together
with a work user's profile, is useful for assigning tasks
appropriately.
2. To improve the response time, each task is assigned to
multiple workers simultaneously.
3. The original request is sent to translators via email.
Translators are encouraged to reply in "key words" or
"tag" style. In many cases what the requester wishes to
know is not the semantic meaning of the word, but the
real meaning, e.g., in the case of a dish, "pork, spicy,
Chinese food, famous" is a better answer than "twice
cooked pork (huiguo rou)" as far as understandability
is concerned.
4. For response time considerations, the first answer
received from the translators is forwarded to the
requester immediately. For the rest of the replies, the
client user can set a maximum waiting period and
reject any responses received after that.
5. Eventually, the client user receives the final result,
and optionally scores the participants according to the
quality of the translation outcome. Before the client
scores, a default amount of points is given to each
translator. The client has one day to decide the points;
however, they cannot change the score once given.
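The five steps above can be sketched in code. The following Python sketch is illustrative only and is not the authors' implementation: the names `Request`, `receive_answer` and `score_answer`, the value of `DEFAULT_POINTS`, the default waiting period, and the choice to measure the one-day scoring window from the creation time of the request are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

DEFAULT_POINTS = 1            # assumed value; the text only says "a default amount"
SCORING_WINDOW_S = 24 * 3600  # "one day", measured from request creation (assumption)


@dataclass
class Request:
    image: bytes
    created_at: float                                # seconds on any consistent clock
    message: str = ""                                # optional clarifying text (step 1)
    location: Optional[Tuple[float, float]] = None   # context info, if the terminal has GPS
    max_wait_s: float = 600.0                        # client-set maximum waiting period (step 4)
    answers: list = field(default_factory=list)      # (worker, text, received_at)
    scores: dict = field(default_factory=dict)       # worker -> points
    client_scored: set = field(default_factory=set)  # workers the client has already scored


def receive_answer(req: Request, worker: str, text: str, now: float) -> str:
    """Step 4: forward the first answer at once; reject later answers that miss the deadline."""
    is_first = not req.answers
    if not is_first and now - req.created_at > req.max_wait_s:
        return "rejected"
    req.answers.append((worker, text, now))
    req.scores.setdefault(worker, DEFAULT_POINTS)    # step 5: default points before scoring
    return "forwarded" if is_first else "accepted"


def score_answer(req: Request, worker: str, points: int, now: float) -> bool:
    """Step 5: the client may score each translator once, within one day."""
    if worker not in req.scores or worker in req.client_scored:
        return False                                 # unknown worker, or score already given
    if now - req.created_at > SCORING_WINDOW_S:
        return False                                 # scoring window has closed
    req.scores[worker] = points
    req.client_scored.add(worker)
    return True
```

In this sketch a late reply is simply dropped, and a translator who is never scored keeps the default points, which mirrors the behaviour described in steps 4 and 5.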
The translation service is designed as a mobile touch screen
application (e.g., iPhone/Android/Symbian apps, etc.).
The main interface of this application consists of three main
parts. i) After opening the application the first interface is
shown, which is a camera display view. Clients take pictures
of what they want to ask about using this view. ii) Then,
a simple image editing tool is provided, with which the users
can emphasize the important points in the photo that they
are interested in (e.g., by circling them). iii) The last part
is a text editor, which is used for inputting a short message
to the translator that describes what exactly the requester
wants to know.
As for translators, they are first asked whether they are
willing to accept the task. If the answer is positive, an image
editor and a text editor are provided as translation tools that
translators may use depending on the request content.
3. USER EXPERIMENT
For this platform to work, we have to know if the translators
are able to deliver the desired responses to the requesters.
Therefore, we designed a qualitative experiment to answer
the following research question: How should the user's
questions be presented to the translator in order to provide
the preferred results for the user? We held a series of
meetings to discuss the experiment design. Participants
included Japanese and foreigners (who can be seen as our
potential users) from different backgrounds such as
technology, design, economics, and psychology. The original
intention of the project was to design a human based image
translator; however, through discussions with potential users,
we found out that what they require is much more than a simple
translator. Instead of just knowing the semantic meaning of
the words, users are much more interested in a service that
can answer their questions related to the photo, which is
more like a translation service mixed with an image based
mobile real-time Q&A service. Based on this finding, in the
later meetings we improved our concept and extended the
design from simple translation to a knowledge sharing
service. Eventually the experiment workflow was realized
as follows.
First we collected example images (taken by a mobile phone)
from foreigners who are currently staying in Tokyo, together
with their corresponding questions. From these images, we
chose several characteristic cases from different types of
users (or, put differently, from users who had different
needs). Then, we interviewed the photos' providers, asking
what kind of answers they were expecting. In the next stage we
sent the requests to a group of seven invited translators,
mostly Japanese university students. After receiving the
answers from the translators, we interviewed the translators.
Finally, we compared the results from the translators with the
expected answers from the requesters, identifying whether
they matched or not, and discussed the possible reasons for
mismatches.
In this paper, we present four of the typical user study cases
that were observed in the experiment.
3.1 User Study
Picture I The picture in Figure 4 was received from a
traveler who had come to Tokyo for a short-term vacation.
He had no Japanese language skills at all, which is why he
tended to ignore most of the information surrounding
him, being concerned only with the information directly
related to him. He took this photo while waiting
for a train at the station a bit longer than expected, when
he noticed that the information display was showing some
special information instead of the arrival time of the coming
trains.
"Why isn't there any train? How long do I have to wait
here?"
Figure 4: Picture of study case I
When we interviewed the requester, he said that at that moment
he understood there was a train delay, but he didn't
know whether to keep waiting or not, since some
people were leaving, while others were staying. During the
translators' interview, four of them considered this question
difficult, mainly because "The information in the picture
is too little" or "The reason is hard to explain in English."
However, the results show that most of the collected answers
were actually good enough for the requester, because he was
not really concerned with the reason for the delay but with
what he should do on seeing the message on the display. Most
of the replies suggested an approximate time the requester
might need to wait, e.g., "Maybe, you have to wait for thirty
minutes to one hour.", although the exact time to wait and the
exact reason for the delay were not given (which was, in
fact, not possible to know), e.g., "There is a trouble and you
have to wait some more minutes. But I don't know how long
exactly you will wait.".
Picture II From the collected questions we found
out that in many cases requests were driven by curiosity
rather than real problems. A typical example is when
users obtain partial information about something interesting
but are unable to figure out all the information that they
want to know. If we take a poster as an example, people
(who cannot read Japanese) can see the image, understand
the time and the date, but are unable to understand what
the event is about, what the location is, and other detailed
information. Figure 5 is an example of such a poster and
the user's question in this case is:
"What are the events between 5th and 8th?"
Figure 5: Picture of study case II
The requester said she was deeply attracted by the poster
because she really wanted to experience a traditional
Japanese event during her stay in Tokyo. On the poster there
were event schedules for July 3 and July 4; however, when
she saw the poster it was already July 5. So she wanted to
know what kind of events she would be able to attend from
that day on. Unfortunately two translators misunderstood
her question and their answer was "The event is a festival
of the weaver". But afterwards they also pointed out that
if the question had been written in a clearer way (e.g.,
"There is only information about events on 3rd and 4th, what
are the events between 5th and 8th?"), such misunderstanding
would have been avoided. Three of the other answers indicated
that there were no special events during the mentioned
period, and some of them suggested "You can still enjoy a
general Japanese festival."
Picture III Many foreigners (except Chinese) in Japan
may still have trouble reading and writing Kanji even
after staying in Japan for a few years. They usually have no
urgent need for translation or explanation of everyday words,
but may need help with unusual phrases or specialized
vocabulary. Specialized texts such as recipes are a good
example of this kind of request. Figure 6 was provided by a
foreigner who has been staying in Japan for three years and
the picture shows one recipe in her cookbook.
"What are these two? Can you provide links of pictures?"
This kind of noun is particularly difficult for language
learners, since such nouns can hardly be learned from
vocabulary books, and are rarely used. The requester comes
from a non-English speaking country, which means that even if
the translator provides English words, it is very likely she
still has no idea what they are. That is why she asked for
links to pictures. The results show that the three
translators who understood the question correctly provided
good responses. However, another three translators appeared
to start answering before reading the whole question
carefully. Instead of replying with the requested links, they
tried to give a text explanation of the ingredients, which
was very difficult. Half of them ended up with somewhat
useless answers, and others gave up the task.
Picture IV This example belongs to the most difficult
ones. For people who have been staying in Japan for
several years, ordinary or basic knowledge issues would not
be a problem. What they would really like to know is mainly
higher level information, often related to other background
knowledge, e.g., cultural understanding. Considering that
most of our work users are not skilled or experienced
translators, providing a text-based answer or explanation for
this type of request can be quite challenging. Figure 7
illustrates a typical example of this kind, an electricity
bill.
"What is the difference between 1 and 2 in my bill?"
Figure 7: Picture of study case IV
From the literal translation it is hard to understand the
difference between the two fees: the direct translations of
the two phrases are "the first level fee" and "the second
level fee", respectively. The requester said normally she
would ask her Japanese friends for help the next day, but then
she had to wait for at least one night, and sometimes the
document contained private information which she was
unwilling to share with anyone else, e.g., "result of medical
examinations". The experiment results confirmed the
difficulty of this type of request. Three of the translators
replied with valuable answers (which satisfied the original
requester), while others gave up the task saying "I don't
know the difference." or "Even if I know, maybe I can't
explain in English". However, the fact that the majority of
our translators are college students could also be part of
the reason for this outcome, i.e., such knowledge might be
outside their life experience.
3.2 Summary of findings
First of all, we consider one of the most important
requirements of the proposed model to be the clear statement
of both the requester's question (to the translator) and the
translator's answer (to the requester). The translator needs
to understand the real question of the requester in order to
be able to provide the desired answer. The system should
support functionality for avoiding misunderstanding between
the two parties in order to achieve an accurate result.
We found that in most cases, once the translators have the
correct understanding of the question messages from the
requesters, they can give reasonable and useful answers. The
results also show that unclear writing (by the requester) and
hasty reading (by the translator) are the major reasons for
misunderstanding of the question. To overcome this problem,
one of our suggestions to the requesters is to ask questions
with choices whenever possible, since it is the clearest
way to express what they mean. Moreover, requesters
should always clarify the level of detail at which they
want to know the answer, e.g., "Which station/district/
city?" is a better question than "Where?" for translators.
A lack of sufficient language skills to explain the answer is
another reason that may lead to unfavorable results. The
translators are (normally) not native English speakers, which
is why they often face the problem of being unable to explain
their thoughts clearly in English. Therefore, we encourage
translators to reply with a short text message rather than
long sentences. The shorter the sentence, the fewer the
language mistakes, and it is also easier for the translators
to write such short replies.
"Because I'm not fluent in English, tweeting-style (short
message style) is good for me."
Another effective workaround for the lack of language skills
problem is the idea of replying with a link which can explain
the question. This is a clear and distinct solution for both
parties. In some situations, the translator can even
respond only with a link to the Wikipedia5 or Google Images6
search results. This would be useful and satisfactory
information for requesters who are not able to type the
Japanese characters and search on the Internet by themselves.
Through the experiments we also confirmed the inherent [...]
the machine based image recognition and text translation
technologies can always provide perfect outcome, they still
have no chance of offering the desired answer for such kind
of services that demand higher level information.
5http://www.wikipedia.com
6http://www.google.com/imghp
4. DISCUSSION
In this section we discuss the interesting research questions
we found during the design and the experiment. We focus
on the discovered design implications from the results of the
user study.
4.1 Incorrect/Useless Answers due to Misunderstanding
The quality of the results in crowdsourcing is always a hot
issue in human based systems, and has been discussed widely
recently [8, 9, 10, 11]. In this study we found that the way
requests and answers are input is an important factor, which
affects not only the usability but also the quality of the
outcome. The simplest way of making a request is just sending
the picture directly, but it can hardly be an option because
translators can easily be confused about the real purpose of
the request. In other words, client users must clarify what
exactly they want to ask by adding more information. On
the other hand, the translators often use English as their
second or third language and thus, they may also
misunderstand the question if the text is too complicated.
There are different ways to lower the possibility of
misunderstanding. One way is to limit the complexity of the
messages (e.g., instructing a client user, setting a maximum
message size, or other mechanisms). Moreover, depending on
the question, the translator may need to reply in different
ways in addition to the text, e.g., for questions like "Which
button should I click?" or "Which one will you recommend?",
it is better to simply circle the corresponding part in the
picture rather than giving a description in words. However, a
translator (as a human) always makes mistakes no matter
how perfect the instructions are. For example, in our study
case in Picture III, even though the question was clear and
simple enough, "bad" replies still appeared due to the fact
that some translators began answering before they finished
reading the request message. Even worse, we also have to
consider the possible existence of malicious replies.
Figure 8: Workflow with an additional proofreading phase
As a single reply can hardly be trusted, another possibility
is to provide multiple results to the client user. The users
can compare the different replies by themselves, and make
their own decision (e.g., choose the majority answer).
Nevertheless, if we consider the response time, this approach
might be expensive.
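The "majority answer" idea mentioned above could, for example, be automated rather than left to the client user. The Python sketch below is an assumption-laden illustration, not part of the proposed system: the function name `majority_answer` is hypothetical, and treating two replies as equal only when they match after lowercasing and whitespace normalization is a strong simplification, since free-text replies from different translators will rarely match exactly.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_answer(replies: Sequence[str]) -> Optional[str]:
    """Return the most frequent reply, or None if there is no clear majority.

    Replies are compared after lowercasing and collapsing whitespace
    (an assumed, deliberately naive notion of "the same answer").
    """
    normalized = [" ".join(r.lower().split()) for r in replies if r.strip()]
    if not normalized:
        return None
    ranked = Counter(normalized).most_common()
    # A tie between the two most frequent replies means no majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Returning `None` on a tie leaves the decision to the client user or to a later verification stage, rather than picking one reply arbitrarily.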
There is a third solution which can be seen as a compromise
between the above mentioned methods. We can add
a proofreader (see Figure 8) to verify the correctness of the
answer and to guard against malicious replies. Moreover, the
task of classifying/tagging images can also be assigned to the
proofreader, for the purpose of maintaining a more valuable
results database.
4.2 Accuracy vs. Timeliness
Based on the experiment results, we can identify different
types of users. Clients can first be divided into two classes
based on their period of stay in the foreign country, i.e.,
short-term (e.g., tour, business, visiting, etc.) and
long-term (e.g., study, work, training, etc.). Then, users
can be further classified according to the time requirements
of their requests. Some of the translation tasks expect
immediate answers (e.g., menu, instruction, sign, etc.); in
most of these cases the requesters are blocked in the middle
of their on-going activity and they have to wait for the
answer to continue. The other type of request is the
so-called waitable question (e.g., documents, posters, etc.),
which often involves more complicated question content. The
translator may need to add extra explanation or related
background information to avoid meaningless/useless answers
or misunderstandings between the two parties, rather than
simply translating the semantic meaning.
Eventually, four basic types (A, B, C and D) of clients were
defined (see Figure 9).
Figure 9: Four basic types of clients
Depending on the client type, there is a trade-off
between the accuracy and the response time of the reply.
For requests that need an immediate answer, timeliness is the
key factor with regard to the clients' quality of experience.
We may want to skip intermediate stages and directly forward
the answer from the first translator to the user. On the other
hand, for waitable requests, proofreading or multiple
answers should be a strict requirement. In general, we
advocate the appropriate use of different request processing
strategies, depending on the users' request types.
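The strategy choice described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the names `Stay`, `Urgency`, `Strategy` and `choose_strategy` are hypothetical, the minimum of three answers for waitable requests is an invented value, and the sketch does not assign the letters A to D, because the mapping of letters to client types is given only in Figure 9.

```python
from enum import Enum
from itertools import product
from typing import NamedTuple


class Stay(Enum):
    SHORT_TERM = "short-term"   # e.g., tour, business, visiting
    LONG_TERM = "long-term"     # e.g., study, work, training


class Urgency(Enum):
    IMMEDIATE = "immediate"     # e.g., menu, instruction, sign
    WAITABLE = "waitable"       # e.g., documents, posters


class Strategy(NamedTuple):
    forward_first_answer: bool  # skip intermediate stages
    proofread: bool             # add a proofreading phase
    min_answers: int            # answers to collect before delivering


def choose_strategy(urgency: Urgency) -> Strategy:
    """Favor timeliness for immediate requests and accuracy for waitable ones."""
    if urgency is Urgency.IMMEDIATE:
        return Strategy(forward_first_answer=True, proofread=False, min_answers=1)
    # min_answers=3 is an assumed value, not taken from the study.
    return Strategy(forward_first_answer=False, proofread=True, min_answers=3)


# The two classification axes yield four combinations of client type.
CLIENT_TYPES = list(product(Stay, Urgency))
```

In this sketch the strategy depends only on urgency, since that is the axis the trade-off is stated in terms of; the period of stay could instead inform task assignment or the expected depth of the answer.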
4.3 Interactivity
Our current design does not involve any means for
establishing a direct link between translators and
requesters, but the necessity of such a communication link is
worth discussing. From the study results we noticed a trend
of requiring translator-to-requester communication. When
transla- [...]
of them wish to confirm what they have understood with
the requester. On the other end, after receiving the reply,
some of the requesters expressed their desire to ask further
questions related to the answer.
However, building a communication link between the two
parties brings a drawback as well. Serial and continuous
tasks heavily increase translators' workload, which is against
our original intention to outsource micro-weight tasks to a
large number of work users. This point is a fundamental
part of our design philosophy: if the job is heavy, it is very
difficult to motivate people to participate.
5. RELATED WORK
There have been a number of image-text translation systems
deployed over the years (see [12] for an overview). Most of them
utilize OCR technology to recognize images. Masashi Koga
et al. [2] discussed a camera based mobile image transla-
tion application using Kanji OCR. Their main target source
text is machine-printed documents. This study suggests
that users are also interested in LED displays and other
non-printed texts, and more important, in deeper contex-
tual information and advice as opposed to merely the literal
meaning of a word or character. The latter is especially im-
portant when the cultural distance between the source and
target languages is great.
There have been some earlier efforts related to language
translation, crowdsourcing and mobile devices. Ambati et
al. [13] proposed Active Crowd Translation (ACT) for
automatic translation of low-resource language pairs, which
makes use of both active learning and crowdsourcing
concepts. Callison-Burch conducted a significant study of the
performance of crowdsourcing based translation quality
evaluation [14]. His report concludes that compared to
conventional manual evaluation, the online market is a faster,
cheaper and more creative option. Neither of these studies
was aimed at mobile translation and problem-solving,
however. Eagle conducted a field study [15] in Kenya and
Rwanda on a mobile crowdsourcing system txteagle, where
the users were able to earn small amounts of money by
completing simple tasks on their mobile phones. The tasks did
not include translation, but the study demonstrates the use
of mobile phones and economic incentives in crowdsourcing.
Other relevant studies can be found in the human computation
research stream [4, 16, 17]. Arase et al. [16] proposed
a web-based multi-player game to collect knowledge on the
geographical relevance of images, in order to better
represent images' geographical context for searching and
browsing. Barrington et al. [17] proposed a Facebook7 game to
collect data for building machine learning models that
automatically associate music with tags. These concepts use
game-like incentive systems and deal with task distribution,
but do not address mobile and real-time requirements. In
relation to the latter, Quinn et al. [18] proposed a toolkit
called CrowdFlow for blending human computing and machine
computing in order to attain tighter control over the
inherent tradeoffs in speed, cost and quality. In the
following section, we discuss some of the next steps in our
work to contribute to the research landscape outlined above.
7http://www.facebook.com
6. FUTURE DIRECTIONS
This paper represents an early stage in the development of
a human-powered mobile image translation and knowledge sharing
system. A number of research questions were discussed, and some
were addressed, but most remain open. Some of the immediate
issues are how to detect people's availability, what
base platform is suitable for implementing the system, how
to deal with the results, and how to encourage participation.
As the next step, we look forward to
addressing these research problems via a larger-scale user
study on a functional prototype.
6.1 Task distribution and real-time requirements
How to achieve efficient and appropriate task allocation is an
important topic of this work. Here appropriateness covers
two aspects: capacity and availability. Capacity
indicates whether a worker has enough knowledge or skill to
accomplish the task, and availability indicates whether this is
a good time for the worker to take on the task. Generally speaking, the
former aspect mainly affects the quality of translation, and
the latter may affect the quantity.
Compared to ordinary everyday tasks, micro-tasks are commonly
defined as jobs with a lightweight workload and low
difficulty, and hence should be easy for most
people to handle. As a consequence, the capability requirement
of crowdsourcing is less strict. In this application, first
of all, it is assumed that participants are people who at least
have enough confidence in their skill in both languages.
Furthermore, a work user is expected to provide a list of
"familiar areas/places" when signing up. The server compares
this list with each requester's location, and always prefers
to assign a task to a worker who claims to know the requester's
location or a nearby place well.
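The location-based preference described above can be illustrated with a short sketch. It is not the implementation of our system: the Worker record, the pick_worker function, the 5 km "nearby" radius, and the use of great-circle distance are all assumptions introduced here for illustration only.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


@dataclass
class Worker:
    # Hypothetical record: places are (lat, lon) pairs supplied at sign-up.
    name: str
    familiar_places: list = field(default_factory=list)


def pick_worker(workers, requester_location, nearby_km=5.0):
    """Return the worker with a familiar place closest to the requester.

    Only places within nearby_km (an assumed radius) count as "nearby".
    Returns None when no worker qualifies, in which case a server could
    fall back to any available worker.
    """
    best, best_dist = None, float("inf")
    for worker in workers:
        for place in worker.familiar_places:
            dist = haversine_km(place, requester_location)
            if dist <= nearby_km and dist < best_dist:
                best, best_dist = worker, dist
    return best


# Example with made-up coordinates.
workers = [
    Worker("w1", [(35.7090, 139.7190)]),
    Worker("w2", [(34.6937, 135.5023)]),
]
chosen = pick_worker(workers, (35.7100, 139.7200))
print(chosen.name if chosen else "no nearby expert")
```

Returning None rather than an arbitrary worker keeps the "prefer local knowledge" rule separate from whatever fallback policy is chosen, which is left open here.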
The other requirement, detection of a user's availability, is
more difficult to deal with. It is not only about whether people are
free [19], but also involves other factors such as social relationships,
expertise, properties of the questions, etc. We will look
deeper into this issue in the future. In fact, we noticed that it is a
common issue in various fields. For instance, Judge
and Neustaedter [20] summarized the design challenges in
future domestic communication technologies and indicated
that one important issue is how to represent true availability,
or the "willingness" to video conference, in the initiating
stage. Beyond existing research, we believe that user
availability detection technology also opens new possibilities
in ubiquitous computing research. If the availability of an
individual at a given time is detectable, both the response rate
and the response time of mobile crowdsourcing can be greatly improved.
Thus, in addition to using people as processors (as we do
in human computation), it is also possible to use humans as
sensors to perform tasks with relatively harder real-time
requirements. For example, people can be employed to collect
high-level context information (e.g., human activity, a
non-electronic object's location, identification or state, etc.) of
a given environment. Such rich data are extremely expensive
and difficult to obtain via machines, but very valuable and
useful for ubiquitous computing applications.
Besides establishing a new model from top to bottom, we
also consider the possibility of building our system atop
existing web-based platforms. Online crowdsourcing marketplaces
like Amazon Mechanical Turk⁸ support outsourcing
tasks to the masses, though such marketplaces normally use
financial rewards as the incentive. Another choice is to use the
social networking and microblogging service Twitter⁹, since it
is lightweight, easy to operate, has gained widespread
popularity worldwide (registered users reaching 190 million
by June 2010 [21]), and supports all the needed functionality
well (i.e., text/image transfer, positioning, etc.).
6.3 Results database
All "good" results (i.e., answers that receive a full score from
requesters) should be stored because of their utility value. In
addition, translators should be required to tag all the images
they translate, so that the data can be accessed more efficiently.
We believe this results database, as its size
grows, can be of great value to various groups such
as future clients, language learners, AI researchers, tourism
service providers, etc.
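A minimal sketch of such a results database is given below, assuming an in-memory store. The ResultsStore class, its method names, and the maximum score of 5 are hypothetical choices made for illustration; the actual storage backend and rating scale are not fixed here.

```python
from collections import defaultdict

FULL_SCORE = 5  # assumed maximum rating; the real scale may differ


class ResultsStore:
    """Keeps only full-score answers and indexes them by translator tags."""

    def __init__(self):
        self._records = []
        self._by_tag = defaultdict(set)  # tag -> indices into _records

    def add(self, image_id, answer, tags, score):
        """Store the answer if it received the full score; return True if stored."""
        if score < FULL_SCORE:
            return False
        index = len(self._records)
        normalized = {tag.lower() for tag in tags}
        self._records.append(
            {"image_id": image_id, "answer": answer, "tags": normalized}
        )
        for tag in normalized:
            self._by_tag[tag].add(index)
        return True

    def search(self, *tags):
        """Return stored records carrying all given tags, in insertion order."""
        if not tags:
            return []
        index_sets = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        hits = set.intersection(*index_sets)
        return [self._records[i] for i in sorted(hits)]


# Example with made-up records.
store = ResultsStore()
store.add("img1", "The train is delayed by an accident.", ["station", "sign"], 5)
store.add("img2", "I have no idea.", ["station"], 2)  # not stored
for record in store.search("station"):
    print(record["image_id"], record["answer"])
```

Indexing by tag at insertion time keeps lookups cheap as the database grows, which is the purpose of asking translators to tag images in the first place.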
6.4 User motivation and incentive systems
Another key challenge of this system is how to design participation
incentive mechanisms, since the performance of
the system strongly depends on whether the task rewards
can motivate work users to participate. To some extent,
this is also one of the most significant and fundamental challenges
in user participation systems such as human computation
systems, persuasive systems, and open-content publishing
systems. Previous studies have identified three broad approaches
to motivating contributors: economic incentives,
social incentives and intrinsic incentives [22, 23, 24, 25].
Figure 10: Main interface of the described location-based mobile game
For example, our preliminary design leverages social and
intrinsic motivations related to game play [16, 26]. As mentioned
above, work users are awarded scores by the end-users.
A location-based mobile game could be constructed
on the basis of these scores or points. The main interface
of the game (see Figure 10) is a Google Maps based real-world
map, which is divided into non-overlapping hexagons. The
goal of the game is to conquer territories. Every hexagon
has one owner or lord, who is the player with the highest
number of points awarded inside this area. The lord's profile
photo, along with an identifying color, is displayed in
their hexagons on the map. Depending on the extent and
geographic location of their territory, players hold different
special titles, which are shown in their profile. In the next
steps of this project, we plan to experiment with this and
similar concepts.
⁸ https://www.mturk.com/
⁹ https://www.twitter.com/
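The territory rule of this game concept (each hexagon belongs to the player with the most points awarded inside it) could be computed as in the sketch below. The compute_lords function and the hexagon identifiers are hypothetical, the mapping from coordinates to hexagons is assumed to happen elsewhere, and the alphabetical tie-break is an assumption, since how ties are resolved is not specified.

```python
from collections import defaultdict


def compute_lords(awards):
    """Map each hexagon to its lord.

    awards: iterable of (player, hex_id, points) tuples, one per scored task.
    Returns a dict {hex_id: player}. Hexagons with no awards have no lord.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for player, hex_id, points in awards:
        totals[hex_id][player] += points

    lords = {}
    for hex_id, per_player in totals.items():
        # Highest total wins; ties go to the alphabetically first name (assumption).
        lords[hex_id] = min(per_player, key=lambda p: (-per_player[p], p))
    return lords


# Example with made-up awards.
awards = [
    ("aki", "h1", 3),
    ("ben", "h1", 5),
    ("aki", "h1", 4),
    ("ben", "h2", 2),
]
print(compute_lords(awards))
```

Recomputing lords from the full award log is the simplest option; a deployed game would more likely update the per-hexagon totals incrementally as scores arrive.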
7. REFERENCES
[1] Gartner says mobile phone sales will exceed one billion
in 2009,
http://www.gartner.com/press_releases/asset_132473_11.html.
[2] Masashi Koga, Ryuji Mine, Tatsuya Kameyama,
Toshikazu Takahashi, Masahiro Yamazaki, and
Teruyuki Yamaguchi. Camera-based kanji ocr for
mobile-phones: Practical issues. In ICDAR '05:
Proceedings of the Eighth International Conference on
Document Analysis and Recognition, pages 635-639,
2005.
[3] Jeff Howe. Crowdsourcing: Why the Power of the
Crowd Is Driving the Future of Business. Crown
Publishing Group, New York, NY, USA, 2008.
[4] Luis von Ahn. Human computation. In K-CAP '07:
Proceedings of the 4th international conference on
Knowledge capture, pages 5-6, 2007.
[5] Damon Horowitz and Sepandar D. Kamvar. The
anatomy of a large-scale social search engine. In
WWW '10: Proceedings of the 19th international
conference on World wide web, pages 431-440, New
York, NY, USA, 2010. ACM.
[6] Tim Brown. Change by Design: How Design Thinking
Transforms Organizations and Inspires Innovation.
HarperBusiness, 2009.
[7] W. Keith Edwards, Victoria Bellotti, Anind K. Dey,
and Mark W. Newman. The challenges of
user-centered design and evaluation for infrastructure.
In CHI '03: Proceedings of the SIGCHI conference on
Human factors in computing systems, pages 297-304,
New York, NY, USA, 2003. ACM.
[8] Lada A. Adamic, Jun Zhang, Eytan Bakshy, and
Mark S. Ackerman. Knowledge sharing and yahoo
answers: everyone knows something. In WWW '08:
Proceeding of the 17th international conference on
World Wide Web, pages 665-674, New York, NY,
USA, 2008. ACM.
[9] Zoltan Gyongyi, Georgia Koutrika, Jan Pedersen, and
Hector Garcia-Molina. Questioning yahoo! answers.
Technical Report 2007-35, Stanford InfoLab, 2007.
[10] Pei-Yun Hsueh, Prem Melville, and Vikas Sindhwani.
Data quality from crowdsourcing: a study of
annotation selection criteria. In HLT '09: Proceedings
of the NAACL HLT 2009 Workshop on Active
Learning for Natural Language Processing, pages
27-35, Morristown, NJ, USA, 2009. Association for
Computational Linguistics.
[11] Panagiotis G. Ipeirotis, Foster Provost, and Jing
Wang. Quality management on amazon mechanical
turk. In HCOMP '10: Proceedings of the ACM
SIGKDD Workshop on Human Computation, pages
[12] Hiromichi Fujisawa. Forty years of research in
character and document recognition - an industrial
perspective. Pattern Recogn., 41(8):2435-2446, 2008.
[13] Vamshi Ambati, Stephan Vogel, and Jaime Carbonell.
Active learning and crowd-sourcing for machine
translation. In Proceedings of the Seventh conference
on International Language Resources and Evaluation
(LREC'10), May 2010.
[14] Chris Callison-Burch. Fast, cheap, and creative:
evaluating translation quality using amazon's
mechanical turk. In EMNLP '09: Proceedings of the
2009 Conference on Empirical Methods in Natural
Language Processing, pages 286-295, 2009.
[15] Nathan Eagle. txteagle: Mobile crowdsourcing. In
Nuray Aykin, editor, Internationalization, Design and
Global Development, volume 5623 of Lecture Notes in
Computer Science, pages 447-456. Springer Berlin /
Heidelberg, 2009.
[16] Yuki Arase, Xing Xie, Manni Duan, Takahiro Hara,
and Shojiro Nishio. A game based approach to assign
geographical relevance to web images. In Proceedings
of the 18th international conference on World wide
web - WWW '09, page 811, New York, New York,
USA, 2009. ACM Press.
[17] Luke Barrington, Damien O'Malley, Douglas Turnbull,
and Gert Lanckriet. User-centered design of a social
game to tag music. In HCOMP '09: Proceedings of the
ACM SIGKDD Workshop on Human Computation,
pages 7-10, New York, NY, USA, 2009. ACM.
[18] A. J. Quinn, B. B. Bederson, T. Yeh, and J. Lin.
CrowdFlow: Integrating machine learning with
mechanical turk for speed-cost-quality flexibility.
Technical Report HCIL-2010-09, University of
Maryland, College Park, 2010.
[19] Meredith Ringel Morris, Jaime Teevan, and Katrina
Panovich. What do people ask their social networks,
and why?: a survey study of status message q&a
behavior. In CHI '10: Proceedings of the 28th
international conference on Human factors in
computing systems, pages 1739-1748, New York, NY,
USA, 2010. ACM.
[20] Tejinder K. Judge and Carman Neustaedter. Sharing
conversation and sharing life: video conferencing in
the home. In CHI '10: Proceedings of the 28th
international conference on Human factors in
computing systems, pages 655-658, New York, NY,
USA, 2010. ACM.
[21] Costolo: Twitter now has 190 million users tweeting
65 million times a day,
http://techcrunch.com/2010/06/08/twitter-190-million-users/.
[22] R. Benabou and J. Tirole. Intrinsic and extrinsic
motivation. Review of Economic Studies,
70(3):489-520, 2003.
[23] A. Hars and S. Ou. Working for free? - motivations of
participating in open source projects. In Proceedings
of The Annual Hawaii International Conference on
System Science, page 163, 2001.
[24] Jan Marco Leimeister, Michael Huber, Ulrich
Bretschneider, and Helmut Krcmar. Leveraging
crowdsourcing: Activation-supporting components for
it-based ideas competition. Journal of Management
Information Systems, 26(1):194-224, Summer 2009.
[25] Miyuki Shiraishi, Yasuyuki Washio, Chihiro
Takayama, Vili Lehdonvirta, Hiroaki Kimura, and
Tatsuo Nakajima. Using individual, social and
economic persuasion techniques to reduce co2
emissions in a family setting. In Persuasive '09:
Proceedings of the 4th International Conference on
Persuasive Technology, pages 1-8, New York, NY,
USA, 2009. ACM.
[26] Luis von Ahn and Laura Dabbish. Designing games
with a purpose. Commun. ACM, 51(8):58-67, 2008.
A. EXPERIMENT RESULTS
Question I: Why isn't there any train? How long do I have to wait here?
Translator 1: Due to the traffic accident. I'm not sure how long you have to wait but the board says the train company might prepare alternative ways to go.
Translator 2: There are some troubles and you have to wait some more minutes. But I don't know exactly how long you will wait wait .
Translator 3: (No response)
Translator 4: Because there is an accident at Tokyu-Denentoshi line. I can't tell you how long you have to wait from only this picture.
Translator 5: I don't know the matter. Maybe, you have to wait for 30 minutes to 1 hours.
Translator 6: I have no idea.
Translator 7: The reason for delay of trains is uncertain. It is also unclear how long does it takes the operation of trains is recovered. But you may be able to take alternative route by using another transportation.
Question II: What are the events between 5th and 8th?
Translator 1: There was no special event on 5th and 8th. On 3rd and 4th, there were some special events such as parading, flea market.
Translator 2: The event is Festival of the Weaver.
Translator 3: There is no information in this picture.
Translator 4: (No response)
Translator 5: The events are festivals on 5th to 8th. There are many shops along the street.
Translator 6: A festival will take place. It's theme is "tanabata", an old story related to a couple which is separated by the Milky way.
Translator 7: There is no special event. But you can enjoy a general Japanese festival.
Question III: What are these two? Can you provide links of pictures?
Translator 1: 1: chinese cabbage: [google image link] 2: sesame oil: [google image link]
Translator 2: The first one: [google image link] The second one: [google image link]
Translator 3: [google image link] and [google image link]
Translator 4: I don't know how to answer.
Translator 5: The first word : This is a vegetable. The second word : This is a oil.
Translator 6: (No response)
Translator 7: The first one is a vegetable whose shape is like elongated cabbage. And the second one is oil made from sesame. The latter may be used for seasoning.
Question IV: What is the difference between 1 and 2 in my electricity bill?
Translator 1: I cannot answer this question.
Translator 2: According to Tokyo Electric Power Company(Tokyo-denryoku), an unit amount of electricity is defined as following: Level-1 -> 17.87yen/kWh, applied up to 120kWh Level-2 -> 22.86yen/kWh, applied more than 120 to 300kWh.
Translator 3: Sorry, but I don't know.
Translator 4: 1: basic charge. 2: metered fee.
Translator 5: 1 is basic fee. 2 is the amount fee of electronic energy.
Translator 6: I do not know how to answer this question.
Translator 7: I have no idea.