A predictive Blissymbolic to English translation system
Assets (2002)
- ISBN: 1581134649
- DOI: 10.1145/638249.638283
Annalu Waller Kris Jack
Department of Applied Computing
University of Dundee
Dundee, DD1 4HN Scotland
+44 1382 345080
awaller@computing.dundee.ac.uk kjack@computing.dundee.ac.uk
ABSTRACT
This paper reports on the use of predictive techniques to
translate Blissymbol sentences into grammatically correct
English. Evaluations of this approach show that it is
possible to translate short sentences by analysing the
likelihood of word tri-gram occurrences in English source
texts. The translation system is designed to be a
component of a Blissymbol word processor which allows
users to convert Blissymbol sentences into grammatically
correct English.
Keywords
AAC, Blissymbolics, Natural Language Translation
BACKGROUND
Blissymbolics is a semantically based graphic language used
primarily by physically disabled, non-speaking people to
communicate [5]. Bliss users usually communicate by
pointing to one or more Bliss-words from a restricted set of
Bliss-words on a paper-based communication board. The
speaking partner uses the English translation associated
with each Bliss-word to translate the sentence as a whole.
The Bliss user then confirms whether or not the message
has been correctly understood.
The use of Blissymbolics on electronic augmentative and
alternative communication (AAC) devices has, however,
limited the dynamic use of the language.
Blissymbolics and Electronic AAC Devices
Blissymbolics is currently used with AAC devices in two
main ways:
• Access to dedicated AAC devices: Bliss-words are
used on an overlay on an AAC device, e.g. Dynavox
[http://www.dynavoxsys.com/3]. Selecting a Bliss-word
will result in a spoken word or phrase, whichever has
been linked to the Bliss-word.
• Electronic Blissymbol communication boards: Several
software programs allow clinicians to develop
electronic communication boards which allow users to
select from screens of Bliss-words. These systems, e.g.
WinBliss [http://www.anycom.se/products/WinBliss/]
and Bliss for Windows [http://www.cameleon-web.com/]
with Clicker [http://www.cricksoft.com/],
allow users to ‘speak’ the gloss associated with each
Bliss-word.
At a basic level these systems can be used to speak the
English (or other language) translation of each Blissymbol
word for word, but these computer programs do not
provide an automatic translation of a Bliss-sentence. For
example, the Bliss sentence
[Blissymbols glossed ‘boy’, ‘to go’, ‘home’]
has the word-for-word translation ‘boy’, ‘to go’, ‘home’,
which would not be considered a correct English sentence.
Ideally, one would want an automatic translation to be
‘The boy goes home’ or ‘The boy is going home’.
Automatic Blissymbolic Translation
Several projects have focused on the automatic translation
of Blissymbolics into grammatically correct spoken
language. For example, traditional natural language
processing techniques using grammars and parse trees have
been applied to translate Bliss into Swedish [4] and
English [7]. These systems have created word dictionaries
which specify information such as the part of speech for
each Bliss-word. Parse trees are used to create
grammatically correct sentences based on the order of the
Bliss-words. These techniques have a good success rate,
giving syntactically correct results most of the time. The
main drawback lies in the nature of the dictionary. Not
only must the words be categorised through a laborious
process, but the dictionary is also language dependent.
More important is the requirement that Bliss sentences be
written using a Bliss-word for each spoken word required
in the target translation.
Returning to our previous example, a Bliss user might
select:
[Blissymbols glossed ‘boy’, ‘to go’, ‘home’]
The Bliss user might want to say: “The boy goes
home” (articles “the” and “a” are implied in Bliss). Using
the approach described above, the translation might be
“Boy goes home.”, requiring the user to actively select the
Bliss-word for “the” to achieve “The boy goes home.”
Similarly, the user would have to select the Bliss-word “to
be” before “to go” to ensure “Boy is going home.”
Using Prediction in Blissymbolic Translation
The use of word prediction to improve the speed of typing
for people with disabilities is well documented [3, 6]. A
more recent system, called Predictability [2], is language
independent. It does not use any syntactical information,
but uses a word tri-gram model which analyses sequences
of up to three words at a time. Words are not tagged with
grammatical information such as parts of speech and the
system does not rely on a complex language dependent
rule base. The system captures information as the user
types and compares this to word tri-gram information
which has been previously captured from a source text.
Word bi-grams (two-word sequences), single word
frequency and a measure of recency of use are also used to
improve the prediction of words which are likely to follow
the text already typed.
The concept of language-independent word prediction
led to the idea of using predictive techniques to translate
Blissymbol sentences into English, and possibly other
languages.
Word prediction reduces the amount of typing required of
a user by offering a choice of whole words or word
completions. The user remains in control and can choose a
word from a list of options or choose to carry on typing if
the target word is not predicted. The reversal of this
process seemed to be of potential use in the problem of
translating Bliss into English. By analysing English source
texts it might be possible to determine the likely order
and form of the English words needed for a more
grammatically correct translation of a Blissymbolic
sentence.
A pilot project was thus undertaken to examine the
feasibility of using word tri-gram predictive techniques to
translate Bliss sentences into grammatically correct
English.
The project had two objectives:
1. To develop a computer program which uses
prediction techniques to present possible English
translations of a Blissymbol sentence;
2. To evaluate the accuracy of these translations.
The ultimate aim of the project is to incorporate the
translation program into a Blissymbol word processor
called Bliss-Word [1]. This development is ongoing and
will not be discussed in this paper.
METHOD
A translation algorithm lies at the heart of the translation
system. The translation system uses a source text file to
create a word association dictionary containing word
tri-gram information. A Blissymbolics vocabulary file
provides the system with a second dictionary containing
information for each Blissymbol word: the standard gloss
(the English translation, e.g. “to walk”); gloss synonyms
(e.g. “to go”); the inflected forms of the English word
(e.g. walk, walks, walking; see footnote 1); and the unique
ISO code associated with that Blissymbol word (see
footnote 2). The system will ultimately be given a
Blissymbol sentence as input from a Blissymbol word
processor, but in the meantime a research interface has
been developed to monitor the accuracy of the translation
system.
The Source Text File
The source text file is any text file containing English text.
Source texts used for testing have come from
Project Gutenberg [http://promo.net/pg/].
The Word Association Dictionary
The word association dictionary stores word tri-grams as
the source text is read. Each unique word is stored as a
node in a balanced binary tree. Each node also functions as
the root to another binary tree. This second level tree
contains words which follow the word stored in the root
node. A third level of binary tree expansion contains the
third word of any word tri-gram. Each node stores a word
and an occurrence frequency as integers. The development
of a network of balanced binary trees was crucial for the
efficient creation and navigation of the word association
dictionary.
Figure 1: A diagrammatic representation of the Word
Association Dictionary data structure. (Round
nodes represent the first level, square nodes, the
second, and triangles, the third. So, a possible ABC
sequence might be: the + boy + is.)
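The three-level structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps the balanced binary trees for Python dictionaries, and the names `Node` and `build_dictionary` are hypothetical.

```python
from collections import defaultdict


class Node:
    """One word in the dictionary: its frequency plus the words seen after it."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)


def build_dictionary(text):
    """Record every word, bi-gram and tri-gram of `text` with its frequency."""
    words = text.lower().split()
    root = defaultdict(Node)
    for i in range(len(words)):
        level = root
        # Walk at most three levels deep: first, second and third word.
        for word in words[i:i + 3]:
            node = level[word]
            node.count += 1
            level = node.children
    return root


d = build_dictionary("the boy is home . the boy goes home .")
print(d["the"].count)                                  # 2
print(d["the"].children["boy"].count)                  # 2
print(d["the"].children["boy"].children["is"].count)   # 1
```

Hash-based dictionaries give the same word-to-subtree lookup as the balanced trees; the trees described in the paper additionally keep words in sorted order.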
Footnote 1: The current system only handles the present tense.
Footnote 2: A two-byte graphic character set of 2,304 Blissymbol
characters was registered with the International Organization
for Standardization (ISO) in 1993. Each Blissymbol word is
coded as a character, providing a static vocabulary.
The word association dictionary has the ability to gather its
own source corpus and create or add to a dictionary from
it. The dictionary can therefore be added to existing
dictionaries to combine information. Because the dictionary
is very large when loaded into memory in its entirety, only
the segments of the dictionary that are needed for a
translation are loaded.
The creation of the word association dictionary uses a
word filter which contains the words specified in the
Blissymbol vocabulary file. This avoids wasted data and
helps to contain the size of the word association dictionary.
The Translation Algorithm
The translation algorithm uses a Markov model based on
word tri-grams. It attempts to find the most likely word
or combination of words that would match the sequence of
Blissymbol words in the Bliss sentence.
Consider the Blissymbol sequence: Z A B C
Target sentences are built up by adding the most likely
translation sequences to one another, using the last word of
one sequence as the first word of the next.
Translation begins by finding the probability of Z and A
occurring together. This step applies at the start of a
sentence, where Z represents a period. All known
sequences that begin with Z (period), end with A, and
contain at most one word between Z and A are added to the
list of translations. E.g. period (Z) +
‘boy’ (A) + ‘to go’ (B) + ‘home’ (C) may begin with
‘period + the + boy + …’. In this case ‘the’ is added
between Z and A.
The probability of A B C is then found. The algorithm first
consults the word association dictionary to find the
probability of B following A. Each Bliss-word may have
several synonyms or inflected verb forms. The translation
possibilities for A B may therefore include several
variations, e.g. A B; A1 B; A2 B; A B1, etc. Some of these
combinations will occur within the word association
dictionary and it is these sequences which are first added
to the list of potential translation sequences. E.g. ‘boy’ (A)
may have a synonym ‘lad’, requiring additional sequences
which use ‘lad’ in the place of ‘boy’.
The word tri-grams that begin with A are then consulted
to find the list of words which could occur between A and
B, i.e. A Y B. All valid A Y B sequences are added to the
list of potential translations for A B. The same is done
for every variation on A B.
E.g. ‘boy’ (A) + ‘to go’ (B) may be ‘boy’ + is + ‘going’;
‘lad’ (A1) + ‘to go’ (B) may be ‘lad’ + is + ‘going’. In this
case ‘is’ is added between A and B when using the verb
form ‘going’.
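The candidate generation described above might be sketched as follows. The word forms and counts are toy data invented for illustration, and the flat `BIGRAMS` and `TRIGRAMS` tables are a simplifying assumption standing in for the word association dictionary.

```python
from itertools import product

# Hypothetical vocabulary entries: every English form a Bliss-word may take.
FORMS = {
    "boy": ["boy", "lad"],
    "to go": ["go", "goes", "going"],
}
# Hypothetical counts gathered from a source text.
BIGRAMS = {("boy", "goes"): 3, ("lad", "goes"): 1}
TRIGRAMS = {
    ("boy", "is", "going"): 2,
    ("lad", "is", "going"): 1,
    ("boy", "goes", "home"): 2,
}


def candidates(gloss_a, gloss_b):
    """Return the word sequences that could translate the Bliss pair A B."""
    found = []
    for a, b in product(FORMS[gloss_a], FORMS[gloss_b]):
        # Direct sequence: a form of A immediately followed by a form of B.
        if (a, b) in BIGRAMS:
            found.append((a, b))
        # Inserted word: any tri-gram a Y b seen in the source text.
        for (w1, y, w3) in TRIGRAMS:
            if w1 == a and w3 == b:
                found.append((a, y, b))
    return found


print(candidates("boy", "to go"))
# [('boy', 'goes'), ('boy', 'is', 'going'), ('lad', 'goes'), ('lad', 'is', 'going')]
```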
The candidate sequences are then ranked according to
probability. The probability of word association is
calculated as follows:
P(B, A) = frequency of B / frequency of A
P(C, AB) = frequency of C / frequency of A
P(D, ABC) = frequency of D / frequency of A
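As a worked example of the first two formulas, the sketch below assumes that "frequency of B" means the count stored in the node for B beneath A in the word association dictionary, and likewise for C beneath A B. That reading, the counts, and the names `FREQ`, `probability` and `rank` are all assumptions made for illustration.

```python
# Hypothetical node frequencies, keyed by the path of words leading to the node.
FREQ = {
    ("boy",): 10,
    ("boy", "goes"): 4,
    ("boy", "is"): 5,
    ("boy", "is", "going"): 2,
}


def probability(sequence):
    """Frequency of the last word's node divided by the frequency of the first word."""
    sequence = tuple(sequence)
    return FREQ.get(sequence, 0) / FREQ[sequence[:1]]


def rank(candidates):
    """Order candidate sequences from most to least probable."""
    return sorted(candidates, key=probability, reverse=True)


print(probability(("boy", "goes")))         # 0.4
print(probability(("boy", "is", "going")))  # 0.2
print(rank([("boy", "is", "going"), ("boy", "goes")]))
# [('boy', 'goes'), ('boy', 'is', 'going')]
```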
The target sentences are built up by taking the last two
words of one segment to begin the next segment.
This ensures that there are always two symbols preceding
the prediction whenever possible.
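The joining of overlapping segments can be illustrated with a short sketch. It only shows how already-chosen segments might be merged on a shared overlap of up to two words; how the system selects each segment is not covered, and `join_segments` is a hypothetical name.

```python
def join_segments(segments):
    """Merge word segments, reusing an overlap of up to two words between neighbours."""
    sentence = list(segments[0])
    for segment in segments[1:]:
        segment = list(segment)
        # Prefer a two-word overlap, then one word, then plain concatenation.
        for k in (2, 1, 0):
            if k == 0 or sentence[-k:] == segment[:k]:
                sentence.extend(segment[k:])
                break
    return sentence


print(join_segments([(".", "the", "boy"), ("the", "boy", "goes"), ("boy", "goes", "home")]))
# ['.', 'the', 'boy', 'goes', 'home']
```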
Evaluation
The following aspects of the program formed the basis for
evaluating the translation system:
• Time taken to build a word association dictionary.
• Accuracy of translation
• Time taken to translate a sentence.
Time taken to build a word association dictionary.
The time taken to create the source dictionaries was
plotted against ten different sizes of source text, the
smallest being 100,000 words while the largest contained
1,000,000 words.
Accuracy of Translation
A test target set of 20 sentences was processed. The
sentences varied between one and five Blissymbol words in
length. The test target set was translated using three
sizes of source dictionary (see Table 1). Each word has a
confidence value associated with it. The larger the number,
the more confidence exists that the translation is correct.
If the sentence has a confidence of zero, one or more of the
word tri-grams that appear in the translation have not been
found in the source texts. (Any translations that have a
confidence of zero are displayed in italic in the table.)
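The zero-confidence rule can be sketched as below. The paper does not state how the confidence value is computed, so taking the smallest tri-gram count is purely an assumption; the sketch only reproduces the stated property that one unseen tri-gram gives a confidence of zero. `sentence_confidence` is a hypothetical name, and sequences shorter than three tokens are left out.

```python
def sentence_confidence(words, trigram_counts):
    """Smallest count among the sentence's tri-grams; zero if any tri-gram is unseen."""
    if len(words) < 3:
        raise ValueError("this sketch only covers sequences of three or more tokens")
    trigrams = zip(words, words[1:], words[2:])
    return min(trigram_counts.get(t, 0) for t in trigrams)


# Hypothetical tri-gram counts from a source text.
counts = {
    (".", "the", "boy"): 5,
    ("the", "boy", "goes"): 2,
    ("boy", "goes", "home"): 3,
}
print(sentence_confidence([".", "the", "boy", "goes", "home"], counts))  # 2
print(sentence_confidence([".", "the", "boy", "goes", "zoo"], counts))   # 0
```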
Time taken to translate a sentence
The time taken to translate each sentence was noted.
RESULTS
The time taken to create the word association dictionary
was below 30 seconds regardless of the size of the source
text.
Table 1 below shows the results of the sentence
translations. Twenty Blissymbol sentences were translated
using three sizes of source text. Table 1 shows an
improvement in accuracy as the size of the source
dictionary increases. As more words are added to the word
association dictionary, fewer zero-confidence translations
are made: 17 with the 50,000-word source, 13 with the
1,000,000-word source, and 8 with the 10,000,000-word source.
# | Gloss Sequence | 50,000 Word Source | 1,000,000 Word Source | 10,000,000 Word Source
1 | boy | Boy | the boy | the boy
2 | (to) run | Run | Run | Run
3 | green | Green | Green | Green
4 | no | No | No | No
5 | I/me + (to) walk | I walks | and I walks | I am walking
6 | us + (to) eat | when we eat | if we eat | we eat
7 | JOHN + (to) play | john play | King john play | this john play
8 | that + right | all that right | that the right | that right
9 | I/me + (to) want + (to) see | I wants sees | and I wants sees | I want to see
10 | my + picture + best | my picture best | at my picture best | in my picture best
11 | I/me + (to) go + bed | I goes bed | I am going to bed | I am going to bed
12 | what? + his + name | what his name | do what her own name | What is his name
13 | I/me + (to) want + (to) go + zoo | I wants goes zoo | and I wants goes zoo | I want going zoo
14 | (to) meet + me + at + cabin | meet I at cabin | the meeting I must at cabin | they meet me at the cabin
15 | I/me + (to) like + (to) eat + cake | I likes eats cake | and I likes eat cake | I like eating cake
16 | my + favourite + colour + orange | my favourite colour orange | my favourite colour orange | my favourite colour orange
17 | (to) push + chair + on + far + right | push chair on far right | push chair on with far right | her push chair on as far right
18 | I/me + not + enough + time + today | when I have not enough time today | I know not force enough time today | I am not good enough time today
19 | girl + (to) laugh + when + he/she + (to) play | girl laugh when he plays | the girl laugh when she plays | this girl laugh when suddenly she plays
20 | my + brother + outside + (to) wait + I/me | my brother outside waiting I | my brother outside waits I | my brother outside is waiting I
Table 1: The sentences (Blissymbol gloss sequence) are shown in the first column, the translations using different
size source texts are presented in the other columns. Translations in italics denote zero confidence in the translation.
Time taken to translate a sentence
Figure 2 shows the time taken for each sentence to be
processed.
[Plot titled “Translation Times using Version 3”: time in seconds (axis from 0 to 10) against sentence number.]
Figure 2: The time taken to translate individual sentences.
DISCUSSION
Time taken to build a word association dictionary.
The time taken to build a word association dictionary was
insignificant despite the large amount of data stored.
Accuracy of Translation
The accuracy of each translation depends on the content of
the source texts: if a source text contains no examples of
how a word is used, an accurate translation cannot occur,
regardless of the size of the text. The accuracy of the
translation therefore improves as the size of the source
text increases, because more information on word usage is
accumulated.
The translation algorithm applies two kinds of
transformation to the gloss sequence: it can change the
form of the gloss (e.g. “walks” instead of “to walk”) or it
can insert additional words into the translation sequence.
With the source of 50,000 words, most of the translations
are simply literal translations based on the glosses. This is
because the word dictionary does not contain enough word
associations to make many changes to the original literal
translation. Some words, such as eat, are changed to eats
(see sentence #15), but as there is no confidence in the
translation as a whole, it is likely that the change to the
word will not be suggested as a possible translation by the
algorithm. Of the twenty translations, only three are given
a positive confidence level. Two (sentences # 1 and 4) can
be found through literal translation and the third (sentence
# 6) has an unnecessary word inserted at the beginning of
the sentence. The insertion of unnecessary words is a
common problem when word associations are few and
weak. Since the source text does not provide enough
examples, the use of the word “we” at the start of a
sentence was not found. This sentence is correctly
translated with the 10,000,000 word source.
More translations are made with confidence using a
1,000,000 word source. All single gloss sequences result
in acceptable translations. Of the double gloss sequences,
only 2 of the 4 give confident results. They are both
acceptable translations but the “if we eat” (sentence # 6)
suffers from the insertion of an inappropriate word
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
| # | Blissymbol gloss sequence | 50,000 word source | 1,000,000 word source | 10,000,000 word source |
|---|---|---|---|---|
| 1 | boy | Boy | the boy | the boy |
| 2 | (to) run | Run | Run | Run |
| 3 | green | Green | Green | Green |
| 4 | no | No | No | No |
| 5 | I/me + (to) walk | I walks | and I walks | I am walking |
| 6 | us + (to) eat | when we eat | if we eat | we eat |
| 7 | JOHN + (to) play | john play | King john play | this john play |
| 8 | that + right | all that right | that the right | that right |
| 9 | I/me + (to) want + (to) see | I wants sees | and I wants sees | I want to see |
| 10 | my + picture + best | my picture best | at my picture best | in my picture best |
| 11 | I/me + (to) go + bed | I goes bed | I am going to bed | I am going to bed |
| 12 | what? + his + name | what his name | do what her own name | What is his name |
| 13 | I/me + (to) want + (to) go + zoo | I wants goes zoo | and I wants goes zoo | I want going zoo |
| 14 | (to) meet + me + at + cabin | meet I at cabin | the meeting I must at cabin | they meet me at the cabin |
| 15 | I/me + (to) like + (to) eat + cake | I likes eats cake | and I likes eat cake | I like eating cake |
| 16 | my + favourite + colour + orange | my favourite colour orange | my favourite colour orange | my favourite colour orange |
| 17 | (to) push + chair + on + far + right | push chair on far right | push chair on with far right | her push chair on as far right |
| 18 | I/me + not + enough + time + today | when I have not enough time today | I know not force enough time today | I am not good enough time today |
| 19 | girl + (to) laugh + when + he/she + (to) play | girl laugh when he plays | the girl laugh when she plays | this girl laugh when suddenly she plays |
| 20 | my + brother + outside + (to) wait + I/me | my brother outside waiting I | my brother outside waits I | my brother outside is waiting I |
Table 1: The sentences (Blissymbol gloss sequence) are shown in the first column, the translations using different
size source texts are presented in the other columns. Translations in italics denote zero confidence in the translation.
Time taken to translate a sentence
Figure 2 shows the time taken for each sentence to be
processed.
[Plot: "Translation Times using Version 3"; x-axis: Sentence # (1 to 20); y-axis: Time (seconds), scale 0 to 10]
Figure 2: The time taken to translate individual sentences.
DISCUSSION
Time taken to build a word association dictionary
The time taken to build a word association dictionary was
insignificant despite the large amount of data stored.
Accuracy of Translation
The accuracy of each translation depends on the content of
the source texts: if a text contains no examples of how a
word is used, accurate translation cannot occur, regardless
of the size of the text. In general, accuracy improves as
the size of the source text increases, because more
information on word usage is accumulated.
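The dependence on source content can be illustrated with a minimal sketch. It assumes the word association dictionary can be approximated by plain tri-gram counts; the function name and the two toy source texts are hypothetical and are not taken from the system described here.

```python
from collections import Counter

def build_trigram_counts(text):
    """Count word tri-grams in a source text (hypothetical helper)."""
    words = text.lower().split()
    # zip over three staggered views of the word list yields every tri-gram
    return Counter(zip(words, words[1:], words[2:]))

# Two toy source texts; the larger one adds examples of "I want to ..."
small = "i am going to bed . we eat cake"
large = small + " . i want to see the boy . i want to go to bed"

small_counts = build_trigram_counts(small)
large_counts = build_trigram_counts(large)
```

With the smaller text there is no evidence for "I want to", so no amount of scoring could produce it; the larger text supplies two examples.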
The translation algorithm can apply two kinds of
transformation to the gloss sequence: it can change the
form of a gloss (e.g. “walks” instead of “to walk”) or it
can insert additional words into the translation sequence.
With the source of 50,000 words, most of the translations
are simply literal translations based on the glosses. This is
because the word dictionary does not contain enough word
associations to make many changes to the original literal
translation. Some words, such as eat, are changed to eats
(see sentence #15), but as there is no confidence in the
translation as a whole, it is likely that the change to the
word will not be suggested as a possible translation by the
algorithm. Of the twenty translations, only three are given
a positive confidence level. Two (sentences # 1 and 4) can
be found through literal translation and the third (sentence
# 6) has an unnecessary word inserted at the beginning of
the sentence. The insertion of unnecessary words is a
common problem when word associations are few and
weak. Because the 50,000 word source does not provide
enough examples, no use of the word “we” at the start of a
sentence was found. This sentence is correctly
translated with the 10,000,000 word source.
More translations are made with confidence using a
1,000,000 word source. All single gloss sequences result
in acceptable translations. Of the double gloss sequences,
only 2 of the 4 give confident results. They are both
acceptable translations but the “if we eat” (sentence # 6)
suffers from the insertion of an inappropriate word
(similarly with sentence # 5, “and I walks”). The greatest
success from the translations is sentence # 11, “I am going
to bed”. This translation is based on a triple gloss
sequence. As the gloss sequences get longer, the
confidences drop to zero. The translations do, however, show
encouraging results. For example, the 5-gloss sequence in
sentence # 19, “girl (to) laugh when he/she (to) play”,
becomes “the girl laugh when she plays”. The beginning of
the sentence, “the girl”, is acceptable, and the end,
“when she plays”, is also a good match. The only
word that is wrong is “laugh”, which should read “laughs”.
A total of 12 sentences are translated with confidence
using a 10,000,000 word source. The first eight
translations, with the exception of the one that contains a
proper noun, are good. Proper nouns can be entered into
the translator vocabulary in the same way as all other
words. However, as with other words, translation would
depend on the use of the proper noun in the source text.
Sentence # 19, “girl (to) laugh when he/she (to) play”, is
not translated correctly. Although it was almost correct
with a smaller source, it now has an added word
(“suddenly”). This is because “suddenly” appears in the
source as a bridge connecting the words “when” and “she”,
and this is more common than the two words occurring together.
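The bridging behaviour can be sketched as follows. This is only an illustration under the assumption that a bridge word is inserted when it occurs between two words more often than the two words occur side by side; the exact scoring rule is not given here, and the function name and toy texts are hypothetical.

```python
from collections import Counter

def best_bridge(words, left, right):
    """Return a word to insert between left and right, or None to join them directly.

    Assumed rule: insert the most frequent bridge only if it is seen more
    often than the direct pair (left, right).
    """
    direct = sum(1 for a, b in zip(words, words[1:]) if (a, b) == (left, right))
    bridges = Counter(
        b for a, b, c in zip(words, words[1:], words[2:]) if (a, c) == (left, right)
    )
    if not bridges:
        return None
    word, count = bridges.most_common(1)[0]
    return word if count > direct else None

# Toy source where the bridge dominates, as in sentence # 19
bridge_heavy = "when suddenly she ran . when suddenly she fell . when she plays".split()
# Toy source where the direct pair dominates
direct_heavy = "when she plays . when she runs . when suddenly she fell".split()

inserted = best_bridge(bridge_heavy, "when", "she")
not_inserted = best_bridge(direct_heavy, "when", "she")
```

Under this assumed rule, a larger source can make a translation worse if it happens to contain more bridged examples than direct ones, which is the effect seen with “suddenly”.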
The results illustrate that accuracy of translations is
heavily dependent on the sources. If examples of target
word sequences occur in the source text, accurate
translation is possible. The current algorithm is unable to
generalise and cannot make informed guesses in the
absence of specific examples.
Time taken to translate a sentence
The time that it takes to translate a sentence depends upon
the size of the dictionary, the length of the sentence and
the complexity of word associations that have been built in
the dictionary. All translations were processed within 8
seconds.
CONCLUSION AND FUTURE WORK
A computer program which uses statistical information
from core texts to predict the most probable translation of
a Bliss-sentence has been developed and tested. The
program was initially restricted to three word sentences,
but can now translate some five word sentences. For
example, ‘I/me’ + ‘(to) like’ + ‘(to) eat’ + ‘cake’ is
successfully translated as ‘I like to eat cake’.
Improvements in computing speed and space savings were
also noted.
The size and variety of the source text input to the system
determine the accuracy of the translations, as the
algorithm “matches” segments of text to the sentence to be
translated. The algorithm is able to insert “missing”
words, a facility not offered by previous rule-based
systems.
Further improvements to the algorithm are focussing on
using information within Blissymbol words to generalise
knowledge gained from source texts. For example, “My + favourite
+ colour + orange” would not be translated if the only
reference to colour in the text is “blue”. Matching
Blissymbol words with similar classifiers may increase the
accuracy of translations.
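One way such generalisation might work is to pool counts across words that share a classifier. The sketch below is an assumption about a possible design, not the method under development: the classifier lookup, the function names and the counts are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup from a word to the classifier of its Blissymbol
CLASSIFIER = {"blue": "COLOUR", "orange": "COLOUR"}

def generalise(trigram):
    """Replace each word that has a known classifier by that classifier."""
    return tuple(CLASSIFIER.get(w, w) for w in trigram)

def class_count(counts, trigram):
    """Sum the counts of all tri-grams that match after generalisation."""
    target = generalise(trigram)
    return sum(n for tri, n in counts.items() if generalise(tri) == target)

# Toy counts in which the only colour mentioned is "blue"
counts = Counter({("favourite", "colour", "is"): 1, ("colour", "is", "blue"): 1})

exact = counts[("colour", "is", "orange")]
pooled = class_count(counts, ("colour", "is", "orange"))
```

Here the exact tri-gram has never been seen, but the pooled count borrows the evidence for “blue” because both words share the assumed COLOUR classifier.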
The incorporation of this software into a word-processor
for Bliss users is underway. The calculation of confidence
levels for translation provides an opportunity to offer users
the most probable translations. It is imperative that users
are not confused by inaccurate translations. Translations
with zero confidence will therefore be discarded, as these
have a high probability of being inaccurate.
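The filtering step described above could look like the following sketch. The function name and the confidence values are hypothetical; only the rule of discarding zero-confidence translations and offering the most probable first comes from the text.

```python
def offer_translations(candidates):
    """Keep translations with positive confidence, most confident first.

    candidates: list of (translation, confidence) pairs.
    """
    kept = [(text, conf) for text, conf in candidates if conf > 0]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept]

# Illustrative candidates with made-up confidence values
candidates = [
    ("I goes bed", 0.0),
    ("I am going to bed", 0.8),
    ("I go to bed", 0.3),
]
offered = offer_translations(candidates)
```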
Work is also underway to develop a TrueType font which
would make the manipulation described above possible in
practice [8]. This would also allow Bliss users to
spell new Bliss-words using individual Bliss-characters.
This dynamic use of the Blissymbolics language provides
users with access to a potentially unlimited vocabulary.
ACKNOWLEDGMENTS
We thank the Carnegie Trust for the Universities of
Scotland and the Nuffield Foundation for funding the work
undertaken in the form of two undergraduate research
bursaries. More detailed information on the prediction
algorithm will be available as a technical report.
REFERENCES
1. Andreasen, P.N., Waller, A., Gregor, P. BlissWord - Full
access to Blissymbols for all users, in: Proceedings of
the 8th Biennial Conference of the International Society
for Augmentative and Alternative Communication
(Dublin, Ireland, August 1998), ISAAC, 167-168.
2. Claypool, T., Ricketts, I.W., Gregor, P., Booth, L.,
Palazuelos, S. Learning rates of a tri-gram based Gaelic
word predictor, in: Proceedings of the 8th Biennial
Conference of the International Society for
Augmentative and Alternative Communication (Dublin,
Ireland, August 1998), ISAAC, 177-178.
3. Higginbotham, D.J. Evaluation of keystroke savings
across four assistive communication technologies.
Augmentative and Alternative Communication, 8,
(1992), 158-272.
4. Hunnicutt, S. Bliss Symbol-to-Speech Conversion:
'Blisstalk'. Journal of the American Voice I/O Society, 3,
(June, 1996).
5. McNaughton, S. Communication with Blissymbolics.
BCI, Toronto, Canada, 1986.
6. Newell, A.F., Arnott, J.L., Booth, L., Beattie, W. Effect
of the "PAL" word prediction system on the quality and
quantity of text generation. Augmentative and
Alternative Communication, 8, (1992), 304-311.
7. Reich, P. VOICI: A voice output intelligent
communication interface. Augmentative and Alternative
Communication, 6, (1990), 104.
8. Waller, A., McNaughton, S., Koerselman, E., Jennische,
M., Nelms, G. Blissymbolics – The emergence of a
written language, in: Proceedings of the 9th Biennial
Conference of the International Society for
Augmentative and Alternative Communication
(Washington, DC, USA, August 2000), ISAAC, 364-365.