Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech information in Mandarin Chinese
- ISSN: 10636676
- DOI: 10.1109/TSA.2002.802541
Abstract
With the rapidly growing use of the audio and multimedia information over the Internet, the technology for retrieving speech information using voice queries is becoming more and more important. In this paper, considering the monosyllabic structure of the Chinese language, a whole class of syllable-based indexing features, including overlapping segments of syllables and syllable pairs separated by a few syllables, is extensively investigated based on a Mandarin broadcast news database. The strong discriminating capabilities of such syllable-based features were verified by comparing with the word- or character-based features. Good approaches for better utilizing such capabilities, including fusion with the word- and character-level information and improved approaches to obtain better syllable-based features and query expressions, were extensively investigated. Very encouraging experimental results were obtained.
Berlin Chen, Hsin-min Wang, Member, IEEE, and Lin-shan Lee, Fellow, IEEE
Index Terms: Confidence measure, retrieval of speech information, syllable-based features, term association matrix.
I. INTRODUCTION
Due to the prevalence of the Internet, huge quantities of information are being accumulated very rapidly and made available to users. As a result, the primary obstacle to accessing information is no longer spatial or temporal distance, but the lack of efficient ways to retrieve the desired information. Information retrieval techniques that give users convenient access to the desired information are therefore extremely attractive [1]. Most work on information retrieval has focused on approaches that use text input to retrieve text information. Substantial efforts and very encouraging results have been reported, and practically useful systems have been successfully implemented along this direction [2]–[6]. Recently, with the advances in speech recognition technology [7]–[11], proper integration of information retrieval and speech recognition has been considered by many researchers. Most of this work, however, has addressed either text information retrieval using speech queries [12], [13] or speech information retrieval using text queries [14]–[24]. Only a few studies have considered the problem of speech information retrieval using speech queries [25], [26]. With the rapidly growing use of audio and multimedia information on the Internet, an exponentially increasing number of voice records, such as broadcast radio, television programs, and digital libraries, are now being accumulated and made available. However, most of them are simply stored and are difficult to reuse because of the lack of efficient retrieval technology. Development of the technology to retrieve speech information is thus becoming more and more important. Retrieving huge quantities of speech information directly with speech queries is apparently the most natural, convenient, and attractive approach, although the technology involved is also the most difficult. For the Chinese language, because the language is not alphabetic and there is a huge number of commonly used Chinese characters, the input of Chinese characters into computers remains a very difficult and unsolved problem even today. As a result, voice retrieval of speech information will be much more important and attractive for Mandarin Chinese than for other languages.
Manuscript received February 27, 2001; revised October 23, 2001. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. C.-C. Jay Kuo.
The authors are with the Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C. (e-mail: berlin@iis.sinica.edu.tw; lsl@iis.sinica.edu.tw; whm@iis.sinica.edu.tw).
Publisher Item Identifier 10.1109/TSA.2002.802541.
Unlike text information, speech information cannot be retrieved at all by directly comparing the input speech queries with the voice records. Not only can the vocabularies, texts, and topic domains spoken in the voice records and the speech queries be completely different, but differences in acoustic conditions such as speakers, speaking modes, and background noise add further complication. Therefore, both the speech queries and the voice records must be transcribed into some kind of content features using speech recognition techniques, based on which the relevance between the speech queries and the voice records can then be measured. As a result, accurate recognition of Mandarin speech with a high degree of variability in vocabularies, topic domains, and acoustic conditions is certainly the first key issue in the problem discussed here. Such a high degree of variability apparently makes accurate recognition very difficult, and a substantial percentage of recognition errors will inevitably occur. These recognition errors make the information retrieval techniques considered here significantly different from those used in conventional text retrieval, and a very high degree of robustness in these retrieval techniques is obviously needed.
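To make the idea of content features concrete, the following minimal Python sketch builds the two kinds of syllable-based indexing features named in the abstract (overlapping segments of syllables, and syllable pairs separated by a few syllables) from a recognized syllable sequence, and then scores a query against a record. It is an illustration under stated assumptions, not the method evaluated in this paper: the function names, the `max_n` and `max_gap` parameters, the romanized example syllables, and the use of cosine similarity over raw counts as the relevance measure are all hypothetical choices made for the example.

```python
from collections import Counter
from math import sqrt


def overlapping_segments(syllables, n):
    """Return every overlapping segment of n consecutive syllables."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def separated_pairs(syllables, gap):
    """Return syllable pairs separated by exactly `gap` intervening syllables."""
    step = gap + 1
    return [(syllables[i], syllables[i + step])
            for i in range(len(syllables) - step)]


def index_features(syllables, max_n=2, max_gap=1):
    """Count segment and pair features (max_n, max_gap are assumed defaults)."""
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(("S", n, seg) for seg in overlapping_segments(syllables, n))
    for gap in range(1, max_gap + 1):
        feats.update(("P", gap, pair) for pair in separated_pairs(syllables, gap))
    return feats


def cosine(a, b):
    """Cosine similarity of two count vectors (an assumed relevance measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical recognizer output: the query contains one misrecognized syllable.
record = ["tai", "bei", "shi", "zhang"]
query = ["tai", "bei", "si", "zhang"]
score = cosine(index_features(query), index_features(record))
```

In this toy case the query and the record still share the single syllables around the error, the segment ("tai", "bei"), and the separated pair ("bei", "zhang"), so `score` stays above zero despite the recognition error. This is one intuition for why such features may offer some robustness.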
The second issue for voice retrieval of Mandarin speech information is to choose appropriate content features to represent both the voice records and the speech queries, so that they can be used in evaluating the relevance measure in the retrieval processes [10], [15], [26]. There can be at least three classes of approaches: keyword-based, word-based, and subword-based
1063–6676/02$17.00 © 2002 IEEE


