Building multiple automatic speech recognition (ASR) systems and combining their outputs using voting techniques such as ROVER is an effective technique for lowering the overall word error rate. A successful system combination approach requires the construction of multiple systems with complementary errors, or the combination will not outperform any of the individual systems. In general, this is achieved empirically, for example by building systems on different input features. In this paper, we present a systematic approach for building multiple ASR systems in which the decision tree statetying procedure that is used to specify context-dependent acoustic models is randomized. Experiments carried out on two large vocabulary recognition tasks, MALACH and DARPA EARS, illustrate the effectiveness of the approach.
Mendeley saves you time finding and organizing research
Choose a citation style from the tabs below