Master's thesis, Helsinki University of Technology, 2007, 66 pp.
The duration of phones plays a significant part in the comprehension of speech. Finnish, for example, has several word pairs that are distinguished mainly by the duration of their phones. In automatic speech recognition, it is important to detect these differences. Modern speech recognition systems, however, use hidden Markov models (HMMs), which model phone durations poorly because of their intrinsic model assumptions. This thesis studied how the acoustic models of a speech recognition system could be improved to handle phone durations more effectively and thereby improve speech recognition accuracy.
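A commonly cited example of such an intrinsic assumption is that a standard HMM state with a constant self-transition probability implies a geometric distribution over the number of frames spent in that state, so the most likely duration is always a single frame. The following is a minimal sketch of that property, not the thesis's own formulation; the function names and the example probability are illustrative assumptions.

```python
def geometric_duration_pmf(self_loop_prob: float, duration: int) -> float:
    """Probability of staying exactly `duration` frames in one HMM state.

    Assumes a constant self-transition probability, as in a standard HMM:
    P(d) = a^(d-1) * (1 - a) for d >= 1.
    """
    if duration < 1:
        return 0.0
    return self_loop_prob ** (duration - 1) * (1.0 - self_loop_prob)


def expected_duration(self_loop_prob: float) -> float:
    """Mean number of frames spent in the state: 1 / (1 - a)."""
    return 1.0 / (1.0 - self_loop_prob)


if __name__ == "__main__":
    a = 0.75  # hypothetical self-transition probability
    for d in range(1, 6):
        print(d, geometric_duration_pmf(a, d))
    print("expected duration:", expected_duration(a))
```

Because the probability decreases monotonically with duration, a single state cannot express a preference for, say, eight frames over two, which is one reason explicit duration models are considered.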
Three different techniques for including improved phone duration models in hidden Markov models were studied. The thesis includes a theoretical study of the techniques, which were also implemented in the speech recognition system developed at the Laboratory of Computer and Information Science. An overview of the system is included in the thesis. Using the speech recognition system, experiments were carried out to compare the usefulness of the techniques. The best technique achieved about 8% relative improvement in the letter error rate, which shows that improved modeling of phone durations can benefit automatic speech recognition.
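To make the reported metric concrete, the sketch below computes a letter error rate and a relative improvement. It assumes the usual definition of letter error rate as the letter-level edit distance divided by the reference length; the abstract does not spell out the definition, and the error rates in the example are hypothetical values chosen only to produce an 8% relative change, not results from the thesis.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in letters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def letter_error_rate(ref: str, hyp: str) -> float:
    """Letter errors divided by the number of reference letters."""
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline: float, improved: float) -> float:
    """Fraction by which the error rate dropped relative to the baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # "tuuli" (wind) and "tuli" (fire) differ only in vowel length.
    print(letter_error_rate("tuuli", "tuli"))
    # Hypothetical error rates, for illustration only.
    print(relative_improvement(0.050, 0.046))
```

In written Finnish, phone length is marked by doubling the letter, so a duration error of this kind shows up directly as a letter error.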
Speech recognition experiments were carried out on Finnish material using speaker-dependent models. This minimized speaker-dependent variation in phone durations, which was necessary because various factors affect the actual duration of phones. This work accounted only for the effect of the phoneme context. In more general applications, inclusion of other factors is probably also necessary.
Acoustic Modeling
The Speech Recognition System
Duration Modeling for HMMs
Experimental Evaluation