Dissertation for the degree of Doctor of Philosophy (PhD): 6D060200 – Computer Science. — L. N. Gumilyov Eurasian National University. — Astana: 2014. — 175 p. Scientific supervisors: Doctor of Technical Sciences, Professor A. A. Sharipbay; Doctor of Physical and Mathematical Sciences, Professor V. Yu. Shelepov. The aim of this dissertation is a comprehensive study and...
Doctoral thesis for the degree of PhD: 6D070400 – Computing Systems and Software. — Suleyman Demirel University. — Kaskelen: 2020. — 111 p. Scientific supervisors: Assoc. Prof. Kanat Kozhakmet, PhD; Paulo Menezes, PhD, Professor at the University of Coimbra (Portugal). Relevance: Emotions play a significant role in interpersonal human interactions and relationships. Emotion affects our...
Karlsruher Institut für Technologie, 2014. — 256 p. This thesis aims at enhancing and improving myoelectric Silent Speech recognition. Based on a standard speech recognition toolchain, we systematically develop methods and algorithms to adapt these components in a way specifically suited for the EMG signal. While our main goal is to improve the recognition accuracy of the Silent...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences. Specialty: 05.12.04 Radio Engineering, including Television Systems and Devices. — Vladimir, YarGU, 2011. — 20 p. Speech recognition systems are currently becoming increasingly widespread, especially in applications where spoken dialogue is the most convenient means of control and exchange...
Dissertation. — Universitat Politècnica de Catalunya, 1985. — 250 p. There has been a substantial interest in the last few decades in the problem of training computers to recognize human speech. In spite of the concentrated efforts of conscientious teams of researchers, however, the solution remains elusive, unless the task is kept so restricted as to be uninteresting. These...
Dissertation, Universitat Politècnica de Catalunya, 2008. — 156 p. Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has seen increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the...
Dissertation for the degree of Candidate of Technical Sciences: 05.11.16 - Information-Measuring and Control Systems (instrument engineering). — Penza State University. — Penza, 2015. — 222 p. Scientific supervisor: Doctor of Technical Sciences, Professor P. P. Churakov. Introduction. Analytical review of speech command processing algorithms and voice control systems. Analysis of the subject...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences: 05.11.16 - Information-Measuring and Control Systems (instrument engineering). — Penza State University. — Penza, 2015. — 24 p. Scientific supervisor: Doctor of Technical Sciences, Professor P. P. Churakov. The aim of the dissertation research is to improve existing algorithms and develop new algorithms...
PhD dissertation. — University of Cambridge, 1996. — 146 p. HMM-based speech recognition systems have recently demonstrated impressive recognition performance. Many of these systems attempt to provide low error rates for a large range of speakers. However, the performance of these speaker independent systems is generally inferior to speaker dependent systems trained for a...
PhD dissertation. — Cambridge University, 2010. — 191 p. In recent years, systems based on support vector machines (SVMs) have become standard for speaker verification (SV) tasks. An important aspect of these systems is the dynamic kernel. These operate on sequence data and handle the dynamic nature of the speech. In this thesis a number of techniques are proposed for improving...
PhD dissertation. — Cambridge University, 1999. — 156 p. Computer-assisted language learning (CALL) systems which are able to listen to a student's speech and to judge its quality would be very valuable for foreign language teaching. However, currently it is difficult to integrate pronunciation teaching and assessment in computer-assisted language learning systems. Two major...
PhD dissertation. — Cambridge University, 2003. — 163 p. Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on using formant features. Formants are the resonant frequencies in the vocal tract which form the...
PhD dissertation. — Cambridge University, 2011. — 336 p. A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser’s distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech...
PhD dissertation. — Cambridge University, 2001. — 157 p. This dissertation details the development and evaluation of techniques to enhance speech corrupted by unknown independent additive noise when only a single microphone is available. It therefore seeks to address a deficiency of many speech enhancement systems which require a priori knowledge of the interfering noise...
PhD dissertation. — Cambridge University, 1999. — 266 p. The thesis considers a novel technique for adaptation of speaker models, called eigenvoice decomposition (ED), based around reducing the dimension of the search space of acoustic models. The technique is compared both practically and theoretically with several other adaptation techniques. The use of Principal Component...
PhD dissertation. — Cambridge University, 2013. — 266 p. The discriminative approach to speech recognition offers several advantages over the generative, such as a simple introduction of additional dependencies and direct modelling of sentence posterior probabilities/decision boundaries. However, the number of sentences that can possibly be encoded into an observation sequence...
PhD dissertation. — Cambridge University, 2014. — 221 p. Discriminative training criteria and discriminative models are two effective improvements for HMM-based speech recognition. This thesis proposed a structured support vector machine (SSVM) framework suitable for medium to large vocabulary continuous speech recognition. An important aspect of structured SVMs is the form of...
PhD dissertation. — Cambridge University, 2014. — 231 p. Model-based approaches are a powerful and flexible framework for robust speech recognition. This framework has been extensively investigated during the past decades and has been extended in a number of ways to handle distortions caused by various acoustic factors, including speaker differences, channel distortions and...
PhD dissertation. — University of Cambridge, 2015. — 259 p. In continuous speech recognition, observations are sequential data with variable length, and labels are sequences of words (or sub-words), possibly having an unbounded number of classes. It is thus impractical to robustly construct models for the whole word sequence. To address this problem, rather than treating the whole...
Springer, 2014. — 198 p. The automatic detection of people’s identity from their voices is part of modern telecommunication services. This generally requires the telephone transmission of the speech to remote servers that perform the recognition task. The transmission may introduce severe distortions that degrade the system performance and hence represents one of the major...
PhD dissertation. — Indian Institute of Technology, Madras, 2009. — 195 p. The primary mode of excitation of the vocal-tract system during speech production is due to the vibration of the vocal folds. For voiced speech, the most significant excitation takes place around the instant of glottal closure, called the epoch. The objective of this work is to extract the epoch...
PhD dissertation. — Massachusetts Institute of Technology, 1995. — 159 p. This thesis studies and interprets the inventory of acoustic events associated with the changing vocal-tract configurations that characterize fricatives preceding vowels. Theoretical considerations of the articulatory, aerodynamic and acoustic aspects of the production of fricatives provide the foundation...
PhD dissertation. — Massachusetts Institute of Technology, 1998. — 173 p. The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, fixed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be significantly improved through a...
PhD dissertation. — Massachusetts Institute of Technology, 1999. — 111 p. In this thesis, a method for designing a hierarchical speech recognition system at the phonetic level is presented. The system employs various component modules to detect acoustic cues in the signal. These acoustic cues are used to infer values of features that describe segments. Features are considered...
PhD dissertation. — Boston University, 1998. — 216 p. Acoustic modeling and analysis of speech based on phonetic features is explored in the current research for speaker-independent speech recognition. Phonetic features are minimal speech units that describe the manner and place of articulation of the sounds of a language. In this research, it is shown that phonetic features...
Master's thesis, Massachusetts Institute of Technology, 1997. — 86 p. The problem addressed by this research is the automatic construction of a model of the fundamental frequency (F0) contours of a given speaker to enable the synthesis of new contours for use in Text-to-Speech synthesis. The parametric F0 generation model designed by Fujisaki is used to analyze observed F0...
PhD dissertation. — Massachusetts Institute of Technology, 2003. — 115 p. The singing voice is the oldest and most variable of musical instruments. By combining music, lyrics, and expression, the voice is able to affect us in ways that no other instrument can. As listeners, we are innately drawn to the sound of the human voice, and when present it is almost always the focal...
Master's thesis, Massachusetts Institute of Technology, 1991. — 89 p. One of the most critical and yet unsolved problems in phonetic recognition is the transformation of the continuous speech signal to a discrete representation for accessing words in the lexicon. In order to find an efficient description of speech for recognition tasks, our research investigates the use of...
Master's thesis, Massachusetts Institute of Technology, 2004. — 187 p. Currently, most dialog systems are restricted to single user environments. This thesis aims to promote an untethered multi-person dialog system by exploring approaches to help solve the speech correspondence problem (i.e. who, if anyone, is currently speaking). We adopt a statistical framework in which this...
Master's thesis, Massachusetts Institute of Technology, 2008. — 135 p. This thesis addresses the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological modeling studies implicating the use of temporal changes in speech by...
PhD dissertation. — Massachusetts Institute of Technology, 2000. — 200 p. Lexical Access From Features (LAFF) is a proposed knowledge-based speech recognition system which uses landmarks to guide the search for distinctive features. The first stage in LAFF must find Vowel landmarks. This task is similar to automatic detection of syllable nuclei (ASD). This thesis adapts and...
PhD dissertation. — Massachusetts Institute of Technology, 1999. — 226 p. This thesis links processing in working memory to prosody in speech, and links different working memory capacities to different prosodic styles. It provides a causal account of prosodic differences and an architecture for reproducing them in synthesized speech. The implemented system mediates text-based...
PhD dissertation. — Massachusetts Institute of Technology, 2014. — 188 p. The ability to infer linguistic structures from noisy speech streams seems to be an innate human capability. However, reproducing the same ability in machines has remained a challenging task. In this thesis, we address this task, and develop a class of probabilistic models that discover the latent...
PhD dissertation. — Purdue University, 2013. — 146 p. The areas of mispronunciation detection (or accent detection more specifically) within the speech recognition community are receiving increased attention now. Two application areas, namely language learning and speech recognition adaptation, are largely driving this research interest and are the focal points of this work....
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2015. — 124 p. Speaker diarization is the task of identifying who spoke when in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostical studies on state-of-the-art diarization systems have isolated three main issues with the...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2015. — 128 p. Automatic processing of multiparty interactions is a research domain with important applications in content browsing, summarization and information retrieval. In recent years, several works have been devoted to finding regular patterns which speakers exhibit in a multiparty interaction, also known as...
Master's thesis, IDIAP Research Institute, 2015. — 45 p. The general subject of this work is to present mathematical methods encountered in automatic speech recognition (ASR). Learning, evaluation and decoding problems are important parts in ASR and need hidden Markov models to solve them. These processes are explained in the first chapter after some basic definitions. Because...
Springer, 1991. — 402 p. Today there is a great deal of interest and excitement in the investigation of artificial neural networks. Yet, when things sort themselves out, neural networks will do less than their most fervent supporters in their most enthusiastic moments suggest. But they will do more than the most pessimistic estimates of their most adamant detractors. We will...
PhD dissertation. — University of Cambridge, 2001. — 127 p. The work in this thesis concerns Named Entity (NE) recognition from speech and its use in the generation of enhanced speech recognition output with automatic punctuation and automatic capitalisation. A method for the automatic generation of rules is proposed for NE recognition. Punctuation marks are generated using...
PhD dissertation. — University of York, 2014. — 323 p. The research presented in this thesis examines the calculation of numerical likelihood ratios using phonetic and linguistic parameters derived from a corpus of recordings of speakers of Southern Standard British English. The research serves as an investigation into the development of the numerical likelihood ratio as a...
PhD dissertation. — University of Washington, 2008. — 149 p. Increasing amounts of easily available electronic data are precipitating a need for automatic processing that can aid humans in digesting large amounts of data. Speech and video are becoming an increasingly significant portion of on-line information, from news and television broadcasts, to oral histories, on-line...
PhD dissertation. — Stanford University, 2006. — 202 p. In a natural environment, speech often occurs simultaneously with acoustic interference. Many applications, such as automatic speech recognition and telecommunication, require an effective system that segregates speech from interference in the monaural (one-microphone) situation. While this task of monaural speech...
PhD dissertation. — Stanford University, 1985. — 155 p. This thesis is concerned with how a person can listen to one person speaking in the presence of an interfering talker using a monaural recording of the conversation. Of course people have two ears, and the directional capabilities that a person gains from using two ears to focus on one talker are very important. However,...
PhD dissertation. — Griffith University, Brisbane, Australia, 2005. — 208 p. Incorporating information from the short-time phase spectrum into a feature set for automatic speech recognition (ASR) may possibly serve to improve recognition accuracy. Currently, however, it is common practice to discard this information in favour of features that are derived purely from the...
Master's thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2009. — 91 p. The classical front-end analysis in speech recognition is a spectral analysis which produces feature vectors consisting of mel-frequency cepstral coefficients (MFCC). MFCC are based on a standard power spectrum estimate which is first subjected to a log-based transform of the...
PhD dissertation. — University of Miami, 2009. — 169 p. Emotion conveys the psychological state of a person. It is expressed by a variety of physiological changes, such as changes in blood pressure, heart beat rate, degree of sweating, and can be manifested in shaking, changes in skin coloration, facial expression, and the acoustics of speech. This research focuses on the...
PhD dissertation. — Queen’s University, Kingston, Ontario, Canada, 2009. — 126 p. Automatic recognition of human emotion in speech aims at recognizing the underlying emotional state of a speaker from the speech signal. The area has received rapidly increasing research interest over the past few years. However, designing powerful spectral features for high-performance speech...
PhD dissertation. — Queensland University of Technology, Australia, 2005. — 248 p. Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword...
PhD dissertation. — University of Szeged, Hungary, 2010. — 153 p. From the very beginning of speech recognition technology, two aspects proved to be very important, perhaps the two most important ones. The first one was a goal: recognizing the spoken word or sentence as accurately as possible evidently has a high focus, as this is the purpose of the whole speech...
PhD dissertation. — Carnegie Mellon University, 1990. — 153 p. This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different...
PhD dissertation. — University of California, Berkeley, 2004. — 100 p. From cell phones and PDAs to huge automated call centers, speech recognition is becoming more and more ubiquitous. As demand for automatic speech recognition (ASR) applications increases, so too does the need to run ASR algorithms on a variety of unconventional computer architectures. One such architecture...
Dissertation for the degree of Candidate of Technical Sciences: 05.13.11. — St. Petersburg Institute for Informatics and Automation. — St. Petersburg: 2011. — 137 p. The aim of the dissertation is to develop methods, algorithms, and software tools for acoustic-phonetic modeling of word pronunciation variability and for syntactic-statistical language modeling, in order to improve the accuracy of recognition of conversational...
PhD dissertation. — Carnegie Mellon University, 2013. — 145 p. Speech is one of the most private forms of personal communication. A sample of a person’s speech contains information about the gender, accent, ethnicity, and the emotional state of the speaker apart from the message content. Speech processing technology is widely used in biometric authentication in the form of...
PhD dissertation. — Brno University of Technology, 2012. — 133 p. Statistical language models are a crucial part of many successful applications, such as automatic speech recognition and statistical machine translation (for example, the well-known Google Translate). Traditional techniques for estimating these models are based on N-gram counts. Despite known weaknesses of N-grams and...
PhD dissertation. — University of Florida, 2012. — 207 p. Advanced signal processing techniques can help us analyze signals of interest and perform appropriate operations on them for many useful applications. In this dissertation, we aim at developing signal processing techniques for speaker recognition (e.g. feature extraction, classifier design) and for...
PhD dissertation. — Columbia University, 2011. — 190 p. A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences...
PhD dissertation. — Universitat Pompeu Fabra, 2009. — 103 p. The aim of a speech emotion recognizer is to produce an estimate of the emotional state of the speaker given a speech fragment as an input. In other words we seek a solution for the tricky problem: given a speech fragment how to know what the speaker is feeling, even if she did not intend us to know that. Speech...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences, UlSTU, Ulyanovsk, 2006. - 19 p.
Specialty - 05.13.18 Mathematical Modeling, Numerical Methods, and Software Packages
Scientific supervisor - Doctor of Technical Sciences, Head of the CAD Department at UlSTU, Professor V. R. Krasheninnikov
The aim of the dissertation is to develop effective methods...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences. UlSTU, Ulyanovsk, 2007. - 19 p.
Specialty - 05.13.18 Mathematical Modeling, Numerical Methods, and Software Packages
Scientific supervisor - Doctor of Technical Sciences, Professor, Head of the CAD Department at UlSTU, V. R. Krasheninnikov
The aim of the dissertation is to develop methods, algorithms, and...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences. UlSTU, Ulyanovsk, 2008. - 19 p.
Specialty - 05.13.18 Mathematical Modeling, Numerical Methods, and Software Packages
Scientific supervisor - Doctor of Technical Sciences, Professor V. R. Krasheninnikov
The aim of the dissertation is to develop effective algorithms for detecting the boundaries of RA on...
PhD dissertation. — Brown University, 2007. — 139 p. Talker recognition and microphone arrays have each been widely studied individually. The problem of distant-talking speech recognition using microphone arrays has become a topic of an increasing number of research papers recently. However, the problem of distant-talking speaker recognition is receiving much less attention. In...
PhD dissertation. — Brown University, 2000. — 122 p. A combination of microphone arrays and sophisticated signal processing has been applied to the remote acquisition of high-quality speech audio. These applications all exploit the spatial filtering ability of an array, which allows the speech signal from one talker to be enhanced as the signals from other talkers and unwanted...
PhD dissertation. — Brown University, 2007. — 131 p. The problem addressed is the real-time labeling of talker identity for conversational speech from several talkers moving freely around a conference-sized room. Because the number of talkers and the identities of the talkers are unknown prior to system startup, labeling consists of marking each speech interval with a unique...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2005. — 242 p. Nowadays, state-of-the-art automatic speaker recognition systems show very good performance in discriminating between voices of speakers under controlled recording conditions. However, the conditions in which recordings are made in investigative activities (e.g., anonymous calls and wire-tapping)...
PhD dissertation. — University of Illinois, 1999. — 119 p. The standard hidden Markov model (HMM) has been proved to be the most successful model for speech recognition. A most widely addressed problem of the HMM is the assumption of independent observations given the state sequence. In the past few years, a wide range of state-space models and graphical models, such as...
PhD dissertation. — University of Cambridge, 2004. — 157 p. Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated...
PhD dissertation. — Helsinki University of Technology, 2009. — 66 p. Automatic speech recognition systems are devices or computer programs that convert human speech into text or take actions based on what is said to the system. Typical applications include dictation, automatic transcription of large audio or video databases, speech-controlled user interfaces, and automated...
PhD dissertation. — Queensland University of Technology, 2010. — 237 p. Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In noise-free environments, word recognition performance of...
PhD dissertation. — University of Illinois, 2010. — 131 p. The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2010. — 214 p. Conventional speech recognition systems are based on Gaussian hidden Markov models (HMMs). Discriminative techniques such as log-linear modeling have been investigated in speech recognition only recently. This thesis establishes a log-linear modeling framework in the context of discriminative...
PhD dissertation. — Massachusetts Institute of Technology, 2009. — 185 p. Despite the proliferation of speech-enabled applications and devices, speech-driven human-machine interaction still faces several challenges. One of these issues is the new word or the out-of-vocabulary (OOV) problem, which occurs when the underlying automatic speech recognizer (ASR) encounters a word it...
PhD dissertation. — Johns Hopkins University, 2009. — 317 p. The output of a speech recognition system is often not what is required for subsequent processing, in part because speakers themselves make mistakes (e.g. stuttering, self-correcting, or using filler words). A system would accomplish speech reconstruction of its spontaneous speech input if its output were to...
PhD dissertation. — University of Cambridge, 2009. — 206 p. State-of-the-art speech recognition systems are based on statistical techniques and use hidden Markov models (HMMs) as acoustic models. These acoustic models are trained from a large amount of speech data usually collected from a large number of speakers and in different acoustic environments. The training data...
PhD dissertation. — Massachusetts Institute of Technology, 2009. — 164 p. This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. These classes are formed by grouping together parts of the acoustic signal that have...
PhD dissertation. — Massachusetts Institute of Technology, 2009. — 108 p. While automatic speech recognition (ASR) systems have steadily improved and are now in widespread use, their accuracy continues to lag behind human performance, particularly in adverse conditions. This thesis revisits the basic acoustic modeling assumptions common to most ASR systems and argues that...
Master's thesis, Massachusetts Institute of Technology, 2009. — 90 p. This research explores applications of joint letter-phoneme subwords, known as graphones, in several domains to enable detection and recognition of previously unknown words. For these experiments, graphone models are integrated into the SUMMIT speech recognition framework. First, graphones are applied to...
Master's thesis, Massachusetts Institute of Technology, 2008. — 75 p. Efficient error correction of recognition output is a major barrier in the adoption of speech interfaces. This thesis addresses this problem through a novel correction framework and user interface. The system uses constraints provided by the user to enhance re-recognition, correcting errors with minimal user...
PhD dissertation. — University of Toronto, 2008. — 269 p. Robust speech recognition in acoustic environments that contain multiple speech sources and/or complex non-stationary noise is a difficult problem, but one of great practical interest. The formalism of probabilistic graphical models constitutes a relatively new and very powerful tool for better understanding and...
PhD dissertation. — Západočeská univerzita v Plzni, 2008. — 125 p. This thesis deals with the problem of building language models for automatic continuous speech recognition of inflectional languages. Impressive progress has been made in large vocabulary continuous speech recognition in the last decades. However, recognition systems for English perform noticeably better than the other,...
PhD dissertation. — Technische Universität Berlin, 2008. — 221 p. It has long been a dream of many to be able to speak to a computer and be understood. Whereas this dream will remain in the realm of fantasy for a while, there are some applications which appear worthwhile as well as achievable. One of those is speech recognition in car environments, useful, as it may be used to...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2008. — 178 p. The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model / artificial neural network (HMM/ANN) and Tandem are the most successful examples of such systems. In this thesis, we present a principled framework for...
PhD dissertation. — Universiteit Twente, 2008. — 184 p. In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions of the audio to be...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2008. — 159 p. In this thesis, we investigate the use of posterior probabilities of sub-word units directly as input features for automatic speech recognition (ASR). These posteriors, estimated from data-driven methods, display some favourable properties such as increased speaker invariance, but unlike conventional...
PhD dissertation. — Universitat Politècnica de Catalunya, 2006. — 348 p. This PhD thesis verses about the topic of speaker diarization for meetings. While answering to the question ``Who spoke when?'', the presented speaker diarization system is able to process a variable number of microphones spread around the meeting room and determine the optimum output without any prior...
PhD dissertation. — Carnegie Mellon University, 2007. — 177 p. The automatic speaker recognition technologies have developed into more and more important modern technologies required by many speech-aided applications. The main challenge for automatic speaker recognition is to deal with the variability of the environments and channels from where the speech was obtained. In...
PhD dissertation. — Hebrew University, 2007. — 110 p. Automatic speech recognition has long been considered a dream. While ASR does work today, and it is commercially available, it is extremely sensitive to noise, talker variations, and environments. The current state-of-the-art automatic speech recognizers are based on generative models that capture some temporal dependencies...
PhD dissertation. — University of Cambridge, 2007. — 181 p. It is well known that the performance of automatic speech recognition degrades in noisy conditions. To address this, typically the noise is removed from the features or the models are compensated for the noise condition. The former is usually quite efficient, but not as effective as the latter, often computationally...
PhD dissertation. — École Nationale Supérieure des Télécommunications, 2007. — 178 p. Speech is one of the most natural ways of communication for human beings. The task of extracting the intended message content from the signal is automatic speech recognition (ASR). Since human speech carries not only linguistic information but also personal information such as the...
PhD dissertation. — University of Pennsylvania, 2007. — 133 p. Automatic speech recognition (ASR) depends critically on building acoustic models for linguistic units. These acoustic models usually take the form of continuous-density hidden Markov models (CD-HMMs), whose parameters are obtained by maximum likelihood estimation. Recently, however, there has been growing interest...
PhD dissertation. — University of Missouri-Columbia, 2007. — 101 p. In this dissertation work, new approaches are proposed for online large vocabulary conversational speech recognition, including a fast confusion network algorithm for aligning competing word hypotheses, novel features and a Random Forests based classifier for word confidence annotation, new improvements in...
Dissertation, St. Petersburg Institute for Informatics and Automation, 2007, 176 p. The main goal of this dissertation is to develop a model for speaker-independent recognition of continuous Russian speech with a large vocabulary that speeds up speech processing while preserving recognition accuracy. To achieve this goal, in the course of the dissertation...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2006. — 156 p. In this thesis, the use of multiple acoustic features of the speech signal is considered for speech recognition. The goals of this thesis are twofold: on the one hand, new acoustic features are developed, on the other hand, feature combination methods are investigated in order to find an effective...
PhD dissertation. — University of Cambridge, 2006. — 194 p. In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it...
PhD dissertation. — University of Cambridge, 2006. — 176 p. The most extensively and successfully applied acoustic model for speech recognition is the Hidden Markov Model (HMM). In particular, a multivariate Gaussian Mixture Model (GMM) is typically used to represent the output density function of each HMM state. For reasons of efficiency, the covariance matrix associated with...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2006. — 157 p. In this work a number of novel techniques for improved treatment of spontaneous speech variabilities in large vocabulary automatic speech recognition are developed and evaluated on US English conversational speech and spontaneous medical dictations. Two main aspects of spontaneous speech modeling...
PhD dissertation. — Massachusetts Institute of Technology, 2006. — 176 p. We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a pre-specified inventory of lexical units (i.e. phones or...
PhD dissertation. — Kungliga Tekniska högskolan, Stockholm, 2006. — 350 p. Speaker verification is the biometric task of authenticating a claimed identity by means of analyzing a spoken sample of the claimant's voice. The present thesis deals with various topics related to automatic speaker verification (ASV) in the context of its commercial applications, characterized by...
PhD dissertation. — Technische Universität München, 2006. — 132 p. This thesis presents a system for the interpretation of natural speech which serves as input module for a spoken dialog system. It carries out the task of extracting application-specific pieces of information from the user utterance in order to pass them to the control module of the dialog system. By following...
PhD dissertation. — Massachusetts Institute of Technology, 2006. — 127 p. In this thesis, we have focused on improving the acoustic modeling of speech recognition systems to increase the overall recognition performance. We formulate a novel multi-stream speech recognition framework using multi-tape finite-state transducers (FSTs). The multi-dimensional input labels of the...
PhD dissertation. — Johns Hopkins University, 2005. — 172 p. Automatic Speech Recognition (ASR) is a sequential pattern recognition problem in which the patterns to be hypothesized are words while the evidence presented to the recognizer is the acoustics of a spoken utterance. Given an acoustic signal, a speech recognizer attempts to classify it as the sequence of words that...
PhD dissertation. — Massachusetts Institute of Technology, 2005. — 123 p. Automatic speech recognition (ASR) is a process of applying constraints, as encoded in the computer system (the recognizer), to the speech signal until ambiguity is satisfactorily resolved to the extent that only one sequence of words is hypothesized. Such constraints fall naturally into two categories....
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2003. — 199 p. In this work, the application of across-word phoneme models during large vocabulary continuous speech recognition is studied. A recognition system will be developed which allows for the training of high performance across-word phoneme models, the efficient application of these across-word phoneme...
Master's thesis, Massachusetts Institute of Technology, 2004. — 105 p. This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar...
PhD dissertation. — University of Cambridge, 2003. — 172 p. This thesis investigates the use of discriminative criteria for training HMM parameters for speech recognition, in particular the Maximum Mutual Information (MMI) criterion and a new criterion called Minimum Phone Error (MPE). Investigations are conducted into the practical issues relating to the use of MMI for speech...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2005. — 172 p. This thesis deals with linear transformations at various stages of the automatic speech recognition process. In current state-of-the-art speech recognition systems linear transformations are widely used to care for a potential mismatch of the training and testing data and thus enhance the...
PhD dissertation. — Carnegie Mellon University, 2004. — 101 p. Accurate recognition of spontaneous speech is one of the most difficult problems in speech recognition today. When speech is produced in a carefully planned manner, automatic speech recognition (ASR) systems are very successful at accurate recognition and transcription. In response to casual speech, ASR systems produce...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2003. — 158 p. In this work, normalization techniques in the acoustic feature space are studied which improve the robustness of automatic speech recognition systems. It is shown that there is a fundamental mismatch between training and test data which causes degraded recognition performance. Adaptation and...
PhD dissertation. — Faculté Polytechnique de Mons, 2004. — 143 p. Confidence measures for the results of speech/speaker recognition make the systems more useful in real-time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system. Speech/speaker recognition systems are usually...
PhD dissertation. — University of California, 2003. — 178 p. This work concerns the automatic speech recognition (ASR) problem, which roughly speaking, consists in converting digitized speech into text. More specifically, we study front ends and acoustic modeling, which together with language modeling and search, constitute a typical ASR system. The main approach is to...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2004. — 170 p. This work describes an algorithm to increase the noise robustness of automatic speech recognition systems. In many practical applications recognition systems have to work in adverse acoustic environment conditions. Distortions and noises caused by the transmission are typical for telephone...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2005. — 196 p. Standard hidden Markov model (HMM) based automatic speech recognition (ASR) systems usually use cepstral features as acoustic observations and phonemes as subword units. The speech signal exhibits a wide range of variability, for example due to environmental variation and speaker variation. This leads to different...
Master's thesis, Middle East Technical University, 2003. — 115 p. This study aims to build a new language model that can be used in a Turkish large vocabulary continuous speech recognition system. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages such as Turkish, the vocabulary size is far from being acceptable....
PhD dissertation. — Swiss Federal Institute of Technology Lausanne, 2005. — 123 p. The goal of the thesis is to investigate different approaches that combine and integrate Automatic Speech Recognition (ASR) and Speaker Recognition (SR) systems, with applications to (1) User-Customized Password Speaker Verification (UCP-SV) systems, and (2) joint speech and speaker...
Master's thesis, Mississippi State University, 2003. — 80 p. Spoken language processing is one of the oldest and most natural modes of information exchange between human beings. For centuries, people have tried to develop machines that can understand and produce speech the way humans do so naturally. The biggest problem in our inability to model speech with computer programs...
PhD dissertation. — Carnegie Mellon University, 1995. — 190 p. This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their...
PhD dissertation. — Massachusetts Institute of Technology, 2005. — 140 p. Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech. One approach to...
PhD dissertation. — University of Cambridge, 2005. — 158 p. Selecting the optimal model structure with the "appropriate" complexity is a standard problem for training large vocabulary continuous speech recognition (LVCSR) systems, and machine learning in general. State-of-the-art LVCSR systems are highly complex. A wide variety of techniques may be used which alter the system...
PhD dissertation. — Purdue University, 2004. — 253 p. Although speech recognition technology has significantly improved during the past few decades, current speech recognition systems output only a stream of words without providing other useful structural information that could aid a human reader and downstream language processing modules. This thesis research focuses on the...
PhD dissertation. — Mississippi State University, 2002. — 200 p. Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling which can often be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm...
PhD dissertation. — Johns Hopkins University, 2000. — 117 p. This thesis explores new ways of utilizing the information existing in word lattices produced by speech recognition systems to improve the accuracy of the recognition output and obtain a more perspicuous representation of a set of alternative hypotheses. We change the standard problem formulation of searching among a...
Master's thesis, Massachusetts Institute of Technology, 2002. — 79 p. This thesis is concerned with improving the performance of speaker recognition systems in three areas: speaker modeling, verification score computation, and feature extraction in telephone quality speech. We first seek to improve upon traditional modeling approaches for speaker recognition, which are based on...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2002. — 191 p. In this thesis, the use of word posterior probabilities for large vocabulary continuous speech recognition is investigated in a unified, statistical framework. The word posterior probabilities are directly derived from the sentence posterior probabilities which are an essential part of Bayes’...
PhD dissertation. — Johns Hopkins University, 2002. — 174 p. In this thesis, we have studied how to use non-local dependencies to improve the performance of language models and how to combine useful information obtained from different sources together in one framework using maximum entropy approaches. We have presented fast training methods to solve the problem of heavy...
PhD dissertation. — Katholieke Universiteit Nijmegen, 2002. — 149 p. Speech is variable. The way in which a sound, word or sequence of words is pronounced can be different every time it is produced (Strik and Cucchiarini 1999). This pronunciation variation can be the result of: Intra-speaker variability: the variation in pronunciation for one and the same speaker. Inter-speaker...
PhD dissertation. — Mississippi State University, 2002. — 85 p. The prominent modeling technique for speech recognition today is the hidden Markov model with Gaussian emission densities. However, they suffer from an inability to learn discriminative information. Artificial neural networks have been proposed as a replacement for the Gaussian emission probabilities under the belief that...
PhD dissertation. — Massachusetts Institute of Technology, 2002. — 178 p. Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface,...
PhD dissertation. — The University of British Columbia, 2002. — 119 p. Modern speech synthesizers use concatenated words and sub-word segments, such as diphones, to synthesize natural speech. Synthesizers available today can synthesize speech with only a limited selection of voices provided by the vendors. The voice segments (e.g. words & diphones) are often created using...
PhD dissertation. — Massachusetts Institute of Technology, 2001. — 191 p. The general goal of this thesis is to model the prosodic aspects of speech to improve human-computer dialogue systems. Towards this goal, we investigate a variety of ways of utilizing prosodic information to enhance speech recognition and understanding performance, and address some issues and difficulties...
PhD dissertation. — Massachusetts Institute of Technology, 2002. — 153 p. This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer...
Master's thesis, Massachusetts Institute of Technology, 2000. — 65 p. The thesis discusses the development and evaluation of a word spotting understanding system within a spoken language system. A word spotting understanding server was implemented within the GALAXY [4] architecture using the JUPITER [3] weather information domain. Word spotting was implemented using a simple...
PhD dissertation. — Katholieke Universiteit Leuven, 2001. — 197 p. The task of a speech recogniser is to transcribe human speech into text. To do so, modern recognisers rely firmly on the principles of statistical pattern recognition. This statistical framework allows the problem of speech recognition to be decomposed into a set of well-defined sub-tasks, namely the extraction...
PhD dissertation. — University of Cambridge, 2001. — 136 p. Most modern automatic speech recognition systems make use of acoustic models based on hidden Markov models. To obtain reasonable recognition performance within a large vocabulary framework, the acoustic models usually include a pronunciation model, together with complex parameter tying schemes. In many cases the...
PhD dissertation. — Mississippi State University, 2000. — 49 p. Progress on speech recognition technology has been impressive. There are now commercial products that allow automatic dictation, telephone voice interfaces, and voice activated appliances...
PhD dissertation. — Purdue University, 2000. — 223 p. Some of the major research issues in the field of speech recognition revolve around methods of incorporating additional knowledge sources, beyond the short-time spectral information of the speech signal, into the recognition process. These knowledge sources, which may include information about prosody, language structure,...
Master's thesis, Temple University, 2001. — 41 p. Co-channel speech occurs when one speaker’s speech is corrupted by another speaker’s speech. Speech recognition systems, speaker identification systems, speech coding systems, gisting and natural language processing systems work on the basis that there is only one speaker’s speech. If there is more than one speaker (co-channel...
PhD dissertation. — University of Cambridge, 2001. — 134 p. Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech...
PhD dissertation. — Johns Hopkins University, 2000. — 143 p. Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. For this reason, pronunciation modeling has received considerable attention in recent automatic speech recognition literature. Most of the attention...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2000. — 155 p. In this work, a framework for efficient discriminative training and modeling is developed and implemented for both small and large vocabulary continuous speech recognition. Special attention will be directed to the comparison and formalization of varying discriminative training criteria and...
PhD dissertation. — University of Cambridge, 2000. — 141 p. This dissertation concerns the development of statistical language models for use in automatic speech recognition systems. Natural language, which is a complex and variable phenomenon, has been shown to be modelled best using statistical language models. Large training corpora (comprising around one hundred million...
PhD dissertation. — Massachusetts Institute of Technology, 1998. — 181 p. This dissertation addresses the independence of observations assumption which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations which are known to exist. The assumption clearly damages the recognition ability of standard speaker...
PhD dissertation. — Mississippi State University, 1999. — 95 p. The ability to correctly pronounce names of entities, such as people, places and organizations, is a critical component of effective verbal communication. In many situations, such as looking up information on a person or a place (e.g. airline reservations, directory assistance etc.), it is customary to alternate...
PhD dissertation. — Boston University, 1997. — 186 p. The goal of this dissertation is to develop effective strategies for the adaptation of acoustic parameters for a large vocabulary continuous speech recognition (LVCSR) system from a small amount of speech. Typically this implies adapting a system characterized by millions of parameters from a few minutes of speech. This is...
PhD dissertation. — McGill University, 1991. — 169 p. Hidden Markov Models (HMMs) are one of the most powerful speech recognition tools available today. Even so, the inadequacies of HMMs as a "correct" modeling framework for speech are well known. In that context, we argue that the maximum mutual information estimation (MMIE) formulation for training is more appropriate...
PhD dissertation. — Carnegie Mellon University, 1994. — 114 p. Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model’s parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other...
PhD dissertation. — University of Cambridge, 1995. — 175 p. Hidden Markov models (HMMs) have been used successfully for speech recognition for many years. However, in some respects the assumptions behind HMM models are poor. HMMs model only the within-class data and no attempt is made at discriminating between classes. This is a problem, especially in speaker independent...
PhD dissertation. — Carnegie Mellon University, 1996. — 113 p. Speech recognition systems suffer from degradation in recognition accuracy when faced with input from noisy and reverberant environments. While most users prefer a microphone that is placed in the middle of a conference table, on top of a computer monitor, or mounted in a wall, the recognition accuracy obtained with...
PhD dissertation. — University of Cambridge, 1995. — 170 p. Conventional speech recognition systems require information from two knowledge sources - a family of acoustic models and a language model. The acoustic models incorporate knowledge extracted from the speech waveform and they are commonly based on hidden Markov models (HMMs). HMMs have been used successfully for speech...
PhD dissertation. — Universität Oldenburg, 2003. — 205 p. Why are even the most advanced computers not able to understand speech nearly half as well as human beings? Even though the rapidly growing performance of microprocessors has enabled speech technology to exhibit major, revolutionary advancements within the last decades, we still are not able to communicate with a...
PhD dissertation. — University of Cambridge, 1998. — 97 p. Most modern speech recognition systems are based on hidden Markov models. Yet despite their widespread use many of their properties are not well understood. This work aims to increase our understanding about the training of hidden Markov models for classification. We first examine the question of what is the best...
Master's thesis, Carnegie Mellon University, 1995. — 40 p. This report describes a series of experiments that measure speech rate and that attempt to improve speech recognition accuracy for rapidly-spoken speech. Descriptions of several measures of speech rate are presented, with their advantages and disadvantages. Speech recognition results obtained using several...
PhD dissertation. — Cambridge University, 1996. — 199 p. The past 15 years have seen dramatic improvements in the performance of computer algorithms which attempt to recognise human speech. The falling error rates achieved by the best speech recognition systems on limited tasks have recently enabled the development of a diverse range of applications which promise to have a sign...
PhD dissertation. — Massachusetts Institute of Technology, 1995. — 173 p. This thesis is directed toward the characterization of the problem of new out-of-vocabulary words for continuous-speech recognition and understanding. It is motivated by the belief that this problem is critical to the eventual deployment of the technology and that a thorough understanding of the problem...
PhD dissertation. — Daimler-Benz AG, 1998. — 181 p. This thesis deals with the problem of Out-Of-Vocabulary words in speech recognition. The standard response of speech recognition systems whenever they encounter such OOV words is to (silently) misrecognize them without issuing any warning to the user. In order to avoid this undesired behaviour, two different strategies are...
PhD dissertation. — Cambridge University, 1996. — 202 p. This dissertation investigates some aspects of speech processing using linear models and single hidden layer neural networks. The study is divided into two parts which focus on speech modelling and speech classification respectively. The first part of the dissertation examines linear and nonlinear vocal tract models for...
PhD dissertation. — University of Cambridge, 1995. — 146 p. In recent years considerable progress has been made in the field of continuous speech recognition where the predominant technology is based on hidden Markov models (HMMs). HMMs represent sequences of time varying speech spectra using probabilistic functions of an underlying Markov chain. However because the probability...
PhD dissertation. — University of Cambridge, 1995. — 132 p. This thesis details the development of a model-based noise compensation technique, Parallel Model Combination (PMC). The aim of PMC is to alter the parameters of a set of Hidden Markov Model (HMM) based acoustic models, so that they reflect speech spoken in a new acoustic environment. Differences in the acoustic...
PhD dissertation. — Université de Neuchâtel, 1998. — 228 p. Several speech processing applications such as digital hearing aids and personal communications devices are characterized by very tight requirements in power consumption, size, and voltage supply. These requirements are difficult to fulfill, given the complexity and number of functions to be implemented, together with...
Dissertation abstract (avtoreferat).
Specialties:
05.13.01 – System Analysis, Control and Information Processing
(in science and technology)
05.11.16 – Information-Measuring and Control Systems
(in industry and medicine)
The work was carried out at the State Educational Institution of Higher
Professional Education "Izhevsk State Technical
University" (GOU VPO...