[Colloq] PhD Thesis Proposal, Monday, Aug. 7, 3pm.

Rachel Kalweit rachelb at ccs.neu.edu
Thu Aug 3 15:46:03 EDT 2006


College of Computer and Information Science
presents
PhD Thesis Proposal Presentation
Bing Zhang
will speak on
Discriminative Feature Optimization for Speech Recognition

Monday, August 7, 2006
3:00pm
366 West Village H

Abstract

Feature extraction, whose goal is to obtain a compact and discriminative 
representation of speech data, is an important step in acoustic modeling 
of speech recognition systems. The extraction usually occurs at two 
levels. At the lower level, signal processing methods are used to 
transform raw speech signals into cepstral coefficients. Then, at the 
higher level, various feature transformations can be employed to select 
features suitable for the specific application.

In traditional feature transformation techniques, the optimization 
criteria are usually not closely related to recognition errors, hence 
the derived feature transforms are suboptimal in terms of improving the 
accuracy of the whole system.

To solve this problem, a discriminative feature optimization method is 
proposed, based on the Minimum Phoneme Error (MPE) criterion, which has 
been shown to be very well correlated with word error rate (WER). At the 
software infrastructure level, the method is implemented in terms of a 
generic feature transformation framework. Under this framework, various 
feature transforms can be trained uniformly through a back-propagation 
algorithm.

Special attention is devoted to the concept of region-dependent 
transform (RDT). The central idea behind it is to divide the acoustic 
space into multiple regions, and to use different transform functions 
for different regions. This effectively produces a powerful piecewise 
linear transformation that can be estimated more efficiently than 
general non-linear transforms (e.g. neural networks).
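
As an illustration only, the region-dependent idea can be sketched in a few 
lines of Python. This is not the proposal's implementation: the abstract does 
not say how regions are defined or what form the transforms take, so the 
sketch assumes hard assignment to the nearest centroid and one affine 
transform (A, b) per region. All names (nearest_region, affine, 
region_dependent_transform) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def nearest_region(x: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to x (squared Euclidean distance).

    Assumption: regions are defined by nearest-centroid partitioning.
    """
    def sq_dist(c: Sequence[float]) -> float:
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    return min(range(len(centroids)), key=lambda r: sq_dist(centroids[r]))


def affine(x: Sequence[float], A: Matrix, b: Vector) -> Vector:
    """Compute A @ x + b for a small dense matrix stored as nested lists."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def region_dependent_transform(
    x: Sequence[float],
    centroids: Sequence[Sequence[float]],
    transforms: Sequence[Tuple[Matrix, Vector]],
) -> Vector:
    """Apply the affine transform that belongs to the region containing x.

    Each region has its own linear map, so the overall mapping is
    piecewise linear over the feature space.
    """
    r = nearest_region(x, centroids)
    A, b = transforms[r]
    return affine(x, A, b)


# Toy example with two regions in a 2-dimensional feature space.
centroids = [[0.0, 0.0], [10.0, 10.0]]
transforms = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # region 0: identity
    ([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0]),  # region 1: scale by 2, shift by 1
]
near_origin = region_dependent_transform([1.0, 1.0], centroids, transforms)
near_far = region_dependent_transform([9.0, 9.0], centroids, transforms)
```

In this toy setup a point near the first centroid passes through unchanged, 
while a point near the second is scaled and shifted. In the proposed method 
the transform parameters would be trained discriminatively rather than set by 
hand as they are here.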

In a state-of-the-art speech recognition system, multi-pass decoding and 
multi-model adaptation are commonly used for maximizing accuracy within 
reasonable recognition run-times. In this context, practical issues have 
to be taken into consideration when the feature optimization method is 
developed. These issues include, for instance, the generalization 
problem of the feature transform in different acoustic models, and the 
problem of integrating discriminatively trained feature transforms with 
maximum-likelihood-based speaker adaptation. Experimental approaches 
will be developed in order to address these issues.

Advisor: Gene Cooperman


