[Colloq] PhD Thesis Proposal, Monday, Aug. 7, 3pm.
Rachel Kalweit
rachelb at ccs.neu.edu
Thu Aug 3 15:46:03 EDT 2006
College of Computer and Information Science
presents
PhD Thesis Proposal Presentation
Bing Zhang
will speak on
Discriminative Feature Optimization for Speech Recognition
Monday, August 7, 2006
3:00pm
366 West Village H
Abstract
Feature extraction, whose goal is to obtain a compact and discriminative
representation of speech data, is an important step in the acoustic
modeling of speech recognition systems. Extraction usually occurs at two
levels. At the lower level, signal processing methods transform raw
speech signals into cepstral coefficients. At the higher level, various
feature transformations can then be employed to select features suited
to the specific application.
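The lower-level step can be illustrated with a minimal sketch of the
cepstral computation for a single frame: window the frame, take its power
spectrum, apply a logarithm, and decorrelate the result with a discrete
cosine transform. This is a simplified stand-in, not the front end used in
this work: it omits pre-emphasis and the mel filterbank of standard MFCCs,
uses a naive O(N^2) DFT so that it needs only the standard library, and the
function names, the 8 kHz sample rate, and the 13-coefficient default are
illustrative assumptions.

```python
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame: List[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N/2, via a naive O(N^2) DFT (fine for one frame)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def dct2(values: List[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, keeping only the first num_coeffs outputs."""
    m = len(values)
    return [sum(v * math.cos(math.pi * q * (j + 0.5) / m) for j, v in enumerate(values))
            for q in range(num_coeffs)]


def cepstral_coefficients(frame: List[float], num_coeffs: int = 13,
                          floor: float = 1e-10) -> List[float]:
    """Hypothetical helper: window -> power spectrum -> log -> DCT.

    No pre-emphasis or mel filterbank, so these are plain linear-frequency
    cepstra rather than MFCCs. `floor` guards against log(0).
    """
    windowed = [x * w for x, w in zip(frame, hamming(len(frame)))]
    log_spec = [math.log(max(p, floor)) for p in power_spectrum(windowed)]
    return dct2(log_spec, num_coeffs)


if __name__ == "__main__":
    rate = 8000  # assumed sample rate in Hz
    frame = [math.sin(2 * math.pi * 430 * t / rate) for t in range(200)]  # one 25 ms frame
    print([round(c, 3) for c in cepstral_coefficients(frame)])
```

Scaling a frame by a constant changes only c0, since a constant offset in
the log spectrum has no projection onto the higher DCT basis functions.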
In traditional feature transformation techniques, the optimization
criteria are usually not closely related to recognition errors; hence,
the derived feature transforms are suboptimal with respect to the
accuracy of the overall system.
To address this problem, a discriminative feature optimization method is
proposed based on the Minimum Phoneme Error (MPE) criterion, which has
been shown to correlate closely with word error rate (WER). At the
software infrastructure level, the method is implemented as a generic
feature transformation framework, under which various feature transforms
can be trained uniformly through a back-propagation algorithm.
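A generic framework of this kind can be sketched, under stated
assumptions, as a chain of transforms that each expose a forward pass and a
backward pass. The backward pass receives the gradient of the training
criterion with respect to the transform's output, stores the gradients for
the transform's own parameters, and returns the gradient with respect to
its input. In the proposed method the upstream gradient would come from the
MPE criterion, which requires lattices and an acoustic model and is not
reproduced here; a squared-error placeholder stands in for it. The class
names (`Linear`, `Tanh`, `Chain`) and `placeholder_loss_grad` are
hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


class Linear:
    """Affine feature transform y = A x + b with trainable A and b."""

    def __init__(self, A: Matrix, b: Vector):
        self.A = [row[:] for row in A]
        self.b = list(b)

    def forward(self, x: Vector) -> Vector:
        return [sum(a * xj for a, xj in zip(row, x)) + bi
                for row, bi in zip(self.A, self.b)]

    def backward(self, x: Vector, grad_y: Vector) -> Vector:
        """Store dF/dA = grad_y x^T and dF/db = grad_y; return dF/dx = A^T grad_y."""
        self.grad_A = [[g * xj for xj in x] for g in grad_y]
        self.grad_b = list(grad_y)
        return [sum(self.A[i][j] * grad_y[i] for i in range(len(grad_y)))
                for j in range(len(x))]

    def step(self, lr: float) -> None:
        """Gradient-descent update; call after backward()."""
        self.A = [[a - lr * g for a, g in zip(row, grow)]
                  for row, grow in zip(self.A, self.grad_A)]
        self.b = [bi - lr * g for bi, g in zip(self.b, self.grad_b)]


class Tanh:
    """Element-wise tanh, a parameter-free non-linear stage."""

    def forward(self, x: Vector) -> Vector:
        return [math.tanh(v) for v in x]

    def backward(self, x: Vector, grad_y: Vector) -> Vector:
        return [g * (1.0 - math.tanh(v) ** 2) for v, g in zip(x, grad_y)]

    def step(self, lr: float) -> None:
        pass  # nothing to train


class Chain:
    """Runs stages in order; back-propagates dF/dy through them in reverse."""

    def __init__(self, *stages):
        self.stages = list(stages)
        self.inputs: List[Vector] = []

    def forward(self, x: Vector) -> Vector:
        self.inputs = []
        for stage in self.stages:
            self.inputs.append(x)  # cache each stage's input for backward()
            x = stage.forward(x)
        return x

    def backward(self, grad_y: Vector) -> Vector:
        for stage, x in zip(reversed(self.stages), reversed(self.inputs)):
            grad_y = stage.backward(x, grad_y)
        return grad_y

    def step(self, lr: float) -> None:
        for stage in self.stages:
            stage.step(lr)


def placeholder_loss_grad(y: Vector, target: Vector) -> Tuple[float, Vector]:
    """Stand-in criterion 0.5*||y - target||^2 and its gradient; NOT MPE."""
    loss = 0.5 * sum((a - t) ** 2 for a, t in zip(y, target))
    return loss, [a - t for a, t in zip(y, target)]


if __name__ == "__main__":
    net = Chain(Linear([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.0]), Tanh(),
                Linear([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    x, target = [1.0, 2.0], [0.3, -0.4]
    for _ in range(100):
        loss, grad = placeholder_loss_grad(net.forward(x), target)
        net.backward(grad)
        net.step(0.1)
    print(f"loss after training: {loss:.6f}")
```

Because every stage needs only `forward`, `backward`, and `step`, a new
transform type can be added without changing the training loop, which is
one way to read "trained uniformly".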
Special attention is devoted to the concept of the region-dependent
transform (RDT). The central idea is to divide the acoustic space into
multiple regions and to use a different transform function for each
region. This effectively produces a powerful piece-wise linear
transformation that can be estimated more efficiently than general
non-linear transforms (e.g., neural networks).
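A region-dependent transform can be sketched as follows. The abstract does
not say how the acoustic space is partitioned, so this sketch assumes
regions represented by hypothetical centroids with distance-based soft
memberships p(r|x). Each region r has its own affine transform (A_r, b_r),
and the output is y = sum_r p(r|x) (A_r x + b_r); with `hard=True` only the
closest region is used, which gives a strictly piece-wise linear map. The
sketch also suggests why estimation is comparatively simple: for fixed
memberships, y is linear in every A_r and b_r, so their gradients are just
region-weighted outer products. All function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Affine = Tuple[Matrix, Vector]  # (A_r, b_r) for one region


def region_posteriors(x: Vector, centroids: List[Vector],
                      sharpness: float = 1.0) -> Vector:
    """Hypothetical soft membership p(r|x) = softmax(-sharpness * ||x - mu_r||^2)."""
    scores = [-sharpness * sum((a - c) ** 2 for a, c in zip(x, mu)) for mu in centroids]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def affine(A: Matrix, b: Vector, x: Vector) -> Vector:
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def rdt_forward(x: Vector, centroids: List[Vector], transforms: List[Affine],
                hard: bool = False) -> Tuple[Vector, Vector]:
    """y = sum_r p(r|x) (A_r x + b_r); hard=True uses only the most likely region."""
    post = region_posteriors(x, centroids)
    if hard:
        best = post.index(max(post))
        post = [1.0 if r == best else 0.0 for r in range(len(post))]
    y = [0.0] * len(transforms[0][1])
    for p, (A, b) in zip(post, transforms):
        if p > 0.0:  # skip regions with zero weight
            y = [yi + p * vi for yi, vi in zip(y, affine(A, b, x))]
    return y, post


def rdt_param_grads(x: Vector, post: Vector,
                    grad_y: Vector) -> List[Tuple[Matrix, Vector]]:
    """Per-region (dF/dA_r, dF/db_r) = (p_r * grad_y x^T, p_r * grad_y).

    The memberships depend on x and the centroids but not on (A_r, b_r),
    so y is linear in the transform parameters.
    """
    return [([[p * g * xj for xj in x] for g in grad_y], [p * g for g in grad_y])
            for p in post]


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [4.0, 4.0]]  # two illustrative regions
    transforms = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
                  ([[2.0, 0.0], [0.0, -1.0]], [1.0, 1.0])]
    for point in ([0.5, -0.5], [2.0, 2.2], [4.0, 3.0]):
        y, post = rdt_forward(point, centroids, transforms)
        print(point, [round(v, 3) for v in y], [round(p, 3) for p in post])
```

Soft memberships make the map smooth across region boundaries, while hard
assignment keeps it exactly piece-wise linear at the cost of possible
discontinuities at those boundaries.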
In a state-of-the-art speech recognition system, multi-pass decoding and
multi-model adaptation are commonly used to maximize accuracy within
reasonable recognition run-times. In this context, practical issues must
be taken into consideration when developing the feature optimization
method. These include, for instance, how well a feature transform
generalizes across different acoustic models, and how to integrate
discriminatively trained feature transforms with maximum-likelihood-based
speaker adaptation. Experimental approaches will be developed to address
these issues.
Advisor: Gene Cooperman