[Colloq] Thesis Defense 4/1
Patricia Freeman
tricia at ccs.neu.edu
Mon Mar 31 13:02:07 EDT 2008
Wenxu Tong
> Title: SVM and a Novel POOL Method Coupled with THEMATICS for Protein
> Active Site Prediction
>
> Time: April 1st, 2:00 pm
> Place: 366 WVH
> Advisor: Ron Williams
> Committee: Ron Williams, Mary Jo Ondrechen, Jay Aslam, Bob Futrelle
> and David Budil
>
>
> Abstract:
> Protein active site prediction is a very important problem in
> bioinformatics. THEMATICS is a simple and effective method based on
> the special electrostatic properties of ionizable residues to predict
> such sites from protein three-dimensional structure alone. The
> process involves distinguishing computed titration curves with
> perturbed shape from normal ones; the differences are subtle in many
> cases. In this dissertation, I develop and apply special machine
> learning techniques to automate the process and achieve higher
> sensitivity than results from other methods while maintaining high
> specificity. I first present an application of support vector machines
> (SVM) to automate active site prediction using THEMATICS; at the
> time this work was developed, it achieved better performance than any
> other 3D-structure-based method. I then present the more recently
> developed Partial Order Optimal Likelihood (POOL) method, which
> estimates the probabilities of residues being active under certain
> natural monotonicity assumptions. The dissertation shows that applying
> the POOL method to THEMATICS features alone outperforms the SVM
> results. Furthermore, since the overall approach is based on
> estimating certain probabilities from labeled training data, it
> provides a principled way to combine the use of THEMATICS features
> with non-electrostatic features proposed by others. In
> particular, I consider the use of geometric features as well, and the
> resulting classifiers are the best structure-only predictors yet
> found. Finally, I show that adding in sequence-based conservation
> scores where applicable yields a method that outperforms all existing
> methods while using only whatever combination of structure-based or
> sequence-based features is available.
>