[Colloq] Wenxu Tong Thesis Defense Talk 4/1

Patricia Freeman tricia at ccs.neu.edu
Wed Mar 26 14:51:03 EDT 2008


Thesis defense information:
>
> My name:  Wenxu Tong
> Title: SVM and a Novel POOL Method Coupled with THEMATICS for Protein
> Active Site Prediction
>
> Time:  April 1st, 2:00 pm
> Place: 366 WVH
> Advisor: Ron Williams
> Committee: Ron Williams, Mary Jo Ondrechen, Jay Aslam, Bob Futrelle
> and David Budil
>
>
> Abstract:
> Protein active site prediction is a very important problem in
> bioinformatics. THEMATICS is a simple and effective method based on
> the special electrostatic properties of ionizable residues to predict
> such sites from protein three-dimensional structure alone.  The
> process involves distinguishing computed titration curves with
> perturbed shape from normal ones; the differences are subtle in many
> cases.  In this dissertation, I develop and apply special machine
> learning techniques to automate the process and achieve higher
> sensitivity than results from other methods while maintaining high
> specificity.  I first present an application of support vector machines
> (SVMs) to automate active site prediction using THEMATICS; at the
> time this work was developed, it achieved better performance than any
> other 3D-structure-based method. I then present the more recently
> developed Partial Order Optimal Likelihood (POOL) method, which
> estimates the probabilities of residues being active under certain
> natural monotonicity assumptions. The dissertation shows that applying
> the POOL method to THEMATICS features alone outperforms the SVM
> results. Furthermore, since the overall approach is based on
> estimating certain probabilities from labeled training data, it
> provides a principled way to combine the use of THEMATICS features
> with non-electrostatic features proposed by others. In
> particular, I consider the use of geometric features as well, and the
> resulting classifiers are the best structure-only predictors yet
> found. Finally, I show that adding in sequence-based conservation
> scores where applicable yields a method that outperforms all existing
> methods while using only whatever combination of structure-based or
> sequence-based features is available.
>


