[Colloq] CS Colloquium: 10:40 AM, 9/13 in WVH 366

Ravi Sundaram koods at ccs.neu.edu
Wed Sep 5 11:55:32 EDT 2018



Speaker: Prof Anshumali Shrivastava, Rice University

            10:40 AM Thu 9/13/2018 in WVH 366

Title: Hashing Algorithms for Extreme Scale Machine Learning.

Abstract: In this talk, I will discuss some of my recent and surprising findings on the use of hashing algorithms for large-scale estimations. Locality Sensitive Hashing (LSH) is a hugely popular algorithm for sub-linear near-neighbor search. However, it turns out that LSH is fundamentally a constant-time (amortized) adaptive sampler, of which efficient near-neighbor search is only one of many possible applications. Our observation adds another feather to the cap of LSH: it offers a unique capability to do smart sampling and statistical estimation at the cost of a few hash lookups. Our observation bridges data structures (probabilistic hash tables) with efficient unbiased statistical estimation. I will demonstrate how this dynamic and efficient sampling breaks the computational barriers in adaptive estimation: for the first time, it is possible to pay roughly the cost of uniform sampling but get the benefits of adaptive sampling. We will demonstrate the power of one simple idea on three favorite problems: 1) partition function estimation for large NLP models such as word2vec, 2) adaptive gradient estimation for efficient SGD, and 3) sub-linear deep learning with a huge parameter space.

In the end, if time permits, we will switch to memory cost and show a simple hashing algorithm that can shrink the memory requirements associated with classification problems exponentially! Using our algorithms, we can train 100,000 classes with 400,000 features on a single Titan X while needing 5% or less of the memory required to store all the weights. With a simple logistic regression on this data, a model size of 320GB is unavoidable.

Bio: Anshumali Shrivastava is an assistant professor in the computer science department at Rice University. His broad research interests include randomized algorithms for large-scale machine learning. He is a recipient of a National Science Foundation (NSF) CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research (AFOSR), and a machine learning research award from Amazon. His research on hashing inner products won the Best Paper Award at NIPS 2014, while his work on representing graphs received the Best Paper Award at IEEE/ACM ASONAM 2014. Anshumali received his PhD from Cornell University in 2015.


