[Colloq] Colloq. Talk: Data-Driven Genomic Computing: Making Sense of the Signals from the Genome & Extraction of Evolving Knowledge from Social Media

Tue Jan 15 14:28:38 EST 2019

Date: January 28, 2019
Time: Stefano Ceri - 12:00 - 12:30pm & Marco Brambilla - 12:30 - 1:00pm
Location: 366 West Village H

Talk #1 - Stefano Ceri

Title: Data-Driven Genomic Computing: Making Sense of the Signals from the Genome

Abstract
Genomic computing is a new science focused on understanding the functioning of the genome, as a premise to fundamental discoveries in biology and medicine. Next Generation Sequencing (NGS) allows the production of the entire human genome sequence at a cost of about US $1,000; many algorithms exist for the extraction of genome features, or "signals", including peaks (enriched regions), variants, or gene expression (intensity of transcription activity). What is missing is a system supporting data integration and exploration, giving a "biological meaning" to all the available information; such a system can be used, e.g., for better understanding cancer or how the environment influences cancer development.

The GeCo Project (Data-Driven Genomic Computing, ERC Advanced Grant, 2016-2021) has the objective of revisiting genomic computing through the lens of basic data management, through models, languages, and instruments, focusing on genomic data integration. Starting from an abstract model, we developed a system that can be used to query processed data produced by several large genomic consortia, including ENCODE and TCGA; the system internally employs the Spark engine, and prototypes can already be accessed from Polimi, from Cineca (Italian supercomputing center) and from the Broad Institute in Cambridge. During the five years of the ERC project, the system will be enriched with data analysis tools and environments and will be made increasingly efficient. Among the objectives of the project is the creation of an "open source" repository of public data, available to biological and clinical research through queries, web services and search interfaces.

Biography
Stefano Ceri is professor of Database Systems at the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)<http://www.elet.polimi.it/> of Politecnico di Milano<http://www.polimi.it/>. His research work covers four decades (1978-2018) and has been generally concerned with extending database technologies in order to incorporate new features: distribution, object-orientation, rules, streaming data; with the advent of the Web, his research has been targeted towards the engineering of Web-based applications and search systems. More recently he turned to genomic computing. He authored over 350 publications (H-index 75) and authored or edited 15 books in English. He is the recipient of two ERC Advanced Grants: "Search Computing (SeCo)" (2008-2013), focused upon the rank-aware integration of search engines in order to support multi-domain queries, and "Data-Driven Genomic Computing (GeCo)" (2016-2021), focused upon new abstractions for querying and integrating genomic datasets. He is the recipient of the ACM-SIGMOD "Edward T. Codd Innovation Award"<http://www.sigmod.org/sigmod-awards/award-people/2013-innovations-Stefano-Ceri> (New York, June 26, 2013), an ACM Fellow and a member of Academia Europaea.

Talk #2 - Marco Brambilla

Title: Extraction of Evolving Knowledge from Social Media

Abstract
Knowledge in the world continuously evolves. Ontologies that aim at formalizing this knowledge are largely incomplete, especially regarding data belonging to the so-called long tail. On the other hand, informal sources such as social media are typically very up to date with respect to facts, events and relations between real-world entities.

We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method is capable of finding relevant entities by means of a mixed syntactic-semantic approach. The method uses seeds, i.e., prototypes of emerging entities provided by experts, for generating candidates; then, it associates candidates with feature vectors built from the terms occurring in their social content and ranks the candidates by their distance from the centroid of the seeds, returning the top candidates. Our method can run iteratively, using the results as new seeds. The talk will describe the different extraction techniques, the advantages obtained by combining them, and the results of the experiments performed with the different methods.
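
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the speakers' implementation: the plain term-count features, the Euclidean distance, the function names, and the toy entities and texts are all assumptions made for illustration, and the syntactic-semantic candidate generation step is not modeled.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one entity's social content (assumed feature)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Component-wise mean of sparse term vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds counts term by term
    return {term: count / len(vectors) for term, count in total.items()}


def distance(a, b):
    """Euclidean distance between two sparse vectors (assumed metric)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def rank_candidates(seed_texts, candidate_texts, top_k):
    """Return the names of the top_k candidates closest to the seed centroid."""
    center = centroid([term_vector(text) for text in seed_texts.values()])
    ranked = sorted(
        candidate_texts,
        # Sort by distance; the name breaks ties deterministically.
        key=lambda name: (distance(term_vector(candidate_texts[name]), center), name),
    )
    return ranked[:top_k]


# Hypothetical seeds (expert-provided) and candidates with their social content.
seeds = {
    "designer_a": "fashion milan runway collection",
    "designer_b": "fashion runway design milan",
}
candidates = {
    "designer_c": "milan fashion runway show",
    "chef_x": "pasta recipe kitchen milan",
    "band_y": "tour album concert",
}

print(rank_candidates(seeds, candidates, top_k=2))  # ['designer_c', 'chef_x']
```

Under these assumptions, an iterative run would amount to calling rank_candidates again with the returned candidates and their texts supplied as the new seed_texts.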

Biography
Marco Brambilla<https://marco-brambilla.com/> is associate professor at Politecnico di Milano. His research interests include data science, domain-specific modeling languages and design patterns, crowdsourcing, social media monitoring, and big data analysis. He has been a visiting researcher at Cisco, San José, and at the University of California, San Diego. He has been a visiting professor at Dauphine University, Paris.

He is co-founder of the startups Fluxedo, focusing on social media analysis and social engagement, and WebRatio, devoted to software modeling tools for Web, mobile and business-process-based software applications. He is the author of various international books and of over 200 research papers in journals and conferences. He was awarded various best paper prizes and gave keynotes and speeches at many conferences and organizations. He runs research projects on data science and industrial projects on data-driven innovation and big data. He is the main author of the OMG standard IFML.

Related Papers
Extracting Emerging Knowledge from Social Media. WWW 2017 https://dl.acm.org/citation.cfm?id=3052697
Iterative Knowledge Extraction from Social Networks. WWW Comp. 2018 https://dl.acm.org/citation.cfm?id=3191578

ccis-dean at northeastern.edu
Jan Belmonte/Laura Schumann
Executive Assistants to Dean Carla Brodley
Khoury College of Computer and Information Sciences

Northeastern University
West Village H, Suite 202
Office Tel:  617.373.5204
Jan's cell:  339.927.1649
Laura's cell:  407.619.2974