Title: EM meets Boosting in big genomic data analysis
Speaker: Prof. Can Yang (楊燦), Department of Statistics, Hong Kong Baptist University
Time: Tuesday, December 27, 10:30-11:20 a.m.
Venue: Conference Room 313
All faculty and students are welcome!
Abstract
Recent international projects, such as the Encyclopedia of DNA Elements (ENCODE) project, the Roadmap project and the Genotype-Tissue Expression (GTEx) project, have generated vast amounts of genomic annotation data, e.g., epigenome and transcriptome data. There is great demand for effective statistical approaches to integrate genomic annotations with the results from genome-wide association studies. In this talk, we introduce a statistical framework, named IMAC, for integrating multiple annotations to characterize functional roles of genetic variants that underlie human complex phenotypes. For a given phenotype, IMAC can adaptively incorporate relevant annotations for prioritization of genetic risk variants, allowing nonlinear effects among these annotations, such as interaction effects between genomic features. Specifically, we assume that the prior probability of a variant being associated with the phenotype is a function of its annotations F(X), where X is the collection of the annotation status and F(X) is an ensemble of decision trees, i.e., F(X) = \sum_k f_k(X), where each f_k(X) is a shallow decision tree. We have developed an efficient EM-Boosting algorithm for model fitting, where a shallow decision tree grows in a gradient-boosting manner (Friedman J. 2001) at each EM iteration. Our framework inherits the nice properties of gradient boosted trees: (1) The gradient ascent property of the Boosting algorithm naturally guarantees the convergence of our EM-Boosting algorithm. (2) Based on the fitted ensemble \hat{F}(X), we are able to rank the importance of annotations, measure the interaction among annotations and visualize the model via partial dependence plots (Friedman J. 2005). Using IMAC, we performed integrative analysis of genome-wide association studies on human complex phenotypes and genome-wide annotation resources, e.g., the Roadmap epigenome. The analysis results revealed interesting regulatory patterns of risk variants. These findings deepen our understanding of the genetic architectures of complex phenotypes.
The statistical framework developed here is also broadly applicable to many other areas for integrative analysis of rich data sets.
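To make the EM-Boosting idea in the abstract more concrete, the following is a minimal sketch and not the IMAC implementation. The abstract does not state the likelihood, so several pieces are assumptions made only for illustration: GWAS p-values follow a two-group model (Uniform(0,1) for null variants, Beta(alpha, 1) for associated variants), annotations are binary, the prior is sigmoid(F(X)), each tree is a depth-1 stump, and the names `em_boost`, `best_split`, `n_iter` and `nu` are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def best_split(X, r):
    """Index of the binary annotation whose 0/1 split best fits r in least squares."""
    best_j, best_sse = None, np.inf
    for j in range(X.shape[1]):
        on = X[:, j] == 1
        if on.all() or not on.any():
            continue  # constant column, no split possible
        sse = ((r[on] - r[on].mean()) ** 2).sum() + ((r[~on] - r[~on].mean()) ** 2).sum()
        if sse < best_sse:
            best_j, best_sse = j, sse
    if best_j is None:
        raise ValueError("all annotation columns are constant")
    return best_j

def em_boost(p, X, n_iter=200, nu=0.1):
    """Illustrative EM-Boosting loop (assumed two-group model, not the authors' code).

    p : p-values strictly inside (0, 1), shape (n,)
    X : binary annotation matrix, shape (n, d)
    """
    n = len(p)
    F = np.full(n, np.log(0.1 / 0.9))  # assumed starting prior of 0.1 for every variant
    alpha = 0.5                        # assumed starting Beta(alpha, 1) shape
    splits = []
    for _ in range(n_iter):
        prior = sigmoid(F)
        # E-step: posterior probability that each variant is associated
        f1 = alpha * p ** (alpha - 1.0)          # Beta(alpha, 1) density; null density is 1
        post = prior * f1 / (prior * f1 + 1.0 - prior)
        # M-step for alpha: weighted maximum likelihood in closed form
        alpha = -post.sum() / (post * np.log(p)).sum()
        # Boosting step: grow one stump on the gradient of the expected
        # complete-data log-likelihood with respect to F, which is post - prior
        r = post - prior
        w = prior * (1.0 - prior) + 1e-12        # curvature, used for a Newton leaf value
        j = best_split(X, r)
        on = X[:, j] == 1
        leaf = np.where(on, r[on].sum() / w[on].sum(), r[~on].sum() / w[~on].sum())
        F = F + nu * leaf                        # shrunken update of the ensemble
        splits.append(j)
    # post comes from the last E-step, i.e. before the final update of F
    return alpha, sigmoid(F), post, splits
```

Calling `em_boost(p, X)` on a vector of p-values and a 0/1 annotation matrix returns the estimated `alpha`, the fitted prior for each variant, the posterior from the last E-step, and the list of annotations chosen for splitting. Counting how often each annotation appears in `splits` gives a crude importance ranking under these assumptions. Deeper trees would be needed to capture the interaction effects mentioned in the abstract.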
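Speaker Biography
Prof. Can Yang received his Ph.D. from the Department of Electronic and Information Engineering at the Hong Kong University of Science and Technology in 2011. He was a postdoctoral researcher at Yale University from 2011 to 2012 and an associate research scientist at Yale University from 2012 to 2014. Since 2014 he has been an assistant professor in the Department of Mathematics at Hong Kong Baptist University. In 2012 he was named the winner of the 2012 Hong Kong Young Scientist Award. His research interests are mainly in statistical genomics, bioinformatics, pattern recognition and machine learning.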
Department of Information Management and E-Commerce
2016.12.23