Friday, May 31, 2019

Abstract for a talk at the MRC LMB, Cambridge

Personal Genomics & Data Science 

In this seminar, I will discuss issues in transcriptome analysis. I
will first talk about some core aspects: how we analyze the activity
patterns of genes in human disease. In particular, I will focus on
disorders of the brain, which affect nearly a fifth of the world's
population. Robust phenotype-genotype associations have been
established for a number of brain disorders, including psychiatric
diseases (e.g., schizophrenia and bipolar disorder). However,
understanding the molecular causes of brain disorders is still a
challenge. To address this, the PsychENCODE consortium generated
thousands of transcriptome datasets (bulk and single-cell) from 1,866
individuals. Using these data, we have developed a set of
interpretable machine learning approaches for deciphering functional
genomic elements and linkages in the brain and psychiatric disorders.
In particular, we deconvolved the bulk tissue expression across
individuals using single-cell data via non-negative matrix
factorization and found that differences in the proportions of cell
types explain >85% of the cross-population variation. Additionally,
we developed an interpretable deep-learning model that embeds the
physical regulatory network to predict phenotype from genotype. Our
model uses a conditional Deep Boltzmann Machine architecture and
introduces lateral connectivity at the visible layer to embed the
biological structure learned from the regulatory network and QTL
linkages. Our model improves disease prediction (a 6-fold improvement
over additive polygenic risk scores), highlights key genes for
disorders, and allows imputation of missing transcriptome information
from genotype data alone.

In the second half of the talk, I will look at the "data exhaust"
from transcriptome analysis, that is, how one can extract information
from these data beyond what was originally intended. First, I will
focus on genomic privacy: how looking at the quantifications of
expression levels can potentially reveal something about the subjects
studied, and how one can take steps to protect patient anonymity.
Next, I will look at how the social activity of the researchers
generating transcriptome datasets itself creates revealing patterns
in the scientific literature.
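As a rough illustration of the deconvolution idea, the sketch below models a bulk expression matrix B (genes x individuals) as S x P, where S is a signature matrix of per-cell-type expression profiles and P holds non-negative cell-type weights for each individual. It assumes S has already been built from single-cell data and is held fixed, which reduces the factorization to non-negative least squares; P is then fitted with the Lee-Seung multiplicative update used in NMF. The function names (`deconvolve`, `matmul`, `transpose`), the fixed-signature setup, and the toy numbers are hypothetical illustrations, not the PsychENCODE pipeline.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def deconvolve(bulk, signature, n_iter=2000, eps=1e-12, seed=0):
    """Estimate cell-type proportions P such that bulk ~= signature @ P.

    bulk:      genes x samples matrix of bulk expression (non-negative)
    signature: genes x cell_types matrix of reference profiles, assumed
               (hypothetically) to be per-cell-type averages from
               single-cell data; it is held fixed during fitting
    Returns a cell_types x samples matrix whose columns sum to 1.
    """
    rng = random.Random(seed)
    k, n = len(signature[0]), len(bulk[0])
    # Strictly positive start: multiplicative updates cannot move off zero.
    P = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]

    sig_t = transpose(signature)
    num = matmul(sig_t, bulk)        # S^T B, constant across iterations
    gram = matmul(sig_t, signature)  # S^T S, constant across iterations

    for _ in range(n_iter):
        den = matmul(gram, P)        # S^T S P for the current estimate
        # Lee-Seung update for min ||B - S P||_F^2 subject to P >= 0, S fixed.
        P = [[P[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]

    # Rescale each sample (column) so its cell-type weights sum to 1.
    totals = [sum(P[i][j] for i in range(k)) for j in range(n)]
    return [[P[i][j] / totals[j] for j in range(n)] for i in range(k)]


if __name__ == "__main__":
    # Toy data: 4 genes, 3 cell types, 2 individuals (all values invented).
    signature = [[10, 0, 1], [0, 8, 1], [1, 1, 9], [2, 2, 2]]
    true_props = [[0.6, 0.2], [0.3, 0.5], [0.1, 0.3]]
    bulk = matmul(signature, true_props)
    # The estimate should come out close to true_props.
    for row in deconvolve(bulk, signature):
        print([round(x, 3) for x in row])
```

Holding the signature fixed keeps the example small and makes the answer unique when S has full column rank. A full NMF would also update S, which trades that identifiability for flexibility when reference profiles are uncertain.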


