Personal Genomics & Data Science
In this seminar, I will discuss issues in transcriptome analysis. I
will first talk about some core aspects: how we analyze the activity
patterns of genes in human disease. In particular, I will focus on
disorders of the brain, which affect nearly a fifth of the world's
population. Robust phenotype-genotype associations have been
established for a number of brain disorders including psychiatric
diseases (e.g., schizophrenia, bipolar disorder). However,
understanding the molecular causes of brain disorders is still a
challenge. To address this, the PsychENCODE consortium generated
thousands of transcriptome (bulk and single-cell) datasets from 1,866
individuals. Using these data, we have developed a set of
interpretable machine learning approaches for deciphering functional
genomic elements and linkages in the brain and psychiatric disorders.
In particular, we deconvolved the bulk tissue expression across
individuals using single-cell data via non-negative matrix
factorization and found that differences in the proportions of cell
types explain >85% of the
cross-population variation. Additionally, we developed an
interpretable deep-learning model embedding the physical regulatory
network to predict phenotype from genotype. Our model uses a
conditional Deep Boltzmann Machine architecture and introduces lateral
connectivity at the visible layer to embed the biological structure
learned from the regulatory network and QTL linkages. Our model
improves disease prediction (6-fold over additive polygenic
risk scores), highlights key genes for disorders, and allows
imputation of missing transcriptome information from genotype data
alone. In the second half of the talk, I will look at the "data
exhaust" from transcriptome analysis, that is, how one can learn more
from these data than was originally intended. First, I will focus on
genomic privacy: how looking at the
quantifications of expression levels can potentially reveal something
about the subjects studied, and how one can take steps to protect
patient anonymity. Next, I will look at how the social activity of
researchers generating transcriptome datasets in itself creates
revealing patterns in the scientific literature.
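
As a rough illustration of the deconvolution idea mentioned above, the following sketch estimates cell-type proportions from bulk expression when per-cell-type signatures are already known. It is not the consortium's pipeline: it swaps in plain non-negative least squares with a fixed signature matrix as a simpler stand-in for the non-negative matrix factorization step, and the data, matrix sizes, and function names (`deconvolve`, `variance_explained`) are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(bulk, signatures):
    """Estimate non-negative weights W such that bulk ~= signatures @ W.

    bulk:       (genes x individuals) bulk tissue expression
    signatures: (genes x cell_types) mean expression per cell type,
                assumed here to come from single-cell data
    """
    n_types = signatures.shape[1]
    n_indiv = bulk.shape[1]
    weights = np.zeros((n_types, n_indiv))
    for j in range(n_indiv):
        # Solve one non-negative least-squares problem per individual.
        weights[:, j], _ = nnls(signatures, bulk[:, j])
    return weights


def variance_explained(bulk, signatures, weights):
    """Fraction of across-individual variance captured by the fit."""
    centered = bulk - bulk.mean(axis=1, keepdims=True)
    residual = bulk - signatures @ weights
    return 1.0 - (residual ** 2).sum() / (centered ** 2).sum()


# Synthetic example (made-up data): 200 genes, 4 cell types, 30 individuals.
rng = np.random.default_rng(0)
signatures = rng.gamma(shape=2.0, scale=1.0, size=(200, 4))
true_weights = rng.dirichlet(np.ones(4), size=30).T  # each column sums to 1
noise = rng.normal(scale=0.05, size=(200, 30))
bulk = signatures @ true_weights + noise

weights = deconvolve(bulk, signatures)
r2 = variance_explained(bulk, signatures, weights)
print(f"variance explained by cell-type proportions: {r2:.3f}")
```

In this toy setting the variation across individuals comes almost entirely from differing cell-type proportions, so the fit explains most of it by construction; with real data the explained fraction is an empirical question.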