ABSTRACT
In my talk, I will discuss interpretable machine learning models for
predicting the impact of genomic variants. These models cover many
types of variants, from those in protein-coding
regions to those in non-coding regions, from those associated with
particular diseases, such as cancer or schizophrenia, to those having
a high impact in general, and from those involving single nucleotides
to those that are larger (structural variants). In particular, I will
describe physically based models for predicting cancer driver events,
simple statistical models for finding cancer non-coding drivers, and
interpretable deep learning models for mental disease. For the deep
learning models, I will show how the model's architecture relates to
the overall linear process of splicing or the comprehensive cellular
regulatory network. Finally, I will also highlight a general machine
learning approach for assessing the impact of structural variants.
Tuesday, October 5, 2021
Monday, June 28, 2021
Tuesday, June 15, 2021
Fwd: Next week seminar.
Title:
Approaches to Genomic Privacy: Leakage Measurement, Data Sanitization
& Blockchain Storage
Abstract:
Functional genomics experiments on human subjects present a privacy
conundrum. On the one hand, many of the conclusions we infer from
these experiments are not tied to the identity of individuals but
represent universal statements about biology and disease. On the other
hand, the raw sequencing reads or the phenotypic information inferred
from these experiments can leak information about patients' variants,
presenting privacy challenges in terms of data sharing. There is a
great desire to share data as broadly as possible. Therefore,
measuring the amount of variant information leaked in various
experiments is a key first step in protecting private information. We
propose metrics to quantify private information leakage in functional
genomics data, linking attacks to validate the proposed metrics, and
file formats that maximize the potential for data sharing while
protecting individuals' private information. Blockchain potentially
provides a way to store and access genomic information securely. We
show how this can be done efficiently using various index
structures and how blockchain can be combined with our file formats
for sharing functional genomic information.
==
i0cdc
Thursday, January 14, 2021
Re: [EXT] Abstract for MDACC Hogg seminar series
Title:
Analyzing the non-coding part of the cancer genome
Abstract:
My talk will focus on leveraging thousands of functional genomics
datasets to annotate the cancer genome and perform data mining to
discover cancer-associated regulators and variations.
First, I will go over the ENCODE annotations related to the cancer
genome. I will introduce our computational efforts to perform
large-scale integration to accurately define distal and proximal
regulatory elements (i.e., the MatchedFilter tool). Then I will show
how this extended gene annotation allows us to place oncogenic
transformations in the context of a broad cell space; here, many
normal-to-tumor transitions move towards a stem-like state, while
oncogene knockdowns show an opposing trend.
Next, I will look at our comprehensive regulatory networks of
transcription factors and RNA-binding proteins (TFs and RBPs). I will
showcase their value by highlighting how SUB1, a previously
uncharacterized RBP, drives aberrant tumor expression and amplifies
the well-known oncogenic TF MYC.
Third, I will introduce a workflow to prioritize key elements and
variants. I will showcase the application of this prioritization to
somatic burdening, cancer differential expression, and GWAS (the
LARVA, MOAT & uORF tools). Targeted validations of the prioritized
regulators, elements, and variants demonstrate the value of our
annotation resource.
Finally, I will describe how ENCODE annotations can be applied to the
comprehensive PCAWG mutation dataset. The goal is to determine the
overall burdening of various elements in cancer genomes. I will show
how this correlates with patient survival time and tumor clonality. I
will also adapt an additive-effects model from complex-trait studies to
show that the aggregated effect of putative passengers, including
undetected weak drivers, provides significant additional power (~12%
additive variance) for predicting cancerous phenotypes beyond identified
driver mutations. Furthermore, this framework allows one to estimate
potential weak-driver mutations in samples lacking any
well-characterized driver alterations.
URLs:
http://encodec.encodeproject.org
http://radar.gersteinlab.org
http://MatchedFilter.gersteinlab.org
http://pcawg.gersteinlab.org
==
i0mda21