Gerstein Lab Abstracts: 2013

Thursday, December 5, 2013

Abstract for Talk at University of Connecticut Health Center (i0ccam)

Human Genome Analysis

Plummeting sequencing costs have led to a great increase in the number
of personal genomes. Interpreting the large number of variants in
them, particularly in non-coding regions, is a central challenge for
genomics. We investigate patterns of selection in DNA elements from
the ENCODE project using the full spectrum of sequence variants from
1,092 individuals in the 1000 Genomes Project Phase 1, including
single-nucleotide variants (SNVs), short insertions and deletions
(indels) and structural variants (SVs). We analyze both coding and
non-coding regions, with the former corroborating the latter. We
identify a specific sub-group of non-coding categories that exhibit
very strong selection constraint, comparable to coding genes:
"ultra-sensitive" regions. We also find variants that are disruptive
due to mechanistic effects on transcription-factor binding (i.e.
"motif-breakers").
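
To make the "motif-breaker" idea concrete, here is a minimal sketch. It is not the method used in the work described above. It assumes that a transcription-factor motif is available as a position weight matrix of per-position base probabilities, and that a site is scored by its log-odds against a uniform background. A variant is flagged when the alternate allele lowers that score by more than a threshold. The matrix `TOY_PWM`, the function names and the `min_drop` threshold are all hypothetical.

```python
import math

# Hypothetical 4-position motif: per-position base probabilities (each row sums to 1).
TOY_PWM = [
    {"A": 0.90, "C": 0.04, "G": 0.03, "T": 0.03},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.03, "C": 0.03, "G": 0.04, "T": 0.90},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def motif_score(site, pwm=TOY_PWM):
    """Sum of per-position log2 odds of the site under the motif vs. background."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def score_change(site, offset, alt, pwm=TOY_PWM):
    """Score of the alternate site minus score of the reference site."""
    alt_site = site[:offset] + alt + site[offset + 1:]
    return motif_score(alt_site, pwm) - motif_score(site, pwm)


def is_motif_breaker(site, offset, alt, min_drop=2.0, pwm=TOY_PWM):
    """Flag a variant whose alternate allele lowers the score by at least min_drop bits."""
    return score_change(site, offset, alt, pwm) <= -min_drop
```

With this toy matrix, changing the first base of the site `ACGT` from A to C lowers the score by about 4.5 bits, so `is_motif_breaker("ACGT", 0, "C")` flags it.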

We make great use of networks -- contrasting them with linear
annotation -- and describe how we construct a practical instantiation
of the human regulatory network. Using connectivity information
between elements from protein-protein interaction and regulatory
networks, we find that variants in regions with higher network
centrality tend to be deleterious. Indels and SVs follow a pattern
similar to that of SNVs, with some notable exceptions (e.g. certain deletions
and enhancers).
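
As an illustration of what "network centrality" can mean here, the sketch below computes degree centrality, one of the simplest centrality measures, from an undirected edge list. This is an assumption made for illustration only: the abstract does not say which centrality measure is used, and the gene names are placeholders rather than real interactions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to its degree divided by (n - 1), for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def rank_by_centrality(edges):
    """Nodes sorted from most to least central (ties broken by name)."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))


# Hypothetical toy network; gene names are placeholders, not real interactions.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
```

In the toy network, `G1` touches every other node and so ranks first. Variants could then be grouped by the centrality of the gene or element they fall in.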

Using these results, we develop a scheme and a practical tool to
prioritize non-coding variants based on their potential deleterious
impact. As a proof of principle, we experimentally validate and
characterize a small number of candidate variants prioritized by the
tool. Application of the tool to ~90 cancer genomes (breast, prostate
and medulloblastoma) reveals ~100 candidate non-coding cancer drivers.
This approach can be readily used in precision medicine to prioritize
variants.

References

Architecture of the human regulatory network derived from ENCODE data.
Gerstein et al. (2012). Nature 489: 91.
[encodenets.gersteinlab.org]

Interpretation of genomic variants using a unified biological network approach.
E Khurana, Y Fu, J Chen, M Gerstein (2013). PLoS Comput Biol 9: e1002886.

Integrative annotation of variants from 1,092 humans: application to
cancer genomics.
E Khurana et al. (2013). Science 342: 1235587.
[funseq.gersteinlab.org]

Monday, December 2, 2013

RE: Abstract for Talk at Keystone Big Data Symposium (i0keybdata)

Dear Dr. Gerstein,

Thank you for your abstract. Could you please provide me with an abstract title and, if there are more authors than just you, their names and institutes?

Thank you and have a good day!

Jenny Hindorff
Keystone Symposia
Program Implementation Associate
www.keystonesymposia.org
jennyh@keystonesymposia.org
970-262-2661

-----Original Message-----
From: Mark Gerstein [mailto:mark@gersteinlab.org]
Sent: Saturday, November 30, 2013 11:45 AM
To: Programs
Cc: glabstracts.mbglab@blogger.com
Subject: Abstract for Talk at Keystone Big Data Symposium (i0keybdata)

My talk will discuss Human Genome Analysis from a data science perspective.

Plummeting sequencing costs have led to a great increase in the number of personal genomes. Interpreting the large number of variants in them, particularly in non-coding regions, is a central challenge for genomics.

One data science construct that is particularly useful for genome interpretation is networks. My talk will be concerned with the analysis of networks and the use of networks as a "next-generation annotation" for interpreting personal genomes. I will initially describe current approaches to genome annotation in terms of one-dimensional browser tracks. Here I will discuss approaches for annotating pseudogenes and also for developing predictive models for gene expression.
Then I will describe various aspects of networks. In particular, I will touch on the following topics: (1) I will show how analyzing the structure of the regulatory network indicates that it has a hierarchical layout with the "middle-managers" acting as information-flow bottlenecks and with more "influential" TFs on top. (2) I will show that most human variation occurs at the periphery of the network. (3) I will compare the topology and variation of the regulatory network to the call graph of a computer operating system, showing that they have different patterns of variation. (4) I will talk about web-based tools for the analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://tyna.gersteinlab.org

Architecture of the human regulatory network derived from ENCODE data.
Gerstein et al. (2012). Nature 489: 91.

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors.
KY Yip et al. (2012). Genome Biol 13: R48.

Understanding transcriptional regulation by integrative analysis of transcription factor binding data.
C Cheng et al. (2012). Genome Res 22: 1658-67.

The GENCODE pseudogene resource.
B Pei et al. (2012). Genome Biol 13: R51.

Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks.
KK Yan et al. (2010). Proc Natl Acad Sci U S A 107:9186-91.

Saturday, November 30, 2013

Abstract for Talk at Keystone Big Data Symposium (i0keybdata)

My talk will discuss Human Genome Analysis from a data science perspective.

Plummeting sequencing costs have led to a great increase in the number
of personal genomes. Interpreting the large number of variants in
them, particularly in non-coding regions, is a central challenge for
genomics.

One data science construct that is particularly useful for genome
interpretation is networks. My talk will be concerned with the
analysis of networks and the use of networks as a "next-generation
annotation" for interpreting personal genomes. I will initially
describe current approaches to genome annotation in terms of
one-dimensional browser tracks. Here I will discuss approaches for
annotating pseudogenes and also for developing predictive models for
gene expression.
Then I will describe various aspects of networks. In particular, I will touch on
the following topics: (1) I will show how analyzing the structure of
the regulatory network indicates that it has a hierarchical layout
with the "middle-managers" acting as information-flow bottlenecks and
with more "influential" TFs on top. (2) I will show that most human
variation occurs at the periphery of the network. (3) I will compare
the topology and variation of the regulatory network to the call graph
of a computer operating system, showing that they have different
patterns of variation. (4) I will talk about web-based tools for the
analysis of networks (TopNet and tYNA).
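
As a rough illustration of topic (1), the sketch below assigns hierarchy levels to the nodes of a directed regulator-to-target network. It is a simplification and not the procedure from the cited papers. It assumes the network is acyclic, so any feedback loops or self-regulation would have to be removed or collapsed first. It places nodes that regulate nothing at level 0, and puts every regulator one level above its highest target. The function name and the TF labels are hypothetical.

```python
def hierarchy_levels(edges):
    """Level of each node in an acyclic regulator -> target network.

    Nodes that regulate nothing get level 0; every other node sits one level
    above its highest target. Raises ValueError if a cycle is found.
    """
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
        targets.setdefault(target, set())

    levels = {}
    in_progress = set()  # nodes on the current recursion path, for cycle detection

    def visit(node):
        if node in levels:
            return levels[node]
        if node in in_progress:
            raise ValueError("network contains a cycle")
        in_progress.add(node)
        levels[node] = 1 + max((visit(t) for t in targets[node]), default=-1)
        in_progress.discard(node)
        return levels[node]

    for node in targets:
        visit(node)
    return levels


# Hypothetical three-tier example: a top regulator, a "middle manager", a bottom TF.
toy_regulation = [("TF_top", "TF_mid"), ("TF_mid", "TF_bottom"), ("TF_top", "TF_bottom")]
```

Here `TF_mid` lands on the middle level. In a larger network, candidate bottlenecks would be the middle-level nodes through which many regulator-to-target paths pass.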

http://networks.gersteinlab.org
http://tyna.gersteinlab.org

Architecture of the human regulatory network derived from ENCODE data.
Gerstein et al. (2012). Nature 489: 91.

Classification of human genomic regions based on experimentally
determined binding sites of more than 100 transcription-related
factors.
KY Yip et al. (2012). Genome Biol 13: R48.

Understanding transcriptional regulation by integrative analysis of
transcription factor binding data.
C Cheng et al. (2012). Genome Res 22: 1658-67.

The GENCODE pseudogene resource.
B Pei et al. (2012). Genome Biol 13: R51.

Comparing genomes to computer operating systems in terms of the
topology and evolution of their regulatory control networks.
KK Yan et al. (2010). Proc Natl Acad Sci U S A 107:9186-91.

Friday, November 29, 2013

Fwd: Pot. Abstract for Talk at ASHG '14 (i0ashg14)

Network Analysis for Human Genomics

Plummeting sequencing costs have led to a great increase in the number
of personal genomes. Interpreting the large number of variants in
them, particularly in non-coding regions, is a central challenge for
genomics. We investigate patterns of selection in DNA elements from
the ENCODE project using the full spectrum of sequence variants from
1,092 individuals in the 1000 Genomes Project Phase 1, including
single-nucleotide variants (SNVs), short insertions and deletions
(indels) and structural variants (SVs). We analyze both coding and
non-coding regions, with the former corroborating the latter. We
identify a specific sub-group of non-coding categories that exhibit
very strong selection constraint, comparable to coding genes:
"ultra-sensitive" regions. We also find variants that are disruptive
due to mechanistic effects on transcription-factor binding (i.e.
"motif-breakers").

We make great use of networks -- contrasting them with linear
annotation -- and describe how we construct a practical instantiation
of the human regulatory network. Using connectivity information
between elements from protein-protein interaction and regulatory
networks, we find that variants in regions with higher network
centrality tend to be deleterious. Indels and SVs follow a pattern
similar to that of SNVs, with some notable exceptions (e.g. certain deletions
and enhancers).

Using these results, we develop a scheme and a practical tool to
prioritize non-coding variants based on their potential deleterious
impact. As a proof of principle, we experimentally validate and
characterize a small number of candidate variants prioritized by the
tool. Application of the tool to ~90 cancer genomes (breast, prostate
and medulloblastoma) reveals ~100 candidate non-coding cancer drivers.
This approach can be readily used in precision medicine to prioritize
variants.

References

Architecture of the human regulatory network derived from ENCODE data.
Gerstein et al. (2012). Nature 489: 91.
[encodenets.gersteinlab.org]

Interpretation of genomic variants using a unified biological network approach.
E Khurana, Y Fu, J Chen, M Gerstein (2013). PLoS Comput Biol 9: e1002886.

Integrative annotation of variants from 1,092 humans: application to
cancer genomics.
E Khurana et al. (2013). Science 342: 1235587.
[funseq.gersteinlab.org]

Saturday, October 19, 2013

Fwd: Abstract for Talk at Duke (i0duke)

Sunday, September 8, 2013

Abstract for Talk at Georgia Tech (i0gatech)

Human Genome Analysis: Application to Cancer

Plummeting sequencing costs have led to a great increase in the number
of personal genomes. Interpreting the large number of variants in
them, particularly in non-coding regions, is a central challenge for
genomics. We investigate patterns of selection in DNA elements from
the ENCODE project using the full spectrum of sequence variants from
1,092 individuals in the 1000 Genomes Project Phase 1, including
single-nucleotide variants (SNVs), short insertions and deletions
(indels) and structural variants (SVs). We analyze both coding and
non-coding regions, with the former corroborating the latter. We
identify a specific sub-group of non-coding categories that exhibit
very strong selection constraint, comparable to coding genes:
"ultra-sensitive" regions. We also find variants that are disruptive
due to mechanistic effects on transcription-factor binding (i.e.
"motif-breakers").

We make great use of networks -- contrasting them with linear
annotation -- and describe how we construct a practical instantiation
of the human regulatory network. Using connectivity information
between elements from protein-protein interaction and regulatory
networks, we find that variants in regions with higher network
centrality tend to be deleterious. Indels and SVs follow a pattern
similar to that of SNVs, with some notable exceptions (e.g. certain deletions
and enhancers).

Using these results, we develop a scheme and a practical tool to
prioritize non-coding variants based on their potential deleterious
impact. As a proof of principle, we experimentally validate and
characterize a small number of candidate variants prioritized by the
tool. Application of the tool to ~90 cancer genomes (breast, prostate
and medulloblastoma) reveals ~100 candidate non-coding cancer drivers.
This approach can be readily used in precision medicine to prioritize
variants.

References

Architecture of the human regulatory network derived from ENCODE data.
Gerstein et al. (2012). Nature 489: 91.

Interpretation of genomic variants using a unified biological network approach.
E Khurana, Y Fu, J Chen, M Gerstein (2013). PLoS Comput Biol 9: e1002886.

Integrative annotation of variants from 1,092 humans: application to
cancer genomics
E Khurana et al. Science (in press)

Thursday, May 16, 2013

Abstract for Talk at Harvard CCSB (i0farb)

My talk will be concerned with the analysis of networks and the use of
networks as a "next-generation annotation" for interpreting personal
genomes. I will initially describe current approaches to genome
annotation in terms of one-dimensional browser tracks. Here I will
discuss approaches for annotating pseudogenes and also for developing
predictive models for gene expression.
Then I will describe various aspects of networks. In particular, I will touch on
the following topics: (1) I will show how analyzing the structure of
the regulatory network indicates that it has a hierarchical layout
with the "middle-managers" acting as information-flow bottlenecks and
with more "influential" TFs on top. (2) I will show that most human
variation occurs at the periphery of the network. (3) I will compare
the topology and variation of the regulatory network to the call graph
of a computer operating system, showing that they have different
patterns of variation. (4) I will talk about web-based tools for the
analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://tyna.gersteinlab.org

Architecture of the human regulatory network derived from ENCODE data.
Gerstein et al. (2012). Nature 489: 91.

Classification of human genomic regions based on experimentally
determined binding sites of more than 100 transcription-related
factors.
KY Yip et al. (2012). Genome Biol 13: R48.

Understanding transcriptional regulation by integrative analysis of
transcription factor binding data.
C Cheng et al. (2012). Genome Res 22: 1658-67.

The GENCODE pseudogene resource.
B Pei et al. (2012). Genome Biol 13: R51.

Comparing genomes to computer operating systems in terms of the
topology and evolution of their regulatory control networks.
KK Yan et al. (2010). Proc Natl Acad Sci U S A 107:9186-91.

Monday, March 25, 2013

Abstract for talk i0cmg - Comp_Meth_Prioritizing_Var_Exome_Seq_Pgenes_trueLoF_nethubs--20130319-i0cmg

* Filtering out Artifacts Due to Pseudogenes
- Certain genes have many similar pseudogenes,
which could confound variant calling

* Finding True LoF Mutations
- Not just stop-codon finding: tricky once one takes into
account splicing, NMD, indels, etc.

* Using High Network Connectivity
- More highly connected genes in many networks have a
greater chance of being disease-causing
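
To illustrate the second point, the sketch below shows the naive "stop codon finding" step for a single-base substitution in a coding sequence. It is only a baseline under stated assumptions: the input is an already-spliced, in-frame coding sequence, and the code ignores exactly the complications listed above (splicing, NMD, indels). The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds):
    """Index (0-based, in codons) of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def introduces_premature_stop(cds, pos, alt):
    """True if substituting base `alt` at 0-based position `pos` creates a stop
    codon upstream of the reference stop (or where the reference has none)."""
    if not 0 <= pos < len(cds):
        raise ValueError("position outside coding sequence")
    mutated = cds[:pos] + alt + cds[pos + 1:]
    ref_stop = first_stop_index(cds)
    alt_stop = first_stop_index(mutated)
    if alt_stop is None:
        return False
    return ref_stop is None or alt_stop < ref_stop


# Toy coding sequence: ATG TAC GGA TAA (the reference stop is the fourth codon).
toy_cds = "ATGTACGGATAA"
```

In the toy sequence, changing the C of `TAC` to G gives `TAG`, so `introduces_premature_stop(toy_cds, 5, "G")` is true. A real LoF call would still need to ask whether the truncated transcript escapes NMD or is rescued by alternative splicing.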

Abstract for Talk at Chicago (i0chi12)

Abstract for Genome_Annotation_Compare_n_Func_Description_SVs_n_Nets--20130322-i0simons

1 ## Annotation via Large-scale Identification of Variable Blocks in
the Population

Methods
Read-depth: MSB+CNVnator
Breakpoints & Split Read: SRiC, AGE & BreakSeq

Applications: 1000G & Somatic Variation

2 ## A Networks View on Large-scale Organization of Genomic Elements

Understanding the human regulatory network as a hierarchy with
information flow bottlenecks
Understanding the impact of variation and constraint on the network
Particularly with network analogies
