Title: Comparative Transcriptome Analysis
Abstract:
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. The
consortium performed extensive data integration on this data set. Here
I will give an overview of the data and some of the key analyses.
In particular:
(1) Conservation & Divergence of Transcription
(1a) A novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibited hourglass behavior.
(1b) The extent of the non-coding, non-canonical
transcription is consistent between worm, fly and human.
(1c) In contrast, analyses of pseudogenes (fossil genes) show that they
diverged greatly between the organisms, much more so than genes.
Nevertheless, they had a consistent amount of residual transcription.
(2) Conservation of Regulation
(2a) A global optimization algorithm to examine the
hierarchical organization of the regulatory network.
Despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.
(2b) The gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs.
encodenets.gersteinlab.org
encodeproject.org/comparative
pseudogene.org/psicube
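As a rough illustration of the "universal model" idea in (2b), the sketch below fits one set of regression weights on data pooled from all three organisms and then checks how well those shared weights predict expression within each organism. The abstract does not specify the model form, so ordinary least squares is an assumption here, and the mark signals, expression values, and names such as MARKS and TRUE_WEIGHTS are made up for the example.

```python
from math import sqrt


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(features, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Intercept plus weighted sum of the histone-mark signals."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): two histone-mark signals per gene, per organism.
TRUE_WEIGHTS = (0.5, 2.0, -1.0)  # made-up intercept and mark weights
MARKS = {
    "human": [(1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 3.0)],
    "worm": [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)],
    "fly": [(0.0, 0.0), (3.0, 2.0), (1.0, 2.0)],
}
EXPRESSION = {
    org: [predict(TRUE_WEIGHTS, row) for row in rows] for org, rows in MARKS.items()
}

# One shared parameter set, fitted on genes pooled across organisms.
pooled_x = [row for org in MARKS for row in MARKS[org]]
pooled_y = [value for org in MARKS for value in EXPRESSION[org]]
universal_weights = fit_least_squares(pooled_x, pooled_y)

# Evaluate the shared weights separately within each organism.
per_organism_r = {
    org: pearson([predict(universal_weights, row) for row in MARKS[org]], EXPRESSION[org])
    for org in MARKS
}
```

Because the toy expression values are generated from a single linear rule, the pooled fit recovers that rule; with real data the interesting question is how much accuracy is lost by sharing parameters across organisms.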
Monday, December 22, 2014
Friday, November 7, 2014
Re: Invited Talk at 2014 NIPS Workshop on Machine Learning in Computational Biology
Hi Mark,
I'm sorry. I didn't check my gmail yesterday.
Best,
Martin Renqiang
Sent from my iPhone
On Nov 6, 2014, at 11:13 AM, Mark Gerstein <mark@gersteinlab.org> wrote:
> Title: Comparative Genome Analysis
>
> Abstract:
>
> The ENCODE and modENCODE consortia have generated a resource
> containing large amounts of transcriptomic data, extensive mapping of
> chromatin states, as well as the binding locations of over 300
> transcription-regulatory factors for human, worm and fly. The
> consortium performed
> extensive data integration on this data set. Here
> I will give an overview of the data and some of the key analyses.
> In particular:
>
> (1) Conservation & Divergence of Transcription
>
> (1a) A novel cross-species clustering algorithm to
> integrate the co-expression networks of the three species, resulting in
> conserved modules shared between the organisms. These modules are
> enriched in developmental genes and exhibited hourglass behavior.
>
> (1b) The extent of the non-coding, non-canonical
> transcription is consistent between worm, fly and human.
>
> (1c) In contrast, analyses of pseudogene (fossil genes) show that they
> diverged greatly between the organisms, much more so than genes.
> Nevertheless, they had a consistent amount of residual transcription.
>
> (2) Conservation of Regulation
>
> (2a) A global optimization algorithm to examine the
> hierarchical organization of the regulatory network.
> Despite extensive rewiring of binding targets, high-level organization
> principles such as a three-layer hierarchy are conserved across the
> three species.
>
> (2b) The gene expression levels in the organisms, both coding
> and non-coding, can be predicted consistently based on their upstream
> histone marks. In fact, a "universal model" with a single set of
> cross-organism parameters can predict expression level for both protein
> coding genes and ncRNAs.
>
>
> encodenets.gersteinlab.org
> encodeproject.org/comparative
> pseudogene.org/psicube
>
> ==
>
> i0nips
Thursday, November 6, 2014
Invited Talk at 2014 NIPS Workshop on Machine Learning in Computational Biology
Title: Comparative Genome Analysis
Abstract:
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. The
consortium performed extensive data integration on this data set. Here
I will give an overview of the data and some of the key analyses.
In particular:
(1) Conservation & Divergence of Transcription
(1a) A novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibited hourglass behavior.
(1b) The extent of the non-coding, non-canonical
transcription is consistent between worm, fly and human.
(1c) In contrast, analyses of pseudogenes (fossil genes) show that they
diverged greatly between the organisms, much more so than genes.
Nevertheless, they had a consistent amount of residual transcription.
(2) Conservation of Regulation
(2a) A global optimization algorithm to examine the
hierarchical organization of the regulatory network.
Despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.
(2b) The gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs.
encodenets.gersteinlab.org
encodeproject.org/comparative
pseudogene.org/psicube
==
i0nips
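To make the hierarchy analysis in (2a) more concrete, here is a minimal sketch that assigns regulators to three layers so that as many regulatory edges as possible point downward. The abstract does not say which optimizer or objective is used, so the exhaustive search and the "downward minus upward edges" score below are stand-ins chosen for clarity, and the five-node network is invented.

```python
from itertools import product


def hierarchy_score(edges, layer_of):
    """Count downward edges (+1) and upward edges (-1); same-layer edges score 0.

    Layer 0 is the top of the hierarchy; larger numbers are lower layers.
    """
    score = 0
    for regulator, target in edges:
        if layer_of[regulator] < layer_of[target]:
            score += 1
        elif layer_of[regulator] > layer_of[target]:
            score -= 1
    return score


def best_layering(edges, n_layers=3):
    """Exhaustively search all layer assignments (only feasible for tiny networks)."""
    nodes = sorted({node for edge in edges for node in edge})
    best_score, best_assignment = None, None
    for assignment in product(range(n_layers), repeat=len(nodes)):
        layer_of = dict(zip(nodes, assignment))
        score = hierarchy_score(edges, layer_of)
        if best_score is None or score > best_score:
            best_score, best_assignment = score, layer_of
    return best_score, best_assignment


# Toy regulatory network (hypothetical): mostly top-down, with one feedback edge D -> A.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("B", "E"), ("D", "A")]

best_score, layers = best_layering(EDGES)
```

For this toy network the search places A at the top, B and C in the middle and D and E at the bottom, leaving the feedback edge as the only upward one. Exhaustive search grows as 3^n, so a real network would need a heuristic optimizer instead.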
Saturday, November 1, 2014
Fwd: CSHL Meeting Abstract Submission
Abstract for Biological Data Science 2014 [i0biods14]
Title: A computational framework to prioritize regulatory variants
from whole-genome sequencing in cancer
Mark Gerstein1, Yao Fu1, Zhu Liu2, Shaoke Lou3, Jason Bedford1,
Xinmeng J Mu4, Kevin Y Yip3, Ekta Khurana1
1Yale University, Molecular Biophysics & Biochemistry, New Haven, CT,
2Fudan University, School of Life Science, Shanghai, China, 3The
Chinese University of Hong Kong, Department of Computer Science and
Engineering, Shatin, Hong Kong, 4Broad Institute of Harvard and MIT,
Cambridge, MA
Identification of noncoding cancer "drivers" from thousands of somatic
alterations is a difficult and unsolved problem. Here, we developed a
computational framework to annotate and prioritize cancer regulatory
mutations. The framework combines an adjustable data context
summarizing large-scale genomics and cancer-relevant datasets with an
efficient variant prioritization pipeline. To prioritize high-impact
variants, we developed a weighted scoring scheme that scores each
mutation's impact by analyzing conservation, loss-of-function and
gain-of-function events, gene associations, network topology and
across-sample recurrence. Cancer-specific information is then used to
further highlight potentially oncogenic candidates.
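A weighted scoring scheme of the kind described above could be sketched as follows. The abstract does not give the feature weights or how they are derived, so every weight, feature name and variant identifier below is hypothetical and serves only to show the shape of the computation: each variant is annotated with the features it hits, the weights of those features are summed, and variants are ranked by the total.

```python
# Hypothetical weights, chosen only for illustration; the abstract does not list them.
FEATURE_WEIGHTS = {
    "conserved": 1.0,
    "loss_of_function": 1.0,
    "gain_of_function": 0.5,
    "linked_to_gene": 0.5,
    "network_hub": 0.75,
    "recurrent": 1.5,
}


def score_variant(features, weights=FEATURE_WEIGHTS):
    """Sum the weights of every feature annotated on a variant."""
    return sum(weights[name] for name in features)


def prioritize(variants, weights=FEATURE_WEIGHTS):
    """Return (score, variant_id) pairs, highest score first, ties broken by id."""
    scored = [(score_variant(features, weights), vid) for vid, features in variants.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Made-up variants with made-up annotations.
VARIANTS = {
    "chr1:1000 A>G": {"conserved", "loss_of_function", "recurrent"},
    "chr2:2000 C>T": {"linked_to_gene"},
    "chr3:3000 G>A": {"conserved", "network_hub"},
}

ranking = prioritize(VARIANTS)
```

Keeping the weights in one dictionary makes the "adjustable" aspect easy to see: swapping in a different weight table changes the ranking without touching the scoring code.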
Friday, October 31, 2014
Fwd: FOLLOW-UP re: abstract - Confirmed - Speaking Invitation Clinical Genomics Track this April 21-23 at BioIT World 2015 in Boston
i0bioit15
Human Genome Analysis
Identification of noncoding cancer "drivers" from thousands of somatic
alterations is a difficult and unsolved problem. Here, we developed a
computational framework to annotate and prioritize cancer regulatory
mutations. The framework combines an adjustable data context
summarizing large-scale genomics and cancer-relevant datasets with an
efficient variant prioritization pipeline. To prioritize high-impact
variants, we developed a weighted scoring scheme that scores each
mutation's impact by analyzing conservation, loss-of-function and
gain-of-function events, gene associations, network topology and
across-sample recurrence. Cancer-specific information is then used to
further highlight potentially oncogenic candidates.
Saturday, August 30, 2014
Re: Abstract for SAMSI meeting, Opening workshop '14 (i0samsi)
Abstract for
http://www.samsi.info/bio-ow
Human Genome Analysis
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. The consortium performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory models, revealing fundamental
principles of transcription and network organization that are
conserved across the three highly divergent animals.
I will give an overview of the data and some of the key analyses.
In particular:
(1) A novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibited hourglass behavior.
(2) A global optimization algorithm to examine the
hierarchical organization of the regulatory network.
Despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.
(3) The gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs.
(4) There have been many analyses of pseudogenes.
(5) Finally, the extent of the non-coding, non-canonical
transcription is consistent between worm, fly and human.
encodenets.gersteinlab.org
encodeproject.org/comparative
Tuesday, August 26, 2014
Abstract for SAMSI meeting, Opening workshop '14 (i0samsi)
Abstract for
encodeproject.org/comparative
Human Genome Analysis
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. We performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory models, revealing fundamental
principles of transcription and network organization that are
conserved across the three highly divergent animals. In particular:
(1) We developed a novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibited hourglass behavior. They
were then used to align the stages in worm and fly development,
finding the normal embryo-to-embryo and larvae-to-larvae pairings in
addition to a novel pairing between worm embryo and fly pupae.
(2) We developed a global optimization algorithm to examine the
hierarchical organization of the regulatory network. We found that,
despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.
(3) We found the gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs.
(4) Carrying out the same type of "predictions" for TFs,
we found that information in their binding is more localized to near the TSS
region than that of histone marks but is largely redundant with that
of the marks. Surprisingly, only a small number of TFs are necessary in the models
to successfully predict expression (e.g. ~5 of the >1000 in human).
(5) Finally, we found that the extent of the non-coding, non-canonical
transcription is consistent between worm, fly and human.
encodenets.gersteinlab.org
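The cross-species integration in (1) starts from per-species co-expression networks linked through orthologous genes. The sketch below shows only that first step in a much simplified form, not the clustering algorithm itself: it builds a thresholded co-expression network for two species, maps genes to ortholog groups, and keeps the edges present in both. The expression profiles, gene names, ortholog assignments and the 0.9 correlation cutoff are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, orthogroup_of, threshold=0.9):
    """Co-expressed gene pairs, expressed as pairs of ortholog-group ids."""
    edges = set()
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if pearson(profiles[gene_a], profiles[gene_b]) >= threshold:
            edges.add(frozenset((orthogroup_of[gene_a], orthogroup_of[gene_b])))
    return edges


# Toy expression profiles across four developmental stages (hypothetical values).
WORM = {"w1": [1, 2, 3, 4], "w2": [2, 4, 6, 8], "w3": [4, 1, 3, 2]}
FLY = {"f1": [1, 3, 5, 7], "f2": [0, 1, 2, 3], "f3": [5, 5, 1, 2]}

# Hypothetical one-to-one ortholog groups linking worm and fly genes.
ORTHOGROUP = {"w1": "og1", "f1": "og1", "w2": "og2", "f2": "og2", "w3": "og3", "f3": "og3"}

worm_edges = coexpression_edges(WORM, ORTHOGROUP)
fly_edges = coexpression_edges(FLY, ORTHOGROUP)

# Edges found in both species are candidates for conserved co-expression.
conserved_edges = worm_edges & fly_edges
```

A real analysis would cluster the combined network rather than simply intersect edges, and would have to handle many-to-many orthology, which this toy mapping avoids.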
Monday, August 18, 2014
Re: Abstract for talk BioConference Live August 2014 Genetics and Genomics (i0bioconfweb0mg)
yes
On Mon, Aug 18, 2014 at 5:18 PM, Don Cruikshank
<don.cruikshank@labroots.com> wrote:
> Thanks Mark. I take it that you want us to replace the abstract that we already received from you???
>
> Don
>
> -----Original Message-----
> From: Mark Gerstein [mailto:mark@gersteinlab.org]
> Sent: Monday, August 18, 2014 1:47 PM
> To: Don Cruikshank
> Cc: glabstracts.mbglab@blogger.com; x57v@gersteinlab.org
> Subject: Abstract for talk BioConference Live August 2014 Genetics and Genomics (i0bioconfweb0mg)
>
> The ENCODE and modENCODE consortia have generated a resource containing large amounts of transcriptomic data, extensive mapping of chromatin states, as well as the binding locations of >300 transcription factors (TFs) for human, worm and fly. We performed extensive data integration by constructing genome-wide co-expression networks and transcriptional regulatory models, revealing fundamental principles of transcription conserved across the three highly divergent animals.
> In particular, we found the gene expression levels in the organisms, both coding and non-coding, can be predicted consistently based on their upstream histone marks. In fact, a "universal model" with a single set of cross-organism parameters can predict expression level for both protein coding genes and ncRNAs. Carrying out the same type of "predictions" for TFs, we found that information in their binding is more localized to near the TSS region than that of histone marks but is largely redundant with that of the marks.
> Surprisingly, only a small number of TFs are necessary in the models to successfully predict expression (e.g. ~5 of the >1000 in human).
>
>
> hashtag #BCLgenetics for this event
>
> http://www.bioconferencelive.com/events.php?event_id=17
>
RE: Abstract for talk BioConference Live August 2014 Genetics and Genomics (i0bioconfweb0mg)
Thanks Mark. I take it that you want us to replace the abstract that we already received from you???
Don
-----Original Message-----
From: Mark Gerstein [mailto:mark@gersteinlab.org]
Sent: Monday, August 18, 2014 1:47 PM
To: Don Cruikshank
Cc: glabstracts.mbglab@blogger.com; x57v@gersteinlab.org
Subject: Abstract for talk BioConference Live August 2014 Genetics and Genomics (i0bioconfweb0mg)
The ENCODE and modENCODE consortia have generated a resource containing large amounts of transcriptomic data, extensive mapping of chromatin states, as well as the binding locations of >300 transcription factors (TFs) for human, worm and fly. We performed extensive data integration by constructing genome-wide co-expression networks and transcriptional regulatory models, revealing fundamental principles of transcription conserved across the three highly divergent animals.
In particular, we found the gene expression levels in the organisms, both coding and non-coding, can be predicted consistently based on their upstream histone marks. In fact, a "universal model" with a single set of cross-organism parameters can predict expression level for both protein coding genes and ncRNAs. Carrying out the same type of "predictions" for TFs, we found that information in their binding is more localized to near the TSS region than that of histone marks but is largely redundant with that of the marks.
Surprisingly, only a small number of TFs are necessary in the models to successfully predict expression (e.g. ~5 of the >1000 in human).
hashtag #BCLgenetics for this event
http://www.bioconferencelive.com/events.php?event_id=17
Abstract for talk BioConference Live August 2014 Genetics and Genomics (i0bioconfweb0mg)
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of >300
transcription factors (TFs) for human, worm and fly. We performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory models, revealing fundamental
principles of transcription conserved across the three highly divergent animals.
In particular, we found the gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs. Carrying out the same type of "predictions" for TFs,
we found that information in their binding is more localized to near the TSS
region than that of histone marks but is largely redundant with that
of the marks.
Surprisingly, only a small number of TFs are necessary in the models
to successfully predict expression (e.g. ~5 of the >1000 in human).
hashtag #BCLgenetics for this event
http://www.bioconferencelive.com/events.php?event_id=17
Sunday, August 10, 2014
Abstract for Talk at Beyond the Genome (i0beyond)
Abstract for
http://www.beyond-the-genome.com/2014/
Human Genome Analysis
Plummeting sequencing costs have led to a great increase in the number
of personal genomes. Interpreting the large number of variants in
them, particularly in non-coding regions, is a central challenge for
genomics. We investigate patterns of selection in DNA elements from
the ENCODE project using the full spectrum of sequence variants from
1,092 individuals in the 1000 Genomes Project Phase 1, including
single-nucleotide variants (SNVs), short insertions and deletions
(indels) and structural variants (SVs). We analyze both coding and
non-coding regions, with the former corroborating the latter. We
identify a specific sub-group of non-coding categories that exhibit
very strong selection constraint, comparable to coding genes:
"ultra-sensitive" regions. We also find variants that are disruptive
due to mechanistic effects on transcription-factor binding (i.e.
"motif-breakers").
We make great use of networks -- contrasting them with linear
annotation -- and describe how we construct a practical instantiation
of the human regulatory network. Using connectivity information
between elements from protein-protein interaction and regulatory
networks, we find that variants in regions with higher network
centrality tend to be deleterious. Indels and SVs follow a similar
pattern as SNVs, with some notable exceptions (e.g. certain deletions
and enhancers).
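As a rough illustration of using network centrality to rank variants, the sketch below computes degree centrality on a toy undirected network and orders variants by the centrality of the gene they map to. Degree centrality is only one possible measure, and the gene names, variant IDs and helper functions are hypothetical; the abstract does not specify which centrality definition was used.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def rank_variants_by_centrality(variant_to_gene, centrality):
    """Sort variant IDs by the centrality of the gene they hit, highest first."""
    return sorted(
        variant_to_gene,
        key=lambda v: centrality.get(variant_to_gene[v], 0.0),
        reverse=True,
    )


# Toy network in which HUB is connected to every other gene.
TOY_EDGES = [("HUB", "G1"), ("HUB", "G2"), ("HUB", "G3"), ("G1", "G2")]
TOY_VARIANTS = {"var_a": "G3", "var_b": "HUB", "var_c": "G1"}
```

Calling `rank_variants_by_centrality(TOY_VARIANTS, degree_centrality(TOY_EDGES))` places the variant in the hub gene first, which mirrors the observation that variants in more central elements are more likely to be deleterious.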
Using these results, we develop a scheme and a practical tool to
prioritize non-coding variants based on their potential deleterious
impact. As a proof of principle, we experimentally validate and
characterize a small number of candidate variants prioritized by the
tool. Application of the tool to ~90 cancer genomes (breast, prostate
and medulloblastoma) reveals ~100 candidate non-coding cancer drivers.
This approach can be readily used in precision medicine to prioritize
variants.
References
Architecture of the human regulatory network derived from ENCODE data.
M Gerstein et al. (2012). Nature 489: 91.
[encodenets.gersteinlab.org]
Interpretation of genomic variants using a unified biological network approach.
E Khurana, Y Fu, J Chen, M Gerstein (2013). PLoS Comput Biol 9: e1002886.
Integrative annotation of variants from 1,092 humans: application to
cancer genomics.
E Khurana et al. (2013). Science 342: 1235587.
[funseq.gersteinlab.org]
Abstract for Epigenomics-Boston-2014 Summit (i0epi14)
Abstract for
http://www.beyond-the-genome.com/2014/program
Human Genome Analysis
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. We performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory models, revealing fundamental
principles of transcription and network organization that are
conserved across the three highly divergent animals. In particular:
(1) We developed a novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibited hourglass behavior. They
were then used to align the stages in worm and fly development,
finding the normal embryo-to-embryo and larvae-to-larvae pairings in
addition to a novel pairing between worm embryo and fly pupae.
(2) We developed a global optimization algorithm to examine the
hierarchical organization of the regulatory network. We found that,
despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.
(3) We found the gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs.
(4) Finally, we found that the extent of the non-coding, non-canonical
transcription is consistent between worm, fly and human.
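The "universal model" in point (3) can be pictured as fitting one set of parameters on data pooled from all three organisms. The sketch below uses ordinary least squares as a stand-in; the abstract does not say which model family was used, and the two unnamed mark signals, the toy values and the function names are all hypothetical.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]  # [intercept, w1, w2, ...]


def fit_universal_model(data_by_organism):
    """Pool (marks, expression) samples from every organism and fit one weight set."""
    rows, targets = [], []
    for samples in data_by_organism.values():
        for marks, expression in samples:
            rows.append(marks)
            targets.append(expression)
    return fit_least_squares(rows, targets)


def predict(weights, marks):
    """Predicted expression for one gene from its mark signals."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], marks))


# Toy data generated from expression = 1 + 2*mark_1 - 1*mark_2 (made-up values).
TOY_DATA = {
    "human": [((1.0, 0.0), 3.0), ((0.0, 1.0), 0.0), ((2.0, 1.0), 4.0)],
    "worm": [((1.0, 1.0), 2.0), ((3.0, 0.0), 7.0)],
    "fly": [((0.0, 0.0), 1.0), ((2.0, 2.0), 3.0)],
}
```

Here `fit_universal_model(TOY_DATA)` returns a single weight vector that is then applied unchanged to genes from any of the three organisms, which is the sense in which the parameters are "cross-organism".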
encodenets.gersteinlab.org
Sunday, May 18, 2014
Re: Uppsala ICM seminar -- request for title and abstract
Great Mark!
Thanks,
Jan
Jan Komorowski, Professor of Bioinformatics
Program in Computational and Systems Biology
Department of Cell and Molecular Biology
Uppsala University
jan.komorowski@icm.uu.se
and
Institute of Computer Science, PAN, Warsaw
On 17 May 2014, at 22:14, Mark Gerstein <mark@gersteinlab.org> wrote:
> The ENCODE and modENCODE consortia have generated a resource
> containing large amounts of transcriptomic data, extensive mapping of
> chromatin states, as well as the binding locations of over 300
> transcription-regulatory factors for human, worm and fly. We performed
> extensive data integration by constructing genome-wide co-expression
> networks and transcriptional regulatory models, revealing fundamental
> principles of transcription and network organization that are
> conserved across the three highly divergent animals. In particular:
>
> (1) We developed a novel cross-species clustering algorithm to
> integrate the co-expression networks of the three species, resulting in
> conserved modules shared between the organisms. These modules are
> enriched in developmental genes and exhibited hourglass behavior. They
> were then used to align the stages in worm and fly development,
> finding the normal embryo-to-embryo and larvae-to-larvae pairings in
> addition to a novel pairing between worm embryo and fly pupae.
>
> (2) We developed a global optimization algorithm to examine the
> hierarchical organization of the regulatory network. We found that,
> despite extensive rewiring of binding targets, high-level organization
> principles such as a three-layer hierarchy are conserved across the
> three species.
>
> (3) We found the gene expression levels in the organisms, both coding
> and non-coding, can be predicted consistently based on their upstream
> histone marks. In fact, a "universal model" with a single set of
> cross-organism parameters can predict expression level for both protein
> coding genes and ncRNAs.
>
> (4) Finally, we found that the extent of the non-coding, non-canonical
> transcription is consistent between worm, fly and human.
>
> encodenets.gersteinlab.org
>
> ==
> cmptxn,i0se14
Saturday, May 17, 2014
Fwd: Uppsala ICM seminar -- request for title and abstract
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. We performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory models, revealing fundamental
principles of transcription and network organization that are
conserved across the three highly divergent animals. In particular:
(1) We developed a novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibited hourglass behavior. They
were then used to align the stages in worm and fly development,
finding the normal embryo-to-embryo and larvae-to-larvae pairings in
addition to a novel pairing between worm embryo and fly pupae.
(2) We developed a global optimization algorithm to examine the
hierarchical organization of the regulatory network. We found that,
despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.
(3) We found the gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression level for both protein
coding genes and ncRNAs.
(4) Finally, we found that the extent of the non-coding, non-canonical
transcription is consistent between worm, fly and human.
encodenets.gersteinlab.org
==
cmptxn,i0se14
Fwd: CSHL Meeting Abstract Submission
Thank you for submitting your abstract for Systems Biology: Global
Regulation of Gene Expression 2014.
Comparative network analysis of ENCODE and modENCODE data
Mark Gerstein1,2, Koon-Kiu Yan1,2, modENCODE/ENCODE transcriptome group1
1Yale University, Program of Computational Biology and Bioinformatics,
New Haven, CT, 2Yale University, Molecular Biophysics and
Biochemistry, New Haven, CT
The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. We performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory networks, revealing
fundamental principles of transcription and network organization that
are conserved across the three highly divergent animals. First, we
developed a novel cross-species clustering algorithm to integrate the
co-expression networks of the three species, resulting in conserved
modules shared between the organisms. These modules are enriched in
developmental genes and exhibited hourglass behavior. They were then
used to align the stages in worm and fly development, finding the
normal embryo-to-embryo and larvae-to-larvae pairings in addition to a
novel pairing between worm embryo and fly pupae. Second, we defined a
new score to quantify the degree of hierarchy (preponderance of
downward information flow) of a network, and developed a global
optimization algorithm to compare the hierarchical organization of the
three species. We found that, despite extensive rewiring of binding
targets, high-level organization principles like the hierarchical
structures are conserved across the three species. Finally, we found the
gene expression levels in the organisms, both coding and non-coding,
can be predicted consistently based on their upstream histone marks.
In fact, a "universal model" with a single set of cross-organism
parameters can predict expression level for both protein coding genes
and ncRNAs. The algorithms introduced by this study can be easily
applied to other model datasets such as those from yeast.
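One simple way to picture a "degree of hierarchy" score is the fraction of regulator-to-target edges that point from a higher layer to a lower one, maximized over layer assignments. The sketch below is only an illustration of that idea: the exact score and the optimization method of the study are not given in this abstract, the exhaustive search is a stand-in that works only for tiny networks, and the node names and function names are hypothetical.

```python
from itertools import product


def downward_fraction(edges, level):
    """Fraction of regulator -> target edges going from a higher to a lower level."""
    if not edges:
        return 0.0
    down = sum(1 for src, dst in edges if level[src] > level[dst])
    return down / len(edges)


def best_layering(edges, n_levels=3):
    """Try every level assignment and keep the first one with the highest score."""
    nodes = sorted({node for edge in edges for node in edge})
    best_score, best_level = -1.0, None
    for combo in product(range(n_levels), repeat=len(nodes)):
        level = dict(zip(nodes, combo))
        score = downward_fraction(edges, level)
        if score > best_score:
            best_score, best_level = score, level
    return best_score, best_level


# Toy regulatory network with one feedback edge (bottom -> mid).
TOY_EDGES = [
    ("top", "mid"),
    ("top", "bottom"),
    ("mid", "bottom"),
    ("bottom", "mid"),
]
```

On this toy network no layering can make both `mid -> bottom` and `bottom -> mid` point downward, so the best achievable score is 3 of 4 edges; the search cost grows as `n_levels ** len(nodes)`, which is why a real analysis would need a heuristic optimizer instead.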
i0cshsb