Tuesday, August 26, 2014

Abstract for SAMSI meeting, Opening workshop '14 (i0samsi)

Abstract for

Human Genome Analysis

The ENCODE and modENCODE consortia have generated a resource
containing large amounts of transcriptomic data, extensive mapping of
chromatin states, as well as the binding locations of over 300
transcription-regulatory factors for human, worm and fly. We performed
extensive data integration by constructing genome-wide co-expression
networks and transcriptional regulatory models, revealing fundamental
principles of transcription and network organization that are
conserved across the three highly divergent animals. In particular:

(1) We developed a novel cross-species clustering algorithm to
integrate the co-expression networks of the three species, resulting in
conserved modules shared between the organisms. These modules are
enriched in developmental genes and exhibit hourglass behavior. They
were then used to align the stages in worm and fly development,
finding the normal embryo-to-embryo and larvae-to-larvae pairings in
addition to a novel pairing between worm embryo and fly pupae.

(2) We developed a global optimization algorithm to examine the
hierarchical organization of the regulatory network. We found that,
despite extensive rewiring of binding targets, high-level organization
principles such as a three-layer hierarchy are conserved across the
three species.

(3) We found that gene expression levels in the organisms, both coding
and non-coding, can be predicted consistently based on their upstream
histone marks. In fact, a "universal model" with a single set of
cross-organism parameters can predict expression levels for both
protein-coding genes and ncRNAs.

(4) Carrying out the same type of "predictions" for TFs, we found that
the information in their binding is more localized near the TSS than
that of histone marks but is largely redundant with that of the marks.
Surprisingly, only a small number of TFs are necessary in the models
to successfully predict expression (e.g. ~5 of the >1000 in human).

(5) Finally, we found that the extent of non-coding, non-canonical
transcription is consistent between worm, fly and human.
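
As a rough illustration of the "universal model" idea in point (3), the
sketch below fits one set of regression weights on genes pooled from three
organisms and then checks how well those shared weights predict held-out
genes in each organism separately. Everything here is an assumption made
for illustration: the data are synthetic, the three "mark" features and
their weights are hypothetical, and ordinary least squares stands in for
whatever model the actual analysis used.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def simulate(n_genes, rng, true_w, noise=0.3):
    """Synthetic genes: a constant term plus hypothetical mark signals."""
    X = [[1.0] + [rng.gauss(0, 1) for _ in true_w[1:]] for _ in range(n_genes)]
    y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, noise)
         for row in X]
    return X, y

rng = random.Random(0)
true_w = [2.0, 1.5, -1.0, 0.5]  # hypothetical: intercept + three mark weights
data = {name: simulate(300, rng, true_w) for name in ("human", "worm", "fly")}

# Fit ONE weight vector on the first 200 genes of every organism, pooled.
train_X = [row for X, _ in data.values() for row in X[:200]]
train_y = [t for _, y in data.values() for t in y[:200]]
universal_w = fit_linear(train_X, train_y)

# Evaluate the shared weights on the held-out 100 genes of each organism.
correlations = {name: pearson(predict(universal_w, X[200:]), y[200:])
                for name, (X, y) in data.items()}
print(universal_w)
print(correlations)
```

Because the synthetic organisms share the same underlying weights by
construction, the pooled fit generalizes to each of them. With real data,
that agreement across organisms is the finding itself rather than something
built into the setup.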


