Friday, September 21, 2007

abstract for talk [I] at AACR Special Conference 'The Role of Non-Coding RNAs in Cancer'

Title: Human Genome Annotation, Focusing on Intergenic Regions and ncRNAs

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. Through this work we have been able to annotate
regulatory sites and regions of unannotated transcription in the
genome.

At the end I will bring these together, trying to assess the
biochemical activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- i.e. the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

References (4 most relevant are starred with "*")

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome. Z Zhang, PM Harrison,
Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Patterns of nucleotide substitution, insertion and deletion in the
human genome inferred from pseudogenes. Z Zhang, M Gerstein (2003)
Nucleic Acids Res 31: 5338-48.

The ambiguous boundary between genes and pseudogenes: the dead rise
up, or do they? D Zheng, MB Gerstein (2007) Trends Genet

Pseudogene.org: a comprehensive database and comparison platform for
pseudogene annotation. JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero,
P Cayting, P Harrison, M Gerstein (2007) Nucleic Acids Res 35:
D55-60.

A computational approach for identifying pseudogenes in the ENCODE
regions. D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

* The real life of pseudogenes. M Gerstein, D Zheng (2006) Sci Am 295:
48-55.

PseudoPipe: an automated pseudogene identification pipeline. Z Zhang,
N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification. TE Royce, JS
Rozowsky, MB Gerstein (2007) Nucleic Acids Res

* Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution. D Zheng, A Frankish, R Baertsch, P
Kapranov, A Reymond, SW Choo, Y Lu, F Denoeud, SE Antonarakis, M
Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J Harrow, MB Gerstein
(2007) Genome Res 17: 839-51.

* Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions. ZD Zhang, A Paccanaro, Y
Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein (2007) Genome
Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE
regions: associating transcription with known and novel loci. JS
Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U
Nagalakshmi, J Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P
Miller, M Snyder, MB Gerstein (2007) Genome Res 17: 732-45.

* What is a gene, post-ENCODE? History and updated definition. MB
Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O
Emanuelsson, ZD Zhang, S Weissman, M Snyder (2007) Genome Res 17:
669-81.

Friday, August 24, 2007

Abstract for talk [I] at 65th Annual Pittsburgh Diffraction Conference

TITLE:

Surveying structural flexibility on a proteomic scale

Mark Gerstein

Yale University

Motions

An area of focus in the lab is analyzing small populations of structures in
terms of their detailed 3D-geometry and physical properties. Here, we try to
interpret macromolecular motions in terms of packing. We have set up a database
of macromolecular motions and coupled it with simulation tools to interpolate
between structural conformations; the database also has tools to predict likely
motions based on simple models, such as normal modes and localized hinges
connecting rigid domains. Part of this project involves devising a system for
characterizing motions in a highly standardized fashion. Our motions
classification scheme is motivated by the fact that protein interiors are packed
exceedingly tightly, and the tight packing can greatly constrain a protein's
mobility. We have developed tools for measuring and comparing the packing
efficiency at different interfaces (e.g. inter-domain, protein surface,
helix-helix, protein vs. RNA) using specialized geometric constructions (e.g.
Voronoi polyhedra).

http://molmovdb.org/

# The citation for the FlexOracle hinge predictor is SC Flores, MB Gerstein
(2007). BMC Bioinformatics 8: 215.

# Flores, Echols, Milburn, Hespenheide, Keating, Lu, Wells, Yu, Thorpe, Gerstein
(2006). Nucleic Acids Res. 34:D296-301.

SC Flores, LJ Lu, J Yang, N Carriero, MB Gerstein (2007). "Hinge Atlas: relating
protein sequence to sites of structural flexibility." BMC Bioinformatics 8: 167

http://papers.gersteinlab.org/papers/subject/motions/
http://papers.gersteinlab.org/papers/subject/volumes/

abstract for talk [I] at UPenn Biochem. & Biophys. 27-Sept-2007

Title: Human Genome Annotation, Focusing on Intergenic Regions

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. Through this work we have been able to annotate
regulatory sites and regions of unannotated transcription in the
genome.

At the end I will bring these together, trying to assess the
biochemical activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- i.e. the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

References (4 most relevant are starred with "*")

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome. Z Zhang, PM Harrison,
Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Patterns of nucleotide substitution, insertion and deletion in the
human genome inferred from pseudogenes. Z Zhang, M Gerstein (2003)
Nucleic Acids Res 31: 5338-48.

The ambiguous boundary between genes and pseudogenes: the dead rise
up, or do they? D Zheng, MB Gerstein (2007) Trends Genet

Pseudogene.org: a comprehensive database and comparison platform for
pseudogene annotation. JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero,
P Cayting, P Harrison, M Gerstein (2007) Nucleic Acids Res 35:
D55-60.

A computational approach for identifying pseudogenes in the ENCODE
regions. D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

* The real life of pseudogenes. M Gerstein, D Zheng (2006) Sci Am 295:
48-55.

PseudoPipe: an automated pseudogene identification pipeline. Z Zhang,
N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification. TE Royce, JS
Rozowsky, MB Gerstein (2007) Nucleic Acids Res

* Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution. D Zheng, A Frankish, R Baertsch, P
Kapranov, A Reymond, SW Choo, Y Lu, F Denoeud, SE Antonarakis, M
Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J Harrow, MB Gerstein
(2007) Genome Res 17: 839-51.

* Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions. ZD Zhang, A Paccanaro, Y
Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein (2007) Genome
Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE
regions: associating transcription with known and novel loci. JS
Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U
Nagalakshmi, J Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P
Miller, M Snyder, MB Gerstein (2007) Genome Res 17: 732-45.

* What is a gene, post-ENCODE? History and updated definition. MB
Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O
Emanuelsson, ZD Zhang, S Weissman, M Snyder (2007) Genome Res 17:
669-81.

Thursday, August 2, 2007

Abstract for talk [I] at DE Shaw 10 August 2007

TITLE:

Computational Proteomics: Networks & Structures

Mark Gerstein

Yale University

Motions

An area of focus in the lab is analyzing small populations of structures in
terms of their detailed 3D-geometry and physical properties. Here, we try to
interpret macromolecular motions in terms of packing. We have set up a database
of macromolecular motions and coupled it with simulation tools to interpolate
between structural conformations; the database also has tools to predict likely
motions based on simple models, such as normal modes and localized hinges
connecting rigid domains. Part of this project involves devising a system for
characterizing motions in a highly standardized fashion. Our motions
classification scheme is motivated by the fact that protein interiors are packed
exceedingly tightly, and the tight packing can greatly constrain a protein's
mobility. We have developed tools for measuring and comparing the packing
efficiency at different interfaces (e.g. inter-domain, protein surface,
helix-helix, protein vs. RNA) using specialized geometric constructions (e.g.
Voronoi polyhedra).

http://molmovdb.org/

# The citation for the FlexOracle hinge predictor is SC Flores, MB Gerstein
(2007). BMC Bioinformatics 8: 215.

# Flores, Echols, Milburn, Hespenheide, Keating, Lu, Wells, Yu, Thorpe, Gerstein
(2006). Nucleic Acids Res. 34:D296-301.

SC Flores, LJ Lu, J Yang, N Carriero, MB Gerstein (2007). "Hinge Atlas: relating
protein sequence to sites of structural flexibility." BMC Bioinformatics 8: 167

http://papers.gersteinlab.org/papers/subject/motions/
http://papers.gersteinlab.org/papers/subject/volumes/

Networks

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.
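
As a rough illustration of the "simple topological statistics" mentioned
above, the following is a minimal sketch that computes degree and
betweenness on a toy undirected graph. It is not the code behind TopNet or
tYNA; the graph, the function names, and the choice of Brandes' algorithm
for unweighted graphs are all assumptions made for this example.

```python
from collections import deque


def degree(graph):
    """Number of neighbors of each node in an adjacency-list graph."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected graph.

    Uses Brandes' algorithm: one breadth-first search per source node,
    followed by a backward pass that accumulates path dependencies.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = {v: 0 for v in graph}       # number of shortest paths
        dist = {v: -1 for v in graph}       # -1 marks "not yet visited"
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:                        # farthest nodes first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical toy network: two triangles joined through a bridge node X.
toy = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
toy_degree = degree(toy)
toy_betweenness = betweenness(toy)
```

In this toy graph the bridge node X has one of the lowest degrees yet the
highest betweenness, which is the kind of "bottleneck" the analysis in
point (3) is concerned with.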

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

http://papers.gersteinlab.org/papers/subject/interactions/

TopNet: a tool for comparing biological sub-networks, correlating
protein properties with topological statistics. H Yu, X Zhu, D
Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and
protein-DNA regulogs. H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han,
N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Positive Selection at the Protein Network Periphery: Evaluation in
Terms of Structural Constraints and Cellular Context. Philip M. Kim,
Jan O. Korbel and Mark B. Gerstein PNAS (in press)

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Saturday, June 30, 2007

Abstract for talk [I] at Engineering Cell Biology (ECB) meeting to be held at MIT on Aug. 5-8, 2007

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating
protein properties with topological statistics. H Yu, X Zhu, D
Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and
protein-DNA regulogs. H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han,
N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Positive Selection at the Protein Network Periphery: Evaluation in
Terms of Structural Constraints and Cellular Context. Philip M. Kim,
Jan O. Korbel and Mark B. Gerstein PNAS (in press)

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Saturday, June 23, 2007

Abstract for talk [I] at MSCBB: 1st Annual Midwestern Computational Biology and Bioinformatics Symposium

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating
protein properties with topological statistics. H Yu, X Zhu, D
Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and
protein-DNA regulogs. H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han,
N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Positive Selection at the Protein Network Periphery: Evaluation in
Terms of Structural Constraints and Cellular Context. Philip M. Kim,
Jan O. Korbel and Mark B. Gerstein PNAS (in press)

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.


--
Mark.Gerstein@yale.edu * 203 432-6105 *

http://bioinfo.mbb.yale.edu

Saturday, June 2, 2007

abstract for talk [I] at CPI-2007

Title: Human Genome Annotation, Focusing on Intergenic Regions

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. At the end I will bring these together, trying
to assess the transcriptional activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- i.e. the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org


Comparative analysis of processed pseudogenes in the mouse and human
genomes.
Z Zhang, N Carriero, M Gerstein (2004) Trends Genet 20: 62-7.

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Patterns of nucleotide substitution, insertion and deletion in the
human genome inferred from pseudogenes. Z Zhang, M Gerstein (2003)
Nucleic Acids Res 31: 5338-48.

Integrated pseudogene annotation for human chromosome 22: evidence for
transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005)
J Mol Biol 349: 27-45.

P. Bertone, F. Schubert, V. Trifonov, J. Rozowsky, O. Emanuelsson,
J. Karro, M-Y Kao, M. Snyder, M. Gerstein. Design optimization methods
for genomic DNA tiling arrays. Genome Research (in press).

TE Royce, JS Rozowsky, P Bertone, M Samanta, V Stolc, S Weissman, M
Snyder, M Gerstein (2005). "Issues in the analysis of oligonucleotide
tiling microarrays for transcript mapping." Trends Genet 21: 466-75.

Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of
Transcription and Evolution
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (in press) Genome Research.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.

Abstract for talk [I] at NetSci2007

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein
Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating protein
properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large topological changes.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004)
Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and protein-DNA
regulogs.
H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal,
M Gerstein (2004) Genome Res 14: 1107-18.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A

Integrated prediction of the helical membrane protein interactome in yeast.
Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides evolutionary
insights.
PM Kim, LJ Lu, Y Xia, MB Gerstein (2006) Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for managing,
comparing and mining multiple networks.
KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

abstract for talk [I] at Cistrome2007

Title: Human Genome Annotation, Focusing on Intergenic Regions

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. At the end I will bring these together, trying
to assess the transcriptional activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- i.e. the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org


Comparative analysis of processed pseudogenes in the mouse and human
genomes.
Z Zhang, N Carriero, M Gerstein (2004) Trends Genet 20: 62-7.

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Patterns of nucleotide substitution, insertion and deletion in the
human genome inferred from pseudogenes. Z Zhang, M Gerstein (2003)
Nucleic Acids Res 31: 5338-48.

Integrated pseudogene annotation for human chromosome 22: evidence for
transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005)
J Mol Biol 349: 27-45.

P. Bertone, F. Schubert, V. Trifonov, J. Rozowsky, O. Emanuelsson,
J. Karro, M-Y Kao, M. Snyder, M. Gerstein. Design optimization methods
for genomic DNA tiling arrays. Genome Research (in press).

TE Royce, JS Rozowsky, P Bertone, M Samanta, V Stolc, S Weissman, M
Snyder, M Gerstein (2005). "Issues in the analysis of oligonucleotide
tiling microarrays for transcript mapping." Trends Genet 21: 466-75.

Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of
Transcription and Evolution
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (in press) Genome Research.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.

Sunday, May 20, 2007

Abstract for talk [I] at Brown

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

N Luscombe, Y Xia, H Yu, R Jansen, L Lu, Y Yip, P Kim, S Douglas, A Paccanaro

Yale University

My talk will be concerned with topics in proteomics, in particular predicting
protein function on a genomic scale. We approach this through the prediction and
analysis of biological networks -- both of protein-protein interactions and
transcription-factor-target relationships. I will describe how these networks
can be determined through integration of many genomic features and how
they can be analyzed in terms of various simple topological statistics. I will
discuss the accuracy of various reconstructed quantities.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating protein
properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large topological changes.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004)
Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and protein-DNA
regulogs.
H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal,
M Gerstein (2004) Genome Res 14: 1107-18.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A

Integrated prediction of the helical membrane protein interactome in yeast.
Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides evolutionary
insights.
PM Kim, LJ Lu, Y Xia, MB Gerstein (2006) Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for managing,
comparing and mining multiple networks.
KY Yip, H Yu, PM Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Saturday, May 19, 2007

abstract for talk [I] at McGill

Title: Human Genome Annotation

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. At the end I will bring these together, trying
to assess the transcriptional activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- i.e. the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org


Comparative analysis of processed pseudogenes in the mouse and human
genomes.
Z Zhang, N Carriero, M Gerstein (2004) Trends Genet 20: 62-7.

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Patterns of nucleotide substitution, insertion and deletion in the
human genome inferred from pseudogenes.
Z Zhang, M Gerstein (2003) Nucleic Acids Res 31: 5338-48.

Integrated pseudogene annotation for human chromosome 22: evidence for
transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005)
J Mol Biol 349: 27-45.

Design optimization methods for genomic DNA tiling arrays.
P Bertone, F Schubert, V Trifonov, J Rozowsky, O Emanuelsson, J Karro,
M-Y Kao, M Snyder, M Gerstein (in press) Genome Research.

Issues in the analysis of oligonucleotide tiling microarrays for
transcript mapping.
TE Royce, JS Rozowsky, P Bertone, M Samanta, V Stolc, S Weissman, M
Snyder, M Gerstein (2005) Trends Genet 21: 466-75.

Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of
Transcription and Evolution
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (in press) Genome Research.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.


Monday, January 22, 2007

Abstract for nyas nyas-20060126

See http://www.nyas.org/ebrief/miniEB.asp?ebriefID=559
for an e-briefing

Title: Human Genome Annotation

A central problem for 21st century science will be the analysis and understanding of the human genome. My talk will be concerned with topics within this area, in particular annotating pseudogenes (protein fossils) in the genome. I will discuss a comprehensive pseudogene identification pipeline and storage database we have built. This has enabled us to identify >10K pseudogenes in the human and mouse genomes and analyze their distribution with respect to age, protein family, and chromosomal location. One interesting finding is the large number of ribosomal pseudogenes in the human genome, with 80 functional ribosomal proteins giving rise to ~2,000 ribosomal protein pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on tiling arrays, which enable one to comprehensively probe the activity of intergenic regions. At the end I will bring these together, trying to assess the transcriptional activity of pseudogenes.

Throughout I will try to introduce some of the computational algorithms and approaches that are required for genome annotation and tiling arrays -- i.e. the construction of annotation pipelines, developing algorithms for optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Comparative analysis of processed pseudogenes in the mouse and human genomes.
Z Zhang, N Carriero, M Gerstein (2004) Trends Genet 20: 62-7.

Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes.
Z Zhang, M Gerstein (2003) Nucleic Acids Res 31: 5338-48.

Integrated pseudogene annotation for human chromosome 22: evidence for transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005) J Mol Biol 349: 27-45.

Design optimization methods for genomic DNA tiling arrays.
P Bertone, F Schubert, V Trifonov, J Rozowsky, O Emanuelsson, J Karro, M-Y Kao, M Snyder, M Gerstein (in press) Genome Research.

Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping.
TE Royce, JS Rozowsky, P Bertone, M Samanta, V Stolc, S Weissman, M Snyder, M Gerstein (2005) Trends Genet 21: 466-75.

Monday, January 15, 2007

Abstract for orfeome06-20061116

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

N Luscombe, Y Xia, H Yu, R Jansen, L Lu, Y Yip, P Kim, S Douglas, A Paccanaro

Yale University

My talk will be concerned with topics in proteomics, in particular predicting protein function on a genomic scale. We approach this through the prediction and analysis of biological networks -- both of protein-protein interactions and transcription-factor-target relationships. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various simple topological statistics. I will discuss the accuracy of various reconstructed quantities.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large topological changes.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and protein-DNA
regulogs.
H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A

Integrated prediction of the helical membrane protein interactome in yeast.
Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Abstract for talk [I] at ENAR

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

N Luscombe, Y Xia, H Yu, R Jansen, L Lu, Y Yip, P Kim, S Douglas, A Paccanaro

Yale University

My talk will be concerned with topics in proteomics, in particular predicting protein function on a genomic scale. We approach this through the prediction and analysis of biological networks -- both of protein-protein interactions and transcription-factor-target relationships. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various simple topological statistics. I will discuss the accuracy of various reconstructed quantities.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating protein properties with topological statistics.
H Yu, X Zhu, D Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large topological changes.
NM Luscombe, MM Babu, H Yu, M Snyder, SA Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and protein-DNA
regulogs.
H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han, N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A

Integrated prediction of the helical membrane protein interactome in yeast.
Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.