Sunday, October 19, 2008

abstract for talk at Robert Cedergren Colloquium on 3-Nov-2008 [I:CEDERGREN]

Title: Human Genome Annotation

A central problem for 21st century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, CNVs, and novel transcribed regions in the
genome. Much of this work has been carried out in the framework of the
ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome based on processing
of tiling array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.
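
As a rough illustration of what clustering sites along the genome can look like, here is a minimal sketch that merges sites on the same chromosome when the gap between them is at most a chosen threshold. The function name, the (chrom, start, end) tuple layout, and the max_gap parameter are assumptions made for this example; they are not taken from the pipeline described in the talk.

```python
from collections import defaultdict

def cluster_sites(sites, max_gap=1000):
    """Group (chrom, start, end) intervals whose gaps are <= max_gap bases.

    Returns a list of (chrom, start, end, n_sites) clusters.
    Hypothetical helper for illustration only.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))

    clusters = []
    for chrom in sorted(by_chrom):
        current = None  # [chrom, start, end, n_sites] of the open cluster
        for start, end in sorted(by_chrom[chrom]):
            if current is not None and start - current[2] <= max_gap:
                # Close enough to the open cluster: extend it.
                current[2] = max(current[2], end)
                current[3] += 1
            else:
                # Too far away (or first site): close the old cluster, open a new one.
                if current is not None:
                    clusters.append(tuple(current))
                current = [chrom, start, end, 1]
        if current is not None:
            clusters.append(tuple(current))
    return clusters

# Made-up coordinates, purely for demonstration.
example_sites = [("chr1", 100, 200), ("chr1", 900, 1000),
                 ("chr1", 5000, 5100), ("chr2", 50, 60)]
example_clusters = cluster_sites(example_sites, max_gap=1000)
```

A single-pass merge like this is only one of many possible choices; the threshold controls how aggressively neighboring sites are grouped.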

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to inter-relate our studies on
pseudogenes with those on transcribed regions. At the end I will bring
these together, trying to assess the transcriptional activity of
pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation --
e.g., the construction of annotation pipelines, developing algorithms
for optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of regulatory
elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein
(2007) Genome Res 17: 787-97.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O Emanuelsson, ZD
Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

Systematic prediction and validation of breakpoints associated with copy-number
variants in the human genome.
JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM
Weissman, M Snyder, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 10110-5.

Analysis of Copy Number Variants and Segmental Duplications in the Human Genome:
Evidence for a Change in the Process of Formation Mechanism in Recent
Evolutionary History
Philip M. Kim, Hugo Y. K. Lam, Alexander E. Urban, Jan Korbel, Xueying Chen,
Michael Snyder and Mark B. Gerstein
Genome Res. (in press, 2008)

Modeling ChIP sequencing in silico with applications.
ZD Zhang, J Rozowsky, M Snyder, J Chang, M Gerstein (2008) PLoS Comput Biol 4:
e1000158.

Pseudofam: The Pseudogene Families Database
Lam, Hugo; Khurana, Ekta; Fang, Gang; Cayting, Philip; Carriero, Nicholas;
Cheung, Kei-Hoi; Gerstein, Mark
NAR (in press, 2009)

MSB: A Mean-shift-based Approach for the Analysis of Structural Variation in the
Genome
Lu-yong Wang, Alexej Abyzov, Jan O. Korbel, Michael Snyder, Mark Gerstein
Genome Res. (in press, 2008)

Mismatch oligonucleotides in human and yeast: guidelines for probe
design on tiling microarrays
Michael Seringhaus, Joel Rozowsky, Thomas Royce, Ugrappa Nagalakshmi,
Justin Jee, Michael Snyder and Mark Gerstein
BMC Genomics (submitted)

Sunday, October 12, 2008

abstract for talk at RECOMB Regulatory Genomics 2008 on ~31-Oct-2008 [I:RECOMB08SAT] - vers 2

Title: Human Genome Annotation, Focusing on Intergenic Regions

A central problem for 21st century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, CNVs, and novel transcribed regions in the
genome. Much of this work has been carried out in the framework of the
ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome based on processing
of tiling array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation --
e.g., the construction of annotation pipelines, developing algorithms
for optimal tiling, and refining approaches for scoring microarrays.

http://gersteinlab.org
http://pseudogene.org
http://tiling.gersteinlab.org

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Statistical analysis of the genomic distribution and correlation of regulatory
elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein
(2007) Genome Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE regions:
associating transcription with known and novel loci.
JS Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U Nagalakshmi, J
Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P Miller, M Snyder, MB Gerstein
(2007) Genome Res 17: 732-45.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O Emanuelsson, ZD
Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

Analysis of Copy Number Variants and Segmental Duplications in the Human Genome:
Evidence for a Change in the Process of Formation Mechanism in Recent
Evolutionary History
Philip M. Kim, Hugo Y. K. Lam, Alexander E. Urban, Jan Korbel, Xueying Chen,
Michael Snyder and Mark B. Gerstein
Genome Res. (in press, 2008)

Modeling ChIP sequencing in silico with applications.
ZD Zhang, J Rozowsky, M Snyder, J Chang, M Gerstein (2008) PLoS Comput Biol 4:
e1000158.

Thursday, October 9, 2008

abstract for talk at RECOMB Regulatory Genomics 2008 on ~31-Oct-2008 [I:RECOMB08SAT]

Title: Human Genome Annotation

A central problem for 21st century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, CNVs, and novel transcribed regions in the
genome. Much of this work has been carried out in the framework of the
ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome based on processing
of tiling array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to inter-relate our studies on
pseudogenes with those on transcribed regions. At the end I will bring
these together, trying to assess the transcriptional activity of
pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation --
e.g., the construction of annotation pipelines, developing algorithms
for optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet 23: 219-24.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of regulatory
elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein
(2007) Genome Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE regions:
associating transcription with known and novel loci.
JS Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U Nagalakshmi, J
Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P Miller, M Snyder, MB Gerstein
(2007) Genome Res 17: 732-45.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O Emanuelsson, ZD
Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

Systematic prediction and validation of breakpoints associated with copy-number
variants in the human genome.
JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM
Weissman, M Snyder, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 10110-5.

Analysis of Copy Number Variants and Segmental Duplications in the Human Genome:
Evidence for a Change in the Process of Formation Mechanism in Recent
Evolutionary History
Philip M. Kim, Hugo Y. K. Lam, Alexander E. Urban, Jan Korbel, Xueying Chen,
Michael Snyder and Mark B. Gerstein
Genome Res. (in press, 2008)

Tuesday, June 3, 2008

Abstract for talk [I:BIOMARKER] at Biomarker Discovery Summit, 30 Sep and 1 Oct 2008

TITLE:

Using Networks to Integrate Omic and Semantic Data: Towards Understanding
Protein Function on a Genome-scale

Mark Gerstein

Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features (including those derived from using the semantic web
and text mining) and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss: (1)
Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, PubNet and
tYNA; (6) Using known semantic web relationships as training sets to
improve biological query applications; and (7) Using literature data to predict
protein interactions.
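
To make the distinction between betweenness and degree in point (3) concrete, here is a minimal sketch that computes unnormalized betweenness for a small undirected, unweighted graph using Brandes' algorithm. The toy network and its node names are invented for illustration; this is not the code or the data behind the cited analyses.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph.

    adj maps each node to a list of its neighbors. Uses Brandes'
    accumulation scheme with one breadth-first search per source node.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in score.items()}

# Toy network: two triangles joined through a single node X.
toy = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
scores = betweenness(toy)
degrees = {v: len(nbrs) for v, nbrs in toy.items()}
```

In this toy graph X has only two neighbors, yet every shortest path between the two triangles runs through it, so it has the highest betweenness while C and D have the highest degree. That is the sense in which a bottleneck can differ from a hub.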

http://bioinfo.mbb.yale.edu
http://networks.gersteinlab.org

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

The role of disorder in interaction networks: a structural analysis.
PM Kim, A Sboner, Y Xia, M Gerstein (2008) Mol Syst Biol 4: 179.

Positive selection at the protein network periphery: evaluation in terms of
structural constraints and cellular context.
PM Kim, JO Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Leveraging the structure of the Semantic Web to enhance information retrieval
for proteomics.
A Smith, K Cheung, M Krauthammer, M Schultz, M Gerstein (2007) Bioinformatics
23: 3073-9.

LinkHub: a Semantic Web system that facilitates cross-database queries and
information retrieval in proteomics.
AK Smith, KH Cheung, KY Yip, M Schultz, MK Gerstein (2007) BMC Bioinformatics 8
Suppl 3: S5.

Data mining on the web.
A Smith, M Gerstein (2006) Science 314: 1682; author reply 1682.

Sunday, June 1, 2008

abstract for talk at ISMB SIG on Genome-scale Pattern Analysis in the Post-ENCODE Era on 21-Jul-08 [I:ISMB08-SIG]

Title: Human Genome Annotation

A central problem for 21st century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, CNVs, and novel transcribed regions in the
genome. Much of this work has been carried out in the framework of the
ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome based on processing
of tiling array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to inter-relate our studies on
pseudogenes with those on transcribed regions. At the end I will bring
these together, trying to assess the transcriptional activity of
pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation --
e.g., the construction of annotation pipelines, developing algorithms
for optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet 23: 219-24.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of regulatory
elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein
(2007) Genome Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE regions:
associating transcription with known and novel loci.
JS Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U Nagalakshmi, J
Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P Miller, M Snyder, MB Gerstein
(2007) Genome Res 17: 732-45.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O Emanuelsson, ZD
Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

Systematic prediction and validation of breakpoints associated with copy-number
variants in the human genome.
JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM
Weissman, M Snyder, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 10110-5.

Wednesday, May 28, 2008

Abstract for talk [I:ISBM08PUB] The Future of Scientific Publishing at ISMB-2008 on 21 July 2008

TITLE:

Understanding Protein Function on a Genome-scale using Networks

The Future of Scientific Publishing: Mining Publications to Study the Structure
of Science

My talk will focus on a vision for Scientific Publishing: How we could
potentially mine all the information in publications, to glean new scientific
facts and to study the structure of science itself. I'll provide some
illustrations of the latter. I will also talk about some impediments to
realizing this vision and potential solutions.

Mark Gerstein
Yale University

http://bioinfo.mbb.yale.edu

Open access: taking full advantage of the content.
PE Bourne, JL Fink, M Gerstein (2008) PLoS Comput Biol 4: e1000037.

Uncovering trends in gene naming.
MR Seringhaus, PD Cayting, MB Gerstein (2008) Genome Biol 9: 401.

Structured digital abstract makes text mining easy.
M Gerstein, M Seringhaus, S Fields (2007) Nature 447: 142.

RNAi development.
M Gerstein, SM Douglas (2007) PLoS Comput Biol 3: e80.

Chemistry Nobel rich in structure.
M Seringhaus, M Gerstein (2007) Science 315: 40-1.

Data mining on the web.
A Smith, M Gerstein (2006) Science 314: 1682; author reply 1682.

Tools needed to navigate landscape of the genome.
M Gerstein (2006) Nature 440: 740.

PubNet: a flexible system for visualizing literature derived networks.
SM Douglas, GT Montelione, M Gerstein (2005) Genome Biol 6: R80.

Annotation of the human genome.
M Gerstein (2000) Science 288: 1590.

E-publishing on the Web: promises, pitfalls, and payoffs for bioinformatics.
M Gerstein (1999) Bioinformatics 15: 429-31.

Sunday, April 27, 2008

Abstract for talk [I:ISMB08-SS] Special Session at ISMB-2008 on 21 July 2008

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.
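
As a rough illustration of the hierarchical layout in point (4), here is a minimal sketch that assigns a level to each node of a regulator-to-target graph: nodes that regulate nothing sit at level 0, and every regulator sits one level above its highest target. This longest-path layering is an assumption chosen for simplicity, not necessarily the procedure used in the cited work, and the node names are hypothetical. It also assumes the graph has no cycles (no auto-regulation or feedback loops).

```python
from functools import lru_cache

def hierarchy_levels(targets):
    """Map each node to its level in an acyclic regulator -> target graph.

    targets maps a regulator to the list of nodes it regulates.
    Level 0: nodes that regulate nothing; otherwise 1 + max level of targets.
    """
    nodes = set(targets) | {t for ts in targets.values() for t in ts}

    @lru_cache(maxsize=None)
    def level(node):
        # default=-1 makes nodes with no targets come out at level 0.
        return 1 + max((level(t) for t in targets.get(node, ())), default=-1)

    return {n: level(n) for n in nodes}

# Hypothetical three-layer network: one top regulator, two "middle managers".
toy_network = {
    "TF_top": ["TF_mid1", "TF_mid2"],
    "TF_mid1": ["geneA", "geneB"],
    "TF_mid2": ["geneB", "geneC"],
}
levels = hierarchy_levels(toy_network)
```

Here the two middle-level regulators sit between the top regulator and the target genes, which is the position the abstract refers to as "middle-managers".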

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

The role of disorder in interaction networks: a structural analysis.
PM Kim, A Sboner, Y Xia, M Gerstein (2008) Mol Syst Biol 4: 179.

Positive selection at the protein network periphery: evaluation in terms of
structural constraints and cellular context.
PM Kim, JO Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Wednesday, April 9, 2008

abstract for talk at Center For Computational Biology & Bioinformatics at Indiana U. on 28-Apr 2008 [I:IUPUI]

Title: Human Genome Annotation

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. At the end I will bring these together, trying
to assess the transcriptional activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- e.g., the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet 23: 219-24.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of regulatory
elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein
(2007) Genome Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE regions:
associating transcription with known and novel loci.
JS Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U Nagalakshmi, J
Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P Miller, M Snyder, MB Gerstein
(2007) Genome Res 17: 732-45.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O Emanuelsson, ZD
Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

Systematic prediction and validation of breakpoints associated with copy-number
variants in the human genome.
JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM
Weissman, M Snyder, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 10110-5.

Wednesday, February 27, 2008

abstract for talk at the Joint CMU-Pitt Ph.D. Program in Computational Biology on March 21, 2008 [I:CMU]

###
### Mark Gerstein, PhD
### Albert Williams Professor of
### Biomedical Informatics,
### Molecular Biophysics & Biochemistry,
### Computer Science
### Yale University
###

http://bioinfo.mbb.yale.edu
###
### Mailing Address: MB&B, PO Box 208114
### New Haven, CT 06520-8114 USA
### For deliveries/fedex: Bass 432A, 266 Whitney Ave.
### Phone 203 432-6105, e-mail Mark.Gerstein@Yale.edu
### Main Fax 360 838-7861 [Dept. Fax 203 432-5175]
###

Title: Human Genome Annotation

A central problem for 21st century science will be the analysis and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils) in the genome. I will discuss a comprehensive pseudogene
identification pipeline and storage database we have built. This has
enabled us to identify >10K pseudogenes in the human and mouse
genomes and analyze their distribution with respect to age, protein
family, and chromosomal location. One interesting finding is the large
number of ribosomal pseudogenes in the human genome, with 80
functional ribosomal proteins giving rise to ~2,000 ribosomal protein
pseudogenes.

I will try to inter-relate our studies on pseudogenes with those on
tiling arrays, which enable one to comprehensively probe the activity
of intergenic regions. At the end I will bring these together, trying
to assess the transcriptional activity of pseudogenes.

Throughout I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation and
tiling arrays -- e.g., the construction of annotation pipelines,
developing algorithms for optimal tiling, and refining approaches for
scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Millions of years of evolution preserved: a comprehensive catalog of
the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

The ambiguous boundary between genes and pseudogenes: the dead rise up, or do they?
D Zheng, MB Gerstein (2007) Trends Genet 23: 219-24.

Pseudogene.org: a comprehensive database and comparison platform for pseudogene
annotation.
JE Karro, Y Yan, D Zheng, Z Zhang, N Carriero, P Cayting, P Harrison, M
Gerstein (2007) Nucleic Acids Res 35: D55-60.

A computational approach for identifying pseudogenes in the ENCODE regions.
D Zheng, MB Gerstein (2006) Genome Biol 7 Suppl 1: S13.1-10.

The real life of pseudogenes.
M Gerstein, D Zheng (2006) Sci Am 295: 48-55.

PseudoPipe: an automated pseudogene identification pipeline.
Z Zhang, N Carriero, D Zheng, J Karro, PM Harrison, M Gerstein (2006)
Bioinformatics 22: 1437-9.

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo, J
Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of regulatory
elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB Gerstein
(2007) Genome Res 17: 787-97.

The DART classification of unannotated transcription within the ENCODE regions:
associating transcription with known and novel loci.
JS Rozowsky, D Newburger, F Sayward, J Wu, G Jordan, JO Korbel, U Nagalakshmi, J
Yang, D Zheng, R Guigo, TR Gingeras, S Weissman, P Miller, M Snyder, MB Gerstein
(2007) Genome Res 17: 732-45.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O Emanuelsson, ZD
Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

Systematic prediction and validation of breakpoints associated with copy-number
variants in the human genome.
JO Korbel, AE Urban, F Grubert, J Du, TE Royce, P Starr, G Zhong, BS Emanuel, SM
Weissman, M Snyder, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 10110-5.

Tuesday, February 26, 2008

Abstract for talk [I:USHUPO] at US HUPO on 19-Feb-08

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Positive Selection at the Protein Network Periphery: Evaluation in
Terms of Structural Constraints and Cellular Context. Philip M. Kim,
Jan O. Korbel and Mark B. Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Saturday, February 9, 2008

Abstract for talk [I:UPENN08] at UPenn on 22 Feb. 2008

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with topics in proteomics, in particular
predicting protein function on a genomic scale. We approach this
through the prediction and analysis of biological networks, focusing
on protein-protein interaction and transcription-factor-target ones. I
will describe how these networks can be determined through integration
of many genomic features and how they can be analyzed in terms of
various simple topological statistics. In particular, I will discuss a
number of specific analyses: (1) Integrating gene expression data with
the regulatory network illuminates transient hubs; (2) Integration of
the protein interaction network with 3D molecular structures reveals
different types of hubs, depending on the number of interfaces
involved in interactions (one or many); (3) Analysis of betweenness in
biological networks reveals that this quantity is more strongly
correlated with essentiality than degree is; (4) Analysis of the structure of
the regulatory network shows that it has a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Development
of useful web-based tools for the analysis of networks, TopNet and
tYNA.

http://bioinfo.mbb.yale.edu
http://topnet.gersteinlab.org

TopNet: a tool for comparing biological sub-networks, correlating
protein properties with topological statistics. H Yu, X Zhu, D
Greenbaum, J Karro, M Gerstein (2004) Nucleic Acids Res 32: 328-37.

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Annotation transfer between genomes: protein-protein interologs and
protein-DNA regulogs. H Yu, NM Luscombe, HX Lu, X Zhu, Y Xia, JD Han,
N Bertin, S Chung, M Vidal, M Gerstein (2004) Genome Res 14: 1107-18.

Integrated prediction of the helical membrane protein interactome in
yeast. Y Xia, LJ Lu, M Gerstein (2006) J Mol Biol 357: 339-49.

Relating three-dimensional structures to protein networks provides
evolutionary insights. PM Kim, LJ Lu, Y Xia, MB Gerstein (2006)
Science 314: 1938-41.

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Positive Selection at the Protein Network Periphery: Evaluation in
Terms of Structural Constraints and Cellular Context. Philip M. Kim,
Jan O. Korbel and Mark B. Gerstein PNAS (in press)

The importance of bottlenecks in protein networks: correlation with
gene essentiality and expression dynamics. H Yu, PM Kim, E Sprecher,
V Trifonov, M Gerstein (2007) PLoS Comput Biol 3: e59.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.