Monday, November 9, 2009

abstract for talk at LMB on 1-Dec-2009 [I:LMB]

TITLE:

Human Genome Annotation

Mark Gerstein, Yale University

ABSTRACT:

A central problem for 21st-century science is annotating the human genome and making this annotation
useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the
genome that does not code for canonical genes, concentrating on intergenic features such as
structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs
(ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs)
by processing data from next-generation sequencing experiments. I will further explain how we cluster
together groups of sites to create larger annotations. Next, I will discuss a comprehensive
pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome
and analyze their distribution with respect to age, protein family, and chromosomal location.
Throughout, I will try to introduce some of the computational algorithms and approaches that are
required for genome annotation. Much of this work has been carried out in the framework of the
ENCODE, modENCODE, and 1000 Genomes projects.
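Clustering nearby sites into larger annotations can be illustrated with a simple interval-merging pass. This is a minimal sketch and not the clustering procedure used in the work described here. It assumes that sites arrive as hypothetical (chromosome, start, end) tuples in half-open coordinates. It also assumes that any two sites separated by at most `max_gap` bases belong to the same cluster.

```python
from typing import List, Tuple

# Hypothetical input: (chromosome, start, end), 0-based half-open coordinates.
Site = Tuple[str, int, int]
# Output: (chromosome, start, end, number of merged sites).
Cluster = Tuple[str, int, int, int]


def cluster_sites(sites: List[Site], max_gap: int) -> List[Cluster]:
    """Merge sites on the same chromosome separated by at most max_gap bases."""
    clusters: List[Cluster] = []
    for chrom, start, end in sorted(sites):
        if clusters:
            c_chrom, c_start, c_end, n = clusters[-1]
            if c_chrom == chrom and start - c_end <= max_gap:
                # Extend the current cluster to cover this site.
                clusters[-1] = (c_chrom, c_start, max(c_end, end), n + 1)
                continue
        # Otherwise the site starts a new cluster.
        clusters.append((chrom, start, end, 1))
    return clusters


if __name__ == "__main__":
    sites = [("chr1", 100, 150), ("chr1", 180, 220),
             ("chr1", 1000, 1050), ("chr2", 120, 160)]
    for cluster in cluster_sites(sites, max_gap=50):
        print(cluster)
```

With `max_gap=50`, the first two chr1 sites are 30 bp apart and merge into one cluster. The distant chr1 site and the chr2 site each stay separate.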


URLs:

http://pseudogene.org
http://GenomeTECH.Gersteinlab.org

RELEVANT PAPERS:

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.

Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of
structural variants.
J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009) PLoS Comput Biol 5: e1000432.

Personal phenotypes to go with personal genomes.
M Snyder, S Weissman, M Gerstein (2009) Mol Syst Biol 5: 273.

PEMer: a computational framework with simulation-based error models for inferring genomic structural
variants from massive paired-end sequencing data.
JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder, MB Gerstein (2009) Genome Biol
10: R23.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

Sunday, October 18, 2009

abstract for talk at Wellcome Trust Functional Genomics & Systems Biology Workshop on 30-Nov-2009 [I:WTSYSBIO]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target networks. I will describe
how these networks can be determined through integration of many
genomic features and how they can be analyzed in terms of various
topological statistics. In particular, I will discuss a number of
recent analyses: (1) Improving the prediction of molecular networks
through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to
act as biosensors; (3) Showing how integrating gene expression data
with regulatory networks identifies transient hubs for characterizing
proteins of unknown function; (4) Analyzing the structure of the
regulatory network, which reveals a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Showing that
most human variation occurs on the periphery of the protein
interaction network; and (6) Developing useful web-based tools for the
analysis of networks (TopNet and tYNA).
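Two of the simplest topological statistics mentioned above are degree and betweenness centrality. Degree picks out hubs. Betweenness centrality picks out bottlenecks, which are nodes that lie on many shortest paths. The sketch below computes both statistics for a small undirected network using Brandes' algorithm. The toy network and the `degree`/`betweenness` helper names are hypothetical, and this is not code from TopNet or tYNA.

```python
from collections import deque
from typing import Dict, List

# Hypothetical representation: adjacency lists for an undirected network,
# with every edge listed in both directions.
Graph = Dict[str, List[str]]


def degree(graph: Graph) -> Dict[str, int]:
    """Number of interaction partners per node; high-degree nodes are hubs."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def betweenness(graph: Graph) -> Dict[str, float]:
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized).

    Sums, for each node, the fraction of shortest paths between every other
    pair of nodes that pass through it; high values mark bottlenecks.
    """
    scores = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        order: List[str] = []
        preds: Dict[str, List[str]] = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from source.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {node: s / 2.0 for node, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined through node X.
    toy = {
        "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
        "X": ["C", "D"],
        "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
    }
    scores = betweenness(toy)
    print(degree(toy))
    print(max(scores, key=scores.get))  # X
```

In the toy network, node `X` has a lower degree than either triangle node it connects (`C` and `D`). However, `X` has the highest betweenness, because every path between the two triangles runs through it. This is the kind of low-degree bottleneck that degree alone would miss.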

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Monday, September 28, 2009

abstract for talk at VIB Workshop on Systems Proteomics on 8-Oct-2009 [I:VIB]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target networks. I will describe
how these networks can be determined through integration of many
genomic features and how they can be analyzed in terms of various
topological statistics. In particular, I will discuss a number of
recent analyses: (1) Improving the prediction of molecular networks
through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to
act as biosensors; (3) Showing how integrating gene expression data
with regulatory networks identifies transient hubs for characterizing
proteins of unknown function; (4) Analyzing the structure of the
regulatory network, which reveals a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Showing that
most human variation occurs on the periphery of the protein
interaction network; and (6) Developing useful web-based tools for the
analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Tuesday, September 15, 2009

abstract for talk at CAMDA on 5-Oct-2009 [I:CAMDA]

Title: Human Genome Annotation

A central problem for 21st-century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, copy-number variants (CNVs), and novel
transcribed regions in the genome. Much of this work has been carried
out in the framework of the ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome by processing data
from tiling-array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.
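The PeakSeq paper cited with this abstract scores ChIP-seq enrichment relative to a control. A simplified version of that kind of test asks whether the ChIP read count in a window is higher than the control predicts. The sketch below is hypothetical and is not PeakSeq's implementation. It assumes a precomputed depth ratio `scale`, meaning ChIP reads per control read. It then applies a one-sided binomial test in which, under the null, each read in the window is a ChIP read with probability scale/(1 + scale). PeakSeq's own control normalization and multiple-testing correction are omitted.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_pvalue(chip_reads: int, control_reads: int, scale: float) -> float:
    """One-sided p-value that a window is enriched in ChIP relative to control.

    scale: hypothetical precomputed ratio of ChIP to control sequencing depth.
    Under the null hypothesis the window's ChIP count is, on average, scale
    times its control count, so each of the n reads in the window is a ChIP
    read with probability scale / (1 + scale).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    n = chip_reads + control_reads
    p = scale / (1.0 + scale)
    return binomial_upper_tail(chip_reads, n, p)


if __name__ == "__main__":
    # Equal sequencing depth (scale = 1): 30 ChIP reads vs 10 control reads.
    print(enrichment_pvalue(30, 10, scale=1.0))
```

The sketch folds the depth ratio into the success probability instead of rescaling the control counts. Summing exact binomial terms keeps it short, but it is suited only to the modest counts of a single window. Counts above roughly a thousand reads would overflow the float conversion and would need a log-space version.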

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to interrelate our studies on
pseudogenes with those on transcribed regions. Finally, I will bring
these together to assess the transcriptional activity of
pseudogenes.

Throughout, I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation,
e.g., constructing annotation pipelines, developing algorithms for
optimal tiling, and refining approaches for scoring microarrays.
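One approach used in tiling-array analysis is to score a region by summarizing probe log-ratios over a sliding genomic window. The summary uses a robust statistic, such as the pseudomedian (Hodges-Lehmann estimator), so that a single outlying probe does not dominate. The sketch below is generic and is not the lab's scoring code. Its inputs are assumed: probe positions in bp, matching log-ratios, and a hypothetical `half_width` parameter. It recomputes every window from scratch, which is quadratic but easy to follow.

```python
from statistics import median
from typing import List


def pseudomedian(values: List[float]) -> float:
    """Hodges-Lehmann estimator: median of all Walsh averages (x_i + x_j) / 2, i <= j."""
    walsh = [(values[i] + values[j]) / 2.0
             for i in range(len(values))
             for j in range(i, len(values))]
    return median(walsh)


def window_scores(positions: List[int], log_ratios: List[float],
                  half_width: int) -> List[float]:
    """Score each probe by the pseudomedian of all probes within +/- half_width bp."""
    scores = []
    for center in positions:
        # The window always contains the center probe, so it is never empty.
        window = [r for pos, r in zip(positions, log_ratios)
                  if abs(pos - center) <= half_width]
        scores.append(pseudomedian(window))
    return scores


if __name__ == "__main__":
    # Hypothetical probes every 50 bp; the last three show elevated signal.
    positions = [0, 50, 100, 150, 200]
    log_ratios = [0.1, 0.0, 3.0, 2.8, 3.2]
    print(window_scores(positions, log_ratios, half_width=60))
```

As an illustration of the robustness, the pseudomedian of [1, 1, 1, 100] is 1.0, whereas the mean is 25.75. This is why a robust statistic is preferred for noisy probes.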

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O
Emanuelsson, ZD Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.


Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Sunday, September 13, 2009

abstract for talk at Laufer Center Inaugural Symposium on 25-Sep-2009 [I:STONYBROOK]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target networks. I will describe
how these networks can be determined through integration of many
genomic features and how they can be analyzed in terms of various
topological statistics. In particular, I will discuss a number of
recent analyses: (1) Improving the prediction of molecular networks
through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to
act as biosensors; (3) Showing how integrating gene expression data
with regulatory networks identifies transient hubs for characterizing
proteins of unknown function; (4) Analyzing the structure of the
regulatory network, which reveals a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Showing that
most human variation occurs on the periphery of the protein
interaction network; and (6) Developing useful web-based tools for the
analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Tuesday, July 14, 2009

abstract for talk at Mathematical Biosciences Institute on 14-Sep-2009 [I:MBINETS]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target networks. I will describe
how these networks can be determined through integration of many
genomic features and how they can be analyzed in terms of various
topological statistics. In particular, I will discuss a number of
recent analyses: (1) Improving the prediction of molecular networks
through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to
act as biosensors; (3) Showing how integrating gene expression data
with regulatory networks identifies transient hubs for characterizing
proteins of unknown function; (4) Analyzing the structure of the
regulatory network, which reveals a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Showing that
most human variation occurs on the periphery of the protein
interaction network; and (6) Developing useful web-based tools for the
analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Saturday, May 16, 2009

abstract for talk at SRI on 21-May-2009 [I:SRI]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target networks. I will describe
how these networks can be determined through integration of many
genomic features and how they can be analyzed in terms of various
topological statistics. In particular, I will discuss a number of
recent analyses: (1) Improving the prediction of molecular networks
through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to
act as biosensors; (3) Showing how integrating gene expression data
with regulatory networks identifies transient hubs for characterizing
proteins of unknown function; (4) Analyzing the structure of the
regulatory network, which reveals a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Showing that
most human variation occurs on the periphery of the protein
interaction network; and (6) Developing useful web-based tools for the
analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

Monday, May 11, 2009

Re: abstract for talk at 3rd Summit on Systems Biology on 18-Jun-2009 [I:3RDSUMMIT]

Thanks a lot, Mark!
Zhongming


abstract for talk at 3rd Summit on Systems Biology on 18-Jun-2009 [I:3RDSUMMIT]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target networks. I will describe
how these networks can be determined through integration of many
genomic features and how they can be analyzed in terms of various
topological statistics. In particular, I will discuss a number of
recent analyses: (1) Improving the prediction of molecular networks
through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to
act as biosensors; (3) Showing how integrating gene expression data
with regulatory networks identifies transient hubs for characterizing
proteins of unknown function; (4) Analyzing the structure of the
regulatory network, which reveals a hierarchical layout with the
"middle-managers" acting as information bottlenecks; (5) Showing that
most human variation occurs on the periphery of the protein
interaction network; and (6) Developing useful web-based tools for the
analysis of networks (TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.

abstract for talk at UCSC on 22-May-2009 [I:UCSC]

Title: Human Genome Annotation

A central problem for 21st-century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, copy-number variants (CNVs), and novel
transcribed regions in the genome. Much of this work has been carried
out in the framework of the ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome by processing data
from tiling-array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to interrelate our studies on
pseudogenes with those on transcribed regions. Finally, I will bring
these together to assess the transcriptional activity of
pseudogenes.

Throughout, I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation,
e.g., constructing annotation pipelines, developing algorithms for
optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O
Emanuelsson, ZD Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.


Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Monday, February 23, 2009

abstract for talk at Sarkar Lecture in Toronto on 29-Apr-2009 [I:SARKAR]

Title: Human Genome Annotation

A central problem for 21st-century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, copy-number variants (CNVs), and novel
transcribed regions in the genome. Much of this work has been carried
out in the framework of the ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome by processing data
from tiling-array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to interrelate our studies on
pseudogenes with those on transcribed regions. Finally, I will bring
these together to assess the transcriptional activity of
pseudogenes.

Throughout, I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation,
e.g., constructing annotation pipelines, developing algorithms for
optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O
Emanuelsson, ZD Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.


Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Sunday, February 8, 2009

abstract for talk at CSHL on 29-Apr-2009 [I:CSHL]

Title: Human Genome Annotation

A central problem for 21st-century science will be the annotation and
understanding of the human genome. My talk will be concerned with
topics within this area, in particular annotating pseudogenes (protein
fossils), binding sites, copy-number variants (CNVs), and novel
transcribed regions in the genome. Much of this work has been carried
out in the framework of the ENCODE and modENCODE projects.

In particular, I will discuss how we identify regulatory regions and
novel, non-genic transcribed regions in the genome based on processing
of tiling array and next-generation sequencing experiments. I will
further discuss how we cluster together groups of binding sites and
novel transcribed regions.

Next, I will discuss a comprehensive pseudogene identification
pipeline and storage database we have built. This has enabled us to
identify >10K pseudogenes in the human and mouse genomes and analyze
their distribution with respect to age, protein family, and
chromosomal location. I will try to inter-relate our studies on
pseudogenes with those on transcribed regions. At the end I will bring
these together, trying to assess the transcriptional activity of
pseudogenes.

Throughout, I will try to introduce some of the computational
algorithms and approaches that are required for genome annotation,
e.g., the construction of annotation pipelines, developing algorithms
for optimal tiling, and refining approaches for scoring microarrays.

http://bioinfo.mbb.yale.edu
http://pseudogene.org
http://tiling.gersteinlab.org

Toward a universal microarray: prediction of gene expression through
nearest-neighbor probe sequence identification.
TE Royce, JS Rozowsky, MB Gerstein (2007) Nucleic Acids Res 35: e99.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

What is a gene, post-ENCODE? History and updated definition.
MB Gerstein, C Bruce, JS Rozowsky, D Zheng, J Du, JO Korbel, O
Emanuelsson, ZD Zhang, S Weissman, M Snyder (2007) Genome Res 17: 669-81.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.


Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Sunday, February 1, 2009

Abstract 305094 for talk at JSM 2009 3-Aug-09 [I:JSM]

Abstract Number 305094 has been submitted.


Abstract Information
*Abstract Type:* Topic Contributed
*Sub Type:* Papers
*Sponsor:* Biometrics Section

*Title:* Understanding Protein Function on a Genome-scale using Networks

*Abstract:* My talk will be concerned with understanding protein
function on a genomic scale. We approach this through the prediction and
analysis of biological networks, focusing on protein-protein interaction
and transcription-factor-target ones. I will describe how these networks
can be determined through integration of genomic features and how they
can be analyzed in terms of various topological statistics. In
particular, I will discuss: (1) Improving the prediction of molecular
networks through systematic training-set expansion; (2) Showing how the
analysis of pathways across environments potentially allows them to act
as biosensors; and (3) Analyzing the structure of regulatory networks shows
they have hierarchical layouts with "middle-managers" acting as
information bottlenecks. [REFS: K Yip & M Gerstein ('09) Bioinformatics
25:243; T Gianoulis et al ('09) PNAS 106:1374; Networks.GersteinLab.org]

*Key words:* bioinformatics, network, training set, metagenomics, cca,
integration

Friday, January 16, 2009

Re: abstract for talk at National Academy of Engineering meeting at Columbia on 14-Apr-2009 [I:NAECU]

Mark: I am impressed by your research depth and prowess. For the NAE
audience, communication is most important, not necessarily the details
of a research presentation. Most of the audience are quite senior, and
you will need to reach them and capture their attention.

Looking forward to meeting you in April. For your travel expenses,
including an overnight stay in Manhattan if you prefer, please let my
assistant Ms. Paulette Louissaint know of your needs. Keep all expense
receipts. If you are staying overnight, we can plan for a dinner
together and arrange your accommodations. The details of the day's
activities will be formulated soon. Thank you again for sharing your
time and expertise. Regards, Van Mow



Thursday, January 15, 2009

abstract for talk at National Academy of Engineering meeting at Columbia on 14-Apr-2009 [I:NAECU]

TITLE:

Understanding Protein Function on a Genome-scale using Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target ones. I will describe how
these networks can be determined through integration of many genomic
features and how they can be analyzed in terms of various topological
statistics. In particular, I will discuss a number of recent analyses:
(1) Improving the prediction of molecular networks through systematic
training-set expansion; (2) Showing how the analysis of pathways
across environments potentially allows them to act as biosensors; (3)
Showing how integrating gene expression data with regulatory networks
identifies transient hubs for characterizing proteins of unknown
function; (4) Analyzing the structure of the regulatory network shows
that it has a hierarchical layout with the "middle-managers" acting as
information bottlenecks; (5) Showing that most human variation occurs
on the periphery of the protein interaction network; and (6)
Developing useful web-based tools for the analysis of networks (TopNet
and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Genomic analysis of the hierarchical structure of regulatory networks.
H Yu, M Gerstein (2006) Proc Natl Acad Sci U S A 103: 14724-31.

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics (in press)

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS (in press)

Genomic analysis of regulatory network dynamics reveals large
topological changes. NM Luscombe, MM Babu, H Yu, M Snyder, SA
Teichmann, M Gerstein (2004) Nature 431: 308-12.