Friday, October 1, 2010

abstract for talk at UConn Chemical Engineering Seminar 5-Oct-2010 (i0uconnengr)

TITLE:

Genome Annotation

Mark Gerstein, Yale University

ABSTRACT:

A central problem for 21st century science is annotating the human
genome and making this annotation useful for the interpretation of
personal genomes.  My talk will focus on annotating the 99% of the
genome that does not code for canonical genes, concentrating on
intergenic features such as structural variants (SVs), pseudogenes
(protein fossils), binding sites, and novel transcribed RNAs (ncRNAs).
In particular, I will describe how we identify regulatory sites and
variable blocks (SVs) based on processing next-generation sequencing
experiments.  I will further explain how we cluster together groups of
sites to create larger annotations. Next, I will discuss a
comprehensive pseudogene identification pipeline, which has enabled us
to identify >10K pseudogenes in the genome and analyze their
distribution with respect to age, protein family, and chromosomal
location. Throughout, I will try to introduce some of the
computational algorithms and approaches that are required for genome
annotation. Much of this work has been carried out in the framework of
the ENCODE, modENCODE, and 1000 Genomes projects.


URLS:

http://pseudogene.org
http://GenomeTECH.Gersteinlab.org

RELEVANT PAPERS:

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.

Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Integrating sequencing technologies in personal genomics: optimal low
cost reconstruction of structural variants.
J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009) PLoS
Comput Biol 5: e1000432.

Personal phenotypes to go with personal genomes.
M Snyder, S Weissman, M Gerstein (2009) Mol Syst Biol 5: 273.

PEMer: a computational framework with simulation-based error models
for inferring genomic structural variants from massive paired-end
sequencing data.
JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder,
MB Gerstein (2009) Genome Biol 10: R23.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

Nucleotide-resolution analysis of structural variants using BreakSeq
and a breakpoint library.
HY Lam, XJ Mu, AM Stütz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO
Korbel, MB Gerstein (2010)
Nat Biotechnol 28: 47-55.

Thursday, June 24, 2010

abstract for talk at Royal Holloway [I:ROYALH]

TITLE:

Analysis of Molecular Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target ones. I will describe how
these networks can be determined through integration of many genomic
features and how they can be analyzed in terms of various topological
statistics. In particular, I will discuss a number of recent analyses:
(1) Improving the prediction of molecular networks through systematic
training-set expansion; (2) Showing how the analysis of pathways
across environments potentially allows them to act as biosensors; (3a)
Showing that the regulatory network has a hierarchical layout with the
"middle managers" acting as information bottlenecks; (3b) Showing that
these middle managers tend to be arranged in various "partnership"
structures, giving the hierarchy a "democratic character"; (4) Showing
that most human variation occurs at the periphery of the protein
interaction network; (5) Comparing the topology and variation of the
regulatory network to the call graph of a computer operating system;
and (6) Developing useful web-based tools for the analysis of networks
(TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context:
Consistent Tendencies for Collaboration in the Middle Levels
N Bhardwaj et al. PNAS (2010)

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions.
KY Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I Letunic, T
Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein (2009)
PNAS

Comparing genomes to computer operating systems in terms of the
topology and evolution of their regulatory control networks.
KK Yan, G Fang, N Bhardwaj, RP Alexander, M Gerstein (2010) Proc Natl
Acad Sci U S A

Sunday, May 23, 2010

abstract for talk at ACM BCB 2010 [I:ACMBIOINFO]

TITLE:

Analysis of Molecular Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target ones. I will describe how
these networks can be determined through integration of many genomic
features and how they can be analyzed in terms of various topological
statistics. In particular, I will discuss a number of recent analyses:
(1) Improving the prediction of molecular networks through systematic
training-set expansion; (2) Showing how the analysis of pathways
across environments potentially allows them to act as biosensors; (3a)
Showing that the regulatory network has a hierarchical layout with the
"middle managers" acting as information bottlenecks; (3b) Showing that
these middle managers tend to be arranged in various "partnership"
structures, giving the hierarchy a "democratic character"; (4) Showing
that most human variation occurs at the periphery of the protein
interaction network; (5) Comparing the topology and variation of the
regulatory network to the call graph of a computer operating system;
and (6) Developing useful web-based tools for the analysis of networks
(TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context:
Consistent Tendencies for Collaboration in the Middle Levels
N Bhardwaj et al. PNAS (2010)

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions.
KY Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I Letunic, T
Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein (2009)
PNAS

Comparing genomes to computer operating systems in terms of the
topology and evolution of their regulatory control networks.
KK Yan, G Fang, N Bhardwaj, RP Alexander, M Gerstein (2010) Proc Natl
Acad Sci U S A

abstract for talk at OCCBIO 2010 [i0OCCBIO]

TITLE:

Analysis of Molecular Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target ones. I will describe how
these networks can be determined through integration of many genomic
features and how they can be analyzed in terms of various topological
statistics. In particular, I will discuss a number of recent analyses:
(1) Improving the prediction of molecular networks through systematic
training-set expansion; (2) Showing how the analysis of pathways
across environments potentially allows them to act as biosensors; (3a)
Showing that the regulatory network has a hierarchical layout with the
"middle managers" acting as information bottlenecks; (3b) Showing that
these middle managers tend to be arranged in various "partnership"
structures, giving the hierarchy a "democratic character"; (4) Showing
that most human variation occurs at the periphery of the protein
interaction network; (5) Comparing the topology and variation of the
regulatory network to the call graph of a computer operating system;
and (6) Developing useful web-based tools for the analysis of networks
(TopNet and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context:
Consistent Tendencies for Collaboration in the Middle Levels
N Bhardwaj et al. PNAS (2010)

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions.
KY Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I Letunic, T
Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein (2009)
PNAS

Comparing genomes to computer operating systems in terms of the
topology and evolution of their regulatory control networks.
KK Yan, G Fang, N Bhardwaj, RP Alexander, M Gerstein (2010) Proc Natl
Acad Sci U S A

Saturday, March 27, 2010

abstract for talk at NYAS symp. A Look at the Tools and Comparative Approaches of Systems Biology 10 Jun 2010 [i0nysysbio]

TITLE:

Analysis of Molecular Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target ones. I will describe how
these networks can be determined through integration of many genomic
features and how they can be analyzed in terms of various topological
statistics. In particular, I will discuss a number of recent analyses:
(1) Improving the prediction of molecular networks through systematic
training-set expansion; (2) Showing how the analysis of pathways
across environments potentially allows them to act as biosensors; (3a)
Showing that the regulatory network has a hierarchical layout with the
"middle managers" acting as information bottlenecks; (3b) Showing that
these middle managers tend to be arranged in various "partnership"
structures, giving the hierarchy a "democratic character"; (4) Showing
that most human variation occurs on the periphery of the protein
interaction network; and (5)
Developing useful web-based tools for the analysis of networks (TopNet
and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context:
Consistent Tendencies for Collaboration in the Middle Levels
N Bhardwaj et al. PNAS (2010, in press)

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS

abstract for talk at Brown Appl. Math 9 apr 2010 [I:BROWNMATH]

TITLE:

Analysis of Molecular Networks

Mark Gerstein

Yale University

My talk will be concerned with understanding protein function on a
genomic scale. My lab approaches this through the prediction and
analysis of biological networks, focusing on protein-protein
interaction and transcription-factor-target ones. I will describe how
these networks can be determined through integration of many genomic
features and how they can be analyzed in terms of various topological
statistics. In particular, I will discuss a number of recent analyses:
(1) Improving the prediction of molecular networks through systematic
training-set expansion; (2) Showing how the analysis of pathways
across environments potentially allows them to act as biosensors; (3a)
Showing that the regulatory network has a hierarchical layout with the
"middle managers" acting as information bottlenecks; (3b) Showing that
these middle managers tend to be arranged in various "partnership"
structures, giving the hierarchy a "democratic character"; (4) Showing
that most human variation occurs on the periphery of the protein
interaction network; and (5)
Developing useful web-based tools for the analysis of networks (TopNet
and tYNA).

http://networks.gersteinlab.org
http://topnet.gersteinlab.org

The tYNA platform for comparative interactomics: a web tool for
managing, comparing and mining multiple networks. KY Yip, H Yu, PM
Kim, M Schultz, M Gerstein (2006) Bioinformatics 22: 2968-70.

Analysis of Diverse Regulatory Networks in a Hierarchical Context:
Consistent Tendencies for Collaboration in the Middle Levels
N Bhardwaj et al. PNAS (2010, in press)

Positive selection at the protein network periphery: evaluation in
terms of structural constraints and cellular context. PM Kim, JO
Korbel, MB Gerstein (2007) Proc Natl Acad Sci U S A 104: 20274-9.

Training Set Expansion: An Approach to Improving the Reconstruction of
Biological Networks from Limited and Uneven Reliable Interactions. KY
Yip, M Gerstein (2008) Bioinformatics

Quantifying environmental adaptation of metabolic pathways in
metagenomics. T Gianoulis, J Raes, P Patel, R Bjornson, J Korbel, I
Letunic, T Yamada, A Paccanaro, L Jensen, M Snyder, P Bork, M Gerstein
(2009) PNAS

Friday, March 12, 2010

abstract for talk at 6th International Symposium on Bioinformatics Research and Applications 23-May-2010 [i0ISBRA]

TITLE:

Human Genome Annotation

Mark Gerstein, Yale University

ABSTRACT:

A central problem for 21st century science is annotating the human
genome and making this annotation useful for the interpretation of
personal genomes.  My talk will focus on annotating the 99% of the
genome that does not code for canonical genes, concentrating on
intergenic features such as structural variants (SVs), pseudogenes
(protein fossils), binding sites, and novel transcribed RNAs (ncRNAs).
In particular, I will describe how we identify regulatory sites and
variable blocks (SVs) based on processing next-generation sequencing
experiments.  I will further explain how we cluster together groups of
sites to create larger annotations. Next, I will discuss a
comprehensive pseudogene identification pipeline, which has enabled us
to identify >10K pseudogenes in the genome and analyze their
distribution with respect to age, protein family, and chromosomal
location. Throughout, I will try to introduce some of the
computational algorithms and approaches that are required for genome
annotation. Much of this work has been carried out in the framework of
the ENCODE, modENCODE, and 1000 Genomes projects.


URLS:

http://pseudogene.org
http://GenomeTECH.Gersteinlab.org

RELEVANT PAPERS:

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.

Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Integrating sequencing technologies in personal genomics: optimal low
cost reconstruction of structural variants.
J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009) PLoS
Comput Biol 5: e1000432.

Personal phenotypes to go with personal genomes.
M Snyder, S Weissman, M Gerstein (2009) Mol Syst Biol 5: 273.

PEMer: a computational framework with simulation-based error models
for inferring genomic structural variants from massive paired-end
sequencing data.
JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder,
MB Gerstein (2009) Genome Biol 10: R23.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

Nucleotide-resolution analysis of structural variants using BreakSeq
and a breakpoint library.
HY Lam, XJ Mu, AM Stütz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO
Korbel, MB Gerstein (2010)
Nat Biotechnol 28: 47-55.

abstract for talk at IBM 23-May-2010 [i0ISBRA]

TITLE:

Human Genome Annotation

Mark Gerstein, Yale University

ABSTRACT:

A central problem for 21st century science is annotating the human
genome and making this annotation useful for the interpretation of
personal genomes.  My talk will focus on annotating the 99% of the
genome that does not code for canonical genes, concentrating on
intergenic features such as structural variants (SVs), pseudogenes
(protein fossils), binding sites, and novel transcribed RNAs (ncRNAs).
In particular, I will describe how we identify regulatory sites and
variable blocks (SVs) based on processing next-generation sequencing
experiments.  I will further explain how we cluster together groups of
sites to create larger annotations. Next, I will discuss a
comprehensive pseudogene identification pipeline, which has enabled us
to identify >10K pseudogenes in the genome and analyze their
distribution with respect to age, protein family, and chromosomal
location. Throughout, I will try to introduce some of the
computational algorithms and approaches that are required for genome
annotation. Much of this work has been carried out in the framework of
the ENCODE, modENCODE, and 1000 Genomes projects.


URLS:

http://pseudogene.org
http://GenomeTECH.Gersteinlab.org

RELEVANT PAPERS:

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.

Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Integrating sequencing technologies in personal genomics: optimal low
cost reconstruction of structural variants.
J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009) PLoS
Comput Biol 5: e1000432.

Personal phenotypes to go with personal genomes.
M Snyder, S Weissman, M Gerstein (2009) Mol Syst Biol 5: 273.

PEMer: a computational framework with simulation-based error models
for inferring genomic structural variants from massive paired-end
sequencing data.
JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder,
MB Gerstein (2009) Genome Biol 10: R23.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

Nucleotide-resolution analysis of structural variants using BreakSeq
and a breakpoint library.
HY Lam, XJ Mu, AM Stütz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO
Korbel, MB Gerstein (2010)
Nat Biotechnol 28: 47-55.

Sunday, January 31, 2010

abstract for talk at IBM 11-Feb-2010 [I:IBM]

TITLE:

Human Genome Annotation

Mark Gerstein, Yale University

ABSTRACT:

A central problem for 21st century science is annotating the human genome and making this annotation
useful for the interpretation of personal genomes. My talk will focus on annotating the 99% of the
genome that does not code for canonical genes, concentrating on intergenic features such as
structural variants (SVs), pseudogenes (protein fossils), binding sites, and novel transcribed RNAs
(ncRNAs). In particular, I will describe how we identify regulatory sites and variable blocks (SVs)
based on processing next-generation sequencing experiments. I will further explain how we cluster
together groups of sites to create larger annotations. Next, I will discuss a comprehensive
pseudogene identification pipeline, which has enabled us to identify >10K pseudogenes in the genome
and analyze their distribution with respect to age, protein family, and chromosomal location.
Throughout, I will try to introduce some of the computational algorithms and approaches that are
required for genome annotation. Much of this work has been carried out in the framework of the
ENCODE, modENCODE, and 1000 Genomes projects.


URLS:

http://pseudogene.org
http://GenomeTECH.Gersteinlab.org

RELEVANT PAPERS:

Comparative analysis of processed ribosomal protein pseudogenes in four
mammalian genomes.
S Balasubramanian, D Zheng, YJ Liu, G Fang, A Frankish, N Carriero, R
Robilotto, P Cayting, M Gerstein (2009) Genome Biol 10: R2.

PeakSeq enables systematic scoring of ChIP-seq experiments relative to
controls.
J Rozowsky, G Euskirchen, RK Auerbach, ZD Zhang, T Gibson, R
Bjornson, N Carriero, M Snyder, MB Gerstein (2009) Nat Biotechnol 27: 66-75.

MSB: A mean-shift-based approach for the analysis of structural
variation in the genome.
LY Wang, A Abyzov, JO Korbel, M Snyder, M Gerstein (2009) Genome
Res 19: 106-17.

Pseudofam: the pseudogene families database.
HY Lam, E Khurana, G Fang, P Cayting, N Carriero, KH Cheung, MB
Gerstein (2009) Nucleic Acids Res 37: D738-43.

Analysis of copy number variants and segmental duplications in the human
genome: Evidence for a change in the process of formation in recent
evolutionary history.
PM Kim, HY Lam, AE Urban, JO Korbel, J Affourtit, F Grubert, X
Chen, S Weissman, M Snyder, MB Gerstein (2008) Genome Res 18: 1865-74.

Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of
structural variants.
J Du, RD Bjornson, ZD Zhang, Y Kong, M Snyder, MB Gerstein (2009) PLoS Comput Biol 5: e1000432.

Personal phenotypes to go with personal genomes.
M Snyder, S Weissman, M Gerstein (2009) Mol Syst Biol 5: 273.

PEMer: a computational framework with simulation-based error models for inferring genomic structural
variants from massive paired-end sequencing data.
JO Korbel, A Abyzov, XJ Mu, N Carriero, P Cayting, Z Zhang, M Snyder, MB Gerstein (2009) Genome Biol
10: R23.

Pseudogenes in the ENCODE regions: consensus annotation, analysis of
transcription, and evolution.
D Zheng, A Frankish, R Baertsch, P Kapranov, A Reymond, SW Choo, Y Lu, F
Denoeud, SE Antonarakis, M Snyder, Y Ruan, CL Wei, TR Gingeras, R Guigo,
J Harrow, MB Gerstein (2007) Genome Res 17: 839-51.

Statistical analysis of the genomic distribution and correlation of
regulatory elements in the ENCODE regions.
ZD Zhang, A Paccanaro, Y Fu, S Weissman, Z Weng, J Chang, M Snyder, MB
Gerstein (2007) Genome Res 17: 787-97.

Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library.
HY Lam, XJ Mu, AM Stütz, A Tanzer, PD Cayting, M Snyder, PM Kim, JO Korbel, MB Gerstein (2010)
Nat Biotechnol 28: 47-55.