Xiaobei Zhao:
On Gene Regulation in Eukaryotes. Computational approaches to decipher transcriptional regulation at genetic and epigenetic level

Date: 01-02-2012    Supervisor: Albin Sandelin

The regulated transcription of genes is one of the most fundamental processes of life, and therefore one of the primary focus areas of molecular biology and biomedical research.

The transcriptional regulatory machinery governs how cis and trans signals are integrated to produce messenger RNA (mRNA) and, by extension, proteins. Recent advances in high-throughput, genome-wide profiling technology allow more sophisticated investigations of the genetic and epigenetic mechanisms that contribute to transcriptional regulation and coordinate spatial and temporal gene expression in eukaryotic cells. At the same time, these technologies require computational methods, since the number of data points from a single experiment is too large for manual analysis.

This thesis investigates and characterizes the mechanisms of transcriptional regulation at multiple levels using computational methods: sequence signals in core promoters, intensity of transcription initiation, chromatin signals and more. This has been done both in differentiated systems such as tissues and in dynamically changing cells, with the cell cycle as a major focus. The major studies described in this thesis are:

1) An analysis of how many types of mammalian promoters exist, as defined by the distribution of their transcription start sites. The results validated previous ad hoc and simplistic clustering methods but also highlighted one type of artifact that should be excluded before downstream analyses.

2) The development of a non-parametric distance measure between empirical distributions, with applications in high-throughput biology, together with downstream clustering and visualization tools.

3) A substantial extension of the JASPAR CORE transcription factor binding model database with new ChIP-based data, an enhanced graphical user interface, and a web-based curator's tool for data manipulation and user management.

4) Mapping of chromatin states across the genome in separated cell-cycle phases of primary cells, adding an important facet to both chromatin and cell-cycle biology.
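The abstract does not specify how the distance measure in study 2 is defined. As a hedged illustration of what a non-parametric distance between empirical distributions can look like, the sketch below uses the classical two-sample Kolmogorov-Smirnov statistic as a stand-in. It is not the measure developed in the thesis. The function names `ecdf` and `ks_distance` and the example transcription start site positions are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of observations <= x in an already-sorted sample."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic, max over x of |F_a(x) - F_b(x)|.

    Stand-in example only (not the thesis's measure). Both ECDFs are step
    functions that jump only at observed values, so the maximum difference
    is reached at one of the pooled observations.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)


# Hypothetical TSS positions (bp relative to a reference point) for a
# sharply peaked promoter and a broad promoter.
peaked = [0, 0, 1, 1, 1, 2]
broad = [-40, -25, -10, 0, 15, 30]
print(ks_distance(peaked, broad))  # 0.5
```

Because the statistic compares cumulative distributions rather than fitting a parametric shape, it makes no assumption about the form of either distribution. A matrix of such pairwise distances could then feed a standard clustering routine.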

Together, these studies illuminate how genetic and epigenetic mechanisms contribute to, and interact in, the regulation of transcription, with a focus on both method development and data exploration techniques.