
Stylianos Bakoulis:
On the origin of regulatory elements - Computational analyses of transposable elements in pluripotency and development

Date: 22-04-2022    Supervisor: Robin Andersson




One of the greatest challenges of modern biology is to decipher what underlies the astonishing diversity of organisms. While information on genome size and gene content is indicative of organismal complexity, it is the variation and complexity of the non-coding cis-regulatory DNA elements that control gene expression and give rise to profound phenotypic effects, morphologically distinct cell types and complex multicellular organisms.

The advent of whole-genome sequencing and rapid computational advances enabled us to construct a complete picture of the coding and non-coding parts of genomes. These technological milestones paved the way for the discovery that a vast proportion of all genomes is composed of transposable elements (TEs), a group of self-replicating mobile genetic sequences. TEs have been suggested as a rich source of regulatory elements and regulatory innovation: by integrating into host genomes throughout genome evolution, they can establish gene regulatory functions. However, it is unclear how systematically TEs contribute to the origins and activities of gene regulatory elements, to what extent TE-derived regulatory elements compare to those lacking TEs, and how specific or dynamic TE-derived regulatory and transcriptional activities contribute to diverse regulatory programs.

The goal of this thesis has been to characterize, systematically and genome-wide, transcription initiation landscapes and their association with TEs, in order to draw conclusions on the origins of regulatory elements and their contribution to the maintenance of regulatory landscapes across diverse cells, tissues or distinct stages of development. The thesis further describes recent advances in bioinformatics and genomic analysis used to extract information about individual TE integrants and how they intertwine with the non-coding genome.

Adopting an integrative data analysis approach where feasible allowed us to investigate, in an unbiased and systematic manner, the contribution of the wide repertoire of mouse TEs to the transcription initiation landscape, and thus to regulatory elements, in mouse embryonic stem cells (Manuscript I). The transcription-centric characterization of TE-derived regulatory elements in the context of divergent transcription initiation, RNA metabolism and transcription factor binding potential allowed us to demonstrate that divergently transcribed endogenous retroviral elements in particular are ideal candidates for spreading sequences with readily available regulatory activity, conferring regulatory innovation to cell-specific regulatory programs.

Leveraging statistical approaches to analyze continuous expression data, we studied the landscapes of TE-derived transcription initiation across mouse cerebellar development as a proxy for their dynamic regulatory activities (Manuscript II). The work suggests that several intronic and gene-distal enhancers originating from L1 LINE elements control distinct key stages of development and neurogenesis in the developing cerebellum by integrating the activity of key neurodevelopmental transcription factors.

The findings presented in this thesis shed light on the connection between the origins of regulatory elements and the properties and activities that maintain regulatory landscapes, and help us understand the mechanisms by which distinct TEs contribute to gene regulation.