Malte Thodberg:
Organism- and disease-specific atlases of transcription start sites using Cap Analysis of Gene Expression (CAGE)

Date: 14-03-2018    Supervisor: Albin Sandelin




Modern high-throughput assays have enabled the study of gene expression and its regulation on an unprecedented, genome-wide scale. Cap Analysis of Gene Expression (CAGE) is one of the major assays for studying transcriptional regulation. CAGE can detect and quantify both transcription start sites (TSSs) and enhancers independently of reference gene annotation. This means that a single CAGE experiment can assay both gene expression and regulatory activity in intergenic regions.

CAGE has greatly benefited from the work of the FANTOM consortium, which has produced a large number of CAGE datasets and developed new methods for analyzing them. Based on these advances, it is now possible to use CAGE to study transcriptional regulation in diseases and in additional organisms.

In this thesis, we used CAGE to analyze transcriptional regulation in the eukaryotic model organism fission yeast and in the chronic disease inflammatory bowel disease (IBD), and we developed general tools for analyzing CAGE data.

Using CAGE data from 15 samples of fission yeast growing under a wide range of conditions, we generated an accurate, genome-wide atlas of TSSs. We showed that this atlas improves and expands existing gene models, and that it can serve as an accurate starting point for analyzing many other types of genetic and epigenetic data. We identified TSSs that change expression between conditions, including cases where genes use alternative TSSs in a condition-dependent manner.

We showed how CAGE can be used in large clinical studies by analyzing a dataset composed of 94 colonic biopsies obtained from patients suffering from IBD. We generated an accurate IBD-specific atlas of TSSs and enhancers, and used this atlas to describe the biological processes that distinguish subtypes of IBD. We showed how enhancers can be used to interpret the regulatory function of intergenic regions, and that enhancers are highly enriched for genetic variants associated with IBD. Lastly, we used the IBD-specific atlas of TSSs and enhancers to select a small set of biomarkers that can be used to classify IBD patients in a clinical setting with high accuracy.

Finally, we developed a novel tool for the analysis of CAGE data to support future CAGE studies.