CpG Traffic Lights: functional positions that are involved in regulation in humans

Speaker: Dr. Yulia Medvedeva, Group leader at the Research Center of Biotechnology, Russian Academy of Sciences, Moscow

Host: Professor Albin Sandelin, Section for Computational and RNA Biology

Abstract
DNA methylation is probably the most investigated mechanism of gene expression regulation. Current technologies make it possible to study DNA methylation at single-base resolution; nevertheless, methylation levels are usually averaged over several dozen adjacent CpGs during downstream analysis. Yet experimental evidence shows that the methylation level of a single CpG dinucleotide can influence gene expression, for example in the case of the ESR1 gene.

In this work, we analyzed multiple human cell types for which both WGBS (whole-genome bisulfite sequencing) and RNA-seq data were available. CpG positions whose methylation levels correlate significantly with the expression of a neighboring gene across all samples were called CpG traffic lights (CpG TL). We observe that the average methylation of a promoter or a gene body is less frequently correlated with gene expression than the methylation levels of CpG TL, even after proper correction for multiple testing.
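
The definition of a CpG TL can be illustrated with a minimal sketch. The abstract does not state which correlation measure, significance test, or multiple-testing correction was used, so everything below is an assumption made for illustration: Spearman rank correlation, a two-sided permutation test, and Benjamini-Hochberg adjustment. All function names (ranks, spearman_perm_test, benjamini_hochberg, call_traffic_lights) and the toy data are hypothetical and are not taken from the speaker's work.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman_perm_test(meth, expr, n_perm=2000, rng=None):
    """Spearman rho with a two-sided permutation p-value (assumed test)."""
    if rng is None:
        rng = random.Random(0)
    rx, ry = ranks(meth), ranks(expr)
    rho = pearson(rx, ry)
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return rho, (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_traffic_lights(meth_by_cpg, expr, alpha=0.05):
    """Return {cpg_id: (rho, q)} for CpGs significant after correction."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    ids = list(meth_by_cpg)
    stats = [spearman_perm_test(meth_by_cpg[c], expr, rng=rng) for c in ids]
    qvals = benjamini_hochberg([p for _, p in stats])
    return {c: (rho, q) for c, (rho, _), q in zip(ids, stats, qvals) if q < alpha}


# Toy data: expression of one gene in 10 samples, methylation of two CpGs.
expr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
meth_by_cpg = {
    "cpg_a": [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
    "cpg_b": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.55],
}
traffic_lights = call_traffic_lights(meth_by_cpg, expr)
```

In this toy input, cpg_a is perfectly anti-correlated with expression and cpg_b is only weakly correlated, so only cpg_a should pass the threshold. A genome-wide analysis would need a much faster test than per-CpG permutation; the permutation test is used here only to keep the sketch free of external dependencies.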

Analyzing the features and potential biological functions of CpG TL, we noticed that they often show intermediate methylation levels. Hydroxymethylcytosine (5hmC), an intermediate product of active CpG demethylation, is also overrepresented at such positions, supporting the dynamic demethylation of CpG TL. In addition, causality analysis shows that positions where methylation is determined by expression are enriched in 5hmC, suggesting a positive feedback loop: transcription activates DNA demethylation.

We also observe that CpG TL are more conserved in both mammals and primates and are depleted of SNPs. We show that CpG TL are overrepresented in promoters of all known types (especially bivalent/poised) and in the corresponding chromatin states, peaking at the exact transcription start sites measured by CAGE (cap analysis of gene expression). We also observe strong enrichment in enhancers defined both by CAGE and by chromatin modifications. Among enhancer functional categories, those related to hematopoiesis are the most enriched in CpG TL. Following this line, we extend our analysis to variation among individuals, using data from normal T cells and acute myeloid leukemia (AML) data from TCGA. We believe that CpG TL provide new insight into the functional role of DNA methylation or, at the very least, can be used as markers of functional DNA.