Emil Jørsboe:
Probabilistic methods for the analysis of genome-wide data for admixed populations including association studies in Greenlandic Inuit

Date: 15-09-2019    Supervisor: Anders Albrechtsen



The work presented in this thesis covers aspects of population genetics, medical genetics and statistical genetics. For population genetics, a method is introduced for inferring population structure by estimating admixture proportions and performing principal component analysis. For medical genetics, association analysis and gene-environment interaction analysis are carried out in a Greenlandic cohort. For statistical genetics, methods for dealing with the uncertainty of low depth sequencing data are introduced.

The first manuscript is a published paper presenting a method for estimating admixture proportions and doing principal component analysis for a single low depth sequencing sample, taking genotype uncertainty into account. It estimates admixture proportions for a single sample using a reference panel with population frequencies. This method is shown to be more accurate than competing methods for inferring admixture proportions. It gives accurate estimates even for ancient DNA samples and very low depth sequencing samples.
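
To illustrate the idea of estimating admixture proportions for one low depth sample against a reference panel, the following is a minimal sketch and not the implementation from the manuscript. It assumes biallelic sites, Hardy-Weinberg genotype priors, known reference allele frequencies, and genotype likelihoods computed from simulated read counts; the function names `simulate_low_depth_sample` and `estimate_admixture` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_low_depth_sample(q, n_sites=20000, mean_depth=2.0, error=0.01, seed=0):
    """Simulate genotype likelihoods for one admixed sample (illustrative only)."""
    rng = np.random.default_rng(seed)
    k = len(q)
    # Hypothetical reference panel: one allele frequency per site and population.
    freqs = rng.uniform(0.05, 0.95, size=(n_sites, k))
    h = freqs @ q                          # individual allele frequency per site
    genotypes = rng.binomial(2, h)         # true (unobserved) genotypes
    depth = rng.poisson(mean_depth, size=n_sites)
    # Probability that a read carries the alternative allele, per genotype 0/1/2.
    p_alt = np.array([error, 0.5, 1.0 - error])
    alt = rng.binomial(depth, p_alt[genotypes])
    # Genotype likelihoods P(reads | g), up to a constant shared by all g.
    gl = p_alt[None, :] ** alt[:, None] * (1 - p_alt[None, :]) ** (depth - alt)[:, None]
    return gl, freqs


def estimate_admixture(gl, freqs, n_iter=200):
    """EM estimate of admixture proportions with fixed reference frequencies."""
    n_sites, k = freqs.shape
    q = np.full(k, 1.0 / k)                # start from equal proportions
    g = np.array([0.0, 1.0, 2.0])
    for _ in range(n_iter):
        h = freqs @ q
        # Hardy-Weinberg prior on the genotype given the current proportions.
        prior = np.column_stack([(1 - h) ** 2, 2 * h * (1 - h), h ** 2])
        post = gl * prior
        post /= post.sum(axis=1, keepdims=True)
        eg = post @ g                      # expected alternative allele count
        # Expected number of allele copies attributed to each population.
        contrib = (eg / h)[:, None] * freqs + ((2 - eg) / (1 - h))[:, None] * (1 - freqs)
        q = q * contrib.sum(axis=0) / (2 * n_sites)
    return q


if __name__ == "__main__":
    gl, freqs = simulate_low_depth_sample(np.array([0.7, 0.3]))
    print(estimate_admixture(gl, freqs))
```

Because the genotype is summed over rather than called, sites with zero or one read still contribute in proportion to the information they carry, which is the point of taking genotype uncertainty into account.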

The second manuscript is a manuscript in preparation that has been posted on bioRxiv. It introduces a method for doing association analysis with low depth sequencing data. It is a maximum likelihood based method that models the uncertainty on the genotypes, and it is implemented in a linear model framework. It is much faster and more accurate than competing methods, thereby making it possible to do association studies on large low depth sequencing data sets while taking genotype uncertainty into account and obtaining estimated effect sizes of the genotypes.
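
As a rough illustration of maximum likelihood association testing with uncertain genotypes, the sketch below treats the genotype as a latent variable in a linear model and fits it with an EM algorithm. It is an assumption-laden example rather than the method from the manuscript: it assumes a quantitative trait with normal errors, no covariates, an additive genotype effect, and per-individual genotype posterior probabilities as input. The name `em_association` is hypothetical.

```python
import numpy as np


def em_association(y, gp, n_iter=100):
    """Fit y = b0 + b1 * g + noise when g is only known through probabilities.

    y  : (n,) trait values
    gp : (n, 3) genotype probabilities for g = 0, 1, 2 given the sequencing data
    Returns (beta, sigma2) with beta = [intercept, genotype effect].
    """
    n = len(y)
    g = np.array([0.0, 1.0, 2.0])
    # Initialise from an ordinary regression on the expected genotype (dosage).
    design = np.column_stack([np.ones(n), gp @ g])
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    sigma2 = np.mean((y - design @ beta) ** 2)
    for _ in range(n_iter):
        # E-step: posterior weight of each genotype given the data and the trait.
        resid = y[:, None] - (beta[0] + beta[1] * g)[None, :]
        w = gp * np.exp(-0.5 * resid ** 2 / sigma2)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares over all (individual, genotype) pairs.
        a = np.array([[w.sum(), (w * g).sum()],
                      [(w * g).sum(), (w * g * g).sum()]])
        b = np.array([(w * y[:, None]).sum(), (w * g * y[:, None]).sum()])
        beta = np.linalg.solve(a, b)
        resid = y[:, None] - (beta[0] + beta[1] * g)[None, :]
        sigma2 = (w * resid ** 2).sum() / n
    return beta, sigma2
```

The second element of `beta` is the estimated effect size per copy of the allele. A test of association could then compare the maximised likelihood with that of a model where the genotype effect is fixed at zero.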

The third manuscript is a submitted paper presenting a study of a gene-environment interaction in Greenlanders. It finds that homozygous carriers of a common stop-gained TBC1D4 variant, which is associated with elevated levels of 2 h plasma glucose, can significantly reduce their 2 h plasma glucose levels through physical activity. Homozygous carriers of this stop-gained variant reduce their 2 h plasma glucose relatively more with increased physical activity than non-homozygous carriers do. This finding is validated in Tbc1d4 knockout mice.

The fourth manuscript is a manuscript in preparation. It is a study of a common LDLR missense variant in Greenlanders. The variant is shown to elevate levels of LDL cholesterol, total cholesterol and apolipoprotein B and to decrease levels of HDL cholesterol. Furthermore, carriers have an increased risk of ischemic heart disease and peripheral artery disease, and of having had a coronary operation.

These manuscripts provide methods for inferring population structure from ancient DNA or very low depth sequencing samples and for doing association analysis with low depth sequencing data. The two studies in Greenlanders provide insights into two genetic variants that have a health impact on the Greenlandic population, and they open up the potential for better prevention.