Xiaosen Guo:
Building high resolution genetic variation map for Mongolians

Date: 15-12-2016    Supervisor: Jun Wang & Karsten Kristiansen

As one of the representative populations of East Asia and a typical nomadic ethnic group, Mongolians played a pivotal role in human evolution, including the early peopling of the Americas (at least 10,000 years ago) and the recent shaping of the population genetic structure of Eurasians (around 1,000 years ago). Harsh environmental conditions and a characteristic lifestyle have resulted in an extremely high prevalence of several genetic diseases in Mongolians, such as alcohol dependency, obesity, Type 2 Diabetes (T2D) and diseases related to lipid metabolism. With the invention and wide application of next-generation sequencing technologies, the genomes of more and more human populations have been decoded and human population genomics has developed dramatically, including through several population studies in which we participated. We first initiated and carried out the 1000 Genomes Project (1000G), completing the sequencing of 2,504 genomes from 26 representative human populations and constructing the highest-resolution reference catalogue of human genetic variation so far, which has been used extensively in studies of human evolution and genetic diseases. We participated in the project to sequence 10,000 British genomes (the UK10K Project), built a reference panel for the British population, found several novel risk alleles associated with disease-related phenotypes, and explored the contribution of rare and low-frequency variants to human traits. We took part in the Danish genomes project, performed high-depth genome sequencing of 150 individuals from 50 trios, built a Danish-specific reference panel, and revealed the features of genetic variation and population genetic structure in Danes. We also participated in a study of the genetic mechanism of skin lightening in East Asians, which indicated that the pigmentation gene OCA2 plays an important role in the convergent skin lightening of East Asians during recent human evolution.
However, genomic research on Mongolians, although it attracts strong interest, still remains at the level of using Y-chromosome or mitochondrial genome data to trace paternal or maternal lineages, or of studying genetic diseases based on a small number of variants or partial genomic regions.

In this study, we first collected genomic DNA from a representative Mongolian male individual, performed high-coverage whole genome sequencing, and de novo assembled a high-quality Mongolian genome. We obtained a personal genetic variation catalogue for this individual containing 3.7 million single nucleotide polymorphisms (SNPs) and 0.76 million short insertions and deletions (indels). Functional analysis of the variants indicated that the individual carries a risk allele that may cause carnitine deficiency. Y-haplogroup analysis assigned the paternal lineage to clade D3a, one of the oldest lineages in East Asians and one that is most common in Tibeto-Burman populations. Finally, population genetic analyses gave a preliminary picture of the differing levels of gene flow between Mongolians and other human populations. In a further study, we collected a total of 175 samples from six typical Mongolian tribes or regions and carried out whole genome sequencing at an average coverage of 20X. We identified more than 16 million genetic variants and constructed the first high-quality reference panel for Mongolians. Comparative analyses showed that this panel gave the best imputation accuracy for the Mongolian population. Through analyses of phylogeny, genetic clustering and putative ancestry inference, we found that the genetic structure of the ethnic group is characterized by wide dispersal, high admixture and some degree of population stratification. In further inferences of demographic history and gene flow events, we found that different tribes have diverse population histories and observed frequent gene flow among the tribes and between Mongolians and other human populations. Among the admixture events, the degree of gene flow between Mongolians and Finns is significantly higher than that between Mongolians and other non-Han populations.
Further analysis showed that this high level of gene flow occurred between Finns and Siberians, Mongolians and other closely related ethnic groups. We finally integrated our Mongolian data with data from other global human populations and constructed a higher-resolution phylogenetic tree, especially for East Asians. The phylogenetic relationships show East Asian populations spreading from north to south, which supports the hypothesis of a southward dispersal after entry into East Asia. In the final study, we performed an association study of 28 previously reported T2D-related SNPs in 966 Mongolian samples, comprising 469 controls and 497 diagnosed T2D patients. Only 11 of these SNPs replicated a significant association with T2D in Mongolians. We also observed substantial differences in T2D risk allele frequency between the Mongolian sample and the 1000G Caucasian sample for a few SNPs, including rs6723108 in the gene TMEM163, whose risk allele is near fixation in the Mongolian samples. This study confirmed the genetic heterogeneity of T2D in Mongolians and will provide supporting data for the search for novel T2D causal variants in further studies of this ethnic group.

The assembly of a representative Mongolian genome, the construction of a reference variation catalogue, the inference of demographic history, the pilot study of T2D in this population, and the other human population genomics studies in which we participated have given us a new understanding of this ethnic group living in the north of East Asia. They lay a good foundation for further studies of the evolution and diseases of Mongolians, and will also facilitate evolutionary studies and personalized precision medicine in Chinese and other East Asian populations, of which Mongolians are an important part.