Evaluation of model fit of inferred admixture proportions

Biologisk Institut

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Evaluation of model fit of inferred admixture proportions. / Garcia-Erill, Genís; Albrechtsen, Anders.

In: Molecular Ecology Resources, Vol. 20, No. 4, 2020, p. 936-949.

Harvard

Garcia-Erill, G & Albrechtsen, A 2020, 'Evaluation of model fit of inferred admixture proportions', Molecular Ecology Resources, vol. 20, no. 4, pp. 936-949. https://doi.org/10.1111/1755-0998.13171

APA

Garcia-Erill, G., & Albrechtsen, A. (2020). Evaluation of model fit of inferred admixture proportions. Molecular Ecology Resources, 20(4), 936-949. https://doi.org/10.1111/1755-0998.13171

Vancouver

Garcia-Erill G, Albrechtsen A. Evaluation of model fit of inferred admixture proportions. Molecular Ecology Resources. 2020;20(4):936-949. https://doi.org/10.1111/1755-0998.13171

Author

Garcia-Erill, Genís ; Albrechtsen, Anders. / Evaluation of model fit of inferred admixture proportions. In: Molecular Ecology Resources. 2020 ; Vol. 20, No. 4. pp. 936-949.

BibTeX

@article{a06993346bcf403e98a48177f38efd34,

title = "Evaluation of model fit of inferred admixture proportions",

abstract = "Model based methods for genetic clustering of individuals such as those implemented in structure or ADMIXTURE allow to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals are a result of K homogeneous ancestral populations that are all well represented in the data, while another assumption is that no drift happened after the admixture event. The histories of many real world populations do not conform to that model, and in that case taking the inferred admixture proportions at face value might be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. In case of a bad fit, individuals with similar demographic histories have a positive correlation of their residuals. Using simulated and real data, we show how the method is able to detect a bad fit of inferred admixture proportions due to using an insufficient number of clusters K or to demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and non-discrete ancestral populations. We have implemented the method as an open source software that can be applied to both unphased genotypes and next generation sequencing data.",

author = "Gen{\'i}s Garcia-Erill and Anders Albrechtsen",

year = "2020",

doi = "10.1111/1755-0998.13171",

language = "English",

volume = "20",

pages = "936--949",

journal = "Molecular Ecology Resources",

issn = "1755-098X",

publisher = "Wiley-Blackwell",

number = "4",

}

RIS

TY - JOUR

T1 - Evaluation of model fit of inferred admixture proportions

AU - Garcia-Erill, Genís

AU - Albrechtsen, Anders

PY - 2020

Y1 - 2020

AB - Model based methods for genetic clustering of individuals such as those implemented in structure or ADMIXTURE allow to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals are a result of K homogeneous ancestral populations that are all well represented in the data, while another assumption is that no drift happened after the admixture event. The histories of many real world populations do not conform to that model, and in that case taking the inferred admixture proportions at face value might be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. In case of a bad fit, individuals with similar demographic histories have a positive correlation of their residuals. Using simulated and real data, we show how the method is able to detect a bad fit of inferred admixture proportions due to using an insufficient number of clusters K or to demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and non-discrete ancestral populations. We have implemented the method as an open source software that can be applied to both unphased genotypes and next generation sequencing data.

U2 - 10.1111/1755-0998.13171

DO - 10.1111/1755-0998.13171

M3 - Journal article

C2 - 32323416

VL - 20

SP - 936

EP - 949

JO - Molecular Ecology Resources

JF - Molecular Ecology Resources

SN - 1755-098X

IS - 4

ER -

ID: 240252048