A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data

Publication: Contribution to journal › Journal article › Peer-reviewed

Standard

A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data. / Moreno-Mayar, J Víctor; Korneliussen, Thorfinn Sand; Dalal, Jyoti; Renaud, Gabriel; Albrechtsen, Anders; Nielsen, Rasmus; Malaspinas, Anna-Sapfo.

In: Bioinformatics, Vol. 36, No. 3, 2020, pp. 828-841.

Harvard

Moreno-Mayar, JV, Korneliussen, TS, Dalal, J, Renaud, G, Albrechtsen, A, Nielsen, R & Malaspinas, A-S 2020, 'A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data', Bioinformatics, vol. 36, no. 3, pp. 828-841. https://doi.org/10.1093/bioinformatics/btz660

APA

Moreno-Mayar, J. V., Korneliussen, T. S., Dalal, J., Renaud, G., Albrechtsen, A., Nielsen, R., & Malaspinas, A-S. (2020). A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data. Bioinformatics, 36(3), 828-841. https://doi.org/10.1093/bioinformatics/btz660

Vancouver

Moreno-Mayar JV, Korneliussen TS, Dalal J, Renaud G, Albrechtsen A, Nielsen R et al. A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data. Bioinformatics. 2020;36(3):828-841. https://doi.org/10.1093/bioinformatics/btz660

Author

Moreno-Mayar, J Víctor ; Korneliussen, Thorfinn Sand ; Dalal, Jyoti ; Renaud, Gabriel ; Albrechtsen, Anders ; Nielsen, Rasmus ; Malaspinas, Anna-Sapfo. / A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data. In: Bioinformatics. 2020 ; Vol. 36, No. 3. pp. 828-841.

Bibtex

@article{8cc9fdfd46a74eaba387444803701f2a,
title = "A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data",
abstract = "MOTIVATION: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets. RESULTS: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e., when the contaminant and the target come from closely related populations or with increased error rates. With a running time below five minutes, our method is applicable to large scale aDNA genomic studies. AVAILABILITY: The method is implemented in C++ and R and is available in github.com/sapfo/contaminationX and popgen.dk/angsd.",
author = "Moreno-Mayar, {J V{\'i}ctor} and Korneliussen, {Thorfinn Sand} and Jyoti Dalal and Gabriel Renaud and Anders Albrechtsen and Rasmus Nielsen and Anna-Sapfo Malaspinas",
note = "{\textcopyright} The Author(s) (2019). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.",
year = "2020",
doi = "10.1093/bioinformatics/btz660",
language = "English",
volume = "36",
pages = "828--841",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "3",

}

RIS

TY - JOUR

T1 - A likelihood method for estimating present-day human contamination in ancient male samples using low-depth X-chromosome data

AU - Moreno-Mayar, J Víctor

AU - Korneliussen, Thorfinn Sand

AU - Dalal, Jyoti

AU - Renaud, Gabriel

AU - Albrechtsen, Anders

AU - Nielsen, Rasmus

AU - Malaspinas, Anna-Sapfo

N1 - © The Author(s) (2019). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

PY - 2020

Y1 - 2020

N2 - MOTIVATION: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets. RESULTS: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e., when the contaminant and the target come from closely related populations or with increased error rates. With a running time below five minutes, our method is applicable to large scale aDNA genomic studies. AVAILABILITY: The method is implemented in C++ and R and is available in github.com/sapfo/contaminationX and popgen.dk/angsd.

AB - MOTIVATION: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets. RESULTS: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e., when the contaminant and the target come from closely related populations or with increased error rates. With a running time below five minutes, our method is applicable to large scale aDNA genomic studies. AVAILABILITY: The method is implemented in C++ and R and is available in github.com/sapfo/contaminationX and popgen.dk/angsd.

U2 - 10.1093/bioinformatics/btz660

DO - 10.1093/bioinformatics/btz660

M3 - Journal article

C2 - 31504166

VL - 36

SP - 828

EP - 841

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 3

ER -

ID: 227429532