Efficient approaches for large-scale GWAS with genotype uncertainty

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Efficient approaches for large-scale GWAS with genotype uncertainty. / Jørsboe, Emil; Albrechtsen, Anders.

In: G3: Genes, Genomes, Genetics (Bethesda), Vol. 12, No. 1, jkab385, 2022.


Harvard

Jørsboe, E & Albrechtsen, A 2022, 'Efficient approaches for large-scale GWAS with genotype uncertainty', G3: Genes, Genomes, Genetics (Bethesda), vol. 12, no. 1, jkab385. https://doi.org/10.1093/g3journal/jkab385

APA

Jørsboe, E., & Albrechtsen, A. (2022). Efficient approaches for large-scale GWAS with genotype uncertainty. G3: Genes, Genomes, Genetics (Bethesda), 12(1), Article jkab385. https://doi.org/10.1093/g3journal/jkab385

Vancouver

Jørsboe E, Albrechtsen A. Efficient approaches for large-scale GWAS with genotype uncertainty. G3: Genes, Genomes, Genetics (Bethesda). 2022;12(1). jkab385. https://doi.org/10.1093/g3journal/jkab385

Author

Jørsboe, Emil ; Albrechtsen, Anders. / Efficient approaches for large-scale GWAS with genotype uncertainty. In: G3: Genes, Genomes, Genetics (Bethesda). 2022 ; Vol. 12, No. 1.

Bibtex

@article{f3cd6983188544bea609dd7b514237b6,
title = "Efficient approaches for large-scale GWAS with genotype uncertainty",
abstract = "Association studies using genetic data from SNP-chip-based imputation or low-depth sequencing data provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies applicable to such genetic data and investigate how using different priors when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso's latent model, models the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that, in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth is correlated with the phenotype, ANGSD-asso's latent model has higher statistical power and less bias than using dosages. When additional covariates are included in the linear model, ANGSD-asso's latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.",
keywords = "admixture, association mapping, case-control study, next-generation sequencing, quantitative traits, GENOME-WIDE ASSOCIATION, POPULATION STRATIFICATION, IMPUTATION, REGRESSION",
author = "Emil J{\o}rsboe and Anders Albrechtsen",
year = "2022",
doi = "10.1093/g3journal/jkab385",
language = "English",
volume = "12",
journal = "G3: Genes, Genomes, Genetics (Bethesda)",
issn = "2160-1836",
publisher = "Genetics Society of America",
number = "1",

}
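
The abstract contrasts association methods that use genotype probabilities with methods that use dosages, and notes that the probabilities depend on the allele-frequency prior. The following is a minimal illustration under stated assumptions, not ANGSD-asso's code: it assumes each individual's genotype likelihoods P(reads | G = g) for g = 0, 1, 2 are already available, and that the prior is Hardy-Weinberg proportions built from one frequency f, which could be the sample allele frequency or an individual allele frequency. The function names are hypothetical.

```python
def genotype_posteriors(likelihoods, freq):
    """Return P(G = g | reads) for g = 0, 1, 2 copies of the counted allele.

    likelihoods: (P(reads | G=0), P(reads | G=1), P(reads | G=2)).
    freq: frequency of the counted allele used for the prior. Whether this is
          the sample allele frequency or an individual-specific frequency is
          the choice discussed in the abstract; here it is just a number.
    """
    # Hardy-Weinberg prior (an assumption of this sketch)
    prior = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("likelihoods and prior give zero total probability")
    return [j / total for j in joint]


def dosage(posteriors):
    """Expected genotype E[G] = sum_g g * P(G = g), a value in [0, 2]."""
    return sum(g * p for g, p in enumerate(posteriors))


if __name__ == "__main__":
    # Hypothetical low-depth individual: the reads rule out G = 0 fairly well
    # but cannot separate the heterozygote from the homozygote.
    lik = (0.01, 0.5, 0.5)
    for label, f in (("prior f = 0.05", 0.05), ("prior f = 0.5", 0.5)):
        post = genotype_posteriors(lik, f)
        print(label, [round(p, 3) for p in post], round(dosage(post), 3))
```

With identical reads, the two priors yield different genotype probabilities and different dosages, which illustrates how the prior can shift results when allele frequencies vary between individuals, as in a structured population. A dosage keeps only the mean of each distribution, while a latent-genotype model uses all three probabilities.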

RIS

TY - JOUR

T1 - Efficient approaches for large-scale GWAS with genotype uncertainty

AU - Jørsboe, Emil

AU - Albrechtsen, Anders

PY - 2022

Y1 - 2022

N2 - Association studies using genetic data from SNP-chip-based imputation or low-depth sequencing data provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies applicable to such genetic data and investigate how using different priors when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso's latent model, models the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that, in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth is correlated with the phenotype, ANGSD-asso's latent model has higher statistical power and less bias than using dosages. When additional covariates are included in the linear model, ANGSD-asso's latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.

AB - Association studies using genetic data from SNP-chip-based imputation or low-depth sequencing data provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies applicable to such genetic data and investigate how using different priors when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso's latent model, models the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that, in a structured population, using the individual allele frequency prior gives better power than using the sample allele frequency prior. In scenarios where sequencing depth is correlated with the phenotype, ANGSD-asso's latent model has higher statistical power and less bias than using dosages. When additional covariates are included in the linear model, ANGSD-asso's latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.

KW - admixture

KW - association mapping

KW - case-control study

KW - next-generation sequencing

KW - quantitative traits

KW - GENOME-WIDE ASSOCIATION

KW - POPULATION STRATIFICATION

KW - IMPUTATION

KW - REGRESSION

U2 - 10.1093/g3journal/jkab385

DO - 10.1093/g3journal/jkab385

M3 - Journal article

C2 - 34865001

VL - 12

JO - G3: Genes, Genomes, Genetics (Bethesda)

JF - G3: Genes, Genomes, Genetics (Bethesda)

SN - 2160-1836

IS - 1

M1 - jkab385

ER -
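
The abstract describes the latent model as treating the unobserved genotype as a latent variable in a generalized linear model. For a quantitative trait, a standard way to fit such a model is an EM algorithm in which each individual's genotype probabilities act as prior weights that the phenotype then updates. The sketch below assumes a normal linear model with an intercept and a single genotype effect (no extra covariates), genotype probabilities that sum to 1 per individual, and hypothetical function names. It is an illustration of the technique, not the ANGSD-asso implementation, which may differ in parameterisation, convergence rules and testing.

```python
import math


def _log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def _e_step(y, post, alpha, beta, sigma):
    """Posterior genotype weights given the phenotypes, plus the log-likelihood."""
    weights, loglik = [], 0.0
    for yi, pi in zip(y, post):
        # log prior + log phenotype density for each genotype g = 0, 1, 2
        terms = [
            (math.log(p) + _log_normal_pdf(yi, alpha + beta * g, sigma))
            if p > 0.0 else -math.inf
            for g, p in enumerate(pi)
        ]
        m = max(terms)
        scaled = [math.exp(t - m) for t in terms]  # log-sum-exp for stability
        s = sum(scaled)
        loglik += m + math.log(s)
        weights.append([x / s for x in scaled])
    return weights, loglik


def fit_latent_linear(y, post, n_iter=500, tol=1e-10):
    """EM fit of y_i = alpha + beta * G_i + e_i with e_i ~ N(0, sigma^2),
    where G_i in {0, 1, 2} is unobserved and P(G_i = g) = post[i][g].

    Returns (alpha, beta, sigma, loglik), loglik evaluated at the returned values.
    """
    n = len(y)
    # Start at the null fit: intercept only, beta = 0.
    alpha, beta = sum(y) / n, 0.0
    sigma = math.sqrt(max(sum((v - alpha) ** 2 for v in y) / n, 1e-12))
    weights, loglik = _e_step(y, post, alpha, beta, sigma)
    for _ in range(n_iter):
        # M-step: weighted least squares over the three possible genotypes of
        # each individual; every individual's weights sum to 1.
        sx = sum(w[g] * g for w in weights for g in range(3))
        sxx = sum(w[g] * g * g for w in weights for g in range(3))
        sy = sum(y)
        sxy = sum(yi * (w[1] + 2.0 * w[2]) for yi, w in zip(y, weights))
        denom = n * sxx - sx * sx
        if denom <= 0.0:
            raise ValueError("no genotype variation; beta is not identifiable")
        beta = (n * sxy - sx * sy) / denom
        alpha = (sy - beta * sx) / n
        rss = sum(w[g] * (yi - alpha - beta * g) ** 2
                  for yi, w in zip(y, weights) for g in range(3))
        sigma = math.sqrt(max(rss / n, 1e-12))
        # E-step at the new parameters; stop once the log-likelihood settles.
        weights, new_ll = _e_step(y, post, alpha, beta, sigma)
        converged = abs(new_ll - loglik) < tol
        loglik = new_ll
        if converged:
            break
    return alpha, beta, sigma, loglik


def lrt_statistic(y, post):
    """2 * (logL_latent - logL_null); with beta = 0 the genotypes drop out."""
    n = len(y)
    mu = sum(y) / n
    sd = math.sqrt(max(sum((v - mu) ** 2 for v in y) / n, 1e-12))
    ll0 = sum(_log_normal_pdf(v, mu, sd) for v in y)
    return 2.0 * (fit_latent_linear(y, post)[3] - ll0)
```

When every genotype probability is 0 or 1, the weights never change and the fit reduces to ordinary least squares on the genotypes. With uncertain genotypes, the E-step lets each phenotype reweight that individual's genotypes, which a dosage regression with fixed E[G] cannot do. Under standard regularity conditions the statistic from `lrt_statistic` would be compared with a chi-squared distribution with one degree of freedom; covariates would enter as extra columns in the weighted least-squares step.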
