SNP detection for massively parallel whole-genome resequencing

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

SNP detection for massively parallel whole-genome resequencing. / Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong; Yang, Huanming; Wang, Jian; Kristiansen, Karsten; Wang, Jun.

In: Genome Research, Vol. 19, No. 6, 2009, p. 1124-32.

Harvard

Li, R, Li, Y, Fang, X, Yang, H, Wang, J, Kristiansen, K & Wang, J 2009, 'SNP detection for massively parallel whole-genome resequencing', Genome Research, vol. 19, no. 6, pp. 1124-32. https://doi.org/10.1101/gr.088013.108

APA

Li, R., Li, Y., Fang, X., Yang, H., Wang, J., Kristiansen, K., & Wang, J. (2009). SNP detection for massively parallel whole-genome resequencing. Genome Research, 19(6), 1124-32. https://doi.org/10.1101/gr.088013.108

Vancouver

Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K et al. SNP detection for massively parallel whole-genome resequencing. Genome Research. 2009;19(6):1124-32. https://doi.org/10.1101/gr.088013.108

Author

Li, Ruiqiang ; Li, Yingrui ; Fang, Xiaodong ; Yang, Huanming ; Wang, Jian ; Kristiansen, Karsten ; Wang, Jun. / SNP detection for massively parallel whole-genome resequencing. In: Genome Research. 2009 ; Vol. 19, No. 6. pp. 1124-32.

Bibtex

@article{727b58c050e611de87b8000ea68e967b,

title = "SNP detection for massively parallel whole-genome resequencing",

abstract = "Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.",

author = "Ruiqiang Li and Yingrui Li and Xiaodong Fang and Huanming Yang and Jian Wang and Karsten Kristiansen and Jun Wang",

year = "2009",

doi = "10.1101/gr.088013.108",

language = "English",

volume = "19",

pages = "1124--1132",

journal = "Genome Research",

issn = "1088-9051",

publisher = "Cold Spring Harbor Laboratory Press",

number = "6",

}

RIS

TY - JOUR

T1 - SNP detection for massively parallel whole-genome resequencing

AU - Li, Ruiqiang

AU - Li, Yingrui

AU - Fang, Xiaodong

AU - Yang, Huanming

AU - Wang, Jian

AU - Kristiansen, Karsten

AU - Wang, Jun

PY - 2009

Y1 - 2009

AB - Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.

U2 - 10.1101/gr.088013.108

DO - 10.1101/gr.088013.108

M3 - Journal article

C2 - 19420381

VL - 19

SP - 1124

EP - 1132

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 6

ER -

Department of Biology