Variation graph toolkit improves read mapping by representing genetic variation in the reference

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Variation graph toolkit improves read mapping by representing genetic variation in the reference. / Garrison, Erik; Sirén, Jouni; Novak, Adam M.; Hickey, Glenn; Eizenga, Jordan M.; Dawson, Eric T.; Jones, William; Garg, Shilpa; Markello, Charles; Lin, Michael F.; Paten, Benedict; Durbin, Richard.

In: Nature Biotechnology, Vol. 36, No. 9, 2018, p. 875-879.

Harvard

Garrison, E, Sirén, J, Novak, AM, Hickey, G, Eizenga, JM, Dawson, ET, Jones, W, Garg, S, Markello, C, Lin, MF, Paten, B & Durbin, R 2018, 'Variation graph toolkit improves read mapping by representing genetic variation in the reference', Nature Biotechnology, vol. 36, no. 9, pp. 875-879. https://doi.org/10.1038/nbt.4227

APA

Garrison, E., Sirén, J., Novak, A. M., Hickey, G., Eizenga, J. M., Dawson, E. T., Jones, W., Garg, S., Markello, C., Lin, M. F., Paten, B., & Durbin, R. (2018). Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology, 36(9), 875-879. https://doi.org/10.1038/nbt.4227

Vancouver

Garrison E, Sirén J, Novak AM, Hickey G, Eizenga JM, Dawson ET et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nature Biotechnology. 2018;36(9):875-879. https://doi.org/10.1038/nbt.4227

Author

Garrison, Erik ; Sirén, Jouni ; Novak, Adam M. ; Hickey, Glenn ; Eizenga, Jordan M. ; Dawson, Eric T. ; Jones, William ; Garg, Shilpa ; Markello, Charles ; Lin, Michael F. ; Paten, Benedict ; Durbin, Richard. / Variation graph toolkit improves read mapping by representing genetic variation in the reference. In: Nature Biotechnology. 2018 ; Vol. 36, No. 9. pp. 875-879.

Bibtex

@article{2d6fa916051246dba0ca79550de8e487,

title = "Variation graph toolkit improves read mapping by representing genetic variation in the reference",

abstract = "Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.",

keywords = "Computer Simulation, DNA/genetics, Genetic Variation, Humans",

author = "Garrison, Erik and Sir{\'e}n, Jouni and Novak, Adam M. and Hickey, Glenn and Eizenga, Jordan M. and Dawson, Eric T. and Jones, William and Garg, Shilpa and Markello, Charles and Lin, Michael F. and Paten, Benedict and Durbin, Richard",

year = "2018",

doi = "10.1038/nbt.4227",

language = "English",

volume = "36",

pages = "875--879",

journal = "Nature Biotechnology",

issn = "1087-0156",

publisher = "Nature Publishing Group",

number = "9",

}

RIS

TY - JOUR

T1 - Variation graph toolkit improves read mapping by representing genetic variation in the reference

AU - Garrison, Erik

AU - Sirén, Jouni

AU - Novak, Adam M.

AU - Hickey, Glenn

AU - Eizenga, Jordan M.

AU - Dawson, Eric T.

AU - Jones, William

AU - Garg, Shilpa

AU - Markello, Charles

AU - Lin, Michael F.

AU - Paten, Benedict

AU - Durbin, Richard

PY - 2018

Y1 - 2018

N2 - Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.

KW - Computer Simulation

KW - DNA/genetics

KW - Genetic Variation

KW - Humans

U2 - 10.1038/nbt.4227

DO - 10.1038/nbt.4227

M3 - Journal article

C2 - 30125266

VL - 36

SP - 875

EP - 879

JO - Nature Biotechnology

JF - Nature Biotechnology

SN - 1087-0156

IS - 9

ER -

Department of Biology