Variation graph toolkit improves read mapping by representing genetic variation in the reference
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Variation graph toolkit improves read mapping by representing genetic variation in the reference. / Garrison, Erik; Sirén, Jouni; Novak, Adam M.; Hickey, Glenn; Eizenga, Jordan M.; Dawson, Eric T.; Jones, William; Garg, Shilpa; Markello, Charles; Lin, Michael F.; Paten, Benedict; Durbin, Richard.
In: Nature Biotechnology, Vol. 36, No. 9, 2018, p. 875-879.
RIS
TY - JOUR
T1 - Variation graph toolkit improves read mapping by representing genetic variation in the reference
AU - Garrison, Erik
AU - Sirén, Jouni
AU - Novak, Adam M.
AU - Hickey, Glenn
AU - Eizenga, Jordan M.
AU - Dawson, Eric T.
AU - Jones, William
AU - Garg, Shilpa
AU - Markello, Charles
AU - Lin, Michael F.
AU - Paten, Benedict
AU - Durbin, Richard
PY - 2018
Y1 - 2018
AB - Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.
KW - Computer Simulation
KW - DNA/genetics
KW - Genetic Variation
KW - Humans
U2 - 10.1038/nbt.4227
DO - 10.1038/nbt.4227
M3 - Journal article
C2 - 30125266
VL - 36
SP - 875
EP - 879
JO - Nature Biotechnology
JF - Nature Biotechnology
SN - 1087-0156
IS - 9
ER -
ID: 255785460