A robust benchmark for detection of germline large deletions and insertions

Research output: Contribution to journal › Journal article › Research › peer-review

  • Justin M. Zook
  • Nancy F. Hansen
  • Nathan D. Olson
  • Lesley Chapman
  • James C. Mullikin
  • Chunlin Xiao
  • Stephen Sherry
  • Sergey Koren
  • Adam M. Phillippy
  • Paul C. Boutros
  • Sayed Mohammad E. Sahraeian
  • Vincent Huang
  • Alexandre Rouette
  • Noah Alexander
  • Christopher E. Mason
  • Iman Hajirasouliha
  • Camir Ricketts
  • Joyce Lee
  • Rick Tearle
  • Ian T. Fiddes
  • Alvaro Martinez Barrio
  • Jeremiah Wala
  • Andrew Carroll
  • Noushin Ghaffari
  • Oscar L. Rodriguez
  • Ali Bashir
  • Shaun Jackman
  • John J. Farrell
  • Aaron M. Wenger
  • Can Alkan
  • Arda Soylev
  • Michael C. Schatz
  • Shilpa Garg
  • George Church
  • Tobias Marschall
  • Ken Chen
  • Xian Fan
  • Adam C. English
  • Jeffrey A. Rosenfeld
  • Weichen Zhou
  • Ryan E. Mills
  • Jay M. Sage
  • Jennifer R. Davis
  • Michael D. Kaiser
  • John S. Oliver
  • Anthony P. Catalano
  • Mark J. P. Chaisson
  • Noah Spies
  • Fritz J. Sedlazeck
  • Marc Salit

Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions.

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
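The abstract's central rule is that, inside the Tier 1 regions, unmatched benchmark calls count as false negatives and any extra query calls count as putative false positives. The sketch below illustrates that bookkeeping for insertions and deletions of at least 50 bp. It is not the comparison procedure used in the paper: the `SV` record, the `compare` and `matches` helpers, the start-position containment test and the matching thresholds (`max_dist=500`, `min_size_ratio=0.7`) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

MIN_SV_LEN = 50  # the benchmark covers insertions and deletions >= 50 bp


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start coordinate (convention assumed for this sketch)
    svtype: str   # "DEL" or "INS"
    length: int   # absolute event size in bp


def in_regions(sv, regions):
    """Hypothetical containment test: the call's start lies in a half-open (chrom, start, end) interval."""
    return any(sv.chrom == c and s <= sv.pos < e for c, s, e in regions)


def matches(a, b, max_dist=500, min_size_ratio=0.7):
    """Hypothetical matching rule: same type, nearby start, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    small, large = sorted((a.length, b.length))
    if large == 0:
        return small == large
    return small / large >= min_size_ratio


def compare(benchmark, query, tier1_regions):
    """Count TP/FN/FP for query calls against benchmark calls inside Tier 1 regions."""
    bench = [sv for sv in benchmark
             if sv.length >= MIN_SV_LEN and in_regions(sv, tier1_regions)]
    calls = [sv for sv in query
             if sv.length >= MIN_SV_LEN and in_regions(sv, tier1_regions)]
    used = set()
    tp = 0
    for b in bench:
        # Greedy one-to-one matching: take the first unused query call that matches.
        for i, q in enumerate(calls):
            if i not in used and matches(b, q):
                used.add(i)
                tp += 1
                break
    fn = len(bench) - tp      # benchmark calls the query missed
    fp = len(calls) - tp      # inside Tier 1, any extra query call is a putative FP
    return {"TP": tp, "FN": fn, "FP": fp}


if __name__ == "__main__":
    tier1 = [("chr1", 1_000, 100_000)]
    benchmark = [SV("chr1", 5_000, "DEL", 300), SV("chr1", 20_000, "INS", 120)]
    query = [SV("chr1", 5_050, "DEL", 280), SV("chr1", 60_000, "INS", 75)]
    print(compare(benchmark, query, tier1))  # {'TP': 1, 'FN': 1, 'FP': 1}
```

In the toy example the deletion is recovered, the benchmark insertion at 20,000 is missed (one FN), and the query insertion at 60,000 has no benchmark counterpart in the Tier 1 region (one putative FP). Real SV comparison tools use more careful matching, for example sequence similarity and genotype checks, so the thresholds here should not be read as recommended settings.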

Original language: English
Journal: Nature Biotechnology
Volume: 38
Issue number: 11
Pages (from-to): 1347-+
Number of pages: 14
ISSN: 1087-0156
Publication status: Published - Nov 2020
Externally published: Yes

Research areas

  • STRUCTURAL VARIATION
  • HUMAN GENOME
  • VARIANTS
  • RESOURCE
  • SNP
