A robust benchmark for detection of germline large deletions and insertions

Research output: Contribution to journal › Journal article › Research › peer-review

Justin M. Zook
Nancy F. Hansen
Nathan D. Olson
Lesley Chapman
James C. Mullikin
Chunlin Xiao
Stephen Sherry
Sergey Koren
Adam M. Phillippy
Paul C. Boutros
Sayed Mohammad E. Sahraeian
Vincent Huang
Alexandre Rouette
Noah Alexander
Christopher E. Mason
Iman Hajirasouliha
Camir Ricketts
Joyce Lee
Rick Tearle
Ian T. Fiddes
Alvaro Martinez Barrio
Jeremiah Wala
Andrew Carroll
Noushin Ghaffari
Oscar L. Rodriguez
Ali Bashir
Shaun Jackman
John J. Farrell
Aaron M. Wenger
Can Alkan
Arda Soylev
Michael C. Schatz
Shilpa Garg
George Church
Tobias Marschall
Ken Chen
Xian Fan
Adam C. English
Jeffrey A. Rosenfeld
Weichen Zhou
Ryan E. Mills
Jay M. Sage
Jennifer R. Davis
Michael D. Kaiser
John S. Oliver
Anthony P. Catalano
Mark J. P. Chaisson
Noah Spies
Fritz J. Sedlazeck
Marc Salit

Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions.

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.

Original language	English
Journal	Nature Biotechnology
Volume	38
Issue number	11
Pages (from-to)	1347-+
Number of pages	14
ISSN	1087-0156
DOIs	https://doi.org/10.1038/s41587-020-0538-8
Publication status	Published - Nov 2020
Externally published	Yes

Research areas

STRUCTURAL VARIATION, HUMAN GENOME, VARIANTS, RESOURCE, SNP

ID: 257032178

Department of Biology
