A diploid assembly-based benchmark for variants in the major histocompatibility complex

Research output: Contribution to journal › Journal article › Research › peer-review

  • Chen-Shan Chin
  • Justin Wagner
  • Qiandong Zeng
  • Erik Garrison
  • Shilpa Garg
  • Arkarachai Fungtammasan
  • Mikko Rautiainen
  • Sergey Aganezov
  • Melanie Kirsche
  • Samantha Zarate
  • Michael C. Schatz
  • Chunlin Xiao
  • William J. Rowell
  • Charles Markello
  • Jesse Farek
  • Fritz J Sedlazeck
  • Vikas Bansal
  • Byunggil Yoo
  • Neil Miller
  • Xin Zhou
  • Andrew Carroll
  • Alvaro Martinez Barrio
  • Marc Salit
  • Tobias Marschall
  • Alexander T. Dilthey
  • Justin M. Zook

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC, covering 94% of the MHC and 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. This benchmark reliably identifies errors in mapping-based callsets, and enables performance assessment in regions with much denser and more complex variation than the regions covered by previous benchmarks.

Original language: English
Article number: 4794
Journal: Nature Communications
Volume: 11
Number of pages: 9
ISSN: 2041-1723
Publication status: Published - 2020
Externally published: Yes

    Research areas

  • Benchmarking, Cell Line, Diploidy, Genetic Variation, Genome, Human, Haplotypes, Humans, Major Histocompatibility Complex/genetics


ID: 255784755