A graph-based approach to diploid genome assembly

Research output: Contribution to journal › Journal article › Research › peer-review

  • Shilpa Garg
  • Mikko Rautiainen
  • Adam M. Novak
  • Erik Garrison
  • Richard Durbin
  • Tobias Marschall

Motivation: Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community.

Results: We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants.

Availability and implementation: https://github.com/whatshap/whatshap.

Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English
Journal: Bioinformatics
Volume: 34
Issue number: 13
Pages (from-to): i105-i114
ISSN: 1367-4803
DOIs
Publication status: Published - 2018
Externally published: Yes

    Research areas

  • Data Visualization; Diploidy; Genome, Fungal; Haplotypes; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Yeasts/genetics

ID: 255785310