A haplotype-aware de novo assembly of related individuals using pedigree sequence graph

Research output: Contribution to journal; Journal article; Research; peer-review

  • Shilpa Garg
  • John Aach
  • Heng Li
  • Isaac Sebenius
  • Richard Durbin
  • George Church

MOTIVATION: Reconstructing high-quality haplotype-resolved assemblies for related individuals has important applications in Mendelian diseases and population genomics. Through major genomics sequencing efforts such as the Personal Genome Project, the Vertebrate Genome Project (VGP) and the Genome in a Bottle project (GIAB), a variety of sequencing datasets from trios of diploid genomes are becoming available. Current trio assembly approaches are not designed to incorporate long- and short-read data from mother-father-child trios, and therefore require relatively high coverage of costly long-read data to produce high-quality assemblies. Thus, building a trio-aware assembler capable of producing accurate and chromosomal-scale diploid genomes of all individuals in a pedigree, while keeping sequencing costs low, is a pressing need of the genomics community.

RESULTS: We present a novel approach to diploid assembly based on a pedigree sequence graph, using accurate Illumina data and long-read Pacific Biosciences (PacBio) data from all related individuals, thereby generalizing our previous work on single individuals. We demonstrate the effectiveness of our pedigree approach on a simulated trio of pseudo-diploid yeast genomes with different heterozygosity rates, and on real data from a human chromosome. We show that we require as little as 30× Illumina coverage and 15× PacBio coverage from each individual in a trio to generate chromosomal-scale phased assemblies. Additionally, we show that we can detect and phase variants from the generated phased assemblies.

AVAILABILITY AND IMPLEMENTATION: https://github.com/shilpagarg/WHdenovo.

Original language: English
Journal: Bioinformatics
Volume: 36
Issue number: 8
Pages (from-to): 2385-2392
Number of pages: 8
ISSN: 1367-4803
Publication status: Published - 2020
Externally published: Yes

Bibliographical note

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

    Research areas

  • Genome, Genomics, Haplotypes, High-Throughput Nucleotide Sequencing, Humans, Pedigree, Sequence Analysis, DNA

ID: 255785160