Chromosome-scale, haplotype-resolved assembly of human genomes
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
Chromosome-scale, haplotype-resolved assembly of human genomes. / Garg, Shilpa; Fungtammasan, Arkarachai; Carroll, Andrew; Chou, Mike; Schmitt, Anthony; Zhou, Xiang; Mac, Stephen; Peluso, Paul; Hatas, Emily; Ghurye, Jay; Maguire, Jared; Mahmoud, Medhat; Cheng, Haoyu; Heller, David; Zook, Justin M.; Moemke, Tobias; Marschall, Tobias; Sedlazeck, Fritz J.; Aach, John; Chin, Chen-Shan; Church, George M.; Li, Heng.
In: Nature Biotechnology, 07.12.2020.
RIS
TY - JOUR
T1 - Chromosome-scale, haplotype-resolved assembly of human genomes
AU - Garg, Shilpa
AU - Fungtammasan, Arkarachai
AU - Carroll, Andrew
AU - Chou, Mike
AU - Schmitt, Anthony
AU - Zhou, Xiang
AU - Mac, Stephen
AU - Peluso, Paul
AU - Hatas, Emily
AU - Ghurye, Jay
AU - Maguire, Jared
AU - Mahmoud, Medhat
AU - Cheng, Haoyu
AU - Heller, David
AU - Zook, Justin M.
AU - Moemke, Tobias
AU - Marschall, Tobias
AU - Sedlazeck, Fritz J.
AU - Aach, John
AU - Chin, Chen-Shan
AU - Church, George M.
AU - Li, Heng
PY - 2020/12/7
Y1 - 2020/12/7
AB - Haplotype-resolved or phased genome assembly provides a complete picture of genomes and their complex genetic variations. However, current algorithms for phased assembly either do not generate chromosome-scale phasing or require pedigree information, which limits their application. We present a method named diploid assembly (DipAsm) that uses long, accurate reads and long-range conformation data for single individuals to generate a chromosome-scale phased assembly within 1 day. Applied to four public human genomes, PGP1, HG002, NA12878 and HG00733, DipAsm produced haplotype-resolved assemblies with minimum contig length needed to cover 50% of the known genome (NG50) up to 25 Mb and phased ~99.5% of heterozygous sites at 98-99% accuracy, outperforming other approaches in terms of both contiguity and phasing completeness. We demonstrate the importance of chromosome-scale phased assemblies for the discovery of structural variants (SVs), including thousands of new transposon insertions, and of highly polymorphic and medically important regions such as the human leukocyte antigen (HLA) and killer cell immunoglobulin-like receptor (KIR) regions. DipAsm will facilitate high-quality precision medicine and studies of individual haplotype variation and population diversity.
U2 - 10.1038/s41587-020-0711-0
DO - 10.1038/s41587-020-0711-0
M3 - Journal article
C2 - 33288905
JO - Nature Biotechnology
JF - Nature Biotechnology
SN - 1087-0156
ER -
ID: 255785031
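
The abstract describes NG50 as the minimum contig length needed to cover 50% of the known genome. A minimal sketch of that calculation follows, assuming contig lengths and the genome size are given as plain integers. The function name `ng50` and the toy numbers are illustrative only and are not taken from the paper or from DipAsm.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are visited from longest to shortest. The NG50 is the length
    of the contig at which the running total first reaches half of
    genome_size. Returns 0 if the contigs never cover half the genome.
    """
    half = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0


# Toy values (hypothetical, not results from the paper).
example_contigs = [40, 30, 20, 10]
print(ng50(example_contigs, genome_size=120))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed genome size, so values from assemblies of different total length remain comparable.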