Semi-automated assembly of high-quality diploid human reference genomes

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Semi-automated assembly of high-quality diploid human reference genomes. / Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A.; Carnevali, Paolo; Chaisson, Mark J.P.; Chin, Chen Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S.; Fulton, Lucinda L.; Garg, Shilpa; Gerton, Jennifer L.; Ghurye, Jay; Granat, Anastasiya; Green, Richard E.; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B.; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O.; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W.; McDaniel, Jennifer; Nie, Fan; Zhang, Guojie; Li, Heng; Human Pangenome Reference Consortium.

In: Nature, Vol. 611, No. 7936, 2022.

Harvard

Jarvis, ED, Formenti, G, Rhie, A, Guarracino, A, Yang, C, Wood, J, Tracey, A, Thibaud-Nissen, F, Vollger, MR, Porubsky, D, Cheng, H, Asri, M, Logsdon, GA, Carnevali, P, Chaisson, MJP, Chin, CS, Cody, S, Collins, J, Ebert, P, Escalona, M, Fedrigo, O, Fulton, RS, Fulton, LL, Garg, S, Gerton, JL, Ghurye, J, Granat, A, Green, RE, Harvey, W, Hasenfeld, P, Hastie, A, Haukness, M, Jaeger, EB, Jain, M, Kirsche, M, Kolmogorov, M, Korbel, JO, Koren, S, Korlach, J, Lee, J, Li, D, Lindsay, T, Lucas, J, Luo, F, Marschall, T, Mitchell, MW, McDaniel, J, Nie, F, Zhang, G, Li, H & Human Pangenome Reference Consortium 2022, 'Semi-automated assembly of high-quality diploid human reference genomes', Nature, vol. 611, no. 7936. https://doi.org/10.1038/s41586-022-05325-5

APA

Jarvis, E. D., Formenti, G., Rhie, A., Guarracino, A., Yang, C., Wood, J., Tracey, A., Thibaud-Nissen, F., Vollger, M. R., Porubsky, D., Cheng, H., Asri, M., Logsdon, G. A., Carnevali, P., Chaisson, M. J. P., Chin, C. S., Cody, S., Collins, J., Ebert, P., ... Human Pangenome Reference Consortium (2022). Semi-automated assembly of high-quality diploid human reference genomes. Nature, 611(7936). https://doi.org/10.1038/s41586-022-05325-5

Vancouver

Jarvis ED, Formenti G, Rhie A, Guarracino A, Yang C, Wood J et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature. 2022;611(7936). https://doi.org/10.1038/s41586-022-05325-5

Author

Jarvis, Erich D. ; Formenti, Giulio ; Rhie, Arang ; Guarracino, Andrea ; Yang, Chentao ; Wood, Jonathan ; Tracey, Alan ; Thibaud-Nissen, Francoise ; Vollger, Mitchell R. ; Porubsky, David ; Cheng, Haoyu ; Asri, Mobin ; Logsdon, Glennis A. ; Carnevali, Paolo ; Chaisson, Mark J.P. ; Chin, Chen Shan ; Cody, Sarah ; Collins, Joanna ; Ebert, Peter ; Escalona, Merly ; Fedrigo, Olivier ; Fulton, Robert S. ; Fulton, Lucinda L. ; Garg, Shilpa ; Gerton, Jennifer L. ; Ghurye, Jay ; Granat, Anastasiya ; Green, Richard E. ; Harvey, William ; Hasenfeld, Patrick ; Hastie, Alex ; Haukness, Marina ; Jaeger, Erich B. ; Jain, Miten ; Kirsche, Melanie ; Kolmogorov, Mikhail ; Korbel, Jan O. ; Koren, Sergey ; Korlach, Jonas ; Lee, Joyce ; Li, Daofeng ; Lindsay, Tina ; Lucas, Julian ; Luo, Feng ; Marschall, Tobias ; Mitchell, Matthew W. ; McDaniel, Jennifer ; Nie, Fan ; Zhang, Guojie ; Li, Heng ; Human Pangenome Reference Consortium. / Semi-automated assembly of high-quality diploid human reference genomes. In: Nature. 2022 ; Vol. 611, No. 7936.

Bibtex

@article{f60741edf22344949c450acab43309f7,

title = "Semi-automated assembly of high-quality diploid human reference genomes",

abstract = "The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.",

author = "Jarvis, {Erich D.} and Giulio Formenti and Arang Rhie and Andrea Guarracino and Chentao Yang and Jonathan Wood and Alan Tracey and Francoise Thibaud-Nissen and Vollger, {Mitchell R.} and David Porubsky and Haoyu Cheng and Mobin Asri and Logsdon, {Glennis A.} and Paolo Carnevali and Chaisson, {Mark J.P.} and Chin, {Chen Shan} and Sarah Cody and Joanna Collins and Peter Ebert and Merly Escalona and Olivier Fedrigo and Fulton, {Robert S.} and Fulton, {Lucinda L.} and Shilpa Garg and Gerton, {Jennifer L.} and Jay Ghurye and Anastasiya Granat and Green, {Richard E.} and William Harvey and Patrick Hasenfeld and Alex Hastie and Marina Haukness and Jaeger, {Erich B.} and Miten Jain and Melanie Kirsche and Mikhail Kolmogorov and Korbel, {Jan O.} and Sergey Koren and Jonas Korlach and Joyce Lee and Daofeng Li and Tina Lindsay and Julian Lucas and Feng Luo and Tobias Marschall and Mitchell, {Matthew W.} and Jennifer McDaniel and Fan Nie and Guojie Zhang and Heng Li and {Human Pangenome Reference Consortium}",

note = "Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2022",

doi = "10.1038/s41586-022-05325-5",

language = "English",

volume = "611",

journal = "Nature",

issn = "0028-0836",

publisher = "Nature Publishing Group",

number = "7936",

}

RIS

TY - JOUR

T1 - Semi-automated assembly of high-quality diploid human reference genomes

AU - Jarvis, Erich D.

AU - Formenti, Giulio

AU - Rhie, Arang

AU - Guarracino, Andrea

AU - Yang, Chentao

AU - Wood, Jonathan

AU - Tracey, Alan

AU - Thibaud-Nissen, Francoise

AU - Vollger, Mitchell R.

AU - Porubsky, David

AU - Cheng, Haoyu

AU - Asri, Mobin

AU - Logsdon, Glennis A.

AU - Carnevali, Paolo

AU - Chaisson, Mark J.P.

AU - Chin, Chen Shan

AU - Cody, Sarah

AU - Collins, Joanna

AU - Ebert, Peter

AU - Escalona, Merly

AU - Fedrigo, Olivier

AU - Fulton, Robert S.

AU - Fulton, Lucinda L.

AU - Garg, Shilpa

AU - Gerton, Jennifer L.

AU - Ghurye, Jay

AU - Granat, Anastasiya

AU - Green, Richard E.

AU - Harvey, William

AU - Hasenfeld, Patrick

AU - Hastie, Alex

AU - Haukness, Marina

AU - Jaeger, Erich B.

AU - Jain, Miten

AU - Kirsche, Melanie

AU - Kolmogorov, Mikhail

AU - Korbel, Jan O.

AU - Koren, Sergey

AU - Korlach, Jonas

AU - Lee, Joyce

AU - Li, Daofeng

AU - Lindsay, Tina

AU - Lucas, Julian

AU - Luo, Feng

AU - Marschall, Tobias

AU - Mitchell, Matthew W.

AU - McDaniel, Jennifer

AU - Nie, Fan

AU - Zhang, Guojie

AU - Li, Heng

AU - Human Pangenome Reference Consortium

PY - 2022

Y1 - 2022

N2 - The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

U2 - 10.1038/s41586-022-05325-5

DO - 10.1038/s41586-022-05325-5

M3 - Journal article

C2 - 36261518

AN - SCOPUS:85140231380

VL - 611

JO - Nature

JF - Nature

SN - 0028-0836

IS - 7936

ER -

Department of Biology