Ou Wang:
A novel co-barcoding method for whole genome sequencing, haplotyping and scaffolding

Date: 08-08-2018    Supervisor: Karsten Kristiansen




Whole genome sequencing (WGS) technology has improved enormously since the completion of the Human Genome Project. Compared to the original cost of three billion USD, a whole human genome can now be sequenced for less than 600 USD. This cost continues to fall, which should enable WGS to be widely used for both research and clinical testing. However, many aspects of WGS still require improvement before perfect genome sequencing can be achieved. Here we describe and provide a full protocol for a novel technology, single tube long fragment read (stLFR), that enables WGS, haplotyping, and contig scaffolding. Like the original LFR, stLFR is based on co-barcoding long DNA fragments. However, unlike the original version, which used the wells of a 384-well plate as compartments, stLFR uses the surface of beads to create millions of individual compartments in a single tube. Using a combinatorial process, over 1.8 billion unique barcodes have been generated, so that almost no barcode sequences are shared between beads in a typical reaction. Using stLFR, we demonstrate near-perfect variant calling and phasing of the genome of NA12878 in contigs with an N50 of >10 Mb. We also demonstrate that complex structural variants (SVs) can be detected and properly ordered using stLFR data. All of this was possible with a single stLFR library; no separate library was required to generate high-quality variant calls. We also performed scaffolding of contigs generated from SMRT reads and show that stLFR can improve the genome assembly. We believe stLFR represents a potential single-library solution that will enable WGS, phasing, SV detection, scaffolding, and ultimately diploid de novo assembly. Importantly, stLFR does not add significantly to the cost of library preparation, and it is easily automatable in 96-well plate format.
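
The claim that a pool of over 1.8 billion barcodes leaves almost no barcode shared between beads can be illustrated with a simple occupancy estimate. The sketch below is not part of the stLFR protocol. It assumes that each bead carries one barcode drawn uniformly and independently from the pool, which is a simplification, and the bead counts are hypothetical values chosen only to span "millions" of compartments. The function name `expected_shared_fraction` is likewise introduced here for illustration.

```python
import math


def expected_shared_fraction(num_barcodes: int, num_beads: int) -> float:
    """Probability that a given bead shares its barcode with at least one other bead.

    Assumption (not from the abstract): every bead draws its barcode uniformly
    and independently from `num_barcodes` possibilities.
    """
    # P(shared) = 1 - (1 - 1/N)^(M - 1), evaluated with log1p/expm1 so that
    # the very small probabilities involved do not lose precision.
    return -math.expm1((num_beads - 1) * math.log1p(-1.0 / num_barcodes))


# Barcode pool size taken from the abstract ("over 1.8 billion").
NUM_BARCODES = 1_800_000_000

# Hypothetical bead counts per reaction; the abstract only says "millions".
for num_beads in (1_000_000, 10_000_000, 50_000_000):
    fraction = expected_shared_fraction(NUM_BARCODES, num_beads)
    print(f"{num_beads:>11,d} beads: {fraction:.4%} of beads expected to share a barcode")
```

Under these assumptions the shared fraction stays close to the ratio of beads to barcodes, so it remains small as long as the number of beads in a reaction is far below the size of the barcode pool.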