Probabilistic Transcriptome Assembly and Variant Graph Genotyping
Publication: Book/anthology/thesis/report › Ph.D. thesis › Research
Standard
Probabilistic Transcriptome Assembly and Variant Graph Genotyping. / Sibbesen, Jonas Andreas.
Department of Biology, Faculty of Science, University of Copenhagen, 2016. 119 p.
RIS
TY - BOOK
T1 - Probabilistic Transcriptome Assembly and Variant Graph Genotyping
AU - Sibbesen, Jonas Andreas
PY - 2016
Y1 - 2016
N2 - The introduction of second-generation sequencing has in recent years allowed the biological community to determine the genomes and transcriptomes of organisms and individuals at an unprecedented rate. However, almost every step in the sequencing protocol introduces uncertainties in how the resulting sequencing data should be interpreted. This has over the years spurred the development of many probabilistic methods that are capable of modelling different aspects of the sequencing process. Here, I present two such methods, each developed to tackle a different problem in bioinformatics, together with an application of the latter method to a large Danish sequencing project. The first is a probabilistic method for transcriptome assembly that is based on a novel generative model of the RNA sequencing process and provides confidence estimates on the assembled transcripts. We show that this approach outperforms existing state-of-the-art methods, measured using sensitivity and precision, on both simulated and real data. The second is a novel probabilistic method that uses exact alignment of k-mers to a set of variant graphs to provide unbiased estimates of genotypes in a population of individuals. Using simulations, we show that this method markedly increases sensitivity without sacrificing precision when compared to mapping-based approaches, especially in variant-dense regions. We further demonstrate, using high-coverage real genome sequencing data of parent-offspring trios, that our method is accurate even for larger structural variants, as measured by trio concordance. Finally, we applied the second method to genotype variants, predicted using both a mapping-based approach and de novo assemblies, in a population of 50 Danish parent-offspring trios in the GenomeDenmark project. Using this hybrid approach, we not only created a variant set that was more complete in terms of structural variants than those of previous similar studies, but also significantly reduced the bias towards deletions normally observed in such studies.
UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122244646205763
M3 - Ph.D. thesis
BT - Probabilistic Transcriptome Assembly and Variant Graph Genotyping
PB - Department of Biology, Faculty of Science, University of Copenhagen
ER -
ID: 168211016