Probabilistic Transcriptome Assembly and Variant Graph Genotyping

Department of Biology

Publication: Book/anthology/thesis/report › Ph.D. thesis › Research

Standard

Probabilistic Transcriptome Assembly and Variant Graph Genotyping. / Sibbesen, Jonas Andreas.

Department of Biology, Faculty of Science, University of Copenhagen, 2016. 119 p.

Harvard

Sibbesen, JA 2016, Probabilistic Transcriptome Assembly and Variant Graph Genotyping. Department of Biology, Faculty of Science, University of Copenhagen. <https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122244646205763>

APA

Sibbesen, J. A. (2016). Probabilistic Transcriptome Assembly and Variant Graph Genotyping. Department of Biology, Faculty of Science, University of Copenhagen. https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122244646205763

Vancouver

Sibbesen JA. Probabilistic Transcriptome Assembly and Variant Graph Genotyping. Department of Biology, Faculty of Science, University of Copenhagen, 2016. 119 p.

Author

Sibbesen, Jonas Andreas. / Probabilistic Transcriptome Assembly and Variant Graph Genotyping. Department of Biology, Faculty of Science, University of Copenhagen, 2016. 119 p.

Bibtex

@phdthesis{8862a5beec434fe7b125d7fd7d9dbcb5,

title = "Probabilistic Transcriptome Assembly and Variant Graph Genotyping",

abstract = "The introduction of second-generation sequencing has in recent years allowed the biological community to determine the genomes and transcriptomes of organisms and individuals at an unprecedented rate. However, almost every step in the sequencing protocol introduces uncertainties in how the resulting sequencing data should be interpreted. This has over the years spurred the development of many probabilistic methods that are capable of modelling different aspects of the sequencing process. Here, I present two such methods that were developed to each tackle a different problem in bioinformatics, together with an application of the latter method to a large Danish sequencing project. The first is a probabilistic method for transcriptome assembly that is based on a novel generative model of the RNA sequencing process and provides confidence estimates on the assembled transcripts. We show that this approach outperforms existing state-of-the-art methods measured using sensitivity and precision on both simulated and real data. The second is a novel probabilistic method that uses exact alignment of k-mers to a set of variant graphs to provide unbiased estimates of genotypes in a population of individuals. Using simulations we show that this method markedly increases sensitivity without sacrificing precision, when compared to mapping-based approaches, especially in variant-dense regions. We further demonstrate, using high coverage real genome sequencing data of parent-offspring trios, that our method is accurate even for larger structural variants measured using trio concordance. Finally, we applied the second method to genotype variants, predicted using both a mapping-based approach and de novo assemblies, in a population of 50 Danish parent-offspring trios in the GenomeDenmark project. Using this hybrid approach we not only created a variant set that was more complete, in terms of structural variants, compared to previous similar studies but also significantly reduced the bias towards deletions normally observed in such studies.",

author = "Sibbesen, {Jonas Andreas}",

year = "2016",

language = "English",

publisher = "Department of Biology, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Probabilistic Transcriptome Assembly and Variant Graph Genotyping

AU - Sibbesen, Jonas Andreas

PY - 2016

Y1 - 2016

N2 - The introduction of second-generation sequencing has in recent years allowed the biological community to determine the genomes and transcriptomes of organisms and individuals at an unprecedented rate. However, almost every step in the sequencing protocol introduces uncertainties in how the resulting sequencing data should be interpreted. This has over the years spurred the development of many probabilistic methods that are capable of modelling different aspects of the sequencing process. Here, I present two such methods that were developed to each tackle a different problem in bioinformatics, together with an application of the latter method to a large Danish sequencing project. The first is a probabilistic method for transcriptome assembly that is based on a novel generative model of the RNA sequencing process and provides confidence estimates on the assembled transcripts. We show that this approach outperforms existing state-of-the-art methods measured using sensitivity and precision on both simulated and real data. The second is a novel probabilistic method that uses exact alignment of k-mers to a set of variant graphs to provide unbiased estimates of genotypes in a population of individuals. Using simulations we show that this method markedly increases sensitivity without sacrificing precision, when compared to mapping-based approaches, especially in variant-dense regions. We further demonstrate, using high coverage real genome sequencing data of parent-offspring trios, that our method is accurate even for larger structural variants measured using trio concordance. Finally, we applied the second method to genotype variants, predicted using both a mapping-based approach and de novo assemblies, in a population of 50 Danish parent-offspring trios in the GenomeDenmark project. Using this hybrid approach we not only created a variant set that was more complete, in terms of structural variants, compared to previous similar studies but also significantly reduced the bias towards deletions normally observed in such studies.

AB - The introduction of second-generation sequencing has in recent years allowed the biological community to determine the genomes and transcriptomes of organisms and individuals at an unprecedented rate. However, almost every step in the sequencing protocol introduces uncertainties in how the resulting sequencing data should be interpreted. This has over the years spurred the development of many probabilistic methods that are capable of modelling different aspects of the sequencing process. Here, I present two such methods that were developed to each tackle a different problem in bioinformatics, together with an application of the latter method to a large Danish sequencing project. The first is a probabilistic method for transcriptome assembly that is based on a novel generative model of the RNA sequencing process and provides confidence estimates on the assembled transcripts. We show that this approach outperforms existing state-of-the-art methods measured using sensitivity and precision on both simulated and real data. The second is a novel probabilistic method that uses exact alignment of k-mers to a set of variant graphs to provide unbiased estimates of genotypes in a population of individuals. Using simulations we show that this method markedly increases sensitivity without sacrificing precision, when compared to mapping-based approaches, especially in variant-dense regions. We further demonstrate, using high coverage real genome sequencing data of parent-offspring trios, that our method is accurate even for larger structural variants measured using trio concordance. Finally, we applied the second method to genotype variants, predicted using both a mapping-based approach and de novo assemblies, in a population of 50 Danish parent-offspring trios in the GenomeDenmark project. Using this hybrid approach we not only created a variant set that was more complete, in terms of structural variants, compared to previous similar studies but also significantly reduced the bias towards deletions normally observed in such studies.

UR - https://soeg.kb.dk/permalink/45KBDK_KGL/fbp0ps/alma99122244646205763

M3 - Ph.D. thesis

BT - Probabilistic Transcriptome Assembly and Variant Graph Genotyping

PB - Department of Biology, Faculty of Science, University of Copenhagen

ER -

ID: 168211016