maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences. / Menzel, Karl Peter; Stadler, Peter F.; Gorodkin, Jan.

In: Bioinformatics, Vol. 27, No. 3, 2011, p. 317-325.

Harvard

Menzel, KP, Stadler, PF & Gorodkin, J 2011, 'maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences', Bioinformatics, vol. 27, no. 3, pp. 317-325.

APA

Menzel, K. P., Stadler, P. F., & Gorodkin, J. (2011). maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences. Bioinformatics, 27(3), 317-325.

Vancouver

Menzel KP, Stadler PF, Gorodkin J. maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences. Bioinformatics. 2011;27(3):317-325.

Author

Menzel, Karl Peter; Stadler, Peter F.; Gorodkin, Jan. / maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences. In: Bioinformatics. 2011; Vol. 27, No. 3. pp. 317-325.

Bibtex

@article{c17d713fa03a425d804a96c3505c5f0b,
title = "maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences",
abstract = "MOTIVATION: The task of reconstructing a genomic sequence from a particular species is gaining more and more importance in the light of the rapid development of high-throughput sequencing technologies and their limitations. Applications include not only compensation for missing data in unsequenced genomic regions and the design of oligonucleotide primers for target genes in species with lacking sequence information but also the preparation of customized queries for homology searches. RESULTS: We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed based on a certain confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase of the reconstruction accuracy compared to both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (up to 10%) in all 44 species. The improved sequence reconstruction also leads to an increase of the quality of PCR primer design for yet unsequenced genes: the differences between the expected T(m) and real T(m) of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees. The prediction accuracy drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset. AVAILABILITY: maxAlike is available for download and web server at: https://rth.dk/resources/maxAlike. ",
author = "Menzel, {Karl Peter} and Stadler, {Peter F.} and Gorodkin, Jan",
year = "2011",
language = "English",
volume = "27",
pages = "317--325",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "3",
}

RIS

TY - JOUR

T1 - maxAlike

T2 - maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences

AU - Menzel, Karl Peter

AU - Stadler, Peter F.

AU - Gorodkin, Jan

PY - 2011

Y1 - 2011

AB - MOTIVATION: The task of reconstructing a genomic sequence from a particular species is gaining more and more importance in the light of the rapid development of high-throughput sequencing technologies and their limitations. Applications include not only compensation for missing data in unsequenced genomic regions and the design of oligonucleotide primers for target genes in species with lacking sequence information but also the preparation of customized queries for homology searches. RESULTS: We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed based on a certain confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase of the reconstruction accuracy compared to both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (up to 10%) in all 44 species. The improved sequence reconstruction also leads to an increase of the quality of PCR primer design for yet unsequenced genes: the differences between the expected T(m) and real T(m) of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees. The prediction accuracy drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset. AVAILABILITY: maxAlike is available for download and web server at: https://rth.dk/resources/maxAlike.

M3 - Journal article

VL - 27

SP - 317

EP - 325

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 3

ER -

ID: 37639733