GraphPart: homology partitioning for biological sequence analysis

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

GraphPart: homology partitioning for biological sequence analysis. / Teufel, Felix; Gíslason, Magnús Halldór; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Winther, Ole; Nielsen, Henrik.

In: NAR Genomics and Bioinformatics, Vol. 5, No. 4, lqad088, 2023.

Harvard

Teufel, F, Gíslason, MH, Almagro Armenteros, JJ, Johansen, AR, Winther, O & Nielsen, H 2023, 'GraphPart: homology partitioning for biological sequence analysis', NAR Genomics and Bioinformatics, vol. 5, no. 4, lqad088. https://doi.org/10.1093/nargab/lqad088

APA

Teufel, F., Gíslason, M. H., Almagro Armenteros, J. J., Johansen, A. R., Winther, O., & Nielsen, H. (2023). GraphPart: homology partitioning for biological sequence analysis. NAR Genomics and Bioinformatics, 5(4), Article lqad088. https://doi.org/10.1093/nargab/lqad088

Vancouver

Teufel F, Gíslason MH, Almagro Armenteros JJ, Johansen AR, Winther O, Nielsen H. GraphPart: homology partitioning for biological sequence analysis. NAR Genomics and Bioinformatics. 2023;5(4):lqad088. https://doi.org/10.1093/nargab/lqad088

Author

Teufel, Felix; Gíslason, Magnús Halldór; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Winther, Ole; Nielsen, Henrik. / GraphPart: homology partitioning for biological sequence analysis. In: NAR Genomics and Bioinformatics. 2023; Vol. 5, No. 4.

Bibtex

@article{6a9f9294b3b647dc8f03f5b99b9346dd,
title = "GraphPart: homology partitioning for biological sequence analysis",
abstract = "When splitting biological sequence data for the development and testing of predictive models, it is necessary to avoid too-closely related pairs of sequences ending up in different partitions. If this is ignored, performance of prediction methods will tend to be overestimated. Several algorithms have been proposed for homology reduction, where sequences are removed until no too-closely related pairs remain. We present GraphPart, an algorithm for homology partitioning that divides the data such that closely related sequences always end up in the same partition, while keeping as many sequences as possible in the dataset. Evaluation of GraphPart on Protein, DNA and RNA datasets shows that it is capable of retaining a larger number of sequences per dataset, while providing homology separation on a par with reduction approaches.",
author = "Felix Teufel and G{\'i}slason, {Magn{\'u}s Halld{\'o}r} and {Almagro Armenteros}, {Jos{\'e} Juan} and Johansen, {Alexander Rosenberg} and Ole Winther and Henrik Nielsen",
note = "Publisher Copyright: {\textcopyright} 2023 The Author(s).",
year = "2023",
doi = "10.1093/nargab/lqad088",
language = "English",
volume = "5",
journal = "NAR Genomics and Bioinformatics",
issn = "2631-9268",
publisher = "Oxford University Press",
number = "4",
pages = "lqad088",
}

RIS

TY  - JOUR
T1  - GraphPart
T2  - homology partitioning for biological sequence analysis
AU  - Teufel, Felix
AU  - Gíslason, Magnús Halldór
AU  - Almagro Armenteros, José Juan
AU  - Johansen, Alexander Rosenberg
AU  - Winther, Ole
AU  - Nielsen, Henrik
N1  - Publisher Copyright: © 2023 The Author(s).
PY  - 2023
Y1  - 2023
N2  - When splitting biological sequence data for the development and testing of predictive models, it is necessary to avoid too-closely related pairs of sequences ending up in different partitions. If this is ignored, performance of prediction methods will tend to be overestimated. Several algorithms have been proposed for homology reduction, where sequences are removed until no too-closely related pairs remain. We present GraphPart, an algorithm for homology partitioning that divides the data such that closely related sequences always end up in the same partition, while keeping as many sequences as possible in the dataset. Evaluation of GraphPart on Protein, DNA and RNA datasets shows that it is capable of retaining a larger number of sequences per dataset, while providing homology separation on a par with reduction approaches.
AB  - When splitting biological sequence data for the development and testing of predictive models, it is necessary to avoid too-closely related pairs of sequences ending up in different partitions. If this is ignored, performance of prediction methods will tend to be overestimated. Several algorithms have been proposed for homology reduction, where sequences are removed until no too-closely related pairs remain. We present GraphPart, an algorithm for homology partitioning that divides the data such that closely related sequences always end up in the same partition, while keeping as many sequences as possible in the dataset. Evaluation of GraphPart on Protein, DNA and RNA datasets shows that it is capable of retaining a larger number of sequences per dataset, while providing homology separation on a par with reduction approaches.
U2  - 10.1093/nargab/lqad088
DO  - 10.1093/nargab/lqad088
M3  - Journal article
C2  - 37850036
AN  - SCOPUS:85175205895
VL  - 5
JO  - NAR Genomics and Bioinformatics
JF  - NAR Genomics and Bioinformatics
SN  - 2631-9268
IS  - 4
M1  - lqad088
ER  - 