DeepLoc 2.0: multi-label subcellular localization prediction using protein language models

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. / Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole.

In: Nucleic Acids Research, Vol. 50, No. W1, 2022, p. W228-W234.

Harvard

Thumuluri, V, Almagro Armenteros, JJ, Johansen, AR, Nielsen, H & Winther, O 2022, 'DeepLoc 2.0: multi-label subcellular localization prediction using protein language models', Nucleic Acids Research, vol. 50, no. W1, pp. W228-W234. https://doi.org/10.1093/nar/gkac278

APA

Thumuluri, V., Almagro Armenteros, J. J., Johansen, A. R., Nielsen, H., & Winther, O. (2022). DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Research, 50(W1), W228-W234. https://doi.org/10.1093/nar/gkac278

Vancouver

Thumuluri V, Almagro Armenteros JJ, Johansen AR, Nielsen H, Winther O. DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Research. 2022;50(W1):W228-W234. https://doi.org/10.1093/nar/gkac278

Author

Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole. / DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. In: Nucleic Acids Research. 2022; Vol. 50, No. W1. pp. W228-W234.

Bibtex

@article{667b0f9bd18e4833858983d666a3375a,
title = "DeepLoc 2.0: multi-label subcellular localization prediction using protein language models",
abstract = "The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.",
author = "Vineet Thumuluri and {Almagro Armenteros}, {Jos{\'e} Juan} and Johansen, {Alexander Rosenberg} and Henrik Nielsen and Ole Winther",
note = "{\textcopyright} The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.",
year = "2022",
doi = "10.1093/nar/gkac278",
language = "English",
volume = "50",
pages = "W228--W234",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "W1",

}

RIS

TY - JOUR

T1 - DeepLoc 2.0

T2 - multi-label subcellular localization prediction using protein language models

AU - Thumuluri, Vineet

AU - Almagro Armenteros, José Juan

AU - Johansen, Alexander Rosenberg

AU - Nielsen, Henrik

AU - Winther, Ole

N1 - © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.

PY - 2022

Y1 - 2022

AB - The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

U2 - 10.1093/nar/gkac278

DO - 10.1093/nar/gkac278

M3 - Journal article

C2 - 35489069

VL - 50

SP - W228

EP - W234

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - W1

ER -
