DeepLoc 2.0: multi-label subcellular localization prediction using protein language models
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. / Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole.
In: Nucleic Acids Research, Vol. 50, No. W1, 2022, p. W228-W234.
RIS
TY - JOUR
T1 - DeepLoc 2.0
T2 - multi-label subcellular localization prediction using protein language models
AU - Thumuluri, Vineet
AU - Almagro Armenteros, José Juan
AU - Johansen, Alexander Rosenberg
AU - Nielsen, Henrik
AU - Winther, Ole
N1 - © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2022
Y1 - 2022
AB - The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.
U2 - 10.1093/nar/gkac278
DO - 10.1093/nar/gkac278
M3 - Journal article
C2 - 35489069
VL - 50
SP - W228
EP - W234
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - W1
ER -
ID: 306112192