Article
Transfer learning: The key to functionally annotate the protein universe
Bugnon, Leandro Ariel; Fenoy, Luis Emilio; Edera, Alejandro; Raad, Jonathan; Stegmayer, Georgina; Milone, Diego Humberto
Publication date:
02/2023
Publisher:
Cell Press
Journal:
Patterns
ISSN:
2666-3899
Language:
English
Resource type:
Published article
Abstract
The automatic annotation of the protein universe is still an unresolved challenge. Today, there are 229,149,489 entries in the UniProtKB database, but only 0.25% of them have been functionally annotated. Manual annotation integrates knowledge from the protein families database Pfam, which annotates family domains using sequence alignments and hidden Markov models. This approach has expanded the Pfam annotations only slowly in recent years. Recently, deep learning models have emerged that can learn evolutionary patterns from unaligned protein sequences. However, these models require large-scale data, while many families contain just a few sequences. Here, we contend that this limitation can be overcome by transfer learning, exploiting the full potential of self-supervised learning on large unannotated data followed by supervised learning on a small labeled dataset. We show results where errors in protein family prediction can be reduced by 55% with respect to standard methods.
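The supervised stage described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a frozen encoder that maps a sequence to a fixed-length vector; the `embed` function below is a hypothetical stand-in (plain amino-acid composition) used only so the example runs without a pre-trained model. The family labels and sequences are invented toy data, and the nearest-centroid classifier is one simple choice for learning from a few labeled examples per family.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(sequence):
    """Hypothetical stand-in for a frozen pre-trained encoder.

    A real transfer-learning setup would return the representation of a
    self-supervised protein language model; here we only compute the
    amino-acid composition so the sketch has no external dependencies.
    """
    counts = Counter(sequence)
    total = max(len(sequence), 1)
    return [counts.get(aa, 0) / total for aa in AMINO_ACIDS]


def fit_centroids(labeled):
    """Supervised stage: average the embeddings of each family's few examples."""
    sums, sizes = {}, {}
    for sequence, family in labeled:
        vector = embed(sequence)
        if family not in sums:
            sums[family] = [0.0] * len(vector)
            sizes[family] = 0
        sums[family] = [s + v for s, v in zip(sums[family], vector)]
        sizes[family] += 1
    return {f: [s / sizes[f] for s in sums[f]] for f in sums}


def predict(centroids, sequence):
    """Assign the family whose centroid is closest in embedding space."""
    vector = embed(sequence)
    return min(centroids, key=lambda f: math.dist(vector, centroids[f]))


# Toy labeled set (invented): two examples per family.
train = [
    ("AAAAGAAAAL", "fam_A"),
    ("AAGAAAALAA", "fam_A"),
    ("KKKKEKKKRK", "fam_B"),
    ("KKEKKRKKKK", "fam_B"),
]
centroids = fit_centroids(train)
```

Calling `predict(centroids, some_sequence)` returns the label of the nearest family centroid. Because the encoder stays fixed, only the centroids depend on the labeled data, which is why this kind of setup can work when a family has just a few sequences.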
Collections
Articles (SINC(I))
Articles of the INST. DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Citation
Bugnon, Leandro Ariel; Fenoy, Luis Emilio; Edera, Alejandro; Raad, Jonathan; Stegmayer, Georgina; et al.; Transfer learning: The key to functionally annotate the protein universe; Cell Press; Patterns; 4; 2; 2-2023