dc.contributor.author
Fenoy, Luis Emilio  
dc.contributor.author
Edera, Alejandro  
dc.contributor.author
Stegmayer, Georgina  
dc.date.available
2023-10-03T13:26:02Z  
dc.date.issued
2022-06  
dc.identifier.citation
Fenoy, Luis Emilio; Edera, Alejandro; Stegmayer, Georgina; Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks; Oxford University Press; Briefings In Bioinformatics; 23; 4; 6-2022; 1-19  
dc.identifier.issn
1467-5463  
dc.identifier.uri
http://hdl.handle.net/11336/213945  
dc.description.abstract
A representation method is an algorithm that calculates numerical feature vectors for samples in a dataset. Such vectors, also known as embeddings, define a relatively low-dimensional space able to efficiently encode high-dimensional data. Very recently, many types of learned data representations based on machine learning have appeared and are being applied to several tasks in bioinformatics. In particular, protein representation learning methods integrate different types of protein information (sequence, domains, etc.), in supervised or unsupervised learning approaches, and provide embeddings of protein sequences that can be used for downstream tasks. One task of special interest is the automatic function prediction of the huge number of novel proteins that are now being discovered and remain totally uncharacterized. However, despite its importance, to date there is no fair benchmark study of the predictive performance of existing proposals on the same large set of proteins and for very concrete and common bioinformatics tasks. This lack of benchmark studies prevents the community from using adequate predictive methods to accelerate the functional characterization of proteins. In this study, we performed a detailed comparison of protein sequence representation learning methods, explaining each approach and comparing them with an experimental benchmark on several bioinformatics tasks: (i) determining protein sequence similarity in the embedding space; (ii) inferring protein domains; and (iii) predicting ontology-based protein functions. We examine the advantages and disadvantages of each representation approach in light of the benchmark results. We hope the results and the discussion of this study can help the community select the most adequate machine learning-based technique for protein representation according to the bioinformatics task at hand.  
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
Oxford University Press  
dc.rights
info:eu-repo/semantics/restrictedAccess  
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/  
dc.subject
AUTOMATIC FUNCTION PREDICTION  
dc.subject
EMBEDDING  
dc.subject
MACHINE LEARNING  
dc.subject
PROTEIN REPRESENTATION  
dc.subject
PROTEOMICS  
dc.subject
TRANSFER LEARNING  
dc.subject.classification
Information Sciences and Bioinformatics  
dc.subject.classification
Computer and Information Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.title
Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2023-08-07T14:57:02Z  
dc.journal.volume
23  
dc.journal.number
4  
dc.journal.pagination
1-19  
dc.journal.pais
United Kingdom  
dc.journal.ciudad
Oxford  
dc.description.fil
Fil: Fenoy, Luis Emilio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina  
dc.description.fil
Fil: Edera, Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina  
dc.description.fil
Fil: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina  
dc.journal.title
Briefings In Bioinformatics  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://academic.oup.com/bib/article-abstract/23/4/bbac232/6618242  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/https://doi.org/10.1093/bib/bbac232