
dc.contributor.author
Kim, Yohan  
dc.contributor.author
Sidney, John  
dc.contributor.author
Buus, Søren  
dc.contributor.author
Sette, Alessandro  
dc.contributor.author
Nielsen, Morten  
dc.contributor.author
Peters, Bjoern  
dc.date.available
2017-06-12T15:50:21Z  
dc.date.issued
2014-07  
dc.identifier.citation
Kim, Yohan; Sidney, John; Buus, Søren; Sette, Alessandro; Nielsen, Morten; et al.; Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions; BioMed Central; BMC Bioinformatics; 15; 241; 7-2014; 1-9
dc.identifier.issn
1471-2105  
dc.identifier.uri
http://hdl.handle.net/11336/17977  
dc.description.abstract
BACKGROUND: It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set. RESULTS: We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates. CONCLUSION: It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
BioMed Central  
dc.rights
info:eu-repo/semantics/openAccess  
dc.rights.uri
https://creativecommons.org/licenses/by/2.5/ar/  
dc.subject
Benchmarking of MHC Class I Predictors
dc.subject
Epitope Prediction  
dc.subject
Sequence Similarity  
dc.subject
Cross-Validation  
dc.subject.classification
Other Computer and Information Sciences
dc.subject.classification
Computer and Information Sciences
dc.subject.classification
NATURAL AND EXACT SCIENCES
dc.title
Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2017-06-09T15:01:05Z  
dc.journal.volume
15  
dc.journal.number
241  
dc.journal.pagination
1-9  
dc.journal.pais
United Kingdom
dc.journal.ciudad
London
dc.description.fil
Fil: Kim, Yohan. La Jolla Institute for Allergy and Immunology; United States
dc.description.fil
Fil: Sidney, John. La Jolla Institute for Allergy and Immunology; United States
dc.description.fil
Fil: Buus, Søren. University of Copenhagen; Denmark
dc.description.fil
Fil: Sette, Alessandro. La Jolla Institute for Allergy and Immunology; United States
dc.description.fil
Fil: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark
dc.description.fil
Fil: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States
dc.journal.title
BMC Bioinformatics
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1186/1471-2105-15-241  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-241