
dc.contributor.author
Prada Gori, Denis Nihuel  
dc.contributor.author
Llanos, Manuel  
dc.contributor.author
Bellera, Carolina Leticia  
dc.contributor.author
Talevi, Alan  
dc.contributor.author
Alberca, Lucas Nicolás  
dc.date.available
2024-01-11T14:10:01Z  
dc.date.issued
2022-06  
dc.identifier.citation
Prada Gori, Denis Nihuel; Llanos, Manuel; Bellera, Carolina Leticia; Talevi, Alan; Alberca, Lucas Nicolás; iRaPCA and SOMoC: Development and Validation of Web Applications for New Approaches for the Clustering of Small Molecules; American Chemical Society; Journal of Chemical Information and Modeling; 62; 12; 6-2022; 2987-2998  
dc.identifier.issn
1549-9596  
dc.identifier.uri
http://hdl.handle.net/11336/223388  
dc.description.abstract
The clustering of small molecules involves organizing a group of chemical structures into smaller subgroups with similar features. Clustering has important applications for sampling chemical datasets or libraries in a representative manner (e.g., to choose, from a virtual screening hit list, a chemically diverse subset of compounds to be submitted for experimental confirmation, or to split datasets into representative training and validation sets when implementing machine learning models). Most strategies for clustering molecules are based on molecular fingerprints and hierarchical clustering algorithms. Here, two open-source in-house methodologies for clustering small molecules are presented: iterative Random subspace Principal Component Analysis clustering (iRaPCA), an iterative approach based on feature bagging, dimensionality reduction, and K-means optimization; and Silhouette Optimized Molecular Clustering (SOMoC), which combines molecular fingerprints with the Uniform Manifold Approximation and Projection (UMAP) and Gaussian Mixture Model (GMM) algorithms. In a benchmarking exercise, the performance of both clustering methods was examined across 29 datasets containing between 100 and 5000 small molecules, and the results were compared with those of two other well-known clustering methods, Ward and Butina. iRaPCA and SOMoC consistently showed the best performance across these 29 datasets, in terms of both within-cluster and between-cluster distances. Both iRaPCA and SOMoC have been implemented as free Web Apps and standalone applications to make them available to a wide audience within the scientific community.  
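The abstract names K-means optimization (iRaPCA) and silhouette optimization (SOMoC) as central ingredients. The sketch below is a minimal, generic illustration of one such ingredient: choosing the number of clusters by maximizing the mean silhouette coefficient over K-means solutions, in plain Python with no dependencies. It assumes the molecules have already been converted to low-dimensional numeric vectors (for example, after fingerprinting and dimensionality reduction). It does not reproduce iRaPCA's random-subspace iterations or SOMoC's UMAP and Gaussian Mixture Model steps, and the function names `kmeans`, `silhouette` and `best_k` are hypothetical.

```python
import math
import random

# Hypothetical helpers: a generic K-means + silhouette sketch,
# NOT the iRaPCA or SOMoC implementation. Inputs are assumed to be
# low-dimensional numeric vectors (e.g., after dimensionality reduction).


def _dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-means with random restarts; returns labels of the lowest-inertia run."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, math.inf
    for _ in range(n_init):
        centroids = rng.sample(points, k)  # k distinct random starting centroids
        labels = [0] * len(points)
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest centroid.
            new_labels = [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, new_labels) if lab == c]
                if members:
                    centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
            if new_labels == labels:
                break  # assignments stopped changing
            labels = new_labels
        inertia = sum(_dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(points, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst score
    scores = []
    for i, p in enumerate(points):
        own = [_dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean within-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(_dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_range):
    """Return (k, score) with the highest mean silhouette among candidate k values."""
    score, k = max((silhouette(points, kmeans(points, k)), k) for k in k_range)
    return k, score


# Usage on synthetic 2-D data: three well-separated Gaussian blobs.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centers for _ in range(20)]
k, score = best_k(data, range(2, 7))
print(k, round(score, 2))
```

Silhouette values near 1 indicate compact, well-separated clusters, so maximizing the mean silhouette rewards small within-cluster and large between-cluster distances, the two criteria used in the benchmark described in the abstract.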
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
American Chemical Society  
dc.rights
info:eu-repo/semantics/restrictedAccess  
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/  
dc.subject
CLUSTERING  
dc.subject
ALGORITHMS  
dc.subject
SMALL MOLECULES  
dc.subject.classification
Medicinal Chemistry  
dc.subject.classification
Basic Medicine  
dc.subject.classification
MEDICAL AND HEALTH SCIENCES  
dc.subject.classification
Other Chemical Sciences  
dc.subject.classification
Chemical Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.subject.classification
Information Sciences and Bioinformatics  
dc.subject.classification
Computer and Information Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.title
iRaPCA and SOMoC: Development and Validation of Web Applications for New Approaches for the Clustering of Small Molecules  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2024-01-10T12:06:34Z  
dc.journal.volume
62  
dc.journal.number
12  
dc.journal.pagination
2987-2998  
dc.journal.pais
United States  
dc.journal.ciudad
Washington, D.C.  
dc.description.fil
Fil: Prada Gori, Denis Nihuel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Laboratorio de Investigación y Desarrollo de Bioactivos; Argentina  
dc.description.fil
Fil: Llanos, Manuel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Laboratorio de Investigación y Desarrollo de Bioactivos; Argentina  
dc.description.fil
Fil: Bellera, Carolina Leticia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Laboratorio de Investigación y Desarrollo de Bioactivos; Argentina  
dc.description.fil
Fil: Talevi, Alan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Laboratorio de Investigación y Desarrollo de Bioactivos; Argentina  
dc.description.fil
Fil: Alberca, Lucas Nicolás. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Laboratorio de Investigación y Desarrollo de Bioactivos; Argentina  
dc.journal.title
Journal of Chemical Information and Modeling  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://pubs.acs.org/doi/10.1021/acs.jcim.2c00265  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1021/acs.jcim.2c00265