Institutional Repository
CONICET Digital
Article

iRaPCA and SOMoC: Development and Validation of Web Applications for New Approaches for the Clustering of Small Molecules

Prada Gori, Denis Nihuel; Llanos, Manuel; Bellera, Carolina Leticia; Talevi, Alan; Alberca, Lucas Nicolás
Publication date: 06/2022
Publisher: American Chemical Society
Journal: Journal of Chemical Information and Modeling
ISSN: 1549-9596
Language: English
Resource type: Published article
Subject classification:
Medicinal Chemistry; Other Chemical Sciences; Information Sciences and Bioinformatics

Abstract

The clustering of small molecules implies the organization of a group of chemical structures into smaller subgroups with similar features. Clustering has important applications to sample chemical datasets or libraries in a representative manner (e.g., to choose, from a virtual screening hit list, a chemically diverse subset of compounds to be submitted to experimental confirmation, or to split datasets into representative training and validation sets when implementing machine learning models). Most strategies for clustering molecules are based on molecular fingerprints and hierarchical clustering algorithms. Here, two open-source in-house methodologies for clustering of small molecules are presented: iterative Random subspace Principal Component Analysis clustering (iRaPCA), an iterative approach based on feature bagging, dimensionality reduction, and K-means optimization; and Silhouette Optimized Molecular Clustering (SOMoC), which combines molecular fingerprints with the Uniform Manifold Approximation and Projection (UMAP) and Gaussian Mixture Model algorithm (GMM). In a benchmarking exercise, the performance of both clustering methods has been examined across 29 datasets containing between 100 and 5000 small molecules, comparing these results with those given by two other well-known clustering methods, Ward and Butina. iRaPCA and SOMoC consistently showed the best performance across these 29 datasets, both in terms of within-cluster and between-cluster distances. Both iRaPCA and SOMoC have been implemented as free Web Apps and standalone applications, to allow their use to a wide audience within the scientific community.
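The abstract compares methods by within-cluster and between-cluster distances, and SOMoC is named after silhouette optimization. As a rough illustration of that idea only, the sketch below computes a mean silhouette coefficient for two candidate partitions of a toy dataset. It is not the authors' implementation: the Tanimoto distance, the `toy_fps` bit sets, and the function names are assumptions made for this example, and the record does not state which exact metrics or fingerprints the paper used.

```python
from statistics import mean


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - Tanimoto similarity for fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def silhouette_score(fps, labels) -> float:
    """Mean silhouette coefficient over all items.

    For each item: a = mean distance to the other members of its own cluster,
    b = lowest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Singleton clusters get s = 0 by convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(tanimoto_distance(fps[i], fps[j]) for j in own)
        b = min(
            mean(tanimoto_distance(fps[i], fps[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical toy fingerprints: two obvious groups of three "molecules" each.
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
    frozenset({10, 12, 13}),
]

good_labels = [0, 0, 0, 1, 1, 1]  # partition that matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # partition that mixes the two groups

good_score = silhouette_score(toy_fps, good_labels)
bad_score = silhouette_score(toy_fps, bad_labels)
print(f"matching partition: {good_score:.3f}")
print(f"mixed partition:    {bad_score:.3f}")
```

A higher score means compounds sit closer to their own cluster than to the nearest other cluster, so a method that optimizes the silhouette would prefer the first partition over the second.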
Keywords: CLUSTERING, ALGORITHMS, SMALL MOLECULES
Associated files
Size: 3.873Mb
Format: PDF
License
info:eu-repo/semantics/restrictedAccess
Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)
Identifiers
URI: http://hdl.handle.net/11336/223388
URL: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00265
DOI: http://dx.doi.org/10.1021/acs.jcim.2c00265
Collections
Articles (CCT - LA PLATA)
Articles from CTRO.CIENTIFICO TECNOL.CONICET - LA PLATA
Citation
Prada Gori, Denis Nihuel; Llanos, Manuel; Bellera, Carolina Leticia; Talevi, Alan; Alberca, Lucas Nicolás; iRaPCA and SOMoC: Development and Validation of Web Applications for New Approaches for the Clustering of Small Molecules; American Chemical Society; Journal of Chemical Information and Modeling; 62; 12; 6-2022; 2987-2998

Related items

Showing related items by title, author, and subject.

  • Research data: Datasets used in the benchmarking exercise by SOMOC and iRAPCA
    Alberca, Lucas Nicolás; Bellera, Carolina Leticia; Prada Gori, Denis Nihuel; Llanos, Manuel; Talevi, Alan (2024)