Clustering gene expression data with a penalized graph-based metric

Baya, Ariel Emilio; Granitto, Pablo Miguel

doi:10.1186/1471-2105-12-2

Publication date: 01/2011

Publisher: BioMed Central

Journal: BMC Bioinformatics

ISSN: 1471-2105

Language: English

Resource type: Published article

Subject classification:

Information Sciences and Bioinformatics

Abstract

Background: The search for cluster structure in microarray datasets is a basic problem for the so-called “-omic sciences”. A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that is not shaped as compact clouds of points but instead forms arbitrary shapes or paths embedded in a high-dimensional space, as could be the case for some gene expression datasets.

Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric.

Conclusions: In all cases the PKNNG metric shows promising clustering results. The use of the PKNNG metric can improve the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method; they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
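The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which connection scheme or penalization function is preferred, so the sketch assumes one simple choice for each: every pair of subgraphs is joined through its closest pair of points, and the penalized weight is d * exp(d / mu), with mu defaulting to the mean edge length of the k-nearest-neighbor graph. Taking the final distance as the shortest-path length on the graph is also an assumption, suggested by the Isomap keyword. The names `pknng_distances`, `k`, and `mu` are hypothetical.

```python
import heapq
import math
from itertools import combinations


def pknng_distances(points, k=3, mu=None):
    """Sketch of a penalized kNN-graph metric: returns an n x n distance matrix."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: symmetric k-nearest-neighbor graph built with a low k.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = d[i][j]

    # Label the connected subgraphs produced by step 1 (depth-first search).
    label = [-1] * n
    n_comp = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if label[v] == -1:
                    label[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Assumed penalty scale: mean length of the kNN edges (fallback 1.0).
    if mu is None:
        weights = [w for nbrs in graph for w in nbrs.values()]
        mu = (sum(weights) / len(weights) if weights else 0.0) or 1.0

    # Step 2 (assumed scheme): join each pair of subgraphs through their
    # closest pair of points, using the penalized weight d * exp(d / mu).
    for a, b in combinations(range(n_comp), 2):
        dist, i, j = min(
            (d[i][j], i, j)
            for i in range(n) if label[i] == a
            for j in range(n) if label[j] == b
        )
        graph[i][j] = graph[j][i] = dist * math.exp(dist / mu)

    # Assumed final metric: shortest-path length (Dijkstra from every node).
    result = []
    for source in range(n):
        best = [math.inf] * n
        best[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > best[u]:
                continue
            for v, w in graph[u].items():
                if du + w < best[v]:
                    best[v] = du + w
                    heapq.heappush(heap, (best[v], v))
        result.append(best)
    return result
```

The returned matrix can be passed to any clustering method that accepts precomputed pairwise distances, such as hierarchical clustering. The sketch uses brute-force neighbor search and is meant for small examples only. The exponential penalty can overflow when the gaps between subgraphs are very large relative to mu.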

Keywords: Clustering, Isomap, Gene Expression

Associated files

Size: 831.6Kb

Format: PDF

License

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)

Identifiers

URI: http://hdl.handle.net/11336/15184

DOI: http://dx.doi.org/10.1186/1471-2105-12-2

URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-2

URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023695/

Collections

Articles (CIFASIS)
Articles of CENTRO INT.FRANCO ARG.D/CS D/L/INF.Y SISTEM.

Citation

Baya, Ariel Emilio; Granitto, Pablo Miguel; Clustering gene expression data with a penalized graph-based metric; BioMed Central; BMC Bioinformatics; 12; 2; 1-2011; 1-18
