Clustering gene expression data with a penalized graph-based metric

Baya, Ariel Emilio; Granitto, Pablo Miguel

doi:10.1186/1471-2105-12-2


dc.contributor.author

Baya, Ariel Emilio

dc.contributor.author

Granitto, Pablo Miguel

dc.date.available

2017-04-11T20:56:52Z

dc.date.issued

2011-01

dc.identifier.citation

Baya, Ariel Emilio; Granitto, Pablo Miguel; Clustering gene expression data with a penalized graph-based metric; BioMed Central; BMC Bioinformatics; 12; 2; 1-2011; 1-18

dc.identifier.issn

1471-2105

dc.identifier.uri

http://hdl.handle.net/11336/15184

dc.description.abstract

Background: The search for cluster structure in microarray datasets is a basic problem for the so-called “-omic sciences”. A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that do not form compact clouds of points but instead follow arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets. Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric. Conclusions: In all cases the PKNNG metric shows promising clustering results. The use of the PKNNG metric can improve the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
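The two-step procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not fix a connection scheme or a penalization function, so the sketch assumes one of each for illustration: components are joined through their closest pair of points, and the joining edge gets the weight d * exp(d / mu), where mu is the mean edge weight of the k-NN graph. Measuring the final distance as the shortest path through the graph is also an assumption, in line with the Isomap subject keyword. The names `pknng_distances` and `penalty_scale` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pknng_distances(points, k=2, penalty_scale=None):
    """Illustrative penalized k-NN-graph metric (assumes k >= 1, len(points) >= 2)."""
    n = len(points)
    d = [[euclidean(p, q) for q in points] for p in points]

    # Step 1: symmetric k-nearest-neighbor graph built with a low k.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            graph[i][j] = d[i][j]
            graph[j][i] = d[i][j]

    # Label the connected components (subgraphs) of the k-NN graph.
    label = [-1] * n
    n_comp = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if label[v] == -1:
                    label[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Step 2: connect subgraphs with highly penalized edges.
    # Assumed scheme: closest pair of points between every two components.
    # Assumed penalty: exponential in the edge length, scaled by mu.
    if n_comp > 1:
        weights = [w for i in range(n) for j, w in graph[i].items() if i < j]
        mu = penalty_scale or sum(weights) / len(weights)
        best = {}
        for i in range(n):
            for j in range(i + 1, n):
                if label[i] != label[j]:
                    key = tuple(sorted((label[i], label[j])))
                    if key not in best or d[i][j] < d[best[key][0]][best[key][1]]:
                        best[key] = (i, j)
        for i, j in best.values():
            w = d[i][j] * math.exp(d[i][j] / mu)
            graph[i][j] = w
            graph[j][i] = w

    # Pairwise distance = shortest path in the penalized graph (Dijkstra).
    dist = []
    for s in range(n):
        row = [math.inf] * n
        row[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > row[u]:
                continue
            for v, w in graph[u].items():
                if du + w < row[v]:
                    row[v] = du + w
                    heapq.heappush(heap, (row[v], v))
        dist.append(row)
    return dist


# Two groups of points on a line; k=1 leaves them as separate subgraphs.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
D = pknng_distances(points, k=1)
```

The returned matrix `D` is an ordinary pairwise distance matrix, so under these assumptions it could be passed to any distance-based method, for example a hierarchical clustering routine, in place of Euclidean distances.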

dc.format

application/pdf

dc.language.iso

eng

dc.publisher

BioMed Central

dc.rights

info:eu-repo/semantics/openAccess

dc.rights.uri

https://creativecommons.org/licenses/by-nc-sa/2.5/ar/

dc.subject

Clustering

dc.subject

Isomap

dc.subject

Gene Expression

dc.subject.classification

Information Sciences and Bioinformatics

dc.subject.classification

Computer and Information Sciences

dc.subject.classification

NATURAL AND EXACT SCIENCES

dc.title

Clustering gene expression data with a penalized graph-based metric

dc.type

info:eu-repo/semantics/article

dc.type

info:ar-repo/semantics/artículo

dc.type

info:eu-repo/semantics/publishedVersion

dc.date.updated

2017-04-11T17:42:08Z

dc.journal.volume

12

dc.journal.number

2

dc.journal.pagination

1-18

dc.journal.pais

United Kingdom

dc.journal.ciudad

London

dc.description.fil

Affiliation: Baya, Ariel Emilio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina

dc.description.fil

Affiliation: Granitto, Pablo Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina

dc.journal.title

BMC Bioinformatics

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1186/1471-2105-12-2

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/url/https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-2

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/url/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023695/

Associated files

Size: 831.6 KB

Format: PDF
