Repositorio Institucional
Repositorio Institucional
CONICET Digital
  • Inicio
  • EXPLORAR
    • AUTORES
    • DISCIPLINAS
    • COMUNIDADES
  • Estadísticas
  • Novedades
    • Noticias
    • Boletines
  • Ayuda
    • General
    • Datos de investigación
  • Acerca de
    • CONICET Digital
    • Equipo
    • Red Federal
  • Contacto
JavaScript is disabled for your browser. Some features of this site may not work without it.
  • INFORMACIÓN GENERAL
  • RESUMEN
  • ESTADISTICAS
 
Artículo

A standardized reference data set for vertebrate taxon name resolution

Zermoglio, Paula FlorenciaIcon ; Guralnick, Robert P.; Wieczorek, John R.
Fecha de publicación: 01/2016
Editorial: Public Library of Science
Revista: Plos One
e-ISSN: 1932-6203
Idioma: Inglés
Tipo de recurso: Artículo publicado
Clasificación temática:
Otras Ciencias Biológicas

Resumen

Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.
Palabras clave: Biocollections , Data Curation , Fitness for Use , Gold Standard , Taxon Names , Validation , Vertebrates , Vertnet
Ver el registro completo
 
Archivos asociados
Thumbnail
 
Tamaño: 1.354Mb
Formato: PDF
.
Descargar
Licencia
info:eu-repo/semantics/openAccess Excepto donde se diga explícitamente, este item se publica bajo la siguiente descripción: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)
Identificadores
URI: http://hdl.handle.net/11336/60277
DOI: http://dx.doi.org/10.1371/journal.pone.0146894
URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0146894
Colecciones
Articulos(IEGEBA)
Articulos de INSTITUTO DE ECOLOGIA, GENETICA Y EVOLUCION DE BS. AS
Citación
Zermoglio, Paula Florencia; Guralnick, Robert P.; Wieczorek, John R.; A standardized reference data set for vertebrate taxon name resolution; Public Library of Science; Plos One; 11; 1; 1-2016; 1-20; e0146894
Compartir
Altmétricas
 

Enviar por e-mail
Separar cada destinatario (hasta 5) con punto y coma.
  • Facebook
  • X Conicet Digital
  • Instagram
  • YouTube
  • Sound Cloud
  • LinkedIn

Los contenidos del CONICET están licenciados bajo Creative Commons Reconocimiento 2.5 Argentina License

https://www.conicet.gov.ar/ - CONICET

Inicio

Explorar

  • Autores
  • Disciplinas
  • Comunidades

Estadísticas

Novedades

  • Noticias
  • Boletines

Ayuda

Acerca de

  • CONICET Digital
  • Equipo
  • Red Federal

Contacto

Godoy Cruz 2290 (C1425FQB) CABA – República Argentina – Tel: +5411 4899-5400 repositorio@conicet.gov.ar
TÉRMINOS Y CONDICIONES