Institutional Repository
CONICET Digital
 
Article

bdc: A toolkit for standardizing, integrating and cleaning biodiversity data

Ribeiro, Bruno R.; Velazco, Santiago José Elías; Guidoni Martins, Karlo; Tessarolo, Geiziane; Jardim, Lucas; Bachman, Steven P.; Loyola, Rafael
Publication date: 04/2022
Publisher: John Wiley & Sons
Journal: Methods in Ecology and Evolution
e-ISSN: 2041-210X
Language: English
Resource type: Published article
Subject classification:
Other Biological Sciences

Abstract

The increase in online and openly accessible biodiversity databases provides a vast and invaluable resource to support research and policy. However, without scrutiny, errors in primary species occurrence data can lead to erroneous results and misleading information.

Here, we introduce the Biodiversity Data Cleaning (bdc), an R package to address quality issues and improve the fitness-for-use of biodiversity datasets. The bdc package brings together several aspects of biodiversity data cleaning in one place. It is organized in thematic modules related to different biodiversity dimensions, including (a) Merge datasets: standardization and integration of different datasets; (b) Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; (c) Taxonomy: cleaning, parsing and harmonization of scientific names from several taxonomic groups against locally stored taxonomic databases through the application of exact and partial matching algorithms; (d) Space: flagging of erroneous, suspect and low-precision geographic coordinates; and (e) Time: flagging and, whenever possible, correction of inconsistent collection dates. In addition, the package contains features to visualize, document and report data quality, which is essential for making data quality assessment transparent and reproducible. The modules illustrated, and the functions within them, were linked to form a proposed reproducible workflow that can also integrate functions from other R packages.

We demonstrated the bdc package's applicability in cleaning more than 30 million occurrence records for terrestrial plant species in Brazil. We found that around one-fifth of the original datasets met the standard quality requirements.

Compared to other available R packages, the main strengths of the bdc package are that it brings together available tools (and a series of new ones) to assess the quality of different dimensions of biodiversity data into a single and flexible toolkit. The functions can be applied to many taxonomic groups, datasets (including regional or local repositories), countries, or world-wide. We hope the bdc package can facilitate the data cleaning process and catalyse improvements to allow the wise and efficient use of primary biodiversity data.
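
The flag-then-filter idea behind the Pre-filter and Space modules can be illustrated with a minimal sketch. This is not the bdc API (bdc is an R package, and none of the names below come from it): the `Record` type, the two test functions, and the example records with their coordinates are all hypothetical, chosen only to show how each quality test can add a boolean flag without deleting anything, so that removal stays a separate and documented step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical occurrence record; field names are illustrative only."""
    scientific_name: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]


def flag_name_present(rec: Record) -> bool:
    """True when the record has a non-empty scientific name."""
    return bool(rec.scientific_name and rec.scientific_name.strip())


def flag_coordinates_in_range(rec: Record) -> bool:
    """True when both coordinates exist and fall within valid bounds."""
    if rec.latitude is None or rec.longitude is None:
        return False
    return -90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180


def flag_records(records):
    """Attach one boolean per test plus a summary flag; drop nothing."""
    flagged = []
    for rec in records:
        flags = {
            "name_present": flag_name_present(rec),
            "coordinates_in_range": flag_coordinates_in_range(rec),
        }
        flags["passes_all"] = all(flags.values())
        flagged.append((rec, flags))
    return flagged


# Made-up example records (coordinates are invented for illustration).
records = [
    Record("Araucaria angustifolia", -25.4, -49.3),
    Record("", -10.0, -50.0),               # missing name
    Record("Euterpe edulis", 120.0, -48.0),  # latitude out of range
]

flagged = flag_records(records)

# Filtering is a separate step, so the flags remain available for reporting.
clean = [rec for rec, flags in flagged if flags["passes_all"]]
```

Keeping the flags alongside the records, rather than filtering immediately, is what makes it possible to report how many records failed each test, in the spirit of the transparent and reproducible assessment the abstract describes.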
Keywords: big data, biodiversity, data cleaning, data quality, fitness-for-use, GBIF, plants, taxonomy
Associated files
Size: 1.277 MB
Format: PDF
License
info:eu-repo/semantics/restrictedAccess Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)
Identifiers
URI: http://hdl.handle.net/11336/213254
URL: https://onlinelibrary.wiley.com/doi/10.1111/2041-210X.13868
DOI: https://doi.org/10.1111/2041-210X.13868
Collections
Articles (IBS)
Articles of the INSTITUTO DE BIOLOGIA SUBTROPICAL
Citation
Ribeiro, Bruno R.; Velazco, Santiago José Elías; Guidoni Martins, Karlo; Tessarolo, Geiziane; Jardim, Lucas; et al.; bdc: A toolkit for standardizing, integrating and cleaning biodiversity data; John Wiley & Sons; Methods in Ecology and Evolution; 13; 2; 4-2022; 1421-1428