dc.contributor.author
Alemany, Laura Alonso
dc.contributor.author
Benotti, Luciana
dc.contributor.author
Maina, Hernán Javier
dc.contributor.author
Lucía Gonzalez
dc.contributor.author
Rajngewerc, Mariela
dc.contributor.author
Martínez, Lautaro
dc.contributor.author
Sánchez, Jorge
dc.contributor.author
Schilman, Mauro
dc.contributor.author
Ivetta, Guido
dc.contributor.author
Halvorsen, Alexia
dc.contributor.author
Mata Rojo, Amanda
dc.contributor.author
Bordon, Matías
dc.contributor.author
Busaniche, Beatriz
dc.date.available
2023-12-01T14:41:13Z
dc.date.issued
2023-03
dc.identifier.citation
Alemany, Laura Alonso; Benotti, Luciana; Maina, Hernán Javier; Lucía Gonzalez; Rajngewerc, Mariela; et al.; A methodology to characterize bias and harmful stereotypes in natural language processing in Latin America; Cornell University; arXiv; 3-2023; 1-24
dc.identifier.issn
2331-8422
dc.identifier.uri
http://hdl.handle.net/11336/218993
dc.description.abstract
Automated decision-making systems, especially those based on natural language processing, are pervasive in our lives. They are not only behind the internet search engines we use daily, but also take on more critical roles: selecting candidates for a job, determining suspects of a crime, diagnosing autism, and more. Such automated systems make errors, which may be harmful in many ways, be it because of the severity of the consequences (as in health issues) or because of the sheer number of people they affect. When errors made by an automated system affect one population more than others, we call the system biased.
Most modern natural language technologies are based on artifacts obtained from enormous volumes of text using machine learning, namely language models and word embeddings. Since they are created by applying subsymbolic machine learning, mostly artificial neural networks, they are opaque and practically uninterpretable by direct inspection, which makes them very difficult to audit.
In this paper we present a methodology that spells out how social scientists, domain experts, and machine learning experts can collaboratively explore biases and harmful stereotypes in word embeddings and large language models. Our methodology is based on the following principles:
1. focus on the linguistic manifestations of discrimination in word embeddings and language models, not on the mathematical properties of the models
2. reduce the technical barrier for discrimination experts
3. characterize through a qualitative exploratory process in addition to a metric-based approach
4. address mitigation as part of the training process, not as an afterthought.
dc.format
application/pdf
dc.language.iso
eng
dc.publisher
Cornell University
dc.rights
info:eu-repo/semantics/openAccess
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.subject
Natural Language Processing
dc.subject
Language models
dc.subject
Bias
dc.subject
Stereotypes and Discrimination
dc.subject.classification
Other Computer and Information Sciences
dc.subject.classification
Computer and Information Sciences
dc.subject.classification
NATURAL AND EXACT SCIENCES
dc.title
A methodology to characterize bias and harmful stereotypes in natural language processing in Latin America
dc.type
info:eu-repo/semantics/article
dc.type
info:ar-repo/semantics/artículo
dc.type
info:eu-repo/semantics/publishedVersion
dc.date.updated
2023-11-28T14:57:11Z
dc.journal.pagination
1-24
dc.journal.pais
United States
dc.journal.ciudad
Cornell
dc.description.fil
Fil: Alemany, Laura Alonso. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Fundación Via Libre; Argentina
dc.description.fil
Fil: Benotti, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fundación Via Libre; Argentina. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física. Sección Física; Argentina
dc.description.fil
Fil: Maina, Hernán Javier. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fundación Via Libre; Argentina. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
dc.description.fil
Fil: Lucía Gonzalez. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Fundación Via Libre; Argentina
dc.description.fil
Fil: Rajngewerc, Mariela. Fundación Via Libre; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física. Sección Ciencias de la Computación; Argentina
dc.description.fil
Fil: Martínez, Lautaro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Fundación Via Libre; Argentina
dc.description.fil
Fil: Sánchez, Jorge. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
dc.description.fil
Fil: Schilman, Mauro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
dc.description.fil
Fil: Ivetta, Guido. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
dc.description.fil
Fil: Halvorsen, Alexia. Fundación Via Libre; Argentina
dc.description.fil
Fil: Mata Rojo, Amanda. Fundación Via Libre; Argentina
dc.description.fil
Fil: Bordon, Matías. Fundación Via Libre; Argentina. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
dc.description.fil
Fil: Busaniche, Beatriz. Fundación Via Libre; Argentina
dc.journal.title
arXiv
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://arxiv.org/abs/2207.06591v3
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/https://doi.org/10.48550/arXiv.2207.06591