Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Benotti, Luciana; Lau, Tessa; Villalba, Martin

doi:10.1145/2629632

Article

Publication date: 10/2014

Publisher: Association for Computing Machinery

Journal: ACM Transactions on Interactive Intelligent Systems

ISSN: 2160-6455

Language: English

Resource type: Published article

Subject classification:

Computer Science

Abstract

We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
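
The idea of casting interpretation as a choice among the actions possible in the current situation can be illustrated with a minimal sketch. This is not the authors' classifier: it uses language features only (no vision or behavior), the planning-based annotation phase is assumed to have already produced the (instruction, action) pairs, and every name in it (`tokenize`, `train`, `interpret`, the action labels, and the example instructions) is hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    stripped = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in stripped if w]


def train(pairs):
    """Count how often each word appears in instructions paired with each action."""
    counts = {}
    for instruction, action in pairs:
        counts.setdefault(action, Counter()).update(tokenize(instruction))
    return counts


def interpret(instruction, possible_actions, counts):
    """Pick, among the actions possible right now, the best match for the instruction."""
    words = tokenize(instruction)

    def score(action):
        word_counts = counts.get(action, Counter())
        total = sum(word_counts.values())
        if total == 0:
            return 0.0  # action never seen in training
        # Fraction of the action's training words that match the instruction.
        return sum(word_counts[w] for w in words) / total

    return max(possible_actions, key=score)


# Hypothetical annotated pairs, standing in for the automatically annotated corpus.
pairs = [
    ("press the red button", "press(red_button)"),
    ("push the red button on the wall", "press(red_button)"),
    ("go through the door", "move(door)"),
    ("walk to the door on your left", "move(door)"),
    ("pick up the trophy", "take(trophy)"),
]
counts = train(pairs)

# Only two actions are possible in this (hypothetical) situation.
print(interpret("hit the red button", ["press(red_button)", "move(door)"], counts))
# prints "press(red_button)"
```

The candidate set is passed in explicitly, so the classifier can never return an action that is unavailable in the situation, which is the constraint the abstract describes.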

Keywords: Natural Language Interpretation, Multi-Modal Understanding, Action Recognition, Situated Virtual Agent

Associated files

Size: 5.366Mb

Format: PDF

License

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)

Identifiers

URI: http://hdl.handle.net/11336/35034

URL: http://dl.acm.org/citation.cfm?id=2629632

DOI: http://dx.doi.org/10.1145/2629632

Collections

Articles (CCT - CORDOBA)
Articles of CTRO.CIENTIFICO TECNOL.CONICET - CORDOBA

Citation

Benotti, Luciana; Lau, Tessa; Villalba, Martin; Interpreting Natural Language Instructions Using Language, Vision, and Behavior; Association for Computing Machinery; ACM Transactions on Interactive Intelligent Systems; 4; 3; 10-2014
