dc.contributor.author
Montemurro, Marcelo Alejandro  
dc.contributor.author
Zanette, Damian Horacio  
dc.date.available
2021-02-09T04:03:43Z  
dc.date.issued
2010-02  
dc.identifier.citation
Montemurro, Marcelo Alejandro; Zanette, Damian Horacio; Towards the quantification of the semantic information encoded in written language; World Scientific; Advances In Complex Systems; 13; 2; 2-2010; 135-153  
dc.identifier.issn
0219-5259  
dc.identifier.uri
http://hdl.handle.net/11336/125163  
dc.description.abstract
Written language is a complex communication signal capable of conveying information encoded in the form of ordered sequences of words. Beyond the local order ruled by grammar, semantic and thematic structures affect long-range patterns in word usage. Here, we show that a direct application of information theory quantifies the relationship between the statistical distribution of words and the semantic content of the text. We show that there is a characteristic scale, roughly a few thousand words, which establishes the typical size of the most informative segments in written language. Moreover, we find that the words whose contributions to the overall information are largest are the ones most closely associated with the main subjects and topics of the text. This scenario can be explained by a model of word usage that assumes that words are distributed along the text in domains of a characteristic size where their frequency is higher than elsewhere. Our conclusions are based on the analysis of a large database of written language, diverse in subjects and styles, and thus are likely to be applicable to general language sequences encoding complex information.  
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
World Scientific  
dc.rights
info:eu-repo/semantics/openAccess  
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/  
dc.subject
COMPLEX COMMUNICATION  
dc.subject
INFORMATION THEORY  
dc.subject
NATURAL LANGUAGE  
dc.subject.classification
Other Physical Sciences  
dc.subject.classification
Physical Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.title
Towards the quantification of the semantic information encoded in written language  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2021-01-27T19:16:53Z  
dc.journal.volume
13  
dc.journal.number
2  
dc.journal.pagination
135-153  
dc.journal.pais
Singapore  
dc.description.fil
Fil: Montemurro, Marcelo Alejandro. University of Manchester; United Kingdom  
dc.description.fil
Fil: Zanette, Damian Horacio. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina  
dc.journal.title
Advances In Complex Systems  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/arxiv/http://arxiv.org/abs/0907.1558  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://www.worldscientific.com/doi/abs/10.1142/S0219525910002530  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1142/S0219525910002530
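The abstract describes measuring how much information the spatial distribution of words carries about a text's structure, at a characteristic segment size. The following is a minimal sketch under a stated assumption: it measures, for each word, the frequency-weighted drop in the entropy of its occurrences across equal-size segments relative to a randomly shuffled copy of the same text. This is an illustrative estimator, not necessarily the exact one used in the paper, and the function names `segment_entropies` and `information_by_word` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def segment_entropies(tokens, n_segments):
    """Entropy (bits) of each word's occurrence distribution over n_segments equal parts.

    Hypothetical helper: trailing tokens that do not fill a whole segment are dropped.
    """
    seg_len = len(tokens) // n_segments
    counts = defaultdict(Counter)  # word -> Counter(segment index -> occurrences)
    for i, w in enumerate(tokens[: seg_len * n_segments]):
        counts[w][i // seg_len] += 1
    entropies = {}
    for w, per_seg in counts.items():
        total = sum(per_seg.values())
        entropies[w] = -sum((c / total) * math.log2(c / total) for c in per_seg.values())
    return entropies


def information_by_word(tokens, n_segments, n_shuffles=10, seed=0):
    """Assumed measure: p(w) * (mean shuffled entropy - observed entropy) for each word w.

    Words concentrated in a few segments (topical words) get large positive values;
    words spread evenly (function words) get values near or below zero.
    """
    rng = random.Random(seed)
    used = tokens[: (len(tokens) // n_segments) * n_segments]
    observed = segment_entropies(used, n_segments)
    # Average the entropies over several random shuffles of the same words.
    shuffled_avg = defaultdict(float)
    for _ in range(n_shuffles):
        perm = list(used)
        rng.shuffle(perm)
        for w, h in segment_entropies(perm, n_segments).items():
            shuffled_avg[w] += h / n_shuffles
    freq = Counter(used)
    n = len(used)
    return {w: (freq[w] / n) * (shuffled_avg[w] - observed[w]) for w in observed}
```

Summing the per-word values gives a total for one segmentation. Repeating this for several values of `n_segments`, and converting each to a segment size in words, is one way to look for the characteristic scale the abstract mentions. Ranking words by their value is one way to surface the topical words it describes.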