Corpus-based Sentence Deletion and Split Decisions for Spanish Text Simplification

Please use this identifier to cite or link to this item: http://repositoriodigital.ipn.mx/handle/123456789/16636

Full metadata record

DC Field	Value	Language
dc.contributor.author	Štajner, Sanja	-
dc.contributor.author	Drndarevic, Biljana	-
dc.contributor.author	Saggion, Horacio	-
dc.date.accessioned	2013-08-08T19:13:45Z	-
dc.date.available	2013-08-08T19:13:45Z	-
dc.date.issued	2013-06-07	-
dc.identifier.citation	Revista Computación y Sistemas; Vol. 17 No.2	es
dc.identifier.issn	1405-5546	-
dc.identifier.uri	http://www.repositoriodigital.ipn.mx/handle/123456789/16636	-
dc.description.abstract	Abstract. This study addresses the automatic simplification of texts in Spanish in order to make them more accessible to people with cognitive disabilities. A corpus analysis of original and manually simplified news articles was undertaken in order to identify and quantify relevant operations to be implemented in a text simplification system. The articles were further compared at the sentence and text levels by means of automatic feature extraction and various machine learning classification algorithms, using three different groups of features (POS frequencies, syntactic information, and text complexity measures), with the aim of identifying features that help separate original documents from their simplified equivalents. Finally, it was investigated whether these features can be used to decide which simplification operations to carry out at the sentence level (split, delete, and reduce). Automatic classification of original sentences into those to be kept and those to be eliminated outperformed the classification previously conducted on the same corpus. Kept sentences were further classified into those to be split or significantly reduced in length and those to be left largely unchanged, with an overall F-measure of up to 0.92. Both experiments were conducted and compared on two different sets of features: all features, and the best subset returned by an attribute selection algorithm.	es
dc.description.sponsorship	Instituto Politécnico Nacional - Centro de Investigación en Computación (CIC).	es
dc.language.iso	en_US	es
dc.publisher	Revista Computación y Sistemas; Vol. 17 No.2	es
dc.relation.ispartofseries	Revista Computación y Sistemas;Vol. 17 No.2	-
dc.subject	Keywords. Spanish text simplification, supervised learning, sentence classification.	es
dc.title	Corpus-based Sentence Deletion and Split Decisions for Spanish Text Simplification	es
dc.title.alternative	Eliminación de frases y decisiones de división basadas en corpus para simplificación de textos en español	es
dc.type	Article	es
dc.description.especialidad	Investigación en Computación	es
dc.description.tipo	PDF	es
Appears in collections:	Journals
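The abstract reports its second experiment in terms of an "overall F-measure", but the record does not say how the per-class scores were combined. One common convention, used for instance in the "Weighted Avg." row of Weka's per-class evaluation summary, averages the per-class F1 scores weighted by the number of gold instances in each class. The Python sketch below assumes that convention. The function names, the class labels `unchanged` and `split_reduce`, and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_scores(gold, pred, label):
    """Return (precision, recall, F1) treating `label` as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0.0 when both are 0).
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def weighted_f_measure(gold, pred):
    """Average per-class F1, weighting each class by its number of gold instances.

    Assumption: "overall F-measure" means this class-weighted average.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)  # gold instances per class
    total = len(gold)
    return sum(per_class_scores(gold, pred, label)[2] * count / total
               for label, count in support.items())


# Hypothetical toy labels for the second task (split/reduce vs. unchanged);
# these are NOT data from the paper.
gold = ["unchanged"] * 4 + ["split_reduce"] * 2
pred = ["unchanged", "unchanged", "unchanged", "split_reduce",
        "split_reduce", "unchanged"]

print(per_class_scores(gold, pred, "unchanged"))     # (0.75, 0.75, 0.75)
print(per_class_scores(gold, pred, "split_reduce"))  # (0.5, 0.5, 0.5)
print(round(weighted_f_measure(gold, pred), 4))      # 0.6667
```

With these toy labels, an unweighted (macro) average of the same two per-class scores would give 0.625 rather than 0.6667. The choice of averaging therefore matters when one class, such as sentences left unchanged, is more frequent than the other.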

Files in this item:

File	Description	Size	Format
251_ART 14.pdf		2.23 MB	Adobe PDF

