
Automatic classification of written descriptions by healthy adults: An overview of the application of natural language processing and machine learning techniques to clinical discourse analysis

Classificação Automática de Discurso Descritivo Escrito de Adultos Sadios: uma Visão Geral da Aplicação de Técnicas de Processamento de Línguas Naturais e Aprendizado de Máquina à Análise Clínica do Discurso

Discourse production is an important aspect of the evaluation of brain-injured individuals. We believe that studies comparing the performance of brain-injured subjects with that of healthy controls must use groups with comparable levels of education. This paper describes a pioneering application of machine learning methods to Brazilian Portuguese for clinical purposes, highlighting education as an important variable in the Brazilian scenario.

OBJECTIVE:

The aims were to describe how to: (i) develop machine learning classifiers that use features generated by natural language processing tools to assign descriptions produced by healthy individuals to classes based on the authors' years of education; and (ii) automatically identify the features that best distinguish the groups.

METHODS:

The approach proposed here extracts linguistic features automatically from the written descriptions with the aid of two Natural Language Processing tools: Coh-Metrix-Port and AIC. It also includes nine task-specific features: three new ones, two extracted manually, and four others (description time; type of scene described - simple or complex; presentation order - which type of picture was described first; and age). The study by Toledo (18) included 200 healthy Brazilians of both genders; the descriptions written by 144 of those subjects were used here.
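As a rough illustration of how the two kinds of features described above could be combined, the following minimal Python sketch merges tool-generated metrics with the four task-specific features that the text names explicitly. The class name, field names, metric names and values are all hypothetical; they are not the output format of Coh-Metrix-Port or AIC, and the remaining five task-specific features are left out because the text does not name them.

```python
from dataclasses import dataclass


@dataclass
class Description:
    """One written description and its features (all names are hypothetical)."""
    tool_metrics: dict          # metric name -> value, as an NLP tool might report
    description_time_s: float   # time taken to write the description
    complex_scene: bool         # True if the scene described was the complex one
    complex_first: bool         # True if the complex picture was presented first
    age: int


def to_vector(d, metric_names):
    """Flatten a description into a numeric feature vector.

    metric_names fixes the column order so every description
    yields a vector with the same layout.
    """
    tool_part = [float(d.tool_metrics[name]) for name in metric_names]
    task_part = [
        d.description_time_s,
        float(d.complex_scene),
        float(d.complex_first),
        float(d.age),
    ]
    return tool_part + task_part


example = Description(
    tool_metrics={"words_per_sentence": 12.5, "type_token_ratio": 0.6},
    description_time_s=95.0,
    complex_scene=True,
    complex_first=False,
    age=42,
)
vector = to_vector(example, ["type_token_ratio", "words_per_sentence"])
```

Fixing the column order in one place keeps the vectors comparable across subjects, which any downstream classifier or feature selector would require.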

RESULTS AND CONCLUSION:

A Support Vector Machine (SVM) with a radial basis function (RBF) kernel is the most recommended approach for the binary classification of our data, classifying three of the four initial classes. CfsSubsetEval (CFS) is a strong candidate to replace manual feature selection methods.
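The two techniques named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the configuration used in the study: rbf_kernel is the standard RBF similarity that an SVM with this kernel evaluates between two feature vectors, and cfs_merit is the subset merit heuristic that correlation-based feature selection maximizes, approximated here with absolute Pearson correlation. The function names, the gamma value and the toy data are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)


def cfs_merit(columns, labels):
    """CFS merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation, so relevant but mutually redundant
    features do not raise the score.
    """
    k = len(columns)
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data: two education classes coded 0 and 1 (values are invented).
labels = [0, 0, 1, 1]
relevant = [0, 0, 1, 1]      # tracks the class exactly
irrelevant = [0, 1, 0, 1]    # uncorrelated with the class
merit_relevant_only = cfs_merit([relevant], labels)
merit_with_noise = cfs_merit([relevant, irrelevant], labels)
```

On this toy data the subset containing only the relevant feature scores higher than the subset that also includes the irrelevant one, which is the behaviour that lets a CFS-style search discard uninformative features without manual inspection.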

Keywords: natural language processing; language tests; narratives; adults; educational status; age groups


Academia Brasileira de Neurologia, Departamento de Neurologia Cognitiva e Envelhecimento, R. Vergueiro, 1353 sl.1404 - Ed. Top Towers Offices, Torre Norte, São Paulo, SP, Brazil, CEP 04101-000, Tel.: +55 11 5084-9463 | +55 11 5083-3876
E-mail: revistadementia@abneuro.org.br | demneuropsy@uol.com.br