Acessibilidade / Reportar erro

Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection

Patients with post-COVID-19 syndrome benefit from health promotion programs. Their rapid identification is important for the cost-effective use of these programs. Traditional identification techniques perform poorly especially in pandemics. A descriptive observational study was carried out using 105,008 prior authorizations paid by a private health care provider with the application of an unsupervised natural language processing method by topic modeling to identify patients suspected of being infected by COVID-19. A total of 6 models were generated: 3 using the BERTopic algorithm and 3 Word2Vec models. The BERTopic model automatically creates disease groups. In the Word2Vec model, manual analysis of the first 100 cases of each topic was necessary to define the topics related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic without word treatment selected more severe patients - average cost per prior authorizations paid of BRL 10,206 and total expenditure of BRL 20.3 million (5.4%) in 1,987 prior authorizations (1.9%). It had 70% accuracy compared to human analysis and 20% of cases with potential interest, all subject to analysis for inclusion in a health promotion program. It had an important loss of cases when compared to the traditional research model with structured language and identified other groups of diseases - orthopedic, mental and cancer. The BERTopic model served as an exploratory method to be used in case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the treatment of health information by machine learning.

Keywords:
COVID-19; Natural Language Processing; Health Care; Selection Criteria; Proprietary Health Facilities


Escola Nacional de Saúde Pública Sergio Arouca, Fundação Oswaldo Cruz Rua Leopoldo Bulhões, 1480 , 21041-210 Rio de Janeiro RJ Brazil, Tel.:+55 21 2598-2511, Fax: +55 21 2598-2737 / +55 21 2598-2514 - Rio de Janeiro - RJ - Brazil
E-mail: cadernos@ensp.fiocruz.br