Item creation and judging: ChatGPT as designer and judge

Abstract

The purpose of this study was to evaluate how effective artificial intelligence (AI), represented by ChatGPT 4.0, is compared with human designers at creating items for a higher-education entrance exam in the area of Written Language. A mixed-methods approach was used, combining classical and contemporary methodologies in educational assessment, including expert judgment. ChatGPT and four human designers developed 84 items, using Anderson and Krathwohl's Taxonomy to establish each item's level of cognitive demand. The items were evaluated by two human judges and by ChatGPT, using a detailed rubric covering clarity, neutrality, format, curricular alignment, and writing. The results showed a high rate of acceptance without changes for both the ChatGPT-generated and the human-generated items, indicating good alignment with the evaluation standards. However, the two groups of items differed in how often the rubric called for minor and major changes. The study concludes that both AI and human designers can generate high-quality items, highlighting the potential of AI in the design of educational items.

Keywords:
Artificial Intelligence; Educational assessment; ChatGPT; Item design; Judging Process

Universidade Federal de Minas Gerais - UFMG Av. Antônio Carlos, 6627 - Pampulha, Cep: 31270-901, Belo Horizonte - Minas Gerais / Brasil, Tel: +55 (31) 3409-6009 - Belo Horizonte - MG - Brazil
E-mail: revistatextolivre@letras.ufmg.br