ABSTRACT
Introduction: Artificial Intelligence (AI) is a tool that is already part of our reality, and this presents an opportunity to understand how it can be useful for interacting with patients and for providing valuable information about orthodontics.
Objective: This study evaluated the ability of ChatGPT to provide accurate, good-quality answers to questions on clear aligners, temporary anchorage devices, and digital imaging in orthodontics.
Methods: Forty-five questions and answers were generated by ChatGPT 4.0 and analyzed separately by five orthodontists. The evaluators independently rated the quality of the information provided on a Likert scale, in which higher scores indicated greater quality of information (1 = very poor; 2 = poor; 3 = acceptable; 4 = good; 5 = very good). The Kruskal-Wallis H test (p < 0.05) and post-hoc pairwise comparisons with the Bonferroni correction were performed.
Results: Of the 225 evaluations from the five evaluators, 11 (4.9%) were rated as very poor, 4 (1.8%) as poor, and 15 (6.7%) as acceptable. The majority were rated as good [34 (15.1%)] and very good [161 (71.6%)]. Regarding the evaluators’ scores, only slight agreement was observed, with a Fleiss’ kappa of 0.004.
Conclusions: ChatGPT has proven effective in providing quality answers related to clear aligners, temporary anchorage devices, and digital imaging within the context of orthodontics.
Keywords: ChatGPT; Artificial intelligence; Clear aligner; Temporary anchorage device; Digital image
INTRODUCTION
Artificial Intelligence (AI) is the ability of digital computers or computer-controlled robots to perform tasks typically associated with intelligent beings.1 It has recently drawn much attention due to new developments in machine learning, in which multiple layers of artificial neural networks are trained on big data,2 an approach known as deep learning.3
According to its own description, ChatGPT is “a large language model created by OpenAI. I am designed to understand and generate natural language text, and I have been trained on a massive amount of data to help answer questions and provide information on a wide variety of topics. My training data includes text from books, articles, websites, and other sources, and I am constantly learning and updating my knowledge base to improve my responses. I can assist with tasks such as language translation, summarization, question-answering, and more” (https://chat.openai.com/chat).
Researchers have shown that ChatGPT can pass medical licensing tests and is useful in the peer review process.4 However, the excitement surrounding it has been matched by several ethical issues that could, and perhaps should, restrict its use.5
Regarding orthodontic treatment planning, different orthodontists may propose distinct plans for the same case. Careful treatment planning must be carried out before treatment begins.6 Treatment planning is an intricate process that relies heavily on the orthodontist’s subjective judgment, as it requires thorough and deliberate evaluation of numerous variables.7 Studies have demonstrated that the level of agreement between orthodontists reviewing identical sets of case records is not very high.8-10
The use of AI for orthodontic diagnosis and treatment planning has shown good results.11 Automated systems have performed remarkably well, with accuracy and precision comparable to those of trained examiners,12 and AI-assisted software has shown good agreement with AutoCEPH© and manual tracing for all cephalometric measurements.13
The accuracy, reliability, and content validity of orthodontic information provided by ChatGPT have not been previously evaluated. This is especially significant considering the queries that might be posed by dental practitioners, orthodontists, and patients. Consequently, this study offers an in-depth content analysis of how ChatGPT handles providing information on clear aligners, temporary anchorage devices, and digital imaging in orthodontics.
MATERIAL AND METHODS
A comprehensive content analysis was performed on a series of 45 questions related to three topics in orthodontics (clear aligners, temporary anchorage devices, and digital imaging). The topics were selected as the most innovative ones, based on a survey of the Journal of Clinical Orthodontics website (https://www.jco-online.com, accessed July 23rd, 2023).
ChatGPT-4 (OpenAI, San Francisco, CA: OpenAI LP) was asked to generate the fifteen most frequent questions about each of three topics: 1) clear aligners; 2) temporary anchorage devices; and 3) digital imaging in orthodontics. The AI was then asked to answer the same self-generated questions.
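The procedure above was carried out in the ChatGPT web interface. As a point of reference only, a minimal sketch of how the same two-step procedure could be scripted against the OpenAI API is shown below; it assumes the `openai` Python package (v1.0 or later), and the model name and prompt wording are illustrative, not the exact ones used in this study.

```python
# Sketch of the two-step prompting procedure, assuming the openai package >= 1.0.
# The study used the ChatGPT web interface; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

TOPICS = ["clear aligners", "temporary anchorage devices",
          "digital imaging in orthodontics"]

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for topic in TOPICS:
    # Step 1: the model generates its own fifteen most frequent questions.
    questions = ask(f"What are the fifteen most frequent questions about {topic}?")
    # Step 2: the model answers its self-generated questions.
    answers = ask(f"Answer each of the following questions:\n{questions}")
    print(topic, answers, sep="\n")
```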
All 45 questions and answers were obtained and saved (Table 1). These answers were analyzed by three researchers with over 20 years of clinical and academic experience in orthodontics and two PhD students in orthodontics. The evaluators independently rated the quality of the answers on a Likert scale, in which higher scores indicated greater quality of information (1 = very poor; 2 = poor; 3 = acceptable; 4 = good; 5 = very good). Scoring was based on the combination of the best available scientific evidence and clinical expertise. Before the scoring process began, a meeting was convened to establish a shared understanding of the scoring system among the evaluators.
The analysis was grounded in the principles of a crowd score strategy,14 given that the outcomes under examination (answers from ChatGPT) lack an established ‘ground truth’, and the evaluation of their quality is fundamentally subjective. We focused on the median scores given by the evaluators for each answer.15
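As an illustration of this aggregation, the sketch below computes the per-answer median and interquartile range from a 45 × 5 matrix of Likert ratings (answers × evaluators); the scores here are random placeholders, not the study’s data.

```python
# Crowd-score aggregation: per-answer median and IQR over five evaluators.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(45, 5))  # placeholder Likert ratings (1-5)

medians = np.median(scores, axis=1)               # collective agreement per answer
q1, q3 = np.percentile(scores, [25, 75], axis=1)
iqr = q3 - q1                                     # divergence among evaluators
print(f"answer 1: median = {medians[0]}, IQR = {iqr[0]}")
```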
STATISTICAL ANALYSIS
The evaluators’ scores were tabulated in Microsoft Excel and analyzed in the Statistical Package for Social Sciences v. 25 (SPSS; SPSS Inc., Chicago, IL). For each question, the median, interquartile range (IQR), and full range of scores were determined. Evaluators were given a random identifier, and Fleiss’ kappa was used to evaluate the consistency of scores among them. The reliability of the questionnaire (comprising the questions) was gauged using Cronbach’s alpha. The Kruskal-Wallis H test was applied to discern differences in scores among the evaluators. All statistical analyses were performed with a significance level of p < 0.05, and when conducting post-hoc pairwise comparisons, the Bonferroni correction was applied to control for multiple testing.
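A minimal Python sketch of this pipeline is given below, again using a placeholder 45 × 5 score matrix rather than the study’s data. Fleiss’ kappa, Cronbach’s alpha, and the Kruskal-Wallis H test follow the text; since the exact SPSS post-hoc procedure is not specified beyond the Bonferroni correction, Bonferroni-corrected pairwise Mann-Whitney U tests are used here as a common substitute.

```python
# Sketch of the statistical analysis in Python rather than SPSS.
from itertools import combinations
import numpy as np
from scipy.stats import kruskal, mannwhitneyu
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(45, 5))    # placeholder: 45 answers x 5 evaluators

# Inter-rater agreement: Fleiss' kappa across the five evaluators.
table, _ = aggregate_raters(scores)          # counts of each Likert level per answer
kappa = fleiss_kappa(table, method="fleiss")

# Reliability of the questionnaire: Cronbach's alpha (standard formula).
def cronbach_alpha(data: np.ndarray) -> float:
    """Rows = observations, columns = items."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

alpha = cronbach_alpha(scores.T)             # questions treated as items

# Differences among evaluators: Kruskal-Wallis H, then Bonferroni-corrected
# pairwise Mann-Whitney U tests (10 comparisons for 5 evaluators).
groups = [scores[:, j] for j in range(scores.shape[1])]
h_stat, p_value = kruskal(*groups)
pairs = list(combinations(range(scores.shape[1]), 2))
for i, j in pairs:
    _, p = mannwhitneyu(scores[:, i], scores[:, j])
    p_adj = min(p * len(pairs), 1.0)         # Bonferroni correction
    print(f"evaluator {i + 1} vs {j + 1}: adjusted p = {p_adj:.3f}")

print(f"Fleiss' kappa = {kappa:.3f}, Cronbach's alpha = {alpha:.3f}, "
      f"Kruskal-Wallis p = {p_value:.3f}")
```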
RESULTS
The questions and answers generated by ChatGPT, along with the median, interquartile range, and full range of the evaluators’ scores, are presented in Table 1. In general, the evaluators rated ChatGPT as providing good information on the evaluated topics (clear aligners = 4.33 ± 1.189; TADs = 4.57 ± 1.015; digital imaging = 4.49 ± 0.89). The median score was 5.0 for all three main topics, with no statistical difference among them (p > 0.05) (Table 1).
Table 2 presents descriptive data regarding clear aligners, TADs, and digital imaging. The clear aligners topic showed greater variability in the scores.
Table 3 presents the distribution of the evaluators’ scores, ranging from “very poor” to “very good.” The highest percentage of scores in the total dataset fell in the “Very Good” category (71.6%), followed by “Good” (15.1%) and “Acceptable” (6.7%). The “Poor” and “Very Poor” categories had the lowest percentages: 1.8% and 4.9%, respectively.
These results indicate that the majority of the evaluations fell in the higher quality categories (“Good” and “Very Good”), reflecting a generally positive assessment of the answers. The low percentages in the “Poor” and “Very Poor” categories suggest that the quality of the answers was perceived as quite satisfactory by the evaluators.
Regarding the evaluators’ scores, only ‘slight agreement’ was observed,16 with a combined Fleiss’ kappa of 0.004. The Kruskal-Wallis test indicated a significant variation in scores among the evaluators (p < 0.001). A detailed pairwise comparison of scores can be found in Table 4.
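The ‘slight agreement’ label follows the verbal scale of Landis and Koch,16 in which kappa values from 0.00 to 0.20 denote slight agreement. The hypothetical helper below simply maps a kappa value onto that scale.

```python
# Hypothetical helper mapping a kappa value to the Landis & Koch (1977) scale.
def landis_koch(kappa: float) -> str:
    bands = [(0.00, "poor"), (0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

print(landis_koch(0.004))  # -> 'slight'
```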
DISCUSSION
This article examined the contemporary applications of ChatGPT, an advanced AI, focusing on the accuracy and efficiency of the answers generated for questions on clear aligners, temporary anchorage devices and digital imaging in orthodontics. In the present study, the chatbot promptly generated answers to all the questions in a matter of seconds. These answers were then compared with those from experienced orthodontists. This study potentially stands as one of the pioneering cross-sectional assessments examining the precision of ChatGPT in addressing questions on contemporary orthodontics.
Even though AI is not a brand-new technology, ChatGPT has become popular and mainstream. AI systems have been widely used in a variety of disciplines, most notably biological and medical diagnostics,16 and models may be trained on clinical data sets and used for a variety of tasks in dental and medical diagnostics.17
Regarding AI used for diagnosis and treatment planning in orthodontics, one study reported that AI performed remarkably well, matching the trained examiners’ accuracy and precision.12 Another study showed good reliability when an AI analyzed cephalometric points and measurements,18 and a third reported success rates of 93% for the diagnosis of extraction vs. nonextraction and 84% for the detailed diagnosis of extraction patterns.19 However, none of these studies evaluated a conversational AI such as ChatGPT. Although in this study the great majority of the answers were considered very good by the five evaluators, the answers that received the lowest scores were: “Q: Follow-up appointments: How often will I need to see the orthodontist during my treatment? / A: Generally, you’ll have a check-up every 6 to 8 weeks to monitor your progress and receive new sets of aligners.” (median: 2.0); “Q: Treatment cost: How much does orthodontic treatment with clear aligners cost? Will insurance cover the treatment? / A: The cost varies based on the complexity of the case and geographic location. Insurance may cover part of the cost. Always check with your insurance provider and orthodontist for details.” (median: 3.0); and “Q: Procedure: What is the process of taking digital images? Is it painful or uncomfortable? / A: The process is similar to taking traditional X-rays but quicker and more comfortable. A sensor is placed in the mouth, and the image is captured and instantly projected onto a computer screen.” (median: 3.0).
The answers provided by ChatGPT are blunt: an orthodontist with poor training may take all of them as true, which may lead to some level of misinformation, whereas a well-trained orthodontist can draw on clinical experience to optimize treatment for esthetics, function, and stability. With text generated by ChatGPT alone, things become more complicated: if experts can no longer determine what is true, we lose the intermediary we desperately need to guide us through complicated topics.4 Large language models excel in knowledge-based examinations, but often fall short when addressing medical or dental subjects and the literature. To perform optimally, these AI-driven models require training on high-quality datasets; their current training on potentially biased datasets might explain the inaccuracies observed when they respond to specific research-related inquiries.
If AI is not a brand-new concept, why has this new development become so mainstream? The idea of a textbot that writes extremely well about almost everything is appealing, and human curiosity may be the answer to this question. Even though ChatGPT gave good, strong answers on the three subjects studied, it is an AI that learns from a sizable text dataset compiled from books, articles, and webpages, whereas orthodontics requires more precise answers. ChatGPT’s training data includes both science and the false information found in advertisements, social media, and websites.
ChatGPT is an AI chatbot that condenses information and produces intelligent-sounding, plausible text, but it needs to be more accurate in orthodontics.20 The answers obtained in this study were broad, realistic, and comprehensive, demonstrating knowledge of the subject without delving into more specific details. Moreover, because ChatGPT is not a search engine, if the source information is inaccurate, the answer will also be inaccurate.20 While ChatGPT generally provides accurate answers about orthodontics, it faces limitations such as: (1) an inability to critically review or analyze findings from the scientific literature; (2) a knowledge base that only extends up to 2021 and is not updated; (3) occasional misinterpretation of medical terminology; (4) an inability to distinguish between reputable and predatory journal sources; and (5) concerns regarding scientific precision, potential biases, and the risk of disseminating misinformation to users.21,22
For now, AI should be considered an auxiliary tool in treatment planning, not a substitute for the orthodontist. The use of ChatGPT can be considered inevitable, and it is up to orthodontists to seek clinical and scientific evidence. Although there is widespread interest in the use of ChatGPT, it was not extensively trained on biomedical data, nor were its responses designed for clinical use in particular. It is likely that patients and clinicians will turn to ChatGPT for assistance in interpreting laboratory data or understanding how to utilize clinical laboratory services.1 While ChatGPT is seen as a potentially beneficial tool in healthcare settings, concerns about its accuracy, reliability, and medicolegal implications persist.15 It is important not only to understand whether AI tools are reliable in the clinical environment, but also to analyze their accuracy not just in one phase, but throughout the clinical workflow.23
ChatGPT’s orthodontic answers displayed good overall accuracy: out of 225 evaluations, the majority were rated very good (71.6%) or good (15.1%). However, its effectiveness is limited by an inability to critically analyze literature findings, a knowledge base restricted to 2021 (version 4.0), an inability to distinguish between predatory and indexed journals, and a lack of scientific precision and reliability. In addition, the divergences in evaluator agreement and median scores highlight both the possible inaccuracy of ChatGPT answers and the inherent subjectivity of the outcomes. To mitigate this subjectivity, given the absence of an absolute standard, a crowd score approach was used based on earlier studies,14 involving multiple evaluators and aggregating median scores. The median score reflects the evaluators’ collective agreement, while the interquartile range indicates points of divergence. Although the evaluators’ varied expertise naturally produced diverse opinions, their collective assessment reflects a general consensus within orthodontics.15
ChatGPT has shown remarkable expertise in the assessed orthodontic topics, but differences in individual evaluations remind us of the complexity and subjectivity inherent in any assessment process. Patients and orthodontists need to be aware of the constraints and ethical issues surrounding ChatGPT, and should consistently verify information using reliable sources. Before incorporating these AI models into the healthcare system, efforts must be directed towards enhancing their reliability.
CONCLUSION
The evaluation of ChatGPT’s knowledge and answers regarding the orthodontic topics of clear aligners, TADs and digital imaging suggests a generally favorable reception by the evaluators: with the majority of the scores grouped in the “Very Good” and “Good” categories, the data highlights ChatGPT’s ability to provide high-quality information in these areas. Median scores for all three main topics consistently reflected this positivity.
REFERENCES
- 1 Zhou N, Siegel ZD, Zarecor S, Lee N, Campbell DA, Andorf CM, et al. Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning. PLoS Comput Biol. 2018 Jul;14(7):e1006337.
- 2 Lee JG, Jun S, Cho YW, Lee H, Kim GB, Seo JB, et al. Deep learning in medical imaging: general overview. Korean J Radiol. 2017;18(4):570-84.
- 3 Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019 Jan;25(1):44-56.
- 4 Else H. Abstracts written by ChatGPT fool scientists. Nature. 2023 Jan;613(7944):423.
- 5 The Lancet Digital Health. ChatGPT: friend or foe? Lancet Digit Health. 2023 Mar;5(3):e102.
- 6 Proffit WR, Fields HW, Larson B, Sarver DM. Contemporary orthodontics. St. Louis: Elsevier Health Sciences; 2018.
- 7 Lee R, MacFarlane T, O'Brien K. Consistency of orthodontic treatment planning decisions. Clin Orthod Res. 1999 May;2(2):79-84.
- 8 Ribarevski R, Vig P, Vig KD, Weyant R, O'Brien K. Consistency of orthodontic extraction decisions. Eur J Orthod. 1996 Feb;18(1):77-80.
- 9 Stephens CD, Drage KJ, Richmond S, Shaw WC, Roberts CT, Andrews M. Consultant opinion on orthodontic treatment plans used by dental practitioners: a pilot study. J Dent. 1993 Dec;21(6):355-9.
- 10 Han UK, Vig KW, Weintraub JA, Vig PS, Kowalski CJ. Consistency of orthodontic treatment decisions relative to diagnostic records. Am J Orthod Dentofacial Orthop. 1991 Sep;100(3):212-9.
- 11 Li P, Kong D, Tang T, Su D, Yang P, Wang H, et al. Orthodontic treatment planning based on artificial neural networks. Sci Rep. 2019 Feb;9:2037.
- 12 Khanagar SB, Al-Ehaideb A, Vishwanathaiah S, Maganur PC, Patil S, Naik S, et al. Scope and performance of artificial intelligence technology in orthodontic diagnosis, treatment planning, and clinical decision-making: a systematic review. J Dent Sci. 2021 Jan;16(1):482-92.
- 13 Prince STT, Srinivasan D, Duraisamy S, Kannan R, Rajaram K. Reproducibility of linear and angular cephalometric measurements obtained by an artificial-intelligence assisted software (WebCeph) in comparison with digital software (AutoCEPH) and manual tracing method. Dental Press J Orthod. 2023 Apr;28(1):e2321214.
- 14 Dumitrache A, Aroyo L, Welty C. Crowdsourcing ground truth for medical relation extraction. ACM Trans Interact Intell Syst. 2018;8(2):1-20.
- 15 Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023 Jun;183(6):589-96.
- 16 Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74.
- 17 Makaremi M, Lacaule C, Mohammad-Djafari A. Deep learning and artificial intelligence for the determination of the cervical vertebra maturation degree from lateral radiography. Entropy. 2019 Dec;21(12):1222.
- 18 Brickley MR, Shepherd JP, Armstrong RA. Neural networks: a new technique for development of decision support systems in dentistry. J Dent. 1998 May;26(4):305-9.
- 19 Kunz F, Stellzig-Eisenhauer A, Zeman F, Boldt J. Artificial intelligence in orthodontics: evaluation of a fully automated cephalometric analysis using a customized convolutional neural network. J Orofac Orthop. 2020 Jan;81(1):52-68.
- 20 Jung SK, Kim TW. New approach for the diagnosis of extractions with neural network machine learning. Am J Orthod Dentofacial Orthop. 2016 Jan;149(1):127-33.
- 21 O'Brien K. What can an Artificial Intelligence chatbot tell us about orthodontic treatment? [Accessed 6 Mar 2023]. Available from: https://kevinobrienorthoblog.com/what-can-an-artificial-intelligence-chatbot-tell-us-about-orthodontic-treatment/
- 22 Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare. 2023 Mar;11(6):887.
- 23 Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt. Forthcoming 2023.
- 24 Rao A, Pang M, Kim J, Kamineni M, Lie W, Prasad AK, et al. Assessing the utility of ChatGPT throughout the entire clinical workflow. medRxiv [Preprint]. 2023.
Publication Dates
- Publication in this collection: 03 Nov 2023
- Date of issue: 2023

History
- Received: 20 Aug 2023
- Accepted: 04 Sept 2023