
poster
Evaluating the Capability & Accuracy of ChatGPT’s Newest Models in Distinguishing between Melanoma and Non-Melanoma Lesions
Artificial intelligence (AI) systems such as ChatGPT are increasingly explored for diagnostic purposes, and more people may turn to these tools to self-diagnose skin conditions. This study evaluates ChatGPT-4 Turbo and ChatGPT-4 Omni (ChatGPT-4O) on dermatoscopic images from the HAM10000 (HAM10K) dataset: 500 melanoma and 500 non-melanoma cases for classification, plus an additional 1,000 non-melanoma images to test false-positive rates. ChatGPT-4O achieved 57.7% accuracy (95% CI: 54.7%-60.8%) in identifying melanoma, while ChatGPT-4 Turbo achieved 54.6% (95% CI: 51.5%-57.7%). Both models showed high sensitivity but low specificity for melanoma diagnosis. For non-melanoma lesions, ChatGPT-4O’s accuracy was 6.56% (95% CI: 4.94%-8.18%), rising to 25.25% (95% CI: 22.55%-27.95%) when binary prompts were used. ChatGPT tended to classify lesions conservatively as melanoma, and explicit prompts improved its performance. Despite these gains, the models’ diagnostic accuracy remains inadequate compared with specialized neural networks, suggesting that ChatGPT is not yet reliable for autonomous skin cancer diagnosis, including self-assessment by the public.
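To make the reported metrics concrete, the following is a minimal Python sketch of how accuracy, sensitivity, specificity, and a 95% confidence interval for a proportion are commonly computed. It rests on assumptions not stated in the abstract: the interval uses the normal-approximation (Wald) method, the sample size for the melanoma task is taken as 1,000 images, and the confusion-matrix counts are purely illustrative placeholders, not the study's data. The function names `wald_ci` and `binary_metrics` are hypothetical.

```python
import math


def wald_ci(p_hat, n, z=1.959964):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the abstract does not say which interval method was used;
    Wald is shown here only because it is the simplest common choice.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fn count melanoma images called melanoma / non-melanoma;
    tn/fp count non-melanoma images called non-melanoma / melanoma.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Interval for the reported 54.6% accuracy, assuming n = 1000 images.
low, high = wald_ci(0.546, 1000)
print(f"95% CI: {low:.1%} to {high:.1%}")

# Illustrative counts only (NOT taken from the poster): a classifier that
# calls almost everything melanoma on a balanced 500/500 set.
sens, spec, acc = binary_metrics(tp=450, fn=50, tn=100, fp=400)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Under these assumptions the interval lands close to the range reported for ChatGPT-4 Turbo. The illustrative counts show why a model that labels most lesions as melanoma can pair high sensitivity with low specificity and still score only slightly above 50% accuracy on a balanced set.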