Abstract
Purpose: This cross-sectional study evaluated whether two ChatGPT (Chat Generative Pre-Trained Transformer) models, GPT-3.5 and GPT-4o, provide treatment information consistent with high-quality clinical practice guidelines (CPGs) for knee osteoarthritis (OA).
Methods: High-quality CPGs published in the past decade were identified via PubMed and PEDro, with the search updated on November 11, 2024. Guidelines were appraised using the AGREE II tool. GPT-3.5 and GPT-4o were queried with common treatment-related questions, and their responses were compared to CPG recommendations. Two independent reviewers conducted a thematic content analysis of the GPT-3.5 and GPT-4o responses using a deductive–inductive codebook, iteratively refined through consensus, to identify major themes and subthemes and their frequencies. Consistency between ChatGPT outputs and CPGs was categorized as high agreement (4/4 “yes” responses), moderate agreement (3/4 “yes” responses), or low agreement (2/4 or fewer “yes” responses, or 2/4 not reported). Study selection and data extraction were performed independently by two reviewers, with a third reviewer consulted when necessary. Inter-reviewer agreement for data extraction, which assessed alignment between ChatGPT and the CPGs, was evaluated using percentage agreement to measure consistency in data identification and categorization.
Results: Four high-quality guidelines were identified and analyzed. GPT-3.5 and GPT-4o generated 14 and 10 questions, respectively, yielding 10 themes and 33 subthemes. The 10 themes were exercise and physical therapy, lifestyle modifications, medications, supplements and herbal remedies, assistive devices, additional therapies, education, consultation, intra-articular injections, and surgical and advanced treatments. Among the subthemes, 6.06% demonstrated high agreement (e.g., low-impact aerobic exercises, patient education), 18.18% moderate agreement (e.g., strengthening, weight management), and 75.75% low agreement across interventions. Excluding the physical therapy-specific CPG increased agreement for treatments such as analgesics, NSAIDs, glucosamine and chondroitin, and intra-articular corticosteroid injections.
Conclusion: While GPT-3.5 and GPT-4o demonstrated high or moderate agreement with CPGs for certain themes, most subthemes showed low agreement. A separate analysis excluding the CPG developed specifically for physical therapists modestly improved agreement levels but did not alter the overall pattern. These findings highlight the need for ongoing refinement of AI tools and underscore the importance of clinicians critically evaluating AI-generated content, particularly as patients may increasingly rely on such tools to guide self-management decisions.
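The agreement categories and the percentage-agreement measure described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names and the sample inputs are hypothetical, and the sketch assumes each subtheme is summarized by the number of the four guidelines that gave a "yes" for it.

```python
from collections import Counter


def agreement_level(yes_count: int, n_guidelines: int = 4) -> str:
    """Map the number of guidelines answering "yes" to an agreement level.

    With four guidelines, 4/4 is high, 3/4 is moderate, and 2/4 or fewer is low.
    A subtheme with 2/4 "not reported" can have at most two "yes" responses,
    so it also falls into the low category under this rule.
    """
    if yes_count == n_guidelines:
        return "high"
    if yes_count == n_guidelines - 1:
        return "moderate"
    return "low"


def level_percentages(yes_counts: list[int]) -> dict[str, float]:
    """Percentage of subthemes at each agreement level, rounded to two decimals."""
    counts = Counter(agreement_level(y) for y in yes_counts)
    total = len(yes_counts)
    return {
        level: round(100 * counts[level] / total, 2)
        for level in ("high", "moderate", "low")
    }


def percent_agreement(reviewer_a: list[str], reviewer_b: list[str]) -> float:
    """Share of items, as a percentage, that two reviewers categorized identically."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("Both reviewers must rate the same items")
    matches = sum(a == b for a, b in zip(reviewer_a, reviewer_b))
    return round(100 * matches / len(reviewer_a), 2)


# Hypothetical inputs for illustration only (not study data).
sample_yes_counts = [4, 3, 3, 2, 0]
print(level_percentages(sample_yes_counts))
print(percent_agreement(["high", "low", "low", "moderate"],
                        ["high", "low", "moderate", "moderate"]))
```

Percentage agreement is simple to compute and report, but it does not correct for agreement expected by chance, which is worth keeping in mind when interpreting inter-reviewer consistency.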
Recommended Citation
Garcia AN, Neville B, Myers B, Leineke L, Green K. Agreement of ChatGPT with Clinical Practice Guidelines for Knee Osteoarthritis Treatment. The Internet Journal of Allied Health Sciences and Practice. 2026 Mar 03;24(1), Article 16.
