CCE Theses and Dissertations

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


College of Engineering and Computing


Steven Terrell

Committee Member

Martha Snyder

Committee Member

Ling Wang


data mining, machine learning, prediction model, Step 1, USMLE


Identifying the factors associated with medical students who fail Step 1 of the United States Medical Licensing Examination (USMLE) has been a focus of investigation for many years. Some researchers believe lower scores on the Medical College Admission Test (MCAT) are the sole factor needed to identify students who will fail. Other researchers believe lower course outcomes during the first two years of medical training are better indicators of failure. Yet some students who enter medical school with high MCAT scores fail Step 1, and conversely, some students with lower academic credentials who are expected to have difficulty pass on the first attempt. Researchers have attempted to find the factors associated with Step 1 outcomes; however, there are two problems with the methods they used. First, the sample of failing students is small because of the high national pass rate on Step 1. Second, research using multivariate regression models identifies correlates of Step 1 outcomes but does not predict individual student performance.

This study used data mining methods to create models that predict which medical students are at risk of failing Step 1 of the USMLE. Predictor variables include those available to admissions committees at application time and final grades in courses taken during the preclinical years of medical education. Models were trained, tested, and validated using a stepwise approach, adding predictor variables in the order in which courses were taken, to identify the point in the medical education continuum that best predicts which students will fail Step 1. Oversampling techniques were employed to resolve the problem of small sample sizes. Results of this study suggest that at-risk medical students can be identified as early as the end of the first term of the first year. The approach used in this study can serve as a framework which, if implemented at other U.S. allopathic medical schools, can identify students in time for appropriate interventions to impact Step 1 outcomes.
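As an illustration only, the two ideas named in the abstract (oversampling the rare failing class, and adding predictors in curriculum order) might be sketched as follows. The abstract does not say which oversampling technique or software was used, so simple random oversampling stands in here as an assumption, and every stage name, feature name, and function name below is hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class.

    This is plain random oversampling, used here as a stand-in;
    the study's actual technique is not stated in the abstract.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def cumulative_feature_sets(stages):
    """Yield (stage_name, features_available_so_far) in curriculum order."""
    seen = []
    for name, features in stages:
        seen = seen + list(features)
        yield name, list(seen)


# Hypothetical stages and predictor names, for illustration only.
STAGES = [
    ("admissions", ["mcat", "undergrad_gpa"]),
    ("year1_term1", ["course_a_grade"]),
    ("year1_term2", ["course_b_grade"]),
]

if __name__ == "__main__":
    # Toy labels: 8 students pass, 2 fail (made-up data).
    labels = ["pass"] * 8 + ["fail"] * 2
    rows = [[i] for i in range(len(labels))]
    _, balanced = random_oversample(rows, labels)
    print(Counter(balanced))
    for stage, features in cumulative_feature_sets(STAGES):
        # A model would be fit and evaluated here for each stage.
        print(stage, features)
```

In a setup like this, oversampling would normally be applied to the training split only, so that duplicated failing students do not leak into the test or validation data and inflate the measured performance.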