EXPLORING ISSUES IN ANALYZING NATIONAL DATABASES USING LOGISTIC REGRESSION: APPLICATION OF MEDICAL EXPENDITURE PANEL SURVEY (MEPS)

Abdullah Althemery, Nova Southeastern University
Leanne Lai, Nova Southeastern University

Abstract

Objective. This study investigated three issues that arise when applying logistic regression to nationally representative multistage survey data: subgroup analysis, multicollinearity, and receiver operating characteristic (ROC) curves. Background. Most national databases use a complex stratified multistage probability design, with clusters, strata, and weight adjustments, so that study results can be extrapolated to the national level. Survey procedures are available in SAS (Statistical Analysis System) 9.2; however, several issues can arise if they are not used appropriately. Moreover, there is no clear agreement on how to detect multicollinearity or generate ROC curves with these relatively recent survey logistic procedures. Using Medical Expenditure Panel Survey (MEPS) data, the current study discussed and compared the available principles and techniques. Methods. First, a subpopulation analysis was conducted using two procedures, with and without domain statements. Second, several methods of detecting multicollinearity were applied and compared. Lastly, ROC curves for the survey logistic model were generated with and without survey procedures and compared. Results. Estimates obtained without domain statements had potentially overestimated standard errors. The tolerance and variance inflation factor (VIF) values for detecting multicollinearity were similar across the two procedures applied: the linear regression procedure and the procedure using the weight matrix adjusted by the maximum likelihood algorithm. ROC curves accounting for the national estimation were successfully generated and offered more reliability. Conclusion. When analyzing a subgroup in a national database, it is important to account for the total population weights. New methods are required for exploring multicollinearity in survey logistic regression procedures.
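As an illustration of what an ROC summary "accounting for the national estimation" can mean, the sketch below computes a survey-weighted area under the ROC curve in plain Python. It is not the SAS procedure used in the study. The function name `weighted_auc` and the toy values are hypothetical, and the sketch gives only a weighted point estimate; it does not use strata or clusters and produces no design-based standard error.

```python
def weighted_auc(y_true, scores, weights):
    """Weighted AUC: the weighted share of (case, non-case) pairs in which
    the case has the higher predicted score; ties count as one half.

    y_true  : 0/1 outcomes
    scores  : predicted probabilities from any logistic model
    weights : person-level survey weights (assumed positive)
    """
    cases = [(s, w) for y, s, w in zip(y_true, scores, weights) if y == 1]
    controls = [(s, w) for y, s, w in zip(y_true, scores, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for s_case, w_case in cases:
        for s_ctrl, w_ctrl in controls:
            pair_weight = w_case * w_ctrl
            if s_case > s_ctrl:
                concordant += pair_weight
            elif s_case == s_ctrl:
                concordant += 0.5 * pair_weight

    total = sum(w for _, w in cases) * sum(w for _, w in controls)
    return concordant / total


# Toy example (made-up values, not MEPS data).
y = [1, 1, 0, 0]
p = [0.9, 0.4, 0.6, 0.1]
print(weighted_auc(y, p, [1, 1, 1, 1]))  # equal weights
print(weighted_auc(y, p, [1, 3, 1, 1]))  # second respondent represents more people
```

With equal weights this reduces to the ordinary AUC. Unequal weights shift the estimate toward the respondents who represent more of the population, which is why weighted and unweighted ROC curves can differ.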

 
Feb 12th, 12:00 AM

Morris Auditorium
