Theses and Dissertations

Date of Award


Document Type


Degree Name

Doctor of Psychology (PsyD)


Center for Psychological Studies

First Advisor

Charles Golden

Second Advisor

Ryan Black

Third Advisor

Barry Schneider


item difficulty, item response theory, psychometric properties, Rasch model, WAIS-IV, Wechsler Adult Intelligence Scale 4th Edition


The ceiling and basal rules of the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler, 2008) function as intended only if subtest items proceed in order of difficulty. Many aspects of the WAIS-IV have been researched, but there is no literature on subtest item difficulty, and precise item difficulty values are not available. The WAIS-IV was developed within the framework of Classical Test Theory (CTT), and item difficulty was most often determined using p-values (the proportion of examinees who answer an item correctly). One limitation of this method is that item difficulty values are sample dependent: both the standard error of measurement, an important indicator of reliability, and p-values change when the sample changes. Item Response Theory (IRT) is a different framework within which psychological tests can be created, analyzed and refined. IRT places item difficulty and person ability onto the same scale using linear transformations and links item difficulty level to person ability. As a result, IRT is said to produce sample-independent statistics. Rasch modeling, a form of IRT, is a one-parameter logistic model that is appropriate for items with only two response options. It assumes that the only factors affecting test performance are characteristics of items, such as their difficulty level or their relationship to the construct being measured by the test, and characteristics of participants, such as their ability levels. The partial credit model is similar to the standard dichotomous Rasch model, except that it is appropriate for items with more than two response options.
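The contrast between sample-dependent p-values and Rasch item difficulty can be illustrated with a minimal sketch. It assumes the standard logistic form of the dichotomous Rasch model, P(correct) = exp(θ − b) / (1 + exp(θ − b)), and the usual partial credit model form with step difficulties; the function names and all ability and difficulty values are hypothetical and are not estimates from this study. The same item, with a fixed difficulty of 0 logits, yields a very different CTT p-value in a lower-ability sample than in a higher-ability sample.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model: exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pcm_probs(theta: float, steps: list[float]) -> list[float]:
    """Category probabilities for scores 0..m under the partial credit model.

    steps[k - 1] is the (hypothetical) step difficulty for moving from score k - 1 to k.
    """
    # Score 0 has an empty sum (0); each higher score adds (theta - step difficulty).
    numerators = [0.0]
    for delta in steps:
        numerators.append(numerators[-1] + (theta - delta))
    exps = [math.exp(n) for n in numerators]
    total = sum(exps)
    return [e / total for e in exps]


def expected_p_value(b: float, sample_thetas: list[float]) -> float:
    """CTT-style p-value: expected proportion of a sample answering the item correctly."""
    return sum(rasch_prob(t, b) for t in sample_thetas) / len(sample_thetas)


if __name__ == "__main__":
    b = 0.0  # one illustrative item with a fixed Rasch difficulty of 0 logits
    low_sample = [-2.0, -1.5, -1.0, -0.5]  # hypothetical lower-ability sample
    high_sample = [0.5, 1.0, 1.5, 2.0]  # hypothetical higher-ability sample
    print(round(expected_p_value(b, low_sample), 3))  # p-value in the low sample
    print(round(expected_p_value(b, high_sample), 3))  # p-value in the high sample
    # A hypothetical item scored 0, 1 or 2 with step difficulties -1 and +1, at theta = 0.
    print([round(p, 3) for p in pcm_probs(0.0, [-1.0, 1.0])])
```

With a single step, `pcm_probs` reduces to the dichotomous Rasch probability, which mirrors the description of the partial credit model as an extension of the dichotomous model to items with more than two response options.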
Proponents of the standard dichotomous Rasch model argue that it has distinct advantages over both CTT-based methods and other IRT models (Bond & Fox, 2007; Embretson & Reise, 2000; Furr & Bacharach, 2013; Hambleton & Jones, 1993) because of the principle of monotonicity (also referred to as specific objectivity, the principle of additivity, or double cancellation), which "establishes that two parameters are additively related to a third variable" (Embretson & Reise, 2000, p. 148). In other words, in Rasch modeling the log-odds of correctly answering an item is an additive function of the individual's ability, or trait level, and the item's difficulty (ability minus difficulty). As ability increases, so does an individual's probability of answering that item correctly. Because only item difficulty and person ability affect an individual's chance of correctly answering an item, inter-individual comparisons can be made even if individuals did not receive identical items or items of the same difficulty level. This is why Rasch modeling is referred to as test-free measurement. The purpose of this study was to apply a standard dichotomous Rasch model or a partial credit model to the individual items of seven core perceptual, verbal and working memory subtests of the WAIS-IV: Block Design, Matrix Reasoning, Visual Puzzles, Similarities, Vocabulary, Information, Arithmetic, Digits Forward, Digits Backward and Digit Sequencing. Results revealed that WAIS-IV subtests fall into one of three categories: optimally ordered, near optimally ordered and sub-optimally ordered. Optimally ordered subtests, Digits Forward and Digits Backward, had no disordered items. Near optimally ordered subtests had one to three disordered items and included Digit Sequencing, Arithmetic, Similarities and Block Design.
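The claim that inter-individual comparisons do not depend on which items were administered can be checked numerically under the Rasch model. In this sketch (the ability values 1.2 and −0.3 and the item difficulties are illustrative assumptions), the difference in log-odds of success between two people is the same for every item. The item difficulty cancels, leaving only the difference in ability.

```python
import math


def rasch_prob(theta, b):
    """P(correct) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


def log_odds_gap(theta_a, theta_b, b):
    """Difference in log-odds of success between two persons on the same item of difficulty b."""
    return logit(rasch_prob(theta_a, b)) - logit(rasch_prob(theta_b, b))


item_difficulties = [-2.0, -0.5, 0.0, 1.0, 2.5]  # illustrative difficulties, in logits
# Two hypothetical examinees with abilities 1.2 and -0.3 logits: the gap is
# (1.2 - b) - (-0.3 - b) = 1.5 for every item, so b drops out of the comparison.
gaps = [log_odds_gap(1.2, -0.3, b) for b in item_difficulties]
print([round(g, 6) for g in gaps])
```

This is the additive structure described above: because log-odds equals ability minus difficulty, any comparison between persons is free of the particular items used.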
Sub-optimally ordered subtests consisted of Matrix Reasoning, Visual Puzzles, Information and Vocabulary, with the number of disordered items ranging from six to 16. Two major implications of the results of this study were considered: the impact on individuals' scores and the impact on overall test administration time. While the number of disordered items ranged from 0 to 16, the overall impact on raw scores was deemed minimal. Because of where the disordered items occur in the subtests, most individuals are administered all the items that they would be expected to answer correctly. A one-point reduction in any one subtest is unlikely to significantly affect overall index scores, which are the scores most commonly interpreted in the WAIS-IV. However, if an individual received a one-point reduction across all subtests, this may have a more noticeable impact on index scores. In cases where individuals discontinue before having a chance to answer items that were easier, clinicians may consider testing the limits. While this would have no impact on raw scores, it may provide clinicians with a better understanding of individuals' true abilities. Based on the findings of this study, clinicians may use the items' difficulty values to select which items to administer when testing the limits. This study also found that the start point for most subtests is too easy for most individuals. For some subtests, most individuals may be administered more than 10 items that are too easy for them. Other than increasing overall administration time, it is not clear what impact, if any, this has. However, it does suggest the need to reevaluate current start items so that they represent a true basal for most people. Future studies should depart from standard test administration by ignoring basal and ceiling rules in order to collect data on more items.
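The two practical questions above (which items are out of order, and how many administered items are too easy for a given person) can be sketched once Rasch difficulties are available in administration order. This is an illustration under stated assumptions, not the study's procedure. An item is treated as disordered here when it is easier than the item administered just before it, and an item counts as too easy when the person's Rasch probability of success is at least 0.9. Both criteria, the function names and the difficulty values are hypothetical.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def flag_disordered(difficulties: list[float]) -> list[int]:
    """Return 1-based item numbers that are easier than the immediately preceding item.

    Assumption: this 'drop in difficulty' rule is only one way to operationalize
    disordering; the study's own criterion may differ.
    """
    return [i + 1 for i in range(1, len(difficulties)) if difficulties[i] < difficulties[i - 1]]


def count_too_easy(theta: float, difficulties: list[float], start_item: int, threshold: float = 0.9) -> int:
    """Count items from the (1-based) start item onward that this person passes with P >= threshold.

    Items this easy are almost always passed, so they are administered before any
    discontinue rule can apply; the 0.9 threshold is a hypothetical choice.
    """
    return sum(1 for b in difficulties[start_item - 1:] if rasch_prob(theta, b) >= threshold)


# Hypothetical difficulties (logits) for an 11-item subtest, in administration order.
difficulties = [-3.0, -2.6, -2.8, -2.0, -1.5, -1.8, -0.8, 0.0, 0.9, 0.6, 1.8]
print(flag_disordered(difficulties))  # items easier than their predecessor
print(count_too_easy(0.5, difficulties, start_item=1))  # too-easy items from item 1
print(count_too_easy(0.5, difficulties, start_item=3))  # too-easy items from item 3
```

Moving the start item later reduces the count of too-easy items a person must complete, which is the kind of trade-off the reevaluation of start points would weigh.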
To help clarify why some items are more or less difficult than would be expected given their ordinal rank, future studies should include a qualitative component in which, after each subtest, individuals are asked to describe what they found easy and difficult about each item. Finally, future research should examine the effects of item ordering on participant performance. While this study revealed that only minimal reductions in index scores are likely to result from prematurely stopping test administration, it is not known whether disordering has other effects on performance, perhaps by increasing or decreasing an individual's confidence.
