CEC Theses and Dissertations

Date of Award

2012

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Information Systems (DCIS)

Department

Graduate School of Computer and Information Sciences

Advisor

Sumitra Mukherjee

Committee Member

Francisco Mitropoulos

Committee Member

Michael Laszlo

Abstract

This dissertation develops methods to minimize recommendation error costs when inputs to a rule-based expert system are prone to errors. The problem often arises in web-based applications where data are inherently noisy or are provided by users who perceive some benefit from falsifying inputs. Prior studies proposed methods that attempted to minimize the probability of recommendation error but did not account for the relative costs of different types of errors. Where these cost differences are significant, an approach that minimizes the expected misclassification cost has advantages over extant methods that ignore these costs.

Building on the existing literature, two new techniques, Cost-Based Input Modification (CBIM) and Cost-Based Knowledge-Base Modification (CBKM), were developed and evaluated. Each method takes as inputs (1) the joint probability distribution of a set of rules, (2) the distortion matrix for input noise, characterized by the probability distribution of the observed input vectors conditioned on their true values, and (3) the misclassification cost for each type of recommendation error. Under CBIM, for any observed input vector v, the recommendation is based on a modified input vector v' chosen so that the expected error cost is minimized. Under CBKM, the rule base itself is modified to minimize the expected cost of error.
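The following Python sketch illustrates a cost-based decision rule of the kind described above, under stated assumptions: inputs are small discrete vectors, the joint probability distribution of the rules is represented as a prior over true input vectors (each firing exactly one rule), and the distortion matrix comes from independent per-attribute bit flips. The names `rule_base`, `cbim`, `cbkm`, the `approve`/`reject` classes, and the flip probability are hypothetical illustrations, not the dissertation's implementation.

```python
from itertools import product

# Hypothetical toy setup, not taken from the dissertation:
# two binary input attributes and two possible recommendations.
INPUTS = list(product([0, 1], repeat=2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
CLASSES = ["approve", "reject"]
FLIP_PROB = 0.2  # assumed per-attribute noise level


def rule_base(t):
    """Toy rule base: recommend 'approve' only when both attributes are 1."""
    return "approve" if t == (1, 1) else "reject"


def distortion(o, t):
    """Distortion matrix entry P(observed o | true t), assuming independent bit flips."""
    prob = 1.0
    for ob, tb in zip(o, t):
        prob *= FLIP_PROB if ob != tb else 1 - FLIP_PROB
    return prob


def posterior(o, prior):
    """P(true input t | observed o) by Bayes' rule."""
    weights = {t: distortion(o, t) * prior[t] for t in INPUTS}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def expected_cost(r, o, prior, cost):
    """Expected cost of recommending r when o is observed.

    cost[(r, c)] is the cost of recommending r when c is the correct recommendation.
    """
    post = posterior(o, prior)
    return sum(p * cost[(r, rule_base(t))] for t, p in post.items())


def cbim(o, prior, cost):
    """CBIM-style step: choose a modified input v' for the unmodified rule base
    whose recommendation has the lowest expected cost given o."""
    return min(INPUTS, key=lambda v: expected_cost(rule_base(v), o, prior, cost))


def cbkm(prior, cost):
    """CBKM-style step: build a new lookup-table rule base mapping each
    observed vector directly to its lowest-expected-cost recommendation."""
    return {o: min(CLASSES, key=lambda r: expected_cost(r, o, prior, cost))
            for o in INPUTS}


prior = {t: 0.25 for t in INPUTS}  # assumed uniform prior over true inputs
zero_one = {(r, c): 0 if r == c else 1 for r in CLASSES for c in CLASSES}
asymmetric = dict(zero_one)
asymmetric[("approve", "reject")] = 5  # wrongly approving is 5x as costly

observed = (1, 1)
print(cbkm(prior, zero_one)[observed])    # approve
print(cbkm(prior, asymmetric)[observed])  # reject
print(rule_base(cbim(observed, prior, asymmetric)))  # reject
```

For the observation (1, 1), the posterior probability that "approve" is correct is 0.64. With 0-1 costs the sketch therefore recommends "approve"; raising the cost of a wrong approval to 5 makes "reject" cheaper in expectation (0.64 versus 1.8), showing how a cost-based rule can depart from an error-minimizing one. The knowledge base here is a plain lookup table rather than the decision trees referenced in the results, so the sketch says nothing about the relative performance of CBIM and CBKM.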

The proposed methods were investigated in two stages. First, as a control, in the special case where all types of error carry identical costs, the recommendations under these methods were checked for consistency with those obtained under extant methods. Next, the relative advantages of CBIM and CBKM were compared as (1) the noise level changed and (2) the structure of the cost matrix varied.
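The control rests on a short argument, written here in notation introduced for illustration (not necessarily the dissertation's own): let P(c | v) be the probability that c is the correct recommendation given observed input v, and assume every misclassification costs the same amount k > 0, so that C(r, c) = k for r ≠ c and C(r, r) = 0.

```latex
\begin{aligned}
\mathbb{E}[\mathrm{cost} \mid r, v]
  &= \sum_{c} C(r, c)\, P(c \mid v)
   = k \sum_{c \neq r} P(c \mid v)
   = k \bigl(1 - P(r \mid v)\bigr), \\
\arg\min_{r} \mathbb{E}[\mathrm{cost} \mid r, v]
  &= \arg\max_{r} P(r \mid v).
\end{aligned}
```

Because k > 0, minimizing expected cost is then equivalent to maximizing the probability of a correct recommendation, which is why identical costs should reproduce the recommendations of the error-minimizing methods, up to ties.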

As expected, CBKM and CBIM outperformed the extant Knowledge Base Modification (KM) and Input Modification (IM) methods over a wide range of input distortion levels and cost matrices, with some restrictions. Under the control, with constant misclassification costs, the new methods performed on par with the extant methods. As misclassification costs increased, CBKM outperformed KM and CBIM outperformed IM. As different cost matrices were used to increase the asymmetry and order of misclassification costs, the performance of CBKM and CBIM improved. At very low distortion levels, CBKM and CBIM underperformed, as error probability became more significant in each method's estimation. Additionally, CBKM outperformed CBIM over a wide range of input distortion levels, because modifying the original knowledge base proved more effective than modifying the inputs to an unmodified decision tree.
