CEC Theses and Dissertations

Title

Discovering Interesting Rules Using A Parallel, Multi-Criteria Knowledge Discovery System

Date of Award

2003

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Graduate School of Computer and Information Sciences

Advisor

Junping Sun

Committee Member

James D. Canandy

Committee Member

Michael J. Laszlo

Abstract

Knowledge discovery in databases, or data mining, is the process of finding interesting patterns in large datasets. Patterns describing the data in a dataset, sometimes in the form of rules, can be mined using a variety of techniques, but the results are not always interesting to the analyst. Interestingness can be defined in many different ways, including usefulness, actionability, statistical significance, uniqueness, correctness, expectedness, and surprisingness. The interestingness of a particular rule can be evaluated objectively (using only properties of the rules themselves) or subjectively (using an analyst's knowledge of the subject area). There are numerous techniques for defining and evaluating the interestingness of a rule. However, most previously published techniques assume only a single criterion for interestingness. If multiple interestingness criteria are used in these traditional systems, they are strung together in a serial fashion.

The single-criterion systems are limited in the amount of information they can convey about the interestingness of a rule. The serial systems are hampered by certain biases and time constraints inherent in their design. The objective of this dissertation is to outline and implement a Parallel, Multi-Criteria Knowledge Discovery System (PMCKDS) that can assess the interestingness of each rule in a ruleset using multiple definitions of interestingness. This objective is achieved by evaluating each rule against multiple measures of interestingness in parallel, then using exploratory, unsupervised learning techniques to determine a summary interestingness score for each rule. The result is that a richer, more descriptive definition of interestingness is applied to each rule, which can improve the overall quality of rules generated from a mining exercise.
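
The approach described above (several interestingness measures evaluated in parallel, then combined into one summary score per rule) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the rule counts are made up, the three objective measures (support, confidence, lift) are common textbook choices, and the summary step uses a simple stand-in (averaging per-measure z-scores) in place of whatever unsupervised learning technique PMCKDS actually applies.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

# Hypothetical rules "A -> B" with made-up counts over n transactions:
# a = transactions containing A, b = containing B, ab = containing both.
RULES = {
    "bread -> butter": {"n": 100, "a": 40, "b": 50, "ab": 30},
    "milk -> eggs":    {"n": 100, "a": 60, "b": 20, "ab": 15},
    "tea -> sugar":    {"n": 100, "a": 10, "b": 30, "ab": 9},
}

# Three objective measures, chosen here only as examples.
def support(r):
    return r["ab"] / r["n"]

def confidence(r):
    return r["ab"] / r["a"]

def lift(r):
    return confidence(r) / (r["b"] / r["n"])

MEASURES = {"support": support, "confidence": confidence, "lift": lift}

def score_measure(name):
    """Apply one measure to every rule; each measure is an independent task."""
    fn = MEASURES[name]
    return name, {rule: fn(counts) for rule, counts in RULES.items()}

def zscores(values):
    """Standardize one measure's scores so measures are comparable."""
    mu = mean(values.values())
    sd = pstdev(values.values())
    return {k: 0.0 if sd == 0 else (v - mu) / sd for k, v in values.items()}

def summary_scores():
    # Measures are dispatched concurrently. Threads are used for brevity;
    # CPU-bound measures would need processes for true parallelism in CPython.
    with ThreadPoolExecutor() as pool:
        per_measure = dict(pool.map(score_measure, MEASURES))
    standardized = {m: zscores(v) for m, v in per_measure.items()}
    # Stand-in for the unsupervised summary step: mean of z-scores per rule.
    return {rule: mean(standardized[m][rule] for m in MEASURES) for rule in RULES}

if __name__ == "__main__":
    for rule, score in sorted(summary_scores().items(), key=lambda kv: -kv[1]):
        print(f"{rule}: {score:+.3f}")
```

Because no measure depends on another's output, the measures can run side by side rather than being chained, which is the contrast with the serial systems that the abstract draws.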

