CEC Theses and Dissertations


Knowledge Discovery by Attribute-Oriented Approach Under Directed Acyclic Concept Graph (DACG)

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Graduate School of Computer and Information Sciences


Junping Sun

Committee Member

Michael J. Laszlo

Committee Member

Lee Leitner


Knowledge discovery in databases (KDD) is an active and promising research area with potentially high payoffs in business and scientific applications. The great challenge of knowledge discovery in databases is to process large quantities of raw data automatically, to identify the most significant and meaningful patterns, and to present this knowledge in an appropriate form for decision making and other purposes. In previous research, attribute-oriented induction implemented the artificial intelligence "learning from examples" paradigm. This method integrates traditional database operations to extract rules from database systems. The key techniques in attribute-oriented induction are attribute generalization and undesirable attribute removal. Attribute generalization is implemented by replacing a low-level concept with its corresponding high-level concept.

The core part of this approach is a concept hierarchy, which is a linear tree schema built on each individual and independent domain (attribute), to control concept generalization.
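As a rough illustration of attribute generalization under a concept hierarchy, the following minimal Python sketch replaces each low-level value with its parent concept and merges identical generalized tuples with a count. The hierarchy, attribute values, and function names are hypothetical and chosen only for illustration; they are not taken from the dissertation.

```python
from collections import Counter

# Hypothetical concept hierarchy for one attribute, stored as child -> parent.
CITY_HIERARCHY = {
    "Miami": "Florida",
    "Orlando": "Florida",
    "Atlanta": "Georgia",
    "Florida": "USA",
    "Georgia": "USA",
}


def generalize(value, hierarchy):
    """Climb one level in the hierarchy; values with no parent are unchanged."""
    return hierarchy.get(value, value)


def generalize_relation(rows, attr_index, hierarchy):
    """Generalize one attribute in every tuple, then merge identical tuples."""
    counts = Counter()
    for row in rows:
        new_row = list(row)
        new_row[attr_index] = generalize(row[attr_index], hierarchy)
        counts[tuple(new_row)] += 1
    return dict(counts)


# Illustrative (made-up) tuples: (city, student status).
rows = [
    ("Miami", "graduate"),
    ("Orlando", "graduate"),
    ("Atlanta", "undergraduate"),
]
result = generalize_relation(rows, 0, CITY_HIERARCHY)
```

Because each value has exactly one parent, the generalization of one attribute cannot depend on the value of any other attribute.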

Because the linear structure of a concept hierarchy confines its concepts to a single independent domain, this topology leads to a learning process that lacks the capability of conditional concept generalization. It is therefore unable to extract the rich knowledge implied along the different directions of a non-linear concept schema.

Although some recent improvements have extended the basic attribute-oriented induction (BAOI) approach, they have some shortcomings. For example, rule-based attribute-oriented induction has to invoke a backtracking algorithm to tackle the information loss problem, whereas path id generalization has to transform each data value in the database into its corresponding path id (at great cost) in order to perform generalization on the path id relation instead.

To overcome the above limitations, we propose a non-linear concept schema, the Directed Acyclic Concept Graph (DACG), to extend the power of BAOI so that knowledge can be discovered conditionally across multiple domains. By utilizing graph theory, a DACG can be transformed into its equivalent linear concept tree, which is a linear concept schema. Additionally, we apply functional mappings, which map values from multiple domains to their high-level concepts in their codomains, to implement concept generalization. Therefore, our approach overcomes the limitations of BAOI and enriches the spectrum of learned patterns.
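The idea of a functional mapping from multiple domains to a higher-level concept can be sketched as follows. This is only an assumed illustration of conditional generalization, not the dissertation's algorithm: the attributes (`status`, `gpa`), the thresholds, and the function names are all hypothetical.

```python
from collections import Counter


def performance(status, gpa):
    """Hypothetical mapping from (status, gpa) to a high-level concept.

    The GPA threshold depends on the value of another attribute (status),
    which a single-domain concept tree cannot express.
    """
    threshold = 3.5 if status == "graduate" else 3.0
    return "excellent" if gpa >= threshold else "average"


def generalize_conditionally(rows):
    """Replace gpa with its conditional concept and merge identical tuples."""
    counts = Counter()
    for status, gpa in rows:
        counts[(status, performance(status, gpa))] += 1
    return dict(counts)


# Illustrative (made-up) tuples: (student status, GPA).
rows = [
    ("graduate", 3.2),
    ("undergraduate", 3.2),
    ("graduate", 3.8),
    ("undergraduate", 2.5),
]
result = generalize_conditionally(rows)
```

In this sketch the same GPA of 3.2 generalizes to different concepts depending on the student's status, which is the kind of cross-domain condition that a per-attribute concept tree cannot capture.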

Even though concept learning under a non-linear concept schema is substantially more complicated than under the linear concept tree in BAOI, this research shows that our approach is feasible and practical. In addition to the theoretical discussion presented in this dissertation, our solution has been implemented both in Java (JDK 1.2) with Oracle 8i under Solaris on Ultra 450 machines and in PL/SQL with Oracle 8i under Windows 2000, to generalize rich knowledge from live production databases.
