CEC Theses and Dissertations


A Reasoning Mechanism in the Probabilistic Data Model Based on the Maximum Entropy Formalism

Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Graduate School of Computer and Information Sciences


Junping Sun

Committee Member

Heng-Da Cheng

Committee Member

Ping Tan


A desirable feature of a database system is its ability to reason with probabilistic information. This dissertation proposes a model for the process of reasoning with probabilistic information in databases. A probabilistic data model has been chosen as the framework for this study and the information theoretical aspects of the Maximum Entropy Formalism as the inference engine. This formalism, although semantically interesting, offers major complexity problems. Probabilistic data models generally assume some knowledge of the uncertainty space, and the Maximum Entropy Formalism finds the least commitment probability distribution within the uncertainty space. This dissertation is an investigation of how successfully the entropy principle could be applied to probabilistic data. The Boolean logic and weighted queries when applied to probabilistic databases have certain pros and cons. A query logic based on the combined advantages of both the Boolean logic and weighted queries would be a desirable alternative. The proposed model based on the Maximum Entropy Formalism meets the foregoing desiderata of making the query language more expressive.

Probabilistic data models generally deal with tuple-level probabilities whereas the proposed model has the ability to handle attribute-level probabilities. This model also has the ability to guess the unknown joint probability distributions. Three techniques to compute the probability distributions were studied in this dissertation: (1) a brute-force, iterative algorithm, (2) a heuristic algorithm, and (3) a simulation approach. The performance characteristics of these algorithms were examined and conclusions were drawn about the potentials and limitations of the proposed model in probabilistic database applications.

Traditionally, the probabilistic solution proposed by the Maximum Entropy Formalism is arrived at by solving nonlinear simultaneous equations for the aggregate factors of the nonlinear terms. The proposed heuristic approach and simulation technique were shown to have the highly desirable property of tractability. Further research is needed to improve the accuracy of the heuristic and make it more attractive.

Although the proposed model and algorithms are applicable to tables with a few probabilistic attributes, say two or three, the complexity of extending the model to a large number of probabilistic attributes still remains unsolved as it falls in the realm of NP-hard problems.

This document is currently not available here.

  Link to NovaCat