CCE Theses and Dissertations

Preliminary Examination for Empirical Knowledge (PEEK): A Fast Heuristic to Estimate the Inherent Degree of Clustering in Data Sets

J. William Cupp, Nova Southeastern University

Date of Award

2007

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computing Technology in Education (DCTE)

Department

Graduate School of Computer and Information Sciences

Advisor

Michael J. Laszlo

Committee Member

James D. Canandy

Committee Member

Sumitra Mukherjee

Abstract

The general area of this research is data clustering, in which an unsupervised classification process is used to discover and extract the clusters that naturally exist in a data set. These inherent patterns are then used to understand the data in a manner consistent with what the data represent. Such clustering methods may be used to discover natural groupings in raw data and to abstract structures that may reside there, without any prior knowledge of whether such structures exist.

Many different clustering algorithms are in use, each with its own relative strengths. For example, some have lower asymptotic running time than others, some require a priori knowledge of the underlying data, and some produce results that are highly dependent on the input parameters.

The goal of this dissertation was to develop an approach that measures the degree to which the data under study contain natural clusters. Using the measures developed, a researcher can determine whether or not the underlying data possess natural clusters, so that further processing may proceed with a method known to give the best results for the degree of clustering the data exhibit. Moreover, understanding the native clustering tendency of the data facilitates the measurement of clustering validity, that is, how well the resulting clusters partition the data in a way that is meaningful in the real-world domain of the data set.

Such measures permit a researcher to choose intelligently, perhaps employing a computationally intensive technique with confidence that the underlying data warrant such effort.
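The abstract does not say how PEEK computes its estimate. As a point of reference only, the sketch below implements a different and well-known measure of clustering tendency, the Hopkins statistic; it is not the PEEK heuristic. It compares nearest-neighbour distances from uniformly random probe points (drawn, by assumption here, from the data's bounding box) with nearest-neighbour distances from sampled data points. The function names, the default sample size of roughly n/10, and the brute-force neighbour search are illustrative assumptions.

```python
import math
import random


def _nearest_distance(p, points, skip_index=None):
    """Euclidean distance from p to its nearest neighbour in points,
    optionally ignoring the point at skip_index (p itself)."""
    best = math.inf
    for j, q in enumerate(points):
        if j == skip_index:
            continue
        best = min(best, math.dist(p, q))
    return best


def hopkins_statistic(points, m=None, seed=None):
    """Hopkins statistic H for a list of equal-length numeric tuples.

    H near 1 suggests strong clustering tendency, near 0.5 roughly
    uniform random data, near 0 regularly spaced data.
    Brute-force search: O(m * n) distance computations.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Assumed default: sample about 10% of the data.
    m = m if m is not None else max(1, n // 10)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")
    rng = random.Random(seed)
    dims = len(points[0])
    lows = [min(p[k] for p in points) for k in range(dims)]
    highs = [max(p[k] for p in points) for k in range(dims)]

    # u: distances from uniform probes in the bounding box to the data.
    u_sum = 0.0
    for _ in range(m):
        probe = tuple(rng.uniform(lows[k], highs[k]) for k in range(dims))
        u_sum += _nearest_distance(probe, points)

    # w: distances from sampled data points to their nearest other point.
    w_sum = 0.0
    for i in rng.sample(range(n), m):
        w_sum += _nearest_distance(points[i], points, skip_index=i)

    total = u_sum + w_sum
    # Degenerate case (all points identical): report the neutral value.
    return u_sum / total if total > 0 else 0.5
```

On two tight, well-separated groups of points the statistic comes out close to 1, while on points scattered uniformly over a square it hovers around 0.5. Some formulations raise each distance to the power of the dimension before summing; the plain sums are used here for simplicity.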

NSUWorks Citation

J. William Cupp. 2007. Preliminary Examination for Empirical Knowledge (PEEK): A Fast Heuristic to Estimate the Inherent Degree of Clustering in Data Sets. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (474) https://nsuworks.nova.edu/gscis_etd/474.

This document is currently not available here.
