CEC Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award

2012

Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Information Systems (DISS)

Department

Graduate School of Computer and Information Sciences

Advisor

Sumitra Mukherjee

Committee Member

Michael J. Lazlo

Committee Member

Junping Sun

Abstract

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major shortcoming: it is extremely sensitive to the choice of initial centers used to seed the algorithm. Unless k-means is carefully initialized, it converges to an inferior local optimum and results in poor quality partitions. Developing improved method for selecting initial centers for k-means is an active area of research. Genetic algorithms (GAs) have been successfully used to evolve a good set of initial centers. Among the most promising GA-based methods are those that exchange neighboring centers between candidate partitions in their crossover operations.

K-means is best suited to work when datasets have well-separated non-overlapping clusters. Fuzzy c-means (FCM) is a popular variant of k-means that is designed for applications when clusters are less well-defined. Rather than assigning each point to a unique cluster, FCM determines the degree to which each point belongs to a cluster. Like k-means, FCM is also extremely sensitive to the choice of initial centers. Building on GA-based methods for initial center selection for k-means, this dissertation developed an evolutionary program for center selection in FCM called FCMGA. The proposed algorithm utilized region-based crossover and other mechanisms to improve the GA.

To evaluate the effectiveness of FCMGA, three independent experiments were conducted using real and simulated datasets. The results from the experiments demonstrate the effectiveness and consistency of the proposed algorithm in identifying better quality solutions than extant methods. Moreover, the results confirmed the effectiveness of region-based crossover in enhancing the search process for the GA and the convergence speed of FCM. Taken together, findings in these experiments illustrate that FCMGA was successful in solving the problem of initial center selection in partitional clustering algorithms.

To access this thesis/dissertation you must have a valid nova.edu OR mynsu.nova.edu email address and create an account for NSUWorks.

  Contact Author

  Link to NovaCat

Share

COinS