CCE Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award

2012

Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Computer Science (CISD)

Department

Graduate School of Computer and Information Sciences

Advisor

Michael J. Laszlo

Committee Member

Maxine S. Cohen

Committee Member

Sumitra Mukherjee

Keywords

clustering, genetic algorithms, k-means initialization, pattern recognition, region-based crossover

Abstract

Data clustering, which partitions data points into clusters, has many useful applications in economics, science, and engineering. Data clustering algorithms can be partitional or hierarchical. The k-means algorithm is the most widely used partitional clustering algorithm because of its simplicity and efficiency. One problem with the k-means algorithm is that the quality of the partitions it produces depends heavily on the initial selection of centers. This problem has been tackled using genetic algorithms (GAs), in which a set of centers is encoded as an individual in a population and solutions are generated using evolutionary operators such as crossover, mutation, and selection. Of the many GA methods, the region-based genetic algorithm (RBGA) has proven to be an effective technique when the centroid is used as the representative object of a cluster (ROC) and Euclidean distance is used as the distance metric.
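The sensitivity of k-means to its initial centers can be illustrated with a minimal sketch. The code below is not taken from the dissertation; it is a plain implementation of the standard Lloyd iteration, and the function names and the four-point dataset are assumptions chosen for illustration. Two different initializations on the same data converge to partitions of very different quality, measured by the sum of squared errors (SSE).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iteration from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Illustrative data (an assumption): corners of a 4-by-1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0, 0), (4, 1)])  # splits left pair / right pair
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])   # stuck on bottom pair / top pair
print(sse_good, sse_bad)
```

The second initialization is already a fixed point of the iteration, so k-means never leaves it even though a far better partition exists. This is the kind of local optimum that a GA search over sets of centers is intended to escape.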

The RBGA uses a region-based crossover operator that exchanges subsets of centers that belong to a region of space rather than exchanging random centers. The rationale is that subsets of centers that occupy a given region of space tend to serve as building blocks. Exchanging such centers preserves and propagates high-quality partial solutions.
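The idea of exchanging centers by region can be sketched as follows. This is a simplified illustration, not the RBGA operator itself: the abstract does not say how regions are chosen, so the half-space region below (all centers whose coordinate on one axis is at most a threshold), the function names, and the example parents are all assumptions.

```python
def inside(center, axis, threshold):
    """Hypothetical region test: half-space where coordinate `axis` <= threshold."""
    return center[axis] <= threshold


def region_crossover(parent_a, parent_b, axis, threshold):
    """Swap the centers that fall inside the region between two parents.

    Each parent is a list of centers (tuples). Centers outside the region
    stay with their original parent; centers inside it are exchanged.
    """
    a_in = [c for c in parent_a if inside(c, axis, threshold)]
    a_out = [c for c in parent_a if not inside(c, axis, threshold)]
    b_in = [c for c in parent_b if inside(c, axis, threshold)]
    b_out = [c for c in parent_b if not inside(c, axis, threshold)]
    child_1 = b_in + a_out  # region contents from B, remainder from A
    child_2 = a_in + b_out  # region contents from A, remainder from B
    return child_1, child_2


# Illustrative parents (assumptions), each encoding k = 4 centers in 2-D.
parent_a = [(1, 1), (2, 5), (8, 2), (9, 9)]
parent_b = [(0, 4), (3, 3), (7, 7), (8, 1)]
child_1, child_2 = region_crossover(parent_a, parent_b, axis=0, threshold=5)
print(child_1)
print(child_2)
```

In this example both parents have two centers inside the region, so each child still holds four centers. In general the counts can differ, and a fixed-k encoding would need some repair step; how the dissertation handles that case is not stated in the abstract.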

This research assesses the RBGA with a variety of ROCs and distance metrics. The RBGA was tested, along with other GA methods, on four benchmark datasets using four distance metrics, varying numbers of centers, and both centroids and medoids as ROCs. The results showed superior performance of the RBGA across all datasets and parameter settings, indicating that region-based crossover may prove an effective strategy across a broad range of clustering problems.
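The two kinds of ROC mentioned above differ in a way that a short sketch makes concrete. The code is illustrative only. Of the four distance metrics used in the study, the abstract names only Euclidean distance, so the Manhattan metric here is an assumed example, as are the function names and the sample cluster.

```python
import math


def euclidean(p, q):
    """Euclidean (L2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan (L1) distance; an assumed example of an alternative metric."""
    return sum(abs(a - b) for a, b in zip(p, q))


def centroid(members):
    """Coordinate-wise mean of the members; need not be a data point."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def medoid(members, dist):
    """The member with the smallest total distance to all members."""
    return min(members, key=lambda m: sum(dist(m, other) for other in members))


# Illustrative cluster (an assumption) with one outlying point.
cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
print(centroid(cluster))
print(medoid(cluster, manhattan))
print(medoid(cluster, euclidean))
```

The centroid is pulled toward the outlying point and is not itself a member of the cluster, whereas the medoid is always an actual data point and can be computed under any distance function.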
