CEC Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award


Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Computer Information Systems (DCIS)


Graduate School of Computer and Information Sciences


Sumitra Mukherjee

Committee Member

Michael Laszlo

Committee Member

Francisco Mitropoulos


Given an undirected edge-weighted graph G and a positive integer m, the Constrained Forest Problem (CFP) seeks a lowest-cost (or minimum-weight) forest F which spans G while satisfying the requirement that each tree in F contain at least m vertices. This problem has been shown to be NP-hard for values of m greater than three, giving rise to a number of approximation strategies for finding reasonable m-forest solutions. This research presents a new genetic algorithm (GA) which can consistently find equal-or-better solutions to the problem when compared to non-genetic alternatives. This GA is unique in that it uses chromosomes which are actual candidate solutions (m-forests) and performs genetic operations (random creation, selection, recombination, and mutation) on these candidate solutions. Experiments were run using 180 different GA configurations on 50 benchmark graphs to determine which operators and techniques would be most successful in solving the m-forest problem. The "heaviest edge first" or HEF algorithm run against the minimum spanning tree (MST) of a graph was used as a performance metric. Previously, the HEF(MST) algorithm had been shown to produce the best results on m-forest problems. When the GA was able to find better results than HEF(MST) on the same problem instance, this was considered a GA success. Since the GA's initial population included heuristic candidate solutions such as HEF(MST), the GA never did worse than the best of these. GA solution quality did vary, however, often finding several different better-than-HEF(MST) solutions, illustrating the stochastic nature of the process. Based on data collected from the 9000 initial problem instances, several factors were shown to significantly improve the quality of the GA solution. Problem instances which did not include mutation had a much lower success rate than those which did. Adding calculated heuristic solutions such as HEF(MST) to the initial population allowed the GA to converge more quickly and improved its likelihood of finding better-than-HEF(MST) solutions. Building an initial population using randomly-generated candidate solutions whose edges were restricted to the problem graph's MST proved equally successful. GA configuration options were analyzed using all 9000 test cases and again using only those 403 cases in which the GA was able to find the very best solution for each graph. These analyses were consistent, and resulted in the identification of a single "best" GA configuration which combined the best overall initial population strategy, random seeding algorithms, mutation and crossover strategy. The selected configuration was then further tested using various values of m to ensure that the resulting GA could in fact find better-than-HEF(MST) solutions for the majority of problem instances.

To access this thesis/dissertation you must have a valid nova.edu OR mynsu.nova.edu email address and create an account for NSUWorks.

  Contact Author

  Link to NovaCat