CEC Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award

2011

Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Information Systems (DISS)

Department

Graduate School of Computer and Information Sciences

Advisor

Sumitra Mukherjee

Committee Member

Michael J. Lazlo

Committee Member

Gregory E Simco

Abstract

Microaggregation is a method of statistical disclosure control that attempts to reconcile the need to release information to researchers with the need to protect privacy of individual records in a dataset. Under microaggregation, records are divided into groups containing at least k members. Actual data values of the members are replaced by the mean value of the group, such that each record in the group is indistinguishable from at least k-1 other records. The goal of microaggregation is to create groups of similar records such that information loss is minimized, where information loss is the sum squared deviation between the actual data values and the group means.

Optimal multivariate microaggregation is an NP-hard problem, and heuristics have been proposed to generate solutions in reasonable running time. New heuristics are desirable for either producing groups with lower information loss, or for producing groups with similar information loss but lower computational complexity. Some of the best performing existing microaggregation heuristics are based on record ordering, since it has been proven that for a given ordering of records, the optimal set of groups for that particular ordering can be efficiently computed.

This dissertation improves on previous heuristics that order records in a dataset and subsequently use this record ordering to generate high quality microaggregated k- partitions. This was accomplished by using heuristics from the traveling salesman problem (TSP) literature in order to more effectively order the records. In particular, two tour construction heuristics - the Greedy heuristic and the Quick Boruvka heuristic - that are comparable in complexity to extant microaggregation methods were investigated. Next, three tour improvement heuristics - 2-opt, 3-opt, and Lin-Kernighan - were used on the tours constructed to investigate whether further reduction in information loss could be achieved. The tour improvement heuristics - particularly the 3-opt and Lin-Kernighan heuristics - provided microaggregation solutions better than the best previous known solutions across several datasets and values of k.

To access this thesis/dissertation you must have a valid nova.edu OR mynsu.nova.edu email address and create an account for NSUWorks.

  Contact Author

  Link to NovaCat

Share

COinS