Campus Access Only
All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.
Date of Award
Dissertation - NSU Access Only
Doctor of Philosophy in Computer Science (CISD)
Graduate School of Computer and Information Sciences
Management of data with a time dimension increases the overhead of storage and query processing in large database applications especially with the join operation, which is a commonly used and expensive relational operator whose processing is dependent on the size of the input relations. An index-based approach has been shown to improve the processing of a join operation, which in turn, improves the performance of querying historical data. Temporal data consist of tuples associated with a time interval value having a valid life span of different lengths. With join processing on temporal data, since tuples with longer life spans tend to overlap a greater number of joining tuples, they are likely to be accessed more often. The efficient performance of a temporal join depending on index-clustered data is the main theme studied and researched in this work. The presence of intervals having an extended data range in temporal data makes the join evaluation harder because temporal data are intrinsically multidimensional.
Some temporal join processing methods create duplicates of tuples with long life spans to achieve clustering of similar data, which improves the performance on tuples that tend to be accessed more frequently. The proposed Hilbert-Temporal Join (Hilbert-TJ) join algorithm overcomes the need of data duplication by mapping temporal data into Hilbert curve space that is inherently clustered, thus allowing for fast retrieval and storage. A balanced B+ tree index structure was implemented to manage and query the data. The query method identifies data pages containing matching tuples that intersect a multidimensional region. Given that data pages consist of contiguously mapped points on the curve, the query process successively traverses along the curve to determine the next page that intersects the query region by iteratively partitioning the data space. The proposed Adaptive Replacement Cache-Temporal Data (ARC-TD) buffer replacement policy is built upon the Adaptive Replacement Cache (ARC) policy by favoring the cache retention of data pages in proportion to the average life span of the tuples in the buffer. By giving preference to tuples having long life spans, a higher cache hit ratio was evident. The caching priority is also balanced between recently and frequently accessed data.
An evaluation and comparison study of the proposed Hilbert-TJ algorithm determined the relative performance with respect to a nested-loop join, a sort-merge join, and a partition-based join algorithm that use a multiversion B+ tree (MVBT) index. The metrics are based on a comparison between the processing time (disk I/O time plus CPU time), cache hit ratio, and index storage size needed to perform the temporal join. The study was conducted with comparisons in terms of the Least Recently Used (LRU), Least Frequently Used (LFU), ARC, and the new ARC-TD buffer replacement policy. Under the given conditions, the expected outcome was that by reducing data redundancy and considering the longevity of frequently accessed temporal data, better performance was achieved. Additionally, the Hilbert-TJ algorithm offers support to both valid-time and transaction-time data.
Jaime Antonio Raigoza. 2013. Temporal Join Processing with Hilbert Curve Space Mapping. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (280)