CCE Theses and Dissertations

Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of Nova Southeastern University. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Temporal Join Processing with Hilbert Curve Space Mapping

Jaime Antonio Raigoza, Nova Southeastern UniversityFollow

Date of Award

2013

Document Type

Dissertation - NSU Access Only

Degree Name

Doctor of Philosophy in Computer Science (CISD)

Department

Graduate School of Computer and Information Sciences

Advisor

Junping Sun

Committee Member

Peixiang Liu

Committee Member

Francisco Mitropoulos

Keywords

arc, Hilbert, join, optimization, query, temporal

Abstract

Management of data with a time dimension increases the overhead of storage and query processing in large database applications especially with the join operation, which is a commonly used and expensive relational operator whose processing is dependent on the size of the input relations. An index-based approach has been shown to improve the processing of a join operation, which in turn, improves the performance of querying historical data. Temporal data consist of tuples associated with a time interval value having a valid life span of different lengths. With join processing on temporal data, since tuples with longer life spans tend to overlap a greater number of joining tuples, they are likely to be accessed more often. The efficient performance of a temporal join depending on index-clustered data is the main theme studied and researched in this work. The presence of intervals having an extended data range in temporal data makes the join evaluation harder because temporal data are intrinsically multidimensional.

Some temporal join processing methods create duplicates of tuples with long life spans to achieve clustering of similar data, which improves the performance on tuples that tend to be accessed more frequently. The proposed Hilbert-Temporal Join (Hilbert-TJ) join algorithm overcomes the need of data duplication by mapping temporal data into Hilbert curve space that is inherently clustered, thus allowing for fast retrieval and storage. A balanced B+ tree index structure was implemented to manage and query the data. The query method identifies data pages containing matching tuples that intersect a multidimensional region. Given that data pages consist of contiguously mapped points on the curve, the query process successively traverses along the curve to determine the next page that intersects the query region by iteratively partitioning the data space. The proposed Adaptive Replacement Cache-Temporal Data (ARC-TD) buffer replacement policy is built upon the Adaptive Replacement Cache (ARC) policy by favoring the cache retention of data pages in proportion to the average life span of the tuples in the buffer. By giving preference to tuples having long life spans, a higher cache hit ratio was evident. The caching priority is also balanced between recently and frequently accessed data.

An evaluation and comparison study of the proposed Hilbert-TJ algorithm determined the relative performance with respect to a nested-loop join, a sort-merge join, and a partition-based join algorithm that use a multiversion B+ tree (MVBT) index. The metrics are based on a comparison between the processing time (disk I/O time plus CPU time), cache hit ratio, and index storage size needed to perform the temporal join. The study was conducted with comparisons in terms of the Least Recently Used (LRU), Least Frequently Used (LFU), ARC, and the new ARC-TD buffer replacement policy. Under the given conditions, the expected outcome was that by reducing data redundancy and considering the longevity of frequently accessed temporal data, better performance was achieved. Additionally, the Hilbert-TJ algorithm offers support to both valid-time and transaction-time data.

NSUWorks Citation

Jaime Antonio Raigoza. 2013. Temporal Join Processing with Hilbert Curve Space Mapping. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (280)
https://nsuworks.nova.edu/gscis_etd/280.

Download

To access this thesis/dissertation you must have a valid nova.edu OR mynsu.nova.edu email address and create an account for NSUWorks.

Free My Thesis

If you are the author of this work and would like to grant permission to make it openly accessible to all, please click the Free My Thesis button.

Contact Author

COinS

CCE Theses and Dissertations

Campus Access Only

Temporal Join Processing with Hilbert Curve Space Mapping

Date of Award

Document Type

Degree Name

Department

Advisor

Committee Member

Committee Member

Keywords

Abstract

NSUWorks Citation

Browse

Author Corner

Links

Connect with NSU

CCE Theses and Dissertations

Campus Access Only

Temporal Join Processing with Hilbert Curve Space Mapping

Author

Date of Award

Document Type

Degree Name

Department

Advisor

Committee Member

Committee Member

Keywords

Abstract

NSUWorks Citation

Share

Browse

Author Corner

Links

Connect with NSU