CCE Theses and Dissertations

Performance of Classification Tools on Unstructured Text

Janet L. Kourik, Nova Southeastern University

Date of Award

2005

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Graduate School of Computer and Information Sciences

Advisor

Sumitra Mukherjee

Committee Member

Marlyn Kemper Littman

Committee Member

Maxine S. Cohen

Abstract

As digital storage of data continues to grow, it is increasingly difficult to find information on demand, particularly in unstructured text documents. Unstructured documents lack explicit record definitions or other metadata that can facilitate retrieval. Yet digital information is increasingly stored in unstructured documents. Manual or human-assisted indexing of unstructured documents is time-consuming and expensive. Automated retrieval techniques, such as those used by Internet search engines, have a variety of limitations, including depth and breadth of coverage and frequency of update. In addition, many retrieval methods become impractical on large document collections, where the need for improved performance is even greater. Most text indexing and retrieval systems include a component that classifies documents. This research focused on the automated classification of unstructured text. Its goal was to investigate a commercial classification tool and evaluate that tool's performance on Reuters-21578, a benchmark categorization collection of unstructured text. The performance of a commercial off-the-shelf (COTS) product, Oracle Text, on the Reuters-21578 collection was evaluated using a variety of measures documented in the classification literature.
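The abstract does not name the specific measures used. Precision, recall, and the F1 score, reported as both micro- and macro-averages over categories, are common measures for multi-label collections such as Reuters-21578, and they are assumed here for illustration only. The sketch below is not the dissertation's evaluation procedure. The document label sets are hypothetical toy data that borrow a few Reuters-style category names, and the function names are illustrative. One convention choice is also an assumption: a category with no true or predicted positives scores an F1 of 0 in the macro-average.

```python
from typing import Dict, List, Set, Tuple


def per_category_counts(
    gold: List[Set[str]], predicted: List[Set[str]], categories: Set[str]
) -> Dict[str, Tuple[int, int, int]]:
    """Return (true positives, false positives, false negatives) for each category."""
    counts = {c: [0, 0, 0] for c in categories}
    for g, p in zip(gold, predicted):
        for c in categories:
            if c in p and c in g:
                counts[c][0] += 1  # assigned and correct
            elif c in p:
                counts[c][1] += 1  # assigned but wrong
            elif c in g:
                counts[c][2] += 1  # missed
    return {c: (v[0], v[1], v[2]) for c, v in counts.items()}


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1; an empty denominator yields 0.0 (assumed convention)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_macro_f1(
    gold: List[Set[str]], predicted: List[Set[str]], categories: Set[str]
) -> Tuple[float, float]:
    counts = per_category_counts(gold, predicted, categories)
    # Micro-average: pool the counts over all categories, then compute F1 once,
    # so frequent categories dominate the score.
    tp = sum(v[0] for v in counts.values())
    fp = sum(v[1] for v in counts.values())
    fn = sum(v[2] for v in counts.values())
    micro = prf(tp, fp, fn)[2]
    # Macro-average: compute F1 per category, then take the unweighted mean,
    # so rare categories count as much as frequent ones.
    macro = sum(prf(*v)[2] for v in counts.values()) / len(counts)
    return micro, macro


# Hypothetical toy data: one set of category labels per document.
gold = [{"earn"}, {"acq"}, {"earn", "trade"}, {"grain"}]
predicted = [{"earn"}, {"earn"}, {"earn"}, {"grain", "trade"}]
categories = {"earn", "acq", "trade", "grain"}

micro, macro = micro_macro_f1(gold, predicted, categories)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")  # prints: micro-F1=0.600 macro-F1=0.450
```

The gap between the two averages in this toy example shows why both are usually reported. The frequent category ("earn") lifts the micro-average, while the missed rare categories ("acq", "trade") pull the macro-average down.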

NSUWorks Citation

Janet L. Kourik. 2005. Performance of Classification Tools on Unstructured Text. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (645)
https://nsuworks.nova.edu/gscis_etd/645.

This document is currently not available here.
