Database Selection in Distributed Information Retrieval: A Study of Multi-Collection Information Retrieval

Powell, Allison L
Format
Thesis/Dissertation; Online
Author
Powell, Allison L
Advisor
French, James C
Abstract
The proliferation of online information resources increases the importance of effective and efficient information retrieval in a multi-collection environment. Multi-collection searching includes distributed searching as a special case but is more broadly defined here to incorporate searching partitioned content independently from its physical storage. It is cast in three parts: collection selection (also referred to as database selection) - decide where a query should be sent; query processing - execute the query at each selected collection; and results merging - combine the results from individual collections into a single coherent list for the searcher. We focus our attention on collection selection. We compare a number of different collection selection approaches and examine the effect of collection selection on document retrieval performance. We consider multi-collection retrieval in six different test environments utilizing three document test beds. Considering collection selection in isolation, we find that effective collection selection can be achieved using limited information about each collection. We then turn our attention from selection alone to data item retrieval in a multi-collection environment, considering retrieval performance in the same six test environments. First, we find that good collection selection has the potential to result in better retrieval effectiveness than can be achieved in an equivalent single collection. Second, we find that good performance can be achieved when only a few collections are selected and that the performance generally increases as more collections are selected. Finally, we find that when collection selection is employed, it may not be necessary to maintain collection-wide information (CWI), e.g., global idf. Local information can be used to achieve equivalent performance. This means that multi-collection systems can be engineered with more autonomy and less cooperation.
This work demonstrates that improvements in collection selection can lead to broader improvements in document retrieval performance.
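The three-part decomposition described in the abstract (collection selection, query processing, results merging) can be illustrated with a small sketch. This is not the thesis's method: the selection score (summed query-term frequency), the smoothed idf formula, the raw-score merge, and every function name and sample collection below are assumptions chosen only to make the pipeline concrete. Each collection computes idf from its own documents, standing in for the "local information" the abstract contrasts with collection-wide information.

```python
import math
from collections import Counter

def local_idf(term, docs):
    """Smoothed idf computed from a single collection only (assumed formula)."""
    df = sum(1 for d in docs if term in d)
    return math.log((len(docs) + 1) / (df + 1)) + 1.0

def select_collections(query, collections, k):
    """Step 1, collection selection: keep the k collections with the
    highest total query-term frequency (a toy score, not from the thesis)."""
    def score(name):
        return sum(Counter(d)[t] for d in collections[name] for t in query)
    ranked = sorted(collections, key=lambda n: (-score(n), n))
    return ranked[:k]

def search_collection(query, docs):
    """Step 2, query processing: score documents with tf * local idf."""
    results = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        s = sum(tf[t] * local_idf(t, docs) for t in query)
        if s > 0:
            results.append((s, i))
    return results

def merge_results(per_collection):
    """Step 3, results merging: interleave by raw score (simplest possible merge)."""
    merged = [(s, name, i)
              for name, res in per_collection.items()
              for s, i in res]
    return sorted(merged, key=lambda r: (-r[0], r[1], r[2]))

def multi_collection_search(query, collections, k):
    selected = select_collections(query, collections, k)
    per_collection = {n: search_collection(query, collections[n]) for n in selected}
    return selected, merge_results(per_collection)

# Hypothetical collections: each document is a list of tokens.
collections = {
    "cs": [["distributed", "retrieval", "systems"],
           ["database", "selection", "retrieval"]],
    "bio": [["protein", "folding"], ["gene", "database"]],
    "news": [["election", "results"], ["weather", "report"]],
}
selected, merged = multi_collection_search(["database", "retrieval"], collections, k=2)
```

With k=2 the "news" collection is never queried, which is the efficiency argument for selecting only a few collections. Merging by raw score assumes locally computed scores are comparable across collections, which is a simplification made for this sketch.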
Published
University of Virginia, Department of Computer Science, PhD, 2001
Published Date
2001-01-31
Degree
PhD
Collection
Libra ETD Repository
In Copyright
