Item Details

Using N-Grams to Process Hindi Queries With Transliteration Variations

Natrajan, Anand; Powell, Allison; French, James
Format
Report
Author
Natrajan, Anand
Powell, Allison
French, James
Abstract
Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or differences in transliteration can defeat word-based systems. N-gram systems are more resistant to these problems. We present a retrieval system based on N-grams that uses a collection of Hindi songs. Within this retrieval system, we study the effect of varying N on retrievability. Additionally, we present an alternative spell-checking tool based on N-grams. We conclude with a discussion of the number of N-grams produced by different values of N for different languages and a discussion of the choice of N.
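Illustrative note (not part of the report): the idea in the abstract, that matching on portions of words tolerates transliteration differences, can be sketched roughly as below. The function names, the space padding, the Dice coefficient, and the sample spellings are all assumptions chosen for illustration; the report's own matching method is not described in this record.

```python
def char_ngrams(word, n):
    """Return the set of character n-grams of `word`.

    The word is lowercased and padded with one space on each side
    (an illustrative choice) so that word boundaries contribute n-grams.
    """
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over the two words' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Two hypothetical romanizations of the same Hindi word share most bigrams,
# although an exact word match between them fails.
variant_score = dice_similarity("deewana", "diwana")
unrelated_score = dice_similarity("deewana", "pyar")
```

Changing `n` changes how strict the comparison is: larger values demand longer shared substrings, which is the trade-off the abstract refers to when it mentions varying N.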
Language
English
Date Received
2012-10-29
Published
University of Virginia, Department of Computer Science, 1997
Published Date
1997
Collection
Libra Open Repository
In Copyright