Brill Tagging on the Micron Automata Processor

Zhou, Qi
Thesis/Dissertation; Online
Brown, Donald
Natural Language Processing (NLP) is increasingly important because it enables human-machine interaction, the extraction of insights from text documents and other unstructured data, machine translation, and more. The NLP pipeline involves many tasks. Part-of-speech (POS) tagging is one of them: it assigns each input token a tag such as noun, verb, adjective, or adverb. Various techniques have been developed for this task, and Brill tagging is a classic rule-based algorithm for it. However, the traditional CPU implementation of the tagger is inherently slow. In this work, we take advantage of several existing hardware platforms as well as the Micron Automata Processor (AP), a new computing architecture that performs massive pattern matching in parallel, and implement the second stage of Brill tagging as template matching. The direct implementation is tested on a subset of the Brown Corpus using 218 contextual rules, and the results show a significant speed-up for the second-stage tagger. To illustrate the broader utility of hardware acceleration for other NLP tasks, the 218 contextual rules are then converted into regular expressions (regex), which are more widely used across NLP applications, and their matching performance is compared across single-threaded and multi-threaded implementations on the CPU and Xeon Phi and on the AP. The results show a promising performance improvement from using the AP as a regex accelerator. This work serves as a guide to using different accelerators for computational linguistic tasks, particularly those that rely on rule-based or pattern-matching approaches, including regex matching.
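In Brill tagging, a first stage gives each word an initial tag (typically its most frequent tag), and the second stage applies an ordered list of learned contextual rules that rewrite tags based on their neighbors. The sketch below is a minimal illustration, assuming only one of Brill's contextual templates (change tag a to tag b when the preceding word is tagged z) and using the commonly cited rule "NN becomes VB after TO". It applies that rule two ways: by direct template matching over the tag list, and as a regex over a space-delimited tag string. The names `PrevTagRule`, `apply_rule`, `rule_to_regex`, `apply_rule_regex`, and `brill_second_stage` are hypothetical; the thesis's 218 rules cover more templates, and its CPU, Xeon Phi, and AP implementations are not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PrevTagRule:
    """Hypothetical encoding of one contextual rule of the previous-tag form:
    change `from_tag` to `to_tag` when the preceding token is tagged `prev_tag`."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule(tags: list[str], rule: PrevTagRule) -> list[str]:
    """Direct template matching: test every position against the rule.

    Conditions are checked against the tags as they stood before this rule
    ran, so one rewrite never triggers another within the same pass."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def rule_to_regex(rule: PrevTagRule) -> re.Pattern:
    """Translate the rule into a regex over a space-delimited tag string.

    The fixed-width lookbehind pins the previous tag, and the lookahead
    requires a delimiter so "NN" does not match the front of "NNS"."""
    return re.compile(
        r"(?<= " + re.escape(rule.prev_tag) + r" )"
        + re.escape(rule.from_tag)
        + r"(?= )"
    )


def apply_rule_regex(tags: list[str], rule: PrevTagRule) -> list[str]:
    """Regex formulation of the same rule. re.sub matches against the
    original string, which mirrors the semantics of apply_rule."""
    padded = " " + " ".join(tags) + " "
    rewritten = rule_to_regex(rule).sub(lambda _m: rule.to_tag, padded)
    return rewritten.split()


def brill_second_stage(tags: list[str], rules: list[PrevTagRule]) -> list[str]:
    """Apply an ordered list of rules, each over the whole tag sequence."""
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


if __name__ == "__main__":
    # Illustrative first-stage tags for "I want to race": "race" was
    # initially tagged with its most frequent tag, NN.
    initial = ["PRP", "VBP", "TO", "NN"]
    rule = PrevTagRule(from_tag="NN", to_tag="VB", prev_tag="TO")
    print(apply_rule(initial, rule))        # ['PRP', 'VBP', 'TO', 'VB']
    print(apply_rule_regex(initial, rule))  # ['PRP', 'VBP', 'TO', 'VB']
```

Serializing the tag sequence into one delimited string is what lets a generic regex engine evaluate a contextual rule, which is the kind of conversion the abstract describes before comparing regex matching across platforms. This sketch does not model how rules are mapped onto AP automata.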
University of Virginia, Department of Systems Engineering, MS (Master of Science), 2015
Libra ETD Repository
In Copyright