Item Details

Maximal Information From Local Alignments

Mills, Lauren
Format
Thesis/Dissertation; Online
Author
Mills, Lauren
Advisor
Pearson, William
Abstract
Accurate identification of homologs, through sequence similarity searching programs like BLAST, is central to converting genome sequence data to biological knowledge. BLAST, FASTA, and other widely used search programs use local alignments to identify homologous sequences based on shared domains, but the boundaries of local alignments reflect both the signal from homology and the intrinsic properties of the alignment scoring matrix. Reliable identification of homologous domains requires sensitive alignment methods, accurate statistical estimates, and accurate alignment boundaries. Matrices that produce sensitive searches can also produce inaccurate alignments for more closely related homologs. Past improvements in search strategies focused on search sensitivity and statistical accuracy, but largely ignored boundary accuracy. Homologous overextension, a boundary error that occurs when two homologous domains are aligned but the alignment extends beyond the ends of the domains, can propagate inaccurate functional predictions and contaminate models used in more sensitive similarity searches. In this thesis, I discuss the theoretical and empirical basis for homologous overextension. In Chapter 1, I outline the properties of local similarity scoring matrices that can produce alignment overextension. In Chapter 2, I show that overextension occurs in 8% of alignments in comprehensive searches, increasing to 10% for the 100 most similar alignments. About half of this overextension occurs because of a mismatch between the alignment identity of the homologous domain and the target identity of the scoring matrix used in the initial alignment, and more than 85% of this high-identity alignment overextension can be corrected by shifting to the appropriate scoring matrix. In Chapter 3, I consider alignment overextension in other contexts and summarize additional strategies for identifying overextension.
Alignment accuracy is central to effectively exploiting our growing knowledge about structure-function relationships, active sites, and variant phenotypes. Future characterizations of alignment methods should examine both internal and alignment boundary accuracy.
Language
English
Published
University of Virginia, Department of Molecular, Cell and Developmental Biology, PHD (Doctor of Philosophy), 2013
Published Date
2013-11-19
Degree
PHD (Doctor of Philosophy)
Collection
Libra ETD Repository
In Copyright