Item Details

Selection of Distance Metrics and Feature Subsets for K-Nearest Neighbor Classifiers

Format
Thesis/Dissertation; Online
Author
Barker, Allen L.
Advisor
Martin, Worthy
Abstract
The k-nearest neighbor (kNN) classifier is a popular and effective method for associating a feature vector with a unique element in a known, finite set of classes. A common choice for the distance metric used in kNN classification is the quadratic distance Q(x, A, y) = (x - y)'A(x - y), where x and y are n-vectors of features, A is a symmetric n x n matrix, and the prime denotes transpose. For finite sets of training samples, the choice of the matrix A is important in optimizing classifier performance. We show that A can be approximately optimized via gradient descent on a sigmoidally smoothed estimate of the classifier's probability of error. We describe an algorithm for performing the metric selection and compare the performance of our method with that of other methods. We demonstrate that adding noise during the descent process can reduce the effects of overfitting. We further suggest how feature subset selection can be treated as a special case of this metric selection problem.
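The following Python/NumPy sketch is purely illustrative of the ideas summarized in the abstract: the quadratic distance Q(x, A, y) = (x - y)'A(x - y), a kNN vote under that distance, and gradient descent with added noise on a sigmoid-smoothed leave-one-out 1-NN error surrogate. The function names, the exact form of the smoothed error estimate, and the finite-difference gradient are assumptions made for illustration, not the specific algorithm developed in the thesis.

import numpy as np

def quadratic_distance(x, y, A):
    """Q(x, A, y) = (x - y)' A (x - y) for a symmetric n x n matrix A."""
    d = x - y
    return d @ A @ d

def knn_classify(query, X, y, A, k=3):
    """Majority vote among the k training samples nearest to `query`
    under the quadratic distance induced by A."""
    dists = np.array([quadratic_distance(query, xi, A) for xi in X])
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def smoothed_error(A, X, y, beta=5.0):
    """Illustrative sigmoid-smoothed leave-one-out 1-NN error estimate.
    For each sample, compare the distance to its nearest same-class
    neighbor with the distance to its nearest other-class neighbor; a
    sigmoid of that margin is a differentiable surrogate for 0/1 error."""
    total = 0.0
    for i in range(len(X)):
        d = np.array([quadratic_distance(X[i], X[j], A) if j != i else np.inf
                      for j in range(len(X))])
        d_same = np.min(d[(y == y[i]) & (d < np.inf)])
        d_diff = np.min(d[y != y[i]])
        margin = np.clip(beta * (d_same - d_diff), -50.0, 50.0)
        total += 1.0 / (1.0 + np.exp(-margin))
    return total / len(X)

def descend_metric(X, y, steps=50, lr=0.1, noise=0.01, beta=5.0, seed=0):
    """Approximate gradient descent on the smoothed error, using central
    finite differences over the entries of A; Gaussian perturbations are
    added to each update as a stand-in for the noise injection the
    abstract describes for reducing overfitting."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    A = np.eye(n)
    eps = 1e-4
    for _ in range(steps):
        grad = np.zeros_like(A)
        for r in range(n):
            for c in range(n):
                E = np.zeros_like(A)
                E[r, c] = eps
                grad[r, c] = (smoothed_error(A + E, X, y, beta) -
                              smoothed_error(A - E, X, y, beta)) / (2 * eps)
        A = A - lr * grad + noise * rng.standard_normal(A.shape)
        A = (A + A.T) / 2  # keep A symmetric after each update
    return A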
Date Received
2011-12-13
Published
University of Virginia, Department of Computer Science, PHD (Doctor of Philosophy), 1997
Published Date
1997-05-30
Degree
PHD (Doctor of Philosophy)
Collection
Libra ETD Repository
Rights
In Copyright
