Nearest neighbor search (NNS) is a technique used in computing to quickly and accurately locate the data point closest to a given query point, where closeness is measured by Euclidean distance. Increasingly, NNS is becoming a key sub-task in many algorithms and applications used to process, organize, cluster, learn from, and understand massive data sets, such as those used in the automotive, aerospace, and geographic information system (GIS) industries.
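To make the problem concrete, the sketch below shows NNS in its simplest, brute-force form: scan every point and keep the closest one. This is an illustrative example only, not code from any particular library; the names (`dist2`, `nearest`) are ours.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// A point in d-dimensional Euclidean space.
using Point = std::vector<double>;

// Squared Euclidean distance; the square root is unnecessary for comparisons.
double dist2(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Exact nearest neighbor by exhaustive scan: O(n * d) work per query,
// which is what makes exact NNS impractical on massive data sets.
std::size_t nearest(const std::vector<Point>& data, const Point& query) {
    std::size_t best = 0;
    double bestD = dist2(data[0], query);
    for (std::size_t i = 1; i < data.size(); ++i) {
        double d = dist2(data[i], query);
        if (d < bestD) { bestD = d; best = i; }
    }
    return best;
}

int main() {
    std::vector<Point> data = {{0.0, 0.0}, {1.0, 1.0}, {3.0, 4.0}};
    Point query = {0.9, 1.2};
    std::cout << "nearest index: " << nearest(data, query) << "\n";  // prints 1
}
```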
The Problem:
Exact NNS works well for small data sets, but its query times become prohibitive on large ones. The approximate nearest neighbor (ANN) search, an alternative to NNS, improves search time and saves memory by estimating the nearest neighbor, without guaranteeing that the actual nearest neighbor will be returned in every case. Two limitations of this approach are that ANN algorithms are difficult to make dynamic (i.e., to allow insertions and deletions in the data structure) and difficult to parallelize (i.e., to use multiple processors to speed up queries).
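The quality of an approximate answer is commonly formalized with an approximation factor ε (this is, for example, the formulation behind Mount's ANN library): a returned point is acceptable if its distance to the query is within a factor of (1 + ε) of the true nearest neighbor's distance. A minimal check of that guarantee, with illustrative names of our own:

```cpp
// eps-ANN guarantee: a candidate answer is acceptable if its distance to the
// query is within a (1 + eps) factor of the true nearest neighbor's distance.
// Setting eps = 0 recovers exact NNS; larger eps trades accuracy for speed.
bool withinEpsGuarantee(double candidateDist, double trueNearestDist, double eps) {
    return candidateDist <= (1.0 + eps) * trueNearestDist;
}
```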
The Solution:
Dr. Kumar and his research team are developing a novel, practical, and theoretically sound method for solving the NNS problem in low-dimensional spaces. Specifically, the researchers are creating an approximate k-nearest neighbor algorithm, based on Morton sorting of points, to build a software library for approximate nearest neighbor searches in Euclidean spaces. The library will use multicore machines efficiently (parallel) and allow points to be inserted and deleted at run time (dynamic). The new algorithm delivers search results in expected logarithmic query time, with performance competitive with or exceeding that of Mount's approximate nearest neighbor (ANN) search.
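The core idea behind Morton sorting can be sketched briefly. A Morton (Z-order) code interleaves the bits of a point's quantized coordinates, so sorting points by code lays them out along a space-filling curve on which nearby codes tend to correspond to nearby points; an approximate k-nearest neighbor candidate set can then be drawn from a window around the query's position in that order. The 2-D sketch below uses the standard bit-interleaving trick; it illustrates the underlying technique only, not the team's library, and all names are ours.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

// Spread the lower 16 bits of x so there is a zero bit between each
// (the standard "part 1 by 1" bit-interleaving trick).
std::uint32_t part1by1(std::uint32_t x) {
    x &= 0x0000ffff;
    x = (x | (x << 8)) & 0x00ff00ff;
    x = (x | (x << 4)) & 0x0f0f0f0f;
    x = (x | (x << 2)) & 0x33333333;
    x = (x | (x << 1)) & 0x55555555;
    return x;
}

// 2-D Morton (Z-order) code: interleave the bits of the quantized
// x and y coordinates.
std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return (part1by1(y) << 1) | part1by1(x);
}

struct Pt { std::uint32_t x, y; };

// Sorting by Morton code arranges the points along the Z-order curve.
void mortonSort(std::vector<Pt>& pts) {
    std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) {
        return morton2d(a.x, a.y) < morton2d(b.x, b.y);
    });
}

int main() {
    std::vector<Pt> pts = {{5, 9}, {1, 1}, {7, 2}, {1, 2}};
    mortonSort(pts);
    // (1, 1) and (1, 2) end up adjacent: spatial neighbors, neighboring codes.
    for (const Pt& p : pts)
        std::cout << "(" << p.x << ", " << p.y << ") code=" << morton2d(p.x, p.y) << "\n";
}
```

One appealing consequence of this layout, consistent with the dynamic and parallel goals described above, is that the data structure is essentially a sorted sequence: insertions and deletions reduce to ordered insertion and removal, and sorting itself parallelizes well across cores.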
Advantages:
- Fast parallel queries on multicore machines
- Support for minimum spanning tree computation