Informatics Seminar (Perspective in Informatics 4B) 2010 - 2011
October 8 (Fri), 16:30 - 18:00
- Place:
Lecture Hall 1, Faculty of Engineering Bldg. No.10, Main Campus
- Title:
Locality-sensitive analysis for rank-based similarity search
- Speaker:
Michael E. Houle (National Institute of Informatics)
- Abstract:
The similarity search problem, also known as nearest-neighbour search problem,
is defined as follows: given a collection of data objects, build a
data structure which, for a given query object, returns a list of data
objects most similar to the query.
Similarity search is an operation of fundamental importance in
many application areas, including data mining, information retrieval,
and multimedia databases.
The effectiveness of known methods for similarity search is
generally limited due a phenomenon known as the "curse of dimensionality",
whereby the query or preprocessing time (or both) depends exponentially
on the number of features used to represent the objects.
When the number of features is large (as is often the case in practical
applications), the query cost approaches that of a sequential search
through the entire data collection.
In recent years, a number of similarity search strategies
have been proposed in an attempt to mitigate the effects of the curse
of dimensionality, either by trading the exactness of the query result
for improvements in query processing time, or by relating query performance
to measures of the so-called "intrinsic" dimensionality of the data set,
or both.
In this presentation, we will survey several of these strategies
- locality sensitive hashing (LSH), the cover tree, and the SASH
hierarchical search structure - and introduce a new similarity search
structure, the rank cover tree (RCT), that blends the characteristics
of all three approaches.
Kyoto University
> Graduate School of Informatics
> International Courses
> Informatics Seminar