SlideShare une entreprise Scribd logo
1  sur  45
Télécharger pour lire hors ligne
Combinatorial Algorithms
   for Nearest Neighbors, Near-Duplicates
           and Small-World Design


                                 Yury Lifshits
                                Yahoo! Research

                            Shengyu Zhang
                  The Chinese University of Hong Kong



                                   SODA 2009

Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   1 / 29
Similarity Search: an Example
 Input: Set of objects
 Task: Preprocess it
Similarity Search: an Example
 Input: Set of objects
 Task: Preprocess it




              Query: New object
              Task: Find the most
              similar one in the dataset
Similarity Search: an Example
    Input: Set of objects
    Task: Preprocess it




                                                    Most            similar


                         Query: New object
                         Task: Find the most
                         similar one in the dataset
Yury Lifshits, Shengyu Zhang    ()Combinatorial Nearest Neighbors             2 / 29
Similarity Search


Search space: object domain U, similarity
function σ
Input: database S = {p1 , . . . , pn } ⊆ U
Query: q ∈ U
Task: find argmax σ(pi , q)




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   3 / 29
Nearest Neighbors in Theory
              Orchard’s Algorithm k-d-B tree
     Sphere Rectangle Tree

 Geometric near-neighbor access tree Excluded
   middle vantage point forest                              mvp-tree Fixed-height
                                                       Vantage-point
     fixed-queries tree AESA
     tree  LAESA R∗ -tree Burkhard-Keller tree BBD tree
    Navigating Nets Voronoi tree Balanced aspect ratio
                                                              M-tree
                      tree                      vps -tree
                                 Metric tree

         Locality-Sensitive Hashing                                        SS-tree
             R-tree Spatial approximation tree
                                                                       mb-tree Cover
    Multi-vantage point tree                        Bisector tree

tree                           Generalized hyperplane tree
            Hybrid tree                                                         Slim tree

                                                               k-d tree
                                                X-tree
  Spill Tree Fixed queries tree                                                 Balltree

                   Quadtree Octree                            Post-office tree
Yury Lifshits, Shengyu Zhang       ()Combinatorial Nearest Neighbors                   4 / 29
Revision: Basic Assumptions

In theory:
    Triangle inequality
    Doubling dimension is o(log n)




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   5 / 29
Revision: Basic Assumptions

In theory:
    Triangle inequality
    Doubling dimension is o(log n)


Typical web dataset has separation effect

                                                   1/ 2 ≤ d(pi , pj ) ≤ 1
            For almost all i, j :




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors            5 / 29
Revision: Basic Assumptions

In theory:
    Triangle inequality
    Doubling dimension is o(log n)


Typical web dataset has separation effect

                                                   1/ 2 ≤ d(pi , pj ) ≤ 1
            For almost all i, j :

Classic methods fail:
    Branch and bound algorithms visit every object
    Doubling dimension is at least log n/ 2


Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors            5 / 29
Contribution
Navin Goyal, YL, Hinrich Schütze, WSDM 2008:
         Combinatorial framework: new approach to data
         mining problems that does not require triangle
         inequality
         Nearest neighbor algorithm




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   6 / 29
Contribution
Navin Goyal, YL, Hinrich Schütze, WSDM 2008:
         Combinatorial framework: new approach to data
         mining problems that does not require triangle
         inequality
         Nearest neighbor algorithm

This work:
         Better nearest neighbor search
         Detecting near-duplicates
         Navigability design for small worlds


Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   6 / 29
Outline


       Combinatorial Framework
 1




       New Algorithms
 2




       Combinatorial Nets
 3




       Directions for Further Research
 4




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   7 / 29
1
Combinatorial Framework




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   8 / 29
Comparison Oracle


         Dataset p1 , . . . , pn

         Objects and distance (or similarity)
         function are NOT given

         Instead, there is a comparison oracle
         answering queries of the form:

                    Who is closer to A: B or C?



Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   9 / 29
Disorder Inequality
Sort all objects by their similarity to p:
                rankp (r)

    p                                                              s
                               r

                                      rankp (s)




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors       10 / 29
Disorder Inequality
Sort all objects by their similarity to p:
                rankp (r)

    p                                                                 s
                                  r

                                           rankp (s)


                                                  Then by similarity to r:
                               rankr (s)

                                                                  s
    r




Yury Lifshits, Shengyu Zhang      ()Combinatorial Nearest Neighbors       10 / 29
Disorder Inequality
Sort all objects by their similarity to p:
                rankp (r)

    p                                                                 s
                                  r

                                           rankp (s)


                                                  Then by similarity to r:
                               rankr (s)

                                                                  s
    r



Dataset has disorder D if
 ∀p, r, s : rankr (s) ≤ D(rankp (r) + rankp (s))
Yury Lifshits, Shengyu Zhang      ()Combinatorial Nearest Neighbors       10 / 29
Combinatorial Framework
                                              =
                    Comparison oracle
                  Who is closer to A: B or C?
                                              +
                  Disorder inequality
           rankr (s) ≤ D(rankp(r) + rankp(s))

Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   11 / 29
Combinatorial Framework: FAQ



         Disorder of a metric space? Disorder of
         Rk ?
         In what cases disorder is relatively small?

         Experimental values of D for some
         practical datasets?




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   12 / 29
Disorder vs. Others


         If expansion rate is c, disorder constant is
         at most c2

         Doubling dimension and disorder
         dimension are incomparable

         Disorder inequality implies combinatorial
         form of “doubling effect”



Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   13 / 29
Combinatorial Framework: Pro & Contra



Advantages:
         Does not require triangle inequality for distances
         Applicable to any data model and any similarity
         function
         Require only comparative training information




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   14 / 29
Combinatorial Framework: Pro & Contra



Advantages:
         Does not require triangle inequality for distances
         Applicable to any data model and any similarity
         function
         Require only comparative training information


Limitation: worst-case form of disorder inequality




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   14 / 29
2
New Algorithms




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   15 / 29
Nearest Neighbor Search

Assume S ∪ {q} has disorder constant D

Theorem
There is a deterministic and exact algorithm
for nearest neighbor search:
         Preprocessing: O(D7 n log2 n)
         Search: O(D4 log n)



Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   16 / 29
Near-Duplicates
Assume, comparison oracle can also tell us
whether σ(x, y) > T for some similarity
threshold T

Theorem
All pairs with over-T similarity can be found
deterministically in time

                        poly(D)(n log2 n + |Output|)




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   17 / 29
Visibility Graph

Theorem
For any dataset S with disorder D there exists
a visibility graph:
         poly(D)n log2 n construction time
         O(D4 log n) out-degrees
         Naïve greedy routing
         deterministically reaches
         exact nearest neighbor of the given target q
         in at most log n steps



Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   18 / 29
q
p1
p2




          q
p1
p2




          q
p1
p2


          p3


               q
p1
p2


          p3


               q
p1
p2


                                                                         p3

                                                                              p4
                                                                              q
                               p1


Yury Lifshits, Shengyu Zhang    ()Combinatorial Nearest Neighbors                  19 / 29
3
Combinatorial Nets




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   20 / 29
Combinatorial Ball

                         B(x, r) = {y : rankx (y) < r}

In other words, it is a subset of dataset S: the
object x itself and r − 1 its nearest neighbors




                                                             B(x, 10)
                                x



Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors        21 / 29
Combinatorial Net
A subset R ⊆ S is called a combinatorial
r-net iff the following two properties holds:
Covering: ∀y ∈ S, ∃x ∈ R, s.t. rankx (y) < r.
Separation: ∀xi , xj ∈ R, rankxi (xj ) ≥ r OR rankxj (xi ) ≥ r




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   22 / 29
Combinatorial Net
A subset R ⊆ S is called a combinatorial
r-net iff the following two properties holds:
Covering: ∀y ∈ S, ∃x ∈ R, s.t. rankx (y) < r.
Separation: ∀xi , xj ∈ R, rankxi (xj ) ≥ r OR rankxj (xi ) ≥ r




          How to construct a combinatorial net?
     What upper bound on its size can we guarantee?
Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   22 / 29
Basic Data Structure
Combinatorial nets:
                                                                   n
 For every 0 ≤ i ≤ log n, construct a                                 -net
                                                                   2i




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors             23 / 29
Basic Data Structure
Combinatorial nets:
                                                                   n
 For every 0 ≤ i ≤ log n, construct a                                 -net
                                                                   2i

Pointers, pointers, pointers:
         Direct & inverted indices: links between centers
         and members of their balls
         Cousin links: for every center keep pointers to
         close centers on the same level
         Navigation links: for every center keep pointers to
         close centers on the next level


Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors             23 / 29
Fast Net Construction




Theorem
Combinatorial nets can be constructed in
O(D7 n log2 n) time

Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   24 / 29
Up’n’Down Trick

Assume your have 2r-net for the dataset

To compute an r-ball around some object p:
         Take a center p of 2r ball that is covering p
    1



         Take all centers of 2r-balls nearby p
    2



         For all of them write down all members of theirs
    3

         2r-balls
         Sort all written objects with respect to p and keep r
    4

         most similar ones.



Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   25 / 29
4
Directions for Further Research




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   26 / 29
Future of Combinatorial Framework
         Other problems in combinatorial framework:
                  Low-distortion embeddings
                  Closest pairs
                  Community discovery
                  Linear arrangement
                  Distance labelling
                  Dimensionality reduction
         What if disorder inequality has exceptions?
         Insertions, deletions, changing metric
         Experiments & implementation
         Unification challenge: disorder + doubling = ?
Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   27 / 29
Summary
         Combinatorial framework:
         comparison oracle + disorder inequality
         New algorithms:
         Nearest neighbor search
         Deterministic detection of near-duplicates
         Navigability design




Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   28 / 29
Summary
         Combinatorial framework:
         comparison oracle + disorder inequality
         New algorithms:
         Nearest neighbor search
         Deterministic detection of near-duplicates
         Navigability design



          Thanks for your attention!
                 Questions?
Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors   28 / 29
Links
http://yury.name

http://simsearch.yury.name
Tutorial, bibliography, people, links, open problems


       Yury Lifshits and Shengyu Zhang
       Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates and
       Small-World Design
       http://yury.name/papers/lifshits2008similarity.pdf

       Navin Goyal, Yury Lifshits, Hinrich Schütze
       Disorder Inequality: A Combinatorial Approach to Nearest Neighbor Search
       http://yury.name/papers/goyal2008disorder.pdf

       Benjamin Hoffmann, Yury Lifshits, Dirk Novotka
       Maximal Intersection Queries in Randomized Graph Models
       http://yury.name/papers/hoffmann2007maximal.pdf

Yury Lifshits, Shengyu Zhang   ()Combinatorial Nearest Neighbors                  29 / 29

Contenu connexe

Similaire à Cobinatorial Algorithms for Nearest Neighbors, Near-Duplicates and Small World Design - Yury Lifshits - SODA 2009

Simulation Informatics; Analyzing Large Scientific Datasets
Simulation Informatics; Analyzing Large Scientific DatasetsSimulation Informatics; Analyzing Large Scientific Datasets
Simulation Informatics; Analyzing Large Scientific DatasetsDavid Gleich
 
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...Yandex
 
Class 11: Deeper List Procedures
Class 11: Deeper List ProceduresClass 11: Deeper List Procedures
Class 11: Deeper List ProceduresDavid Evans
 
Computing Local and Global Centrality
Computing Local and Global CentralityComputing Local and Global Centrality
Computing Local and Global CentralityDavid Gleich
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity ResolutionBenjamin Bengfort
 
Prolog Cpt114 - Week 2
Prolog Cpt114 - Week 2Prolog Cpt114 - Week 2
Prolog Cpt114 - Week 2a_akhavan
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreDavid Gleich
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisNYC Predictive Analytics
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2zukun
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
 
[ICDE 2012] On Top-k Structural Similarity Search
[ICDE 2012] On Top-k Structural Similarity Search[ICDE 2012] On Top-k Structural Similarity Search
[ICDE 2012] On Top-k Structural Similarity SearchPei Lee
 
Similarity Search in High Dimensions via Hashing
Similarity Search in High Dimensions via HashingSimilarity Search in High Dimensions via Hashing
Similarity Search in High Dimensions via HashingMaruf Aytekin
 
Chapter 10 link prediction
Chapter 10 link predictionChapter 10 link prediction
Chapter 10 link predictionAbanobZakaria1
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Christopher Morris
 

Similaire à Cobinatorial Algorithms for Nearest Neighbors, Near-Duplicates and Small World Design - Yury Lifshits - SODA 2009 (20)

Simulation Informatics; Analyzing Large Scientific Datasets
Simulation Informatics; Analyzing Large Scientific DatasetsSimulation Informatics; Analyzing Large Scientific Datasets
Simulation Informatics; Analyzing Large Scientific Datasets
 
Drugs and Electrons
Drugs and ElectronsDrugs and Electrons
Drugs and Electrons
 
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...
Nelly Litvak – Asymptotic behaviour of ranking algorithms in directed random ...
 
Class 11: Deeper List Procedures
Class 11: Deeper List ProceduresClass 11: Deeper List Procedures
Class 11: Deeper List Procedures
 
Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)Pertemuan 5_Relation Matriks_01 (17)
Pertemuan 5_Relation Matriks_01 (17)
 
Computing Local and Global Centrality
Computing Local and Global CentralityComputing Local and Global Centrality
Computing Local and Global Centrality
 
A Primer on Entity Resolution
A Primer on Entity ResolutionA Primer on Entity Resolution
A Primer on Entity Resolution
 
Prolog Cpt114 - Week 2
Prolog Cpt114 - Week 2Prolog Cpt114 - Week 2
Prolog Cpt114 - Week 2
 
Fast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and moreFast matrix primitives for ranking, link-prediction and more
Fast matrix primitives for ranking, link-prediction and more
 
Introduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic AnalysisIntroduction to Probabilistic Latent Semantic Analysis
Introduction to Probabilistic Latent Semantic Analysis
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2
 
Slides4
Slides4Slides4
Slides4
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Spectral graph theory
Spectral graph theorySpectral graph theory
Spectral graph theory
 
[ICDE 2012] On Top-k Structural Similarity Search
[ICDE 2012] On Top-k Structural Similarity Search[ICDE 2012] On Top-k Structural Similarity Search
[ICDE 2012] On Top-k Structural Similarity Search
 
Similarity Search in High Dimensions via Hashing
Similarity Search in High Dimensions via HashingSimilarity Search in High Dimensions via Hashing
Similarity Search in High Dimensions via Hashing
 
MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...
MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...
MUMS Opening Workshop - Emulators for models and Complexity Reduction - Akil ...
 
pres_coconat
pres_coconatpres_coconat
pres_coconat
 
Chapter 10 link prediction
Chapter 10 link predictionChapter 10 link prediction
Chapter 10 link prediction
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
 

Plus de Yury Lifshits

Osh — Curiosity Learning on Mobile
Osh — Curiosity Learning on MobileOsh — Curiosity Learning on Mobile
Osh — Curiosity Learning on MobileYury Lifshits
 
Maester — The First Platform for Lead Generation on Mobile
Maester — The First Platform for Lead Generation on MobileMaester — The First Platform for Lead Generation on Mobile
Maester — The First Platform for Lead Generation on MobileYury Lifshits
 
Earlydays: План построения успешной компании
Earlydays: План построения успешной компанииEarlydays: План построения успешной компании
Earlydays: План построения успешной компанииYury Lifshits
 
Екатерина Карпова: Опыт организации Ночи Музеев
Екатерина Карпова: Опыт организации Ночи МузеевЕкатерина Карпова: Опыт организации Ночи Музеев
Екатерина Карпова: Опыт организации Ночи МузеевYury Lifshits
 
Evolution of Two Sided Markets - Yury Lifshits - WSDM 2010
Evolution of  Two Sided Markets - Yury Lifshits - WSDM 2010Evolution of  Two Sided Markets - Yury Lifshits - WSDM 2010
Evolution of Two Sided Markets - Yury Lifshits - WSDM 2010Yury Lifshits
 
Interfaces of Attention: What if people will outsource management of their at...
Interfaces of Attention: What if people will outsource management of their at...Interfaces of Attention: What if people will outsource management of their at...
Interfaces of Attention: What if people will outsource management of their at...Yury Lifshits
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchYury Lifshits
 
Future of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! ResearchFuture of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! ResearchYury Lifshits
 
"Economics of Web Design" by Yury Lifshits
"Economics of Web Design" by Yury Lifshits"Economics of Web Design" by Yury Lifshits
"Economics of Web Design" by Yury LifshitsYury Lifshits
 
Social Design (Stanford version)
Social Design (Stanford version)Social Design (Stanford version)
Social Design (Stanford version)Yury Lifshits
 
The Architecture of the Web
The Architecture of the WebThe Architecture of the Web
The Architecture of the WebYury Lifshits
 
Reputation Systems I
Reputation Systems IReputation Systems I
Reputation Systems IYury Lifshits
 
Reputation Systems II
Reputation Systems IIReputation Systems II
Reputation Systems IIYury Lifshits
 
Business-Consumer Networks: Concept and Challenges
Business-Consumer Networks: Concept and ChallengesBusiness-Consumer Networks: Concept and Challenges
Business-Consumer Networks: Concept and ChallengesYury Lifshits
 
Business-Consumer Networks. Project Proposal by Yury Lifshits
Business-Consumer Networks. Project Proposal by Yury LifshitsBusiness-Consumer Networks. Project Proposal by Yury Lifshits
Business-Consumer Networks. Project Proposal by Yury LifshitsYury Lifshits
 

Plus de Yury Lifshits (17)

Osh — Curiosity Learning on Mobile
Osh — Curiosity Learning on MobileOsh — Curiosity Learning on Mobile
Osh — Curiosity Learning on Mobile
 
Maester — The First Platform for Lead Generation on Mobile
Maester — The First Platform for Lead Generation on MobileMaester — The First Platform for Lead Generation on Mobile
Maester — The First Platform for Lead Generation on Mobile
 
Earlydays: План построения успешной компании
Earlydays: План построения успешной компанииEarlydays: План построения успешной компании
Earlydays: План построения успешной компании
 
Екатерина Карпова: Опыт организации Ночи Музеев
Екатерина Карпова: Опыт организации Ночи МузеевЕкатерина Карпова: Опыт организации Ночи Музеев
Екатерина Карпова: Опыт организации Ночи Музеев
 
Evolution of Two Sided Markets - Yury Lifshits - WSDM 2010
Evolution of  Two Sided Markets - Yury Lifshits - WSDM 2010Evolution of  Two Sided Markets - Yury Lifshits - WSDM 2010
Evolution of Two Sided Markets - Yury Lifshits - WSDM 2010
 
New Advertising
New AdvertisingNew Advertising
New Advertising
 
Interfaces of Attention: What if people will outsource management of their at...
Interfaces of Attention: What if people will outsource management of their at...Interfaces of Attention: What if people will outsource management of their at...
Interfaces of Attention: What if people will outsource management of their at...
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! Research
 
Future of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! ResearchFuture of Search | Yury Lifshits, Yahoo! Research
Future of Search | Yury Lifshits, Yahoo! Research
 
"Economics of Web Design" by Yury Lifshits
"Economics of Web Design" by Yury Lifshits"Economics of Web Design" by Yury Lifshits
"Economics of Web Design" by Yury Lifshits
 
Social Design (Stanford version)
Social Design (Stanford version)Social Design (Stanford version)
Social Design (Stanford version)
 
Social Design
Social DesignSocial Design
Social Design
 
The Architecture of the Web
The Architecture of the WebThe Architecture of the Web
The Architecture of the Web
 
Reputation Systems I
Reputation Systems IReputation Systems I
Reputation Systems I
 
Reputation Systems II
Reputation Systems IIReputation Systems II
Reputation Systems II
 
Business-Consumer Networks: Concept and Challenges
Business-Consumer Networks: Concept and ChallengesBusiness-Consumer Networks: Concept and Challenges
Business-Consumer Networks: Concept and Challenges
 
Business-Consumer Networks. Project Proposal by Yury Lifshits
Business-Consumer Networks. Project Proposal by Yury LifshitsBusiness-Consumer Networks. Project Proposal by Yury Lifshits
Business-Consumer Networks. Project Proposal by Yury Lifshits
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Dernier (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Cobinatorial Algorithms for Nearest Neighbors, Near-Duplicates and Small World Design - Yury Lifshits - SODA 2009

  • 1. Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates and Small-World Design Yury Lifshits Yahoo! Research Shengyu Zhang The Chinese University of Hong Kong SODA 2009 Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 1 / 29
  • 2. Similarity Search: an Example Input: Set of objects Task: Preprocess it
  • 3. Similarity Search: an Example Input: Set of objects Task: Preprocess it Query: New object Task: Find the most similar one in the dataset
  • 4. Similarity Search: an Example Input: Set of objects Task: Preprocess it Most similar Query: New object Task: Find the most similar one in the dataset Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 2 / 29
  • 5. Similarity Search Search space: object domain U, similarity function σ Input: database S = {p1 , . . . , pn } ⊆ U Query: q ∈ U Task: find argmax σ(pi , q) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 3 / 29
  • 6. Nearest Neighbors in Theory Orchard’s Algorithm k-d-B tree Sphere Rectangle Tree Geometric near-neighbor access tree Excluded middle vantage point forest mvp-tree Fixed-height Vantage-point fixed-queries tree AESA tree LAESA R∗ -tree Burkhard-Keller tree BBD tree Navigating Nets Voronoi tree Balanced aspect ratio M-tree tree vps -tree Metric tree Locality-Sensitive Hashing SS-tree R-tree Spatial approximation tree mb-tree Cover Multi-vantage point tree Bisector tree tree Generalized hyperplane tree Hybrid tree Slim tree k-d tree X-tree Spill Tree Fixed queries tree Balltree Quadtree Octree Post-office tree Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 4 / 29
  • 7. Revision: Basic Assumptions In theory: Triangle inequality Doubling dimension is o(log n) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 5 / 29
  • 8. Revision: Basic Assumptions In theory: Triangle inequality Doubling dimension is o(log n) Typical web dataset has separation effect 1/ 2 ≤ d(pi , pj ) ≤ 1 For almost all i, j : Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 5 / 29
  • 9. Revision: Basic Assumptions In theory: Triangle inequality Doubling dimension is o(log n) Typical web dataset has separation effect 1/ 2 ≤ d(pi , pj ) ≤ 1 For almost all i, j : Classic methods fail: Branch and bound algorithms visit every object Doubling dimension is at least log n/ 2 Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 5 / 29
  • 10. Contribution Navin Goyal, YL, Hinrich Schütze, WSDM 2008: Combinatorial framework: new approach to data mining problems that does not require triangle inequality Nearest neighbor algorithm Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 6 / 29
  • 11. Contribution Navin Goyal, YL, Hinrich Schütze, WSDM 2008: Combinatorial framework: new approach to data mining problems that does not require triangle inequality Nearest neighbor algorithm This work: Better nearest neighbor search Detecting near-duplicates Navigability design for small worlds Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 6 / 29
  • 12. Outline Combinatorial Framework 1 New Algorithms 2 Combinatorial Nets 3 Directions for Further Research 4 Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 7 / 29
  • 13. 1 Combinatorial Framework Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 8 / 29
  • 14. Comparison Oracle Dataset p1 , . . . , pn Objects and distance (or similarity) function are NOT given Instead, there is a comparison oracle answering queries of the form: Who is closer to A: B or C? Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 9 / 29
  • 15. Disorder Inequality Sort all objects by their similarity to p: rankp (r) p s r rankp (s) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 10 / 29
  • 16. Disorder Inequality Sort all objects by their similarity to p: rankp (r) p s r rankp (s) Then by similarity to r: rankr (s) s r Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 10 / 29
  • 17. Disorder Inequality Sort all objects by their similarity to p: rankp (r) p s r rankp (s) Then by similarity to r: rankr (s) s r Dataset has disorder D if ∀p, r, s : rankr (s) ≤ D(rankp (r) + rankp (s)) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 10 / 29
  • 18. Combinatorial Framework = Comparison oracle Who is closer to A: B or C? + Disorder inequality rankr (s) ≤ D(rankp(r) + rankp(s)) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 11 / 29
  • 19. Combinatorial Framework: FAQ Disorder of a metric space? Disorder of Rk ? In what cases disorder is relatively small? Experimental values of D for some practical datasets? Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 12 / 29
  • 20. Disorder vs. Others If expansion rate is c, disorder constant is at most c2 Doubling dimension and disorder dimension are incomparable Disorder inequality implies combinatorial form of “doubling effect” Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 13 / 29
  • 21. Combinatorial Framework: Pro & Contra Advantages: Does not require triangle inequality for distances Applicable to any data model and any similarity function Require only comparative training information Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 14 / 29
  • 22. Combinatorial Framework: Pro & Contra Advantages: Does not require triangle inequality for distances Applicable to any data model and any similarity function Require only comparative training information Limitation: worst-case form of disorder inequality Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 14 / 29
  • 23. 2 New Algorithms Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 15 / 29
  • 24. Nearest Neighbor Search Assume S ∪ {q} has disorder constant D Theorem There is a deterministic and exact algorithm for nearest neighbor search: Preprocessing: O(D7 n log2 n) Search: O(D4 log n) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 16 / 29
  • 25. Near-Duplicates Assume, comparison oracle can also tell us whether σ(x, y) > T for some similarity threshold T Theorem All pairs with over-T similarity can be found deterministically in time poly(D)(n log2 n + |Output|) Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 17 / 29
  • 26. Visibility Graph Theorem For any dataset S with disorder D there exists a visibility graph: poly(D)n log2 n construction time O(D4 log n) out-degrees Naïve greedy routing deterministically reaches exact nearest neighbor of the given target q in at most log n steps Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 18 / 29
  • 27. q p1
  • 28. p2 q p1
  • 29. p2 q p1
  • 30. p2 p3 q p1
  • 31. p2 p3 q p1
  • 32. p2 p3 p4 q p1 Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 19 / 29
  • 33. 3 Combinatorial Nets Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 20 / 29
  • 34. Combinatorial Ball B(x, r) = {y : rankx (y) < r} In other words, it is a subset of dataset S: the object x itself and r − 1 its nearest neighbors B(x, 10) x Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 21 / 29
  • 35. Combinatorial Net A subset R ⊆ S is called a combinatorial r-net iff the following two properties holds: Covering: ∀y ∈ S, ∃x ∈ R, s.t. rankx (y) < r. Separation: ∀xi , xj ∈ R, rankxi (xj ) ≥ r OR rankxj (xi ) ≥ r Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 22 / 29
  • 36. Combinatorial Net A subset R ⊆ S is called a combinatorial r-net iff the following two properties holds: Covering: ∀y ∈ S, ∃x ∈ R, s.t. rankx (y) < r. Separation: ∀xi , xj ∈ R, rankxi (xj ) ≥ r OR rankxj (xi ) ≥ r How to construct a combinatorial net? What upper bound on its size can we guarantee? Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 22 / 29
  • 37. Basic Data Structure Combinatorial nets: n For every 0 ≤ i ≤ log n, construct a -net 2i Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 23 / 29
  • 38. Basic Data Structure Combinatorial nets: n For every 0 ≤ i ≤ log n, construct a -net 2i Pointers, pointers, pointers: Direct & inverted indices: links between centers and members of their balls Cousin links: for every center keep pointers to close centers on the same level Navigation links: for every center keep pointers to close centers on the next level Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 23 / 29
  • 39. Fast Net Construction Theorem Combinatorial nets can be constructed in O(D7 n log2 n) time Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 24 / 29
  • 40. Up’n’Down Trick Assume your have 2r-net for the dataset To compute an r-ball around some object p: Take a center p of 2r ball that is covering p 1 Take all centers of 2r-balls nearby p 2 For all of them write down all members of theirs 3 2r-balls Sort all written objects with respect to p and keep r 4 most similar ones. Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 25 / 29
  • 41. 4 Directions for Further Research Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 26 / 29
  • 42. Future of Combinatorial Framework Other problems in combinatorial framework: Low-distortion embeddings Closest pairs Community discovery Linear arrangement Distance labelling Dimensionality reduction What if disorder inequality has exceptions? Insertions, deletions, changing metric Experiments & implementation Unification challenge: disorder + doubling = ? Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 27 / 29
  • 43. Summary Combinatorial framework: comparison oracle + disorder inequality New algorithms: Nearest neighbor search Deterministic detection of near-duplicates Navigability design Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 28 / 29
  • 44. Summary Combinatorial framework: comparison oracle + disorder inequality New algorithms: Nearest neighbor search Deterministic detection of near-duplicates Navigability design Thanks for your attention! Questions? Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 28 / 29
  • 45. Links http://yury.name http://simsearch.yury.name Tutorial, bibliography, people, links, open problems Yury Lifshits and Shengyu Zhang Combinatorial Algorithms for Nearest Neighbors, Near-Duplicates and Small-World Design http://yury.name/papers/lifshits2008similarity.pdf Navin Goyal, Yury Lifshits, Hinrich Schütze Disorder Inequality: A Combinatorial Approach to Nearest Neighbor Search http://yury.name/papers/goyal2008disorder.pdf Benjamin Hoffmann, Yury Lifshits, Dirk Novotka Maximal Intersection Queries in Randomized Graph Models http://yury.name/papers/hoffmann2007maximal.pdf Yury Lifshits, Shengyu Zhang ()Combinatorial Nearest Neighbors 29 / 29