SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
Accelera'ng 
Local 
Search 
With 
PostgreSQL 
9.1 
Jonathan 
S. 
Katz 
Exco 
Ventures 
PGConf.eu 
2011 
– 
Oct 
21, 
2011
Local 
Search? 
• Not 
necessarily 
loca'on 
based 
on 
places 
• “How 
close 
are 
two 
en''es 
to 
one 
another?” 
• “What 
are 
the 
closest 
en''es 
to 
me?” 
• “What 
are 
my 
nearest 
neighbors?” 
2
Nearest 
Neighbor 
Overview 
• Want 
to 
know 
“how 
similar” 
objects 
are 
rela've 
to 
each 
other 
– What 
are 
the 
top 
“k” 
choices 
near 
me? 
• Need 
to 
define 
a 
“metric” 
for 
similarity 
– “distance” 
3
K-­‐Nearest 
Neighbor 
• Given 
a 
collec'on 
of 
n 
objects 
• When 
trying 
to 
classify 
an 
unknown 
object 
– compute 
the 
distance 
between 
all 
known 
objects 
– find 
the 
k 
(k 
≥ 
1) 
closest 
objects 
to 
the 
unknown 
object 
– classify 
the 
object 
based 
on 
class 
of 
k 
closest 
objects 
• When 
k=1, 
then 
unknown 
object 
is 
given 
same 
classifica'on 
as 
object 
it 
is 
closest 
to 
4
K=1 
Example 
Voronoi 
Diagram 
of 
order 
1 
can 
be 
used 
to 
make 
k=1 
NN 
queries 
5
Just 
a 
bit 
more 
theory… 
• Voronoi 
diagrams 
of 
order-­‐1 
are 
created 
in 
O(n 
log 
n) 
'me 
– Looks 
similar 
to…? 
:-­‐) 
• Queried 
in 
O(1) 
'me 
• Therefore: 
– Pay 
the 
'me 
penalty 
to 
build 
the 
index 
– Query 
against 
index 
quickly 
6
Applica'ons 
• Geoloca'on 
+ 
Op'mizing 
Posi'oning 
• Classifica'on 
• Similarity 
• Recommenda'on 
systems 
• Content-­‐based 
image 
retrieval 
• etc. 
7
So 
what 
about 
PostgreSQL? 
• As 
of 
PostgreSQL 
9.0 
– supports 
geometric 
types 
and 
distances 
• Points, 
circles, 
lines, 
boxes, 
polygons 
• Distance 
operator: 
<-­‐> 
– pg_trgm 
– 
supplied 
module 
for 
determining 
text 
similarity 
• similarity(“abc”, 
“ade”) 
computes 
similarity 
score 
• <-­‐> 
defines 
distance 
(opposite 
if 
similarity), 
not 
defined 
(in 
9.0) 
8
PostgreSQL 
9.1: 
KNN-­‐GiST 
• Let 
n 
= 
size 
of 
a 
table 
• Can 
index 
data 
that 
provides 
a 
“<-­‐>” 
(distance) 
operator 
– Geometric 
– pg_trgm 
• “k” 
= 
LIMIT 
clause 
• Known 
inefficiencies 
when 
k=n 
and 
n 
is 
small 
9
Example: 
pg_trgm 
• Data: 
– List 
of 
1,000,000 
names 
– 
700,000 
unique 
– n 
= 
1,000,000 
• Indexes: 
– CREATE 
INDEX 
names_name_idx 
ON 
names 
(name) 
– CREATE 
INDEX 
trgm_idx 
ON 
names 
USING 
gist 
(name 
gist_trgm_ops) 
• k=10 
• Displaying 
query 
plan 
/ 
execu'on 
'me 
aqer 
10 
runs 
10
pg_trgm: 
9.0 
EXPLAIN 
ANALYZE 
SELECT 
name, 
similarity(name, 
'jon') 
AS 
sim 
FROM 
names 
WHERE 
name 
% 
'jon' 
ORDER 
BY 
sim 
DESC 
LIMIT 
10; 
11
pg_trgm: 
9.0 
Limit 
(cost=2724.95..2724.98 
rows=10 
width=14) 
(actual 
'me=192.793..192.794 
rows=10 
loops=1) 
-­‐> 
Sort 
(cost=2724.95..2727.45 
rows=1000 
width=14) 
(actual 
'me=192.790..192.791 
rows=10 
loops=1) 
Sort 
Key: 
(similarity(name, 
'jon'::text)) 
Sort 
Method: 
top-­‐N 
heapsort 
Memory: 
25kB 
-­‐> 
Bitmap 
Heap 
Scan 
on 
names 
(cost=56.47..2703.34 
rows=1000 
width=14) 
(actual 
'me=188.836..192.499 
rows=865 
loops=1) 
Recheck 
Cond: 
(name 
% 
'jon'::text) 
-­‐> 
Bitmap 
Index 
Scan 
on 
trgm_idx 
(cost=0.00..56.22 
rows=1000 
width=0) 
(actual 
'me=188.652..188.652 
rows=865 
loops=1) 
Index 
Cond: 
(name 
% 
'jon'::text) 
Total 
run'me: 
192.881 
ms 
12
pg_trgm: 
9.1 
EXPLAIN 
ANALYZE 
SELECT 
name, 
similarity(name, 
'jon') 
AS 
sim 
FROM 
names 
WHERE 
name 
% 
'jon' 
ORDER 
BY 
sim 
DESC 
LIMIT 
10; 
13
pg_trgm 
9.1 
Limit 
(cost=2720.91..2720.93 
rows=10 
width=14) 
(actual 
'me=202.022..202.023 
rows=10 
loops=1) 
-­‐> 
Sort 
(cost=2720.91..2723.41 
rows=1000 
width=14) 
(actual 
'me=202.020..202.021 
rows=10 
loops=1) 
Sort 
Key: 
(similarity(name, 
'jon'::text)) 
Sort 
Method: 
top-­‐N 
heapsort 
Memory: 
25kB 
-­‐> 
Bitmap 
Heap 
Scan 
on 
names 
(cost=52.43..2699.30 
rows=1000 
width=14) 
(actual 
'me=198.324..201.719 
rows=865 
loops=1) 
Recheck 
Cond: 
(name 
% 
'jon'::text) 
-­‐> 
Bitmap 
Index 
Scan 
on 
names_trgm_idx 
(cost=0.00..52.18 
rows=1000 
width=0) 
(actual 
'me=198.156..198.156 
rows=865 
loops=1) 
Index 
Cond: 
(name 
% 
'jon'::text) 
Total 
run1me: 
202.113 
ms 
14
Comparable? 
• Seems 
to 
be 
similar 
– Need 
to 
do 
more 
research 
why 
– 
anyone? 
• However, 
9.1 
offers 
improvements 
for 
LIKE/ 
ILIKE 
search 
with 
pg_trgm 
15
LIKE/ILIKE 
EXPLAIN 
ANALYZE 
SELECT 
name 
FROM 
names 
WHERE 
name 
LIKE 
'%ata%n'; 
16
LIKE/ILIKE 
pg_trgm: 
9.0 
vs 
9.1 
Seq 
Scan 
on 
names 
(cost=0.00..18717.00 
rows=99 
width=14) 
(actual 
'me=0.339..205.659 
rows=665 
loops=1) 
Filter: 
(name 
~~ 
'%ata%n'::text) 
Total 
run1me: 
205.743 
ms 
Bitmap 
Heap 
Scan 
on 
names 
(cost=9.45..369.20 
rows=99 
width=14) 
(actual 
'me=122.494..125.967 
rows=665 
loops=1) 
Recheck 
Cond: 
(name 
~~ 
'%ata%n'::text) 
-­‐> 
Bitmap 
Index 
Scan 
on 
names_trgm_idx 
(cost=0.00..9.42 
rows=99 
width=0) 
(actual 
'me=121.972..121.972 
rows=3551 
loops=1) 
Index 
Cond: 
(name 
~~ 
'%ata%n'::text) 
Total 
run1me: 
126.065 
ms 
17
Geometry 
• Data: 
– 2,000,000 
points, 
from 
(0,0) 
-­‐> 
(10000, 
10000) 
• Index: 
– CREATE 
INDEX 
geoloc_coord_idx 
ON 
geoloc 
USING 
gist 
(coord); 
18
Geometry 
EXPLAIN 
ANALYZE 
SELECT 
*, 
coord 
<-­‐> 
point(500,500) 
FROM 
geoloc 
ORDER 
BY 
coord 
<-­‐> 
point(500,500) 
LIMIT 
10; 
19
Geometry: 
9.0 
vs 
9.1 
Limit 
(cost=80958.28..80958.31 
rows=10 
width=20) 
(actual 
'me=1035.313..1035.316 
rows=10 
loops=1) 
-­‐> 
Sort 
(cost=80958.28..85958.28 
rows=2000000 
width=20) 
(actual 
'me=1035.312..1035.314 
rows=10 
loops=1) 
Sort 
Key: 
((coord 
<-­‐> 
'(500,500)'::point)) 
Sort 
Method: 
top-­‐N 
heapsort 
Memory: 
25kB 
-­‐> 
Seq 
Scan 
on 
geoloc 
(cost=0.00..37739.00 
rows=2000000 
width=20) 
(actual 
'me=0.029..569.501 
rows=2000000 
loops=1) 
Total 
run1me: 
1035.349 
ms 
Limit 
(cost=0.00..0.81 
rows=10 
width=20) 
(actual 
'me=0.576..1.255 
rows=10 
loops=1) 
-­‐> 
Index 
Scan 
using 
geoloc_coord_idx 
on 
geoloc 
(cost=0.00..162068.96 
rows=2000000 
width=20) 
(actual 
'me=0.575..1.251 
rows=10 
loops=1) 
Order 
By: 
(coord 
<-­‐> 
'(500,500)'::point) 
Total 
run1me: 
1.391 
ms 
20
Applica'on 
Examples 
• Proximity 
map 
search 
– 
fast! 
21
Drawbacks 
• Performance 
benefits 
are 
limited 
when: 
– LIMIT 
is 
close 
to 
size 
of 
data 
set 
and 
data 
set 
is 
large 
– Data 
set 
is 
small 
• Time 
to 
build 
index 
– High 
transac'on 
table 
22
Conclusions 
• GiST: 
“Generalized 
Search 
Tree” 
– 
index 
is 
there, 
up 
to 
developers 
to 
define 
access 
methods 
of 
data 
types 
– e.g. 
yields 
KNN-­‐GiST 
• Different 
types 
of 
applica'ons 
can 
be 
built 
– 
performance 
enhancements 
• Next 
steps? 
23
My 
Wish 
List 
• Further 
geometric-­‐type 
support 
in 
Postgres 
– N-­‐dimensional 
points 
– ‘=‘ 
operator 
for 
point 
type 
– (PostGIS 
s'll 
champion 
of 
complex 
geometric 
+ 
geographic 
data 
types) 
• Define 
“distance” 
over 
mul'columns 
with 
different 
types? 
– SELECT 
(a.name, 
a.geocode) 
<-­‐> 
(b.name, 
b.geocode) 
FROM 
x 
a, 
x 
b; 
24
References 
• Oleg 
Bartunov 
– 
“Efficient 
K-­‐nearest 
neighbour 
search 
in 
PostgreSQL” 
(h~p://www.sai.msu.su/~megera/ 
postgres/talks/pgday-­‐2010.pdf) 
• Oleg 
Bartunov 
and 
Teodor 
Sigaev 
for 
work 
on 
KNN-­‐ 
GiST 
and 
notes 
on 
pg_trgm 
(h~p:// 
developer.postgresql.org/pgdocs/postgres/ 
pgtrgm.html) 
• Hubert 
“depesz” 
Lubaczewski 
– 
pa~erned 
benchmarks 
off 
of 
his 
work 
-­‐ 
h~p://www.depesz.com/index.php/ 
2010/12/11/wai'ng-­‐for-­‐9-­‐1-­‐knngist/ 
25
Contact 
• jonathan.katz@excoventures.com 
• @jkatz05 
• Feedback 
Please! 
– h~p://2011.pgconf.eu/feedback 
26

Contenu connexe

Tendances

MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)Jean-François Gagné
 
RPC에서 REST까지 간단한 개념소개
RPC에서 REST까지 간단한 개념소개RPC에서 REST까지 간단한 개념소개
RPC에서 REST까지 간단한 개념소개Wonchang Song
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDatabricks
 
Using Wildcards with rsyslog's File Monitor imfile
Using Wildcards with rsyslog's File Monitor imfileUsing Wildcards with rsyslog's File Monitor imfile
Using Wildcards with rsyslog's File Monitor imfileRainer Gerhards
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)YoungHeon (Roy) Kim
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
 
How the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksHow the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksEDB
 
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るKubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るPreferred Networks
 
Visual Studio를 이용한 어셈블리어 학습 part 2
Visual Studio를 이용한 어셈블리어 학습 part 2Visual Studio를 이용한 어셈블리어 학습 part 2
Visual Studio를 이용한 어셈블리어 학습 part 2YEONG-CHEON YOU
 
오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011
오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011
오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011devCAT Studio, NEXON
 
Windbg랑 친해지기
Windbg랑 친해지기Windbg랑 친해지기
Windbg랑 친해지기Ji Hun Kim
 
모두가 성장하는 스터디 만들기
모두가 성장하는 스터디 만들기모두가 성장하는 스터디 만들기
모두가 성장하는 스터디 만들기BYUNGHOKIM10
 
BHyVeってなんや
BHyVeってなんやBHyVeってなんや
BHyVeってなんやTakuya ASADA
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재PgDay.Seoul
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityScyllaDB
 
1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...
1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...
1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...NTT DATA Technology & Innovation
 
さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2ozuma5119
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 

Tendances (20)

MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
MySQL Parallel Replication: All the 5.7 and 8.0 Details (LOGICAL_CLOCK)
 
RPC에서 REST까지 간단한 개념소개
RPC에서 REST까지 간단한 개념소개RPC에서 REST까지 간단한 개념소개
RPC에서 REST까지 간단한 개념소개
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
 
Using Wildcards with rsyslog's File Monitor imfile
Using Wildcards with rsyslog's File Monitor imfileUsing Wildcards with rsyslog's File Monitor imfile
Using Wildcards with rsyslog's File Monitor imfile
 
ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)ProxySQL & PXC(Query routing and Failover Test)
ProxySQL & PXC(Query routing and Failover Test)
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - Preview
 
How the Postgres Query Optimizer Works
How the Postgres Query Optimizer WorksHow the Postgres Query Optimizer Works
How the Postgres Query Optimizer Works
 
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語るKubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
Kubernetes にこれから入るかもしれない注目機能!(2022年11月版) / TechFeed Experts Night #7 〜 コンテナ技術を語る
 
Visual Studio를 이용한 어셈블리어 학습 part 2
Visual Studio를 이용한 어셈블리어 학습 part 2Visual Studio를 이용한 어셈블리어 학습 part 2
Visual Studio를 이용한 어셈블리어 학습 part 2
 
오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011
오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011
오승준, 사회적 기술이 프로그래머 인생을 바꿔주는 이유, NDC2011
 
Windbg랑 친해지기
Windbg랑 친해지기Windbg랑 친해지기
Windbg랑 친해지기
 
모두가 성장하는 스터디 만들기
모두가 성장하는 스터디 만들기모두가 성장하는 스터디 만들기
모두가 성장하는 스터디 만들기
 
BHyVeってなんや
BHyVeってなんやBHyVeってなんや
BHyVeってなんや
 
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajpAt least onceってぶっちゃけ問題の先送りだったよね #kafkajp
At least onceってぶっちゃけ問題の先送りだったよね #kafkajp
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Continuous Go Profiling & Observability
Continuous Go Profiling & ObservabilityContinuous Go Profiling & Observability
Continuous Go Profiling & Observability
 
1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...
1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...
1日5分でPostgreSQLに詳しくなるアプリの開発 ~PostgRESTを使ってみた~(第38回PostgreSQLアンカンファレンス@オンライン 発...
 
さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2さくらのVPSに来る悪い人を観察する その2
さくらのVPSに来る悪い人を観察する その2
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 

Similaire à Accelerating Local Search with PostgreSQL (KNN-Search)

MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Featuresajhannan
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSqlDmytro Shylovskyi
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesJonathan Katz
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practiceJano Suchal
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenPostgresOpen
 
The state of geo in ElasticSearch
The state of geo in ElasticSearchThe state of geo in ElasticSearch
The state of geo in ElasticSearchFan Robbin
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised HashingSean Moran
 
k-means Clustering and Custergram with R
k-means Clustering and Custergram with Rk-means Clustering and Custergram with R
k-means Clustering and Custergram with RDr. Volkan OBAN
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analyticsMongoDB
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Sean Moran
 
PostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsPostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsNicholas Kiraly
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Konstantin V. Shvachko
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Ontico
 
Top k string similarity search
Top k string similarity searchTop k string similarity search
Top k string similarity searchChiao-Meng Huang
 
Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Texas Natural Resources Information System
 

Similaire à Accelerating Local Search with PostgreSQL (KNN-Search) (20)

MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSql
 
On Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data TypesOn Beyond (PostgreSQL) Data Types
On Beyond (PostgreSQL) Data Types
 
ClusterAnalysis
ClusterAnalysisClusterAnalysis
ClusterAnalysis
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practice
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres OpenJohn Melesky - Federating Queries Using Postgres FDW @ Postgres Open
John Melesky - Federating Queries Using Postgres FDW @ Postgres Open
 
The state of geo in ElasticSearch
The state of geo in ElasticSearchThe state of geo in ElasticSearch
The state of geo in ElasticSearch
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
k-means Clustering and Custergram with R
k-means Clustering and Custergram with Rk-means Clustering and Custergram with R
k-means Clustering and Custergram with R
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
PostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsPostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and Operators
 
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.Distributed Computing with Apache Hadoop. Introduction to MapReduce.
Distributed Computing with Apache Hadoop. Introduction to MapReduce.
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
Top k string similarity search
Top k string similarity searchTop k string similarity search
Top k string similarity search
 
Lecture 3.pdf
Lecture 3.pdfLecture 3.pdf
Lecture 3.pdf
 
Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...Covering the earth and the cloud the next generation of spatial in sql server...
Covering the earth and the cloud the next generation of spatial in sql server...
 

Plus de Jonathan Katz

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Jonathan Katz
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLJonathan Katz
 
Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Jonathan Katz
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Jonathan Katz
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!Jonathan Katz
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMJonathan Katz
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMJonathan Katz
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesJonathan Katz
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Jonathan Katz
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesJonathan Katz
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWJonathan Katz
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesJonathan Katz
 

Plus de Jonathan Katz (14)

Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)Vectors are the new JSON in PostgreSQL (SCaLE 21x)
Vectors are the new JSON in PostgreSQL (SCaLE 21x)
 
Vectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQLVectors are the new JSON in PostgreSQL
Vectors are the new JSON in PostgreSQL
 
Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15Looking ahead at PostgreSQL 15
Looking ahead at PostgreSQL 15
 
Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!Build a Complex, Realtime Data Management App with Postgres 14!
Build a Complex, Realtime Data Management App with Postgres 14!
 
High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!High Availability PostgreSQL on OpenShift...and more!
High Availability PostgreSQL on OpenShift...and more!
 
Get Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAMGet Your Insecure PostgreSQL Passwords to SCRAM
Get Your Insecure PostgreSQL Passwords to SCRAM
 
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAMSafely Protect PostgreSQL Passwords - Tell Others to SCRAM
Safely Protect PostgreSQL Passwords - Tell Others to SCRAM
 
Operating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with KubernetesOperating PostgreSQL at Scale with Kubernetes
Operating PostgreSQL at Scale with Kubernetes
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018Using PostgreSQL With Docker & Kubernetes - July 2018
Using PostgreSQL With Docker & Kubernetes - July 2018
 
An Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & KubernetesAn Introduction to Using PostgreSQL with Docker & Kubernetes
An Introduction to Using PostgreSQL with Docker & Kubernetes
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Indexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data TypesIndexing Complex PostgreSQL Data Types
Indexing Complex PostgreSQL Data Types
 

Dernier

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Dernier (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Accelerating Local Search with PostgreSQL (KNN-Search)

  • 1. Accelera'ng Local Search With PostgreSQL 9.1 Jonathan S. Katz Exco Ventures PGConf.eu 2011 – Oct 21, 2011
  • 2. Local Search? • Not necessarily loca'on based on places • “How close are two en''es to one another?” • “What are the closest en''es to me?” • “What are my nearest neighbors?” 2
  • 3. Nearest Neighbor Overview • Want to know “how similar” objects are rela've to each other – What are the top “k” choices near me? • Need to define a “metric” for similarity – “distance” 3
  • 4. K-­‐Nearest Neighbor • Given a collec'on of n objects • When trying to classify an unknown object – compute the distance between all known objects – find the k (k ≥ 1) closest objects to the unknown object – classify the object based on class of k closest objects • When k=1, then unknown object is given same classifica'on as object it is closest to 4
  • 5. K=1 Example Voronoi Diagram of order 1 can be used to make k=1 NN queries 5
  • 6. Just a bit more theory… • Voronoi diagrams of order-­‐1 are created in O(n log n) 'me – Looks similar to…? :-­‐) • Queried in O(1) 'me • Therefore: – Pay the 'me penalty to build the index – Query against index quickly 6
  • 7. Applica'ons • Geoloca'on + Op'mizing Posi'oning • Classifica'on • Similarity • Recommenda'on systems • Content-­‐based image retrieval • etc. 7
  • 8. So what about PostgreSQL? • As of PostgreSQL 9.0 – supports geometric types and distances • Points, circles, lines, boxes, polygons • Distance operator: <-­‐> – pg_trgm – supplied module for determining text similarity • similarity(“abc”, “ade”) computes similarity score • <-­‐> defines distance (opposite if similarity), not defined (in 9.0) 8
  • 9. PostgreSQL 9.1: KNN-­‐GiST • Let n = size of a table • Can index data that provides a “<-­‐>” (distance) operator – Geometric – pg_trgm • “k” = LIMIT clause • Known inefficiencies when k=n and n is small 9
  • 10. Example: pg_trgm • Data: – List of 1,000,000 names – 700,000 unique – n = 1,000,000 • Indexes: – CREATE INDEX names_name_idx ON names (name) – CREATE INDEX trgm_idx ON names USING gist (name gist_trgm_ops) • k=10 • Displaying query plan / execu'on 'me aqer 10 runs 10
  • 11. pg_trgm: 9.0 EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10; 11
  • 12. pg_trgm: 9.0 Limit (cost=2724.95..2724.98 rows=10 width=14) (actual 'me=192.793..192.794 rows=10 loops=1) -­‐> Sort (cost=2724.95..2727.45 rows=1000 width=14) (actual 'me=192.790..192.791 rows=10 loops=1) Sort Key: (similarity(name, 'jon'::text)) Sort Method: top-­‐N heapsort Memory: 25kB -­‐> Bitmap Heap Scan on names (cost=56.47..2703.34 rows=1000 width=14) (actual 'me=188.836..192.499 rows=865 loops=1) Recheck Cond: (name % 'jon'::text) -­‐> Bitmap Index Scan on trgm_idx (cost=0.00..56.22 rows=1000 width=0) (actual 'me=188.652..188.652 rows=865 loops=1) Index Cond: (name % 'jon'::text) Total run'me: 192.881 ms 12
  • 13. pg_trgm: 9.1 EXPLAIN ANALYZE SELECT name, similarity(name, 'jon') AS sim FROM names WHERE name % 'jon' ORDER BY sim DESC LIMIT 10; 13
  • 14. pg_trgm 9.1 Limit (cost=2720.91..2720.93 rows=10 width=14) (actual 'me=202.022..202.023 rows=10 loops=1) -­‐> Sort (cost=2720.91..2723.41 rows=1000 width=14) (actual 'me=202.020..202.021 rows=10 loops=1) Sort Key: (similarity(name, 'jon'::text)) Sort Method: top-­‐N heapsort Memory: 25kB -­‐> Bitmap Heap Scan on names (cost=52.43..2699.30 rows=1000 width=14) (actual 'me=198.324..201.719 rows=865 loops=1) Recheck Cond: (name % 'jon'::text) -­‐> Bitmap Index Scan on names_trgm_idx (cost=0.00..52.18 rows=1000 width=0) (actual 'me=198.156..198.156 rows=865 loops=1) Index Cond: (name % 'jon'::text) Total run1me: 202.113 ms 14
  • 15. Comparable? • Seems to be similar – Need to do more research why – anyone? • However, 9.1 offers improvements for LIKE/ ILIKE search with pg_trgm 15
  • 16. LIKE/ILIKE EXPLAIN ANALYZE SELECT name FROM names WHERE name LIKE '%ata%n'; 16
  • 17. LIKE/ILIKE pg_trgm: 9.0 vs 9.1 Seq Scan on names (cost=0.00..18717.00 rows=99 width=14) (actual 'me=0.339..205.659 rows=665 loops=1) Filter: (name ~~ '%ata%n'::text) Total run1me: 205.743 ms Bitmap Heap Scan on names (cost=9.45..369.20 rows=99 width=14) (actual 'me=122.494..125.967 rows=665 loops=1) Recheck Cond: (name ~~ '%ata%n'::text) -­‐> Bitmap Index Scan on names_trgm_idx (cost=0.00..9.42 rows=99 width=0) (actual 'me=121.972..121.972 rows=3551 loops=1) Index Cond: (name ~~ '%ata%n'::text) Total run1me: 126.065 ms 17
  • 18. Geometry • Data: – 2,000,000 points, from (0,0) -­‐> (10000, 10000) • Index: – CREATE INDEX geoloc_coord_idx ON geoloc USING gist (coord); 18
  • 19. Geometry EXPLAIN ANALYZE SELECT *, coord <-­‐> point(500,500) FROM geoloc ORDER BY coord <-­‐> point(500,500) LIMIT 10; 19
  • 20. Geometry: 9.0 vs 9.1 Limit (cost=80958.28..80958.31 rows=10 width=20) (actual 'me=1035.313..1035.316 rows=10 loops=1) -­‐> Sort (cost=80958.28..85958.28 rows=2000000 width=20) (actual 'me=1035.312..1035.314 rows=10 loops=1) Sort Key: ((coord <-­‐> '(500,500)'::point)) Sort Method: top-­‐N heapsort Memory: 25kB -­‐> Seq Scan on geoloc (cost=0.00..37739.00 rows=2000000 width=20) (actual 'me=0.029..569.501 rows=2000000 loops=1) Total run1me: 1035.349 ms Limit (cost=0.00..0.81 rows=10 width=20) (actual 'me=0.576..1.255 rows=10 loops=1) -­‐> Index Scan using geoloc_coord_idx on geoloc (cost=0.00..162068.96 rows=2000000 width=20) (actual 'me=0.575..1.251 rows=10 loops=1) Order By: (coord <-­‐> '(500,500)'::point) Total run1me: 1.391 ms 20
  • 21. Applica'on Examples • Proximity map search – fast! 21
  • 22. Drawbacks • Performance benefits are limited when: – LIMIT is close to size of data set and data set is large – Data set is small • Time to build index – High transac'on table 22
  • 23. Conclusions • GiST: “Generalized Search Tree” – index is there, up to developers to define access methods of data types – e.g. yields KNN-­‐GiST • Different types of applica'ons can be built – performance enhancements • Next steps? 23
  • 24. My Wish List • Further geometric-­‐type support in Postgres – N-­‐dimensional points – ‘=‘ operator for point type – (PostGIS s'll champion of complex geometric + geographic data types) • Define “distance” over mul'columns with different types? – SELECT (a.name, a.geocode) <-­‐> (b.name, b.geocode) FROM x a, x b; 24
  • 25. References • Oleg Bartunov – “Efficient K-­‐nearest neighbour search in PostgreSQL” (h~p://www.sai.msu.su/~megera/ postgres/talks/pgday-­‐2010.pdf) • Oleg Bartunov and Teodor Sigaev for work on KNN-­‐ GiST and notes on pg_trgm (h~p:// developer.postgresql.org/pgdocs/postgres/ pgtrgm.html) • Hubert “depesz” Lubaczewski – pa~erned benchmarks off of his work -­‐ h~p://www.depesz.com/index.php/ 2010/12/11/wai'ng-­‐for-­‐9-­‐1-­‐knngist/ 25
  • 26. Contact • jonathan.katz@excoventures.com • @jkatz05 • Feedback Please! – h~p://2011.pgconf.eu/feedback 26