Indexes in geo-temporal data sets... How much is enough?
Strictly (Ordered) Ballroom
(AKA, "geo indexing and sufficiency")
Chris Eichelberger
FOSS4G NA 2017
part 1: the pain
"You can raise welts like nobody else / as we dance to the Masochism Tango"
with sincere apologies to the great Tom Lehrer
searching for a NoSQL analog...
to answer the hard questions... which comes first?
● Virginia
● Massachusetts
this is the entire purpose of an index: given a data element, tell me which
bin (disk page, tablet, ...) it will be found in, if it exists
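To make "which bin?" concrete, here is a minimal Python sketch (not code from the talk) in which a sorted list of split points, standing in for the tablet boundaries of a sorted key-value store, decides where a key lands. The split points `G`, `N`, and `T` and the name `bin_for` are made up for illustration.

```python
import bisect

# Hypothetical split points: each is the first key of the next bin, so the
# bins are [-inf, "G"), ["G", "N"), ["N", "T"), and ["T", +inf).
SPLIT_POINTS = ["G", "N", "T"]


def bin_for(key: str) -> int:
    """Return the number of the bin (disk page, tablet, ...) that would hold `key`."""
    # bisect_right counts the split points <= key, which is exactly the bin number.
    return bisect.bisect_right(SPLIT_POINTS, key)


for state in sorted(["Virginia", "Massachusetts"]):
    print(state, "-> bin", bin_for(state))
# Massachusetts -> bin 1
# Virginia -> bin 3
```

"Which comes first?" is simply the sort order: Massachusetts sorts before Virginia, so its bin number can never be larger.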
which comes first?
indexing properties
DATA-SPECIFIC
● think: Japanese street addresses
● lot numbers depend on building order
● good when: cross-index joins are cheap (RDBMS)
SPACE-SPECIFIC
● think: US street addresses
● lots are aligned to block ranges
● good when: cross-index joins are expensive (NoSQL)
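A hedged toy illustration of the difference, with hypothetical names throughout: a data-specific index hands out IDs in arrival order, while a space-specific key is computed from location alone (here a 4x4 grid over the unit square).

```python
from itertools import count


class DataSpecificIndex:
    """Hands out IDs in arrival order, like lot numbers assigned as buildings go up."""

    def __init__(self):
        self._next_id = count()
        self.ids = {}

    def add(self, name):
        self.ids[name] = next(self._next_id)
        return self.ids[name]


def space_specific_key(x, y, cells_per_side=4):
    """Key depends only on position in the unit square, like block-aligned US lots."""
    col = min(int(x * cells_per_side), cells_per_side - 1)
    row = min(int(y * cells_per_side), cells_per_side - 1)
    return row * cells_per_side + col


places = {"cafe": (0.10, 0.30), "library": (0.80, 0.90)}

# Two tables ingest the same places in different orders.
table_a, table_b = DataSpecificIndex(), DataSpecificIndex()
for name in ["cafe", "library"]:
    table_a.add(name)
for name in ["library", "cafe"]:
    table_b.add(name)

print(table_a.ids["cafe"], table_b.ids["cafe"])  # 0 1: the tables disagree
print(space_specific_key(*places["cafe"]))       # 4, whichever table computes it
```

With data-specific IDs, lining up the same place across two tables takes a join; with space-specific keys both tables can scan the same key range, which is roughly why the slide pairs them with NoSQL stores.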
what does an SFC (space-filling curve) look like, and what does it do?
Z-order curve, 4 bits (2 per dimension), 16 cells
Z-order curve, 6 bits (3 per dimension), 64 cells
Z-order curve progression
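A minimal Python sketch of Z-order (Morton) encoding by bit interleaving. This is the textbook technique, not SFCurve's API; `interleave` and `to_cell` are hypothetical names. Because `interleave` accepts any number of dimensions, the same function could interleave (lon, lat, time) for a geo-temporal key.

```python
def interleave(coords, bits):
    """Z-order value: bit b of dimension d lands at position b * len(coords) + d."""
    z = 0
    dims = len(coords)
    for b in range(bits):
        for d, c in enumerate(coords):
            z |= ((c >> b) & 1) << (b * dims + d)
    return z


def to_cell(value, lo, hi, bits):
    """Map a coordinate in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    return min(int((value - lo) / (hi - lo) * cells), cells - 1)


# The 4-bit curve (2 bits per dimension): Z value of every cell, top row first.
for y in reversed(range(4)):
    print(" ".join(f"{interleave((x, y), 2):2d}" for x in range(4)))
# 10 11 14 15
#  8  9 12 13
#  2  3  6  7
#  0  1  4  5

# A longitude/latitude point on the 6-bit curve (3 bits per dimension).
x = to_cell(-77.0, -180.0, 180.0, 3)
y = to_cell(38.9, -90.0, 90.0, 3)
print(x, y, interleave((x, y), 3))  # 2 5 38
```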
SFCurve... a LocationTech project is born
but have you tried the...
API with the help of
this solution would become FOSS
with the help of
life is good
● we have AN INDEX
● we can ingest geo-temporal data
● we can query with geometric bounds and a time span
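Querying with geometric bounds means turning a query box into a set of Z ranges to scan. A brute-force Python sketch for small grids, with the hypothetical name `z_ranges`: enumerate the cells in the box, sort their Z values, and merge neighbors into runs. Production implementations generally avoid enumerating every cell (refining the box recursively and capping the number of ranges), and a time span would add a third interleaved dimension in the same way.

```python
def interleave(coords, bits):
    """Z-order value: bit b of dimension d lands at position b * len(coords) + d."""
    z = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            z |= ((c >> b) & 1) << (b * len(coords) + d)
    return z


def z_ranges(x_lo, y_lo, x_hi, y_hi, bits):
    """Contiguous Z ranges covering the inclusive cell box [x_lo..x_hi] x [y_lo..y_hi]."""
    zs = sorted(
        interleave((x, y), bits)
        for x in range(x_lo, x_hi + 1)
        for y in range(y_lo, y_hi + 1)
    )
    ranges = []
    for z in zs:
        if ranges and z == ranges[-1][1] + 1:
            ranges[-1][1] = z  # extend the current run
        else:
            ranges.append([z, z])  # start a new run
    return [tuple(r) for r in ranges]


print(z_ranges(0, 0, 1, 1, 2))  # [(0, 3)]: an aligned quadrant is one scan
print(z_ranges(1, 1, 2, 2, 2))  # [(3, 3), (6, 6), (9, 9), (12, 12)]: four scans
```

The second call shows the classic Z-order weakness: a small box straddling the center of the grid breaks into several disjoint scans.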
part 2: the pain
"Blacken my eye, set fire to my tie / as we dance to the Masochism Tango"
real data are often non-uniformly distributed
a real place
how real data are often distributed
how SFC indexes might be distributed (gridded)
how real data tend to map to SFC indexes (bins)
how to trade density for uniformity
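One common way to trade density for uniformity, shown as a hedged sketch: prefix each row key with a small shard number so that a dense cell's rows spread across several tablets. The shard count, the CRC32 hash, and the 8-byte Z encoding are assumptions for illustration, and `sharded_key` / `scan_ranges` are hypothetical names.

```python
import zlib

NUM_SHARDS = 4  # hypothetical; chosen per deployment
Z_BYTES = 8


def sharded_key(feature_id: str, z: int) -> bytes:
    """Prefix the Z value with a shard byte derived from the feature ID.

    Hashing the ID (rather than picking a random shard) keeps the mapping
    deterministic, so the same feature always lands in the same shard.
    """
    shard = zlib.crc32(feature_id.encode("utf-8")) % NUM_SHARDS
    return bytes([shard]) + z.to_bytes(Z_BYTES, "big")


def scan_ranges(z_lo: int, z_hi: int):
    """One Z range in the query becomes one scan per shard."""
    return [
        (bytes([s]) + z_lo.to_bytes(Z_BYTES, "big"),
         bytes([s]) + z_hi.to_bytes(Z_BYTES, "big"))
        for s in range(NUM_SHARDS)
    ]


# 1,000 features in the same hot cell no longer share a single key prefix.
hot_z = 38
keys = [sharded_key(f"feature-{i}", hot_z) for i in range(1000)]
print(sorted({k[0] for k in keys}))    # shard bytes in use
print(len(scan_ranges(hot_z, hot_z)))  # 4
```

The cost moves to the read side: every query range fans out into one scan per shard.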
life is good
● we have a (POTENTIALLY SHARDED) index
● we can ingest geo-temporal data
● we can query with geometric bounds and a time span
● we don't suffer from hot-spotting
part 3: the pain
"My heart entreats, just hear those savage beats / and go put on your cleats / and come and trample me"
more than geo-temporal attributes?
https://www.britannica.com/technology/airplane/Types-of-aircraft
add more indexes!
● add an ATTRIBUTE index
○ tied to the SimpleFeatureType (in user data)
○ each indexed attribute has all of its values recorded
○ contains a complete copy of every simple feature
● add a RECORD ID index
○ automatically created, populated
○ values are assumed to be unique to the SimpleFeature
○ contains a complete copy of every simple feature
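A toy, in-memory sketch of the two new indexes, assuming features are plain dicts standing in for SimpleFeatures and a sorted Python list standing in for a sorted key-value table; the class and field names are hypothetical. Each index keeps its own complete copy of every feature, which is the storage cost the slide calls out.

```python
import bisect
import copy


class AttributeIndex:
    """Rows keyed by (value, feature ID); each row holds a full copy of the feature."""

    def __init__(self, attribute):
        self.attribute = attribute
        self.keys = []  # sorted (value, fid) tuples
        self.rows = {}  # (value, fid) -> feature copy

    def write(self, feature):
        key = (feature[self.attribute], feature["id"])
        if key not in self.rows:
            bisect.insort(self.keys, key)
        self.rows[key] = copy.deepcopy(feature)

    def equals(self, value):
        """All features whose attribute equals `value`, via one contiguous scan."""
        start = bisect.bisect_left(self.keys, (value,))
        out = []
        for key in self.keys[start:]:
            if key[0] != value:
                break
            out.append(self.rows[key])
        return out


class RecordIdIndex:
    """Feature ID -> full copy of the feature; IDs are assumed unique."""

    def __init__(self):
        self.rows = {}

    def write(self, feature):
        self.rows[feature["id"]] = copy.deepcopy(feature)

    def get(self, fid):
        return self.rows.get(fid)


features = [
    {"id": "f1", "aircraft": "A320", "geom": (-77.0, 38.9)},
    {"id": "f2", "aircraft": "B737", "geom": (-71.1, 42.4)},
    {"id": "f3", "aircraft": "A320", "geom": (-122.4, 37.8)},
]
by_aircraft, by_id = AttributeIndex("aircraft"), RecordIdIndex()
for f in features:
    by_aircraft.write(f)  # every write lands in every index...
    by_id.write(f)        # ...as a complete copy

print([f["id"] for f in by_aircraft.equals("A320")])  # ['f1', 'f3']
print(by_id.get("f2")["aircraft"])                    # B737
```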
index selection
● simple cases
○ if you only filter on an indexed attribute, use the attribute index
○ if you only filter on a record ID, use the record-ID index
○ if you only filter on location and time, use the geo-temporal index
● all other cases
○ this is a geo-temporal store... use the geo-temporal index
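These rules fit in a few lines. A hedged sketch, assuming hypothetical attribute names `geom` and `dtg` for geometry and time and `id` for the record ID:

```python
RECORD_ID = "id"  # hypothetical name for the record-ID "attribute"


def choose_index(filtered_attributes, indexed_attributes):
    """Rule-based index choice that mirrors the simple cases listed above."""
    filtered = set(filtered_attributes)
    if filtered == {RECORD_ID}:
        return "record-id"
    if len(filtered) == 1 and filtered <= set(indexed_attributes):
        return "attribute"
    # Location/time-only filters, and all other cases, use the geo-temporal index.
    return "geo-temporal"


print(choose_index({"aircraft"}, {"aircraft"}))         # attribute
print(choose_index({"id"}, {"aircraft"}))               # record-id
print(choose_index({"geom", "dtg"}, {"aircraft"}))      # geo-temporal
print(choose_index({"aircraft", "dtg"}, {"aircraft"}))  # geo-temporal
```

The last call shows the weak spot of a purely rule-based choice: a mixed filter falls through to the geo-temporal index no matter how selective the attribute predicate is.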
life is good
● we have some indexes
● we can ingest geo-temporal data
● we can query with geometric bounds and a time span
● we don't suffer from hot-spotting
● we have per-attribute indexes and a record-ID index
● we have the option of querying by any one attribute OR record ID or geo-time
part 4: the pain
"Your heart is hard as stone or mahogany / that's why I'm in such exquisite agony"
pointedly...
● the world is not flat
● it (the world) contains non-point geometries
handling non-point geometries
Christian Böhm, Gerald Klump and Hans-Peter Kriegel. "XZ-Ordering: A Space-Filling Curve for Objects with Spatial Extension".
6th Int. Symposium on Large Spatial Databases (SSD), 1999, Hong Kong, China
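The core of XZ-ordering, sketched in Python as an illustration of the technique in the cited paper (not code from any particular library). Each quadtree cell is "enlarged" to twice its width and height toward +x/+y, a geometry's bounding box is assigned to the smallest enlarged element that contains it, and that element is numbered in quadtree preorder. Coordinates are assumed to be normalized to the unit square, and `normalize`, `sequence_code`, and `xz2_index` are hypothetical names.

```python
import math


def normalize(lon, lat):
    """Map longitude/latitude onto the unit square [0, 1] x [0, 1]."""
    return (lon + 180.0) / 360.0, (lat + 90.0) / 180.0


def sequence_code(x, y, length, g):
    """Preorder number of the quadtree cell at depth `length` that contains (x, y)."""
    x_lo, y_lo, x_hi, y_hi = 0.0, 0.0, 1.0, 1.0
    code = 0
    for i in range(length):
        x_mid, y_mid = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
        right, upper = x >= x_mid, y >= y_mid
        quadrant = int(right) + 2 * int(upper)  # 0=SW, 1=SE, 2=NW, 3=NE
        # +1 for this cell, plus the full subtrees of the earlier sibling
        # quadrants (each holds (4**(g - i) - 1) // 3 cells down to depth g).
        code += 1 + quadrant * ((4 ** (g - i) - 1) // 3)
        x_lo, x_hi = (x_mid, x_hi) if right else (x_lo, x_mid)
        y_lo, y_hi = (y_mid, y_hi) if upper else (y_lo, y_mid)
    return code


def xz2_index(xmin, ymin, xmax, ymax, g):
    """XZ-ordering key for a box in the unit square, at maximum resolution g."""
    max_dim = max(xmax - xmin, ymax - ymin)
    if max_dim == 0:
        return sequence_code(xmin, ymin, g, g)  # points go to the finest level
    # Deepest level l1 whose cells are at least max_dim wide; the box always
    # fits in the enlarged element there. One level deeper may also work.
    l1 = math.floor(math.log(max_dim) / math.log(0.5))
    if l1 >= g:
        length = g
    else:
        w2 = 0.5 ** (l1 + 1)  # cell width at level l1 + 1

        def fits(lo, hi):
            return hi <= math.floor(lo / w2) * w2 + 2 * w2

        length = l1 + 1 if fits(xmin, xmax) and fits(ymin, ymax) else l1
    return sequence_code(xmin, ymin, length, g)


print(xz2_index(0.1, 0.1, 0.1, 0.1, 2))  # 2: a point, two levels down
print(xz2_index(0.0, 0.0, 1.0, 1.0, 2))  # 1: the whole square fits one level down
print(xz2_index(0.4, 0.4, 0.6, 0.6, 3))  # 21: straddles the center, still one key
```

Because every geometry gets exactly one key, non-point geometries are not duplicated across the cells they overlap.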
add more indexes!
● add an XZ3 index
○ indexes longitude, latitude, and time
○ contains a complete copy of every simple feature
● add an XZ2 index, just to be sure
○ indexes longitude and latitude alone
○ contains a complete copy of every simple feature
life is good
● we have some indexes
● we can ingest geo-temporal data
● we can query with geometric bounds and a time span
● we don't suffer from hot-spotting
● we have per-attribute indexes and a record-ID index
● we have the option of querying by any one attribute
● we have non-duplicative indexes for non-point geometries, even those that cross the anti-meridian
part 5: the pain
"My soul is on fire; it's aflame with desire / which is why I perspire when we tango"
an embarrassment of riches
http://i.ebayimg.com/00/s/NTY2WDg0OQ==/z/U~IAAOSw-jhUBFhb/$_32.JPG?set_id=880000500F
for once!
● what we need is NOT another index... exactly
cost-based optimizer... oh, and summary statistics
● CBO
○ rewrite query using DNF... or CNF
○ estimate cost of using a particular index
■ at least whether a full-table scan is required
○ requires knowing something about cardinalities
○ ought to be able to explain why it made its choice
● statistics collection
○ responsible for providing some estimates of cardinalities (HyperLogLog, count-min sketch, etc.)
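A count-min sketch, one of the structures named above, estimates how often a value occurs in fixed memory, which is what an equality predicate's cardinality estimate needs. A minimal sketch, assuming rows are hashed with salted SHA-256 (chosen here for simplicity, not as a recommendation); HyperLogLog, which estimates distinct counts, is not shown.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One hash per row, derived by salting SHA-256 with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._columns(item))


stats = CountMinSketch()
for aircraft in ["A320"] * 500 + ["B737"] * 300 + ["C172"] * 5:
    stats.add(aircraft)

print(stats.estimate("A320"), stats.estimate("C172"))  # never less than 500 and 5
```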
this is really just a fancy version of the board game Guess Who?
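In Guess Who? terms, the planner should ask the question that eliminates the most candidates. A hedged sketch of that idea as a tiny cost-based planner, assuming predicates are `("pred", attribute, value)` tuples, queries are nested `("and" | "or", ...)` tuples, and per-attribute cardinality estimators already exist (for example, fed by summary statistics such as a count-min sketch); every name here is hypothetical. It rewrites the query into DNF, picks the cheapest index for each disjunct, falls back to a full-table scan when some disjunct has no indexed predicate or when the index scans are estimated to read at least as much as the table, and explains its choice.

```python
def dnf(node):
    """Rewrite an AND/OR tree into disjunctive normal form: a list of conjunctions."""
    kind = node[0]
    if kind == "pred":
        return [[node]]
    if kind == "or":
        return [conj for child in node[1:] for conj in dnf(child)]
    if kind == "and":
        conjunctions = [[]]
        for child in node[1:]:
            conjunctions = [left + right for left in conjunctions for right in dnf(child)]
        return conjunctions
    raise ValueError(f"unknown node type: {kind!r}")


def plan(query, estimators, total_rows):
    """Choose one index per disjunct by estimated cardinality, or a full-table scan.

    `estimators` maps each indexed attribute to a function from a value to the
    estimated number of matching rows. Returns (steps, explanation).
    """
    full_scan = [("full-table scan", total_rows)]
    chosen = []
    for conj in dnf(query):
        candidates = [(estimators[attr](value), attr, value)
                      for _, attr, value in conj if attr in estimators]
        if not candidates:
            names = ", ".join(pred[1] for pred in conj)
            return full_scan, f"no indexed predicate among: {names}"
        # Guess Who?: ask the question that leaves the fewest candidates.
        chosen.append(min(candidates, key=lambda c: c[0]))
    estimated = sum(cost for cost, _, _ in chosen)
    if estimated >= total_rows:
        return full_scan, f"index scans (est. {estimated} rows) are no cheaper than the table"
    explanation = "; ".join(f"{attr}={value!r} -> est. {cost} row(s)" for cost, attr, value in chosen)
    return [(f"{attr} index", cost) for cost, attr, _ in chosen], explanation


estimators = {
    "aircraft": lambda v: {"A320": 500_000, "C172": 40}.get(v, 0),
    "id": lambda v: 1,  # record IDs are unique
}
query = ("and", ("pred", "aircraft", "C172"),
                ("or", ("pred", "color", "red"), ("pred", "id", "f42")))
steps, why = plan(query, estimators, total_rows=1_000_000)
print(steps)  # [('aircraft index', 40), ('id index', 1)]
print(why)    # aircraft='C172' -> est. 40 row(s); id='f42' -> est. 1 row(s)
```

Results from different disjuncts can overlap, so a real executor would also need to de-duplicate them.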
life is good
● we have some indexes
● we can ingest geo-temporal data
● we can query with geometric bounds and a time span
● we don't suffer from hot-spotting
● we have per-attribute indexes and a record-ID index
● we have the option of querying by any one attribute
● we have non-duplicative indexes for non-point geometries, even those that cross the anti-meridian
part 6: the pain
"You caught my nose in your left castanet, love / I can feel the pain yet, love / every time I hear drums"
serious fun requires serious thought
analytics, streaming, and cross-platform support
Apache Arrow
"who knew [geo data] could be so complicated?"
● there exist simpler solutions
○ D4M works very well, albeit not specifically for geo-time data
○ Elasticsearch has geographic, temporal indexes
● do you have a simpler problem?
○ do you need low-latency, high-velocity streaming data ingest, processing?
○ does even your streaming, in-memory geo-time data store require secondary indexing?
○ do your clients require access via OGC services, the GeoTools API?
○ must you support multiple flavors of NoSQL?
if it doesn't hurt, you're doing it wrong
"Fracture my spine / and swear that you're mine / as we dance to the Masochism Tango"
for additional questions...