Scaling GIS Data in
       Nonrelational Data Stores

       featuring Mike Malone




Tuesday, March 30, 2010
Mike Malone
                           @mjmalone

SimpleGeo
      Scalable turnkey location infrastructure
      Allows you to easily add geo-aware features
      to an existing application
      As a result, we need to store and query lots
      of data (the data set is already approaching
      1 TB, and we haven’t launched)




Scaling HTTP is easy
      No shared state - shared-nothing architecture
      • HTTP requests contain all of the information
        necessary to generate a response
      • HTTP responses contain all of the information
        necessary for clients to interpret them
      • In other words, requests are self-contained and
        different requests can be routed to different servers
      Uniform interface - allows middleware
      applications to proxy requests, creating a tiered
      architecture and making load balancing trivial
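A minimal Python sketch of why statelessness makes load balancing trivial: because each request carries everything needed to answer it, a balancer can hand consecutive requests to different servers in turn. The server names and request fields are hypothetical, and servers are modeled as plain functions rather than real HTTP processes.

```python
from itertools import cycle
from typing import Callable, Dict, List

# A "server" is modeled as a pure function of the request: it keeps no state
# between calls, so any server can answer any request.
Server = Callable[[Dict[str, str]], str]


def make_server(name: str) -> Server:
    def handle(request: Dict[str, str]) -> str:
        # Everything needed to build the response comes from the request itself.
        return f"{name}: hello {request['user']} from {request['path']}"
    return handle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: List[Server]) -> None:
        self._next = cycle(servers)

    def dispatch(self, request: Dict[str, str]) -> str:
        return next(self._next)(request)


balancer = RoundRobinBalancer([make_server("web1"), make_server("web2")])
responses = [balancer.dispatch({"user": "mike", "path": "/records"}) for _ in range(3)]
```

Swapping round-robin for random choice or least-connections would not change correctness, since no server holds state the others lack.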

So what’s the problem?
      Individual HTTP requests have no shared
      state, but the applications that
      communicate via HTTP can and do
      Application state has to live somewhere
      • Path of least resistance is usually a relational
        database
      • But RDBMSs aren’t always the best tool for the
        job



Desirable Data Store Characteristics

                          Massively distributed
                          Horizontally scalable
                             Fault tolerant
                                  Fast
                            Always available




Relational Databases
      Based on the “relational model” first
      proposed by E.F. Codd in 1969
      Tons of implementation experience and
      lots of robust open source and proprietary
      implementations




RDBMS Strengths
                              Theoretically pure
                              Clean abstraction
                              Declarative syntax
                             Mostly standardized
                           Easy to reason about data




ACID
      Atomicity - if one part of a transaction fails,
      the entire transaction fails
      Consistency - all data constraints must be
      met for a transaction to be successful
      Isolation - other operations can’t see a
      transaction that has not yet completed
      Durability - once the client has been
      notified that a transaction succeeded, the
      transaction will not be lost
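A minimal sketch of atomicity using Python's built-in sqlite3 module, with SQLite standing in for any ACID relational database. The accounts table and the transfer helper are hypothetical; the CHECK constraint plays the role of the consistency rule.

```python
import sqlite3

# In-memory database whose CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move amount from src to dst as one transaction; return True if it committed."""
    try:
        # The connection context manager commits on success and rolls back
        # every statement in the transaction if any of them raises.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 30) commits, while transfer(conn, "alice", "bob", 500) returns False and leaves both balances untouched: the credit to bob is rolled back along with the debit that violated the constraint.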
RDBMS Weaknesses
      SQL is opaque, and query planners don’t
      always do the right thing
      • Geospatial SQL is particularly bad
      The best ones are crazy expensive
      Really bad at scaling writes
      Strong consistency requirements make
      horizontal scaling difficult



RDBMS Writes
      Relational databases almost always use B-Tree
      (or some other tree-based) indexes
      Writes are typically implemented by doing
      an in-place update on disk
      • Requires random seek to a specific location on
        disk
      • May require additional seeks to read indexes if
        they outgrow the disk cache
      Disk seeks are bad.

CAP Theorem
         There are three desirable characteristics of
         a shared data system that is deployed in a
           distributed environment like the web.




CAP Theorem
      1. Consistency - every node in the system
      contains the same data (e.g., replicas are
      never out of date)
      2. Availability - every request to a non-failing
      node in the system returns a response
      3. Partition Tolerance - system properties
      (consistency and/or availability) hold even
      when the system is partitioned and messages
      between nodes are lost

CAP Theorem


                          Choose two.


[Diagram: a Client sends reads & writes to Node A and Node B; Node A replicates to Node B]

[Diagram: the Client writes to Node A; Node A replicates the write to Node B]

[Diagram: Node B acknowledges the replicated write; Node A responds to the Client]

[Diagram: the link between Node A and Node B is cut ("o noes!"); Node A still has to respond to the Client]

What now?
      1. Write fails: data store is unavailable
      2. Write succeeds on Node A: data is
      inconsistent
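The two outcomes above can be sketched as a toy two-node store in Python. Everything here (the ReplicatedPair class, the partitioned flag, the prefer_consistency switch) is hypothetical and only models the decision Node A faces when it cannot reach Node B.

```python
from dataclasses import dataclass, field
from typing import Dict


class Unavailable(Exception):
    """Raised when a consistency-first store refuses a write."""


@dataclass
class Node:
    name: str
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class ReplicatedPair:
    """Toy store: writes go to node A, which replicates them to node B.

    prefer_consistency=True  -> option 1: refuse writes during a partition.
    prefer_consistency=False -> option 2: accept on A and let the replicas diverge.
    """

    prefer_consistency: bool
    a: Node = field(default_factory=lambda: Node("A"))
    b: Node = field(default_factory=lambda: Node("B"))
    partitioned: bool = False

    def write(self, key: str, value: str) -> None:
        if self.partitioned and self.prefer_consistency:
            # A cannot reach B, so it cannot promise that both copies match.
            raise Unavailable(f"cannot replicate {key!r} to node B")
        self.a.data[key] = value
        if not self.partitioned:
            self.b.data[key] = value  # replication succeeds and B acknowledges

    def consistent(self) -> bool:
        return self.a.data == self.b.data
```

With prefer_consistency=True a write during the partition raises Unavailable (option 1); with False it lands on Node A only, and the replicas disagree until something repairs them (option 2).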




RDBMS Consistency
      Relational databases prioritize consistency
      Large scale distributed systems need to be
      highly available
      • As we add servers, network partitions and node
        failures go from possible to inevitable
      We could write an abstraction layer on top of a
      relational data store that trades consistency for
      availability
      Or we could switch to a data store that
      prioritizes the characteristics we really want

Nonrelational DBs
             Over the past couple of years, a number of specialized
                         data stores have emerged

                           • CouchDB     • Redis
                           • Cassandra   • MongoDB
                           • Dynamo      • SimpleDB
                           • BigTable    • Memcached
                           • Riak        • MemcacheDB

Also Known As NoSQL
      Not entirely appropriate, since SQL can be
      implemented on non-relational DBs
      But SQL is an opaque abstraction with lots
      of features that are difficult or impossible to
      efficiently distribute




So what’s different?
      Most “non-relational” stores specifically
      emphasize partition tolerance and
      availability
      Typically provide a more relaxed guarantee:
      eventual consistency




NoACID



BASE
                           Basically Available
                               Soft State
                          Eventually Consistent




Eventual Consistency
      Write operations are attempted on n nodes
      that are “authoritative” for the provided key
      In the event of a network partition, data is
      written to another node in the cluster
      When the network heals and nodes become
      available again, out-of-date replicas are
      brought up to date
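A toy Python sketch of the mechanism described above, which Cassandra calls hinted handoff: when an authoritative replica is down, another live node holds the write as a "hint" and replays it once the replica returns. The replica-placement rule, the Cluster class, and its methods are simplified, hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Hint = Tuple[str, str, str]  # (intended node, key, value)


class Cluster:
    """Toy cluster in which each key has n authoritative replicas on a ring."""

    def __init__(self, nodes: List[str], n: int = 2) -> None:
        self.nodes = nodes
        self.n = n
        self.up = {name: True for name in nodes}
        self.store: Dict[str, Dict[str, str]] = {name: {} for name in nodes}
        self.hints: Dict[str, List[Hint]] = defaultdict(list)

    def replicas(self, key: str) -> List[str]:
        # Hypothetical placement: a deterministic position on the ring
        # (Python's hash() is randomized per process), then n consecutive nodes.
        start = sum(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.n)]

    def write(self, key: str, value: str) -> None:
        targets = self.replicas(key)
        for node in targets:
            if self.up[node]:
                self.store[node][key] = value
            else:
                # Authoritative node unreachable: park the write on another live node.
                stand_in = self._live_non_replica(targets)
                if stand_in is not None:
                    self.hints[stand_in].append((node, key, value))

    def _live_non_replica(self, targets: List[str]) -> Optional[str]:
        for node in self.nodes:
            if node not in targets and self.up[node]:
                return node
        return None

    def heal(self, node: str) -> None:
        """Bring a node back and replay every hint held on its behalf."""
        self.up[node] = True
        for pending in self.hints.values():
            for hint in [h for h in pending if h[0] == node]:
                self.store[node][hint[1]] = hint[2]
                pending.remove(hint)
```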



SimpleGeo                    Cassandra
      No single point of failure
      Efficient online cluster rebalancing allows for
      incremental scalability
      Emphasizes availability and partition tolerance
      • Eventually consistent
      • Tradeoff between consistency and latency
        exposed to the client
      Battle tested - large clusters at Facebook, Digg,
      and Twitter

Cassandra Data Model
      Column - a tuple containing a name, value,
      and timestamp
      Column Family - a group of columns that
      are stored together on disk
      Row - identifier for a specific group of
      columns in a column family
      Super Column - a column that has columns
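Because each column carries a timestamp, replicas that diverged can be merged column by column, keeping the version with the newest timestamp. This "last write wins" rule is how Cassandra resolves conflicting column values. A minimal Python sketch; the Column and reconcile names are illustrative, not Cassandra's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Column:
    """A Cassandra-style column: a (name, value, timestamp) tuple."""
    name: str
    value: str
    timestamp: int  # client-supplied, e.g. microseconds since the epoch


def reconcile(*replicas: Iterable[Column]) -> Dict[str, Column]:
    """Merge one row's columns as seen by several replicas.

    For each column name the version with the highest timestamp wins.
    Ties keep whichever version was seen first (a simplification of this sketch).
    """
    merged: Dict[str, Column] = {}
    for columns in replicas:
        for col in columns:
            current = merged.get(col.name)
            if current is None or col.timestamp > current.timestamp:
                merged[col.name] = col
    return merged
```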



Cassandra Data Model
                    {
                        # row key
                        '9xj5ss824mzyv.12345': {
                            # column family
                            'Record': {
                                # columns (name: value; each also carries a timestamp)
                                'lat': 40.0149856,
                                'lon': -105.2705456,
                                'city': 'Boulder',
                                'state': 'CO',
                            },
                        },
                        'dr5regy3zcfgr.67890': {
                            'Record': {
                                'lat': 40.7142691,
                                'lon': -74.0059729,
                                'city': 'New York',
                                'state': 'NY',
                            },
                        },
                    }


Writes are crazy fast
      Writes are appended to a commit log in the
      order they’re received - sequential I/O
      New data is stored in an in-memory table
      (memtable)
      The memtable is periodically flushed to a
      sorted file on disk
      Files are occasionally merged (compaction)
      Reads may end up checking multiple files
      (bloom filters help) and merging results
      • That’s okay because reads are pretty easy to scale
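The write path above is a log-structured merge design. A simplified in-memory Python sketch, assuming string keys and values and ignoring deletes (tombstones), commit-log truncation, and real disk I/O; the class names are hypothetical and the "files" are plain sorted lists.

```python
import bisect
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple


class BloomFilter:
    """Tiny Bloom filter: can give false positives, never false negatives."""

    def __init__(self, size: int = 1024, hashes: int = 3) -> None:
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)  # one byte per bit, for simplicity

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class SortedFile:
    """An immutable sorted table, standing in for a file on disk."""

    def __init__(self, items: Dict[str, str]) -> None:
        self.keys = sorted(items)
        self.values = [items[k] for k in self.keys]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key: str) -> Optional[str]:
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the "disk" lookup
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class LogStructuredStore:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # append-only, arrival order
        self.memtable: Dict[str, str] = {}
        self.files: List[SortedFile] = []  # oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential write for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.files.append(SortedFile(self.memtable))  # flush
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for f in reversed(self.files):  # the newest file wins
            value = f.get(key)
            if value is not None:
                return value
        return None

    def compact(self) -> None:
        """Merge all files into one; newer values overwrite older ones."""
        merged: Dict[str, str] = {}
        for f in self.files:
            merged.update(zip(f.keys, f.values))
        self.files = [SortedFile(merged)] if merged else []
```

Writes never touch existing files, which is why they stay fast; the cost moves to reads (several files may be consulted) and to background compaction.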


How can I query?
      Depends on the partitioner you use
      • Random partitioner: makes it really easy to
        keep a cluster balanced, but can only do
        lookups by row key
      • Order-preserving partitioner: stores data
        ordered by row key, so it can query for ranges
        of keys, but it’s a lot harder to keep balanced
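A small Python sketch contrasting the two strategies on a ring of four nodes. The token functions are simplified assumptions: Cassandra's RandomPartitioner derives its token from an MD5 hash, but here the raw 128-bit digest is used directly, and the ring boundaries and keys are made up.

```python
import bisect
import hashlib
from typing import Callable, List, Sequence


def random_token(key: str) -> int:
    """Hash-based token: the MD5 digest of the key as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


def order_preserving_token(key: str) -> str:
    """Order-preserving token: the key itself, so tokens sort like keys."""
    return key


def owner(token, node_tokens: Sequence) -> int:
    """Index of the node owning token: the first node token >= it, wrapping around."""
    return bisect.bisect_left(node_tokens, token) % len(node_tokens)


def spread(keys: List[str], token_fn: Callable, node_tokens: Sequence) -> List[int]:
    """Count how many keys each node would own."""
    counts = [0] * len(node_tokens)
    for key in keys:
        counts[owner(token_fn(key), node_tokens)] += 1
    return counts


keys = [f"user{i:04d}" for i in range(1000)]
md5_ring = [(i + 1) * 2**128 // 4 for i in range(4)]  # evenly split hash space
ordered_ring = ["g", "n", "t", "~"]  # evenly split alphabet, not evenly split data
random_counts = spread(keys, random_token, md5_ring)
ordered_counts = spread(keys, order_preserving_token, ordered_ring)
```

All 1,000 sequential keys land on a single node under the order-preserving scheme, while hashed tokens spread them roughly evenly; the price is that keys such as user0100 through user0199 are no longer stored together, so a range scan would have to visit every node.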




BYOI
      • If you need an index on something other than
        the row key, you need to build an inverted
        index yourself
           • Row key: attribute you're interested in plus row key
             being indexed
           • “dr5regy3zcfgr:com.simplegeo/1”
      • But what about indexing multiple attributes?
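A minimal Python sketch of such a hand-built index, using the composite-key format from the example above. A sorted list stands in for rows stored in key order (as under an order-preserving partitioner), so all rows sharing an indexed value, or a geohash prefix, are contiguous and can be read with one range scan. The class name and the extra row keys are hypothetical.

```python
import bisect
from typing import List


class InvertedIndex:
    """Index rows by an attribute using composite keys '<attribute value>:<row key>'."""

    def __init__(self) -> None:
        self.keys: List[str] = []  # kept sorted, like rows in key order

    def add(self, attribute_value: str, row_key: str) -> None:
        entry = f"{attribute_value}:{row_key}"
        i = bisect.bisect_left(self.keys, entry)
        if i == len(self.keys) or self.keys[i] != entry:
            self.keys.insert(i, entry)

    def lookup(self, prefix: str) -> List[str]:
        """Return the row keys whose indexed value starts with prefix."""
        start = bisect.bisect_left(self.keys, prefix)
        rows = []
        for entry in self.keys[start:]:
            if not entry.startswith(prefix):
                break  # sorted order: no later entry can match
            rows.append(entry.split(":", 1)[1])
        return rows


idx = InvertedIndex()
idx.add("dr5regy3zcfgr", "com.simplegeo/1")
idx.add("9xj5ss824mzyv", "com.simplegeo/2")
idx.add("dr5regy3zcfgr", "com.simplegeo/3")
```

A lookup on "dr5r" returns com.simplegeo/1 and com.simplegeo/3; this works for one attribute, but a second attribute would need a second index and a client-side intersection.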




The Curse of Dimensionality
      Location data is multidimensional
      Traditional GIS software typically uses
      some variation of a Quadtree or R-Tree for
      indexes
      Like B-Trees, R-Trees need to be updated
      in-place and are expensive to manipulate
      when they outgrow memory



Dimensionality Reduction
      If we think of the world as a two-dimensional
      Cartesian plane, we can think of latitude
      and longitude as coordinates on that plane
      Instead of using (x, y) coordinates, we can
      break the plane into a grid and number
      each box
      • Space-filling curve: a continuous curve that
        passes through every point of a two-dimensional
        region, giving the boxes a one-dimensional order
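One way to number the boxes is a Z-order (Morton) curve: interleave the bits of each box's column and row numbers. A minimal Python sketch, assuming a 2^16 by 2^16 grid laid over the flat latitude/longitude plane; the grid size and function names are arbitrary choices for illustration.

```python
from typing import Tuple


def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of grid cell (x, y) into one Z-order number.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def morton_decode(z: int, bits: int = 16) -> Tuple[int, int]:
    """Inverse of morton_encode: even bits go back to x, odd bits to y."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


def to_cell(lat: float, lon: float, bits: int = 16) -> Tuple[int, int]:
    """Map (lat, lon) onto a 2**bits x 2**bits grid over the flat lat/lon plane."""
    cells = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return x, y
```

On a 2 by 2 grid, cells (0, 0), (1, 0), (0, 1), (1, 1) get the numbers 0, 1, 2, 3, tracing a "Z"; the same pattern repeats at every scale, so a single integer index can replace the two coordinates.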


Geohash
      A convenient dimensionality reduction
      mechanism for (latitude, longitude) coordinates
      that uses a Z-Curve
      Simply interleave the bits of the longitude and
      latitude (longitude first) and base32-encode the result
      Interesting characteristics
      • Easy to calculate and to reverse
      • Represent bounding boxes
      • Truncating bits from the end of a geohash results
        in a larger geohash bounding the original
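A minimal Python sketch of geohash encoding and decoding. At each step the current longitude or latitude interval is halved and the half containing the point contributes a 1 or 0 bit; bits alternate starting with longitude, and every 5 bits become one character of the geohash base32 alphabet. The function names are illustrative.

```python
from typing import Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode (lat, lon) as a geohash string of `precision` characters."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]  # [longitude range, latitude range]
    coords = (lon, lat)
    chars, value, nbits, dim = [], 0, 0, 0  # dim 0 = longitude, 1 = latitude
    while len(chars) < precision:
        lo, hi = ranges[dim]
        mid = (lo + hi) / 2
        if coords[dim] >= mid:
            value = (value << 1) | 1  # point is in the upper half
            ranges[dim][0] = mid
        else:
            value <<= 1  # point is in the lower half
            ranges[dim][1] = mid
        dim ^= 1  # alternate longitude / latitude
        nbits += 1
        if nbits == 5:
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


def geohash_bbox(geohash: str) -> Tuple[float, float, float, float]:
    """Decode a geohash to its bounding box (lat_min, lat_max, lon_min, lon_max)."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]
    dim = 0
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            lo, hi = ranges[dim]
            mid = (lo + hi) / 2
            if (value >> shift) & 1:
                ranges[dim][0] = mid
            else:
                ranges[dim][1] = mid
            dim ^= 1
    (lon_lo, lon_hi), (lat_lo, lat_hi) = ranges
    return lat_lo, lat_hi, lon_lo, lon_hi
```

Encoding Boulder's coordinates from the data model example yields a hash beginning 9xj5, and dropping characters from the end yields a larger box that still contains the point, which is what makes prefix queries work.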
Geohash Drawbacks
        Z-Curves are not necessarily the most
        efficient space-filling curve for range queries
        • Points on either end of the Z’s diagonal seem
          close together when they’re not
        • Points next to each other on the spherical
          earth may end up on opposite sides of our
          plane
        These inefficiencies mean we sometimes
        have to run multiple queries, or expand
        bounding box queries to cover very large
        expanses
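The "very large expanses" problem is easy to reproduce with a small Python sketch on a 2^16 by 2^16 Z-order grid (the grid size and names are illustrative assumptions): a 2 by 2 box straddling the grid's vertical midline covers only four cells, but the single Z-order interval containing them spans hundreds of millions of cells.

```python
from itertools import product
from typing import Tuple


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) number of cell (x, y): x bits on even positions, y on odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_range_for_box(x0: int, x1: int, y0: int, y1: int) -> Tuple[int, int]:
    """Smallest single Z-order interval covering every cell in the box."""
    zs = [z_value(x, y) for x, y in product(range(x0, x1 + 1), range(y0, y1 + 1))]
    return min(zs), max(zs)


# Four neighbouring cells on either side of the midline x = 32767 | 32768.
lo, hi = z_range_for_box(32767, 32768, 0, 1)
cells_scanned = hi - lo + 1
```

The opposite effect shows up at the smallest scale: cells (1, 0) and (0, 1) are diagonal to each other yet receive consecutive Z numbers. Splitting the box into one query per side of the midline avoids the huge scan, which is the "multiple queries" trade-off mentioned above.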



Geohash Alternatives
      Hilbert curves: improve on Z-Curves but
      have different drawbacks
      Non-algorithmic unique identifiers
      • Provide unique identifiers for geopolitical and
        colloquial bounding polygons
      • Yahoo! GeoPlanet’s WOEIDs are a good
        example




Other stuff we use



Memcache
      Useful for storing ephemeral or short-lived
      data and for caching
      Super crazy extra fast
      Robust support from pretty much every
      language in the world




MemcacheDB
      Memcache backed by Berkeley DB (BDB)
      We use it for statistics
      • Can’t use Cassandra because it doesn’t
        support eventually consistent increment and
        decrement operations (yet)
      Giant con: it’s pretty much impossible to
      rebalance if you add a node



Pushpin Service
      Custom storage solution
      R-Tree index for fast lookups
      Mostly fixed data sets so it’s ok that we can’t
      update data efficiently




MySQL!

             Our website still uses MySQL for some
           stuff... though we’re moving away from it




Thanks!



Ask me questions!




                          @mjmalone
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 

Scaling GIS Data in Non-relational Data Stores

  • 1. Scaling GIS Data in Nonrelational Data Stores featuring Mike Malone Tuesday, March 30, 2010
  • 2. Mike Malone @mjmalone
  • 4. SimpleGeo: Scalable turnkey location infrastructure. Allows you to easily add geo-aware features to an existing application. The result: we need to store and query lots of data (the data set is already approaching 1TB, and we haven’t launched).
  • 5. Scaling HTTP is easy: No shared state - shared-nothing architecture • HTTP requests contain all of the information necessary to generate a response • HTTP responses contain all of the information necessary for clients to interpret them • In other words, requests are self-contained and different requests can be routed to different servers. Uniform interface - allows middleware applications to proxy requests, creating a tiered architecture and making load balancing trivial.
  • 6. So what’s the problem? Individual HTTP requests have no shared state, but the applications that communicate via HTTP can and do. Application state has to live somewhere • Path of least resistance is usually a relational database • But RDBMSs aren’t always the best tool for the job
  • 7. Desirable Data Store Characteristics: Massively distributed • Horizontally scalable • Fault tolerant • Fast • Always available
  • 8. Relational Databases: Based on the “relational model” first proposed by E.F. Codd in 1969. Tons of implementation experience and lots of robust open source and proprietary implementations.
  • 9. RDBMS Strengths: Theoretically pure • Clean abstraction • Declarative syntax • Mostly standardized • Easy to reason about data
  • 10. ACID: Atomicity - if one part of a transaction fails, the entire transaction fails. Consistency - all data constraints must be met for a transaction to be successful. Isolation - other operations can’t see a transaction that has not yet completed. Durability - once the client has been notified that a transaction succeeded, the transaction will not be lost.
  • 11. RDBMS Weaknesses: SQL is opaque, and query parsers don’t always do the right thing • Geospatial SQL is particularly bad. The best ones are crazy expensive. Really bad at scaling writes. Strong consistency requirements make horizontal scaling difficult.
  • 12. RDBMS Writes: Relational databases almost always use B-Tree (or some other tree-based) indexes. Writes are typically implemented by doing an in-place update on disk • Requires a random seek to a specific location on disk • May require additional seeks to read indexes if they outgrow the disk cache. Disk seeks are bad.
  • 13. CAP Theorem: There are three desirable characteristics of a shared data system that is deployed in a distributed environment like the web.
  • 14. CAP Theorem: 1. Consistency - every node in the system contains the same data (e.g., replicas are never out of date) 2. Availability - every request to a non-failing node in the system returns a response 3. Partition Tolerance - system properties (consistency and/or availability) hold even when the system is partitioned and messages between nodes are lost
  • 15. CAP Theorem: Choose two.
  • 16. [Diagram: Client reads from and writes to Node A and Node B; Node A and Node B replicate to each other]
  • 17. [Diagram: Client writes to Node A; Node A replicates to Node B]
  • 18. [Diagram: Node B acknowledges the replicated write; Node A responds to Client]
  • 19. [Diagram: Client, Node A, and Node B, labeled "responds" and "o noes!": the link between Node A and Node B has failed]
  • 20. What now? 1. Write fails: data store is unavailable 2. Write succeeds on Node A: data is inconsistent
  • 21. RDBMS Consistency: Relational databases prioritize consistency. Large-scale distributed systems need to be highly available • As we add servers, the possibility of a network partition or node failure becomes an inevitability. We could write an abstraction layer on top of a relational data store that trades consistency for availability, or we could switch to a data store that prioritizes the characteristics we really want.
  • 22. Nonrelational DBs: Over the past couple years, a number of specialized data stores have emerged • CouchDB • Redis • Cassandra • MongoDB • Dynamo • SimpleDB • BigTable • Memcached • Riak • MemcacheDB
  • 23. Also Known As NoSQL: Not entirely appropriate, since SQL can be implemented on non-relational DBs. But SQL is an opaque abstraction with lots of features that are difficult or impossible to efficiently distribute.
  • 24. So what’s different? Most “non-relational” stores specifically emphasize partition tolerance and availability. They typically provide the more relaxed guarantee of eventual consistency.
  • 26. BASE: Basically Available, Soft State, Eventually Consistent
  • 27. Eventual Consistency: Write operations are attempted on n nodes that are “authoritative” for the provided key. In the event of a network partition, data is written to another node in the cluster. When the network heals and nodes become available again, inconsistent data is updated.
  • 28. SimpleGeo Cassandra: No single point of failure. Efficient online cluster rebalancing allows for incremental scalability. Emphasizes availability and partition tolerance • Eventually consistent • Tradeoff between consistency and latency exposed to the client. Battle tested - large clusters at Facebook, Digg, and Twitter.
  • 29. Cassandra Data Model: Column - a tuple containing a name, value, and timestamp. Column Family - a group of columns that are stored together on disk. Row - identifier for a specific group of columns in a column family. Super Column - a column that has columns.
  • 30. Cassandra Data Model
      {
        '9xj5ss824mzyv.12345': {
          'Record': {
            'lat': 40.0149856, 'lon': -105.2705456,
            'city': 'Boulder', 'state': 'CO'
          }
        },
        'dr5regy3zcfgr.67890': {
          'Record': {
            'lat': 40.7142691, 'lon': -74.0059729,
            'city': 'New York', 'state': 'NY'
          }
        }
      }
  • 34. Writes are crazy fast: Writes are written to a commit log in the order they’re received - serial I/O. New data is stored in an in-memory table. The memory table is periodically synced to a file. Files are occasionally merged. Reads may end up checking multiple files (a bloom filter helps) and merging results • That’s okay because reads are pretty easy to scale
  • 35. How can I query? Depends on the partitioner you use • Random partitioner: makes it really easy to keep a cluster balanced, but can only do lookups by row key • Order-preserving partitioner: stores data ordered by row key, so it can query for ranges of keys, but it’s a lot harder to keep balanced
  • 36. BYOI • If you need an index on something other than the row key, you need to build an inverted index yourself • Row key: attribute you're interested in plus row key being indexed • “dr5regy3zcfgr:com.simplegeo/1” • But what about indexing multiple attributes...?
  • 37. The Curse of Dimensionality: Location data is multidimensional. Traditional GIS software typically uses some variation of a Quadtree or R-Tree for indexes. Like B-Trees, R-Trees need to be updated in-place and are expensive to manipulate when they outgrow memory.
  • 38. Dimensionality Reduction: If we think of the world as a two-dimensional Cartesian plane, we can think of latitude and longitude as coordinates for that plane. Instead of using (x, y) coordinates, we can break the plane into a grid and number each box • Space-filling curve: a continuous line that passes through every point in a two-dimensional plane
  • 40. Geohash: A convenient dimensionality reduction mechanism for (latitude, longitude) coordinates that uses a Z-Curve. Simply interleave the bits of a (latitude, longitude) pair and base32 encode the result. Interesting characteristics • Easy to calculate and to reverse • Represent bounding boxes • Truncating bits from the end of a geohash results in a larger geohash bounding the original
  • 41. Geohash Drawbacks: Z-Curves are not necessarily the most efficient space-filling curve for range queries • Points on either end of the Z’s diagonal seem close together when they’re not • Points next to each other on the spherical earth may end up on opposite sides of our plane. These inefficiencies mean we sometimes have to run multiple queries, or expand bounding box queries to cover very large expanses.
  • 42. Geohash Alternatives: Hilbert curves: improve on Z-Curves but have different drawbacks. Non-algorithmic unique identifiers • Provide unique identifiers for geopolitical and colloquial bounding polygons • Yahoo! GeoPlanet’s WOEIDs are a good example
  • 43. Other stuff we use
  • 44. Memcache: Useful for storing ephemeral or short-lived data and for caching. Super crazy extra fast. Robust support from pretty much every language in the world.
  • 45. MemcacheDB: BDB-backed memcache. We use it for statistics • Can’t use Cassandra because it doesn’t support eventually consistent increment and decrement operations (yet). Giant con: it’s pretty much impossible to rebalance if you add a node.
  • 46. Pushpin Service: Custom storage solution. R-Tree index for fast lookups. Mostly fixed data sets, so it’s OK that we can’t update data efficiently.
  • 47. MySQL! Our website still uses MySQL for some stuff... though we’re moving away from it
  • 49. Ask me questions! @mjmalone
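Slide 10’s atomicity and consistency properties can be seen in a few lines of Python using the standard-library `sqlite3` module. This is a minimal sketch and not anything from the talk: the `accounts` table, the `transfer` helper, and the balances are made up for illustration. The CHECK constraint plays the role of a data constraint; when the second UPDATE violates it, the whole transaction, including the first UPDATE, is rolled back.

```python
import sqlite3

# Hypothetical example schema: balances may never go negative (a data constraint).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst in one transaction; return True if it committed."""
    try:
        # The connection as a context manager commits on success and
        # rolls back the open transaction if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # This statement violates the CHECK constraint if src lacks funds...
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # ...in which case the credit to dst above is undone too (atomicity).
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 150)` returns `False` and leaves both balances unchanged, while a transfer of 30 commits both updates together.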
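The two outcomes on slide 20 can be modeled with a toy two-node store. This is a hypothetical illustration (the `ReplicatedPair` class and its `mode` flag are invented here, not taken from any real database): in `"CP"` mode a write that cannot reach Node B is refused, so the store is unavailable; in `"AP"` mode Node A accepts it and the replicas diverge.

```python
class Unavailable(Exception):
    """Raised when the store refuses a write it cannot replicate."""


class ReplicatedPair:
    """Toy master/master pair (Node A and Node B) that replicates every write."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.node_a = {}
        self.node_b = {}
        self.partitioned = False  # True when Node A can no longer reach Node B

    def write(self, key, value):
        """Client writes to Node A, which replicates to Node B before responding."""
        if not self.partitioned:
            self.node_a[key] = value
            self.node_b[key] = value  # Node B acknowledges, then Node A responds
            return "ok"
        if self.mode == "CP":
            # Option 1: the write fails, so the data store is unavailable.
            raise Unavailable(f"cannot replicate {key!r} to Node B")
        # Option 2: the write succeeds on Node A only, so the data is inconsistent.
        self.node_a[key] = value
        return "ok (Node B is stale)"

    def consistent(self):
        return self.node_a == self.node_b
```

Neither mode is "wrong"; the sketch just makes explicit that once the link fails, the store has to give up either the response or the agreement between replicas.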
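Slide 27’s write path resembles what Dynamo-style stores call hinted handoff. The sketch below is a simplified, hypothetical model (the `Cluster` class, the CRC32-based placement, and the integer timestamps are all assumptions, not Cassandra’s actual code): a write goes to the n authoritative nodes for a key; if one of them is down, a stand-in node stores the data along with a hint, and the hint is replayed when the authoritative node comes back.

```python
import zlib


class Cluster:
    """Toy eventually consistent cluster with hinted handoff."""

    def __init__(self, node_names, n):
        self.order = list(node_names)
        self.n = n  # number of authoritative replicas per key
        self.data = {name: {} for name in self.order}  # node -> {key: (ts, value)}
        self.up = set(self.order)
        self.hints = []  # (intended_node, key, ts, value)

    def preference_list(self, key):
        """All nodes in placement order; the first n are authoritative for the key."""
        start = zlib.crc32(key.encode()) % len(self.order)
        return [self.order[(start + i) % len(self.order)] for i in range(len(self.order))]

    def fail(self, node):
        self.up.discard(node)

    def _store(self, node, key, ts, value):
        current = self.data[node].get(key)
        if current is None or ts > current[0]:  # last write (by timestamp) wins
            self.data[node][key] = (ts, value)

    def write(self, key, value, ts):
        prefs = self.preference_list(key)
        stand_ins = [node for node in prefs[self.n:] if node in self.up]
        for node in prefs[: self.n]:
            if node in self.up:
                self._store(node, key, ts, value)
            elif stand_ins:
                # Partition or failure: write elsewhere and remember who it was for.
                stand_in = stand_ins.pop(0)
                self._store(stand_in, key, ts, value)
                self.hints.append((node, key, ts, value))

    def heal(self, node):
        """The node is reachable again: replay any hints addressed to it."""
        self.up.add(node)
        pending = []
        for target, key, ts, value in self.hints:
            if target in self.up:
                self._store(target, key, ts, value)
            else:
                pending.append((target, key, ts, value))
        self.hints = pending

    def read(self, node, key):
        entry = self.data[node].get(key)
        return None if entry is None else entry[1]
```

Between the failure and `heal`, a read from the recovered node would return stale (here, missing) data; that window is what "eventually" refers to.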
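The nested structure on slides 29-33 (row key, then column family, then columns) can be mirrored with plain Python dictionaries. The helper names here are hypothetical and this is not Cassandra’s client API; it only shows that each column carries a timestamp, that the write with the newer timestamp wins when two writes hit the same column, and that rows need not share the same columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column is a (name, value, timestamp) tuple."""
    name: str
    value: object
    timestamp: int


def insert(store, row_key, family, name, value, timestamp):
    """store maps row key -> column family -> column name -> Column."""
    columns = store.setdefault(row_key, {}).setdefault(family, {})
    existing = columns.get(name)
    # Reconcile conflicting writes by keeping the newest timestamp.
    if existing is None or timestamp > existing.timestamp:
        columns[name] = Column(name, value, timestamp)


def get_row(store, row_key, family):
    """Return {column name: value} for one row of one column family."""
    return {name: col.value for name, col in store.get(row_key, {}).get(family, {}).items()}


store = {}
boulder = [("lat", 40.0149856), ("lon", -105.2705456), ("city", "Boulder"), ("state", "CO")]
for name, value in boulder:
    insert(store, "9xj5ss824mzyv.12345", "Record", name, value, timestamp=1)
# An older write to the same column is ignored.
insert(store, "9xj5ss824mzyv.12345", "Record", "city", "Boulder, CO", timestamp=0)
# A different row may have a different set of columns (schema-less).
insert(store, "dr5regy3zcfgr.67890", "Record", "city", "New York", timestamp=1)
```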
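Slide 34’s write path (commit log, in-memory table, periodic flush to immutable files, occasional merges, bloom filters on reads) is usually described as a log-structured merge design. Below is a minimal in-memory sketch under simplifying assumptions: the "files" are Python lists, deletes and tombstones are not handled, and `None` is not allowed as a value. The class names are illustrative, not Cassandra’s.

```python
import bisect
import hashlib


class BloomFilter:
    """Answers "definitely absent" or "maybe present" for a key."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SortedRun:
    """Immutable, sorted key/value file written when the memtable is flushed."""

    def __init__(self, items):
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # skip the (simulated) disk lookup entirely
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


class LSMStore:
    def __init__(self, memtable_limit=1000):
        self.commit_log = []  # append-only, in arrival order: serial I/O
        self.memtable = {}  # newest data, in memory
        self.runs = []  # flushed files, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.runs.append(SortedRun(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs log replay

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # the newest file wins
            value = run.get(key)
            if value is not None:
                return value
        return None

    def compact(self):
        """Merge all files into one; later runs overwrite earlier ones."""
        merged = {}
        for run in self.runs:
            merged.update(zip(run.keys, run.values))
        self.runs = [SortedRun(sorted(merged.items()))] if merged else []
```

Writes never touch existing files, which is why they are fast; the cost moves to reads, which may consult several runs until `compact` merges them.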
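The difference between the two partitioners on slide 35 comes down to how a row key becomes a position (token) on the cluster’s ring. In this hypothetical sketch each node owns the keys whose tokens fall between the previous node’s token and its own; the random variant hashes the key with MD5 (similar in spirit to Cassandra’s RandomPartitioner), while the order-preserving variant uses the key itself as the token. The `Ring` class and the node tokens are made up for illustration.

```python
import bisect
import hashlib


class Ring:
    """Maps row keys to nodes; each node owns (previous token, its own token]."""

    def __init__(self, node_tokens, order_preserving=False):
        self.order_preserving = order_preserving
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def token(self, key):
        if self.order_preserving:
            return key  # key order == ring order, so key ranges stay contiguous
        # Random: a hash spreads keys evenly but destroys their ordering.
        return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

    def node_for(self, key):
        i = bisect.bisect_left(self.tokens, self.token(key))
        return self.entries[i % len(self.entries)][1]  # wrap past the last token


SPACE = 2 ** 128  # range of a 128-bit MD5 digest
random_ring = Ring({"n1": SPACE // 3, "n2": 2 * SPACE // 3, "n3": SPACE - 1})
ordered_ring = Ring({"n1": "9", "n2": "d", "n3": "z"}, order_preserving=True)
```

With the random ring, adding a node only takes over part of one neighbor’s range, so the rest of the cluster is untouched. With the order-preserving ring, keys that share a geohash prefix land on the same node, which makes range scans cheap but lets popular regions pile up on one node.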
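Slide 36’s do-it-yourself inverted index can be sketched with a sorted list standing in for a column family under an order-preserving partitioner. Assumptions: the `InvertedIndex` class is hypothetical, index row keys use the "attribute:row key" layout from the slide’s example, and the indexed attribute is a geohash, so a prefix scan finds every record inside a geohash bounding box.

```python
import bisect


class InvertedIndex:
    """Sorted index rows of the form '<attribute value>:<indexed row key>'."""

    def __init__(self):
        self.rows = []

    def add(self, attribute_value, row_key):
        index_key = f"{attribute_value}:{row_key}"
        i = bisect.bisect_left(self.rows, index_key)
        if i == len(self.rows) or self.rows[i] != index_key:
            self.rows.insert(i, index_key)

    def prefix_scan(self, prefix):
        """Return indexed row keys whose attribute value starts with `prefix`."""
        results = []
        i = bisect.bisect_left(self.rows, prefix)
        while i < len(self.rows) and self.rows[i].startswith(prefix):
            results.append(self.rows[i].split(":", 1)[1])
            i += 1
        return results


index = InvertedIndex()
index.add("dr5regy3zcfgr", "com.simplegeo/1")
index.add("9xj5ss824mzyv", "com.simplegeo/12345")
index.add("9xj5st0000000", "com.simplegeo/2")  # hypothetical nearby record
```

Because the index is ordered, a scan for `"9xj5s"` is a single contiguous range read. This covers one attribute only; indexing several at once is the question the slide leaves open.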
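Slide 40’s recipe (interleave the bits, then base32-encode) fits in a short function. This sketch follows the common geohash convention of starting with a longitude bit and using the alphabet `0123456789bcdefghjkmnpqrstuvwxyz`; `decode_bbox` reverses the process into the bounding box the geohash represents. Both function names are illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision=12):
    """Bisect longitude and latitude alternately, emitting one bit per step."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, n_bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half
            rng[0] = mid
        else:
            bits <<= 1  # lower half
            rng[1] = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def decode_bbox(geohash):
    """Return (lat_min, lat_max, lon_min, lon_max) of the cell a geohash names."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    use_lon = True
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lat_rng[0], lat_rng[1], lon_rng[0], lon_rng[1]
```

`encode(40.0149856, -105.2705456, 5)` gives `"9xj5s"`, the same prefix as the Boulder row key on slide 30, and dropping characters from the end of a longer hash yields the hash of a larger box that encloses the original.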
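The edge effects on slide 41 are easy to reproduce: two points a few meters apart on opposite sides of the equator and the prime meridian get geohashes that share no prefix, so one prefix scan cannot find both. One common workaround, shown here as an assumption rather than SimpleGeo’s actual approach, is to query the point’s own cell plus its eight neighbors, found by stepping one cell height or width in each direction. The encoder is repeated so the block stands alone; `cell_size` and `covering_cells` are hypothetical helper names.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision):
    """Standard geohash: alternate longitude/latitude bits, 5 bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, n_bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def cell_size(precision):
    """(height in degrees of latitude, width in degrees of longitude) of one cell."""
    total = 5 * precision
    lon_bits = (total + 1) // 2  # longitude gets the extra bit when total is odd
    lat_bits = total // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


def covering_cells(lat, lon, precision):
    """The point's own cell plus its (up to) eight neighbors."""
    dlat, dlon = cell_size(precision)
    cells = set()
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            nlat = lat + i * dlat
            if not -90.0 <= nlat <= 90.0:
                continue  # no neighbors beyond the poles
            nlon = (lon + j * dlon + 180.0) % 360.0 - 180.0  # wrap at the antimeridian
            cells.add(encode(nlat, nlon, precision))
    return cells
```

At precision 6, the points (0.0001, 0.0001) and (-0.0001, -0.0001) start with `s` and `7` respectively, yet both cells appear in `covering_cells(0.0001, 0.0001, 6)`, at the price of nine lookups instead of one.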

Editor's Notes

  1. So first of all, I’ve been head-down coding like 14 hours a day for the past couple weeks. So these slides aren’t as polished as I’d like them to be. Luckily, I’ve been working on exactly this stuff, so it’s all in the front of my mind. And since this is a workshop I can be more interactive. Interrupt with questions any time!
  2. I’ve been interested in GIS for a while, but I’m relatively new to the scene. I’ve done lots of work building scalable websites though. And, honestly, that’s a problem that’s more or less solved.
  3. HTTP has no “session state,” but applications that communicate via HTTP do have to maintain state. Without application state there’d really be no reason to have a web site or web service - if there’s no application state (nothing the web server knows that the client doesn’t) then the algorithm can be completely distributed.
  4. And, the truth is, relational databases are pretty awesome. They have a number of great characteristics. They’re well understood. And they’re robust.
  5. Most RDBMS systems are ACID compliant or at least pretty close. These characteristics allow client code to make simplifying assumptions about the data that is returned from the data store.
  6. RDBMS weaknesses are essentially the inverse of their strengths.
  7. The upshot of all this is that write performance is much poorer than read performance. Writes are also much harder to scale than reads, because they have to happen on an authoritative node, whereas reads can be scaled easily using replication.
  8. Popularized by Eric Brewer at the Symposium on Principles of Distributed Computing (PODC) in 2000 as Brewer’s Conjecture, and later formally proven. You can design a scalable architecture that maintains some of the ACID characteristics of a typical relational data store, but eventually you’ll have to relax some of these constraints. Note that consistency, as defined here, does not mean the same thing as “consistency” in “ACID”: there, it meant that all data constraints were met.
  9. So this is probably something you’ve heard before, and people often just throw it out there without an explanation. But it’s pretty easy to prove to yourself by contradiction, so let’s try that.
  10. Node A and B are a master/master pair, so they replicate data to one another. To meet the ACID requirements both nodes have to write the data durably before one of them can respond successfully to a write.
  11. Abstraction layers: sharding, caching, and client-side replication are basically kludges on top of relational data stores that trade consistency for availability and partition tolerance - but why re-invent the wheel?
  12. Specialized data stores: graph databases, document databases, key-value stores, and various combinations.
  13. Facebook’s Hive project adds SQL on top of Hadoop, but SQL queries are translated into MapReduce jobs that are run across the distributed system.
  15. We should probably call these datastores NoACID instead of Nonrelational or NoSQL.
  16. That’d make much more sense. But I digress.
  17. Eventual consistency is a concept that was popularized by Amazon CTO Werner Vogels. This is a gross simplification, and the approaches data stores take to perform node recovery, rebalancing, and repair are often their most distinguishing characteristics. This is actually why we chose Cassandra - the distributed cluster logic is more robust than any other store I’ve seen.
  18. The outermost layer is the key.
  19. Column families are stored together on disk and are the next layer of the structure.
  20. And finally we have columns. You can have as many of them as you want, and each row and column family can have different columns. It’s schema-less - thus non-relational.
  21. Cassandra partitioners are used to decide which node data should be stored on and which node responds to a query.
  22. If we can get our data to fit a model where we’re simply retrieving items by key from a sorted set, then it’s pretty easy to store and query efficiently. Anything more complicated usually requires heuristics and deep insight into the data set to do at scale.
  23. At the very least we need to index on (latitude, longitude)... we may also need to throw altitude and time into the mix. That’s four dimensions.
  24. Space-filling curve: developed by Peano, refined by Hilbert.
  25. The non-algorithmic approach is a massive undertaking that requires constant attention and involves a large amount of ambiguity.
  26. Ask me questions!