Advanced Querying
    Brian Mitchell (strmpnk)
Query

 finding the right information
scanning and processing data
  traversing data structures
Everything ends up in some sort of data structure.
B-tree
[Diagram: a three-level tree of B-tree nodes]

shallow, append only, compressed, awesome
I/O
all of your data structures are limited by the medium

[Chart: throughput (MB/s) and latency (microseconds) for HDD, SSD, and RAM; vertical axis from 0 to 3000]
Obviously RAM is good.
        Cheap too.
         Not unlimited.
all your data
   working set

Keep it in RAM
"Working Set"

• recently accessed documents

• replicating documents

• compaction files

• index files
Controlling Working Set Size

•   smaller documents

    •   short object keys, less repetition

•   smaller databases

    •   increases locality and minimizes compaction overhead

•   fewer or smaller views

    •   multi-purpose

    •   avoid repeating document data
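
A rough way to see the effect of shorter object keys is to compare the encoded size of the same data under two naming schemes. This is only a sketch: the field names and values below are made up for illustration, and on-disk size in a real database also depends on compression and file layout.

```python
import json

# Hypothetical documents holding the same values under long and short field names.
verbose = {
    "customer_first_name": "Ada",
    "customer_last_name": "Lovelace",
    "customer_signup_date": "2008-10-02",
}
compact = {
    "first": "Ada",
    "last": "Lovelace",
    "joined": "2008-10-02",
}


def encoded_size(doc):
    """Return the size in bytes of the compact JSON encoding of a document."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Bytes saved per document purely from shorter key names.
saved_per_doc = encoded_size(verbose) - encoded_size(compact)
print(encoded_size(verbose), encoded_size(compact), saved_per_doc)
```

The saving is per document, so it is multiplied by the number of documents and again by every view that repeats those fields.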
Primary Index
Your first line of defense against bloat
Function of an Index
  Key -> Value
Function of a Primary Index
In Couchbase
  Key -> Doc
Uniqueness
Semantic Keying
[Diagram: keys A, B, and C in the primary index; a second B arrives under the existing B]
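
The idea of using the document key as a uniqueness constraint can be sketched with a small in-memory stand-in for the primary index. Nothing here is a real client API: `TinyStore`, `Conflict`, and `user_id` are hypothetical names, and the store only mimics the behaviour where creating a document under an existing _id is rejected as a conflict.

```python
class Conflict(Exception):
    """Raised when a document with the same _id already exists."""


class TinyStore:
    """Hypothetical in-memory stand-in for a primary index keyed by _id."""

    def __init__(self):
        self._docs = {}

    def create(self, doc):
        doc_id = doc["_id"]
        if doc_id in self._docs:
            # A second document with the same key is refused.
            raise Conflict(doc_id)
        self._docs[doc_id] = doc


def user_id(email):
    """Semantic key: derive the _id from the field that must be unique."""
    return "user:" + email.strip().lower()


store = TinyStore()
store.create({"_id": user_id("Ada@Example.com"), "name": "Ada"})
try:
    store.create({"_id": user_id(" ada@example.com "), "name": "Someone else"})
except Conflict as err:
    print("already taken:", err)
```

Because the unique field is folded into the key, the constraint is enforced by the one index that is always up to date, with no view to build first.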
One File
Always Fresh, No Extra Cleaning
Secondary Index
        aka. View
• Projects a new sequence

• Custom mapped values

• M-N

• Links back to source document
View Techniques
• Join by collation

• Page by key

• Foreign includes

• Cheap aggregates

• Flexible grouping
Join By Collation
Documents:     Contact A   Contact B   Note for A   Note for B   Note for A
Emitted keys:  A           B           A-note       B-note       A-note
Sorted view:   A           A-note      A-note       B            B-note
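
The ordering on this slide can be imitated in a few lines of Python. This is a sketch under assumptions: the document shapes, the `type` and `contact` fields, and the `[contact_id, 0]` / `[contact_id, 1]` key scheme are illustrative choices, and Python's list ordering only approximates the collation a real view would use.

```python
# Hypothetical documents: two contacts and three notes that point at them.
docs = [
    {"_id": "c-a", "type": "contact", "name": "A"},
    {"_id": "c-b", "type": "contact", "name": "B"},
    {"_id": "n-1", "type": "note", "contact": "c-a", "text": "first note for A"},
    {"_id": "n-2", "type": "note", "contact": "c-b", "text": "note for B"},
    {"_id": "n-3", "type": "note", "contact": "c-a", "text": "second note for A"},
]


def map_doc(doc):
    """Emit keys so that each contact sorts immediately before its own notes."""
    if doc["type"] == "contact":
        yield [doc["_id"], 0], doc["name"]
    elif doc["type"] == "note":
        yield [doc["contact"], 1], doc["text"]


def build_view(documents):
    """Collect (key, doc_id, value) rows and sort them by key, then by doc id."""
    rows = [(key, doc["_id"], value) for doc in documents for key, value in map_doc(doc)]
    return sorted(rows, key=lambda row: (row[0], row[1]))


def query(view, start_key, end_key):
    """Return the rows whose key lies in the inclusive range."""
    return [row for row in view if start_key <= row[0] <= end_key]


view = build_view(docs)
# One range query returns contact A followed by all of its notes.
contact_with_notes = query(view, ["c-a", 0], ["c-a", 1])
for key, doc_id, value in contact_with_notes:
    print(key, doc_id, value)
```

The join never happens at query time. The sort order of the keys places related rows next to each other, so a single contiguous range read returns them together.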
Page By Key
A   B   C   D   E
First page:  limit=2
Next page:   limit=2&start_key=B\ufff0
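
The paging scheme on this slide appears to restart each page from the last key already seen, with a high Unicode character (`\ufff0`) appended so the scan begins just after that key. A minimal sketch over a sorted list of string keys follows; the `page` helper is hypothetical and compares by code point, which is simpler than real view collation.

```python
import bisect

keys = ["A", "B", "C", "D", "E"]  # already sorted, as in a view


def page(sorted_keys, limit, start_key=None):
    """Return up to `limit` keys, starting at the first key >= start_key."""
    start = 0 if start_key is None else bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[start:start + limit]


first = page(keys, 2)
# Start the next page just past the last key of the previous one.
second = page(keys, 2, start_key=first[-1] + "\ufff0")
third = page(keys, 2, start_key=second[-1] + "\ufff0")
print(first, second, third)
```

Each page costs one seek to the start key plus `limit` rows, however deep into the result set the reader has gone, which is the advantage over skipping a growing number of rows.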
Foreign Includes

 A           B
      Emit


 a           a
Foreign Includes

 A                   B
        Reference


_id=A               _id=B
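
One reading of these two slides is the linked-document pattern: a view row emits a small value containing the _id of another document, and a query that asks for included documents gets the referenced document rather than the emitting one. The sketch below imitates that lookup in memory; the document shapes and the `query_include_docs` helper are assumptions for illustration.

```python
# Hypothetical documents: B is a post that refers to author document A.
docs = {
    "A": {"_id": "A", "type": "author", "name": "Ann"},
    "B": {"_id": "B", "type": "post", "title": "Hello", "author": "A"},
}


def map_doc(doc):
    """Emit a reference to the foreign document instead of copying its data."""
    if doc["type"] == "post":
        yield doc["title"], {"_id": doc["author"]}


def query_include_docs(documents):
    """Build view rows and attach the document named by the emitted _id."""
    rows = []
    for doc in documents.values():
        for key, value in map_doc(doc):
            # Fall back to the emitting document when no _id is emitted.
            target = value.get("_id", doc["_id"]) if isinstance(value, dict) else doc["_id"]
            rows.append({"id": doc["_id"], "key": key, "value": value, "doc": documents.get(target)})
    return sorted(rows, key=lambda row: row["key"])


rows = query_include_docs(docs)
print(rows[0]["id"], rows[0]["doc"]["name"])
```

The view stores only a key and a tiny reference, so the author's data is not duplicated into the index and is never stale.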
Cheap Aggregates
• It pays to know your data well

• Reduce values are stored inline with the view b-tree

• Small values take very little space

• Nice built-in reduce functions

• Not just for user visible data
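
Why small reduce values are cheap to keep in the tree can be sketched with a statistics reduction in the spirit of the built-in `_stats` function. The helper names `stats` and `combine` are made up here. The point is that partial results for two subtrees can be merged without revisiting the rows underneath them.

```python
def stats(values):
    """Reduce a list of numbers to a small, fixed-size summary."""
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sumsqr": sum(v * v for v in values),
    }


def combine(a, b):
    """Merge two summaries, as a parent node would merge its children."""
    return {
        "sum": a["sum"] + b["sum"],
        "count": a["count"] + b["count"],
        "min": min(a["min"], b["min"]),
        "max": max(a["max"], b["max"]),
        "sumsqr": a["sumsqr"] + b["sumsqr"],
    }


left = stats([3, 5])          # summary stored for one subtree
right = stats([10, 2, 4])     # summary stored for a sibling subtree
total = combine(left, right)  # answer for the whole range, no rescan needed
print(total)
```

The summary stays the same size no matter how many rows it covers, which is what makes it reasonable to store one alongside every inner node.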
Flexible Grouping
2008-10-02   2008-08-17   2009-02-12




              Emit


[2008,10]     [2008, 8]    [2009, 2]
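
Emitting an array key such as [year, month] lets one view answer questions at several granularities, by grouping on a prefix of the key. The sketch below imitates that with a counter. `count_by_group_level` is a hypothetical helper whose `level` argument plays the role of a group level on a reduced view.

```python
from collections import Counter

dates = ["2008-10-02", "2008-08-17", "2009-02-12"]


def map_date(date):
    """Turn an ISO date string into a [year, month] key."""
    year, month, _day = date.split("-")
    return [int(year), int(month)]


def count_by_group_level(keys, level):
    """Count rows grouped by the first `level` elements of each key."""
    counts = Counter(tuple(key[:level]) for key in keys)
    return sorted(counts.items())


keys = [map_date(d) for d in dates]
by_year = count_by_group_level(keys, 1)   # one row per year
by_month = count_by_group_level(keys, 2)  # one row per year and month
print(by_year)
print(by_month)
```

Adding the day as a third key element would allow daily grouping from the same view without a second index.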
Traditional CouchDB
[Figure: one database split 20% / 10% / 70%, followed by the same split repeated across many smaller databases]
Clustering
Single Key
Query
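
A single-key lookup goes to one node, but a query on a cluster has to gather sorted rows from every node and merge them, and a node that does not answer leaves the result partial. A minimal sketch of that merge follows, assuming each node returns its rows already sorted and that `None` stands for a node that failed to respond. `merged_query` is a hypothetical helper, not part of any client library.

```python
import heapq


def merged_query(shard_results, limit):
    """Merge sorted rows from each shard and flag whether any shard was missing."""
    available = [rows for rows in shard_results if rows is not None]
    # heapq.merge keeps the overall order given individually sorted inputs.
    merged = list(heapq.merge(*available))[:limit]
    partial = len(available) < len(shard_results)
    return merged, partial


shards = [
    [("a", 1), ("d", 4)],  # rows from the first node
    None,                  # second node did not respond
    [("b", 2), ("c", 3)],  # rows from the third node
]
rows, partial = merged_query(shards, limit=3)
print(rows, partial)
```

The application has to decide what a partial answer means for it: retry, show what it has, or fail the request.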
Alternatives
Manual Indexing
• Store an index as a document

• Good properties for mostly static indexing

• Cluster friendly

• Create custom constraints (uniqueness)

• Snapshot of a slow query for speed
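
Storing an index as a document can be as simple as keeping a mapping from a term to the ids of the documents that carry it, maintained by the application on every write. The sketch below shows one possible shape. The `entries` field, the `index:tags` id, and both helper functions are assumptions for illustration only.

```python
def add_to_index(index_doc, term, doc_id):
    """Record that doc_id carries term, keeping each id list sorted and unique."""
    ids = index_doc.setdefault("entries", {}).setdefault(term, [])
    if doc_id not in ids:
        ids.append(doc_id)
        ids.sort()
    return index_doc


def lookup(index_doc, term):
    """Return the ids stored for term; a single key fetch finds the index."""
    return index_doc.get("entries", {}).get(term, [])


# Hypothetical index document, fetched and saved like any other document.
tag_index = {"_id": "index:tags"}
add_to_index(tag_index, "erlang", "post-7")
add_to_index(tag_index, "erlang", "post-2")
add_to_index(tag_index, "couchdb", "post-2")
add_to_index(tag_index, "erlang", "post-2")  # repeated write, no duplicate entry
print(lookup(tag_index, "erlang"))
```

Reading the index is one key lookup, which is why this suits a cluster. The cost is that every write to the source data must also update the index document, so it fits data that changes rarely.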
GeoCouch
• R-tree based

• First-class Erlang

  • improved with view engine refactor

• Can be abused for multi-dimensional queries

  • more than just geo-data
CouchDB Lucene

• Based on CouchDB Externals

• Limited to Couchbase Single Server

• Faceted queries

• Full-text indexing
Hybrid
• Application managed

• Allow a standalone service to work with a Couchbase cluster

  • e.g. Solr, Redis, PostgreSQL

• Complex concurrency

• More moving parts
Fin
twitter: @strmpnk
 email: b@p2p.io

Editor's Notes

  1. This presentation shares some tips on how I've gotten CouchDB to perform well for me in the past, as well as things to look forward to in the future. "Advanced" is kind of a distraction. CouchDB is simple, so what you see here shouldn't be that different from basic queries.
  2-4. Queries always end up being about data. All of our data is inside special purpose data structures. Our control of the query depends on understanding and controlling these structures.
  5. Everything. Even when it's calculated live, in memory. Not all of these are created equal however. Fortunately CouchDB keeps it simple and presents one general structure for most use cases.
  6. I won't cover B-trees in depth here. Wikipedia is a good start if you're wondering. Keep in mind that CouchDB has a specific incarnation that gives us some special properties.
  7. Cornerstone to all databases, I/O will decide if your ideas fly or fail. Feeding your intense, networked, interactive software of today requires a serious study of I/O characteristics.
  8. Throughput and latency tend to be the measurements of choice. Notice how big of a jump RAM is. Imagine how many CPU cycles one HDD seek is.
  9. So let's keep RAM in mind. Couchbase does make good use of RAM in their clustered product for documents but it's not available for queries.
  10. Usually enough, but this should actually be measured. How? Well, let's look at what I call a "working set".
  11-12. All of your data might exist somewhere on disk. That doesn't mean it can't have those disk pages cached in RAM. Keep it there. Try to keep data clustered on disk so you have better page cache and buffer cache efficiency.
  13. What a working set is.
  14. Controlling the working set by tuning your database design. This talk will focus on views for queries, but all of these points matter. Measure, because it had better add up or your performance will be painfully slow.
  15. I always like to start talking about indexing by declaring that it's already there. We already have an automatic index. I call this the primary index, but that's just me.
  16. Key-value anyone? How do we make key-based access fast? How do we accelerate random access vs sequential access? It's all about data layout. It equates to an index.
  17. Key-value applies to CouchDB.
  18. A nice property of this key index is that it provides a method of uniqueness. I hear this question all the time: "How do I constrain fields of a document to a unique value?" Short answer is _id.
  19. This leads beautifully to revision based concurrency. Semantic keying is a good idea, even if it's not in your primary index, but why wait to build a view?
  20. Finally, my favorite part of the primary document tree is that it's just one file. No duplication of information, so your overhead is nice and small. It's always fresh too, unlike views.
  22. These are just a few ideas I've made up names for.
  29. It's pretty obvious how this key design helps turn joins into a range query.
  35. _rev can also be passed, but be careful as revisions can be pruned during compaction.
  36. They don't cost much so it pays to have default reduce functions. It's all about knowing your data better.
  40. When you have one big database, you pay all costs all at once. Compaction costs, for example, can be huge.
  41. When you have many smaller databases, costs can be paid for incrementally. Compaction will take much less overhead for example.
  43-45. Key access is fast. Simple.
  49. Merging queries means you might have cases with partial results.
  51. It's still an option, especially if you need certain performance on a cluster.
  52. Available as part of Couchbase Single/Mobile.
  53. CouchDB, Couchbase Single only.
  54. Good way to extend an existing cluster. Up to the application layer.