Querying Nested JSON Data Using N1QL and Couchbase
TriNUG Data SIG
6/6/2018
Who is this guy?
• Brant Burnett - @btburnett3
• Systems Architect at CenterEdge Software
• .NET since 1.0, SQL Server since 7.0
• MCSD, MCDBA
• Experience from desktop apps to large-scale cloud services
NoSQL Credentials
• Couchbase user since 2012 (v1.8)
• Couchbase Community Expert
• Open source contributions:
• Couchbase .NET SDK
• Couchbase.Extensions for .NET Core
• Couchbase LINQ provider (Linq2Couchbase)
• CouchbaseFakeIt
• couchbase-index-manager
Content Attributions
• Matthew Groves
Couchbase Developer Advocate
@mgroves
crosscuttingconcerns.com
What is Couchbase
• NoSQL document database
• Get and set documents by key
• Imagine a giant folder full of JSON files
• If you know the filename, you can get or update the content
• Additional features:
• Query using N1QL (SQL-based)
• Map-Reduce Views
• Full Text Search
• Analytics (Preview in 5.5)
• Eventing (5.5)
• Couchbase is not CouchDB
Why Couchbase
• Scalability
• Availability
• Performance
• Agility
Agenda
Introduction to N1QL
Working with JSON Types
Joins In N1QL
Indexing in Couchbase
Query Optimization
Introduction to N1QL
Pronounced “nickel”
What’s a Bucket?
• Large collection of JSON documents
• Every document may have a different schema
• Documents are accessed by a string called the key
Table: Customer
CustomerID | Name | DOB
CBL2015 | Jane Smith | 1990-01-30
{
"Name": "Jane Smith",
"DOB": "1990-01-30",
"type": "customer"
}
Document Key: customer-CBL2015
So how do I query data from a bucket?
{
"Name": "John Smith",
"DOB": "1990-06-29",
"type": "customer"
}
Document Key: customer-CBL2016
SELECT Name, DOB FROM Bucket
WHERE type = 'customer' AND Name LIKE '%Smith'
ORDER BY Name
[{
"Name": "Jane Smith",
"DOB": "1990-01-30"
},
{
"Name": "John Smith",
"DOB": "1990-06-29"
}]
{
"Name": "Jane Smith",
"DOB": "1990-01-30",
"type": "customer"
}
Document Key: customer-CBL2015
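For readers coming from application code, the SELECT above behaves roughly like filtering, projecting, and sorting a collection of JSON objects. The following is a minimal Python sketch of those semantics, not a Couchbase API: the `bucket` dictionary and the `select_customers` helper are hypothetical stand-ins invented for illustration.

```python
# Hypothetical in-memory stand-in for a bucket: document key -> JSON document.
bucket = {
    "customer-CBL2015": {"Name": "Jane Smith", "DOB": "1990-01-30", "type": "customer"},
    "customer-CBL2016": {"Name": "John Smith", "DOB": "1990-06-29", "type": "customer"},
}


def select_customers(docs):
    """Mimic: SELECT Name, DOB ... WHERE type = 'customer' AND Name LIKE '%Smith' ORDER BY Name."""
    matches = [
        {"Name": d["Name"], "DOB": d["DOB"]}        # projection: Name, DOB
        for d in docs.values()
        if d.get("type") == "customer"              # type = 'customer'
        and d.get("Name", "").endswith("Smith")     # Name LIKE '%Smith'
    ]
    return sorted(matches, key=lambda d: d["Name"])  # ORDER BY Name


rows = select_customers(bucket)
```

Note that the document key is not part of the projected rows, which matches the result shown on the slide.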
What other SQL features are supported?
• Aggregation (MIN, MAX, SUM, AVG, COUNT, etc.)
• GROUP BY/HAVING
• OFFSET/LIMIT
• Subqueries
• UNION/INTERSECT/EXCEPT
• Joins (more details to come…)
• UPDATE/INSERT/DELETE/UPSERT
Accessing Nested Objects
Key: airport_3484
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
SELECT *
FROM `travel-sample`
WHERE type = 'airport' AND geo.alt < 1000
SELECT *
FROM `travel-sample`
WHERE type = 'route'
AND schedule[0].day = 1
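The dotted paths and array subscripts in the two queries above walk into nested objects and arrays. A rough Python sketch of that navigation follows; `get_path` and the `MISSING` sentinel are hypothetical helpers for this illustration, and the sample documents are trimmed copies of the travel-sample documents shown in these slides.

```python
MISSING = object()  # sentinel for "attribute not present" (an assumption of this sketch)


def get_path(doc, *steps):
    """Follow attribute names (str) and array positions (int); return MISSING if a step fails."""
    current = doc
    for step in steps:
        if isinstance(step, str) and isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(step, int) and isinstance(current, list) and 0 <= step < len(current):
            current = current[step]
        else:
            return MISSING
    return current


airport = {"type": "airport", "faa": "LAX",
           "geo": {"alt": 126, "lat": 33.942536, "lon": -118.408075}}
route = {"type": "route",
         "schedule": [{"day": 0, "flight": "AF198", "utc": "10:13:00"}]}

alt = get_path(airport, "geo", "alt")               # attribute inside a nested object
first_day = get_path(route, "schedule", 0, "day")   # first array element, then its attribute
no_such = get_path(airport, "schedule", 0, "day")   # airports carry no schedule array
```

A path that cannot be resolved yields the sentinel rather than raising, so a comparison against it simply fails to match.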
O backtick, backtick! Wherefore art thou a backtick?
• ANSI SQL delimits identifiers with double quotes
• SELECT * FROM "table-name"
• T-SQL also delimits identifiers with square brackets
• SELECT * FROM [table-name]
• Both of these are used in JSON!
• {"array": ["string1", "string2"]}
• So, N1QL uses the backtick instead
• SELECT * FROM `bucket-name`
Working with JSON Types
Strings
1. Supported by JSON; collation is always case sensitive
2. Literals are delimited with either double or single quotes
x = 'my string here'
x = "my string here"
3. Various supporting functions
• String concatenation (||)
• LENGTH
• LOWER
• CONTAINS
• TRIM, etc…
Numbers
1. Supported by JSON
2. Literals are included without delimiters
x = 123456.05124
3. Various supporting functions
• Arithmetic operators
• ABS
• CEIL
• SQRT
• TRUNC, etc…
Booleans
1. Supported by JSON
2. Literals are true and false
x = true
3. Various supporting operators
• NOT
• AND
• OR
Arrays
1. Supported by JSON
2. Literals are comma delimited and surrounded by square brackets
[1, 2, 3, "a"]
3. Various supporting functions
• Subqueries
• ARRAY_CONTAINS
• ARRAY_AVG
• ARRAY_INSERT
• ARRAY_LENGTH, etc…
Objects
1. Supported by JSON
2. Literals are comma delimited key/value pairs surrounded by curly braces
{"key": "value"}
3. Various supporting functions
• OBJECT_NAMES
• OBJECT_PAIRS
• OBJECT_VALUES, etc…
Nulls
1. Supported by JSON
2. Literal is the word null, no delimiters
{"key": null}
3. Various supporting operators and functions
• IS NULL
• IS NOT NULL
• IFNULL, etc…
Missing attributes
1. Supported by JSON; can’t be explicitly declared; similar to undefined in JavaScript
2. No literal, simply don’t include an attribute in an object
{}
3. Various supporting operators and functions
• IS MISSING
• IS NOT MISSING
• IFMISSING
• IFMISSINGORNULL, etc…
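The difference between a null attribute and a missing one is easy to blur, so here is a small Python sketch of the distinction. It is only an analogy: Python's `None` stands in for JSON null, the `MISSING` sentinel and the helper names are hypothetical, and returning `None` when nothing qualifies in `if_missing_or_null` is an assumption of this sketch.

```python
MISSING = object()  # sentinel meaning "attribute absent"; None plays the role of JSON null


def attr(doc, name):
    """Read an attribute, yielding MISSING instead of raising when it is absent."""
    return doc.get(name, MISSING)


def is_null(value):
    return value is None          # analogous to IS NULL


def is_missing(value):
    return value is MISSING       # analogous to IS MISSING


def if_missing_or_null(*values):
    """Return the first value that is neither MISSING nor null (sketch of IFMISSINGORNULL)."""
    for value in values:
        if value is not MISSING and value is not None:
            return value
    return None


with_null = {"key": None}   # {"key": null}
empty = {}                  # {}
```

With these definitions, `with_null` has a key whose value is null, while `empty` has no such key at all, and the two predicates never agree on the same value.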
Date/times
1. Not officially supported by JSON; can be stored using other data types, usually either an ISO 8601 string or a number of milliseconds since the Unix epoch
2. Literal depends on the data type
"2018-04-06T19:26:29.000Z"
1528140389000
3. Various supporting functions
• STR_TO_MILLIS
• CLOCK_MILLIS, CLOCK_STR
• DATE_PART_STR, DATE_PART_MILLIS
• DATE_DIFF_STR, DATE_DIFF_MILLIS
• etc…
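To make the two storage conventions concrete, the sketch below converts between an ISO 8601 UTC string and epoch milliseconds in plain Python. The helper names `str_to_millis` and `millis_to_str` are chosen for this illustration and only handle the exact string layout used on the slide; they are not the N1QL functions themselves. Incidentally, the two literals on the slide do not denote the same instant: the string is 6 April 2018, while the number corresponds to 4 June 2018.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def str_to_millis(text):
    """Parse a UTC string like 2018-04-06T19:26:29.000Z into milliseconds since the Unix epoch."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def millis_to_str(millis):
    """Format milliseconds since the Unix epoch as a UTC string with millisecond precision."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


from_string = str_to_millis("2018-04-06T19:26:29.000Z")   # 1523042789000
from_number = millis_to_str(1528140389000)                # "2018-06-04T19:26:29.000Z"
```

Strings in this layout sort chronologically as text, whereas the numeric form makes arithmetic such as differences cheaper; which one to store is a modeling choice.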
Joins In N1QL
Key: route_10000
{
"airline": "AF",
"airlineid": "airline_137",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
}
],
"sourceairport": "TLV",
"stops": 0,
"type": "route"
}
Referenced 1:N Relationship
Key: airline_137
{
"callsign": "AIRFRANS",
"country": "France",
"iata": "AF",
"icao": "AFR",
"id": 137,
"name": "Air France",
"type": "airline"
}
Joining by Primary Key
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline
ON route.airlineid = META(airline).id
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name
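Because `route.airlineid` already holds the airline's document key, this join amounts to a direct key lookup per route. A minimal Python sketch of that idea follows, assuming a hypothetical in-memory `docs` dictionary keyed by document key; it illustrates the semantics only and is not how the query service is implemented.

```python
# Hypothetical in-memory bucket, trimmed from the documents shown on the slides.
docs = {
    "route_10000": {"type": "route", "airlineid": "airline_137",
                    "sourceairport": "TLV", "destinationairport": "MRS"},
    "airline_137": {"type": "airline", "name": "Air France", "iata": "AF"},
}


def join_by_key(bucket):
    """For each route, fetch the airline whose document key equals route.airlineid."""
    rows = []
    for doc in bucket.values():
        if doc.get("type") != "route":               # WHERE route.type = 'route'
            continue
        airline = bucket.get(doc.get("airlineid"))   # direct key lookup, no scan of airlines
        if airline is not None:                      # INNER JOIN drops routes with no match
            rows.append({
                "sourceairport": doc["sourceairport"],
                "destinationairport": doc["destinationairport"],
                "name": airline["name"],
            })
    return sorted(rows, key=lambda r: (r["sourceairport"], r["destinationairport"], r["name"]))


joined = join_by_key(docs)
```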
Joining by Attributes
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline
ON route.airline = airline.iata AND airline.type = 'airline'
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name
Key: route_10000
{
"airline": "AF",
"airlineid": "airline_137",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
}
],
"sourceairport": "TLV",
"stops": 0,
"type": "route"
}
Embedded 1:N Relationship
Flattening Embedded Lists
SELECT route.sourceairport, route.destinationairport, schedule.utc
FROM `travel-sample` AS route
UNNEST route.schedule AS schedule
WHERE route.type = 'route' AND schedule.day = 0
ORDER BY route.sourceairport, route.destinationairport, schedule.utc
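UNNEST produces one output row per element of the embedded array, paired with its parent document. The following Python sketch mirrors the query above on the `route_10000` document; the `unnest_schedule` helper is hypothetical and exists only to illustrate the flattening.

```python
route_10000 = {
    "type": "route", "sourceairport": "TLV", "destinationairport": "MRS",
    "schedule": [
        {"day": 0, "flight": "AF198", "utc": "10:13:00"},
        {"day": 0, "flight": "AF547", "utc": "19:14:00"},
    ],
}


def unnest_schedule(routes, day):
    """One row per (route, schedule entry) pair, filtered by day and sorted like the query."""
    rows = [
        {"sourceairport": r["sourceairport"],
         "destinationairport": r["destinationairport"],
         "utc": s["utc"]}
        for r in routes if r.get("type") == "route"   # route.type = 'route'
        for s in r.get("schedule", [])                # UNNEST route.schedule AS schedule
        if s.get("day") == day                        # schedule.day = 0
    ]
    return sorted(rows, key=lambda r: (r["sourceairport"], r["destinationairport"], r["utc"]))


flat = unnest_schedule([route_10000], day=0)
```

A single route with two matching schedule entries therefore yields two rows, each repeating the route's airports.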
My data’s not flat, why are my queries?
Key: route_50490
{
"airline": "SQ",
"airlineid": "airline_4435",
"destinationairport": "ORD",
"distance": 2802.1171926467396,
"equipment": "320",
"id": 50490,
"schedule": [{
"day": 0,
"flight": "SQ279",
"utc": "15:13:00"
}, {
"day": 0,
"flight": "SQ835",
"utc": "21:10:00"
}],
"sourceairport": "LAX",
"stops": 0,
"type": "route"
}
Referenced 1:N Relationship
Key: airport_3484
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
Nesting (a.k.a. LINQ GroupJoin)
SELECT
airport.*,
(SELECT RAW r2.destinationairport FROM routes AS r2) AS destinations
FROM `travel-sample` AS airport
INNER NEST `travel-sample` AS routes
ON airport.faa = routes.sourceairport AND routes.type = 'route'
WHERE airport.type = 'airport'
AND airport.airportname LIKE 'Los Angeles%'
Nesting (a.k.a. LINQ GroupJoin)
{
"airportname": "Los Angeles Intl",
"city": "Los Angeles",
"country": "United States",
"destinations": ["PHX", "SEA", "MCO", "ATL", "SYD", "YYZ", "LIM", "LHR", "IND", "CLE", "..."],
"faa": "LAX",
"geo": {
"alt": 126,
"lat": 33.942536,
"lon": -118.408075
},
"icao": "KLAX",
"id": 3484,
"type": "airport",
"tz": "America/Los_Angeles"
}
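NEST is the reverse of flattening: matching right-hand documents are collected into an array on each left-hand document. Here is a rough Python sketch of the query and result above. The `nest_destinations` helper is hypothetical, and the second LAX route and the heliport document are made-up sample data added so the grouping and the inner-join behavior are visible.

```python
airports = [
    {"type": "airport", "faa": "LAX", "airportname": "Los Angeles Intl"},
    # Hypothetical airport with no routes, to show that INNER NEST drops it.
    {"type": "airport", "faa": "ZZZ", "airportname": "Los Angeles Heliport (made up)"},
]
routes = [
    {"type": "route", "sourceairport": "LAX", "destinationairport": "ORD"},
    {"type": "route", "sourceairport": "LAX", "destinationairport": "PHX"},  # made-up route
    {"type": "route", "sourceairport": "TLV", "destinationairport": "MRS"},
]


def nest_destinations(airport_docs, route_docs):
    """Attach to each matching airport an array of destination codes from its routes."""
    results = []
    for airport in airport_docs:
        if airport.get("type") != "airport":
            continue
        if not airport.get("airportname", "").startswith("Los Angeles"):
            continue
        matched = [r for r in route_docs
                   if r.get("type") == "route"
                   and r.get("sourceairport") == airport.get("faa")]
        if not matched:          # INNER NEST: airports without any route are skipped
            continue
        results.append({**airport,
                        "destinations": [r["destinationairport"] for r in matched]})
    return results


nested = nest_destinations(airports, routes)
```

The output keeps one object per airport, which is why the slide compares it to LINQ's GroupJoin rather than to a flat join.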
Indexing in Couchbase
Global Secondary Indexes
a.k.a. GSI
The Primary Index
CREATE PRIMARY INDEX ON bucket
SELECT * FROM bucket
Single Attribute Index
CREATE INDEX docsByName
ON bucket (name)
SELECT * FROM bucket
WHERE name LIKE 'A%'
SELECT * FROM bucket
WHERE name >= 'A' AND name < 'N'
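Both queries above can be answered from a sorted index on `name`, because a prefix pattern such as `'A%'` corresponds to a contiguous range of values (here, from `'A'` up to but excluding `'B'`). The sketch below models that with a sorted Python list and `bisect`; the index entries and the `range_scan` helper are hypothetical, and the actual index storage in Couchbase is not shown in the slides.

```python
import bisect

# Hypothetical index entries kept in sorted order: (indexed name, document key).
index = sorted([
    ("Adams", "doc-1"), ("Alvarez", "doc-2"), ("Burnett", "doc-3"),
    ("Groves", "doc-4"), ("Nguyen", "doc-5"),
])


def range_scan(entries, low, high):
    """Return document keys whose indexed value v satisfies low <= v < high."""
    start = bisect.bisect_left(entries, (low,))
    end = bisect.bisect_left(entries, (high,))
    return [key for _, key in entries[start:end]]


like_a = range_scan(index, "A", "B")   # name LIKE 'A%'
a_to_m = range_scan(index, "A", "N")   # name >= 'A' AND name < 'N'
```

A pattern with a leading wildcard, such as `'%Smith'`, has no such contiguous range, which is one reason it cannot narrow the scan in the same way.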
Multiple Attribute Index
CREATE INDEX docsByNames ON bucket
(lastName, firstName)
SELECT * FROM bucket
WHERE lastName LIKE 'A%'
SELECT * FROM bucket
WHERE lastName = 'Burnett'
AND firstName LIKE 'B%'
Expression Index
CREATE INDEX docsByName ON bucket
(LOWER(lastName), LOWER(firstName))
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'
Filtered Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'
AND type = 'customer'
Array Index
CREATE INDEX custsByNickName ON bucket
(DISTINCT ARRAY p FOR p IN nickNames END)
WHERE type = 'customer'
SELECT * FROM bucket
WHERE ANY p IN nickNames SATISFIES p = 'Buzz' END
AND type = 'customer'
CREATE INDEX custsByNickName ON bucket
(DISTINCT ARRAY LOWER(p) FOR p IN nickNames END)
WHERE type = 'customer'
SELECT * FROM bucket
WHERE ANY p IN nickNames SATISFIES LOWER(p) = 'buzz' END
AND type = 'customer'
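An array index stores one entry per distinct array element, so an `ANY ... SATISFIES` predicate can be answered by looking up a single element value. The following Python sketch models the two index definitions above as a mapping from element to document keys; `build_array_index` and the sample customers are hypothetical.

```python
from collections import defaultdict

customers = {
    "customer-1": {"type": "customer", "nickNames": ["Buzz", "BB"]},
    "customer-2": {"type": "customer", "nickNames": ["Buzz", "Buzz"]},
    "vendor-1": {"type": "vendor", "nickNames": ["Buzz"]},
}


def build_array_index(bucket, transform=lambda p: p):
    """Map each distinct (optionally transformed) nickname to the keys of customers having it."""
    index = defaultdict(set)
    for key, doc in bucket.items():
        if doc.get("type") != "customer":                 # the index's WHERE clause
            continue
        for element in {transform(p) for p in doc.get("nickNames", [])}:  # DISTINCT ARRAY ... END
            index[element].add(key)
    return index


by_nick = build_array_index(customers)                             # ... ARRAY p FOR p IN nickNames
by_lower_nick = build_array_index(customers, transform=str.lower)  # ... ARRAY LOWER(p) FOR p IN ...
```

The lookup value has to be expressed the same way as the indexed expression: the lowercased index is keyed by `'buzz'`, not `'Buzz'`, which mirrors why the second query applies `LOWER(p)` in its predicate.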
Index Architecture
[Diagram: Data Nodes stream changes over DCP to Index Nodes (Node A, Node B), which host Index 1 through Index 4 plus index replicas.]
Deferring Index Build
CREATE INDEX docsByName
ON bucket (name)
WITH {"defer_build": true}
CREATE INDEX docsByNames
ON bucket (lastName, firstName)
WITH {"defer_build": true}
BUILD INDEX ON bucket
(docsByName, docsByNames)
Replicated Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE type = 'customer'
WITH {"num_replica": 1}
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND LOWER(firstName) LIKE 'b%'
AND type = 'customer'
Partitioned Index
CREATE INDEX custsByName ON bucket
(LOWER(lastName), LOWER(firstName))
WHERE type = 'customer'
PARTITION BY hash(tenantId)
WITH {"num_replica": 1}
SELECT * FROM bucket
WHERE LOWER(lastName) LIKE 'a%'
AND type = 'customer'
SELECT * FROM bucket
WHERE LOWER(lastName) = 'burnett'
AND tenantId = 123456
AND type = 'customer'
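The second query above pins `tenantId` to a single value, so in principle only the partition holding that hash value needs to be scanned, while the first query has to consult every partition. The Python sketch below illustrates that reasoning under stated assumptions: the partition count, the use of `zlib.crc32` as the hash, and the helper names are all made up for illustration and do not reflect Couchbase's internal hashing.

```python
import zlib

NUM_PARTITIONS = 8  # arbitrary value chosen for this sketch


def partition_for(tenant_id):
    """Stable stand-in hash mapping a tenantId to a partition number."""
    return zlib.crc32(str(tenant_id).encode("utf-8")) % NUM_PARTITIONS


def partitions_to_scan(equality_predicates):
    """With an equality predicate on tenantId one partition suffices; otherwise scan them all."""
    if "tenantId" in equality_predicates:
        return [partition_for(equality_predicates["tenantId"])]
    return list(range(NUM_PARTITIONS))


without_tenant = partitions_to_scan({"type": "customer"})
with_tenant = partitions_to_scan({"type": "customer", "tenantId": 123456})
```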
Query Optimization
Index Selection Criteria
• All predicates on the index must be included in the query
• The first index expression must be in the predicate
• Chooses the index with the most matching expressions
• If more than one option, chooses one at random for load balancing
• Does not use statistics for optimization (yet…)
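Query Process (a simplified subset)
[Diagram: a Query Node, an Index Node, and Data Nodes. Step labels recovered from the slide: Incoming Query; Query Plan; Filter, Sort, Agg, etc.; Query Result. The intermediate steps are not recoverable from the extracted text.]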
Live Demo!
This should be interesting…
Nested Loop vs Hash Join in C#
Nested Loop Join
IEnumerable<RouteAirlines> Join(
IList<Route> routes, IList<Airline> airlines)
{
foreach (var route in routes)
{
var routeAirlines = new RouteAirlines
{
Route = route,
Airlines = new List<Airline>()
};
foreach (var airline in airlines)
{
if (airline.Iata == route.Airline) {
routeAirlines.Airlines.Add(airline);
}
}
yield return routeAirlines;
}
}
Hash Join
IEnumerable<RouteAirlines> Join(
IList<Route> routes, IList<Airline> airlines)
{
var hashTable = airlines.ToLookup(p => p.Iata);
foreach (var route in routes)
{
var routeAirlines = new RouteAirlines
{
Route = route,
Airlines = hashTable[route.Airline].ToList()
};
yield return routeAirlines;
}
}
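The practical difference between the two C# versions is the amount of work done per route. The Python sketch below re-expresses both joins and counts their basic operations; it is an illustrative comparison with made-up sample data, and the counters are a simplification rather than a benchmark.

```python
from collections import defaultdict


def nested_loop_join(routes, airlines):
    """Compare every route with every airline: len(routes) * len(airlines) comparisons."""
    comparisons = 0
    result = []
    for route in routes:
        matches = []
        for airline in airlines:
            comparisons += 1
            if airline["iata"] == route["airline"]:
                matches.append(airline)
        result.append((route, matches))
    return result, comparisons


def hash_join(routes, airlines):
    """Build a lookup table once, then probe it once per route."""
    table = defaultdict(list)
    for airline in airlines:                     # build phase: one insert per airline
        table[airline["iata"]].append(airline)
    result = [(route, table.get(route["airline"], [])) for route in routes]  # probe phase
    return result, len(airlines) + len(routes)


sample_airlines = [{"iata": "AF", "name": "Air France"}, {"iata": "SQ", "name": "Singapore Airlines"}]
sample_routes = [{"airline": "AF"}, {"airline": "SQ"}, {"airline": "AF"}]

loop_result, loop_ops = nested_loop_join(sample_routes, sample_airlines)
hash_result, hash_ops = hash_join(sample_routes, sample_airlines)
```

Both functions return the same pairs; the nested loop's cost grows with the product of the two input sizes, while the hash join's grows with their sum, at the price of holding the lookup table in memory.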
N1QL Hash Join
SELECT route.sourceairport, route.destinationairport, airline.name
FROM `travel-sample` AS route
INNER JOIN `travel-sample` AS airline USE HASH(build)
ON route.airline = airline.iata AND airline.type = 'airline'
WHERE route.type = 'route'
ORDER BY route.sourceairport, route.destinationairport, airline.name
Key Optimization Takeaways
1. Make sure fetch is no larger than necessary
2. Design covering indexes where possible
3. Watch out for pagination
4. Consider USE HASH where applicable
5. Keep joins to a minimum
Thanks for Coming!
Questions?
Best Angular 17 Classroom & Online training - Naresh IT
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 

Querying Nested JSON Data Using N1QL and Couchbase

  • 1. Querying Nested JSON Data Using N1QL and Couchbase TriNUG Data SIG 6/6/2018
  • 2. Who is this guy? • Brant Burnett - @btburnett3 • Systems Architect at CenterEdge Software • .NET since 1.0, SQL Server since 7.0 • MCSD, MCDBA • Experience from desktop apps to large scale cloud services
  • 3. NoSQL Credentials • Couchbase user since 2012 (v1.8) • Couchbase Community Expert • Open source contributions: • Couchbase .NET SDK • Couchbase.Extensions for .NET Core • Couchbase LINQ provider (Linq2Couchbase) • CouchbaseFakeIt • couchbase-index-manager
  • 4. Content Attributions • Matthew Groves Couchbase Developer Advocate @mgroves crosscuttingconcerns.com
  • 5. What is Couchbase • NoSQL document database • Get and set documents by key • Imagine a giant folder full of JSON files • If you know the filename, you can get or update the content • Additional features: • Query using N1QL (SQL-based) • Map-Reduce Views • Full Text Search • Analytics (Preview in 5.5) • Eventing (5.5) • Couchbase is not CouchDB
  • 6. Why Couchbase • Scalability • Availability • Performance • Agility
  • 7. Agenda Introduction to N1QL Working with JSON Types Joins In N1QL Indexing in Couchbase Query Optimization
  • 8. Introduction to N1QL Pronounced “nickel”
  • 9. What’s a Bucket? • Large collection of JSON documents • Every document may have a different schema • Documents are accessed by a string called the key
  • 10. CustomerID Name DOB CBL2015 Jane Smith 1990-01-30 Table: Customer { "Name": "Jane Smith", "DOB": "1990-01-30", "type": "customer" } Document Key: customer-CBL2015
  • 11. So how do I query data from a bucket?
  • 12. { "Name": "John Smith", "DOB": "1990-06-29", "type": "customer" } Document Key: customer-CBL2016 SELECT Name, DOB FROM Bucket WHERE type = 'customer' AND Name LIKE '%Smith' ORDER BY Name [{ "Name": "Jane Smith", "DOB": "1990-01-30" }, { "Name": "John Smith", "DOB": "1990-06-29" }] { "Name": "Jane Smith", "DOB": "1990-01-30", "type": "customer" } Document Key: customer-CBL2015
  • 13. What other SQL features are supported? • Aggregation (MIN, MAX, SUM, AVG, COUNT, etc) • GROUP BY/HAVING • OFFSET/LIMIT • Subqueries • UNION/INTERSECT/EXCEPT • Joins (more details to come…) • UPDATE/INSERT/DELETE/UPSERT
  • 14. Accessing Nested Objects Key: airport_3484 { "airportname": "Los Angeles Intl", "city": "Los Angeles", "country": "United States", "faa": "LAX", "geo": { "alt": 126, "lat": 33.942536, "lon": -118.408075 }, "icao": "KLAX", "id": 3484, "type": "airport", "tz": "America/Los_Angeles" } SELECT * FROM `travel-sample` WHERE type = 'airport' AND geo.alt < 1000 SELECT * FROM `travel-sample` WHERE type = 'route' AND schedule[0].day = 1
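As a rough illustration of what dot and square-bracket paths such as `geo.alt` and `schedule[0].day` resolve to, here is a small Python sketch. The `get_path` helper and the `MISSING` sentinel are hypothetical names invented for this example; they only mimic the idea and are not part of N1QL or any Couchbase SDK.

```python
import re
from typing import Any

# Hypothetical stand-in for a path that does not exist in the document.
MISSING = object()

def get_path(doc: Any, path: str) -> Any:
    """Resolve a path like 'geo.alt' or 'schedule[0].day' in nested dicts/lists."""
    current = doc
    # Each token is either an attribute name or a numeric [index].
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        if name:
            if not isinstance(current, dict) or name not in current:
                return MISSING
            current = current[name]
        else:
            i = int(index)
            if not isinstance(current, list) or i >= len(current):
                return MISSING
            current = current[i]
    return current

# Trimmed copies of the airport and route documents shown in the slides.
airport = {"faa": "LAX", "geo": {"alt": 126, "lat": 33.942536, "lon": -118.408075}, "type": "airport"}
route = {"schedule": [{"day": 0, "flight": "AF198", "utc": "10:13:00"}], "type": "route"}

low_altitude = get_path(airport, "geo.alt") < 1000
first_day = get_path(route, "schedule[0].day")
```

A path that runs off the document (for example `geo.depth`) returns the sentinel instead of raising, which loosely mirrors how a query simply does not match such documents.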
  • 15. O backtick, backtick! Wherefore art thou a backtick? • ANSI SQL delimits identifiers with double quotes • SELECT * FROM "table-name" • T-SQL also delimits identifiers with square brackets • SELECT * FROM [table-name] • Both of these are used in JSON! • {"array": ["string1", "string2"]} • So, N1QL uses the backtick instead • SELECT * FROM `bucket-name`
  • 17. Strings • Supported by JSON • Collation is always case sensitive • Literals are delimited with either double or single quotes: x = 'my string here' or x = "my string here" • Various supporting functions: string concatenation (||), LENGTH, LOWER, CONTAINS, TRIM, etc.
  • 18. Numbers • Supported by JSON • Literals are included without delimiters: x = 123456.05124 • Various supporting functions: arithmetic operators, ABS, CEIL, SQRT, TRUNC, etc.
  • 19. Booleans • Supported by JSON • Literals are true and false: x = true • Various supporting operators: NOT, AND, OR
  • 20. Arrays • Supported by JSON • Literals are comma delimited and surrounded by square brackets: [1, 2, 3, "a"] • Various supporting functions: subqueries, ARRAY_CONTAINS, ARRAY_AVG, ARRAY_INSERT, ARRAY_LENGTH, etc.
  • 21. Objects • Supported by JSON • Literals are comma delimited key/value pairs surrounded by curly braces: {"key": "value"} • Various supporting functions: OBJECT_NAMES, OBJECT_PAIRS, OBJECT_VALUES, etc.
  • 22. Nulls • Supported by JSON • Literal is the word null, no delimiters: {"key": null} • Various supporting operators and functions: IS NULL, IS NOT NULL, IFNULL, etc.
  • 23. Missing attributes • Supported by JSON • Can’t be explicitly declared • Similar to undefined in JavaScript • No literal, simply don’t include an attribute in an object: {} • Various supporting operators and functions: IS MISSING, IS NOT MISSING, IFMISSING, IFMISSINGORNULL, etc.
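The difference between a null attribute and a missing one can be pictured with a short Python sketch. Everything here (`MISSING`, `attr`, `is_missing`, `is_null`, `if_missing_or_null`) is a hypothetical helper written for illustration, loosely modelled on the operators listed above. Falling back to null when every argument is missing or null is an assumption of this sketch.

```python
# Hypothetical sentinel: Python has None for null but nothing for "absent".
MISSING = object()

def attr(doc: dict, name: str):
    """Read an attribute, returning MISSING when the key is not present."""
    return doc.get(name, MISSING)

def is_missing(value) -> bool:
    return value is MISSING

def is_null(value) -> bool:
    # A null value is present in the document but explicitly set to null.
    return value is None

def if_missing_or_null(*values):
    """Return the first value that is neither MISSING nor null (else None)."""
    for value in values:
        if value is not MISSING and value is not None:
            return value
    return None

with_null = {"key": None}   # {"key": null}
empty = {}                  # {} : the attribute is simply not there

display = if_missing_or_null(attr(empty, "nickName"), attr(with_null, "key"), "n/a")
```

Here `attr(with_null, "key")` is null but not missing, while `attr(empty, "key")` is missing but not null, which is the distinction the two slides draw.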
  • 24. Date/times • Not officially supported by JSON • Can be stored using other data types • Usually either an ISO 8601 string or the number of milliseconds since the Unix epoch • Literal depends on the data type: "2018-04-06T19:26:29.000Z" or 1528140389000 • Various supporting functions: STR_TO_MILLIS, CLOCK_MILLIS, CLOCK_STR, DATE_PART_STR, DATE_PART_MILLIS, DATE_DIFF_STR, DATE_DIFF_MILLIS, etc.
  • 26. Key: route_10000 { "airline": "AF", "airlineid": "airline_137", "destinationairport": "MRS", "distance": 2881.617376098415, "equipment": "320", "id": 10000, "schedule": [ { "day": 0, "flight": "AF198", "utc": "10:13:00" }, { "day": 0, "flight": "AF547", "utc": "19:14:00" } ], "sourceairport": "TLV", "stops": 0, "type": "route" } Referenced 1:N Relationship Key: airline_137 { "callsign": "AIRFRANS", "country": "France", "iata": "AF", "icao": "AFR", "id": 137, "name": "Air France", "type": "airline" }
  • 27. Joining by Primary Key SELECT route.sourceairport, route.destinationairport, airline.name FROM `travel-sample` AS route INNER JOIN `travel-sample` AS airline ON route.airlineid = META(airline).id WHERE route.type = 'route' ORDER BY route.sourceairport, route.destinationairport, airline.name
  • 28. Key: route_10000 { "airline": "AF", "airlineid": "airline_137", "destinationairport": "MRS", "distance": 2881.617376098415, "equipment": "320", "id": 10000, "schedule": [ { "day": 0, "flight": "AF198", "utc": "10:13:00" }, { "day": 0, "flight": "AF547", "utc": "19:14:00" } ], "sourceairport": "TLV", "stops": 0, "type": "route" } Referenced 1:N Relationship Key: airline_137 { "callsign": "AIRFRANS", "country": "France", "iata": "AF", "icao": "AFR", "id": 137, "name": "Air France", "type": "airline" }
  • 29. Joining by Attributes SELECT route.sourceairport, route.destinationairport, airline.name FROM `travel-sample` AS route INNER JOIN `travel-sample` AS airline ON route.airline = airline.iata AND airline.type = 'airline' WHERE route.type = 'route' ORDER BY route.sourceairport, route.destinationairport, airline.name
  • 30. Key: route_10000 { "airline": "AF", "airlineid": "airline_137", "destinationairport": "MRS", "distance": 2881.617376098415, "equipment": "320", "id": 10000, "schedule": [ { "day": 0, "flight": "AF198", "utc": "10:13:00" }, { "day": 0, "flight": "AF547", "utc": "19:14:00" } ], "sourceairport": "TLV", "stops": 0, "type": "route" } Embedded 1:N Relationship
  • 31. Flattening Embedded Lists SELECT route.sourceairport, route.destinationairport, schedule.utc FROM `travel-sample` AS route UNNEST route.schedule AS schedule WHERE route.type = 'route' AND schedule.day = 0 ORDER BY route.sourceairport, route.destinationairport, schedule.utc
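To make the flattening concrete, the following Python sketch imitates what the UNNEST query above does to the `route_10000` document from the earlier slide. The `unnest` function is a hypothetical helper, not a Couchbase API; it pairs each parent document with each element of its embedded array.

```python
def unnest(docs, array_field):
    """Yield (parent, element) pairs, one per element of the embedded array."""
    for doc in docs:
        items = doc.get(array_field)
        if not isinstance(items, list):
            continue  # like an inner join, parents without the array are dropped
        for item in items:
            yield doc, item

# Trimmed copy of the route_10000 document shown in the slides.
route_10000 = {
    "sourceairport": "TLV",
    "destinationairport": "MRS",
    "schedule": [
        {"day": 0, "flight": "AF198", "utc": "10:13:00"},
        {"day": 0, "flight": "AF547", "utc": "19:14:00"},
    ],
    "type": "route",
}

# Equivalent of the WHERE filter, projection and ORDER BY in the slide's query.
rows = sorted(
    (route["sourceairport"], route["destinationairport"], sched["utc"])
    for route, sched in unnest([route_10000], "schedule")
    if route["type"] == "route" and sched["day"] == 0
)
```

One route document with two schedule entries becomes two flat rows, each repeating the route's airports.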
  • 32. My data’s not flat, why are my queries?
  • 33. Key: route_50490 { "airline": "SQ", "airlineid": "airline_4435", "destinationairport": "ORD", "distance": 2802.1171926467396, "equipment": "320", "id": 50490, "schedule": [{ "day": 0, "flight": "SQ279", "utc": "15:13:00" }, { "day": 0, "flight": "SQ835", "utc": "21:10:00" }], "sourceairport": "LAX", "stops": 0, "type": "route" } Referenced 1:N Relationship Key: airport_3484 { "airportname": "Los Angeles Intl", "city": "Los Angeles", "country": "United States", "faa": "LAX", "geo": { "alt": 126, "lat": 33.942536, "lon": -118.408075 }, "icao": "KLAX", "id": 3484, "type": "airport", "tz": "America/Los_Angeles" }
  • 34. Nesting (a.k.a. LINQ GroupJoin) SELECT airport.*, (SELECT RAW r2.destinationairport FROM routes AS r2) AS destinations FROM `travel-sample` AS airport INNER NEST `travel-sample` AS routes ON airport.faa = routes.sourceairport AND routes.type = 'route' WHERE airport.type = 'airport' AND airport.airportname LIKE 'Los Angeles%'
  • 35. Nesting (a.k.a. LINQ GroupJoin) { "airportname": "Los Angeles Intl", "city": "Los Angeles", "country": "United States", "destinations": ["PHX", "SEA", "MCO", "ATL", "SYD", "YYZ", "LIM", "LHR", "IND", "CLE", "..."], "faa": "LAX", "geo": { "alt": 126, "lat": 33.942536, "lon": -118.408075 }, "icao": "KLAX", "id": 3484, "type": "airport", "tz": "America/Los_Angeles" }
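The shape of this result can be sketched in Python. `inner_nest` below is a hypothetical helper that mimics the idea of INNER NEST (each left document is paired with a list of all matching right documents). The route list is made-up sample data, apart from the LAX to ORD route taken from the slide.

```python
def inner_nest(left_docs, right_docs, on):
    """Pair each left document with the list of right documents that match."""
    for left in left_docs:
        matches = [right for right in right_docs if on(left, right)]
        if matches:  # INNER: left documents with no matches are dropped
            yield left, matches

airports = [
    {"airportname": "Los Angeles Intl", "faa": "LAX", "type": "airport"},
]
routes = [
    {"sourceairport": "LAX", "destinationairport": "ORD", "type": "route"},
    {"sourceairport": "LAX", "destinationairport": "PHX", "type": "route"},  # illustrative
    {"sourceairport": "TLV", "destinationairport": "MRS", "type": "route"},
]

def on(airport, route):
    # Mirrors the ON clause: airport.faa = routes.sourceairport AND routes.type = 'route'
    return route["type"] == "route" and airport["faa"] == route["sourceairport"]

results = [
    # The list comprehension plays the role of the SELECT RAW subquery.
    {**airport, "destinations": [r["destinationairport"] for r in nested]}
    for airport, nested in inner_nest(airports, routes, on)
    if airport["type"] == "airport" and airport["airportname"].startswith("Los Angeles")
]
```

The airport appears once, with its destinations collected into an array, rather than once per matching route.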
  • 37. The Primary Index CREATE PRIMARY INDEX ON bucket SELECT * FROM bucket
  • 38. Single Attribute Index CREATE INDEX docsByName ON bucket (name) SELECT * FROM bucket WHERE name LIKE 'A%' SELECT * FROM bucket WHERE name >= 'A' AND name < 'N'
  • 39. Multiple Attribute Index CREATE INDEX docsByNames ON bucket (lastName, firstName) SELECT * FROM bucket WHERE lastName LIKE 'A%' SELECT * FROM bucket WHERE lastName = 'Burnett' AND firstName LIKE 'B%'
  • 40. Expression Index CREATE INDEX docsByName ON bucket (LOWER(lastName), LOWER(firstName)) SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND LOWER(firstName) LIKE 'b%'
  • 41. Filtered Index CREATE INDEX custsByName ON bucket (LOWER(lastName), LOWER(firstName)) WHERE type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' AND type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND LOWER(firstName) LIKE 'b%' AND type = 'customer'
  • 42. Array Index CREATE INDEX custsByNickName ON bucket (DISTINCT ARRAY p FOR p IN nickNames END) WHERE type = 'customer' SELECT * FROM bucket WHERE ANY p IN nickNames SATISFIES p = 'Buzz' END AND type = 'customer' CREATE INDEX custsByNickName ON bucket (DISTINCT ARRAY LOWER(p) FOR p IN nickNames END) WHERE type = 'customer' SELECT * FROM bucket WHERE ANY p IN nickNames SATISFIES LOWER(p) = 'buzz' END AND type = 'customer'
  • 43. Index Architecture (diagram: data nodes stream over DCP to index nodes Node A and Node B; indexes shown: Index 1, Index 2 Replica, Index 3, Index 1 Replica, Index 2, Index 4)
  • 44. Deferring Index Build CREATE INDEX docsByName ON bucket (name) WITH {"defer_build": true} CREATE INDEX docsByNames ON bucket (lastName, firstName) WITH {"defer_build": true} BUILD INDEX ON bucket (docsByName, docsByNames)
  • 45. Replicated Index CREATE INDEX custsByName ON bucket (LOWER(lastName), LOWER(firstName)) WHERE type = 'customer' WITH {"num_replica": 1} SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' AND type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND LOWER(firstName) LIKE 'b%' AND type = 'customer'
  • 46. Partitioned Index CREATE INDEX custsByName ON bucket (LOWER(lastName), LOWER(firstName)) WHERE type = 'customer' PARTITION BY hash(tenantId) WITH {"num_replica": 1} SELECT * FROM bucket WHERE LOWER(lastName) LIKE 'a%' AND type = 'customer' SELECT * FROM bucket WHERE LOWER(lastName) = 'burnett' AND tenantId = 123456 AND type = 'customer'
  • 48. Index Selection Criteria • All predicates on the index must be included in the query • The first index expression must be in the predicate • Chooses the index with the most matching expressions • If more than one option, chooses one at random for load balancing • Does not use statistics for optimization (yet…)
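The criteria on this slide can be read as a small rule-based procedure. The Python sketch below is a deliberately simplified model of those bullets only; the `Index` class and `choose_index` function are hypothetical, and expressions are compared as plain strings. It is not a description of how the Couchbase query planner is actually implemented.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Index:
    name: str
    keys: tuple                     # index expressions, in order
    where: frozenset = frozenset()  # predicates of a filtered index

def choose_index(indexes, query_exprs, query_filters, rng=random):
    """Apply the slide's rules; return one usable index or None."""
    candidates = []
    for ix in indexes:
        if not ix.where <= query_filters:
            continue  # every predicate on the index must appear in the query
        if ix.keys[0] not in query_exprs:
            continue  # the first index expression must be in the predicate
        score = sum(1 for key in ix.keys if key in query_exprs)
        candidates.append((score, ix))
    if not candidates:
        return None
    best = max(score for score, _ in candidates)
    # Ties are broken at random, which spreads load across equivalent indexes.
    return rng.choice([ix for score, ix in candidates if score == best])

indexes = [
    Index("docsByNames", ("lastName", "firstName")),
    Index("custsByName", ("LOWER(lastName)", "LOWER(firstName)"),
          frozenset({"type = 'customer'"})),
]

chosen = choose_index(
    indexes,
    query_exprs={"LOWER(lastName)", "LOWER(firstName)"},
    query_filters=frozenset({"type = 'customer'"}),
)
```

Dropping `type = 'customer'` from the query's filters makes `custsByName` ineligible, which matches the rule that a filtered index is only usable when the query repeats its predicates.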
  • 49. Query Process (a simplified subset) (diagram: query node, index node, and data nodes; legible step labels: 1. Incoming Query, 2. Query Plan, 7. Filter, Sort, Agg, etc., 7. Query Result)
  • 50. Live Demo! This should be interesting…
  • 51. Nested Loop vs Hash Join in C#
    Nested Loop Join
    IEnumerable<RouteAirlines> Join(IList<Route> routes, IList<Airline> airlines)
    {
        foreach (var route in routes)
        {
            var routeAirlines = new RouteAirlines
            {
                Route = route,
                Airlines = new List<Airline>()
            };
            // Scan every airline for every route
            foreach (var airline in airlines)
            {
                if (airline.Iata == route.Airline)
                {
                    routeAirlines.Airlines.Add(airline);
                }
            }
            yield return routeAirlines;
        }
    }
    Hash Join
    IEnumerable<RouteAirlines> Join(IList<Route> routes, IList<Airline> airlines)
    {
        // Build the lookup once, then probe it for each route
        var hashTable = airlines.ToLookup(p => p.Iata);
        foreach (var route in routes)
        {
            var routeAirlines = new RouteAirlines
            {
                Route = route,
                Airlines = hashTable[route.Airline].ToList()
            };
            yield return routeAirlines;
        }
    }
  • 52. N1QL Hash Join SELECT route.sourceairport, route.destinationairport, airline.name FROM `travel-sample` AS route INNER JOIN `travel-sample` AS airline USE HASH(build) ON route.airline = airline.iata AND airline.type = 'airline' WHERE route.type = 'route' ORDER BY route.sourceairport, route.destinationairport, airline.name
  • 53. Key Optimization Takeaways 1. Make sure fetch is no larger than necessary 2. Design covering indexes where possible 3. Watch out for pagination 4. Consider USE HASH where applicable 5. Keep joins to a minimum
  • 55. Questions?

Editor's notes

  1. Scalability: multi-node, auto-sharded architecture makes it easy to scale out horizontally. Availability: multi-node architecture makes high availability easy. COUCH = Cluster of Unreliable Commodity Hardware. Agility: JSON documents without schema enforcement make it easy for teams to iterate quickly.
  2. Non-first normal form query language
  3. Think millions of documents. Schema is not enforced by the DB.
  4. Let’s see how to represent customer data in JSON. The primary key (CustomerID) becomes the document key. Each column name/column value pair becomes a key-value pair.
  5. What if I wanted to filter to airports with an altitude less than 1000? Just use JavaScript dot notation to access attributes at any depth. You may also use JavaScript square bracket array notation to access items in arrays by index.
  6. Non-first normal form query language
  7. You can use the LOWER function to avoid case-sensitive collation. String concatenation is one difference from SQL: it uses double vertical bars. Since we can’t know the type in advance, we need a separate concat operator from the addition operator.
  8. Note that array elements don’t necessarily have to be of the same type, though they usually are.
  9. Non-first normal form query language
  10. First animation: note that we use an alias on the bucket name. This prevents confusion when we’re getting multiple document types from the same bucket. Second animation: note that we’re using META().id to get the primary key of the document to join. Also, note that this syntax is only available in Couchbase Server 5.5.
  11. But what if I want to join based on an attribute instead of the primary key?
  12. The type filter on the second extent should be part of the ON clause, not the WHERE clause. There must be an index to support looking up the second extent based on these clauses. Not as performant as a join based on primary key, which doesn’t need an index at all.
  13. Embedding an array inside a document creates an implicit 1:N relationship between the root document and the items in the array. But how do I join across this relationship?
  14. The type filter on the second extent should be part of the ON clause, not the WHERE clause. There must be an index to support looking up the second extent based on these clauses. Not as performant as a join based on primary key, which doesn’t need an index at all.
  15. Note that we want to know all the routes for a set of airports. In traditional SQL, we’d have to flatten the output, repeating the airport data for every matching route.
  16. Nesting is analogous to GroupJoin in LINQ, where all matching documents are returned in an array. We’re using an additional subquery on the array in the select projection to reduce the data we’re returning. There is also LEFT OUTER NEST.
  17. Nesting is analogous to GroupJoin in LINQ, where all matching documents are returned in an array. We’re using an additional subquery on the array in the select projection to reduce the data we’re returning. There is also LEFT OUTER NEST.
  18. Indexes every document in the bucket by the primary key. Supports any query, but with poor performance. Kind of like a table scan in SQL, except it scans every document in the entire bucket. Not recommended for production, except in some very specific use cases.
  19. Automatically excludes any documents where "name" is MISSING. The attribute must be included in the predicate for the index to be used, just like a SQL index.
  20. Automatically excludes any documents where "lastName" or "firstName" is MISSING. At least the first attribute must be included in the predicate for the index to be used, just like a SQL index. The second attribute will be used if possible, and so on for multiple attributes.
  21. Can use any deterministic function to adjust attributes before they are indexed. Predicates must use the same function to match the index. Still excludes MISSING lastName and firstName, since LOWER(MISSING) = MISSING.
  22. Can include any quantity of deterministic predicates. Queries must include the same predicates (all of them!) in order to match the index. Because query planning occurs before parameter substitution, the type = 'customer' clause cannot be parameterized.
  23. Any expression in the index definition can be an ARRAY clause, though only one array is allowed. Includes all values in the array, so long arrays can significantly increase index size. Animation: you can also use functions as part of the array clause.
  24. DCP streams mutations (inserts, updates, and deletes) to the index nodes. Streaming is async, thus indexes are eventually consistent. High availability and load balancing are provided by having replicas on more than one node; each replica is a full copy of the index. Any given query only accesses one copy of the index on one node, avoiding scatter/gather for low latency.
  25. Building an index streams the entire bucket from the data nodes to the index node. Only one index build can be running at a time. By building more than one index with a single BUILD command, we can share the stream.
  26. Creates two complete copies of the index, on two different index nodes. Provides HA and load balancing.
  27. New in 5.5, only available in Enterprise Edition. Spreads the index across all nodes in the cluster (optionally a subset of nodes), deciding which node receives which part of the index based on a deterministic hash of the referenced attribute. Good for particularly large indexes, as the index can now scale horizontally. Creates a scatter/gather situation which can increase latency, but that is eliminated if you include an equality predicate for the hashed attribute so the query can go to just one node.
  28. Note that if the index contains all data needed by the query, it will “cover” the query, meaning steps 4 and 5 are skipped. The key to optimizing this process is to reduce waste in step 4: avoid having documents returned from the index that are then thrown out by step 7.
  29. Oversimplification, but delivers the concept. Which one of these do you think is most efficient? It depends on the relative sizes of the two lists. With a short first list, it isn’t worth the time to build the hash table.
  30. A normal attribute join uses an inner loop, which is inefficient if the left-hand extent has lots of data and the right-hand side is a small repeating set. Hash join is an optimization automatically selected by RDBMS implementations, but must be manually chosen in N1QL. It builds a hash table of all possible matches on the right-hand extent, and uses the hash table when processing the left-hand extent. Use “probe” instead of “build” to build the hash table on the left side instead of the right (the build side should be the smaller set). Only available on 5.5 Enterprise Edition (free for dev, but costs for production, includes support).
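The build/probe distinction in the last note can be sketched in Python. `hash_join` is a hypothetical helper and the airline and route rows are trimmed or made-up sample data. The point is only that either side can be the build side, and that the join result is the same while the size of the hash table differs.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Hash the build side once, then look up each probe row in the table."""
    table = defaultdict(list)
    for b in build_rows:
        table[build_key(b)].append(b)
    for p in probe_rows:
        for b in table.get(probe_key(p), []):
            yield p, b

airlines = [
    {"iata": "AF", "name": "Air France"},
    {"iata": "SQ", "name": "Singapore Airlines"},  # illustrative row
]
routes = [
    {"airline": "AF", "sourceairport": "TLV", "destinationairport": "MRS"},
    {"airline": "SQ", "sourceairport": "LAX", "destinationairport": "ORD"},
    {"airline": "AF", "sourceairport": "CDG", "destinationairport": "JFK"},  # illustrative row
]

# Build on the (smaller) airline side and probe with routes...
build_right = {
    (r["sourceairport"], a["name"])
    for r, a in hash_join(airlines, routes, lambda a: a["iata"], lambda r: r["airline"])
}
# ...or build on the route side and probe with airlines.
build_left = {
    (r["sourceairport"], a["name"])
    for a, r in hash_join(routes, airlines, lambda r: r["airline"], lambda a: a["iata"])
}
```

Both variants produce the same pairs; building on the smaller input keeps the hash table small, which is the reasoning behind choosing between "build" and "probe".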