The top five questions to ask about NoSQL. JONATHAN ELLIS at Big Data Spain 2012
2. How do I model my application?
©2012 DataStax
3. Popular options
• Key/value
• Tabular
• Document
• Graph?
4. Schema is your friend
{
"id": "e451dd42-ece3-11e1-a0a3-34159e154f4c",
"name": "jbellis",
"state": "TX",
"birthdate": "1/1/1976",
"email_addresses": ["jbellis@gmail.com", "jbellis@datastax.com"]
}
5. SQL can be your friend too
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date date
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state='Texas' AND birth_date > '1950-01-01';
6. Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date date
);
CREATE TABLE users_addresses (
user_id uuid REFERENCES users,
email text
);
SELECT *
FROM users NATURAL JOIN users_addresses;
7. Collections
[Build of slide 6: the same users and users_addresses schema and NATURAL JOIN query, crossed out with an X]
8. Collections
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date date,
email_addresses set<text>
);
UPDATE users
SET email_addresses = email_addresses + {'jbellis@gmail.com',
'jbellis@datastax.com'};
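As a rough illustration of what the set<text> column buys you, the sketch below models the users table as a Python dict and the UPDATE as a set union. The helper name add_emails and the in-memory layout are assumptions made for this example only; they are not Cassandra APIs.

```python
# Toy model of the users table: one dict entry per row, keyed by id.
users = {
    "e451dd42-ece3-11e1-a0a3-34159e154f4c": {
        "name": "jbellis",
        "state": "TX",
        "email_addresses": set(),  # stands in for the set<text> column
    }
}


def add_emails(users, user_id, new_addresses):
    # Mirrors: SET email_addresses = email_addresses + {...}
    # A set union needs no join table and is safe to repeat.
    users[user_id]["email_addresses"] |= set(new_addresses)


uid = "e451dd42-ece3-11e1-a0a3-34159e154f4c"
add_emails(users, uid, {"jbellis@gmail.com", "jbellis@datastax.com"})
add_emails(users, uid, {"jbellis@gmail.com"})  # repeating the update adds nothing
```

Because the addresses live in the row itself, reading a user returns them without a second table or a join.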
9. Joins don’t scale
• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• ORDER BY?
10. SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers
WHERE user_id = 'driftx')
[Diagram: the followers and tweets tables, with a question mark between them]
11. Clustering in Cassandra
CREATE TABLE timeline (
user_id uuid,
tweet_id timeuuid,
tweet_author uuid,
tweet_body text,
PRIMARY KEY (user_id, tweet_id)
);
user_id tweet_id tweet_author tweet_body
jbellis 3290f9da.. rbranson lorem
jbellis 3895411a.. tjake ipsum
... ... ...
driftx 3290f9da.. rbranson lorem
driftx 71b46a84.. yzhang dolor
... ... ...
yukim 3290f9da.. rbranson lorem
yukim e451dd42.. tjake amet
... ... ...
12. Clustering in Cassandra
SELECT * FROM timeline
WHERE user_id = 'driftx';
user_id tweet_id tweet_author tweet_body
jbellis 3290f9da.. rbranson lorem
jbellis 3895411a.. tjake ipsum
... ... ...
driftx 3290f9da.. rbranson lorem
driftx 71b46a84.. yzhang dolor
... ... ...
yukim 3290f9da.. rbranson lorem
yukim e451dd42.. tjake amet
... ... ...
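The idea behind the timeline table can be sketched in a few lines of Python: rows are grouped by the partition key (user_id) and kept sorted by the clustering key (tweet_id), so a user's timeline is one contiguous read instead of a join. The Timeline class, the post_tweet helper and the followers dict are hypothetical names for this illustration, and short strings stand in for timeuuid values.

```python
from bisect import insort
from collections import defaultdict


class Timeline:
    """Toy stand-in for the timeline table (partition key user_id, clustering key tweet_id)."""

    def __init__(self):
        # One sorted list of (tweet_id, author, body) rows per partition.
        self.partitions = defaultdict(list)

    def insert(self, user_id, tweet_id, author, body):
        # insort keeps the rows ordered by tweet_id inside the partition.
        insort(self.partitions[user_id], (tweet_id, author, body))

    def select(self, user_id):
        # In spirit: SELECT * FROM timeline WHERE user_id = ?
        return list(self.partitions.get(user_id, []))


def post_tweet(timeline, followers, tweet_id, author, body):
    # Denormalize at write time: copy the tweet into every follower's partition.
    for follower in followers.get(author, ()):
        timeline.insert(follower, tweet_id, author, body)


followers = {"rbranson": ["jbellis", "driftx", "yukim"], "yzhang": ["driftx"]}
timeline = Timeline()
post_tweet(timeline, followers, "71b46a84", "yzhang", "dolor")
post_tweet(timeline, followers, "3290f9da", "rbranson", "lorem")
driftx_rows = timeline.select("driftx")
```

The cost moves to the write side (one copy per follower), which is the trade the slides make in exchange for a read that touches a single partition.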
17. UPDATE users
SET email_addresses = email_addresses + {...}
WHERE user_id = 'jbellis';
20. write( k1 , c1:v1 )
Memory
Memtable
k1 c1:v1
Hard drive
Commit log
k1 c1:v1
21. write( k1 , c2:v2 )
Memory
Memtable
k1 c1:v1 c2:v2
Hard drive
Commit log
k1 c1:v1
k1 c2:v2
22. write( k2 , c1:v1 c2:v2 )
Memory
Memtable
k1 c1:v1 c2:v2
k2 c1:v1 c2:v2
Hard drive
Commit log
k1 c1:v1
k1 c2:v2
k2 c1:v1 c2:v2
23. write( k1 , c1:v4 c3:v3 )
Memory
Memtable
k1 c1:v4 c2:v2 c3:v3
k2 c1:v1 c2:v2
Hard drive
Commit log
k1 c1:v1
k1 c2:v2
k2 c1:v1 c2:v2
k1 c1:v4 c3:v3
24. flush
Memory
cleanup
Hard drive
SSTable, index / BF
k1 c1:v4 c2:v2 c3:v3
k2 c1:v1 c2:v2
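The write path on slides 20 to 24 can be approximated with the minimal sketch below: every write is appended to a commit log and merged into an in-memory memtable, and a flush writes the memtable out as a sorted, immutable SSTable before cleaning up. The Node class and its method names are assumptions for this example, and the index and bloom filter (BF) built at flush time are left out.

```python
class Node:
    """Simplified single-node write path: commit log + memtable, flushed to SSTables."""

    def __init__(self):
        self.commit_log = []  # append-only; lives on disk in a real system
        self.memtable = {}    # in memory: key -> {column: value}
        self.sstables = []    # immutable sorted snapshots; on disk in a real system

    def write(self, key, columns):
        # Sequential append for durability, then an in-memory merge.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write rows sorted by key, columns sorted by name, then clean up.
        sstable = [(key, dict(sorted(cols.items())))
                   for key, cols in sorted(self.memtable.items())]
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replaying


node = Node()
node.write("k1", {"c1": "v1"})
node.write("k1", {"c2": "v2"})
node.write("k2", {"c1": "v1", "c2": "v2"})
node.write("k1", {"c1": "v4", "c3": "v3"})
log_length_before_flush = len(node.commit_log)
node.flush()
```

Note that no write ever reads or rewrites existing data on disk: the commit log only grows by appends, and the SSTable is written once.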
30. Availability
• “High availability implies that a single fault will not bring down your system. Not ‘we’ll recover quickly.’”
-- Ben Coverston: DataStax
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.”
-- Rick Branson: Instagram
34. Self-healing
Client -> Coordinator: request (1)
Coordinator -> Replica: internal request (2)
Replica -> Coordinator: internal response (3)
Coordinator -> Client: response (4)
36. Self-healing
Client -> Coordinator: request (1)
Coordinator -> Replica: internal request (2); replica fails
Coordinator: timeout
Coordinator -> Client: response (4)
37. Self-healing
[Same diagram as slide 36, with an X added]
38. Self-healing
Client -> Coordinator: request (1)
Coordinator -> Replica: internal request (2); replica fails
Coordinator: timeout, hint (3)
Coordinator -> Client: response (4)
39. Self-healing
[Same diagram as slide 38, with an X added]
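The hint step in these diagrams can be sketched as follows: when a replica times out, the coordinator keeps a hint for the missed write and replays it once the replica is back. This is a deliberately simplified model; the Replica and Coordinator classes, the up flag and replay_hints are hypothetical names for this example and do not reflect Cassandra's actual implementation.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        # A down replica is modeled as a timeout on the internal request.
        if not self.up:
            raise TimeoutError(f"{self.name} did not respond")
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) for writes a replica missed

    def write(self, key, value):
        acks = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value)
                acks += 1
            except TimeoutError:
                # Store a hint instead of failing the client's request.
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        # Retry stored hints; keep the ones whose replica is still down.
        remaining = []
        for replica, key, value in self.hints:
            try:
                replica.apply(key, value)
            except TimeoutError:
                remaining.append((replica, key, value))
        self.hints = remaining


a, b = Replica("a"), Replica("b")
coordinator = Coordinator([a, b])
b.up = False
acks = coordinator.write("k1", "v1")    # b times out, a hint is stored
hints_while_down = len(coordinator.hints)
b.up = True
coordinator.replay_hints()              # b catches up on the missed write
```

The client still gets its response while the replica is down, and no operator action is needed for the replica to converge afterwards.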
44. Scaling antipatterns
• Metadata servers
• Router bottlenecks
• Overloading existing nodes when adding capacity
47. Data model: Realtime
LiveStocks
stock last
GOOG $95.52
AAPL $186.10
AMZN $112.98
Portfolios
user stock shares
jbellis GOOG 80
jbellis LNKD 20
yukim AMZN 100
StockHist
stock date price
GOOG 2011-01-01 $8.23
GOOG 2011-01-02 $6.14
GOOG 2011-01-03 $7.78
48. Data model: Analytics
HistLoss
portfolio worst_date loss
Portfolio1 2011-07-23 -$34.81
Portfolio2 2011-03-11 -$11432.24
Portfolio3 2011-05-21 -$1476.93
49. Data model: Analytics
10dayreturns
stock rdate return
GOOG 2011-07-25 $8.23
GOOG 2011-07-24 $6.14
GOOG 2011-07-23 $7.78
AAPL 2011-07-25 $15.32
AAPL 2011-07-24 $12.68
INSERT OVERWRITE TABLE 10dayreturns
SELECT a.stock,
b.date as rdate,
b.price - a.price
FROM StockHist a
JOIN StockHist b
ON (a.stock = b.stock
AND date_add(a.date, 10) = b.date);
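To make the self-join above concrete, here is a small Python sketch of the same computation: for each (stock, date) row, look up the same stock's price ten days later and emit the difference. The function name ten_day_returns is an assumption, and the two prices for 2011-01-11 and 2011-01-12 are made-up illustration values, not data from the slides.

```python
from datetime import date, timedelta
from decimal import Decimal

# (stock, date, price). The first two rows are from slide 47;
# the last two are hypothetical values added for illustration.
stock_hist = [
    ("GOOG", date(2011, 1, 1), Decimal("8.23")),
    ("GOOG", date(2011, 1, 2), Decimal("6.14")),
    ("GOOG", date(2011, 1, 11), Decimal("9.00")),
    ("GOOG", date(2011, 1, 12), Decimal("5.14")),
]


def ten_day_returns(rows, days=10):
    # Index prices by (stock, date) so the "join" is a dictionary lookup.
    price_on = {(stock, day): price for stock, day, price in rows}
    returns = []
    for (stock, day), price in price_on.items():
        later = day + timedelta(days=days)  # date_add(a.date, 10) = b.date
        if (stock, later) in price_on:
            # b.date as rdate, b.price - a.price
            returns.append((stock, later, price_on[(stock, later)] - price))
    return returns


returns = ten_day_returns(stock_hist)
```

Rows without a matching price ten days later simply drop out, as they would in the inner join.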
50. Data model: Analytics
portfolio_returns
portfolio rdate preturn
Portfolio1 2011-07-25 $118.21
Portfolio1 2011-07-24 $60.78
Portfolio1 2011-07-23 -$34.81
Portfolio2 2011-07-25 $2143.92
Portfolio3 2011-07-24 -$10.19
INSERT OVERWRITE TABLE portfolio_returns
SELECT portfolio,
rdate,
SUM(b.return)
FROM portfolios a JOIN 10dayreturns b
ON (a.stock = b.stock)
GROUP BY portfolio, rdate;
51. Data model: Analytics
HistLoss
portfolio worst_date loss
Portfolio1 2011-07-23 -$34.81
Portfolio2 2011-03-11 -$11432.24
Portfolio3 2011-05-21 -$1476.93
INSERT OVERWRITE TABLE HistLoss
SELECT a.portfolio, rdate, minp
FROM (
SELECT portfolio, min(preturn) as minp
FROM portfolio_returns
GROUP BY portfolio
) a
JOIN portfolio_returns b
ON (a.portfolio = b.portfolio and a.minp = b.preturn);
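The HistLoss query is a "minimum per group, then join back to recover the date" pattern. A minimal Python sketch of that pattern is shown below, run only on the five portfolio_returns rows visible on slide 50, so its output differs from the HistLoss table on the slide, which presumably covers the full history. The function name hist_loss is an assumption for this example.

```python
# (portfolio, rdate, preturn): the rows visible on slide 50.
portfolio_returns = [
    ("Portfolio1", "2011-07-25", 118.21),
    ("Portfolio1", "2011-07-24", 60.78),
    ("Portfolio1", "2011-07-23", -34.81),
    ("Portfolio2", "2011-07-25", 2143.92),
    ("Portfolio3", "2011-07-24", -10.19),
]


def hist_loss(rows):
    # Inner query: SELECT portfolio, min(preturn) ... GROUP BY portfolio
    min_by_portfolio = {}
    for portfolio, _, preturn in rows:
        if portfolio not in min_by_portfolio or preturn < min_by_portfolio[portfolio]:
            min_by_portfolio[portfolio] = preturn
    # Join back on (portfolio, preturn) to recover the date of the worst return.
    return [(portfolio, rdate, preturn)
            for portfolio, rdate, preturn in rows
            if preturn == min_by_portfolio[portfolio]]


worst = hist_loss(portfolio_returns)
```

As in the HiveQL version, a portfolio whose minimum occurs on several dates would produce one output row per date.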
54. Questions?
Image credits
• http://www.flickr.com/photos/26817893@N05/2573006312/
• http://www.flickr.com/photos/rowanbank/7686239548
• http://www.flickr.com/photos/mervtheswerve/6081933265
• http://www.flickr.com/photos/dg_pics/2526208830
• http://www.flickr.com/photos/wainwright/351684037
• http://www.flickr.com/photos/mikeneilson/1606662529
• http://www.flickr.com/photos/sbisson/3852905534
• http://www.flickr.com/photos/breadnbadger/2674928517