1. NoSQL Findings
Christian van der Leeden
Thursday, September 23, 2010
2. Our problem
• Growth is neither linear nor predictable
• e.g. the History::Session table now has > 30 million entries
• Activities has > 26 million entries
• Postgres will be the performance bottleneck
3. Criteria
• Allow us to scale from 100k Daily Active Users (DAU)
to 1 million DAU and up to 10 million DAU
• Scale horizontally (“Just add servers”)
• Good Ruby performance
• Good transition from Rails/Postgres -> Rails/NoSQL
• Actively developed
4. Goal
• Scores (@ 10 million Daily Active Users)
• 10 million scores/day == 350 inserts/second
• around the same read rate for leaderboards
• Game with 10 million players
• Leaderboard with 10 million entries
• Session (@ 10 million DAU)
• > 10 million session handshakes/day
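The slide does not say how the 350 inserts/second figure was derived. A minimal sketch of the arithmetic, assuming the figure is a peak rate rather than a daily average (the helper name and the peak-rate reading are assumptions, not from the slides):

```ruby
SECONDS_PER_DAY = 24 * 60 * 60 # 86_400

# Average events per second if the daily volume were spread evenly.
def average_per_second(events_per_day)
  events_per_day.to_f / SECONDS_PER_DAY
end

scores_per_day = 10_000_000
average = average_per_second(scores_per_day)
peak_factor = 350 / average # how far the slide's target sits above the average

puts format("average: %.1f inserts/s", average)             # average: 115.7 inserts/s
puts format("350 inserts/s is %.1fx the average", peak_factor) # 350 inserts/s is 3.0x the average
```

Read this way, the target leaves roughly a factor of three of headroom over the evenly spread load.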
5. Data Patterns
• Most data is accessed time based (the most recent
data is accessed the most often)
• Write and read rates are about the same
• Eventual consistency is good enough most of the
time
6. Rating criteria
• Type (Document Store, Key/Value Store, Big Table)
• Deployment
• How easy is it to scale?
• Existing installations
• How big are known installations?
• Heritage and activity
• Where does the solution come from and how actively is it
developed by whom?
8. MongoDB
• document store
• “SQL DB” without relations
• easy transition with MongoMapper, Mongoid
• supports sharding over replica sets (since August
2010)
• Haven’t found a big sharded server installation
9. Experience with Mongo
• nice/easy to program with
• deployment woes we’ve encountered (1.6.0)
• segmentation fault
• cannot read because: invalid BSON object
• when the index is larger than RAM, performance degrades
(queries go from 20 ms to 200 ms)
• Global write lock makes data migrations slow
10. Cassandra
• Big Table data store
• Was developed by Facebook and is actively maintained
• Easy to add servers and to setup (peer to peer concept)
• Thrift API to Ruby was slow in tests (Our tests: around 150 write
ops/second)
• Avro API promises to be faster (will be an option in 0.7)
• Used by Facebook
• Not using it because it was too slow with Ruby in our tests
11. Redis
• Memcache with simple persistence
• Supports many different data types and atomic
operations on them
• Sharding is done client side (difficult to add new
servers)
• We’re using it for indexes on SQL data
• Very fast (Our tests: 4000 write operations/second)
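Why client-side sharding makes adding servers difficult can be sketched as follows. The slides do not say which sharding rule was used; this assumes the simplest one (hash the key, take it modulo the server count), and the `shard_for` and `moved_fraction` helpers and the `score:N` key format are hypothetical.

```ruby
require "zlib"

# Hypothetical client-side sharding rule: hash the key and take it modulo
# the number of servers. Assumed for illustration, not taken from the slides.
def shard_for(key, server_count)
  Zlib.crc32(key) % server_count
end

# Fraction of keys that map to a different server after the pool is resized.
def moved_fraction(keys, old_count, new_count)
  moved = keys.count { |k| shard_for(k, old_count) != shard_for(k, new_count) }
  moved.to_f / keys.size
end

keys = (1..10_000).map { |i| "score:#{i}" }
puts moved_fraction(keys, 3, 4)
```

With an evenly distributed hash, growing from 3 to 4 servers under this rule remaps roughly three quarters of the keys, and all of that data has to be migrated by the application. Schemes such as consistent hashing reduce the share of keys that move, but the client still has to handle the migration itself.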
12. HBase
• Big Table Database
• Complex to setup and to maintain
• Very often used for analytics jobs with Hadoop/Hive,
e.g. on Amazon Elastic MapReduce
• For Analytics also look at Scribe for data collection
13. Membase
• Key-Value Store
• Distributed, persistent Memcache
• Easy to add nodes
• Used by Zynga
14. Example Leaderboards
• User has many scores
• Each score has one result (integer)
• Game has many scores
• Query: the leaderboard for one game
• Insert one score into the leaderboard
• What is my rank?
• Give me 10 scores starting at position 100,000
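The data model and the three queries above can be pinned down with a plain in-memory baseline. This is only a reference sketch for what each store has to provide. The `Score` struct and `NaiveLeaderboard` class are hypothetical names, and every operation here scans all scores, which is exactly what does not scale to 10 million entries.

```ruby
Score = Struct.new(:id, :user_id, :game, :result)

# Naive reference model: one flat list of scores, no indexes.
class NaiveLeaderboard
  def initialize
    @scores = []
  end

  # Insert one score into the leaderboard.
  def insert(score)
    @scores << score
  end

  # 1-based rank: one more than the number of strictly higher results
  # in the same game.
  def rank(score)
    @scores.count { |s| s.game == score.game && s.result > score.result } + 1
  end

  # `count` scores of one game, best first, starting at 0-based `offset`.
  def page(game, offset, count)
    @scores.select { |s| s.game == game }
           .sort_by { |s| -s.result }
           .slice(offset, count) || []
  end
end

board = NaiveLeaderboard.new
top    = Score.new(2563, 52345, "Jewels", 100)
second = Score.new(96877, 2541, "Jewels", 99)
third  = Score.new(6752, 3652, "Jewels", 96)
[third, top, second].each { |s| board.insert(s) }

board.rank(second)                   # => 2
board.page("Jewels", 1, 2).map(&:id) # => [96877, 6752]
```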
15. SQL vs NoSQL
SQL
• Think about Data
• Redundancy is bad
• Indexes are managed by the DB
• Query over relations
• Always exact results
NoSQL
• Think about Queries
• Redundancy is ok
• Roll your own indexes depending on queries
• No Joins and connecting entities
• Query results don’t have to return latest write operation
16. SQL vs NoSQL
SQL
• standardized query language and DDL
• All DBs are “the same”
NoSQL
• some solutions share standards
• Many different approaches:
• Document store
• Big Table
• Key Value
17. Postgres
User 1:n Score n:1 Game
• Create new score:
score = Score.new(attributes)
score.save  # => insert into scores ...
• What is my rank?
select count(*) + 1 from scores inner join games on (games.id =
scores.game_id)
where result > #{my_score.result} and games.name = '#{game_name}'
• Give me 10 scores in leaderboard from position 100000
select * from scores inner join games on (games.id = scores.game_id)
where games.name = '#{game_name}'
order by result desc
offset 100000 limit 10;
18. Redis
SortedSet
key: game_name
score: result
value: score_id
key: "Jewels"
100 99 96
<2563> <96877> <6752>
...
key: "Bug Landing"
key: "Toss It"
...
KeyValue Store
key: score_id
value: marshalled score object
2563: { result : 100, user_id : 52345, game_id: 57142 }
96877: { result : 99, user_id : 2541, game_id: 57142 }
6752: { result : 96, user_id : 3652, game_id: 57142 }
• New Score
redis.zadd("Jewels", result, score_id)
• My Rank?
redis.zrevrank("Jewels", score_id)  # takes the member, returns a 0-based rank
• 10 scores from position 100000
redis.zrevrange("Jewels", 100000, 100009)  # start and stop index, both inclusive
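The three sorted-set calls can be wrapped in small helpers, sketched below. The helper names (`add_score`, `rank_of`, `page`) are hypothetical. `FakeSortedSets` is an in-memory stand-in written for this example so that it runs without a Redis server. It mimics only the argument order and index semantics of `zadd`, `zrevrank` and `zrevrange`, and its tie-breaking does not match Redis.

```ruby
# In-memory stand-in for the three sorted-set calls, so the sketch runs
# without a server. Hypothetical helper, not part of the redis gem.
class FakeSortedSets
  def initialize
    @sets = Hash.new { |h, k| h[k] = {} } # key => { member => score }
  end

  def zadd(key, score, member)
    @sets[key][member] = score
  end

  # 0-based position of member, highest score first; nil if absent.
  def zrevrank(key, member)
    ordered(key).index(member)
  end

  # Members from index `start` to index `stop`, both inclusive.
  def zrevrange(key, start, stop)
    ordered(key)[start..stop] || []
  end

  private

  def ordered(key)
    @sets[key].sort_by { |member, score| [-score, member] }.map(&:first)
  end
end

def add_score(redis, game, score_id, result)
  redis.zadd(game, result, score_id)
end

# 1-based rank of a score id, or nil if the id is not on the leaderboard.
def rank_of(redis, game, score_id)
  rank = redis.zrevrank(game, score_id)
  rank && rank + 1
end

# `count` score ids starting at 0-based `offset`, best first.
def page(redis, game, offset, count)
  redis.zrevrange(game, offset, offset + count - 1)
end

redis = FakeSortedSets.new
add_score(redis, "Jewels", 2563, 100)
add_score(redis, "Jewels", 96877, 99)
add_score(redis, "Jewels", 6752, 96)

rank_of(redis, "Jewels", 96877) # => 2
page(redis, "Jewels", 1, 2)     # => [96877, 6752]
```

Passing a real client (for example `Redis.new` from the redis gem) instead of the stand-in should work with the same helpers, with the difference that a real server returns members as strings.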
19. Mongo
Collection
key: Scores
{ _id: 2563, result : 100, user_id : 52345, game_id: 57142 }
{ _id: 96877, result : 99, user_id : 2541, game_id: 57142 }
{ _id: 6752, result : 96, user_id : 3652, game_id: 57142 }
• New Score
Score.create!(attributes)
db.scores.insert( { result: 100, user_id: 52345,
game_id: 57142 } )
• What is my rank?
db.scores.count( { game_id: 57142, result: { $gt: #{my_score.result} } } )
• 10 scores from position 100000
db.scores.find( { game_id: 57142 } ).sort( { result: -1 } )
.skip(100000).limit(10)
21. Cassandra
ColumnFamily: Leaderboards
row_key: game_name
row_key: "Jewels"
100: 2563 99: 96877 96: 6752
row_key: "Bug Landing"
row_key: "Toss It"
...
• Insert new score:
client.insert("ScoreList", "Jewels", result => id)
client.insert(id, :result => result, :user_id =>
user_id, :game_id => game_id)
• What is my rank?
=> not easy, need help from other tools
• Give me the next 10 scores starting at score X
client.get("ScoreList", "Jewels", :start =>
X.result, :count => 10)
22. Findings
• Test the tools you want to use at the scale
at which you are going to use them
• There is no “Best NoSQL” solution
• Mix and match the tools you need
• NoSQL requires a lot of rethinking and changes in
your Ruby code.