1. Building a web-scale architecture
Kaushik Paranjape
kaushik@sokrati.com
CTO, Sokrati
2. Put your thinking caps on!
• Let's design an e-commerce website which should
• capture all user interactions (every event)
• be able to run analytics and come up with good recommendations
• have a stock ticker for the owner to monitor performance across categories
3. This is what it looks like!
• Website with recommendations
• Category-wise conversions
• Ticker for the owner
5. Correct way?
• Yes, as long as it works
• Build simple solutions with a shorter time to market
• Don’t run blind
6. Problems?
• Which queuing system to choose?
• How do I handle the load?
• How do I provide real time insights?
• How real-time is the data fetcher?
• When am I doomed?
11. Scaling up service layer
• Load balancing + auto scaling
• stateless services - easier to scale
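A minimal sketch of why stateless services are easier to scale, assuming a simple round-robin policy. The backend names and the `handle_request` handler are hypothetical, chosen only for illustration; a real deployment would use a load balancer product rather than this class.

```python
import itertools


def handle_request(user_id: str, cart: dict) -> dict:
    """Stateless handler: everything it needs arrives with the request,
    and nothing is remembered in the process between calls."""
    return {"user_id": user_id, "items": sum(cart.values())}


class RoundRobinBalancer:
    """Hands out backends in turn. Any backend can serve any request
    because the handler above keeps no per-user state."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self) -> str:
        return next(self._cycle)


# Hypothetical backend names.
balancer = RoundRobinBalancer(["svc-a", "svc-b", "svc-c"])
picks = [balancer.pick() for _ in range(4)]
```

Because no instance holds session state, auto scaling only has to add or remove entries from the backend list.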
12. Scaling up app layer
• Distributed scheduler
• Map-Reduce jobs
• Storm
• Spark
• Kafka + storm for stream processing
• SQS
13. #mychoice?
• HBase, MongoDB, Neo4j are cool
• operational maturity
• expertise / skills
• MySQL / PostgreSQL
• Every computer engineer would have learnt this in college
• Start with a simple solution, capture the right signals, know when to scale
14. Signals to capture
• Disk usage
• RAM usage
• size of indexes
• Disk / RAM ratio
• Slow logs
• Table crashes
• Box crashes
• Number of queries
• Locks? Lock wait timeouts?
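One way to turn a few of these signals into "time to scale" warnings is sketched below. The field names and threshold values are assumptions made for illustration; the slides do not give any numbers, so pick limits from your own history.

```python
from dataclasses import dataclass


@dataclass
class DbSignals:
    """One snapshot of the signals listed above (hypothetical field names)."""
    disk_used_gb: float
    ram_gb: float
    index_size_gb: float
    slow_queries_per_min: int
    lock_wait_timeouts_per_min: int


def scaling_warnings(s: DbSignals, max_disk_ram_ratio: float = 10.0,
                     max_slow_queries: int = 5) -> list:
    """Return human-readable warnings; the default thresholds are placeholders."""
    warnings = []
    if s.index_size_gb > s.ram_gb:
        warnings.append("indexes no longer fit in RAM")
    if s.disk_used_gb / s.ram_gb > max_disk_ram_ratio:
        warnings.append("disk/RAM ratio too high")
    if s.slow_queries_per_min > max_slow_queries:
        warnings.append("slow queries rising")
    if s.lock_wait_timeouts_per_min > 0:
        warnings.append("lock wait timeouts seen")
    return warnings


snapshot = DbSignals(disk_used_gb=800, ram_gb=64, index_size_gb=70,
                     slow_queries_per_min=2, lock_wait_timeouts_per_min=0)
current_warnings = scaling_warnings(snapshot)
```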
15. Scaling up database layer
• Probably the hardest
• Inherently stateful!
• Replication is a must
• Large data sets (GBs, TBs, PBs) that keep growing
• Fault tolerance is harder
• “last mile” of complete web-stack scalability
16. Challenges for high-volume MySQL
• Indexes don’t fit in memory any more!
• schema changes are harder / impossible
• frequent table crashes
• Reliable backup-restore
• locking issues
17. Sharding
• Scale out
• MySQL clustering
[Diagram: DB Service, Routing Table, Database ×3]
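The routing-table idea on this slide could look roughly like the sketch below. The bucket count, the database names, and the hash-to-bucket scheme are assumptions for illustration, not the scheme used in the talk.

```python
import hashlib

NUM_BUCKETS = 12  # assumed: more buckets than databases, so buckets can move

# Hypothetical routing table: bucket number -> database name.
ROUTING_TABLE = {bucket: f"db{bucket % 3}" for bucket in range(NUM_BUCKETS)}


def bucket_for(key: str) -> int:
    """Map a key to a stable bucket using a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def shard_for(key: str) -> str:
    """Look up which database currently owns the key's bucket."""
    return ROUTING_TABLE[bucket_for(key)]


def move_bucket(bucket: int, new_db: str) -> None:
    """Re-shard by pointing one bucket at another database.
    Copying the bucket's rows is out of scope for this sketch."""
    ROUTING_TABLE[bucket] = new_db
```

Keeping the indirection in a table, rather than hashing straight to a database, means a hot bucket can be moved without rehashing every key.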
18. Helps?
• Small databases are fast
• Bigger ones are slower
• keep them small and reap the benefits
• Run queries using parallel processing and collate the results
• Keep collecting stats!
• Re-shard when needed
• Beware: replication lag can result in lost transactions (e.g. if the master fails before a replica catches up)
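Running a query on every shard in parallel and collating the results can be sketched as follows. The three in-memory "shards" and their rows are made-up stand-ins for real databases; a thread pool is just one possible way to fan out.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds order rows of (category, amount).
SHARDS = {
    "db0": [("books", 10), ("toys", 5)],
    "db1": [("books", 7)],
    "db2": [("toys", 3), ("music", 4)],
}


def query_shard(name: str) -> dict:
    """Stand-in for a per-shard 'sum amount group by category' query."""
    totals = {}
    for category, amount in SHARDS[name]:
        totals[category] = totals.get(category, 0) + amount
    return totals


def revenue_by_category() -> dict:
    """Scatter the query to all shards, then gather and merge the partial sums."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        for partial in pool.map(query_shard, SHARDS):
            for category, amount in partial.items():
                merged[category] = merged.get(category, 0) + amount
    return merged
```

This works cleanly for aggregates that can be merged (sums, counts); queries such as medians or cross-shard joins need more care.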
19. #NoSQL
• Term popularized by Johan Oskarsson (2009 meetup)
• In-Memory database
• eventual consistency
• no transactional support
• Typical NoSQL DBs
• Document databases
• key/value store
• Hybrid
• graph databases
• columnar databases
20. Criteria for choosing a DB
• ACID Properties
• Join support?
• Performance (inserts, updates, queries, deletes)
• Machine requirements -> TCO
• Community edition / enterprise edition / community support
• Schemaless?
• scalable?
• write-to-master-read-from-slave
• Always consistent / eventual consistency
• Business problem being solved
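The write-to-master-read-from-slave criterion can be illustrated with a small router. This is a sketch under assumptions: the class, the server names, and the `needs_fresh_read` flag are hypothetical, and detecting reads by a `SELECT` prefix is a simplification.

```python
import itertools


class ReplicatedRouter:
    """Sends writes to the master and spreads reads over the replicas."""

    def __init__(self, master: str, replicas: list):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str, needs_fresh_read: bool = False) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not needs_fresh_read:
            # Replica data may lag the master: eventual consistency.
            return next(self._replicas)
        # Writes, and reads that must see the latest data, go to the master.
        return self.master


router = ReplicatedRouter("master", ["replica1", "replica2"])
```

The `needs_fresh_read` escape hatch is where the "always consistent / eventual consistency" trade-off shows up in application code.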
26. Problem statement
• Let's design an e-commerce website which should
• capture all user interactions (every event)
• be able to run analytics and come up with good recommendations
• have a stock ticker for the owner to monitor performance across categories
30. DB As A Service
• We decided to build our DB warehouse as a service
• for it makes developers' lives easier
• for it makes schema modifications seamless
• for it makes database choice more flexible
• for it lets app teams focus exclusively on business logic
• One service to rule all data :-)
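A rough sketch of what "DB as a service" buys: app code talks to one service interface, and the storage engine behind it can be swapped. All class and method names here are hypothetical and are not Sokrati's actual service; the in-memory store stands in for MySQL, HBase, or anything else.

```python
from abc import ABC, abstractmethod


class EventStore(ABC):
    """Storage contract the service depends on (hypothetical)."""

    @abstractmethod
    def put(self, event: dict) -> None: ...

    @abstractmethod
    def count_by(self, field: str) -> dict: ...


class InMemoryEventStore(EventStore):
    """Stand-in backend; a MySQL- or HBase-backed class would fit the same slot."""

    def __init__(self):
        self._events = []

    def put(self, event: dict) -> None:
        self._events.append(dict(event))

    def count_by(self, field: str) -> dict:
        counts = {}
        for event in self._events:
            key = event.get(field)
            if key is not None:
                counts[key] = counts.get(key, 0) + 1
        return counts


class DataService:
    """The only thing app teams call; schema and engine stay hidden behind it."""

    def __init__(self, store: EventStore):
        self._store = store

    def record_event(self, user_id: str, category: str, action: str) -> None:
        self._store.put({"user_id": user_id, "category": category, "action": action})

    def events_per_category(self) -> dict:
        return self._store.count_by("category")


service = DataService(InMemoryEventStore())
service.record_event("u1", "books", "view")
service.record_event("u2", "books", "purchase")
service.record_event("u1", "toys", "view")
```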
31. Take-aways
• All the databases are here to stay
• Your solution will have a combination of databases
• Choose the right one for your problem
• Business needs drive selection
• collect every stat, monitor every event!
• Be prepared for failure