1. Scaling up (and easing) operations at 1 Million TPS @ <1 ms latency
LSPE, Jun 14, 2014
2. Agenda of this talk
●
Some types of Big Data?
●
What are the problems that come with scale?
●
What is the solution? (How Aerospike tackles
these problems.)
4. Big Data Types
●
Volume – Hadoop – PBs of data / jobs that run for hours
●
Variety – ETL – Many data sources, mashup,
analyze
●
Velocity – Do it fast, do it now!
→ Volume and Variety need Velocity to be useful.
5. What starts failing at scale?
●
Machines / hardware
●
Network
●
Unplanned load
●
Operator error
7. Velocity in Aerospike
●
Latency
Page SLA 700 ms, Ads SLA 50 ms
→ Data store <5 ms
– Hybrid DRAM + SSD optimized storage
●
Throughput
– Horizontal scalability (Linear is desirable)
13. Start scaling with Aerospike..
●
Machines / hardware
– Replication / auto-balancing
●
Network
– Availability of islands
– Auto balancing with eventual consistency
●
Unplanned load
– Have a lot of headroom
●
Operator error
– What if the system reduces operational needs?
– Tools
14. Operational Ease
●
Reducing initial setup time
– Auto sharding
– Auto cluster discovery
●
Configuration
– People don't read documents
●
RTFM!
– Good default values
– Retain the power to control when needed
●
Static configs
●
Dynamic configs
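A rough sketch of what "auto sharding" with replication can look like. The slides do not describe Aerospike's actual algorithm, so this uses rendezvous hashing as a stand-in technique; the partition count, the SHA-1 hash, and the names partition_id, owners, and locate are all assumptions made for illustration.

```python
import hashlib
from typing import List

N_PARTITIONS = 4096  # assumed partition count for this sketch


def partition_id(key: str) -> int:
    """Map a key to a partition with a stable hash (SHA-1 is a stand-in)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def owners(pid: int, nodes: List[str], replicas: int = 2) -> List[str]:
    """Rank the nodes for one partition by hash(node, partition).

    The first node in the ranking is the master, the next ones hold replicas.
    Every node can compute this locally from the cluster membership list,
    so no manual shard map is needed.
    """
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha1(f"{n}:{pid}".encode("utf-8")).digest(),
    )
    return ranked[:replicas]


def locate(key: str, nodes: List[str], replicas: int = 2) -> List[str]:
    """Return the nodes (master first) that should hold the given key."""
    return owners(partition_id(key), nodes, replicas)
```

With this scheme, if the master for a partition drops out of the node list, the former replica moves to the front of the ranking, so only the partitions that lived on the lost node need to be re-replicated.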
15. Tools
●
Do all nodes have same config?
– asmonitor -e 'compareconfig'
●
What's the cluster status?
– asmonitor -e 'info'
●
Oops, this needs to be changed!
– asinfo -v 'set-config:context=service;letschangethis=value'
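The idea behind a config comparison can be sketched in a few lines. This is not how asmonitor is implemented; parse_info and compare_config are hypothetical helpers, the parameter names in the usage note are made up, and the sketch assumes each node's config arrives as a semicolon-separated string of name=value pairs.

```python
from typing import Dict, Optional


def parse_info(raw: str) -> Dict[str, str]:
    """Parse 'name=value;name=value' into a dict (assumed input format)."""
    pairs = (item.split("=", 1) for item in raw.strip().split(";") if "=" in item)
    return {name: value for name, value in pairs}


def compare_config(
    configs: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return {param: {node: value}} for parameters that differ across nodes.

    A parameter missing on a node is reported with the value None.
    """
    params = set()
    for cfg in configs.values():
        params.update(cfg)

    diffs = {}
    for param in sorted(params):
        values = {node: cfg.get(param) for node, cfg in configs.items()}
        if len(set(values.values())) > 1:
            diffs[param] = values
    return diffs
```

For example, feeding it parse_info("param-a=1;param-b=x") for one node and parse_info("param-a=1;param-b=y") for another reports only param-b, which is the one parameter an operator then needs to look at.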