Scaling up with Aerospike!

Anshu Prateek

LSPE presentation on Jun 14, 2014.

Scaling up (and easing) operations at 1 Million
TPS @ <1 ms latency.
LSPE, Jun 14, 2014

Agenda of this talk
● Some types of Big Data
● What are the problems that come with scale?
● What is the solution? (Or: how Aerospike tackles these problems, and why it is the solution to them.)

● Anshu Prateek
● Aerospike DevOps Lead
● Ex-Yahoo! Search Operations
● http://about.me/anshuprateek
● anshu@aerospike.com

Big Data Types
● Volume – Hadoop – PB of data / hours of jobs
● Variety – ETL – many data sources, mashup, analyze
● Velocity – Do it fast, do it now!
→ Volume and Variety need Velocity to be useful.

What starts failing at scale?
● Machines / hardware
● Network
● Unplanned load
● Operator error

Big Data..
● Volume – Hadoop – PB of data / hours of jobs
● Variety – ETL – many data sources, mashup, analyze
● Velocity – Do it fast, do it now!
→ Volume and Variety need Velocity to be useful.

Velocity in Aerospike
● Latency
Page SLA 700 ms, Ads SLA 50 ms
→ Data store < 5 ms
– Hybrid DRAM + SSD optimized storage
● Throughput
– Horizontal scalability (linear is desirable)

Prod example:
● 20 nodes
● 1.6 TB per node
● 50 GB DRAM usage
● 14 billion objects
● 70k TPS (r+w) per node peak
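
The per-node figures above can be rolled up into cluster-wide numbers with simple arithmetic. The sketch below is only a back-of-the-envelope check, not part of the talk. It assumes load and data are spread evenly across nodes, and it assumes 64 bytes of index memory per object, a figure that does not appear on the slide. The function name `cluster_totals` is made up for illustration.

```python
# Figures taken from the "Prod example" slide.
NODES = 20
STORAGE_PER_NODE_TB = 1.6
OBJECTS = 14_000_000_000
PEAK_TPS_PER_NODE = 70_000

# Assumption (not on the slide): index memory needed per object, in bytes.
INDEX_BYTES_PER_OBJECT = 64


def cluster_totals():
    """Roll the per-node slide figures up to cluster level (even spread assumed)."""
    return {
        "peak_tps": NODES * PEAK_TPS_PER_NODE,
        "storage_tb": NODES * STORAGE_PER_NODE_TB,
        "objects_per_node": OBJECTS // NODES,
        "index_gb_per_node": OBJECTS * INDEX_BYTES_PER_OBJECT / NODES / 1e9,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value:,}")
```

Under these assumptions the cluster peaks at 1.4 million TPS over roughly 32 TB of storage, with 700 million objects and about 44.8 GB of index memory per node. That is in the same range as the 50 GB DRAM figure, if that figure is per node and the object count includes replicas; the slide does not say either way.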

Start scaling with Aerospike..
● Machines / hardware
– Replication / auto-balancing
● Network
– Availability of islands
– Auto-balancing with eventual consistency
● Unplanned load
– Have a lot of headroom
● Operator error
– What if the system reduces operational needs?
– Tools
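
One way to put a number on "headroom" is to ask how busy each node may be if the surviving nodes must absorb the load of failed ones. The sketch below is a simplified model, not something from the talk. It assumes load is rebalanced evenly across the surviving nodes and that every node has the same capacity. Both function names are hypothetical.

```python
def max_safe_utilization(nodes: int, failures: int = 1) -> float:
    """Largest fraction of per-node capacity that can be used in normal
    operation if the survivors must carry the full load after `failures`
    nodes are lost (even redistribution assumed)."""
    if not 0 <= failures < nodes:
        raise ValueError("failures must be >= 0 and smaller than nodes")
    return (nodes - failures) / nodes


def per_node_tps_after_failure(total_tps: float, nodes: int, failures: int = 1) -> float:
    """Per-node TPS once the load of the failed nodes is spread over the rest."""
    if not 0 <= failures < nodes:
        raise ValueError("failures must be >= 0 and smaller than nodes")
    return total_tps / (nodes - failures)


if __name__ == "__main__":
    # 20 nodes at 70k TPS each, as in the prod example; one node lost.
    print(max_safe_utilization(20, 1))
    print(per_node_tps_after_failure(20 * 70_000, 20, 1))
```

With 20 nodes and one failure, each node should run at no more than 95% of its capacity. Unplanned load spikes come on top of that, so in practice the margin has to be larger.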

Operational Ease
● Reducing initial setup time
– Auto sharding
– Auto cluster discovery
● Configuration
– People don't read documentation
● RTFM!
– Good default values
– Retain the power to control when needed
● Static configs
● Dynamic configs
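
The general idea behind auto sharding can be sketched as hash-based partitioning: hash each key into a fixed number of partitions, then map partitions to nodes, so nobody has to assign key ranges by hand. The code below is a generic illustration only and is not Aerospike's actual algorithm. The SHA-1 hash, the 4096-partition count, and the naive modulo placement are all assumptions chosen to keep the sketch short.

```python
import hashlib

# Assumption for this sketch: a fixed partition count.
N_PARTITIONS = 4096


def partition_id(key: str) -> int:
    """Map a key to a partition by hashing it (SHA-1 is a stand-in hash)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def owner(key: str, nodes: list) -> str:
    """Pick the node that owns the key's partition.

    Naive placement: partition number modulo node count over a sorted
    node list. A real system would use a placement scheme that moves
    fewer partitions when nodes join or leave.
    """
    ordered = sorted(nodes)
    return ordered[partition_id(key) % len(ordered)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
    for key in ("user:1", "user:2", "user:3"):
        print(key, partition_id(key), owner(key, nodes))
```

Because placement depends only on the key and the node list, every client computes the same owner without consulting a central directory.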

Tools
● Do all nodes have the same config?
– asmonitor -e 'compareconfig'
● What's the cluster status?
– asmonitor -e 'info'
● Oops, this needs to be changed!
– asinfo -v 'set-config:context=service;letschangethis=value'
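
The two config tasks above, spotting nodes whose settings differ and pushing a dynamic change, can be scripted. The sketch below only illustrates the idea and is not how `asmonitor` is implemented. It assumes each node's configuration is available as `name=value` pairs separated by semicolons, and that `asinfo` accepts `-h` to select a host. The helper names and the sample parameter names are hypothetical.

```python
def parse_config(response: str) -> dict:
    """Parse 'name=value;name=value' text into a dict."""
    items = (item.split("=", 1) for item in response.strip().split(";") if "=" in item)
    return {name: value for name, value in items}


def config_differences(configs: dict) -> dict:
    """Return {param: {node: value}} for every param whose value is not
    identical on all nodes (a missing param shows up as None)."""
    names = set().union(*(cfg.keys() for cfg in configs.values()))
    diffs = {}
    for name in sorted(names):
        values = {node: cfg.get(name) for node, cfg in configs.items()}
        if len(set(values.values())) > 1:
            diffs[name] = values
    return diffs


def set_config_command(context: str, name: str, value: str, host: str = "127.0.0.1") -> list:
    """Build the argument list for a dynamic config change on one node."""
    return ["asinfo", "-h", host, "-v", f"set-config:context={context};{name}={value}"]


if __name__ == "__main__":
    # Hypothetical per-node config text; the param names are placeholders.
    configs = {
        "node1": parse_config("param-a=1;param-b=2"),
        "node2": parse_config("param-a=1;param-b=3"),
    }
    print(config_differences(configs))
    print(" ".join(set_config_command("service", "letschangethis", "value")))
```

Returning the command as an argument list keeps the sketch free of side effects; it could be handed to `subprocess.run` once the target hosts and parameter names are known.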

● 330 GCE
● 300 x 1TB
● Debian, Cassandra 2.2
● Median latency – 10.3 ms
● 95% < 23 ms
10. ● 98% of queries < 1 ms

11. Yet another prod graph...

12. What starts failing at scale? ● Machines / hardware ● Network ● Unplanned load ● Operator error

16. Tools ● Nagios ● Graphite ● AMC

17. Capacity Planning

18. Managing with AMC

19. Managing with AMC

20. Managing with AMC

21. Headroom! ● How many TPS can we do?

24. Aerospike