Eric Lubow
@elubow
elubow@simplereach.com
Big Architectures for Big Data
Big Architectures for Big
Data
Eric Lubow @elubow
#Cassandra13
Overview
• SimpleReach
• Goals
• Tools
• Architecture Implementation
The 2 Truths
The Real Truth
Even with the right tools, 80% of the work of building a big data system is acquiring and refining
SimpleReach
• Millions of URLs per day
• Over 1.25 billion page views per month
• 500M events per day (~6k events/second)
• Auto-scale between 125 and 160 machines depending on traffic
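As a rough sanity check on the figures above, the per-second rate follows directly from the daily total. The sketch below is plain arithmetic; the 30-day month is an assumption made only to convert the monthly page-view figure, and both results are averages, not peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the slide.
events_per_day = 500_000_000
page_views_per_month = 1_250_000_000

# Assumption: a 30-day month, used only to put page views on a per-second scale.
DAYS_PER_MONTH = 30

events_per_second = events_per_day / SECONDS_PER_DAY
page_views_per_second = page_views_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)

print(f"{events_per_second:,.0f} events/second on average")
print(f"{page_views_per_second:,.0f} page views/second on average")
```

The average comes out a little under 5.8k events per second, which the slide rounds to ~6k.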
And It Goes Like This...
C*
Vertica
Goals
• Consistent non-data storage layer access patterns
• Data accuracy across storage engines
• Minimize downtime/Minimize cost of downtime
• High availability
• Allow access to many toolsets (for all languages, DBs, engines)
• Clients should have minimal architecture knowledge
Consistent Access Patterns
realtime_score
(‘score’, ‘realtime’)
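One possible reading of this slide is that an endpoint name such as realtime_score corresponds to a (resource, granularity) pair. The sketch below assumes a "<granularity>_<resource>" naming scheme; the function names endpoint_key and endpoint_name are hypothetical and not taken from the talk.

```python
from typing import Tuple


def endpoint_key(name: str) -> Tuple[str, str]:
    """Split '<granularity>_<resource>' into (resource, granularity)."""
    granularity, separator, resource = name.partition("_")
    if not separator or not granularity or not resource:
        raise ValueError(f"expected '<granularity>_<resource>', got {name!r}")
    return (resource, granularity)


def endpoint_name(key: Tuple[str, str]) -> str:
    """Inverse of endpoint_key: build the endpoint name from its pair."""
    resource, granularity = key
    return f"{granularity}_{resource}"


print(endpoint_key("realtime_score"))
print(endpoint_name(("score", "realtime")))
```

Keeping the mapping mechanical in both directions means a client only needs to know the resource and the granularity, not which storage engine answers the call.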
Authentication, Tracking,
• Per-service access keys
• Track call volume by access key
• Prevent internal denial of service
• Monitor availability and performance
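A minimal sketch of how per-service access keys could support both call tracking and protection against internal denial of service. The class name AccessKeyTracker, the sliding-window policy, and the limits are assumptions for illustration; the talk does not describe the actual mechanism.

```python
import time
from collections import defaultdict, deque


class AccessKeyTracker:
    """Counts calls per access key and rejects keys that exceed a sliding-window limit."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock
        self._recent = defaultdict(deque)  # access key -> timestamps inside the window
        self.accepted = defaultdict(int)   # access key -> total accepted calls
        self.rejected = defaultdict(int)   # access key -> total rejected calls

    def allow(self, access_key):
        now = self._clock()
        recent = self._recent[access_key]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max_calls:
            self.rejected[access_key] += 1
            return False
        recent.append(now)
        self.accepted[access_key] += 1
        return True
```

A front end could call allow(key) before forwarding each request and export the accepted and rejected counters to whatever monitoring system is in use.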
Controlled Data Flow
[Diagram labels: Social Event Collector; Social Data; Batch & Write Processed Data; Batch & Write Raw Data; Calculate Score; Write; NSQ Multicast; NSQ; NSQ]
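The "Batch & Write" stages in this diagram suggest buffering events and writing them in groups. A minimal sketch, assuming a size-based flush; BatchWriter and write_batch are hypothetical names, and the real consumers may batch by time or by other criteria.

```python
from typing import Callable, List


class BatchWriter:
    """Buffers events and hands them to write_batch once max_size is reached."""

    def __init__(self, write_batch: Callable[[List[dict]], None], max_size: int):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._write_batch = write_batch
        self._max_size = max_size
        self._pending: List[dict] = []

    def add(self, event: dict) -> None:
        self._pending.append(event)
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Write whatever is buffered; call this on shutdown to avoid losing a partial batch."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._write_batch(batch)


# Usage: collect batches in a list instead of writing to a real data store.
batches = []
writer = BatchWriter(batches.append, max_size=3)
for i in range(7):
    writer.add({"event_id": i})
writer.flush()
print([len(batch) for batch in batches])
```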
NSQ by Bit.ly
• Distributed and decentralized topology
• At-least-once delivery guaranteed
• Multicast-style message routing
• Runtime discovery for consumers to find producers
• Allow for maintenance windows with no downtime
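To illustrate two of the properties listed above, the sketch below simulates multicast-style routing and at-least-once delivery in memory. It does not use an NSQ client library; Topic and IdempotentConsumer are hypothetical stand-ins, and the channel names are invented. In NSQ, each channel on a topic receives its own copy of every message, and at-least-once delivery means a consumer may see the same message more than once, so handlers are usually written to tolerate duplicates.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a topic: every channel gets its own copy of each message."""

    def __init__(self):
        self.channels = {}

    def channel(self, name):
        return self.channels.setdefault(name, deque())

    def publish(self, message_id, body):
        for queue in self.channels.values():
            queue.append((message_id, body))


class IdempotentConsumer:
    """Skips message ids it has already processed, so redelivery is harmless."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()

    def handle(self, message_id, body):
        if message_id in self._seen:
            return False
        self._handler(body)
        self._seen.add(message_id)
        return True


topic = Topic()
raw_queue = topic.channel("raw_writer")          # hypothetical channel name
score_queue = topic.channel("score_calculator")  # hypothetical channel name
topic.publish("m1", {"url": "http://example.com/a", "event": "pageview"})

# Simulate a redelivery of the same message on one channel.
raw_queue.append(raw_queue[0])

written = []
consumer = IdempotentConsumer(written.append)
results = [consumer.handle(message_id, body) for message_id, body in raw_queue]
print(results, len(written), len(score_queue))
```

The in-memory set of seen ids is the simplest possible deduplication; a long-running consumer would need a bounded or persistent equivalent.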
Path of a Packet
[Diagram labels: Internet; EC; Internal API; Solr; C*; Mongo; Redis; Vertica; API; Firehose; SC; Consumers; Queue]
Evolution Takes Work
• Know your access patterns
• Service Oriented Architecture (Internal API)
• Data accuracy checks: visual and programmatic
• Built framework for testing out engines (storage, queueing, etc.)
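A programmatic accuracy check could be as simple as comparing the same aggregate read from two storage engines. The sketch below assumes per-key counts have already been fetched into dictionaries; find_mismatches, the sample keys, and the tolerance parameter are hypothetical.

```python
def find_mismatches(primary, secondary, tolerance=0):
    """Return {key: (primary_value, secondary_value)} for keys whose counts disagree.

    A key missing from either side is always reported, with None for the
    missing value. Counts that differ by at most `tolerance` are accepted.
    """
    mismatches = {}
    for key in primary.keys() | secondary.keys():
        a = primary.get(key)
        b = secondary.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical page-view counts for the same URLs read from two engines.
store_a = {"url-1": 10, "url-2": 5}
store_b = {"url-1": 10, "url-2": 7, "url-3": 1}

print(find_mismatches(store_a, store_b))
print(find_mismatches(store_a, store_b, tolerance=2))
```

A tolerance is useful when the engines are written at slightly different times, so small transient differences do not raise alarms.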
Homogeneous Machines at Base
Base Image Layout: Application, Base AMI, Organizational Base
Producer: Event Collection, NSQ, Mongos, App Config, Users, Monitoring, Amazon Linux
Consumer: Consumer, NSQ, Mongos, App Config, Users, Monitoring, Amazon Linux
Application Group
DevOps Wizardry
• Extensive use of AWS
• Monitor: Nagios, StatsD, and Graphite
• Manage: Chef, OpsWorks, csshX, Vagrant
• Deployments
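For the monitoring side, StatsD accepts plain-text metrics over UDP in the form name:value|type, which it aggregates and forwards to a backend such as Graphite. A minimal sketch, assuming a StatsD daemon on its default port 8125; the metric names are invented for illustration.

```python
import socket

# Common StatsD metric types: counter, timer (milliseconds), gauge.
METRIC_TYPES = {"c", "ms", "g"}


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD line protocol."""
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; nothing is confirmed if no daemon is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names.
print(statsd_line("iapi.calls", 1, "c"))
print(statsd_line("iapi.latency", 12.5, "ms"))
```

Because the transport is UDP, instrumented code does not block or fail when the metrics daemon is down, which suits hot paths such as an event collector.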
Evolving Amazon Tools
• Full Featured API
• OpsWorks
• CloudFormation
• S3 / CloudFront
• Elastic Beanstalk
• Elastic MapReduce
Service
[Diagram labels: Internal API; Solr; Real-time; C*; C*; Vertica]
Service Architecture Machines
Base Image Layout: Application, Base AMI, Organizational Base
Proxy Machines: iAPI Front End, nginx, App Config, Users, Monitoring, Amazon Linux
Storage Machines: Data Store, App Config, Users, Monitoring, Amazon Linux
Application Group
Anatomy of an Endpoint
hourly content: Mongo, Mongo, Vertica, C*, C*
tenminute content: Mongo, Mongo, Vertica, C*, C*
Querying Machines: Helen, Helen, PyVertica, PyMongo, PyMongo, PyVertica
Endpoint Breakout
• Availability
• Consistent Access Patterns
• Minimal downtime changes
• Smaller code deploys
• Non-monolithic code base
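One way to picture the endpoint breakout is a small registry in which each endpoint is an independent handler that hides its storage engine from the caller. This is a sketch under assumptions: the decorator, the handler names, and the returned placeholder data are hypothetical and not the Internal API's actual code.

```python
from typing import Callable, Dict, Tuple

EndpointKey = Tuple[str, str]  # (resource, granularity)
Handler = Callable[[dict], dict]

_REGISTRY: Dict[EndpointKey, Handler] = {}


def endpoint(resource: str, granularity: str):
    """Decorator that registers a handler under (resource, granularity)."""
    def register(func: Handler) -> Handler:
        _REGISTRY[(resource, granularity)] = func
        return func
    return register


@endpoint("content", "hourly")
def hourly_content(params: dict) -> dict:
    # Placeholder: a real handler would query whichever store holds hourly data.
    return {"endpoint": "hourly_content", "params": params}


@endpoint("content", "tenminute")
def tenminute_content(params: dict) -> dict:
    # Placeholder: a different store could back this endpoint without clients noticing.
    return {"endpoint": "tenminute_content", "params": params}


def call(resource: str, granularity: str, params: dict) -> dict:
    """Single entry point for clients; they never name a storage engine."""
    try:
        handler = _REGISTRY[(resource, granularity)]
    except KeyError:
        raise LookupError(f"no endpoint for {(resource, granularity)!r}") from None
    return handler(params)


print(call("content", "hourly", {"url": "http://example.com/a"}))
```

Because each handler is registered on its own, one endpoint can be changed or redeployed without touching the others, which is the point of the smaller deploys and non-monolithic code base listed above.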
Architecture Distribution
US-EAST-1a: MONGO-SHARD-0001-B, MONGO-SHARD-0000-A, CASSANDRA-0001, CASSANDRA-0010, REDIS-0001A, VERTICA-0001, iAPI-0001
US-EAST-1b: MONGO-SHARD-0002-B, MONGO-SHARD-0001-A, CASSANDRA-0002, CASSANDRA-0011, REDIS-0001B, iAPI-0002
US-EAST-1e: MONGO-SHARD-0002-A, MONGO-SHARD-0000-B, CASSANDRA-0003, CASSANDRA-0012, VERTICA-0003, iAPI-0003
VERTICA-0002
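The host names above suggest that the two members of each Mongo shard sit in different availability zones. The sketch below checks that property, assuming the -A and -B suffixes denote members of the same shard; the function names are hypothetical, and only the Mongo hosts from the slide are included.

```python
from collections import defaultdict

# Mongo hosts per availability zone, as listed on the slide.
MONGO_HOSTS = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B"],
}


def replica_zones(hosts_by_zone):
    """Map each shard name to the list of zones that hold one of its members."""
    placement = defaultdict(list)
    for zone, hosts in hosts_by_zone.items():
        for host in hosts:
            # Assumption: the trailing "-A"/"-B" identifies a member of the shard.
            shard, _member = host.rsplit("-", 1)
            placement[shard].append(zone)
    return dict(placement)


def colocated_shards(placement):
    """Shards with two or more members in the same zone (a single point of failure)."""
    return sorted(
        shard for shard, zones in placement.items() if len(set(zones)) < len(zones)
    )


placement = replica_zones(MONGO_HOSTS)
for shard in sorted(placement):
    print(shard, sorted(placement[shard]))
print("colocated:", colocated_shards(placement))
```

With the layout from the slide, no shard has both members in one zone, so losing any single availability zone leaves one member of every shard reachable.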
Problems?
The Schrute of the Problem
New Service Questions
• Can its host be completely homogeneous?
• Can it accept downtime (and what should downtime look like)?
• Does it fit into an existing service?
• Does it require datacenter distribution?
Summary
• Solutions Require Evolution
• Build, Use, and Integrate Tools
• Abstraction
• Homogeneous Distribution
• Monitoring & Automation
We’re
(Ask about Food Coma Fridays)
Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
elubow@simplereach.com
Thank you.

C* Summit 2013: Big Architectures for Big Data by Eric Lubow
