Cassandra in Online Advertising:
Real Time Bidding

the prospect engine for brands.
Who are we?
Costa Sevdinoglou & Edward Capriolo
Impressions look like…
A High Level look at RTB




1. Browsers visit Publishers and create impressions.
2. Publishers sell impressions via Exchanges.
3. Exchanges serve as auction houses for the impressions.
4. On behalf of the marketer, m6d bids on the impressions via the
   auction house. If m6d wins, we display our ad to the
   browser.
Performance and Data
• Billions and billions of bid requests a day
  • A single request can result in multiple Cassandra operations!
  • One cluster is just under 10TB and growing
• Low latency requirement: typically below 120 ms
• Limited data available to m6d via the exchange
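
To get a feel for what "billions of bid requests a day" means per second, here is a rough back-of-the-envelope sketch. The 2 billion requests per day and 3 operations per request are assumptions chosen for illustration, not m6d figures, and the result is a daily average that ignores traffic peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_ops_per_second(requests_per_day: int, ops_per_request: int) -> float:
    """Average Cassandra operations per second implied by a daily request count."""
    return requests_per_day * ops_per_request / SECONDS_PER_DAY


# Hypothetical inputs, not taken from the slides.
rate = average_ops_per_second(requests_per_day=2_000_000_000, ops_per_request=3)
print(f"{rate:,.0f} Cassandra operations per second on average")
```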
Segment Data

Segments are how we assign product or service
affinity to a group of users. Users we consider to be
like-minded with respect to a given brand will be
placed in the same segment.

Segment Data is just one component of our
overarching data model.

Segments help to reduce the number of calculations
we do in real time.
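
One way to picture how segments cut down real-time work: membership is computed ahead of time, so the bid path only needs a lookup. The sketch below is a minimal illustration; the in-memory dictionary, the user ids, the segment names, and `should_bid` are all hypothetical stand-ins, not m6d's data model.

```python
from typing import Dict, Set

# Hypothetical stand-in for a segment store keyed by user id.
# In the deck this data lives in Cassandra; a dict keeps the sketch runnable.
segments_by_user: Dict[str, Set[str]] = {
    "user-1": {"brand-a-prospects", "brand-b-prospects"},
    "user-2": {"brand-b-prospects"},
}


def should_bid(user_id: str, campaign_segment: str) -> bool:
    """Return True if the user was already placed in the campaign's segment."""
    return campaign_segment in segments_by_user.get(user_id, set())


print(should_bid("user-1", "brand-a-prospects"))
print(should_bid("user-2", "brand-a-prospects"))
```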
Old Approach for Segment Data

[Diagram: Application Nodes (Tomcat + MySQL), Event Logs, Hadoop, Aggregation, MySQL Data Push]

Limitations
• Periodically updated.
• Only a subsection of the data.
• Cluster performance is affected during a data push.
Cassandra Approach for Segment Data

[Diagram: Application Nodes (Tomcat + Less MySQL Usage), Cassandra]

Better!
• Updating in real time now possible
• Distributed, not duplicated
• Less complexity to manage
• Storing more information
• We can now bid on users sooner!
One Ring to rule them all




http://askyyy.blog.163.com/blog/static/1234575992010428819399/
Peer to Peer
            per operation replication
   Fail fast, self-healing
   Each write goes to all natural endpoints
   Hinted handoff if destination is down
   Repair on Read
   No more:
            STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;
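
The write path and hinted handoff described above can be sketched as a toy simulation. This is not Cassandra's implementation: `Node`, `write`, and `replay_hints` are hypothetical names, and real hints also expire and are throttled.

```python
from collections import defaultdict


class Node:
    """Toy replica: holds data, plus hints it keeps for peers that were down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = defaultdict(list)  # target node name -> [(key, value), ...]


def write(coordinator, replicas, key, value):
    """Send the write to every replica; keep a hint for each replica that is down."""
    acked = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acked += 1
        else:
            coordinator.hints[replica.name].append((key, value))
    return acked


def replay_hints(coordinator, recovered):
    """Deliver the stored hints once the target node is reachable again."""
    for key, value in coordinator.hints.pop(recovered.name, []):
        recovered.data[key] = value


a, b, c = Node("a"), Node("b"), Node("c")
c.up = False
acked = write(a, [a, b, c], "user-1", "segment-x")  # c misses the write
c.up = True
replay_hints(a, c)  # c catches up without manual intervention
print(acked, c.data)
```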
Multi Data Center
 No designing and managing complex replication topologies
 create keyspace world
   with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
   and strategy_options={1:3, 2:3, 3:3};
 The same process as single data center
 No log shipping, or separate processes to run
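
To make the `strategy_options` above concrete, this small sketch counts the replicas that definition implies and the majority sizes for quorum-style consistency levels. The data center names and replica counts come from the slide; the `quorum` helper is just the usual majority formula, written here for illustration.

```python
# Replicas per data center, copied from strategy_options on the slide.
strategy_options = {"1": 3, "2": 3, "3": 3}


def quorum(replica_count: int) -> int:
    """Smallest majority of replica_count replicas."""
    return replica_count // 2 + 1


total_replicas = sum(strategy_options.values())
print("copies of each row across all data centers:", total_replicas)
print("replicas needed for QUORUM:", quorum(total_replicas))
print("replicas needed for LOCAL_QUORUM in data center 1:", quorum(strategy_options["1"]))
```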
Monitoring & Management
   Many, many things to monitor with JMX
   Nice command line tools
   Most values can be tweaked at run time
Capacity Planning

   How many
          Rows
          Columns
          Size of Average Column
   Latency requirements
   Throughput: reads and writes per second
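
The sizing questions above combine into a first-pass estimate like the sketch below. All input numbers are hypothetical, and the formula deliberately ignores per-column storage overhead, indexes, and the free disk space compaction needs, so treat the result as a lower bound.

```python
def raw_data_bytes(rows: int, columns_per_row: int,
                   avg_column_bytes: int, replication_factor: int) -> int:
    """Raw cluster-wide data size, before storage overhead and compaction headroom."""
    return rows * columns_per_row * avg_column_bytes * replication_factor


# Hypothetical workload, not m6d numbers.
size = raw_data_bytes(rows=100_000_000, columns_per_row=20,
                      avg_column_bytes=50, replication_factor=3)
print(f"{size / 1e9:.0f} GB of raw data across the cluster")
```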
Unit Tests FTW!
Max 2 billion columns per row

   Awesome
          Unless you accidentally write 2 billion
           columns to a row key named “null”
   Check maxRowSize via JMX
   Watch logs for messages about compacting
    large rows
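
A unit test is one cheap guard against the "null" row key accident. The sketch below assumes a hypothetical `row_key_for` helper that builds row keys from user ids; the helper and the rejected values are illustrative, not code from m6d.

```python
import unittest


def row_key_for(user_id):
    """Build a row key, refusing missing ids instead of stringifying them."""
    if user_id is None or str(user_id).strip() in ("", "null", "None"):
        raise ValueError("refusing to build a row key from a missing user id")
    return str(user_id)


class RowKeyTest(unittest.TestCase):
    def test_valid_id_becomes_key(self):
        self.assertEqual(row_key_for(42), "42")

    def test_missing_id_is_rejected(self):
        for bad in (None, "", "null", "None"):
            with self.assertRaises(ValueError):
                row_key_for(bad)
```

Run it with `python -m unittest` against the module that contains it.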
Local (NYC) Meetups

   www.meetup.com/NYC-Cassandra-User-Group/

Real World Cassandra
