Cassandra at eBay - Cassandra Summit 2012

82,715 views

"Buy It Now! Cassandra at eBay" talk at Cassandra Summit 2012

http://www.datastax.com/events/cassandrasummit2012


Cassandra at eBay - Cassandra Summit 2012

  1. August 8, 2012. Cassandra at eBay. Time left: 29m 59s. Jay Patel, Architect, Platform Systems (@pateljay3001)
  2. eBay Marketplaces: 97 million active buyers and sellers; 200+ million items; 2 billion page views each day; 80 billion database calls each day; 5+ petabytes of site storage capacity; 80+ petabytes of analytics storage capacity.
  3. How do we scale databases? Shard (patterns: modulus, lookup-based, range, etc.; the application sees only a logical shard/database). Replicate (disaster recovery, read availability/scalability). Big NOs: no transactions, no joins, no referential integrity constraints.
  4. We like Cassandra: multi-datacenter (active-active); write performance; availability (no SPOF); distributed counters; scalability; Hadoop support. We also utilize MongoDB & HBase.
  5. Are we replacing RDBMS with NoSQL? Not at all! We are complementing it. Some use cases don’t fit well: sparse data, big data, schema optional, real-time analytics, … Many use cases don’t need top-tier set-ups: logging, tracking, …
  6. A glimpse of our Cassandra deployment: dozens of nodes across multiple clusters; 200 TB+ storage provisioned; 400M+ writes & 100M+ reads per day, and growing; QA, LnP, and multiple Production clusters.
  7. Use cases on Cassandra: Social Signals on eBay product & item pages; Hunch taste graph for eBay users & items; time series use cases (many): mobile notification logging and tracking, tracking for fraud detection, SOA request/response payload logging, RedLaser server logs and analytics.
  8. Served by Cassandra
  9. Manage signals via “Your Favorites”. Whole page is served by Cassandra.
  10. Why Cassandra for Social Signals? Need scalable counters; need real (or near) time analytics on collected social data; need good write performance; reads are not latency sensitive.
  11. Deployment. Topology: NTS; RF: 2:2; Read CL: ONE; Write CL: ONE. User request has no datacenter affinity; non-sticky load balancing. Data is backed up periodically to protect against human or software error.
  12. Data Model depends on query patterns
  13. Data Model (simplified)
  14. Wait… Duplicates! Oh, toggle button! Signal --> De-signal --> Signal…
  15. Yes, eventual consistency! One scenario that produces duplicate signals in the UserLike CF: (1) Signal. (2) De-signal (the 1st operation is not propagated to all replicas). (3) Signal, again (the 1st operation is not propagated yet!). So, what’s the solution? Later…
  16. Social Signals, next phase: real-time analytics. Most signaled or popular items per affinity group (category, etc.). Aggregated item count per affinity group. (Figure: example affinity group.)
  17. Initial data model for real-time analytics: items in an affinity group are physically stored sorted by their signal count. Update counters for both the individual item and all the affinity groups that the item belongs to.
  18. Deployment, next phase. Topology: NTS; RF: 2:2:2.
  19. [Diagram: graph of users (user1, user2) and items (item1, item2) connected by bid, buy, watch, and sell edges]
  20. Graph in Cassandra: event consumers listen for site events (sell/bid/buy/watch) & populate the graph in Cassandra. 30 million+ writes daily; 14 billion+ edges already; batch-oriented reads (for taste vector updates).
  21. Mobile notification logging and tracking; tracking for fraud detection; SOA request/response payload logging; RedLaser server logs and analytics.
  22. A glimpse of the data model
  23. RedLaser tracking & monitoring console
  24. That’s all about the use cases. Remember the duplicate problem in Use Case #1? Let’s see some options we considered to solve this…
  25. Option 1: make ‘Like’ idempotent for UserLike. Remove time (timeuuid) from the composite column name: multiple signal operations are now idempotent; no need to read before de-signaling (deleting). ✗ Need timeuuid for ordering! Already have a user with more than 1300 signals.
  26. Option 2: use strong consistency. Local Quorum: won’t help us. User requests are not geo-load balanced (no DC affinity). Quorum: won’t survive a partition between DCs (or one of the DCs being down). Also adds latency. ✗ Need to survive!
  27. Option 3: adapt to eventual consistency, if you desire survival! http://www.strangecosmos.com/content/item/101254.html
  28. Adjustments to eventual consistency. De-signal steps: don’t check whether the item is already signaled by the user; read all (duplicate) signals from UserLike_unordered (new CF to avoid reading the whole row from UserLike); delete those signals from UserLike_unordered and UserLike. Still, can get duplicate signals or false positives, as there is a ‘read before delete’. To shield further, do ‘repair on read’. Not a full story!
  29. Lessons & Best Practices. Choose a proper Replication Factor and Consistency Level: they alter latency, availability, durability, consistency and cost; Cassandra supports tunable consistency, but remember strong consistency is not free. Consider all overheads in capacity planning: replicas, compaction, secondary indexes, etc. De-normalize and duplicate for read performance, but don’t de-normalize if you don’t need to. Many ways to model data in Cassandra: the best way depends on your use case and query patterns. More on http://ebaytechblog.com?p=1308
  30. Thank You @pateljay3001 #cassandra12
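
Slide 3 names three sharding patterns (modulus, lookup-based, range) without showing them. The following is a minimal Python sketch of how each pattern might map a key to a logical shard. The shard count, range boundaries, and lookup entries are hypothetical values chosen for illustration; they are not taken from the talk.

```python
import bisect

NUM_SHARDS = 4  # hypothetical shard count

def modulus_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Modulus pattern: the shard index is the key modulo the shard count."""
    return key % num_shards

# Range pattern: shard i owns keys below RANGE_UPPER_BOUNDS[i];
# the last shard owns everything at or above the final bound (hypothetical bounds).
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def range_shard(key: int) -> int:
    """Range pattern: pick the first range whose upper bound exceeds the key."""
    return bisect.bisect_right(RANGE_UPPER_BOUNDS, key)

# Lookup-based pattern: an explicit directory maps keys to shards (hypothetical entries).
LOOKUP_TABLE = {"seller:alice": 2, "seller:bob": 0}

def lookup_shard(key: str, default: int = 0) -> int:
    """Lookup pattern: consult the directory, falling back to a default shard."""
    return LOOKUP_TABLE.get(key, default)
```

In all three cases the caller only sees a shard index, which matches the slide's point that the application deals with a logical shard rather than a physical database.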
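
Slides 11 and 26 can be checked with a little arithmetic. Cassandra computes a quorum as floor(RF / 2) + 1, where RF is the sum of replication factors for QUORUM and the local datacenter's replication factor for LOCAL_QUORUM. The sketch below applies that formula to the RF 2:2 deployment from slide 11; the datacenter names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def can_satisfy(required: int, live_replicas: int) -> bool:
    """A consistency level is achievable if enough replicas are reachable."""
    return live_replicas >= required

# Slide 11: RF 2:2 across two datacenters (names are hypothetical).
RF_PER_DC = {"dc1": 2, "dc2": 2}

global_quorum = quorum(sum(RF_PER_DC.values()))  # QUORUM over all 4 replicas
local_quorum = quorum(RF_PER_DC["dc1"])          # LOCAL_QUORUM within one DC

# If dc2 is down or partitioned away, only dc1's replicas are reachable.
live_after_dc_loss = RF_PER_DC["dc1"]

quorum_survives = can_satisfy(global_quorum, live_after_dc_loss)
one_survives = can_satisfy(1, live_after_dc_loss)  # CL ONE needs a single replica
```

With these numbers QUORUM needs 3 of 4 replicas, so it cannot be met from a single datacenter holding 2, while CL ONE still can. LOCAL_QUORUM needs both local replicas, and as slide 26 notes it would not help anyway, because consecutive requests from one user may land in different datacenters.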
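
The duplicate-signal scenario on slide 15 and the idempotent alternative on slide 25 can be reproduced with a toy model. This is an illustrative simulation only, not eBay's implementation or Cassandra's code: two replicas are plain dictionaries, conflicts are resolved per column by last-write-wins, and `sync` stands in for whatever eventually propagates writes. Integers stand in for timeuuids, and all function names are hypothetical.

```python
TOMBSTONE = object()  # marker for a deleted column

def write(replica, column, timestamp, value=True):
    """Last-write-wins: keep the cell with the highest timestamp."""
    current = replica.get(column)
    if current is None or timestamp > current[0]:
        replica[column] = (timestamp, value)

def sync(a, b):
    """Stand-in for propagation: exchange cells so both replicas converge."""
    for column, (ts, value) in list(a.items()):
        write(b, column, ts, value)
    for column, (ts, value) in list(b.items()):
        write(a, column, ts, value)

def live_columns(replica):
    """Columns that are present and not deleted."""
    return sorted(c for c, (_, v) in replica.items() if v is not TOMBSTONE)

def toggle_with_time_in_name():
    """Column name includes a time component, as in the original UserLike model."""
    a, b = {}, {}
    write(a, ("item1", 1), timestamp=1)   # 1. signal reaches replica a only
    seen = [c for c in live_columns(b) if c[0] == "item1"]  # 2. de-signal reads b
    for column in seen:                   # b has nothing yet, so nothing is deleted
        write(b, column, timestamp=2, value=TOMBSTONE)
    write(b, ("item1", 3), timestamp=3)   # 3. signal again, new time component
    sync(a, b)
    return live_columns(a)

def toggle_with_item_only_name():
    """Option 1: column name is the item alone, so repeated signals collapse."""
    a, b = {}, {}
    write(a, "item1", timestamp=1)                    # 1. signal
    write(b, "item1", timestamp=2, value=TOMBSTONE)   # 2. blind delete, no read
    write(b, "item1", timestamp=3)                    # 3. signal again
    sync(a, b)
    return live_columns(a)
```

In this model the first function ends with two live columns for the same item, which is the duplicate from slide 15, while the second ends with one. The cost, as slide 25 points out, is that dropping the time component loses the ordering of a user's signals.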
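
Slide 17 describes keeping items within an affinity group sorted by signal count while updating counters for the item and for every group it belongs to. The talk does not show the column family layout, so the following is only an in-memory Python sketch of that bookkeeping; the structure names and the choice to count signals per group are assumptions.

```python
import bisect
from collections import defaultdict

item_counts = defaultdict(int)          # item_id -> signal count
group_signal_totals = defaultdict(int)  # group -> signals across its items (assumed meaning)
group_ranking = defaultdict(list)       # group -> sorted list of (count, item_id)

def record_signal(item_id, groups):
    """Bump the item's counter and re-sort it within each affinity group."""
    old = item_counts[item_id]
    new = old + 1
    item_counts[item_id] = new
    for group in groups:
        group_signal_totals[group] += 1
        ranking = group_ranking[group]
        if (old, item_id) in ranking:   # drop the entry stored under the old count
            ranking.remove((old, item_id))
        bisect.insort(ranking, (new, item_id))

def most_signaled(group, n):
    """Top-n items in a group, highest signal count first."""
    return [item for _, item in reversed(group_ranking[group][-n:])]

# Usage with hypothetical items and a hypothetical "fashion" group.
record_signal("shoe", ["fashion"])
record_signal("shoe", ["fashion"])
record_signal("hat", ["fashion"])
```

Because the count is part of the sort key, every signal moves the item to a new position: the old entry is removed and a new one inserted. A store that sorts columns by name would need the same delete-then-insert step for each group the item belongs to.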