Making Hadoop & Cassandra
       work together

          © Altoros Systems, Inc.
About Altoros
  • Software delivery acceleration specialist for big data application implementation services
  • 200+ employees globally (US, Eastern Europe, UK, Denmark, Norway)
  • Big data practice areas
         – Automated device analytics
         – Advertising analytics
         – Big data warehouse


Customers




Partners

                                                         Implementation Partner


The Product




The Problem: Data is Big


 • 10-20 sensors per house
 • Ability to support tens of thousands of households

 • 1 sensor: ~1.1 MB/day
 • 1,000 households: 11 GB/day
 • 500,000 households: 5 TB/day
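
The volumes above follow from simple multiplication. A minimal sketch, assuming the lower bound of 10 sensors per household and decimal units (1 GB = 1,000 MB); the function name is illustrative and not from the talk:

```python
MB_PER_SENSOR_PER_DAY = 1.1  # from the slide: 1 sensor ~1.1 MB/day


def daily_volume_mb(households, sensors_per_household=10):
    """Estimated raw data volume per day, in MB (assumes 10 sensors by default)."""
    return households * sensors_per_household * MB_PER_SENSOR_PER_DAY


gb_1k = daily_volume_mb(1_000) / 1_000          # MB -> GB
tb_500k = daily_volume_mb(500_000) / 1_000_000  # MB -> TB
print(f"1,000 households: {gb_1k:.1f} GB/day")
print(f"500,000 households: {tb_500k:.1f} TB/day")
```

With 10 sensors per household the second estimate comes out at about 5.5 TB/day, close to the 5 TB/day quoted on the slide. With 20 sensors per household both estimates double.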




The Dashboard




Full Visibility




The Problem: Performance


 • MySQL showed slow performance under intensive writes
     – Did not scale to the target throughput

 • Disk performance is a bottleneck
     – Monitoring with iostat -dmx

 • Old-fashioned single-threaded batch processing is slow
     – Make it parallel!
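
As a rough illustration of the "make it parallel" point, the sketch below splits a batch into chunks and processes them in a worker pool. It is not the presenters' pipeline: `process_chunk`, the chunk size, the worker count, and the sample readings are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-chunk work: here, just sum the readings."""
    return sum(chunk)


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(readings, chunk_size=1000, workers=4):
    """Process chunks concurrently and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunked(readings, chunk_size)))
    return sum(partials)


readings = list(range(10_000))  # made-up sample data
total = process_batch(readings)
```

Threads suit I/O-bound steps such as database writes. For CPU-bound work in Python, `ProcessPoolExecutor` from the same module is usually the better fit.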




Requirements


 • Highly responsive system with parallel processing

 • Reliable
   – Partial failure is acceptable
   – Node and data recoverability

 • Scalable
   – Load capacity
   – Max throughput

 • Total cost of ownership
   – Data compression
NoSQL Database Requirements


   –   Fast writes are critical
   –   Querying by column and range of keys
   –   Secondary indices
   –   Good map/reduce compatibility using Apache Hadoop




Why Cassandra


  – Good overall balance of features, scalability, reliability
  – We wanted BigTable-like features: columns, column families
  – Well suited for large streams of non-transactional data
  – Provides good, consistent write throughput
  – Tunable trade-offs for distribution and replication (N, R, W)
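
The N, R, W trade-off can be sketched as an overlap check. In Dynamo-style replication, which Cassandra follows, a read is guaranteed to reach at least one replica holding the latest acknowledged write when R + W > N (N replicas, R replicas consulted per read, W replicas acknowledging each write). The helper name below is illustrative.

```python
def quorums_overlap(n, r, w):
    """True if every read set of size r intersects every write set of size w."""
    return r + w > n


# N = 3 replicas
print(quorums_overlap(3, 2, 2))  # True: quorum reads and quorum writes
print(quorums_overlap(3, 1, 1))  # False: fastest, but reads may be stale
print(quorums_overlap(3, 1, 3))  # True: write to all, read from one
```

Lower R and W values favor latency and availability, while higher values favor consistency.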




File system


 • HDFS
   – Is a file system behind our Cassandra implementation
   – Data coherency: write-once-read-many access




Cassandra: Best Used When…



 • You write more than you read (logging)
 • Every component of the system must be in Java
 • You have, or may have in the future, complex configuration requirements




Cassandra Challenges


 • High, unpredictable write volume
 • Varying schema, variable message size
 • 2 types of series: data and lookups
 • All time-series, even metadata; no supplemental DB




No Cassandra Compression?


 • Built-in Cassandra compression claims to compress across columns with identical names.
 • All our data columns are timestamped, so no two will ever have identical names.
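
To see why timestamped columns never share a name, consider a toy in-memory model of one wide row in which each column name is the reading's timestamp. This is only an illustration with made-up values. It does not use Cassandra and does not reflect the actual schema from the talk.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy wide row: columns kept sorted by name (here, a Unix timestamp)."""

    def __init__(self):
        self.names = []   # sorted column names
        self.values = {}  # column name -> value

    def insert(self, ts, value):
        """Add or overwrite the column named by timestamp `ts`."""
        if ts not in self.values:
            self.names.insert(bisect_left(self.names, ts), ts)
        self.values[ts] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end."""
        lo = bisect_left(self.names, start)
        hi = bisect_right(self.names, end)
        return [(n, self.values[n]) for n in self.names[lo:hi]]


row = WideRow()
for ts, watts in [(1000, 40), (1060, 42), (1120, 41)]:  # hypothetical readings
    row.insert(ts, watts)
```

Every reading lands in its own uniquely named column, so a scheme that relies on repeated column names finds nothing to share. Sorted names still make a range query such as `row.slice(1000, 1060)` cheap.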




Numbers


          “Benchmark” Cassandra node
               LZO Compression




Lessons Learned


 • Consider hybrid
     – RDBMS + NoSQL + Hadoop

 • Hadoop
     – Is for offline processing and analysis
     – Is NOT for random reads and writes of individual records

 • Cassandra complements Hadoop with querying capabilities




Thank you!

 @renatkhasanshyn
      @altoros
renat.k@altoros.com




