Simplifying Real-Time Architectures for IoT with Apache Kudu


3 Things to Learn About:

* Building scalable real-time architectures for managing IoT data
* Processing data in real time with components such as Kudu & Spark
* Customer case studies highlighting real-time IoT use cases

Published in: Software

  1. Simplifying Real-Time Architectures for IoT using Apache Kudu
     Vijay Raja | Solutions Marketing Lead, IoT
     Ryan Lippert | Product Marketing, Operational DB
     © Cloudera, Inc. All rights reserved.
  2. IoT – Key Drivers & Objectives
     • Drive internal efficiencies: predictive maintenance, real-time monitoring, operations optimization, reduced equipment downtime
     • Improve product & customer experience: product usage analytics, personalized products & offerings, improved product development
     • New services & business models: new usage-based business models, new service offerings (e.g. On Command Connect), remote monitoring
     Key questions: Who are my customers? How are they using my products? How can I lower downtime? How can I drive efficiencies? How do we implement a usage-based model? How can I launch new revenue streams?
  3. IoT data volumes (slide images not preserved): 2 PB of data per car per year; 1–2 TB of data per day; 1–5 TB of data per day
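The first figure is annual and per car, while the other two are daily. A small conversion puts it on the same footing, assuming decimal units (1 PB = 1,000 TB) and a 365-day year:

```python
# Put the per-car annual volume on the same per-day footing as the
# other two figures on the slide.
# Assumptions: decimal units (1 PB = 1,000 TB) and a 365-day year.

TB_PER_PB = 1_000
DAYS_PER_YEAR = 365


def pb_per_year_to_tb_per_day(pb_per_year: float) -> float:
    """Average daily volume in TB for a given annual volume in PB."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


per_car_per_day = pb_per_year_to_tb_per_day(2)
print(f"2 PB/car/year is roughly {per_car_per_day:.1f} TB/car/day")
```

At about 5.5 TB per day, a single car at the quoted rate would already exceed the per-day ranges listed beside it. This suggests the three figures describe different kinds of sources, although the slide does not say which.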
  4. IoT Data Characteristics – The Foundation of Hadoop's Potential
     IoT data comes from a variety of different sources:
     • Massive volumes of intermittent data streams
     • Generated from a variety of data sources
     • Predominantly time series
     • Can arrive as streams (real time) or in batches
     • Diverse data structures and schemas
     • Some of it may be perishable
     Combining sensor data with contextual data is the key to creating value from IoT.
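The slide's closing point is that sensor data gains value once it is combined with context. A small left join illustrates this. The field names, the per-model temperature limit, and the device IDs below are hypothetical. In the architecture discussed here, the join would run in an engine such as Spark or Impala rather than in plain Python.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SensorReading:
    device_id: str
    ts: int                   # epoch seconds
    temperature_c: float


@dataclass
class DeviceContext:
    device_id: str
    model: str
    site: str
    max_temperature_c: float  # hypothetical per-model operating limit


@dataclass
class EnrichedReading:
    reading: SensorReading
    context: Optional[DeviceContext]

    @property
    def over_limit(self) -> Optional[bool]:
        # A raw temperature means little on its own; without the device's
        # context we cannot say whether it is abnormal, so return None.
        if self.context is None:
            return None
        return self.reading.temperature_c > self.context.max_temperature_c


def enrich(readings: List[SensorReading],
           contexts: Dict[str, DeviceContext]) -> List[EnrichedReading]:
    """Left join: every reading is kept, with its context if one exists."""
    return [EnrichedReading(r, contexts.get(r.device_id)) for r in readings]


contexts = {"pump-1": DeviceContext("pump-1", "P100", "plant-a", 80.0)}
readings = [SensorReading("pump-1", 1_000, 85.5),
            SensorReading("pump-2", 1_000, 40.0)]
for e in enrich(readings, contexts):
    print(e.reading.device_id, e.over_limit)
```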
  5. Polling Question 1 (single choice): Where is your organization in its IoT journey?
     A. Not sure where to start
     B. Currently exploring use cases
     C. Implementing our first IoT use case
     D. Already deployed our first IoT use case
     E. Multiple IoT use cases in production
  6. The IoT Ecosystem & Architecture
     • Sensors/things: analytics at the edge, for immediate response
     • Gateways (IoT gateway, data center gateway): data routing, edge processing, edge storage
     • IoT data storage, processing & analytics (data center or cloud):
       - Centralized IoT data analytics: time series data and trends, machine learning, context enrichment, deeper business insights
       - Distributed data processing & analytics, in the cloud or on premises
     • Also shown: enterprise data sources and IoT analytics
  7. What Happens at the Edge & What Happens in the Cloud?
     • Edge analytics: analytics that must be acted upon immediately; low-latency requirements such as hazard detection and collision avoidance
     • Cloud analytics: human response times; context enrichment, time series analysis, comparative/trend analysis, machine learning
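The edge/cloud split can be sketched as a gateway-side handler that makes the time-critical decision locally and defers everything else. The time-to-collision rule, the 2-second threshold, and the batch size are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    ts: float
    distance_to_obstacle_m: float
    speed_mps: float


@dataclass
class EdgeNode:
    min_time_to_collision_s: float = 2.0   # hypothetical safety threshold
    upload_batch_size: int = 3             # hypothetical batch size
    buffer: List[Reading] = field(default_factory=list)
    uploaded_batches: List[List[Reading]] = field(default_factory=list)

    def handle(self, r: Reading) -> str:
        action = "none"
        # Edge path: decide immediately, with no network round trip.
        if r.speed_mps > 0 and r.distance_to_obstacle_m / r.speed_mps < self.min_time_to_collision_s:
            action = "brake"
        # Cloud path: keep every reading and ship it in batches, since trend
        # analysis and model training can tolerate human-scale delays.
        self.buffer.append(r)
        if len(self.buffer) >= self.upload_batch_size:
            self.uploaded_batches.append(self.buffer)  # stands in for an upload
            self.buffer = []
        return action


node = EdgeNode()
actions = [node.handle(Reading(t, d, 20.0))
           for t, d in enumerate([100.0, 60.0, 30.0, 80.0])]
print(actions)
```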
  8. Cloudera Enterprise – Hadoop as a Data Platform for IoT
     • Sources: sensors/IoT data (via an IoT gateway), internal systems, external sources
     • Platform (data center or cloud): data storage, data processing, machine learning, real-time analytics
     • Consumers: BI solutions, real-time apps, search, Data Science Workbench, SQL, machine learning
     Platform stack:
     • Operations: Cloudera Manager, Cloudera Director
     • Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
     • Integrate: batch (Sqoop); real-time (Kafka, Flume)
     • Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce); stream (Spark); SQL (Impala); search (Solr); SDK; partners
     • Unified services: resource management (YARN); security (Sentry, RecordService)
     • Store: filesystem (HDFS); relational (Kudu); NoSQL (HBase)
  9. IoT: Lots of Buzz, but What Is the Core Concept? And, critically, what do we need from our infrastructure?
     IoT promises prediction and optimization but often delivers only monitoring. The right solution lets you analyze data and serve information in time to change business outcomes, which means it must be built on real-time analytics.
  10. IoT: Driven by Data
  11. Polling Question 2 (single choice): What area of the real-time data chain does your organization need the most help with?
     A. Data ingest
     B. Data processing
     C. Data serving
     D. All of the above
  12. Traditional Hadoop Databases Leave a Gap
     [Figure: pace of data (from unchanging/append-only to fast-changing/frequent updates) plotted against pace of analysis (up to real time)]
     • HDFS: fast scans, analytics and processing of stored data
     • HBase: fast online updates & data serving
     • Arbitrary storage (active archive)
     • Analytic gap: fast analytics on fast-changing or frequently updated data, which required complex hybrid architectures
     Use cases that fell between HDFS and HBase were difficult to manage.
  13. The Trouble with Lambda
     [Figure: Lambda architecture. New data lands in a data lake (HDFS). The batch layer (Hadoop) periodically recomputes precomputed views; the speed layer (Storm/Spark) processes streams or micro-batches into incremental "real-time" views (HBase); the serving layer (Impala) merges both for the data application.]
     • Code must be kept in sync
     • Restatement is difficult
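A toy version of the Lambda pattern shows both problems named on the slide. The same aggregation (total miles per vehicle) has to exist twice: once as a batch recompute and once as an incremental update. Correcting a historical record also means rerunning the batch job. The event shape and function names are hypothetical. In the slide's architecture, Hadoop, Storm/Spark, HBase, and Impala play these roles.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (vin, miles driven in this event)


# Batch layer: recompute the whole view from the master dataset.
def batch_view(master: Iterable[Event]) -> Dict[str, int]:
    totals: Counter = Counter()
    for vin, miles in master:
        totals[vin] += miles
    return dict(totals)


# Speed layer: the same aggregation re-implemented incrementally.
# Any change to the batch logic must be mirrored here by hand.
def update_speed_view(view: Dict[str, int], event: Event) -> None:
    vin, miles = event
    view[vin] = view.get(vin, 0) + miles


# Serving layer: every query has to merge the two views.
def total_miles(vin: str, batch: Dict[str, int], speed: Dict[str, int]) -> int:
    return batch.get(vin, 0) + speed.get(vin, 0)


master: List[Event] = [("car-1", 10), ("car-2", 5), ("car-1", 7)]
batch = batch_view(master)
speed: Dict[str, int] = {}
for event in [("car-1", 3), ("car-2", 1)]:   # events newer than the last batch run
    update_speed_view(speed, event)
before = total_miles("car-1", batch, speed)

# Restatement: the first car-1 record should have been 12 miles, not 10.
# The fix only becomes visible after the batch view is recomputed.
master[0] = ("car-1", 12)
batch = batch_view(master)
after = total_miles("car-1", batch, speed)
print(before, after)
```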
  14. Updateable Analytic Storage: Simple Real-Time Analytics and Updates with Apache Kudu
     Kudu: storage for fast analytics on fast data
     • Simplified architecture for building real-time analytic applications
     • Designed for next-generation hardware for faster analytic performance across frameworks
     • Native Hadoop storage engine
     Flexibility to use the right tool for each use case in one platform
     • Only analytic database for Hadoop with Kudu + Impala
     • Simple real-time applications with Kudu + Spark
     Use cases: time series data, machine data analytics, online reporting
     Platform stack:
     • Integrate: structured (Sqoop); unstructured (Kafka, Flume)
     • Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce); stream (Spark); SQL (Impala); search (Solr); other (Kite)
     • Unified services: resource management (YARN); security (Sentry, RecordService)
     • Store: filesystem (HDFS); relational (Kudu); NoSQL (HBase); object (cloud)
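The central idea of updateable analytic storage is that one table accepts inserts, in-place updates, and analytic scans, so a correction is visible to the next query without a recompute. The class below is a minimal in-memory stand-in written for illustration. It is not the Kudu client API, and it ignores everything that makes Kudu a distributed columnar store (partitioning, replication, on-disk layout).

```python
from typing import Dict, Iterator, Tuple

Key = Tuple[str, int]        # composite primary key: (vin, ts)
Row = Dict[str, int]


class UpdatableTable:
    """In-memory stand-in for a primary-keyed table that supports
    inserts, updates, and scans over the same data."""

    def __init__(self) -> None:
        self._rows: Dict[Key, Row] = {}

    def upsert(self, key: Key, row: Row) -> None:
        # Insert a new row, or overwrite the row that has the same key.
        self._rows[key] = dict(row)

    def scan(self, vin: str) -> Iterator[Tuple[Key, Row]]:
        # Yield one vehicle's rows in primary-key order.
        for key in sorted(k for k in self._rows if k[0] == vin):
            yield key, self._rows[key]


def total_miles(table: UpdatableTable, vin: str) -> int:
    # An analytic query runs directly against the latest data.
    return sum(row["miles"] for _, row in table.scan(vin))


table = UpdatableTable()
for ts, miles in [(1, 10), (2, 7), (3, 3)]:
    table.upsert(("car-1", ts), {"miles": miles})
before = total_miles(table, "car-1")

table.upsert(("car-1", 1), {"miles": 12})   # restate a record in place
after = total_miles(table, "car-1")
print(before, after)
```

With Kudu itself, the equivalent operations would be issued through Impala SQL (for example an `UPSERT` statement) or a Kudu client library, against a table partitioned across tablet servers.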
  15. Kudu: Fast Analytics on Fast-Changing Data
     A new storage engine enables new Hadoop use cases: Kudu fills the analytic gap between HDFS (append-only data) and HBase (fast-changing data, frequent updates).
     Modern analytic applications often require complex data flows and difficult integration work to move data between HBase and HDFS.
     [Figure: the pace-of-data vs. pace-of-analysis chart from slide 12, with Kudu placed in the region of fast analytics on fast-changing or frequently updated data]
  16. Better Together: Kudu Benefits from Integration with the Apache Ecosystem
     Spark: stream processing for Kudu
     • Open standard for real-time stream processing
     • Effective for automating decision processes and machine learning
     • Use cases include time series data and machine data analytics
     Impala: high-performance BI & SQL for Kudu
     • Open standard for interactive SQL queries
     • Powers analytic database workloads with flexibility, scale, and an open architecture
     • Use cases include online reporting
  17. Why Kudu, Why Cloudera?
     A simultaneous combination of sequential and random reads and writes.
     • Time series data: Can you insert time series data in real time? How long does it take to prepare it for analysis? Can you get results and act fast enough to change outcomes?
     • Machine data analytics: Can you handle large volumes of machine-generated data? Do you have the tools to identify problems or threats? Can your system do machine learning?
  18. Kudu Increases the Value of Time Series Data
     Time series workload: inserts, updates, scans, lookups
     Examples: streaming market data, IoT, fraud detection & prevention, risk monitoring, connected cars
     Time series data is most valuable if you can analyze it to change outcomes in real time. Kudu simultaneously enables:
     • Inserting or updating time series data as it arrives
     • Analytic scans to find trends in fresh time series data
     • Lookups to quickly find the point in time where an event occurred
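The three operations in the bullets map naturally onto data kept sorted by timestamp. The sketch below stores one sensor's readings in sorted lists and uses binary search. It is a conceptual analogy only and does not describe how Kudu stores or indexes data internally. The class and method names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


class TimeSeries:
    """One sensor's readings kept sorted by timestamp, supporting
    upserts, range scans, and point-in-time lookups."""

    def __init__(self) -> None:
        self._ts: List[int] = []
        self._values: List[float] = []

    def upsert(self, ts: int, value: float) -> None:
        i = bisect.bisect_left(self._ts, ts)
        if i < len(self._ts) and self._ts[i] == ts:
            self._values[i] = value          # late correction: update in place
        else:
            self._ts.insert(i, ts)           # out-of-order arrival: stay sorted
            self._values.insert(i, value)

    def scan(self, start: int, end: int) -> List[Tuple[int, float]]:
        """All readings with start <= ts < end, in time order."""
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_left(self._ts, end)
        return list(zip(self._ts[lo:hi], self._values[lo:hi]))

    def as_of(self, ts: int) -> Optional[Tuple[int, float]]:
        """Most recent reading at or before ts, or None if there is none."""
        i = bisect.bisect_right(self._ts, ts)
        if i == 0:
            return None
        return self._ts[i - 1], self._values[i - 1]


series = TimeSeries()
for ts, value in [(10, 1.0), (30, 3.0), (20, 2.0)]:   # 20 arrives out of order
    series.upsert(ts, value)
series.upsert(30, 3.5)                                  # late correction
print(series.scan(15, 31))
print(series.as_of(25))
```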
  19. Kudu Keeps Your Business Operational
     Machine data analytics workload: inserts, scans, lookups
     Examples: network threat detection, network health monitoring, application performance monitoring
     Kudu can help spot problems before they happen: real-time inserts combined with trend analysis identify potential problems. Kudu identifies trouble through:
     • Unlimited storage, enabling better historical trend analysis
     • Fast inserts that keep the network view up to date
     • Fast scans that identify and flag undesired states for remediation
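The last bullet, fast scans that flag undesired states, can be illustrated with a windowed scan that flags hosts whose recent average latency crosses a threshold. The metric shape, window, and threshold are hypothetical. A real deployment would push the filter down to the storage engine rather than scan in Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Metric = Tuple[str, int, float]   # (host, ts in epoch seconds, latency in ms)


def flag_hosts(metrics: Iterable[Metric], now: int, window_s: int,
               max_avg_latency_ms: float) -> List[str]:
    """Return, sorted by name, the hosts whose average latency over the
    last `window_s` seconds exceeds `max_avg_latency_ms`."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for host, ts, latency_ms in metrics:
        if now - window_s < ts <= now:          # keep only fresh data
            sums[host] += latency_ms
            counts[host] += 1
    return sorted(h for h in sums if sums[h] / counts[h] > max_avg_latency_ms)


metrics = [
    ("web-1", 95, 40.0), ("web-1", 99, 60.0),
    ("web-2", 97, 250.0), ("web-2", 98, 310.0),
    ("web-3", 10, 900.0),                        # old spike, outside the window
]
print(flag_hosts(metrics, now=100, window_s=10, max_avg_latency_ms=200.0))
```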
  20. Operational DB: Real-Time Architecture – Driving the Model Through Machine Learning
     [Figure: (1) an event occurs at IoT or other data sources; (2) messaging (Kafka); (3) stream processing (Spark Streaming); (4) data lands in the relational store; (5) ML libraries (Spark MLlib) are applied. Other labels: IoT analytics, individual session, full model/learning, genesis]
  21. Operational DB: Real-Time Architecture – MLlib & K-Means: Defining Microsegments via Machine Learning
     [Figure: four numbered panels plotting height against weight, labeled with sizes (XS, S, M, L, XL), and a final "Near custom?" panel]
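K-means itself is short enough to sketch in plain Python. This is Lloyd's algorithm on (height, weight) pairs, not Spark MLlib's implementation. For determinism it seeds the centroids with the first k points, whereas production implementations typically use smarter seeding such as k-means++. The data points are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (height_cm, weight_kg)


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and update steps until
    no point changes cluster (or max_iter is reached)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]  # naive seeding
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:   # keep the old centroid if a cluster empties
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels


def nearest_segment(p: Point, centroids: List[Point]) -> int:
    """Assign a new measurement to the closest learned microsegment."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


people = [(160, 55), (185, 90), (162, 58), (158, 52), (188, 95), (183, 85)]
centroids, labels = kmeans(people, k=2)
print(labels)
print(nearest_segment((170, 70), centroids))
```

Raising k splits the same population into finer microsegments, which is the progression the slide's panels depict.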
  22. Operational DB: Real-Time Architecture – Driving Prediction and Optimization
     [Figure: (1) data processed (Spark); (2) request processed, Kudu queried; (3) results returned; (4) results processed; (5) processed data returned. Components: Kafka, Spark Streaming, Spark MLlib, IoT analytics, individual session, full model/learning, genesis; IoT and other data sources]
  23. Operational DB: Real-Time Architecture – Driving Prediction and Optimization
     Step 1, data processed: Apache Spark processes the data from the event (car sensors, manufacturing, wearables, etc.), which may involve keeping a running list of the last X events.
     Step 2, request processed/Kudu queried: a Spark application uses the data gathered in step 1 to query Kudu in a predefined manner, looking for similar patterns defined via machine learning.
     Step 3, Kudu results returned: Kudu returns the results of the step 2 query to Spark, which determines what needs to be returned to the application.
     Step 4, results processed: Spark combines the results from Kudu with the information stored from the current event to determine the next step to feed back to the application.
     Step 5, processed data returned: the best possible machine-generated outcome is prescribed and served to the application.
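The five steps can be condensed into a small per-session processor. Several elements here are hypothetical: the two features (average speed and braking intensity over the last few events), the pattern dictionary standing in for a Kudu table of machine-learned patterns, and the recommendation strings. The features are not normalized, which a real model would need.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Tuple


@dataclass(frozen=True)
class Pattern:
    centroid: Tuple[float, float]   # (avg_speed, avg_braking), learned offline
    recommendation: str


class SessionProcessor:
    def __init__(self, patterns: Dict[str, Pattern], window: int = 3) -> None:
        self.patterns = patterns   # stand-in for a Kudu table of learned patterns
        self.recent: Deque[Tuple[float, float]] = deque(maxlen=window)

    def on_event(self, speed: float, braking: float) -> str:
        # Step 1: keep a running list of the last `window` events.
        self.recent.append((speed, braking))
        n = len(self.recent)
        features = (sum(e[0] for e in self.recent) / n,
                    sum(e[1] for e in self.recent) / n)
        # Steps 2-3: "query" for the most similar learned pattern.
        name, best = min(self.patterns.items(),
                         key=lambda item: math.dist(features, item[1].centroid))
        # Step 4: combine the lookup result with the current event.
        result = f"{best.recommendation} (segment={name}, speed={speed:.0f})"
        # Step 5: return the prescribed outcome to the application.
        return result


patterns = {
    "calm": Pattern((50.0, 0.1), "no action"),
    "aggressive": Pattern((110.0, 0.7), "suggest eco mode"),
}
session = SessionProcessor(patterns, window=3)
for speed, braking in [(100, 0.6), (40, 0.1), (45, 0.1), (50, 0.1)]:
    print(session.on_event(speed, braking))
```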
  24. Operational DB: IoT Use Case – Prediction and Optimization
     Flow: sensor data from IoT and other data sources, along with data requests from the application's individual sessions, is sent through Kafka for stream processing in Spark Streaming. The data is cleaned, ordered, and processed, then delivered to Kudu for modeling with Spark MLlib (full model/learning).
     Automated processes based on machine learning enable prediction and optimization at a new level.
     (Illustrative; real models will likely have more than two dimensions.)
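The "cleaned/ordered/processed" stage is easy to underestimate. Below is a minimal sketch, assuming JSON messages with hypothetical fields `vin`, `ts`, and `speed`. It drops unparseable or incomplete messages and removes duplicates, since messaging systems with at-least-once delivery can redeliver a message. It then orders rows by (vin, ts) before delivery to storage. In the slide's architecture this logic would run in Spark Streaming, not plain Python.

```python
import json
from typing import Any, Dict, Iterable, List

REQUIRED_FIELDS = ("vin", "ts", "speed")   # hypothetical message schema


def clean_and_order(raw_messages: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON messages, drop malformed or incomplete ones, remove
    duplicates (same vin and ts), and sort by (vin, ts)."""
    seen = set()
    rows: List[Dict[str, Any]] = []
    for msg in raw_messages:
        try:
            rec = json.loads(msg)
        except json.JSONDecodeError:
            continue                                  # malformed payload
        if not isinstance(rec, dict) or any(f not in rec for f in REQUIRED_FIELDS):
            continue                                  # incomplete record
        key = (rec["vin"], rec["ts"])
        if key in seen:
            continue                                  # redelivered duplicate
        seen.add(key)
        rows.append(rec)
    rows.sort(key=lambda r: (r["vin"], r["ts"]))      # restore event order
    return rows


messages = [
    '{"vin": "V1", "ts": 2, "speed": 30}',
    'not json',
    '{"vin": "V1", "ts": 1, "speed": 25}',
    '{"vin": "V1", "ts": 2, "speed": 30}',   # duplicate delivery
    '{"vin": "V2", "ts": 1}',                # missing speed
]
print(clean_and_order(messages))
```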
  25. Key IoT Use Cases
  26. Case Study (Data-Driven Products): Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime
     Transportation » Predictive maintenance » Improved service » Data-driven products
     • Real-time visibility of 300,000+ trucks to improve uptime and vehicle performance
     • OnCommand Connection collects telematics and geolocation data across the fleet
     • Reduced maintenance costs from $0.12–$0.15 per mile to $0.03 per mile
     • Centralizes data from 13 systems with varying frequencies and semantic definitions
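The quoted per-mile figures imply the following relative reduction. This is simple arithmetic on the slide's numbers, not a percentage stated in the case study:

```latex
\text{reduction} = \frac{c_{\text{before}} - c_{\text{after}}}{c_{\text{before}}},
\qquad
\frac{0.12 - 0.03}{0.12} = 0.75,
\qquad
\frac{0.15 - 0.03}{0.15} = 0.80
```

Per-mile maintenance therefore became roughly 75–80% cheaper, or 4 to 5 times cheaper (0.12/0.03 = 4 and 0.15/0.03 = 5).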
  27. Case Study (Data-Driven Process): Predictive Maintenance on Industrial-Grade Turbines for Hydro Power Stations
     Predictive maintenance » Industrial IoT » Lowered downtime » Lowered costs
     Challenge: gather, store and analyze noise levels from turbines for anomaly detection.
     Solution:
     • The Cloudera platform is used to gather and analyze acoustic data/audio files coming from the turbines in real time
     • A diagnostic solution monitors the health of the turbines and predicts failures in advance
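The case study does not say which detection method was used. As a stand-in, a simple z-score test on noise levels shows the basic shape of acoustic anomaly detection: learn what normal sounds like, then flag readings far from it. The sketch assumes the audio has already been reduced to one level (in dB) per time slice. The baseline values and the threshold of 3 standard deviations are hypothetical.

```python
import statistics
from typing import List, Sequence


def anomalies(baseline_db: Sequence[float], new_db: Sequence[float],
              z_threshold: float = 3.0) -> List[int]:
    """Return indices of readings in `new_db` whose distance from the
    baseline mean exceeds `z_threshold` baseline standard deviations."""
    mean = statistics.mean(baseline_db)
    stdev = statistics.stdev(baseline_db)   # sample stdev; needs >= 2 values
    if stdev == 0:
        raise ValueError("baseline has no variation")
    return [i for i, x in enumerate(new_db) if abs(x - mean) / stdev > z_threshold]


baseline = [70.0, 71.0, 69.0, 70.5, 69.5]   # dB during normal operation
print(anomalies(baseline, [70.2, 75.0, 68.9]))
```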
  28. Case Study: Connected Car Telematics for Insurance
     Telematics » Connected vehicles » Insurance telematics » Predictive analytics
     The #1 telematics provider, with 130 billion miles of driving data collected from black boxes in connected cars.
     Challenge: drive analytics on 12 million miles of driving data collected every hour.
     Solution:
     • Telematics solution based on Cloudera to process data from black boxes
     • Analytics on driving behavior, risk, location, braking patterns, contextual elements and crash information
  29. Powering a Variety of IoT Use Cases: connected vehicles, usage-based insurance, industrial IoT, predictive maintenance, smart cities/ports, oil & gas, aerospace & aviation, smart healthcare
  30. Connected Car Demo
  31. Connected Car – Demo Architecture
     Connected car simulator → MQTT - Kafka bridge → data ingest & pipeline (StreamSets Data Collector) → Cloudera Enterprise Data Hub → BI & visualization
     Streaming data: time, VIN, location, mileage, speed, acceleration, brakes applied?, turn signal on?, lane departed?, collision detected?, hazard detected?
     The Enterprise Data Hub runs the Cloudera Enterprise stack shown on slide 8.
  32. Connected Car – Demo Architecture
     Same flow and streaming data as slide 31, with the Enterprise Data Hub components labeled by role: pub-sub messaging system, real-time processing engine, data storage layer, search, and interactive SQL engine; StreamSets Data Collector handles ingest.
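One way to pin down the streaming fields listed for the demo is an explicit record type that the ingest pipeline validates against. Several details are assumptions for illustration: the field names and types, the split of "location" into latitude/longitude, the sample values, and the `needs_attention` rule. The demo's actual message format is not shown in the slides.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CarEvent:
    time: str                 # assumed ISO-8601 timestamp string
    vin: str
    latitude: float           # "location" split in two (assumption)
    longitude: float
    mileage: float
    speed: float
    acceleration: float
    brakes_applied: bool
    turn_signal_on: bool
    lane_departed: bool
    collision_detected: bool
    hazard_detected: bool

    @classmethod
    def from_json(cls, payload: str) -> "CarEvent":
        """Parse one message, failing loudly if any field is missing.
        Field types are not checked in this sketch."""
        data = json.loads(payload)
        names = {f.name for f in fields(cls)}
        missing = names - set(data)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return cls(**{name: data[name] for name in names})

    def needs_attention(self) -> bool:
        # Hypothetical rule: any safety flag, or a lane departure
        # without the turn signal on.
        return (self.collision_detected or self.hazard_detected
                or (self.lane_departed and not self.turn_signal_on))


sample = {
    "time": "2020-01-01T00:00:00Z", "vin": "TESTVIN0000000001",
    "latitude": 37.4, "longitude": -122.1, "mileage": 12000.5,
    "speed": 27.0, "acceleration": -0.5, "brakes_applied": False,
    "turn_signal_on": False, "lane_departed": True,
    "collision_detected": False, "hazard_detected": False,
}
event = CarEvent.from_json(json.dumps(sample))
print(event.needs_attention())
```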
  33. Thank You
