Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Simplify Big Data with AWS

440 vues

Publié le

Webinar @ Salon du Big Data, 02/03/2016

Publié dans : Technologie
  • Soyez le premier à commenter

Simplify Big Data with AWS

  1. 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. @julsimon Webinar “Salon du Big Data” 02/03/2016 Simplify Big Data with AWS Julien Simon, Principal Technical Evangelist
  2. 2. Simplify Big Data Processing ingest / collect store process / analyze consume / visualize Time to Answer (Latency) Throughput Cost
  3. 3. Collect / Ingest
  4. 4. Types of Data •  Transactional •  Database reads & writes (OLTP) •  Cache •  Search •  Logs •  Streams •  File •  Log files (/var/log) •  Log collectors & frameworks •  Stream •  Log records •  Sensors & IoT data Database File Storage Stream Storage A iOS Android Web Apps Logstash Logging IoT Applications Transactional Data File Data Stream Data Mobile Apps Search Data Search Collect Store Logging IoT
  5. 5. Store
  6. 6. Stream Storage A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon ElastiCache SearchSQLNoSQLCache StreamStorage FileStorage Transactional Data File Data Stream Data Mobile Apps Search Data Database File Storage Search Collect Store Logging IoT Applications ü 
  7. 7. File Storage A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon ElastiCache SearchSQLNoSQLCache StreamStorage FileStorage Transactional Data File Data Stream Data Mobile Apps Search Data Database Search Collect Store Logging IoT Applications ü 
  8. 8. Database + Search Tier A Amazon
 S3 iOS Android Web Apps Logstash Amazon RDS / Aurora Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon ElastiCache SearchSQLNoSQLCache StreamStorage FileStorage Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store ü Logging IoT Applications
  9. 9. Database + Search Tier Anti-pattern RDBMS Database + Search Tier Applications
  10. 10. What Data Store Should I Use? Amazon ElastiCache Amazon DynamoDB Amazon Aurora Amazon Elasticsearch Amazon EMR (HDFS) Amazon S3 Amazon Glacier Average latency ms ms ms, sec ms,sec sec,min,hrs ms,sec,min (~ size) hrs Data volume GB GB–TBs (no limit) GB–TB (64 TB Max) GB–TB GB–PB (~nodes) MB–PB (no limit) GB–PB (no limit) Item size B-KB KB (400 KB max) KB (64 KB) KB (1 MB max) MB-GB KB-GB (5 TB max) GB (40 TB max) Request rate High - Very High Very High (no limit) High High Low – Very High Low – Very High (no limit) Very Low Storage cost GB/month $$ ¢¢ ¢¢ ¢¢ ¢ ¢ ¢/10 Durability Low - Moderate Very High Very High High High Very High Very High Hot Data Warm Data Cold Data Hot Data Warm Data Cold Data
  11. 11. Process / Analyze
  12. 12. AnalyzeA iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Hot Cold Warm Hot Hot ML Transactional Data File Data Stream Data Mobile Apps Search Data Collect Store Analyze ü  ü 
  13. 13. Analysis Tools and Frameworks Machine Learning •  Mahout, Spark ML, Amazon ML Interactive Analytics •  Amazon Redshift, Presto, Impala, Spark Batch Processing •  MapReduce, Hive, Pig, Spark Stream Processing •  Micro-batch: Spark Streaming, KCL, Hive, Pig •  Real-time: Storm, AWS Lambda, KCL Amazon Redshift Impala Pig Amazon Machine Learning Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce StreamProcessing Batch Interactive ML Analyze
  14. 14. What Data Processing Technology Should I Use? Amazon Redshift Impala Presto Spark Hive Query Latency Low Low Low Low Medium (Tez) – High (MapReduce) Durability High High High High High Data Volume 1.6 PB Max ~Nodes ~Nodes ~Nodes ~Nodes Managed Yes Yes (EMR) Yes (EMR) Yes (EMR) Yes (EMR) Storage Native HDFS / S3 HDFS / S3 HDFS / S3 HDFS / S3 SQL Compatibility High Medium High Low (SparkSQL) Medium (HQL) Query Latency High (Low is better) Medium
  15. 15. Consume / Visualize
  16. 16. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL Amazon QuickSight
  17. 17. Consume •  Predictions •  Analysis and Visualization •  Notebooks •  IDE •  Applications & API Consume Analysis&Visualization Amazon QuickSight Notebooks Predictions Apps & APIs IDE Store Analyze ConsumeETL Business users Data Scientist, Developers
  18. 18. Putting It All Together
  19. 19. Collect Store Analyze Consume A iOS Android Web Apps Logstash Amazon RDS Amazon DynamoDB Amazon ES Amazon
 S3 Apache Kafka Amazon
 Glacier Amazon
 Kinesis Amazon
 DynamoDB Amazon Redshift Impala Pig Amazon ML Streaming Amazon
 Kinesis AWS Lambda AmazonElasticMapReduce Amazon ElastiCache SearchSQLNoSQLCache StreamProcessing Batch Interactive Logging StreamStorage IoT Applications FileStorage Analysis&Visualization Hot Cold Warm Hot Slow Hot ML Fast Fast Amazon QuickSight Transactional Data File Data Stream Data Notebooks Predictions Apps & APIs Mobile Apps IDE Search Data ETL
  20. 20. Interactive & Batch Analytics Producer Amazon S3 Amazon EMR Hive Pig Spark Amazon ML process store Consume Amazon Redshift Amazon EMR Presto Impala Spark Batch Interactive Batch Prediction Real-time Prediction
  21. 21. Real-time Analytics Producer Apache Kafka KCL AWS Lambda Spark Streaming Apache Storm Amazon SNS Amazon ML Notifications Amazon ElastiCache (Redis) Amazon DynamoDB Amazon RDS Amazon ES Alert App state Real-time Prediction KPI process store DynamoDB Streams Amazon Kinesis
  22. 22. Batch Layer Amazon Kinesis data process store Lambda Architecture Amazon Kinesis S3 Connector Amazon S3 A p p l i c a t i o n s Amazon Redshift Amazon EMR Presto Hive Pig Spark answer Speed Layer answer Serving Layer Amazon ElastiCache Amazon DynamoDB Amazon RDS Amazon ES answer Amazon ML KCL AWS Lambda Spark Streaming Storm
  23. 23. Summary •  Use the right tool for the job •  Latency, throughput, access patterns •  Leverage AWS managed services •  No/low admin •  Be cost conscious •  Big data ≠ big cost
  24. 24. Thank you. Let’s keep in touch! @aws_actus @julsimon facebook.com/groups/AWSFrance/ AWS User Groups in Paris, Lyon, Nantes, Lille & Rennes (meetup.com) March 7-8 AWS Summit May 31st April 20-22 March 23-24 April 6-7 (Lyon) April 25 March 16
  25. 25. Customer references & further reading •  Amazon Kinesis: https://aws.amazon.com/solutions/case-studies/supercell/ •  Amazon DynamoDB: https://aws.amazon.com/fr/solutions/case-studies/adroll/ •  Amazon S3 / Glacier: https://aws.amazon.com/fr/solutions/case-studies/soundcloud/ •  Amazon EMR: https://aws.amazon.com/fr/solutions/case-studies/yelp/ •  Amazon Aurora: https://aws.amazon.com/fr/rds/aurora/testimonials/ •  Amazon Redshift: https://aws.amazon.com/fr/solutions/case-studies/financial-times/ •  AWS Lambda: https://aws.amazon.com/fr/solutions/case-studies/nordstrom/ •  Many more case studies at https://aws.amazon.com/fr/solutions/case-studies/big-data/ •  Whitepaper: “Big Data Analytics Options on AWS” : http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf •  AWS Big Data blog: https://blogs.aws.amazon.com/bigdata

×