Big Data Analytics
with Amazon Web Services



            Dr. Matt Wood
   An Online Seminar. Tuesday 16th October.
Hello, and thank you.
Big Data Analytics

   An introduction

   The story of analytics on AWS

   AWS Marketplace

   Success story: Brightcove
1




INTRODUCING BIG DATA
Data for competitive
     advantage.
Using data

  Customer segmentation,
  financial modeling,
  system analysis,
  line-of-sight,
  business intelligence.
Generation




  Collection & storage




Analytics & computation




Collaboration & sharing
Cost of data generation
       is falling.
lower cost,
increased throughput

[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing)]
[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing), annotated "HIGHLY CONSTRAINED"]
Very high barrier to turning
  data into information.
Move from a
data generation challenge to
    analytics challenge.
Enter the Cloud.
Remove the constraints.
Enable data-driven innovation.
Move to a distributed data
        approach.
Maturation of two things.

  Software for distributed
       storage and analysis

  Infrastructure for distributed
       storage and analysis
Software

  Frameworks for
  data-intensive workloads.

  Distributed by design.
Infrastructure

  Platform for
  data-intensive workloads.

  Distributed by design.
Support the
data timeline.
[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing), first annotated "HIGHLY CONSTRAINED", then shown without the annotation]
Lower the
barrier to entry.
Accelerate time to market
   and increase agility.
Enable new business
   opportunities.
Washington Post

   Pinterest

    NASA
“AWS enables Pfizer to explore
difficult or deep scientific
questions in a timely, scalable
manner and helps us make better
decisions more quickly”

Michael Miller, Pfizer
2




THE STORY OF ANALYTICS
EC2
Utility computing.
 6 years young.
Scale out systems


 Embarrassingly parallel problems.
 Queue based distribution.
 Small, medium and high scale.
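
Queue-based distribution of an embarrassingly parallel workload can be sketched in a few lines. This is a minimal single-machine illustration using only Python's standard library, not the setup used in the talk. The `process` function and the worker count are hypothetical stand-ins; in a cloud deployment the in-memory queue would typically be a managed queue service and the threads would be separate instances.

```python
import queue
import threading


def process(item):
    # Hypothetical unit of work; each item is independent of the others.
    return item * item


def run_workers(items, num_workers=4):
    """Distribute independent work items to workers through a shared queue."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    # Load every task before any worker starts, so an empty queue means "done".
    for item in items:
        tasks.put(item)

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            value = process(item)
            with lock:
                results.append((item, value))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(results)
```

Because no task depends on another, scaling out is a matter of raising `num_workers`; the queue is the only shared coordination point.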
Cost optimization.



Achieving economies of scale

[Chart, built up over several slides: capacity (up to 100%) against time, layered as reserved capacity, on-demand, and unused capacity]
Spot Instances


 Bid on unused EC2 capacity.
 Very large discount.
 Perfect for batch runs.
 Balance cost and scale.
<$1000 per hour
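
The bidding idea can be illustrated with a toy simulation. It is a sketch under simplifying assumptions rather than a description of the actual market mechanics: an instance keeps running while the hourly market price stays at or below the bid, the market price (not the bid) is what gets charged, and the run ends the first time the price exceeds the bid. The function name and all prices are hypothetical.

```python
def simulate_spot_run(bid, hourly_spot_prices):
    """Return (hours_run, total_cost) for one instance under a simplified spot model."""
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            break  # market price rose above the bid: the instance is reclaimed
        hours_run += 1
        total_cost += price  # assumed: pay the market price, not the bid
    return hours_run, total_cost


# Hypothetical hourly prices in dollars; the fourth hour exceeds the bid.
hours, cost = simulate_spot_run(bid=0.30, hourly_spot_prices=[0.10, 0.12, 0.25, 0.40, 0.11])
```

Batch runs suit this model because an interrupted job can be requeued and resumed later at a lower price.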
Map/reduce

 Pattern for distributed computing.

 Software frameworks such as
 Hadoop.

 Write two functions. Scale up.

 Complex cluster configuration
 and management.
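
"Write two functions" can be made concrete with a word count, the usual introductory example. This is a minimal single-process sketch, not Hadoop itself: `map_words` and `reduce_counts` are the two user-supplied functions, and `run_mapreduce` is a hypothetical stand-in for the framework, which would run the same steps across many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines, mapper, reducer):
    """Tiny single-process driver standing in for the framework."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in groups.items())


counts = run_mapreduce(["big data", "Big analytics"], map_words, reduce_counts)
```

The mapper sees one record at a time and the reducer sees one key at a time, which is what lets a framework spread both phases over a cluster without changes to the two functions.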
Amazon Elastic MapReduce

 Managed Hadoop clusters.

 Easy to provision and monitor.

 Write two functions. Scale up.

 Optimized for S3 access.
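
As a rough sketch of what provisioning such a cluster involves, the function below assembles a request for one transient cluster that reads from and writes to S3. It only builds a dictionary and makes no network calls. The shape follows the keyword arguments of boto3's EMR `run_job_flow` call as best understood here; the function name, bucket URIs, release label, instance types, and role names are all assumptions to check against current documentation, and none of them come from the talk.

```python
def build_streaming_job_flow(name, input_uri, output_uri, mapper_uri, reducer_uri,
                             core_nodes=2):
    """Build keyword arguments for a managed Hadoop cluster with one streaming step.

    All values are illustrative assumptions, not settings from the talk.
    """
    mapper_file = mapper_uri.rsplit("/", 1)[-1]
    reducer_file = reducer_uri.rsplit("/", 1)[-1]
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Transient cluster: shut down once the step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "streaming-step",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hadoop-streaming",
                         "-files", f"{mapper_uri},{reducer_uri}",
                         "-mapper", mapper_file,
                         "-reducer", reducer_file,
                         "-input", input_uri,     # input data read from S3
                         "-output", output_uri],  # results written back to S3
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_streaming_job_flow(
    "wordcount",
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
    "s3://example-bucket/code/mapper.py",
    "s3://example-bucket/code/reducer.py",
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**request)`; that call is left out so the sketch stays runnable offline.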
[Diagram, built up over several slides: input data in S3 and code feed Elastic MapReduce; a name node and an elastic cluster with HDFS; queries + BI via JDBC, Pig, Hive; output to S3 + SimpleDB]
Performance
 Compute performance
Cluster Compute

 Intel Xeon E5-2670
 10 GigE non-blocking network
 60.5 GB memory
 Placement groups

 + GPU enabled instances
Performance
 Compute performance
 IO performance
NoSQL
Unstructured data storage.
DynamoDB

 Predictable, consistent performance
 Unlimited storage
 Single digit millisecond latencies
 No schema for unstructured data
 Backed by solid state drives
...and SSDs for all.
  New Hi1 storage instances.
hi1.4xlarge

  2 x 1 TB SSDs
  10 GigE network
  HVM: 90k IOPS read, 9k to 75k write
  PV: 120k IOPS read, 10k to 85k write
“The hi1.4xlarge configuration is
about half the system cost for the
same throughput.”


Netflix
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing)]
Performance + ease of use
3




AWS MARKETPLACE
Extend platform with
     partners
Innovate on behalf of
    customers
Remove undifferentiated
    heavy lifting
AWS Marketplace
aws.amazon.com/marketplace
[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing)]
Collection & storage



    Acunu Reflex
    Apache Cassandra NoSQL database


    MongoDB
    With and without EBS RAID storage


    Couchbase
    Community and Enterprise editions



    ScaleArc
    MySQL load balancing
[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing)]
Analytics & computation



      KarmaSphere Analytics
      for Amazon Elastic MapReduce



      MapR M5
      Hadoop Distribution



      Metamarkets
      Event based data processing
Analytics & computation



      StackIQ Rocks+
      HPC clusters with MPI, Grid Engine



      Univa Grid Engine
      One click cluster deployment



      Quantivo
      Data association analytics
[Diagram: data timeline (Generation, Collection & storage, Analytics & computation, Collaboration & sharing)]
Collaboration & sharing




Aspera Faspex
   20 Mbps data transfer
4




SUCCESS STORY
