Amazon Web Services
Big Data and the Cloud: A Best Friend Story
Joe Ziegler
Technical Evangelist
zieglerj@amazon.com    @jiyosub
Characteristics of
    Big Data



              How the Cloud Is
            Big Data’s Best Friend


                        Big Data on the Cloud
                          In the Real World
Characteristics of
    Big Data
BIG DATA
  When your data sets become
 so large that you have to start
innovating how to collect, store,
 organize, analyze and share it
Bigger Data
     is
Better Data
Features driven by MapReduce
Bigger Data
    is
Harder Data
Big Data is Getting Bigger
Unconstrained data growth (scale: GB → TB → PB → EB → ZB)

• 95% of the 1.2 zettabytes of data in the digital universe is unstructured
• 70% of this is user-generated content
• Unstructured data growth is explosive, with estimates of compound annual growth (CAGR) at 62% from 2008 to 2012

Source: IDC
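To make the quoted growth rate concrete, the sketch below compounds a 62% annual rate over the four yearly steps between 2008 and 2012. The starting volume of 1.0 is an arbitrary unit chosen for illustration, not a figure from IDC, and the function name is hypothetical.

```python
def compound_growth(start: float, annual_rate: float, years: int) -> float:
    """Return the value reached after compounding annual_rate for the given number of years."""
    return start * (1.0 + annual_rate) ** years


CAGR = 0.62          # growth rate quoted on the slide (IDC estimate)
YEARS = 2012 - 2008  # four compounding steps

# Starting volume is an arbitrary unit (assumption), so the result is a pure growth factor.
growth_factor = compound_growth(1.0, CAGR, YEARS)
print(f"Growth factor over {YEARS} years: {growth_factor:.2f}x")
```

Under these assumptions, unstructured data grows to roughly seven times its starting volume over the period.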
Big Data is Hard
                 and getting harder




           Changing Data Requirements
       Faster response time of fresher data
Sampling is not good enough & history is important
        Increasing complexity of analytics
   Users demand inexpensive experimentation
Where is it Coming From?

Computer Generated
• Application server logs (web sites, games)
• Sensor data (weather, water, smart grids)
• Images/videos (traffic, security cameras)

Human Generated
• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year
• Blogs/Reviews/Emails/Pictures
• Social Graphs: Facebook, LinkedIn, Contacts
Storage       Big Data       Compute
Data has gravity
How quick do you need to read it?




 App                  Data                              App




                             http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
Storage       Big Data       Compute
…and inertia at volume…
How quick do you need to read it?




                    Data




                           http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
Storage       Big Data       Compute
…and inertia at volume…
…easier to move applications to the data
How quick do you need to read it?




                      Data




                              http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
The Role of Data
  is Changing
Until now, the questions you asked drove the data model

The new model is to collect as much data as possible
– “Data-First Philosophy”
Data is the new raw material for any business,
on par with capital, people, labor
We Need Tools Built Specifically
         for Big Data
Hadoop




• Scale out Easily     • Solves some Problems
• Parallel Computing   • Complex to Run
• Commodity Hardware   • Special Skills to Maintain
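The parallel computing model behind Hadoop is MapReduce. As a rough illustration only, the sketch below runs the three phases (map, shuffle, reduce) of a word count in a single Python process. It does not use any Hadoop API; the function names are hypothetical, and a real cluster would spread the map and reduce calls across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (single process, for illustration)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data has gravity"]  # made-up input
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be run in parallel, which is what lets the approach scale out on commodity hardware.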
How the Cloud Is
Big Data’s Best Friend
How do we define the cloud?
       By Benefits!
Cloud
• No Cap Ex
• Pay Per Use
• Elasticity
• Fast Time to Market
• Focus on core competency
Why is the Cloud
Big Data’s Best Friend
We know we want to collect, store,
organize, analyze and share it.
But we have limited resources.
The Cloud Optimizes
Precious IT Resources
  i.e. Skilled People
“Over the next decade, the number of files or containers
that encapsulate the information in the digital universe
will grow by 75x.
While the pool of IT staff available to manage them will
grow only slightly. At 1.5x”
                                - 2011 IDC Digital Universe Study
Deploying a Hadoop cluster is hard
Cloud computing

The Old IT World:
  30%  Using Big Data
  70%  Managing All of the “Undifferentiated Heavy Lifting”

Cloud-Based Infrastructure:
  70%  Analyzing and Using Big Data
  30%  Configuring Cloud Assets
The Cloud
   Reduces Cost
For Experimentation
• Reusability
• Managed Services
• Scale
• Innovation
The Cloud Optimizes
Capacity Resources
Elastic Compute Capacity




        On and Off           Fast Growth




       Variable peaks      Predictable peaks
Elastic Compute Capacity
                                             WASTE




        On and Off                 Fast Growth




       Variable peaks            Predictable peaks

                  CUSTOMER DISSATISFACTION
Elastic Compute Capacity
[Chart: capacity over time, comparing traditional IT capacity, elastic cloud capacity, and your IT needs]
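The "waste" and "customer dissatisfaction" labels on the capacity slides can be put into numbers. The sketch below is a minimal illustration, assuming a made-up hourly demand series and a made-up fixed capacity of 8 instances. It counts idle instance-hours (waste) and demand that could not be served (dissatisfaction) when capacity is fixed.

```python
from typing import List, Tuple


def fixed_capacity_gap(demand: List[int], capacity: int) -> Tuple[int, int]:
    """Return (wasted, unmet) instance-hours when capacity never changes."""
    wasted = sum(max(capacity - d, 0) for d in demand)  # idle capacity: waste
    unmet = sum(max(d - capacity, 0) for d in demand)   # demand not served
    return wasted, unmet


# Hypothetical values, chosen only for illustration.
hourly_demand = [2, 3, 10, 4, 2, 12, 3, 2]  # instances needed in each hour
FIXED_CAPACITY = 8                          # instances provisioned up front

wasted, unmet = fixed_capacity_gap(hourly_demand, FIXED_CAPACITY)
print(f"Fixed capacity: {wasted} instance-hours idle, {unmet} instance-hours unmet")
```

With these numbers, the fixed fleet sits idle for 32 instance-hours and still misses 6 instance-hours of demand at the peaks. Elastic capacity that tracks demand hour by hour would drive both figures toward zero.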
The Cloud
Empowers Users to Balance
     Cost and Time
1 instance for 500 hours
            =
500 instances for 1 hour
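The equality above is a statement about cost: with per-instance-hour pricing, both options consume the same 500 instance-hours. The sketch below checks that arithmetic. The price of 0.10 per instance-hour is hypothetical, and the comparison assumes the workload parallelizes perfectly with no startup or coordination overhead.

```python
import math


def run_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Total cost when billing is per instance-hour."""
    return instances * hours * price_per_instance_hour


PRICE = 0.10  # hypothetical price per instance-hour

serial = run_cost(instances=1, hours=500, price_per_instance_hour=PRICE)
parallel = run_cost(instances=500, hours=1, price_per_instance_hour=PRICE)

print(f"1 instance for 500 hours costs {serial:.2f}")
print(f"500 instances for 1 hour costs {parallel:.2f}")
print("Same cost:", math.isclose(serial, parallel))
```

The cost is the same, but the second option returns results 499 hours sooner, which is the cost and time balance the slide refers to.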
Storage       Big Data       Compute
From one instance…
How quick do you need to read it?
Storage       Big Data       Compute
…to thousands
How quick do you need to read it?
The Cloud
 Scales
AMAZON ELASTIC MAPREDUCE
• Managed Hadoop offering in the cloud
• Integration with other AWS services
• Thousands of customers ran over 2 million clusters on EMR over the last year
[Diagram: S3, Prod Cluster (EMR), EMR, HDFS]
Data streamed directly from S3 to the cluster
[Diagram: S3, Prod Cluster (EMR), EMR, HDFS]
Results streamed back to S3
[Diagram: S3, Prod Cluster (EMR), EMR; uses shown: Recommendation Engine, Ad-hoc Analysis, Personalization]
Data consumed in multiple ways
[Diagram: S3, Prod Cluster (EMR), EMR]
Wide range of processing languages used
The Cloud
Enables Collection and Storage
         of Big Data
Simple Storage Service
[Chart: growth curve reaching 1 Trillion; vertical axis from 0 to 1,000]
650k+ peak transactions per second
Global Accessibility

Regions:
• US-EAST (Virginia)
• US-WEST (N. California)
• US-WEST (Oregon)
• GOV CLOUD
• EU-WEST (Ireland)
• ASIA PAC (Tokyo)
• ASIA PAC (Singapore)
• SOUTH AMERICA (Sao Paulo)
Amazon DynamoDB
DynamoDB is a fully managed NoSQL database service
that provides extremely fast and predictable performance
with seamless scalability

• Zero Administration
• Low Latency SSDs
• Reserved Capacity
• Unlimited Potential Storage and Throughput
The Cloud
 Enables
Processing
We know we want to collect, store, organize, analyze and share it.
But we have limited resources.
Big Data on the Cloud
  In the Real World
Big Data Verticals

• Media/Advertising: Targeted Advertising; Image and Video Processing
• Oil & Gas: Seismic Analysis
• Retail: Recommend; Transactions Analysis
• Life Sciences: Genome Analysis
• Financial Services: Monte Carlo Simulations; Risk Analysis
• Security: Anti-virus; Fraud Detection; Image Recognition
• Social Network/Gaming: User Demographics; Usage analysis; In-game metrics
[Diagram: Netflix Web Services (Honu), S3]
8 TB of event data per day
[Diagram: S3, Legacy Data, Netflix Data Center]
Legacy data from on-premise data center
Customer dimension data stored in Cassandra
S3




~1 PB of data stored in Amazon S3
Visualizations
Bank – Monte Carlo Simulations

23 Hours to 20 Minutes

“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
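For scale, the headline figures on this slide work out to the speedup computed below. This only restates the arithmetic of "23 hours to 20 minutes"; the slide does not say how many instances were used, so nothing about cluster size is implied.

```python
# Figures from the slide: run time before and after moving to AWS.
before_minutes = 23 * 60  # 23 hours
after_minutes = 20        # 20 minutes

speedup = before_minutes / after_minutes
print(f"Speedup: {speedup:.0f}x")
```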
Recommendations




        The Taste Test
http://www.etsy.com/tastetest
Recommendations

Gift Ideas for Facebook Friends




         etsy.com/gifts
Click Stream Analysis

User recently purchased a sports movie and is searching for video games
Targeted Ad (1.7 Million per day)
Characteristics of
    Big Data



              How the Cloud Is
            Big Data’s Best Friend


                        Big Data on the Cloud
                          In the Real World
Questions?
Joe Ziegler
Technical Evangelist
zieglerj@amazon.com   @jiyosub

More Related Content

What's hot

Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Amazon Web Services
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
How to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the CloudHow to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the CloudDenodo
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in DeltaDatabricks
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureJoey Bolduc-Gilbert
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueAmazon Web Services
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...confluent
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep DiveAmazon Web Services
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsAmazon Web Services
 

What's hot (20)

Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
How to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the CloudHow to Take Advantage of an Enterprise Data Warehouse in the Cloud
How to Take Advantage of an Enterprise Data Warehouse in the Cloud
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
 
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
Effective Data Lakes: Challenges and Design Patterns (ANT316) - AWS re:Invent...
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
 
(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive(DAT401) Amazon DynamoDB Deep Dive
(DAT401) Amazon DynamoDB Deep Dive
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Data engineering
Data engineeringData engineering
Data engineering
 
Google Cloud Platform Data Storage
Google Cloud Platform Data StorageGoogle Cloud Platform Data Storage
Google Cloud Platform Data Storage
 
Apache Arrow - An Overview
Apache Arrow - An OverviewApache Arrow - An Overview
Apache Arrow - An Overview
 
Introduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis AnalyticsIntroduction to Amazon Kinesis Analytics
Introduction to Amazon Kinesis Analytics
 

Viewers also liked

Accelerating Your Connection to the Cloud
Accelerating Your Connection to the CloudAccelerating Your Connection to the Cloud
Accelerating Your Connection to the CloudAmazon Web Services
 
Delivering on the promise of the cloud for digital media, aspera on demand
Delivering on the promise of the cloud for digital media, aspera on demandDelivering on the promise of the cloud for digital media, aspera on demand
Delivering on the promise of the cloud for digital media, aspera on demandAmazon Web Services
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Datawaheed751
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and SharkYahooTechConference
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...DataStax Academy
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataData Con LA
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonGreg Kirchoff
 
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaCassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaDataStax Academy
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...
ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...
ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...Amazon Web Services
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
How to Extend your Datacenter into the Cloud - 2nd Watch - Webinar
How to Extend your Datacenter into the Cloud - 2nd Watch - WebinarHow to Extend your Datacenter into the Cloud - 2nd Watch - Webinar
How to Extend your Datacenter into the Cloud - 2nd Watch - WebinarAmazon Web Services
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015Krishna-Kumar
 
Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemBojan Babic
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData, Inc.
 
Towards secure and dependable storage service in cloud
Towards secure and dependable storage service in cloudTowards secure and dependable storage service in cloud
Towards secure and dependable storage service in cloudsibidlegend
 

Viewers also liked (20)

Spark and shark
Spark and sharkSpark and shark
Spark and shark
 
16h30 p duff-big-data-final
16h30   p duff-big-data-final16h30   p duff-big-data-final
16h30 p duff-big-data-final
 
Accelerating Your Connection to the Cloud
Accelerating Your Connection to the CloudAccelerating Your Connection to the Cloud
Accelerating Your Connection to the Cloud
 
Delivering on the promise of the cloud for digital media, aspera on demand
Delivering on the promise of the cloud for digital media, aspera on demandDelivering on the promise of the cloud for digital media, aspera on demand
Delivering on the promise of the cloud for digital media, aspera on demand
 
AWS Introduction - Ryland
AWS Introduction - RylandAWS Introduction - Ryland
AWS Introduction - Ryland
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Data
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueData
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with Isilon
 
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at OoyalaCassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...
ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...
ARC205 Building Web-scale Applications Architectures with AWS - AWS re: Inven...
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
How to Extend your Datacenter into the Cloud - 2nd Watch - Webinar
How to Extend your Datacenter into the Cloud - 2nd Watch - WebinarHow to Extend your Datacenter into the Cloud - 2nd Watch - Webinar
How to Extend your Datacenter into the Cloud - 2nd Watch - Webinar
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the Cloud
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015
 
Introduction to Apache Spark Ecosystem
Introduction to Apache Spark EcosystemIntroduction to Apache Spark Ecosystem
Introduction to Apache Spark Ecosystem
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 Overview
 
Towards secure and dependable storage service in cloud
Towards secure and dependable storage service in cloudTowards secure and dependable storage service in cloud
Towards secure and dependable storage service in cloud
 

Similar to Big Data and the Cloud a Best Friend Story

Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...Keynote: Your Future With Cloud Computing - Dr. Werner Vogels  - AWS Summit 2...
Keynote: Your Future With Cloud Computing - Dr. Werner Vogels - AWS Summit 2...Amazon Web Services
 
Building Web Applications on AWS - AWS Summit 2012 - NYC
Building Web Applications on AWS - AWS Summit 2012 - NYCBuilding Web Applications on AWS - AWS Summit 2012 - NYC
Building Web Applications on AWS - AWS Summit 2012 - NYCAmazon Web Services
 
Future of cloud up presentation m_dawson
Future of cloud up presentation m_dawsonFuture of cloud up presentation m_dawson
Future of cloud up presentation m_dawsonKhazret Sapenov
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analyticsAmazon Web Services
 
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...
AWS for Media: Content in the Cloud, Miles Ward (Amazon Web Services) and Bha...Amazon Web Services
 
Exploring Big Data value for your business
Exploring Big Data value for your businessExploring Big Data value for your business
Exploring Big Data value for your businessAcunu
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Big Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyBig Data, Big Content, and Aligning Your Storage Strategy
Big Data, Big Content, and Aligning Your Storage StrategyHitachi Vantara
 
Esri Application on AWS Cloud Webinar
Esri Application on AWS Cloud WebinarEsri Application on AWS Cloud Webinar
Esri Application on AWS Cloud WebinarAmazon Web Services
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
Infochimps #1 Big Data Platform for the Cloud
Infochimps #1 Big Data Platform for the CloudInfochimps #1 Big Data Platform for the Cloud
Infochimps #1 Big Data Platform for the CloudBrian Krpec
 
Big Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudBig Data Use Cases and Solutions in the AWS Cloud
Big Data Use Cases and Solutions in the AWS CloudAmazon Web Services
 
Architecting Data Lakes on AWS
Architecting Data Lakes on AWSArchitecting Data Lakes on AWS
Architecting Data Lakes on AWSSajith Appukuttan
 

Big Data and the Cloud a Best Friend Story

  • 1. Amazon Web Services Big Data and the Cloud: A Best Friend Story
  • 4. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud In the Real World
  • 6. BIG DATA: When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share it
  • 7. Bigger Data is Better Data
  • 8. Features driven by MapReduce
  • 9. Bigger Data is Harder Data
  • 10. Big Data is Getting Bigger: Unconstrained data growth (GB, TB, PB, EB, ZB). 95% of the 1.2 zettabytes of data in the digital universe is unstructured. 70% of this is user-generated content. Unstructured data growth is explosive, with estimates of compound annual growth (CAGR) at 62% from 2008 – 2012. Source: IDC
  • 11. Big Data is Hard and getting harder: Changing Data Requirements; Faster response time on fresher data; Sampling is not good enough & history is important; Increasing complexity of analytics; Users demand inexpensive experimentation
  • 12. Where is it Coming From? Computer Generated: Application server logs (web sites, games); Sensor data (weather, water, smart grids); Images/videos (traffic, security cameras). Human Generated: Twitter “Fire Hose” 50m tweets/day, 1,400% growth per year; Blogs/Reviews/Emails/Pictures; Social Graphs: Facebook, Linked-in, Contacts
  • 13. Storage Big Data, Compute Big Data: How quick do you need to read it? Data has gravity (diagram: App, Data, App) http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • 14. Storage Big Data, Compute Big Data: How quick do you need to read it? …and inertia at volume… (diagram: Data) http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • 15. Storage Big Data, Compute Big Data: How quick do you need to read it? …easier to move applications to the data (diagram: Data) http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
  • 16. The Role of Data is Changing
  • 17. Until now, Questions you ask drove Data model. New model is collect as much data as possible – “Data-First Philosophy”
  • 18. Data is the new raw material for any business, on par with capital, people, labor
  • 19. We Need Tools Built Specifically for Big Data
  • 20. Hadoop • Scale out Easily • Parallel Computing • Commodity Hardware • Solves some Problems • Complex to Run • Special Skills to Maintain
  • 21. How the Cloud Is Big Data’s Best Friend
  • 22. How do we define the cloud? By Benefits!
  • 23. Cloud: No Cap Ex; Pay Per Use; Elasticity; Fast Time to Market; Focus on core competency
  • 24. Why is the Cloud Big Data’s Best Friend
  • 25. We know we want to collect, store, organize, analyze and share it. But we have limited resources.
  • 26. The Cloud Optimizes Precious IT Resources i.e. Skilled People
  • 27. “Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x. While the pool of IT staff available to manage them will grow only slightly. At 1.5x” - 2011 IDC Digital Universe Study
  • 28. Deploying a Hadoop cluster is hard
  • 29. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”
  • 30. Cloud computing. The Old IT World: 30% Using Big Data; 70% Managing All of the “Undifferentiated Heavy Lifting”. Cloud-Based Infrastructure: 70% Analyzing and Using Big Data; 30% Configuring Cloud Assets
  • 31. The Cloud Reduces Cost For Experimentation
  • 32-36. Managed Services; Reusability; Scale; Innovation
  • 38. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 39. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks. Outcomes: WASTE; CUSTOMER DISSATISFACTION
  • 40. Elastic Compute Capacity (chart of capacity over time: traditional IT capacity and elastic cloud capacity plotted against your IT needs)
  • 41. Elastic Compute Capacity: On and Off; Fast Growth; Variable peaks; Predictable peaks
  • 42. The Cloud Empowers Users to Balance Cost and Time
  • 43. 1 instance for 500 hours = 500 instances for 1 hour
  • 44. Storage Big Data Compute Big Data From one instance… How quick do you need to read it?
  • 45. Storage Big Data Compute Big Data …to thousands How quick do you need to read it?
  • 47. AMAZON ELASTIC MAPREDUCE • Managed Hadoop offering in the cloud • Integration with other AWS services • Thousands of customers ran over 2 million clusters on EMR over the last year
  • 48. Diagram (S3, Prod Cluster (EMR), HDFS): Data streamed directly from S3 to the cluster
  • 49. Diagram (S3, Prod Cluster (EMR), HDFS): Results streamed back to S3
  • 50. Diagram (S3, Prod Cluster (EMR)): Data consumed in multiple ways: Recommendation Engine; Ad-hoc Analysis; Personalization
  • 51. Diagram (S3, Prod Cluster (EMR)): Wide range of processing languages used
  • 52. The Cloud Enables Collection and Storage of Big Data
  • 53. Simple Storage Service: 1 Trillion; 650k+ peak transactions per second (growth chart, axis from 0 to 1,000)
  • 54. Global Accessibility. Regions: US-WEST (N. California); EU-WEST (Ireland); GOV CLOUD; ASIA PAC (Tokyo); US-EAST (Virginia); US-WEST (Oregon); ASIA PAC (Singapore); SOUTH AMERICA (Sao Paulo)
  • 55. Amazon DynamoDB: DynamoDB is a fully managed NoSQL database service that provides extremely fast and predictable performance with seamless scalability. Zero Administration; Low Latency SSDs; Reserved Capacity; Unlimited Potential Storage and Throughput
  • 58. We know we want to collect, store, organize, analyze and share it. But we have limited resources.
  • 59. Big Data on the Cloud In the Real World
  • 60. Big Data Verticals. Social Network/Gaming: User Demographics, Usage analysis, In-game metrics. Media/Advertising: Targeted Advertising, Image and Video Processing. Financial Services: Monte Carlo Simulations, Risk Analysis. Oil & Gas: Seismic Analysis. Retail: Recommend, Transactions Analysis. Life Sciences: Genome Analysis. Security: Anti-virus, Fraud Detection, Image Recognition
  • 62. Netflix Web Services (Honu) S3 8 TB of event data per day
  • 63. Netflix Data Center to S3: Legacy data from on-premise data center
  • 64. Customer dimension data stored in Cassandra
  • 65. S3 ~1 PB of data stored in Amazon S3
  • 67. Bank – Monte Carlo Simulations: 23 Hours to 20 Minutes. “The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter
  • 68. Recommendations The Taste Test http://www.etsy.com/tastetest
  • 69. Recommendations Gift Ideas for Facebook Friends etsy.com/gifts
  • 71. Click Stream Analysis: User recently purchased a sports movie and is searching for video games. Targeted Ad (1.7 Million per day)
  • 72. Characteristics of Big Data; How the Cloud Is Big Data’s Best Friend; Big Data on the Cloud In the Real World
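
The equivalence on slide 43 (1 instance for 500 hours = 500 instances for 1 hour) can be sketched as simple arithmetic. This is a minimal illustration, assuming the job parallelizes perfectly and that each started hour is billed as a full instance-hour; the hourly rate and the `plan` helper are hypothetical placeholders, not AWS prices or APIs.

```python
import math

# Hypothetical rate in cents per instance-hour; not an actual AWS price.
RATE_CENTS_PER_INSTANCE_HOUR = 10

def plan(total_instance_hours: int, instances: int) -> tuple:
    """Return (wall_clock_hours, cost_cents) for a perfectly parallel job.

    Assumes the work splits evenly across instances and that every
    started hour is billed as a whole instance-hour.
    """
    hours = math.ceil(total_instance_hours / instances)
    cost = hours * instances * RATE_CENTS_PER_INSTANCE_HOUR
    return hours, cost

slow = plan(500, 1)    # one instance running for 500 hours
fast = plan(500, 500)  # 500 instances running for one hour
print("1 instance:   ", slow)
print("500 instances:", fast)
```

Under these assumptions the cost is identical and only the wall-clock time changes, which is the cost-versus-time balance the slide describes. Real jobs rarely parallelize perfectly, so the equality should be read as an upper bound on the benefit.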

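
Slides 38 to 41 contrast fixed provisioning (waste or customer dissatisfaction) with capacity that follows demand. A rough sketch of that comparison, assuming an invented hourly demand series and an invented fixed capacity; both numbers and both function names are hypothetical and only serve to show the bookkeeping.

```python
def fixed_capacity_report(demand, capacity):
    """Tally idle and unmet instance-hours when capacity never changes."""
    waste = sum(max(capacity - d, 0) for d in demand)      # paid for but idle
    shortfall = sum(max(d - capacity, 0) for d in demand)  # demand not served
    paid = capacity * len(demand)
    return {"paid": paid, "waste": waste, "shortfall": shortfall}

def elastic_capacity_report(demand):
    """Idealized elasticity: provisioned capacity equals demand every hour."""
    return {"paid": sum(demand), "waste": 0, "shortfall": 0}

# Hypothetical demand, in instances needed per hour.
demand = [2, 3, 10, 4, 1, 8]

fixed = fixed_capacity_report(demand, capacity=6)
elastic = elastic_capacity_report(demand)
print("fixed:  ", fixed)
print("elastic:", elastic)
```

The elastic case here is idealized: it ignores scaling delays and minimum billing increments, so real savings would be smaller than this sketch suggests.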
Editor's Notes

  1. The more misspelled words you collect, the better your spellcheck application becomes.
  2. Data volume. As the data volume increases, it becomes increasingly difficult to process the data: easy for one box, harder for many boxes, once the data exceeds the capacity of one place. Data structure. Data comes in a variety of formats, from log files to database schemas to images, and the diversity in data structures and formats grows as well. To analyze this data holistically, it must be consolidated across multiple data sources and multiple formats. Since valuable data comes from various companies like Facebook and LinkedIn, it must also be consolidated across businesses.
  3. According to IDC, 95% of the 1.2 zettabytes of data in the digital universe is unstructured, and 70% of this is user-generated content. Unstructured data is also projected for explosive growth, with estimates of compound annual growth (CAGR) at 62% from 2008 to 2012. Challenges: unconstrained growth.
  4. Finally, complexity increases because demands on data are changing. Business requires faster response time on fresher data. Sampling is not good enough; history is important. Did the customer purchase something in February because a friend has a birthday or because it was Valentine's Day? This information can help figure out how to help this customer next February. SQL is simply not enough to drive some of the answers. Data scientists require access to other statistical tools or other programming languages. Finally, and most importantly, users demand inexpensive experimentation. Often we don't know what products or facts will come out of our analytics, so we cannot justify a large upfront investment.
  5. Computers typically generate data as a byproduct of interacting with people or with other devices. The more interactions there are, the more data there typically is. This data comes in a variety of formats, from semi-structured logs to unstructured binaries. This data can be extremely valuable. It can be used to understand and track application or service behavior so that we can find errors or suboptimal user experience. We can mine it for patterns and correlations to generate recommendations. For example, e-commerce sites can analyze user access logs to provide product recommendations, social networking sites provide new friend recommendations, dating sites find qualified soul mates, and so forth.
  6. Big data is important.
  7. Now the philosophy around data has changed. The philosophy is to collect as much data as possible before you know what questions you are going to ask. Most importantly, you don't know which algorithms you are going to use, because you don't know what type of questions you might need to ask in future. The ultimate mantra is to collect and measure everything. How you are going to refine those algorithms, how much data, how much processing power: you really don't know how many resources you need. Big data is what clouds are for. Big data analysis and cloud computing are the perfect marriage. Free of constraints: collect and store without limits; compute and analyze without limits; visualize without limits.
  8. Data is the next industrial revolution. Today, the core of any successful company is the data it manages and its ability to effectively model, analyze and process that data quickly, almost in real time, so that it can make the right decision faster and rise to the top.
  9. These resources are even more precious because of the rarity of skills.
  10. Our goal, and what our customers tell us they see, is that this ratio is inverted after moving to AWS. When you move your infrastructure to the cloud, this changes things drastically. Only 30% of your time should be spent architecting for the cloud and configuring your assets. This gives you 70% of your time to focus on your business. Project teams are free to add value to the business and its customers, to innovate more quickly, and to deliver products to market quickly as well.
  11. Our goal, and what our customers tell us they see, is that this ratio is inverted after moving to AWS. When you move your infrastructure to the cloud, this changes things drastically. Only 30% of your time should be spent architecting for the cloud and configuring your assets. This gives you 70% of your time to focus on your business. Project teams are free to add value to the business and its customers, to innovate more quickly, and to deliver products to market quickly as well.
  12. The new model is to collect as much data as possible: the “Data-First Philosophy”. It allows us to collect data and ask questions later, and to ask many different kinds of questions.
  13. There are many patterns of usage that make capacity planning a complex science: on and off usage patterns, where capacity is only needed at fixed times and not at others; fast growth, where an online service becomes so successful that step changes in traditional capacity need to be added; variable peaks, where you just don't know what demand will be when and best guess applies; and predictable peaks, such as during commute times as customers use mobile devices to access your service.
  14. Each of these examples is typified by wasted IT resources. Where you planned correctly, the IT resources will be over-provisioned so that services are not impacted and customers are not lost during high demand. In the worst cases, that capacity will not be enough, and customer dissatisfaction will result. Most businesses have a mix of differing patterns at play, and much time and resource is dedicated to planning and management to ensure services are always available. And when a new online service is really successful, you often can't ship in new capacity fast enough. Some say that's a nice problem to have, but those that have lived through it will tell you otherwise!
  15. Elasticity with AWS enables your provisioned capacity to follow demand. To scale up when needed and down when not. And as you only pay for what is used, the savings can be significant.
  16. You control how and when your service scales, so you can closely match increasing load in small increments, scale up fast when needed, and cool off and reduce the resources being used at any time of day. Even the most variable and complex demand patterns can be matched with the right amount of capacity - all automatically handled by AWS.
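As a rough illustration of the "pay only for what is used" point in the notes above, the sketch below compares capacity sized for the peak and left running all day with capacity resized every hour to follow demand. The demand curve, the per-instance capacity, the price in cents, and the function names are all made up for the example; they are not AWS prices or an AWS API.

```python
import math

def fixed_cost(demand, per_instance_capacity, cents_per_hour):
    """Cost when capacity is sized for the peak hour and left running."""
    peak_instances = math.ceil(max(demand) / per_instance_capacity)
    return peak_instances * cents_per_hour * len(demand)

def elastic_cost(demand, per_instance_capacity, cents_per_hour):
    """Cost when the instance count is resized every hour to match demand."""
    instance_hours = sum(
        max(1, math.ceil(d / per_instance_capacity))  # keep at least one instance up
        for d in demand
    )
    return instance_hours * cents_per_hour

# Hypothetical requests per hour over one day, with two commute-time peaks
demand = [200] * 7 + [1800] * 2 + [600] * 8 + [1800] * 2 + [200] * 5

fixed = fixed_cost(demand, per_instance_capacity=100, cents_per_hour=10)
elastic = elastic_cost(demand, per_instance_capacity=100, cents_per_hour=10)
print(f"fixed: {fixed} cents, elastic: {elastic} cents")
```

With these invented numbers the elastic bill is a third of the fixed one; the real ratio depends entirely on how peaky the demand is.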
  17. Horizontal scaling on commodity hardware. Perfect for Hadoop.
  18. Elasticity works from just 1 EC2 instance to many thousands. Just dial up and down as required.
  19. The new model is to collect as much data as possible: a “Data-First Philosophy”. It allows us to collect data and ask questions later, and to ask many different kinds of questions.
  20. This is supported on the AWS cloud via Amazon Elastic MapReduce (EMR), its managed Hadoop service. The EMR team’s reason for living is making Hadoop, and Big Data processing, just work in the cloud. Over the last year this has led to over 2 million clusters being run on the platform by thousands of paying customers. The EMR team is also focused on ensuring that Hadoop integrates seamlessly with other AWS services, not only supporting Amazon S3 as a file system but also integrating with CloudWatch, our cloud-based monitoring service, and DynamoDB, our managed NoSQL offering.
  21. Netflix runs a persistent, SLA-driven production cluster to generate summary data and aggregate reports each day from the streaming data. The raw log data is streamed directly into the cluster from Amazon S3, with only intermediate data stored on HDFS on the cluster.
  22. The processed data is then streamed back into Amazon S3 where it is accessible by other teams including personalization/recommendation services.
  23. The processed data is then streamed back into Amazon S3 where it is accessible by other teams including personalization/recommendation services and to analysts through a real-time custom visualization tool called Sting.
  24. Netflix also uses a wide range of languages for data processing, including Pig for ETL, Hive for SQL-driven analytics, Python for streaming jobs, and Java MapReduce.
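To make "Python for streaming jobs" concrete, here is a minimal sketch of the map and reduce steps such a job might contain. It is not Netflix's code: the tab-separated `user<TAB>title` log format and all function names are assumptions for illustration, and the local runner only simulates the sort that Hadoop performs between the two phases.

```python
from itertools import groupby

def mapper(lines):
    """Emit (title, 1) per log line; assumes hypothetical 'user<TAB>title' records."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:  # skip malformed lines
            yield fields[1], 1

def reducer(pairs):
    """Sum counts per key; pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process."""
    return dict(reducer(sorted(mapper(lines))))

sample = ["u1\tMovie A\n", "u2\tMovie B\n", "u3\tMovie A\n", "bad line\n"]
print(run_locally(sample))
```

In an actual Hadoop Streaming job the mapper and reducer would be separate scripts that read records from standard input and write tab-separated key/value lines to standard output, with Hadoop sorting by key in between.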
  25. And scale is something AWS is used to dealing with. The Amazon Simple Storage Service, S3, recently passed 1 trillion objects in storage, with a peak transaction rate of 650,000 per second. That's a lot of objects, all stored with 11 nines of durability.
  26. And just like an electricity grid, where you would not wire every factory to the same power station, the AWS infrastructure is global, with multiple regions around the globe from which services are available. This means you have control over things like where your applications run, where your data is stored, and where best to serve your customers from.
  27. Based on 15 years of experience. DynamoDB originates from Dynamo, the NoSQL solution used in the ecommerce side of the business; that original NoSQL solution is described in a paper we released in 2007, which is freely available at http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html . Use the console or API to define tables; we take care of provisioning and durability. Runs on solid state disks. You define how much you wish to reserve for reads and writes. DynamoDB will reserve the necessary machine resources to meet your throughput needs while ensuring consistent, low-latency performance. Default limits can be raised.
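The "define how much you wish to reserve for reads and writes" step can be sketched as the request you would hand to the API when defining a table. The helper below only builds the request in the shape of DynamoDB's CreateTable call; the table name, key name, and capacity numbers are hypothetical, and the boto3 SDK mentioned in the comment is an assumption of this sketch, not something the talk refers to.

```python
def table_request(name, hash_key, read_units, write_units):
    """Build keyword arguments in the shape of DynamoDB's CreateTable request."""
    return {
        "TableName": name,
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"}  # string key
        ],
        "KeySchema": [{"AttributeName": hash_key, "KeyType": "HASH"}],
        # Reserved throughput: this is what the service provisions for
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

# Hypothetical table and capacity values, for illustration only
request = table_request("CheckIns", "user_id", read_units=100, write_units=50)

# With the boto3 SDK installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").create_table(**request)
print(request["ProvisionedThroughput"])
```

Nothing here touches the network; the point is only that capacity is declared up front as two numbers and the service handles the machines behind them.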
  28. Netflix streams 8 TB of data into the cloud per day. This is collected, aggregated, and pushed to Amazon S3 via a fleet of EC2 servers running Apache Chukwa.
  29. This is supplemented with legacy data, such as customer service info, from Netflix’s on-premise data center.
  30. Low latency access to customer dimension data is served from a Cassandra deployment in the cloud.
  31. How do you efficiently, and cost effectively, analyze all of that data?
  32. Global reach (North Pole, space). Native app on every smartphone, plus SMS, web, and mobile web. 10M+ users, 15M+ venues, ~1B check-ins. Terabytes of log data.
  33. At least 400,000 simulations are needed to get realistic results. Running time went from 23 hours to 20 minutes, a dramatic reduction in processing, with the ability to reduce it even further when required. Bankinter uses Amazon Web Services (AWS) as an integral part of their credit-risk simulation application, developing complex algorithms to simulate diverse scenarios in order to evaluate the financial health of their clients. “This requires high computational power,” says Bankinter Director of New Technologies Pedro Castillo. “We need to execute at least 400,000 simulations to get realistic results.”
  34. One result of such experimentation is Taste Test, a recommendations product that helps Etsy figure out your tastes and offer you relevant products. It works like this: you see 6 images at a time and pick the image you like the most. You iterate through these sets of images a few times (you can also skip a set if you don’t like any of the images), and after a few iterations Etsy displays the products that are most relevant to you. I encourage you to try it; it’s a lot of fun. Today, Etsy uses Amazon Elastic MapReduce for web log analysis and recommendation algorithms. Because AWS easily and economically processes enormous amounts of data, it’s ideal for the type of processing that Etsy performs. Etsy copies its HTTP server logs every hour to Amazon S3, and syncs snapshots of the production database on a nightly basis. The combination of Amazon’s products and Etsy’s syncing/storage operation provides substantial benefits for Etsy. As Dr. Jason Davis, lead scientist at Etsy, explains, “the computing power available with [Amazon Elastic MapReduce] allows us to run these operations over dozens or even hundreds of machines without the need for owning the hardware.” Dr. Davis goes on to say, “Amazon Elastic MapReduce enables us to focus on developing our Hadoop-based analysis stack without worrying about the underlying infrastructure. As our cycles shift between development and research, our software and analysis requirements change and expand constantly, and [Amazon Elastic MapReduce] effectively eliminates half of our scaling issues, allowing us to focus on what is most important.” Etsy has realized improved results and performance by architecting their application for the cloud, with robustness and fault tolerance in mind, while providing a market for users to buy and sell handmade items online.
  35. Another example of such innovation is gift ideas. A lot of us struggle to pick the right present for our friends, so Etsy has a product that makes it easier. Etsy looks at your Facebook social graph and learns about your interests and those of your friends. It uses this information to give you ideas for presents. For example, if your friend is an REM fan, Etsy may suggest a t-shirt with an REM print on it. These innovative data products are just a few examples of the innovation that is possible if we lower the cost barriers for data experimentation.
  36. Yelp is also doing product recommendations based on location, people's reviews, or people's searches. For example, the “people who viewed this, viewed that” feature can help customers discover other relevant options in the area. People can discover interesting facts about places with the “People viewed this after searching for that” feature. In this example, the Westin hotel probably has glass elevators and likely offers the best location to stay in San Francisco, at least by some definition of best. There is also the “review highlights” feature. Yelp analyses written reviews and provides highlights about the places, so that their customers don’t have to read through all the reviews to get a basic idea of the place. All these differentiating features were possible because of Hadoop and flexible infrastructure for data processing.
  37. 500% increase in returns for advertising. Petabytes of storage. The retail business has a lot of data about its users; it has just never used it in advertising. For example, the retailer knows that the customer has purchased a sports movie and is currently searching for video games, so it may make sense to advertise a sports video game to that customer. Efficient: Elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays. Amazon Elastic MapReduce and Cascading let Razorfish focus on application development without having to worry about time-consuming set-up, management, or tuning of Hadoop clusters or the compute capacity upon which they sit. Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms. Flexible: Hadoop with Cascading is flexible enough to allow “agile” implementation and unit testing of sophisticated algorithms. Adaptable: Cascading simplifies the integration of Hadoop with external ad systems. Scalable: AWS infrastructure helps Razorfish reliably store and process huge (petabyte-scale) data sets. The AWS elastic infrastructure platform allows Razorfish to manage wide variability in load by provisioning and removing capacity as needed. Mark Taylor, Program Director at Razorfish, said, “With our implementation of Amazon Elastic MapReduce and Cascading, there was no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired. We completed development and testing of our first client project in six weeks. Our process is completely automated. Total cost of the infrastructure averages around $13,000 per month. Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.”