The Elephant In The Room
 Big Data Analytics In the Cloud
    Bill Peer, Principal, Infosys Labs
                 UP 2012
      Cloud Computing Conference
San Francisco, California – December 12, 2012
What’s on the agenda?
•   Definitions
•   Big Data and Analytic Technologies
•   Architecture Stuff
•   Summary




Appendix : References

Definitions – Big Data
• Big Data –       data processing scenarios wherein the volume,
                   variety, and/or velocity of the data is such that
                   conventional RDBMS and/or Data Warehouse
                   technologies alone do not suffice for the need
  Bill’s Stake In The Ground:
          • Volume - Greater than 100 GB
          • Variety    - Structured and Unstructured (forms, video, blogs, photos, …)
          • Velocity - 10 GB per hour
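
The thresholds above can be read as a simple check against a workload. A minimal sketch in Python, assuming "variety" means unstructured data is present and that the velocity figure is a lower bound; the `Workload` type and the function name are hypothetical, introduced only for illustration:

```python
from dataclasses import dataclass

# Thresholds taken from the slide; how they are compared is an assumption.
VOLUME_THRESHOLD_GB = 100             # "Greater than 100 GB"
VELOCITY_THRESHOLD_GB_PER_HOUR = 10   # "10 GB per hour", read as "at least"


@dataclass
class Workload:
    volume_gb: float
    velocity_gb_per_hour: float
    has_unstructured_data: bool  # assumed stand-in for "variety"


def big_data_dimensions(workload):
    """Return which of volume, variety, velocity the workload trips."""
    hits = []
    if workload.volume_gb > VOLUME_THRESHOLD_GB:
        hits.append("volume")
    if workload.has_unstructured_data:
        hits.append("variety")
    if workload.velocity_gb_per_hour >= VELOCITY_THRESHOLD_GB_PER_HOUR:
        hits.append("velocity")
    return hits
```

Because the definition says "and/or", a non-empty result on any one dimension is enough to put the workload in Big Data territory under this reading.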




Definitions – Analytics
• Analytics –   discovery of meaningful patterns in data

  Bill’s Two Uses:
        • Decision Support (to help make a choice)
                    -Business Intelligence
                    -Operational Intelligence
        • Value Creation (to add worth)
                    -Algorithm Discovery
                    -Analytics as a Service




Past, Present, Future




(images sourced from Wikimedia Commons)




Big Data and Analytic Technologies




Big Data and Analytic Technology : 3 to Know


                                Hadoop
• Based on Google Paper published in 2004 (MapReduce)
• Can be segmented into 2 key capabilities: MapReduce and HDFS
• Designed to work in a distributed, fault-prone environment
 MapReduce – Processing Orchestration Framework
             (Great if a problem can be easily divided)
 HDFS – Hadoop Distributed File System
        (Reliable independent of persistence mechanism
        by way of multi-node replication)
 Job Based!
        Pig Latin – Language to explore data
        Hive QL – SQL like calls
        Mahout – Machine Learning collection
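
To make "great if a problem can be easily divided" concrete, here is a minimal single-process sketch of the MapReduce pattern in plain Python, using word counting as the example. It is not Hadoop's API; the function names are hypothetical, and a real job would run the map and reduce steps on many nodes with the shuffle handled by the framework:

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Each document is mapped independently and each key is reduced independently, which is what lets the work be spread across a cluster.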


Big Data and Analytic Technology : 3 to Know
                                DRILL
• Based on Google Paper published in 2010 (Dremel)
• Provides analysis of large-scale datasets
• Designed to work in a distributed environment
 Query Languages – Google BigQuery
 Low-Latency Distributed Execution – Columnar centric storage
 Apache Incubator Phase
        “[Dremel] is capable of running aggregation queries over
        trillion-row tables in seconds. The system scales to thousands
        of CPUs and petabytes of data, and has thousands of users at
        Google.” src: Google Dremel Paper
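
Why columnar storage helps aggregation queries can be illustrated with a toy sketch in Python. It is not Drill or Dremel code; the helper names are hypothetical, and it assumes every row has the same flat set of fields (Dremel itself also handles nested data):

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_by(columns, key_col, value_col):
    """Aggregate value_col grouped by key_col, touching only those two columns."""
    totals = {}
    for key, value in zip(columns[key_col], columns[value_col]):
        totals[key] = totals.get(key, 0) + value
    return totals


rows = [
    {"country": "US", "clicks": 3, "url": "/a"},
    {"country": "DE", "clicks": 2, "url": "/b"},
    {"country": "US", "clicks": 5, "url": "/c"},
]
clicks_by_country = sum_by(to_columns(rows), "country", "clicks")
```

The aggregation never reads the `url` column. On wide tables with many rows, skipping unused columns is a large part of what makes low-latency queries feasible.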

Big Data and Analytic Technology : 3 to Know
                                      Storm
• Event Streaming platform used by Twitter
• Allows for continuous real-time data spelunking
• Designed to work in a distributed environment leveraging clusters
 Resident Queries – Requests for event patterns of interest
                    are continuously watched for
 Topology Centric – You create graphs of computation
 Event Streaming is Different
        Storm can be used effectively to build a Complex Event
        Processing (CEP) capability by an enterprise. As with other
        CEP type frameworks, it requires a shift to an uncommon
        perspective to be effective.
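
The "graph of computation" and "resident query" ideas can be sketched with Python generators. This is not Storm's API (Storm topologies run on the JVM across a cluster); the spout and bolt names borrow Storm's vocabulary, while the event fields and the failed-login pattern are assumptions made up for illustration:

```python
from collections import defaultdict, deque


def spout(events):
    """Source node: emits events one at a time (stands in for a live feed)."""
    for event in events:
        yield event


def filter_bolt(stream, event_type):
    """Processing node: pass through only events of the given type."""
    for event in stream:
        if event["type"] == event_type:
            yield event


def pattern_bolt(stream, threshold, window_seconds):
    """Resident query: alert when one user has `threshold` events
    inside a sliding window of `window_seconds`."""
    recent = defaultdict(deque)
    for event in stream:
        times = recent[event["user"]]
        times.append(event["ts"])
        # Drop timestamps that have slid out of the window.
        while times[0] <= event["ts"] - window_seconds:
            times.popleft()
        if len(times) >= threshold:
            yield {"alert": "repeated_failure", "user": event["user"], "ts": event["ts"]}
            times.clear()


def build_topology(events, threshold=3, window_seconds=10):
    """Wire the nodes into a linear graph: spout -> filter -> pattern."""
    failures = filter_bolt(spout(events), "login_failed")
    return pattern_bolt(failures, threshold, window_seconds)
```

The shift in perspective is visible here: the query is set up once and sits in the stream, producing alerts as matching events arrive, instead of being run on demand against stored data.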

A Cloud Centric Big Data NRT Architecture
[Architecture diagram. Component labels: CEP, Interactive Query. Zones, left to right: Not Cloud | In Cloud | Not Cloud]
*Architecture graphic is a modified version of WSO2’s BAM picture

Big, Big Data Analytic Architecture Consideration
• Data Transfer Speed
   • Where is your data? Is it where you will be processing?
       • 1 TB of data takes roughly:
           • 300 hours over a 10Mbps network
           • 30 hours over a 100Mbps network
           • 3 hours over a 1Gbps network
           • 20 minutes over a 10Gbps network
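
The transfer times above can be sanity-checked with a short calculation. A minimal sketch in Python, assuming 1 TB = 10^12 bytes; the `efficiency` parameter (the fraction of the nominal link rate actually achieved) is an assumption introduced for illustration, not something the slide states:

```python
ONE_TB_BYTES = 10**12  # assumption: decimal terabyte

LINKS_BITS_PER_SECOND = {
    "10 Mbps": 10e6,
    "100 Mbps": 100e6,
    "1 Gbps": 1e9,
    "10 Gbps": 10e9,
}


def transfer_hours(size_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move size_bytes over a link at the given efficiency."""
    effective_rate = link_bits_per_second * efficiency
    return size_bytes * 8 / effective_rate / 3600


def transfer_table(size_bytes=ONE_TB_BYTES, efficiency=1.0):
    """Transfer time in hours for each link speed listed on the slide."""
    return {
        label: transfer_hours(size_bytes, rate, efficiency)
        for label, rate in LINKS_BITS_PER_SECOND.items()
    }
```

At full line rate the 10 Mbps case comes out near 222 hours. With an assumed efficiency of 0.75 it is about 296 hours, close to the slide's round figure of 300, so the slide's numbers look like practical estimates, not theoretical minimums. Either way, the order of magnitude is the point: where the data sits relative to the processing matters.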




Framework For Selecting Approach




Summary
•   “approaches for near-real time Business Intelligence and Analytics”
•   “Info. on technologies ranging from Hadoop to Dremel to Event Streaming”
•   “applicability and limitations of these when in the Cloud”
•   “high-level architectures that must be considered will be shared”
•   “entertained, energized, and enlightened”
•   “realistic frame of reference to bring back to their organization”
•   “Journey to the Clouds”
•   “Dumbo can really fly”



Feedback Forms
Please extract from your wallet
One of the feedback forms to the right

Add any commentary you have in the
White space, and hand to the
Presenter after the session


Thank you for attending!
See you in the Clouds!




References
• Big Data Spectrum, Infosys
         http://www.infosys.com/cloud/resource-center/Pages/big-data-spectrum.aspx
• Dremel: Interactive Analysis of Web-Scale Datasets, Melnik et al., Google
         http://research.google.com/pubs/pub36632.html
• DrillProposal, Apache
         http://wiki.apache.org/incubator/DrillProposal
• Storm Rationale
         https://github.com/nathanmarz/storm/wiki/Rationale
• WSO2 BAM, wso2
         http://wso2.com/products/business-activity-monitor/


