Big Data & Cloud

Infinite Monkey Theorem

  CloudCon Expo & Conference
        October, 2012
First
What is Big Data?

“data sets so large and complex that it becomes
difficult to process using on-hand database
management tools.”


10/19/2012          Infochimps Confidential       2
Data Volume
Growing 44x

2010 = 1.2 zettabytes/yr
2020 = 35.2 zettabytes/yr

                                                          Source: 2011 IDC Digital Universe Study
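
A quick arithmetic check of the figures on this slide, as a minimal sketch. It uses only the three numbers printed above; the variable names are illustrative. Taken at face value, the two yearly volumes give a multiple closer to 29x than 44x, and a 44x multiple ending at 35.2 zettabytes implies a starting volume of about 0.8 zettabytes, so the headline multiplier may have been measured from an earlier baseline than 2010.

```python
# Figures as printed on the slide (zettabytes per year).
ZB_2010 = 1.2
ZB_2020 = 35.2
HEADLINE_MULTIPLE = 44

# Ratio of the two printed values.
growth_2010_to_2020 = ZB_2020 / ZB_2010

# Starting volume that would be needed for a 44x increase to reach 35.2 ZB.
implied_baseline_zb = ZB_2020 / HEADLINE_MULTIPLE

# Compound annual growth rate over the 10 years from 2010 to 2020.
annual_growth = growth_2010_to_2020 ** (1 / 10) - 1

print(f"2010 -> 2020 multiple: {growth_2010_to_2020:.1f}x")
print(f"Baseline implied by 44x: {implied_baseline_zb:.1f} ZB")
print(f"Compound annual growth: {annual_growth:.1%}")
```

Either reading points to the same conclusion for the talk: volume grows by roughly 40% a year or more, compounding.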
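Enterprise Data Warehouse
[Diagram: a Request (?) enters the Parsing Engines and an Answer is returned; a BYNET Interconnect links the Parsing Engines to a row of Amp Nodes (Amp Node, Amp Node, ..., Amp Node).]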
                                                          PARC | 4
Big Data Warehouse
[Diagram: an Analytic Request (Search, Recommend, Rank, Score, Next-Best-Action) goes to the Master (Name Node, Job Tracker), which returns the Answer. An Ethernet Interconnect links the Master to a row of Slaves, each running a Task Tracker and a Data Node, over Semi-Structured Data.]



                                                                                             PARC | 5
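
The master/slave layout on this slide is the shape of a MapReduce job: a coordinator hands out map work on local splits, then combines the results. The following is a minimal single-process sketch of that idea in plain Python, not the Hadoop API. The function names, the `SPLITS` data, and the word-count task are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical input splits: each inner list stands in for the block held by one data node.
SPLITS = [
    ["search rank score", "rank recommend"],
    ["score score search"],
    ["recommend rank"],
]

def map_task(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Work done on one split (the task-tracker role): emit (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as happens between the map and reduce phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_task(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)

def run_job(splits: list[list[str]]) -> dict[str, int]:
    """Coordinate the phases (the job-tracker role); sequential here, parallel in a real cluster."""
    mapped = [pair for split in splits for pair in map_task(split)]
    return dict(reduce_task(key, values) for key, values in shuffle(mapped).items())

counts = run_job(SPLITS)
print(counts)
```

In a real cluster each `map_task` call would run on the node that already stores its split, which is why the diagram pairs a Task Tracker with a Data Node on every slave.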
[Chart: vertical axis from Batch to Real Time; horizontal axis from Large Enterprise to Small Enterprise. Labels: Traditional Operational; Application Ecosystem; Deployment in Public/Private Cloud; Analytic Appliances; Toolset Integration; Traditional Decision Support; Hardened.]
Next
Infinite Monkey Theorem (2):

an infinite number of monkeys hitting
keys on a typewriter for a period of time
will almost surely type a given text, such
as Shakespeare's Hamlet.
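
To make "almost surely" concrete, here is a small worked calculation, offered as a sketch under simplifying assumptions: a 26-key typewriter, every key equally likely, and the six-letter word "hamlet" standing in for the full play. None of these parameters come from the slide. Each attempt is one block of six random keystrokes, and attempts are treated as independent.

```python
import math

ALPHABET_SIZE = 26   # assumption: 26 equally likely keys
TARGET = "hamlet"    # assumption: a short stand-in for the full text

def p_single_attempt(target: str, keys: int = ALPHABET_SIZE) -> float:
    """Probability that one block of len(target) random keystrokes equals target."""
    return (1.0 / keys) ** len(target)

def p_at_least_one(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(attempts * math.log1p(-p))

p = p_single_attempt(TARGET)
for attempts in (10**6, 10**9, 10**12):
    print(f"{attempts:>17,d} attempts -> {p_at_least_one(p, attempts):.6f}")
```

The success probability climbs toward 1 as the number of attempts grows, which is the sense in which more monkeys (or more compute) make the outcome "almost sure".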

“unexperienced and unobservable”
based on
“real experiences and real observations”



Infinite Monkey Theorem (2):

an infinite number of monkeys hitting keys
on a typewriter for a period of time will
almost surely type a given text, such as
Shakespeare's Hamlet.
infinite number of monkeys  ->  unlimited computational power
keys on a typewriter        ->  processing data
almost surely               ->  statistically significant
Shakespeare's Hamlet        ->  insights
#thisischimpy




Problem
“Little Data For Business Users”
“Big Data For Business Users”
Reduce Friction
[Diagram labels: Data; Executive; $ $ $ $; ?]
#thisisreallygood




unlimited computational power
[Diagram labels: Public; Private; Virtual Private]
Analysts use these images to count shipping containers coming off ships in California and are able to get a sense of overall US import activity.
data processing
[Diagram labels: Public; Private; Virtual Private]
Walmart




Target




[Diagram labels, grouped as on the slide:
Data sources: Images; Docs, Text; Web Logs; Social; Sensors; GPS
Business Transactions & Interactions: Web, Mobile, CRM, ERP, SCM…
Data stores: SQL, NoSQL, NewSQL; EDW, MPP, NewSQL
Business Intelligence & Analytics: Dashboards, Reports, Visualization…]
statistically significant
[Diagram labels: Public; Private; Virtual Private]
#lotsofdata   + #simplealgorithms




[Diagram: inputs (Cars In Lot; News Text; Web Pricing; Social Sentiment; Weather Sensors; Local Employment) feeding a Quarterly Revenue Prediction.]
insights
[Diagram labels: Public; Private; Virtual Private]
[Diagram labels, grouped as on the slide:
New Media: Gnip Powertrack; Gnip EDC; Moreover Metabase
Traditional Media: TV Transcription; Radio Transcription; Print Transcription
Platform: In-Motion Data Delivery Service; NoSQL; APIs
Application: Listening Application (Sources, Sentiment)
Roles: Data Scientist; App Developer; IT Staff; Business Users]
unlimited computational power
processing data
statistically significant
insights
#1BigDataCloudService



#inspiredbyAvinashKaushik






Big Data & Cloud Expo

  • 1. Big Data & Cloud Infinite Monkey Theorem CloudCon Expo & Conference October, 2012
  • 2. First What is Big Data? “data sets so large and complex that it becomes difficult to process using on-hand database management tools.” 10/19/2012 Infochimps Confidential 2
  • 3. Data Volume Growing 44x 2010 = 1.2 2020 = 35.2 Zettabytes/yr Zettabytes/yr Source: 2011 IDC Digital Universe Study 10/19/2012 Infochimps Confidential 3
  • 4. Enterprise Data Warehouse Request Answer Parsing ? Engines BYNET Interconnect Amp Amp Amp Node Node Node .... PARC | 4
  • 5. Big Data Warehouse Search Recommend Rank Analytic Request Master: Answer Score Next-Best-Action Name Node Job Tracker Ethernet Interconnect Slave: Slave: Slave: Task Trckr Task Trckr Task Trckr Data Node Data Node Data Node Semi- .... Structured Data PARC | 5
  • 6. Real Time Traditional Operational Application Ecosystem Deployment in Analytic Public/Private Cloud Appliances Toolset Integration Traditional Decision Support Hardened Batch Large Small Enterprise Enterprise
  • 7. Next Infinite Monkey Theorem (2): an infinite number of monkeys hitting keys on a typewriter for a period of time will almost surely type a given text, such as Shakespeare's Hamlet.
  • 8. “unexperienced and unobservable” based on “real experiences and real observations”
  • 9. Infinite Monkey Theorem (2): an infinite number of monkeys hitting keys on a typewriter for a period of time will almost surely type a given text, such as Shakespeare's Hamlet.
  • 10. infinite number of monkeys = unlimited computational power; keys on a typewriter = processing data; almost surely = statistically significant; Shakespeare's Hamlet = insights
  • 11. #thisischimpy
  • 12. Problem: “Little Data For Business Users”
  • 13.
  • 14.
  • 15. “Big Data For Business Users”
  • 16. Reduce Friction $ $ $ $ ? Executive Data
  • 17. #thisisreallygood
  • 18. unlimited computational power: Public, Private, Virtual Private
  • 19. analysts use these images to count shipping containers coming off ships in California and are able to get a sense of overall US import activity
  • 20. data processing: Public, Private, Virtual Private
  • 21. Walmart
  • 22. Target
  • 23. Images Web, Mobile, CRM, ERP, SCM… Business Docs, Transactions & Text Interactions Web Logs SQL NoSQL NewSQL Social EDW MPP NewSQL Sensors Business Intelligence & Analytics Dashboards, Reports GPS Visualization…
  • 24. statistically significant: Public, Private, Virtual Private
  • 25. #lotsofdata + #simplealgorithms
  • 26. Cars In Lot News Text Web Pricing Quarterly Revenue Prediction Social Sentiment Weather Sensors Local Employment
  • 27. insights: Public, Private, Virtual Private
  • 28. New Media Data Scientist App Developer Gnip Powertrack Business Users Gnip EDC Sources Sentiment Moreover Metabase In-Motion Data Delivery APIs Listening Service Application TV Transcription NoSQL Radio Transcription Print Transcription IT Staff Traditional Media
  • 29. unlimited computational power; processing data; statistically significant; insights
  • 30. #1BigDataCloudService
  • 31. #inspiredbyAvinashKaushik

Editor's notes

  1. Avinash Kaushik gave a talk at Strata 2012 in Santa Clara in March. If you listen to all the hype around Big Data, it solves for the first problem. If you listen to all the vendors, there is a lot of emphasis on the first part (perhaps Infochimps included), and very little on the second. I think that's because we don't exactly know how to truly empower the organization to interact directly with any and all data available. It's too expensive, risky, and complex.
  2. 40%+ YoY growth, with 2012 alone generating 2.4 zettabytes. http://jameskaskade.com/?p=2040 http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
  3. AMP: Access Module Processor. PE: Parsing Engine. BYNET: Banyan Cross-bar Switch YNET (Y Network). Store: The Parsing Engine dispatches a request to insert a row. The BYNET ensures that the row gets to the appropriate AMP via the hashing algorithm. The AMP stores the row on its associated disk. Each AMP can have multiple physical disks associated with it. Retrieve: The Parsing Engine dispatches a request to retrieve one or more rows. The BYNET ensures that the appropriate AMP(s) are activated. The AMPs locate and retrieve the desired rows in parallel and will sort, aggregate or format them if needed. The BYNET returns the retrieved rows to the Parsing Engine. The Parsing Engine returns the row(s) to the requesting client application. Teradata's shared-nothing architecture allows for highly scalable data volumes.
  4. 3-node Hadoop system: $8K/node; $10K switch; $4K/node Hadoop distro. $24K + $10K x 25% x 3 maintenance = $43K; $4K x 3 x 3 = $36K; Total =
     There are three essential elements of an analytic platform: (1) Strong support for analytic database query, with a variety of query styles: at a minimum, SQL, MDX or graph. (2) Strong support for analytic processes other than queries. Typically these would be in the areas of mathematics (statistics, predictive analytics, data mining, linear algebra, optimization, graph theory, etc.) and/or data transformation (e.g. sessionization, entity extraction). (3) Strong integration between the first two. The point is that an analytic platform is something on which you can build a range of powerful analytic applications. Some specifics of what to look for in an analytic platform may be found at the links below.
     http://www.dbms2.com/2011/02/24/analytic-platforms/
     http://www.dbms2.com/2011/01/18/architectural-options-for-analytic-database-management-systems/
     Enterprise data warehouse (full or partial). Kinds of data likely to be included: All, but especially operational. Likely use styles: All. Canonical example: Central EDW for a big enterprise. Stresses: Concurrency, reliability, workload management. Classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server.
     Traditional data mart. Kinds of data likely to be included: All. Likely use styles: Business intelligence, budgeting/consolidation, investigative. Examples: Reporting servers, planning/consolidation servers, anything MOLAP, etc. Stresses: Performance, concurrency, TCO. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them, e.g. Sybase IQ and Vertica, have excellent track records in concurrent usage as well.
     Investigative data mart, agile. Kinds of data likely to be included: All, especially customer-centric. Likely use styles: Investigative. Canonical example: A few analysts getting a few TB to examine. Stresses: Ease of setup/load, ease of admin, price/performance. Infobright is often cost-effective among columnar analytic DBMS.
     Investigative data mart, big. Kinds of data likely to be included: All, especially customer-centric, logs, financial trade, scientific. Likely use styles: Investigative. Canonical example: Single-subject 20 TB – 20 PB relational database. Stresses: Performance, scale-out, analytic functionality. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum.
     Bit bucket (Hadoop). Kinds of data likely to be included: Logs, other technical/external. Likely use styles: Staging/ETL, investigative. Canonical example: Log files in a Hadoop cluster. Stresses: TCO, scale-out, transform/big-query performance, ETL functionality.
     Archival data store. Kinds of data likely to be included: Operational, CDR (call detail record), security log. Likely use styles: Archival, reporting (for compliance), possibly also investigative. Examples: Any long-term detailed historical store. Stresses: TCO, compression, scale-out, performance (if multi-use). Perhaps only Rainstor truly embraces the archival positioning.
     Outsourced data mart. Kinds of data likely to be included: All. Likely use styles: Traditional BI, investigative analytics, staging/ETL. Examples: Advertising tracking, SaaS CRM. Stresses: Performance, TCO, reliability, concurrency. Oracle shops = Vertica gets the nod in a number of these cases.
     Operational analytic(s) server. Kinds of data likely to be included: Customer-centric, log, financial trade. Likely use styles: Advanced operational analytics. Examples: Lower latency: Web or call-center personalization, anti-fraud. Higher latency: Customer profiling, Basel 3 risk analysis. Stresses: Performance, reliability, analytic functionality, perhaps concurrency.
     http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/
  5. Being the CEO of Infochimps, I felt compelled to share a little “chimpy” research with you. The “Infinite Monkey Theorem” is a METAPHOR that directly relates to Big Data, and one I think you'll appreciate. So what is the “Infinite Monkey Theorem”? The following definition is a variant of the original theorem; let me read it to you. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations.
  6. This theorem has been traced back to Aristotle's “On Generation and Corruption”, where he makes deductions about the unexperienced and unobservable based on real experiences and real observations. Think about this a little: we're talking about analyzing real-world experiences and observations to predict what will happen with our business in the future, that is, the unexperienced and unobserved. This is fundamentally what Big Data proposes to help with.
  7. So, as a metaphor, the "monkey" is not an actual monkey but an abstract device that produces a sequence of letters and symbols. And "almost surely" is a mathematical term with a precise meaning. Shakespeare's Hamlet also represents a broader meaning: it represents any text, any work, any insight.
  8. So let's look at this in more depth. Infinite number of monkeys -> represents today's seemingly unlimited computational power of either public or private clouds, as an elastic delivery method. Keys on a typewriter -> capture discrete transactions, which yield meaning only when analyzed together. Again, we amass the computational power to process data. Almost surely -> translates into a mathematical term, namely the concept of significance. And finally, Shakespeare's Hamlet is what we strive to create; it is the source of our happiness, our translation of this raw resource into insight.
  9. Now this may seem “chimpy”, but this is beautiful. I love this metaphor. But we have a LARGE problem.
  10. We have a problem today WITH our data infrastructure: our ability to glean insights. I think all of you know what I'm referring to. It's the fact that we're operating on less than 15% of the corporate data available to us, even with the ENTERPRISE DATA WAREHOUSE, the EDW, which is supposedly storing a COMPLETE, SINGLE VIEW OF THE TRUTH. We're still giving our business users a tiny bit, a little bit, of data.
  11. The Business User
  12. The Business User
  13. The Business User
  14. So why is an elastic, unlimited computational resource important? Op-Ex vs. Cap-Ex. Cost reduction due to better utilization / productivity. Time-to-market.
  15. Hedge funds and Wall Street firms are using Cold War-style satellite surveillance to gather market-moving information. The Port of Long Beach is the second-busiest container port in the United States and acts as a major gateway for trade between the US and Asia. With the activity from this port estimated at over $100 billion per year, it is a location that pays to keep track of.

Satellite analysts use these images to count shipping containers coming off ships in California and are able to get a sense of overall US import activity, comparing activity month by month. This analysis is being performed in Amazon's EC2.
  16. Now let's talk about processing your enterprise data assets, your Big Data. Again, we can leverage the cloud infrastructure to scale to the level of any processing needs you may have.
  17. The current image shows a Walmart in Wichita, Kansas. Analysts count cars in Wal-Mart parking lots to measure overall customer traffic and understand its growth versus its competition. For example, Wal-Mart's growth was determined to come mostly from areas of high unemployment. This type of analysis is being performed in Amazon's EC2.
  18. The current image shows a Target in the Moraine Point Plaza located in Gardiner, North. Analysts comparing satellite parking lot data with regional unemployment trends found Target's growth tended to come in areas of lower-than-average unemployment.

Again, these processes are being performed in Amazon EC2. This is interesting, but how do we process the data further to help derive more relevant insights? http://www.cnbc.com/id/38738810/Spying_For_Profits_The_Satellite_Image_Indicator
  19. The way this is performed is by taking data sources like images and storing them in Hadoop, then using Big Data tools like MapReduce to perform sophisticated analysis on those aggregated data sets. Why is this concept so disruptive? Things like a fraction of the price and no structured data model (aka no star schema), yet the ability to run sophisticated queries and algorithms against all your detailed data.
  20. The Business User
  21. The previous examples of Walmart and Target involved using a regression algorithm which was executed against the satellite data + other data to produce a quarterly revenue prediction which BEAT all previous models.
  22. Which brings us to the discussion around insights.
  23. Quote that sets the theme: the definition of the “Infinite Monkey Theorem”.
  24. The Business User
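
Note 3 above says the BYNET gets each row to the appropriate AMP "via the hashing algorithm". As a rough illustration of that idea only, the Python sketch below routes rows to a small set of nodes by hashing the row key, so that an insert and a later retrieve for the same key land on the same node. The names `amp_for_key`, `ToyCluster`, and `NUM_AMPS` are hypothetical, and the MD5-modulo mapping is an assumption chosen for simplicity; it is not Teradata's actual hashing scheme.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 3  # hypothetical cluster size, chosen only for illustration


def amp_for_key(primary_key: str, num_amps: int = NUM_AMPS) -> int:
    """Map a row key to a node index using a stable hash (illustrative assumption)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an integer, then fold it onto the nodes.
    return int.from_bytes(digest[:4], "big") % num_amps


class ToyCluster:
    """A toy shared-nothing store: each node keeps only the rows hashed to it."""

    def __init__(self, num_amps: int = NUM_AMPS):
        self.num_amps = num_amps
        self.amps = defaultdict(dict)  # node index -> {key: row}

    def insert(self, key: str, row: dict) -> int:
        """Store the row on the node chosen by the hash and return that node's index."""
        amp = amp_for_key(key, self.num_amps)
        self.amps[amp][key] = row
        return amp

    def retrieve(self, key: str):
        """Look up the row on the one node the key hashes to; None if absent."""
        return self.amps[amp_for_key(key, self.num_amps)].get(key)
```

Because the node is derived from the key alone, no central index is needed to find a row again, which is the property the shared-nothing description in note 3 relies on.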