SlideShare une entreprise Scribd logo
1  sur  9
Given Enough Monkeys
Some Thoughts on Randomness
Jesse Anderson |   CLOUDERA, INSTRUCTOR
Infinite Monkey Theorem




2
Million Monkeys Algorithm


      Randomly generate a 9 character group

                    TOBEORNOT



          Does it exist in Shakespeare?
       To be, or not to be- that is the question




3
Exponential Growth (aka Big Data)


     Odds of finding a group    Contiguous
                                              Combinations
     of characters is 1 in 26   Characters
     raised to the power of
          the number of             8           208,827,064,576
     contiguous characters
                                    9          5,429,503,678,976

                                   10        141,167,095,653,376




4
Data Bias?




5
Hadoop Scalability
                               Percent of Linear Scalability
              100

              80
    Percent




              60                                                               RDBMS
                                                                               Hadoop
              40

              20

                0
                    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                                      Nodes                        RDBMS = Relational Database

6
Business Value of Scalability


        Scaling does not require    Adding more computers
         massive re-engineering         to cluster gets a
        and complete rewrites of     predictable increase in
                  code             computational power and
                                             storage

         SAVE                        SAVE




7
Going Viral (and taking over the world)


      Covered internationally        26,000 unique
      in BBC, Wall Street            visits from 119
      Journal, Wired and             countries in
      Slashdot                       one day




8
@jessetanderson

Contenu connexe

Similaire à Strata 2012 Million Monkeys

Architecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big DataArchitecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big DataRichard McDougall
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作James Chen
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarAmazon Web Services
 
Big Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesBig Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesAmazon Web Services
 
ScimoreDB @ CommunityDays 2011
ScimoreDB @ CommunityDays 2011ScimoreDB @ CommunityDays 2011
ScimoreDB @ CommunityDays 2011scimore
 
Scimore CommunityDays 2011
Scimore CommunityDays 2011Scimore CommunityDays 2011
Scimore CommunityDays 2011scimore
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavasrisatish ambati
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationShanley Kane
 
Prepare Your Data For The Cloud
Prepare Your Data For The CloudPrepare Your Data For The Cloud
Prepare Your Data For The CloudIndicThreads
 
DeepImage_GTC15_public
DeepImage_GTC15_publicDeepImage_GTC15_public
DeepImage_GTC15_publicRen Wu
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Sybase Türkiye
 

Similaire à Strata 2012 Million Monkeys (20)

Architecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big DataArchitecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big Data
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
Cassandra
CassandraCassandra
Cassandra
 
Big Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace WebinarBig Data Analytics with AWS and AWS Marketplace Webinar
Big Data Analytics with AWS and AWS Marketplace Webinar
 
Big Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web ServicesBig Data Analytics with Amazon Web Services
Big Data Analytics with Amazon Web Services
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
 
ScimoreDB @ CommunityDays 2011
ScimoreDB @ CommunityDays 2011ScimoreDB @ CommunityDays 2011
ScimoreDB @ CommunityDays 2011
 
Scimore CommunityDays 2011
Scimore CommunityDays 2011Scimore CommunityDays 2011
Scimore CommunityDays 2011
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjava
 
Spark
SparkSpark
Spark
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 Presentation
 
No Sql
No SqlNo Sql
No Sql
 
Prepare Your Data For The Cloud
Prepare Your Data For The CloudPrepare Your Data For The Cloud
Prepare Your Data For The Cloud
 
Preparing your data for the cloud
Preparing your data for the cloudPreparing your data for the cloud
Preparing your data for the cloud
 
DeepImage_GTC15_public
DeepImage_GTC15_publicDeepImage_GTC15_public
DeepImage_GTC15_public
 
Brisk hadoop june2011
Brisk hadoop june2011Brisk hadoop june2011
Brisk hadoop june2011
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
 

Plus de Jesse Anderson

Managing Real-Time Data Teams
Managing Real-Time Data TeamsManaging Real-Time Data Teams
Managing Real-Time Data TeamsJesse Anderson
 
Pulsar for Kafka People
Pulsar for Kafka PeoplePulsar for Kafka People
Pulsar for Kafka PeopleJesse Anderson
 
Big Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 EraBig Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 EraJesse Anderson
 
Working Together As Data Teams V1
Working Together As Data Teams V1Working Together As Data Teams V1
Working Together As Data Teams V1Jesse Anderson
 
What Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and WhyWhat Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and WhyJesse Anderson
 
The Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering TeamThe Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering TeamJesse Anderson
 
HBaseCon 2014-Just the Basics
HBaseCon 2014-Just the BasicsHBaseCon 2014-Just the Basics
HBaseCon 2014-Just the BasicsJesse Anderson
 
EC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR ScalabilityEC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR ScalabilityJesse Anderson
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular ExpressionsJesse Anderson
 
Introduction to Android
Introduction to AndroidIntroduction to Android
Introduction to AndroidJesse Anderson
 

Plus de Jesse Anderson (12)

Managing Real-Time Data Teams
Managing Real-Time Data TeamsManaging Real-Time Data Teams
Managing Real-Time Data Teams
 
Pulsar for Kafka People
Pulsar for Kafka PeoplePulsar for Kafka People
Pulsar for Kafka People
 
Big Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 EraBig Data and Analytics in the COVID-19 Era
Big Data and Analytics in the COVID-19 Era
 
Working Together As Data Teams V1
Working Together As Data Teams V1Working Together As Data Teams V1
Working Together As Data Teams V1
 
What Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and WhyWhat Does an Exec Need to About Architecture and Why
What Does an Exec Need to About Architecture and Why
 
The Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering TeamThe Five Dysfunctions of a Data Engineering Team
The Five Dysfunctions of a Data Engineering Team
 
HBaseCon 2014-Just the Basics
HBaseCon 2014-Just the BasicsHBaseCon 2014-Just the Basics
HBaseCon 2014-Just the Basics
 
EC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR ScalabilityEC2 Performance, Spot Instance ROI and EMR Scalability
EC2 Performance, Spot Instance ROI and EMR Scalability
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular Expressions
 
Why Use MVC?
Why Use MVC?Why Use MVC?
Why Use MVC?
 
How to Use MVC
How to Use MVCHow to Use MVC
How to Use MVC
 
Introduction to Android
Introduction to AndroidIntroduction to Android
Introduction to Android
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Dernier (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Strata 2012 Million Monkeys

  • 1. Given Enough Monkeys Some Thoughts on Randomness Jesse Anderson | CLOUDERA, INSTRUCTOR
  • 3. Million Monkeys Algorithm Randomly generate a 9 character group TOBEORNOT Does it exist in Shakespeare? To be, or not to be- that is the question 3
  • 4. Exponential Growth (aka Big Data) Odds of finding a group Contiguous Combinations of characters is 1 in 26 Characters raised to the power of the number of 8 208,827,064,576 contiguous characters 9 5,429,503,678,976 10 141,167,095,653,376 4
  • 6. Hadoop Scalability Percent of Linear Scalability 100 80 Percent 60 RDBMS Hadoop 40 20 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Nodes RDBMS = Relational Database 6
  • 7. Business Value of Scalability Scaling does not require Adding more computers massive re-engineering to cluster gets a and complete rewrites of predictable increase in code computational power and storage SAVE SAVE 7
  • 8. Going Viral (and taking over the world) Covered internationally 26,000 unique in BBC, Wall Street visits from 119 Journal, Wired and countries in Slashdot one day 8

Notes de l'éditeur

  1. Interesting statistical question. Thought about since Aristotle.Randomness+Resouces+Time=AnythingPossibleNo real monkeys – need virtual monkeys
  2. Lucky monkeyThe monkey wears a lot of hats. He generates and then compares.Every work of Shakespeare created. First was A Lover’s Complaint and last was Taming of the ShrewVisualization to find your favorite line from Shakespeare
  3. Shakespeare lazy. Heavily influenced English Literature.Big Data isn’t always a huge file. It can be high computation.
  4. Creating Shakespeare not a business. Don’t have Shakespeare in your data.If you look hard enough you will find itHumans are not randomYou want to be looking for what’s actually there. Check your assumptionsOperate with scientific method. Form a hypothesis. Test hypothesis against data.Offer what customers are looking for. Not what you think or favorite or new product. Only what your data shows.
  5. This is not a map of MT and ID1 to 20 node testingKeep efficiency up RDBMS efficiency in gutter
  6. Engineers not spending time coding to scale. Busy adding new features.No code changes for scaling. Took 1.5 months on one computer and 3.5 days on 20 nodesSpending on new computers gives a consistent, linear increase. Compare spending on RDBMS and Hadoop.
  7. We like to ask bigger questions.I asked if Shakespeare could be randomly recreated by a bunch of virtual monkeys? The answer is yes.