Wednesday, July 14, 2010
Evolving a New Analytical Platform
         What Works and What’s Missing


         Jeff Hammerbacher
         Chief Scientist, Cloudera
         July 14, 2010



My Background
         Thanks for Asking
         ▪   hammer@cloudera.com
         ▪   Studied Mathematics at Harvard
         ▪   Worked as a Quant on Wall Street
         ▪   Conceived, built, and led Data team at Facebook
             ▪   Nearly 30 amazing engineers and data scientists
             ▪   Several open source projects and research papers
         ▪   Founder of Cloudera
             ▪   Chief Scientist
             ▪   Also, check out the book “Beautiful Data”

Presentation Outline
         ▪   1. Defining the Platform
             ▪   BI: Science for Profit
             ▪   Need tools for whole research cycle
             ▪   SQL Server 2008 R2: defining the platform
         ▪   2. State of the Platform Ecosystem
         ▪   3. Foundations for a New Implementation
             ▪   Hadoop
             ▪   Boiling the Frog
         ▪   4. Future Developments
         ▪   Questions and Discussion


1. Defining the Platform




BI is looking more like science (for profit)




Jim Gray: Science entering Fourth Paradigm
         “We have to do better at producing tools to support the whole research cycle”




RDBMS only a small part of this tool set




Example: SQL Server 2008 R2




Collaboration: SharePoint
MDM: Master Data Services
CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot


What do we call this unified suite?




For today: Analytical Data Platform
         LAMP Stack for Analytical Data Management




2. The State of the Platform Ecosystem




Who makes up the platform ecosystem?




Content Providers
Infrastructure Providers
Platform Providers
Application Developers
End Users




What is new about the ecosystem today?




Content Providers
         1. >95% of enterprise data is unstructured
         2. Data volumes growing rapidly




Infrastructure Providers
         1. Cloud
         2. Warehouse-Scale Computers




Platform Providers
         1. Open source
         2. Driven by consumer web properties




Application Developers
         1. Data Scientists
         2. Diversity of languages




End Users
         1. Browser is the client
         2. Tell a story about the business




3. Foundations for a New Implementation




New foundations: HDFS and MapReduce




2005: Doug/Mike start project inside Nutch




2006: Doug joins Yahoo!




2007: Make Hadoop scale
         ▪   Randy Bryant’s “DISC” lecture
         ▪   Jim Gray’s “Fourth Paradigm” lecture
         ▪   Yahoo! makes Pig open source
         ▪   Powerset makes HBase open source




2008: Make Hadoop fast
         ▪   “MapReduce: A Major Step Backwards”
         ▪   Facebook makes Hive open source
         ▪   First Hadoop Summit
         ▪   Yahoo! wins Daytona terabyte sort benchmark
         ▪   Yahoo! builds production webmap with Hadoop




2009: Insert Hadoop into the enterprise
         ▪   “The Unreasonable Effectiveness of Data”
         ▪   Yahoo! sorts a petabyte with Hadoop
         ▪   First Hadoop World NYC
         ▪   Cloudera releases CDH
         ▪   Cloudera adds training, support, services




2010: Integrate Hadoop into the enterprise
         ▪   Hive adds JDBC and ODBC
         ▪   Quest, Talend, Netezza, and more integrate
         ▪   Yahoo! completes enterprise-class security
         ▪   IBM announces InfoSphere BigInsights
         ▪   Datameer and Karmasphere funded




Hadoop will be an Analytical Data Platform




4. Future Developments




Capture: Log collection and CEP




Curate: Workflow and Scheduling




Curate: Secondary and Full-Text Indexing




Curate: Learn Structure from Data




Analyze: Mesos-enabled frameworks




Analyze: Link working set and historical data




All behind a single user interface




HUE
         Making Many Computers Feel Like One




Cloudera’s Distribution for Hadoop, Version 3, is the enterprise open source platform for complex data
         ▪   Integrated: all components & functions interoperate with one another through standard APIs
         ▪   Simplified: Cloudera manages required component versions & dependencies
         ▪   Open source: 100% Apache licensed
         ▪   Reliable: patched with fixes from future releases to improve stability
         ▪   Supported: Cloudera employs [percentage illegible] of the project founders and at least one committer for [percentage illegible] of these open source components


(c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0



