This document is a presentation given by Jeff Hammerbacher on July 14, 2010, about evolving a new analytical platform. It outlines the history and evolution of Hadoop and argues that Hadoop is becoming an analytical data platform that can support the entire data analysis process. It discusses Hadoop's development at Yahoo! and how its ecosystem has expanded to include many contributors and a variety of related projects. It also looks at future developments for Hadoop, including better support for data capture, curation, and analysis, and a unified user interface.
2. Evolving a New Analytical Platform
What Works and What’s Missing
Jeff Hammerbacher
Chief Scientist, Cloudera
July 14, 2010
Wednesday, July 14, 2010
3. My Background
Thanks for Asking
▪ hammer@cloudera.com
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”
4. Presentation Outline
▪ 1. Defining the Platform
▪ BI: Science for Profit
▪ Need tools for whole research cycle
▪ SQL Server 2008 R2: defining the platform
▪ 2. State of the Platform Ecosystem
▪ 3. Foundations for a New Implementation
▪ Hadoop
▪ Boiling the Frog
▪ 4. Future Developments
▪ Questions and Discussion
11. ETL: SQL Server Integration Services
RDBMS: SQL Server
12. ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
13. ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
14. ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
15. CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
16. CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot
17. MDM: Master Data Services
CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot
18. Collaboration: SharePoint
MDM: Master Data Services
CEP: StreamInsight
ETL: SQL Server Integration Services
RDBMS: SQL Server
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
OLAP: PowerPivot
19. What do we call this unified suite?
40. 2007: Make Hadoop scale
Yahoo! makes Pig open source
41. Jim Gray’s “Fourth Paradigm” lecture
2007: Make Hadoop scale
Yahoo! makes Pig open source
42. Randy Bryant’s “DISC” lecture
Jim Gray’s “Fourth Paradigm” lecture
2007: Make Hadoop scale
Yahoo! makes Pig open source
43. Randy Bryant’s “DISC” lecture
Jim Gray’s “Fourth Paradigm” lecture
2007: Make Hadoop scale
Yahoo! makes Pig open source
Powerset makes HBase open source
45. 2008: Make Hadoop fast
Yahoo! wins Daytona terabyte sort benchmark
46. First Hadoop Summit
2008: Make Hadoop fast
Yahoo! wins Daytona terabyte sort benchmark
47. First Hadoop Summit
2008: Make Hadoop fast
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
48. Facebook makes Hive open source
First Hadoop Summit
2008: Make Hadoop fast
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
49. “MapReduce: A Major Step Backwards”
Facebook makes Hive open source
First Hadoop Summit
2008: Make Hadoop fast
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
51. 2009: Insert Hadoop into the enterprise
Cloudera releases CDH
52. First Hadoop World NYC
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
53. Yahoo! sorts a petabyte with Hadoop
First Hadoop World NYC
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
54. Yahoo! sorts a petabyte with Hadoop
First Hadoop World NYC
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
Cloudera adds training, support, services
55. “The Unreasonable Effectiveness of Data”
Yahoo! sorts a petabyte with Hadoop
First Hadoop World NYC
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
Cloudera adds training, support, services
57. 2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
58. Yahoo! completes enterprise-class security
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
59. Yahoo! completes enterprise-class security
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Datameer and Karmasphere funded
60. Quest, Talend, Netezza, and more integrate
Yahoo! completes enterprise-class security
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Datameer and Karmasphere funded
61. Hive adds JDBC and ODBC
Quest, Talend, Netezza, and more integrate
Yahoo! completes enterprise-class security
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Datameer and Karmasphere funded
62. Hadoop will be an Analytical Data Platform
73. (c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0