The Future of Hadoop                                                                                     



Doug Cutting | A Founder of Apache Hadoop
Jeff Hammerbacher | Chief Scientist, Cloudera




 Welcome to the webinar!

 Audio/Telephone: +1 (215) 383-1016
 Access Code: 421-634-457
 Audio Pin: Shown after joining the Webinar
 Hadoop, HBase, Pig, Hive, Bigtop, Avro, Flume & Whirr are trademarks of the Apache Software Foundation
Housekeeping
▪   All lines are on mute

▪   Ask questions at any time using the Questions panel on GoToMeeting

▪   Slides and recording will be available on www.cloudera.com/events




                            ©2011 Cloudera, Inc. All Rights Reserved.
Presentation Outline
▪   1. Context
▪   2. Apache Bigtop
▪   3. Apache Hadoop Core
▪   4. Apache HBase, Hive, and Pig
▪   5. Other components


▪   Questions and Discussion




1. Context
Context
Data
▪   1.8 ZB will be created and replicated in 2011
    ▪   Up 9x in the last five years
    ▪   More than 90% of this data is unstructured
    ▪   Enterprises have some liability for 80% of this data
    ▪   Enterprises will spend $4T on managing data in 2011




                                                                             ▪   Source: IDC Digital Universe Report 2011




Context
Hadoop
▪   Apache Hadoop and related software are designed for this world


▪   Volume
    ▪   Commodity hardware and open source software lower cost and increase capacity

▪   Velocity
    ▪   Data ingest speed aided by append-only and schema-on-read design

▪   Variety
    ▪   Multiple tools to structure, process, and access data
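
The velocity point can be illustrated with a minimal sketch of append-only ingest combined with schema-on-read. The function names (`ingest`, `read_with_schema`) and the tab-separated record layout are assumptions made for this example; they are not part of any Hadoop API.

```python
import io

def ingest(log, raw_line):
    """Append a raw record as-is; nothing is parsed or validated on write."""
    log.seek(0, io.SEEK_END)
    log.write(raw_line.rstrip("\n") + "\n")

def read_with_schema(log, schema):
    """Apply a schema (list of (name, cast) pairs) only at read time."""
    log.seek(0)
    for line in log:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(schema):
            continue  # record does not fit this schema; skip it
        try:
            yield {name: cast(value) for (name, cast), value in zip(schema, fields)}
        except ValueError:
            continue  # a value could not be cast; skip the record

# Hypothetical in-memory "file" standing in for an append-only data file.
log = io.StringIO()
ingest(log, "alice\t3")
ingest(log, "bob\tnot-a-number")  # accepted on write, rejected on read
ingest(log, "carol\t7")

rows = list(read_with_schema(log, [("user", str), ("clicks", int)]))
```

Because writes never check structure, ingest stays cheap, and a different schema can later be applied to the same raw records.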




Context
HDFS and MapReduce
▪   Apache Hadoop = HDFS + MapReduce
    ▪   Similar to kernel of an operating system
    ▪   Referred to as “Hadoop Core”


▪   Related components are often deployed with Hadoop
    ▪   For example: HBase, Hive, Pig, Oozie, Flume, Sqoop
    ▪   Together, these components form a “Hadoop Stack”
    ▪   Not all components must be deployed
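
As a rough illustration of the MapReduce half of the core, the sketch below runs the map, shuffle, and reduce phases of a word count in a single Python process. It is a teaching model only: the function names are hypothetical, and a plain list of strings stands in for records that would normally be read from HDFS.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = run_job(["Hadoop stores data", "Hadoop processes data"])
```

In a real cluster the map and reduce functions run in parallel on many machines; only the shape of the computation is the same.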
Context
Bigtop
▪   What standards should all components follow?


▪   How can we ensure all components of the stack work together?


▪   How can we find the right version of each component?


▪   How can we make it easy to install an additional component?
2. Apache Bigtop
Apache Bigtop
▪   Now incubating at Apache
▪   Hadoop ecosystem-wide project, including:
    ▪   Interoperability testing of components
    ▪   Packaging of compatible versions of components
▪   Like Fedora, Debian, or CentOS for the Hadoop ecosystem
▪   Releases are not a single artifact
    ▪   Rather a set of interdependent, compatible components




Apache Bigtop
▪   Current components
    ▪   Hadoop
    ▪   HBase
    ▪   Hive
    ▪   Pig
    ▪   Oozie
    ▪   Sqoop
    ▪   Flume
    ▪   ZooKeeper
    ▪   Whirr
Apache Bigtop
▪   Outputs
    ▪   Source
    ▪   RPM
    ▪   Deb
▪   Tests
    ▪   Integration
    ▪   Package
    ▪   Smoke
▪   Release 0.1.0 under vote now!
3. Apache Hadoop Core
Apache Hadoop Core
▪   Current stable releases based on branches from 0.20
▪   Upcoming release: 0.22
    ▪   Includes both security and new implementation of append
    ▪   Not expected to be run at scale or commercially supported
    ▪   Nearly ready for vote


▪   Upcoming release: 0.23
    ▪   Build and dependency management moved to Maven
    ▪   Branch to happen soon
HDFS
▪   Robustness
    ▪   HDFS-1073: Checkpointing of image and edits log


▪   Availability
    ▪   HDFS-1623: High availability


▪   Performance
    ▪   HDFS-941: Faster random reads
    ▪   HDFS-2080: Faster checksums
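
The robustness item concerns how the NameNode persists its namespace as an image plus a log of edits. The sketch below is a toy model of that general idea, not the HDFS-1073 design: the namespace is a set of paths, and the operation names and helper functions are assumptions for illustration.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)

def recover(image, edits):
    """Rebuild the namespace at startup: load the image, then replay the edits."""
    namespace = set(image)
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace

def checkpoint(image, edits):
    """Fold the edits into a new image so the log can start empty again."""
    return recover(image, edits), []

image = {"/a"}
edits = [("create", "/b"), ("delete", "/a")]

before = recover(image, edits)
new_image, new_edits = checkpoint(image, edits)
after = recover(new_image, new_edits)
```

A checkpoint must not change the recovered state; it only shortens the log that has to be replayed on restart.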


HDFS
▪   Scalability
    ▪   HDFS-1052: Federation of the NameNode




                                    ▪   Source of diagram: http://www.hortonworks.com/an-introduction-to-hdfs-federation/
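
Federation spreads the file system namespace across several independent NameNodes. One way a client could route a path to the right NameNode is a mount table with longest-prefix matching, sketched below. The host names, paths, and the `resolve` helper are hypothetical and are not taken from HDFS-1052.

```python
# Hypothetical mount table: namespace prefix -> NameNode responsible for it.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/data/logs": "namenode-3",
}

def resolve(path, mount_table):
    """Return the NameNode whose mount point is the longest prefix of path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode mounted for {path}")
    return mount_table[best]

owner_logs = resolve("/data/logs/2011/08", MOUNT_TABLE)
owner_raw = resolve("/data/raw", MOUNT_TABLE)
```

Each NameNode then only holds metadata for its own part of the namespace, which is what lets the namespace grow beyond the memory of a single machine.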
MapReduce
▪   Modularity
    ▪   MAPREDUCE-279: MapReduce 2.0
        ▪   Break JobTracker into ResourceManager and ApplicationMaster
        ▪   Replace TaskTracker with NodeManager




                                            ▪   Source of diagram: http://www.odbms.org/download/dean-keynote-ladis2009.pdf
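
The split of responsibilities named above can be sketched with three toy classes. This is a simplified model under stated assumptions: the class names follow the slide, but the methods (`allocate`, `launch`, `run`) and the slot-based capacity are invented for illustration and do not mirror the real interfaces.

```python
class NodeManager:
    """Per-machine agent: launches tasks in containers on its own node."""
    def __init__(self, name, slots):
        self.name = name
        self.free = slots

    def launch(self, task):
        self.free -= 1
        return f"{task}@{self.name}"

class ResourceManager:
    """Cluster-wide: hands out capacity, but tracks no per-job task state."""
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self):
        for node in self.nodes:
            if node.free > 0:
                return node
        return None  # cluster is full

class ApplicationMaster:
    """Per-job: requests containers and tracks that job's tasks."""
    def __init__(self, job, tasks, rm):
        self.job, self.tasks, self.rm = job, tasks, rm

    def run(self):
        placed, pending = [], []
        for task in self.tasks:
            node = self.rm.allocate()
            if node is None:
                pending.append(task)
            else:
                placed.append(node.launch(f"{self.job}/{task}"))
        return placed, pending

rm = ResourceManager([NodeManager("nm1", 2), NodeManager("nm2", 1)])
placed, pending = ApplicationMaster("job1", ["m0", "m1", "m2", "r0"], rm).run()
```

Because job-specific logic lives only in the ApplicationMaster, frameworks other than MapReduce can supply their own and share the same cluster.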
MapReduce
▪   Potential New Frameworks
    ▪   MAPREDUCE-2719: Distributed shell
    ▪   MAPREDUCE-2720: Distributed Java commands
    ▪   MPI: Communication-intensive parallelism
    ▪   Fast scans and aggregations
        ▪   OpenDremel
    ▪   Bulk Synchronous Parallel
        ▪   Giraph, Golden Orb, Hama, et al.
    ▪   Actor Model (streaming)
        ▪   S4, Akka, Storm, et al.
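
To make the Bulk Synchronous Parallel item concrete, here is a minimal single-process sketch of the model: in each superstep every vertex computes on the messages it received, sends new messages, and then all vertices wait at a barrier. The example propagates the maximum value through a graph. The function name and data layout are assumptions; this is not the API of Giraph, Hama, or any other listed project.

```python
def bsp_max(graph, values):
    """graph: vertex -> list of neighbours; values: vertex -> initial number."""
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])

    # Later supersteps: run until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v in graph:  # local computation, independent per vertex
            if inbox[v]:
                best = max(inbox[v])
                if best > values[v]:
                    values[v] = best
                    for n in graph[v]:
                        outbox[n].append(best)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return values

chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = bsp_max(chain, {"a": 3, "b": 1, "c": 2})
```

Iterative graph algorithms fit this pattern more naturally than a chain of separate MapReduce jobs, which is one motivation for running such frameworks on the same cluster.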
4. Apache HBase, Hive, and Pig
Apache HBase
▪   Upcoming release: 0.92.0
▪   Server-side triggers
    ▪   HBASE-2000: Coprocessors
▪   Availability
    ▪   HBASE-1730/4213: Online schema changes
▪   Performance
    ▪   HBASE-3857: HFile 2.0


▪   HBase book in September!
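
The "server-side triggers" idea behind coprocessors can be sketched as hooks that run inside the server around each write. The toy `Region` and `AuditObserver` classes below are assumptions for illustration, written in Python rather than against the HBase Java API.

```python
class Region:
    """Toy stand-in for a server-side region that stores row -> value."""
    def __init__(self):
        self.rows = {}
        self.observers = []

    def put(self, row, value):
        for observer in self.observers:
            value = observer.pre_put(row, value)  # may rewrite the value
        self.rows[row] = value
        for observer in self.observers:
            observer.post_put(row, value)  # runs after the write succeeds

class AuditObserver:
    """Hypothetical trigger: normalizes values and records every written row."""
    def __init__(self):
        self.log = []

    def pre_put(self, row, value):
        return value.strip()

    def post_put(self, row, value):
        self.log.append(row)

region = Region()
audit = AuditObserver()
region.observers.append(audit)
region.put("row1", "  hello ")
```

Running such logic next to the data avoids a round trip to the client for every row.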


Apache Hive
▪   Upcoming release: 0.8
▪   Data transfer
    ▪   HIVE-306: INSERT INTO
    ▪   HIVE-1918: EXPORT/IMPORT
▪   Indexes
    ▪   HIVE-1644: Automatically use indexes
    ▪   HIVE-1803: Bitmap indexes
▪   Data formats
    ▪   HIVE-895: Avro support
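
The bitmap index item can be illustrated with a small sketch of the general technique: one bitmap per distinct column value, with bit i set when row i holds that value. The sample rows and helper names are made up for this example, and the code does not reflect how Hive stores its indexes.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of a column to an integer used as a bitmap."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

rows = [
    {"country": "US", "device": "web"},
    {"country": "DE", "device": "web"},
    {"country": "US", "device": "mobile"},
    {"country": "US", "device": "web"},
]

by_country = build_bitmap_index(rows, "country")
by_device = build_bitmap_index(rows, "device")

# WHERE country = 'US' AND device = 'web' becomes a single bitwise AND.
hits = matching_rows(by_country["US"] & by_device["web"])
```

Bitmap indexes work best on columns with few distinct values, since each value needs its own bitmap.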


Apache Pig
▪   Recent release: 0.9
▪   Scripting
    ▪   PIG-1479: Embedding Pig in Python
    ▪   PIG-1793: Macro expansion
▪   Debugging
    ▪   PIG-1712: ILLUSTRATE rework
▪   Data formats
    ▪   PIG-1748: Avro support




5. Other Components
Other Components
▪   Apache Incubator
    ▪   Sqoop, Flume, and Oozie now incubating
    ▪   Whirr graduated to a top-level Apache project
▪   Apache Avro
    ▪   Interoperability with Protocol Buffers and Thrift
    ▪   Column-oriented file format
    ▪   Python MapReduce implementation
▪   Apache ZooKeeper
    ▪   Multi-update
    ▪   Kerberos authentication of clients
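
A multi-update groups several operations so that either all of them apply or none do. The sketch below models that all-or-nothing behaviour on a plain dictionary. The `multi` function and its operation tuples are assumptions for illustration, not the ZooKeeper client API.

```python
def multi(store, ops):
    """Apply every (op, path, data) tuple, or leave the store untouched."""
    staged = dict(store)  # work on a copy so a failure changes nothing
    for op, path, data in ops:
        if op == "create":
            if path in staged:
                raise KeyError(f"node exists: {path}")
            staged[path] = data
        elif op == "set":
            if path not in staged:
                raise KeyError(f"no node: {path}")
            staged[path] = data
        elif op == "delete":
            if path not in staged:
                raise KeyError(f"no node: {path}")
            del staged[path]
        else:
            raise ValueError(f"unknown op: {op}")
    store.clear()
    store.update(staged)  # commit all changes together

store = {"/config": "v1"}
multi(store, [("set", "/config", "v2"), ("create", "/lock", "x")])

try:
    # The second operation fails, so the delete must not take effect either.
    multi(store, [("delete", "/lock", None), ("set", "/missing", "y")])
except KeyError:
    pass
```

After the failed batch the store still holds both nodes from the first, successful batch.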

Q&A
Visit www.hadoopworld.com
• November 8-9, 2011 in New York City
• Early bird discount ends September 5, 2011

Enter Today: www.facebook.com/cloudera
• Click the “Be a Cloudera Hero for Apache
   Hadoop” tab
• Share what you think Apache Hadoop can
  do for you
• Win a personal hackathon with Doug Cutting
  in San Francisco, CA

Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 

Webinar: The Future of Hadoop

  • 1. The Future of Hadoop Doug Cutting | A Founder of Apache Hadoop Jeff Hammerbacher | Chief Scientist, Cloudera Welcome to the webinar! Audio/Telephone: +1 (215) 383-1016 Access Code: 421-634-457 Audio Pin: Shown after joining the Webinar Hadoop, HBase, Pig, Hive, Bigtop, Avro, Flume & Whirr are trademarks of the Apache Software Foundation
  • 2. Housekeeping ▪ All lines are on mute ▪ Ask questions at any time using the Questions panel on GoToMeeting ▪ Slides and recording will be available on www.cloudera.com/events ©2011 Cloudera, Inc. All Rights Reserved.
  • 3. Presentation Outline ▪ 1. Context ▪ 2. Apache Bigtop ▪ 3. Apache Hadoop Core ▪ 4. Apache HBase, Hive, and Pig ▪ 5. Other components ▪ Questions and Discussion
  • 5. Context Data ▪ 1.8 ZB will be created and replicated in 2011 ▪ Up 9x in the last five years ▪ More than 90% of this data is unstructured ▪ Enterprises have some liability for 80% of this data ▪ Enterprises will spend $4T on managing data in 2011 ▪ Source: IDC Digital Universe Report 2011
  • 6. Context Hadoop ▪ Apache Hadoop and related software are designed for this world ▪ Volume ▪ Commodity hardware and open source software lower cost and increase capacity ▪ Velocity ▪ Data ingest speed aided by append-only and schema-on-read design ▪ Variety ▪ Multiple tools to structure, process, and access data
  • 8. Context HDFS and MapReduce ▪ Apache Hadoop = HDFS + MapReduce ▪ Similar to kernel of an operating system ▪ Referred to as “Hadoop Core” ▪ Related components are often deployed with Hadoop ▪ For example: HBase, Hive, Pig, Oozie, Flume, Sqoop ▪ Together, these components form a “Hadoop Stack” ▪ Not all components must be deployed
  • 9. Context Bigtop ▪ What standards should all components follow? ▪ How can we ensure all components of the stack work together? ▪ How can we find the right version of each component? ▪ How can we make it easy to install an additional component?
  • 11. Apache Bigtop ▪ Now incubating at Apache ▪ Hadoop ecosystem-wide project, including: ▪ Interoperability testing of components ▪ Packaging of compatible versions of components ▪ Like a Fedora, Debian, or CentOS for the Hadoop ecosystem ▪ Releases are not a single artifact ▪ Rather a set of interdependent, compatible components
  • 12. Apache Bigtop ▪ Current components ▪ Hadoop ▪ HBase ▪ Hive ▪ Pig ▪ Oozie ▪ Sqoop ▪ Flume ▪ ZooKeeper ▪ Whirr
  • 13. Apache Bigtop ▪ Outputs ▪ Source ▪ RPM ▪ Deb ▪ Tests ▪ Integration ▪ Package ▪ Smoke ▪ Release 0.1.0 under vote now!
  • 15. Apache Hadoop Core ▪ Current stable releases based on branches from 0.20 ▪ Upcoming release: 0.22 ▪ Includes both security and new implementation of append ▪ Not expected to be run at scale or commercially supported ▪ Nearly ready for vote ▪ Upcoming release: 0.23 ▪ Build and dependency management moved to Maven ▪ Branch to happen soon
  • 16. HDFS ▪ Robustness ▪ HDFS-1073: Checkpointing of image and edits log ▪ Availability ▪ HDFS-1623: High availability ▪ Performance ▪ HDFS-941: Faster random reads ▪ HDFS-2080: Faster checksums
  • 17. HDFS ▪ Scalability ▪ HDFS-1052: Federation of the NameNode ▪ Source of diagram: http://www.hortonworks.com/an-introduction-to-hdfs-federation/
  • 18. MapReduce ▪ Modularity ▪ MAPREDUCE-279: MapReduce 2.0 ▪ Break JobTracker into ResourceManager and ApplicationMaster ▪ Replace TaskTracker with NodeManager ▪ Source of diagram: http://www.odbms.org/download/dean-keynote-ladis2009.pdf
  • 19. MapReduce ▪ Potential New Frameworks ▪ MAPREDUCE-2719: Distributed shell ▪ MAPREDUCE-2720: Distributed Java commands ▪ MPI: Communication-intensive parallelism ▪ Fast scans and aggregations ▪ OpenDremel ▪ Bulk Synchronous Parallel ▪ Giraph, Golden Orb, Hama, et al. ▪ Actor Model (streaming) ▪ S4, Akka, Storm, et al.
  • 20. 4. HBase, Hive, and Pig
  • 21. Apache HBase ▪ Upcoming release: 0.92.0 ▪ Server-side triggers ▪ HBASE-2000: Coprocessors ▪ Availability ▪ HBASE-1730/4213: Online schema changes ▪ Performance ▪ HBASE-3857: HFile 2.0 ▪ HBase book in September!
  • 22. Apache Hive ▪ Upcoming release: 0.8 ▪ Data transfer ▪ HIVE-306: INSERT INTO ▪ HIVE-1918: EXPORT/IMPORT ▪ Indexes ▪ HIVE-1644: Automatically use indexes ▪ HIVE-1803: Bitmap indexes ▪ Data formats ▪ HIVE-895: Avro support
  • 23. Apache Pig ▪ Recent release: 0.9 ▪ Scripting ▪ PIG-1479: Embedding Pig in Python ▪ PIG-1793: Macro expansion ▪ Debugging ▪ PIG-1712: ILLUSTRATE rework ▪ Data formats ▪ PIG-1748: Avro support
  • 25. Other Components ▪ Apache Incubator ▪ Sqoop, Flume, and Oozie now incubating ▪ Whirr graduated to a top-level Apache project ▪ Apache Avro ▪ Interoperability with Protocol Buffers and Thrift ▪ Column-oriented file format ▪ Python MapReduce implementation ▪ Apache ZooKeeper ▪ Multi-update ▪ Kerberos authentication of clients
  • 26. Q&A Visit www.hadoopworld.com • November 8-9, 2011 in New York City • Early bird discount ends September 5, 2011 Enter Today: www.facebook.com/cloudera • Click the “Be a Cloudera Hero for Apache Hadoop” tab • Share what you think Apache Hadoop can do for you • Win a personal hackathon with Doug Cutting in San Francisco, CA