SlideShare a Scribd company logo
1 of 26
Download to read offline
The Future of Hadoop                                                                                     



Doug Cutting | A Founder of Apache Hadoop
Jeff Hammerbacher | Chief Scientist, Cloudera




 Welcome to the webinar!

 Audio/Telephone: +1 (215) 383-1016
 Access Code: 421-634-457
 Audio Pin: Shown after joining the Webinar
 Hadoop, Hbase, Pig, Hive, Bigtop, Avro, Flume & Whirr are trademark of the Apache Software Foundation
Housekeeping
▪   All lines are on mute

▪   Ask questions at any time using the Questions panel on GoToMeeting

▪   Slides and recording will be available on www.cloudera.com/events




                            ©2011 Cloudera, Inc. All Rights Reserved.
Presentation Outline
▪   1. Context
▪   2. Apache Bigtop
▪   3. Apache Hadoop Core
▪   4. Apache HBase, Hive, and Pig
▪   5. Other components


▪   Questions and Discussion




                            ©2011 Cloudera, Inc. All Rights Reserved.
1. Context
Context
Data
▪   1.8 ZB will be created and replicated in 2011
    ▪   Up 9x in the last five years
    ▪   More than 90% of this data is unstructured
    ▪   Enterprises have some liability for 80% of this data
    ▪   Enterprises will spend $4T on managing data in 2011




                                                                             ▪   Source: IDC Digital Universe Report 2011




                                 ©2011 Cloudera, Inc. All Rights Reserved.
Context
Hadoop
▪   Apache Hadoop and related software are designed for this world


▪   Volume
    ▪   Commodity hardware and open source software lowers cost and increases capacity

▪   Velocity
    ▪   Data ingest speed aided by append-only and schema-on-read design

▪   Variety
    ▪   Multiple tools to structure, process, and access data




                                             ©2011 Cloudera, Inc. All Rights Reserved.
Context
Hadoop
Context
HDFS and MapReduce
▪   Apache Hadoop = HDFS + MapReduce
    ▪   Similar to kernel of an operating system
    ▪   Referred to as “Hadoop Core”


▪   Related components are often deployed with Hadoop
    ▪   For example: HBase, Hive, Pig, Oozie, Flume, Sqoop
    ▪   Together, these components form a “Hadoop Stack”
    ▪   Not all components must be deployed
Context
Bigtop
▪   What standards should all components follow?


▪   How can we ensure all components of the stack work together?


▪   How can we find the right version of each component?


▪   How can we make it easy to install an additional component?
2. Apache Bigtop
Apache Bigtop
▪   Now incubating at Apache
▪   Hadoop ecosystem-wide project, including:
    ▪   Interoperability testing of components
    ▪   Packaging of compatible versions of components
▪   Like a Fedora, Debian or CentOS for Hadoop ecosystem
▪   Releases are not a single artifact
    ▪   Rather a set of interdependent, compatible components




                                ©2011 Cloudera, Inc. All Rights Reserved.
Apache Bigtop
▪   Current components
    ▪   Hadoop
    ▪   HBase
    ▪   Hive
    ▪   Pig
    ▪   Oozie
    ▪   Sqoop
    ▪   Flume
    ▪   ZooKeeper
    ▪   Whirr
Apache Bigtop
▪   Outputs
    ▪   Source
    ▪   RPM
    ▪   Deb
▪   Tests
    ▪   Integration
    ▪   Package
    ▪   Smoke
▪   Release 0.1.0 under vote now!
3. Apache Hadoop Core
Apache Hadoop Core
▪   Current stable releases based on branches from 0.20
▪   Upcoming release: 0.22
    ▪   Includes both security and new implementation of append
    ▪   Not expected to be run at scale or commercially supported
    ▪   Nearly ready for vote


▪   Upcoming release: 0.23
    ▪   Build and dependency management moved to Maven
    ▪   Branch to happen soon
HDFS
▪   Robustness
    ▪   HDFS-1073: Checkpointing of image and edits log


▪   Availability
    ▪   HDFS-1623: High availability


▪   Performance
    ▪   HDFS-941: Faster random reads
    ▪   HDFS-2080: Faster checksums


                              ©2011 Cloudera, Inc. All Rights Reserved.
HDFS
▪   Scalability
    ▪   HDFS-1052: Federation of the NameNode




                                    ▪   Source of diagram: http://www.hortonworks.com/an-introduction-to-hdfs-federation/
MapReduce
▪   Modularity
    ▪   MAPREDUCE-279: MapReduce 2.0
        ▪   Break JobTracker into ResourceManager and ApplicationMaster
        ▪   Replace TaskTracker with NodeManager




                                            ▪   Source of diagram: http://www.odbms.org/download/dean-keynote-ladis2009.pdf
MapReduce
▪   Potential New Frameworks
    ▪   MAPREDUCE-2719: Distributed shell
    ▪   MAPREDUCE-2720: Distributed Java commands
    ▪   MPI: Communication-intensive parallelism
    ▪   Fast scans and aggregations
        ▪   OpenDremel
    ▪   Bulk Synchronous Parallel
        ▪   Giraph, Golden Orb, Hama, et al.
    ▪   Actor Model (streaming)
        ▪   S4, Akka, Storm, et al.
4. HBase, Hive, and Pig
Apache HBase
▪   Upcoming release: 0.92.0
▪   Server-side triggers
    ▪   HBASE-2000: Coprocessors
▪   Availability
    ▪   HBASE-1730/4213: Online schema changes
▪   Performance
    ▪   HBASE-3857: HFile 2.0


▪   HBase book in September!


                            ©2011 Cloudera, Inc. All Rights Reserved.
Apache Hive
▪   Upcoming release: 0.8
▪   Data transfer
    ▪   HIVE-306: INSERT INTO
    ▪   HIVE-1918: EXPORT/IMPORT
▪   Indexes
    ▪   HIVE-1644: Automatically use indexes
    ▪   HIVE-1803: Bitmap indexes
▪   Data formats
    ▪   HIVE-895: Avro support


                              ©2011 Cloudera, Inc. All Rights Reserved.
Apache Pig
▪   Recent release: 0.9
▪   Scripting
    ▪   PIG-1479: Embedding Pig in Python
    ▪   PIG-1793: Macro expansion
▪   Debugging
    ▪   PIG-1712: ILLUSTRATE rework
▪   Data formats
    ▪   PIG-1748: Avro support




                             ©2011 Cloudera, Inc. All Rights Reserved.
5. Other Components
Other Components
▪   Apache Incubator
    ▪   Sqoop, Flume, and Oozie now incubating
    ▪   Whirr graduated to a top-level Apache project
▪   Apache Avro
    ▪   Interoperability with Protocol Buffers and Thrift
    ▪   Column-oriented file format
    ▪   Python MapReduce implementation
▪   Apache ZooKeeper
    ▪   Multi-update
    ▪   Kerberos authentication of clients

                                 ©2011 Cloudera, Inc. All Rights Reserved.
Q&A
Visit www.hadoopworld.com
• November 8-9, 2011 in New York City
• Early bird discount ends September 5, 2011

Enter Today: www.facebook.com/cloudera
• Click the “Be a Cloudera Hero for Apache
   Hadoop” tab
• Share what you think Apache Hadoop can
  do for you
• Win a personal hackathon with Doug Cutting
  in San Francisco, CA

More Related Content

What's hot

Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulobusbey
 
MySQL Performance Tuning 101
MySQL Performance Tuning 101MySQL Performance Tuning 101
MySQL Performance Tuning 101Mirko Ortensi
 
Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Alluxio, Inc.
 
20140722 Taiwan MySQL User Group Meeting Tech Updates
20140722 Taiwan MySQL User Group Meeting Tech Updates20140722 Taiwan MySQL User Group Meeting Tech Updates
20140722 Taiwan MySQL User Group Meeting Tech UpdatesRyusuke Kajiyama
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceDinesh Chitlangia
 
TWJUG August, What's new in MySQL 5.7 RC
TWJUG August, What's new in MySQL 5.7 RCTWJUG August, What's new in MySQL 5.7 RC
TWJUG August, What's new in MySQL 5.7 RCRyusuke Kajiyama
 
MySQL Performance Tuning: The Perfect Scalability (OOW2019)
MySQL Performance Tuning: The Perfect Scalability (OOW2019)MySQL Performance Tuning: The Perfect Scalability (OOW2019)
MySQL Performance Tuning: The Perfect Scalability (OOW2019)Mirko Ortensi
 
MySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big DataMySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big DataMorgan Tocker
 
TWJUG August, MySQL JDBC Driver "Connector/J"
TWJUG August, MySQL JDBC Driver "Connector/J"TWJUG August, MySQL JDBC Driver "Connector/J"
TWJUG August, MySQL JDBC Driver "Connector/J"Ryusuke Kajiyama
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016Ted Wennmark
 
20171104 hk-py con-mysql-documentstore_v1
20171104 hk-py con-mysql-documentstore_v120171104 hk-py con-mysql-documentstore_v1
20171104 hk-py con-mysql-documentstore_v1Ivan Ma
 
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Cloudera, Inc.
 
Unlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLUnlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLMatt Lord
 
Why MySQL High Availability Matters
Why MySQL High Availability MattersWhy MySQL High Availability Matters
Why MySQL High Availability MattersMatt Lord
 
Using MySQL in the Cloud
Using MySQL in the CloudUsing MySQL in the Cloud
Using MySQL in the CloudMatt Lord
 

What's hot (20)

Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 
MySQL Performance Tuning 101
MySQL Performance Tuning 101MySQL Performance Tuning 101
MySQL Performance Tuning 101
 
Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio
 
20140722 Taiwan MySQL User Group Meeting Tech Updates
20140722 Taiwan MySQL User Group Meeting Tech Updates20140722 Taiwan MySQL User Group Meeting Tech Updates
20140722 Taiwan MySQL User Group Meeting Tech Updates
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR complianceOzone: Evolution of HDFS scalability & built-in GDPR compliance
Ozone: Evolution of HDFS scalability & built-in GDPR compliance
 
TWJUG August, What's new in MySQL 5.7 RC
TWJUG August, What's new in MySQL 5.7 RCTWJUG August, What's new in MySQL 5.7 RC
TWJUG August, What's new in MySQL 5.7 RC
 
MySQL Performance Tuning: The Perfect Scalability (OOW2019)
MySQL Performance Tuning: The Perfect Scalability (OOW2019)MySQL Performance Tuning: The Perfect Scalability (OOW2019)
MySQL Performance Tuning: The Perfect Scalability (OOW2019)
 
MySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big DataMySQL: From Single Instance to Big Data
MySQL: From Single Instance to Big Data
 
TWJUG August, MySQL JDBC Driver "Connector/J"
TWJUG August, MySQL JDBC Driver "Connector/J"TWJUG August, MySQL JDBC Driver "Connector/J"
TWJUG August, MySQL JDBC Driver "Connector/J"
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Apache HBase: State of the Union
Apache HBase: State of the UnionApache HBase: State of the Union
Apache HBase: State of the Union
 
MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016
 
20171104 hk-py con-mysql-documentstore_v1
20171104 hk-py con-mysql-documentstore_v120171104 hk-py con-mysql-documentstore_v1
20171104 hk-py con-mysql-documentstore_v1
 
Introduction to HBase
Introduction to HBaseIntroduction to HBase
Introduction to HBase
 
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
Hadoop Summit 2012 | A New Generation of Data Transfer Tools for Hadoop: Sqoop 2
 
Unlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQLUnlocking Big Data Insights with MySQL
Unlocking Big Data Insights with MySQL
 
Why MySQL High Availability Matters
Why MySQL High Availability MattersWhy MySQL High Availability Matters
Why MySQL High Availability Matters
 
Using MySQL in the Cloud
Using MySQL in the CloudUsing MySQL in the Cloud
Using MySQL in the Cloud
 

Similar to Webinar: The Future of Hadoop

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Building an Apache Hadoop data application
Building an Apache Hadoop data applicationBuilding an Apache Hadoop data application
Building an Apache Hadoop data applicationtomwhite
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Cloudera, Inc.
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsDataWorks Summit
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Cloudera, Inc.
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop WorldCloudera, Inc.
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream DataDataWorks Summit
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopYifeng Jiang
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Cloudera, Inc.
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
Apache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitApache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitSaptak Sen
 
Hadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - AltiscaleHadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - AltiscaleMark Kerzner
 
Hadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop MeetupHadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop MeetupMark Kerzner
 
Amr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY PresentationAmr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY Presentation500 Startups
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 

Similar to Webinar: The Future of Hadoop (20)

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Building an Apache Hadoop data application
Building an Apache Hadoop data applicationBuilding an Apache Hadoop data application
Building an Apache Hadoop data application
 
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208Webinar: Productionizing Hadoop: Lessons Learned - 20101208
Webinar: Productionizing Hadoop: Lessons Learned - 20101208
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5
 
Hw09 Welcome To Hadoop World
Hw09   Welcome To Hadoop WorldHw09   Welcome To Hadoop World
Hw09 Welcome To Hadoop World
 
Visual Mapping of Clickstream Data
Visual Mapping of Clickstream DataVisual Mapping of Clickstream Data
Visual Mapping of Clickstream Data
 
Hadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Apache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop SummitApache Spark Workshop at Hadoop Summit
Apache Spark Workshop at Hadoop Summit
 
Hadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - AltiscaleHadoop Hadoop & Spark meetup - Altiscale
Hadoop Hadoop & Spark meetup - Altiscale
 
Hadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop MeetupHadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
Hadoop as a service presented by Ajay Jha at Houston Hadoop Meetup
 
Amr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY PresentationAmr Awadallah, unSEXY Presentation
Amr Awadallah, unSEXY Presentation
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kite
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 

Recently uploaded (20)

201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 

Webinar: The Future of Hadoop

  • 1. The Future of Hadoop  Doug Cutting | A Founder of Apache Hadoop Jeff Hammerbacher | Chief Scientist, Cloudera Welcome to the webinar! Audio/Telephone: +1 (215) 383-1016 Access Code: 421-634-457 Audio Pin: Shown after joining the Webinar Hadoop, Hbase, Pig, Hive, Bigtop, Avro, Flume & Whirr are trademark of the Apache Software Foundation
  • 2. Housekeeping ▪ All lines are on mute ▪ Ask questions at any time using the Questions panel on GoToMeeting ▪ Slides and recording will be available on www.cloudera.com/events ©2011 Cloudera, Inc. All Rights Reserved.
  • 3. Presentation Outline ▪ 1. Context ▪ 2. Apache Bigtop ▪ 3. Apache Hadoop Core ▪ 4. Apache HBase, Hive, and Pig ▪ 5. Other components ▪ Questions and Discussion ©2011 Cloudera, Inc. All Rights Reserved.
  • 5. Context Data ▪ 1.8 ZB will be created and replicated in 2011 ▪ Up 9x in the last five years ▪ More than 90% of this data is unstructured ▪ Enterprises have some liability for 80% of this data ▪ Enterprises will spend $4T on managing data in 2011 ▪ Source: IDC Digital Universe Report 2011 ©2011 Cloudera, Inc. All Rights Reserved.
  • 6. Context Hadoop ▪ Apache Hadoop and related software are designed for this world ▪ Volume ▪ Commodity hardware and open source software lowers cost and increases capacity ▪ Velocity ▪ Data ingest speed aided by append-only and schema-on-read design ▪ Variety ▪ Multiple tools to structure, process, and access data ©2011 Cloudera, Inc. All Rights Reserved.
  • 8. Context HDFS and MapReduce ▪ Apache Hadoop = HDFS + MapReduce ▪ Similar to kernel of an operating system ▪ Referred to as “Hadoop Core” ▪ Related components are often deployed with Hadoop ▪ For example: HBase, Hive, Pig, Oozie, Flume, Sqoop ▪ Together, these components form a “Hadoop Stack” ▪ Not all components must be deployed
  • 9. Context Bigtop ▪ What standards should all components follow? ▪ How can we ensure all components of the stack work together? ▪ How can we find the right version of each component? ▪ How can we make it easy to install an additional component?
  • 11. Apache Bigtop ▪ Now incubating at Apache ▪ Hadoop ecosystem-wide project, including: ▪ Interoperability testing of components ▪ Packaging of compatible versions of components ▪ Like a Fedora, Debian or CentOS for Hadoop ecosystem ▪ Releases are not a single artifact ▪ Rather a set of interdependent, compatible components ©2011 Cloudera, Inc. All Rights Reserved.
  • 12. Apache Bigtop ▪ Current components ▪ Hadoop ▪ HBase ▪ Hive ▪ Pig ▪ Oozie ▪ Sqoop ▪ Flume ▪ ZooKeeper ▪ Whirr
  • 13. Apache Bigtop ▪ Outputs ▪ Source ▪ RPM ▪ Deb ▪ Tests ▪ Integration ▪ Package ▪ Smoke ▪ Release 0.1.0 under vote now!
  • 15. Apache Hadoop Core ▪ Current stable releases based on branches from 0.20 ▪ Upcoming release: 0.22 ▪ Includes both security and new implementation of append ▪ Not expected to be run at scale or commercially supported ▪ Nearly ready for vote ▪ Upcoming release: 0.23 ▪ Build and dependency management moved to Maven ▪ Branch to happen soon
  • 16. HDFS ▪ Robustness ▪ HDFS-1073: Checkpointing of image and edits log ▪ Availability ▪ HDFS-1623: High availability ▪ Performance ▪ HDFS-941: Faster random reads ▪ HDFS-2080: Faster checksums ©2011 Cloudera, Inc. All Rights Reserved.
  • 17. HDFS ▪ Scalability ▪ HDFS-1052: Federation of the NameNode ▪ Source of diagram: http://www.hortonworks.com/an-introduction-to-hdfs-federation/
  • 18. MapReduce ▪ Modularity ▪ MAPREDUCE-279: MapReduce 2.0 ▪ Break JobTracker into ResourceManager and ApplicationMaster ▪ Replace TaskTracker with NodeManager ▪ Source of diagram: http://www.odbms.org/download/dean-keynote-ladis2009.pdf
  • 19. MapReduce ▪ Potential New Frameworks ▪ MAPREDUCE-2719: Distributed shell ▪ MAPREDUCE-2720: Distributed Java commands ▪ MPI: Communication-intensive parallelism ▪ Fast scans and aggregations ▪ OpenDremel ▪ Bulk Synchronous Parallel ▪ Giraph, Golden Orb, Hama, et al. ▪ Actor Model (streaming) ▪ S4, Akka, Storm, et al.
  • 20. 4. HBase, Hive, and Pig
  • 21. Apache HBase ▪ Upcoming release: 0.92.0 ▪ Server-side triggers ▪ HBASE-2000: Coprocessors ▪ Availability ▪ HBASE-1730/4213: Online schema changes ▪ Performance ▪ HBASE-3857: HFile 2.0 ▪ HBase book in September! ©2011 Cloudera, Inc. All Rights Reserved.
  • 22. Apache Hive ▪ Upcoming release: 0.8 ▪ Data transfer ▪ HIVE-306: INSERT INTO ▪ HIVE-1918: EXPORT/IMPORT ▪ Indexes ▪ HIVE-1644: Automatically use indexes ▪ HIVE-1803: Bitmap indexes ▪ Data formats ▪ HIVE-895: Avro support ©2011 Cloudera, Inc. All Rights Reserved.
  • 23. Apache Pig ▪ Recent release: 0.9 ▪ Scripting ▪ PIG-1479: Embedding Pig in Python ▪ PIG-1793: Macro expansion ▪ Debugging ▪ PIG-1712: ILLUSTRATE rework ▪ Data formats ▪ PIG-1748: Avro support ©2011 Cloudera, Inc. All Rights Reserved.
  • 25. Other Components ▪ Apache Incubator ▪ Sqoop, Flume, and Oozie now incubating ▪ Whirr graduated to a top-level Apache project ▪ Apache Avro ▪ Interoperability with Protocol Buffers and Thrift ▪ Column-oriented file format ▪ Python MapReduce implementation ▪ Apache ZooKeeper ▪ Multi-update ▪ Kerberos authentication of clients ©2011 Cloudera, Inc. All Rights Reserved.
  • 26. Q&A Visit www.hadoopworld.com • November 8-9, 2011 in New York City • Early bird discount ends September 5, 2011 Enter Today: www.facebook.com/cloudera • Click the “Be a Cloudera Hero for Apache Hadoop” tab • Share what you think Apache Hadoop can do for you • Win a personal hackathon with Doug Cutting in San Francisco, CA