Welcome to
 Production-izing Hadoop: Lessons Learned

Audio/Telephone: +1 916 233 3087
Access Code: 616-465-108
Audio PIN: Shown after joining the Webinar
Housekeeping

• Ask questions at any time using the questions panel

• Problems? Use the chat panel

• Book drawing - winner announced at the end

• Slides and recording will be available



                  Copyright 2010 Cloudera Inc. All rights reserved   2
Poll

What is your interest in Hadoop?
• Just learning about it
• I have a problem I think Hadoop can solve
• Using Hadoop in our labs
• Using Hadoop in production




Speaker: Eric Sammer
Eric is a Solution Architect and Training Instructor for Cloudera. He has
worked with dozens of customers in a variety of industries including
Cloudera's largest Hadoop deployments. His experience ranges from
clusters of a few nodes to clusters with hundreds of nodes with complex
multi-tenant user environments.

Prior to joining Cloudera, he held roles including System Architect, Director
of Technical Operations, and Tech Lead at various New York City startups
focusing on distributed data collection, processing, and reporting systems.
Eric has over 12 years in development and technical operations and has
contributed to various open source projects such as Gentoo Linux.

twitter: @esammer, @cloudera

Starting Out

        (You)


                                     “Let’s build a Hadoop cluster!”




                                             http://www.iccs.inf.ed.ac.uk/~miles/code.html

Starting Out

        (You)




                                        http://www.iccs.inf.ed.ac.uk/~miles/code.html

Where you want to be




                                                                                  (You)




Yahoo! Hadoop Cluster (2007)




What is Hadoop?

• A scalable fault-tolerant distributed system for data storage
  and processing (open source under the Apache license)

• Core Hadoop has two main components
   • Hadoop Distributed File System (HDFS): self-healing high-bandwidth
     clustered storage
   • MapReduce: fault-tolerant distributed processing

• Key value
   •   Flexible -> store data without a schema and add it later as needed
   •   Affordable -> cost / TB at a fraction of traditional options
   •   Broadly adopted -> a large and active ecosystem
   •   Proven at scale -> dozens of petabyte-plus implementations in
       production today
Cloudera’s Distribution for Hadoop, Version 3
The Industry’s Leading Hadoop Distribution



[Diagram: CDH3 component stack: Hue, Hue SDK, Oozie, Pig, Hive, Flume, Sqoop, HBase, ZooKeeper]



•   Open source – 100% Apache licensed
•   Simplified – Component versions & dependencies managed for you
•   Integrated – All components & functions interoperate through standard APIs
•   Reliable – Patched with fixes from future releases to improve stability
•   Supported – Employs project founders and committers for >70% of components
Overview

•   Proper planning
•   Data Ingestion
•   ETL and Data Processing Infrastructure
•   Authentication, Authorization, and Sharing
•   Monitoring




The production data platform

•   Data storage
•   ETL / data processing / analysis infrastructure
•   Data ingestion infrastructure
•   Integration with tools
•   Data security and access control
•   Health and performance monitoring




Proper planning

• Know your use cases!
   •   Log transformation, aggregation
   •   Text mining, IR
   •   Analytics
   •   Machine learning
• Critical to proper configuration
   • Hadoop
   • Network
   • OS
• Resource utilization and deep job insight will tell you more
HDFS Concerns

• Name node availability
   • HA is tricky
   • Consider where Hadoop lives in the system
   • Manual recovery can be simple, fast, effective
• Backup Strategy
   • Name node metadata – hourly, ~2 day retention
   • User data
      • Log shipping style strategies
      • DistCp
      • “Fan out” to multiple clusters on ingestion
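
The "hourly, ~2 day retention" policy for name node metadata can be sketched as a small pruning rule. This is only an illustration: the function name, the timestamps, and the rule of never deleting the last remaining backup are assumptions, not something the slides specify.

```python
from datetime import datetime, timedelta

def split_by_retention(backup_times, now, retention=timedelta(days=2)):
    """Partition backup timestamps into (keep, prune) lists by age."""
    cutoff = now - retention
    ordered = sorted(backup_times)
    keep = [t for t in ordered if t >= cutoff]
    prune = [t for t in ordered if t < cutoff]
    if not keep and prune:
        # Assumed safety rule: never delete the last remaining backup.
        keep.append(prune.pop())
    return keep, prune

# Hypothetical data: one backup per hour for the last three days.
now = datetime(2010, 12, 1, 12, 0)
backups = [now - timedelta(hours=h) for h in range(72)]
keep, prune = split_by_retention(backups, now)
print(len(keep), "kept,", len(prune), "to prune")
```

Keeping the decision separate from the actual delete makes the policy easy to test before it is allowed to touch real metadata copies.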


Data Ingestion

• Many data sources
   •   Streaming data sources (log files, mostly)
   •   RDBMS
   •   EDW
   •   Files (usually exports from 3rd party)
• Common place we see DIY
   • You probably shouldn’t
   • Sqoop, Flume, Oozie (but I’m biased)
• No matter what - fault tolerant, performant, monitored


ETL and Data Processing

•   Non-interactive jobs
•   Establish a common directory structure for processes
•   Need tools to handle complex chains of jobs
•   Workflow tools support
    • Job dependencies, error handling
    • Tracking
    • Invocation based on time or events
• Most common mistake: depending on jobs always
  completing successfully or within a window of time.
    • Monitor for SLA rather than pray
    • Defensive coding practices apply just as they do everywhere else!
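
"Monitor for SLA rather than pray" might look like the following minimal sketch. The JobRun record, the job names, and the three violation reasons are hypothetical; a real workflow tool would supply its own job status data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class JobRun:
    name: str
    deadline: datetime                      # when the output must be ready
    finished_at: Optional[datetime] = None  # None while the job is running
    succeeded: bool = False

def sla_violations(runs, now):
    """Return (job name, reason) for every run that failed or missed its deadline."""
    problems = []
    for run in runs:
        if run.finished_at is None:
            if now > run.deadline:
                problems.append((run.name, "still running past deadline"))
        elif not run.succeeded:
            problems.append((run.name, "failed"))
        elif run.finished_at > run.deadline:
            problems.append((run.name, "finished late"))
    return problems

# Hypothetical nightly jobs, checked at 06:00.
now = datetime(2010, 12, 1, 6, 0)
runs = [
    JobRun("daily_log_rollup", datetime(2010, 12, 1, 5, 0),
           finished_at=datetime(2010, 12, 1, 4, 30), succeeded=True),
    JobRun("sessionize", datetime(2010, 12, 1, 5, 0)),
    JobRun("export_to_edw", datetime(2010, 12, 1, 7, 0),
           finished_at=datetime(2010, 12, 1, 5, 45), succeeded=False),
]
violations = sla_violations(runs, now)
print(violations)
```

The check treats "not finished yet" as its own case, which is the situation that goes unnoticed when downstream jobs simply assume their inputs exist.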
Metadata Management

• Tool independent metadata about…
   •   Data sets we know about and their location (on HDFS)
   •   Schemata
   •   Authorization (currently HDFS permissions only)
   •   Partitioning
   •   Format and compression
   •   Guarantees (consistency, timeliness, permits duplicates)
• Currently still DIY in many ways, tool-dependent
• Most people rely on prayer and hard coding
• (H)OWL is interesting
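
As a rough illustration of what a tool-independent metadata record could hold, here is a sketch covering the attributes listed above. Every field name, the sample data set, and the in-memory catalog are assumptions made for this example; they do not describe (H)OWL or any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class DatasetMetadata:
    name: str
    hdfs_path: str                        # location on HDFS
    schema: Tuple[Tuple[str, str], ...]   # (column name, type) pairs
    partition_keys: Tuple[str, ...]
    file_format: str
    compression: str
    permits_duplicates: bool              # one example of a guarantee

def build_catalog(datasets) -> Dict[str, DatasetMetadata]:
    """Index data sets by name, rejecting duplicate names."""
    catalog = {}
    for ds in datasets:
        if ds.name in catalog:
            raise ValueError(f"duplicate data set name: {ds.name}")
        catalog[ds.name] = ds
    return catalog

# Hypothetical data set; jobs look up the path instead of hard coding it.
weblogs = DatasetMetadata(
    name="weblogs",
    hdfs_path="/data/prod/weblogs",
    schema=(("ts", "bigint"), ("url", "string")),
    partition_keys=("date",),
    file_format="sequencefile",
    compression="gzip",
    permits_duplicates=True,
)
catalog = build_catalog([weblogs])
print(catalog["weblogs"].hdfs_path)
```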
Authentication and authorization
• Authentication
   • Don’t talk to strangers
   • Should integrate with existing IT infrastructure
   • Yahoo! security (Kerberos) patches now part of CDH3b3
• Authorization
   • Not everyone can access everything
      • Ex. Production data sets are read-only to quants / analysts. Analysts
        have home or group directories for derived data sets.
   • Mostly enforced via HDFS permissions; directory structure and
     organization are critical
   • Not as fine grained as column level access in EDW, RDBMS
• HUE as a gateway to the cluster
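
The read-only production data example can be sketched with an owner/group/other permission check of the kind HDFS permissions use. This is a simplified model, not HDFS code: it ignores the superuser and the traversal checks on parent directories, and the users, groups, and mode are hypothetical.

```python
def is_allowed(mode, owner, group, user, user_groups, access):
    """Check one access type ('r', 'w' or 'x') against an octal mode."""
    if user == owner:
        bits = (mode >> 6) & 0o7   # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 0o7   # group class
    else:
        bits = mode & 0o7          # everyone else
    needed = {"r": 4, "w": 2, "x": 1}[access]
    return bool(bits & needed)

# Hypothetical layout: production data owned by etl:analysts with mode 750.
prod_mode, prod_owner, prod_group = 0o750, "etl", "analysts"

analyst_can_read = is_allowed(prod_mode, prod_owner, prod_group,
                              "alice", {"analysts"}, "r")
analyst_can_write = is_allowed(prod_mode, prod_owner, prod_group,
                               "alice", {"analysts"}, "w")
stranger_can_read = is_allowed(prod_mode, prod_owner, prod_group,
                               "mallory", {"interns"}, "r")
print(analyst_can_read, analyst_can_write, stranger_can_read)
```

Because the check only sees directory owner, group, and mode, who can reach which data set is decided almost entirely by how directories are laid out, which is why the directory structure matters so much.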
Resource Sharing

• Prefer one large cluster to many small clusters (unless
  maybe you’re Facebook)
• “Stop hogging the cluster!”
• Cluster resources
   •   Disk space (HDFS size quotas)
   •   Number of files (HDFS file count quotas)
   •   Simultaneous jobs
   •   Tasks – guaranteed capacity, full utilization, SLA enforcement
• Monitor and track resource utilization across all groups
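
Tracking utilization across groups can start as simply as comparing each group's usage to its quota. The sketch below assumes usage and quota figures have already been collected in the same unit; the group names, numbers, and the 80% threshold are made up for illustration.

```python
def over_threshold(usage, quotas, threshold=0.8):
    """Return {group: fraction of quota used} for groups at or above threshold."""
    report = {}
    for group, quota in quotas.items():
        if quota <= 0:
            continue  # no meaningful quota set for this group
        fraction = usage.get(group, 0) / quota
        if fraction >= threshold:
            report[group] = round(fraction, 2)
    return report

# Hypothetical disk usage and space quotas, in terabytes.
usage_tb = {"analytics": 9, "etl": 2}
quota_tb = {"analytics": 10, "etl": 10, "research": 5}
hogs = over_threshold(usage_tb, quota_tb)
print(hogs)
```

The same function works for file counts or task slots, as long as usage and quota are expressed in the same unit.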


Monitoring

• Critical for keeping things running
• Cluster health
   •   Duh.
   •   Traditional monitoring tools: Nagios, Hyperic, Zenoss
   •   Host checks, service checks
   •   When to alert? It’s tricky.
• Cluster performance
   • Overall utilization in aggregate
   • 30,000ft view of utilization and performance; macro level


Monitoring
• Hadoop aware cluster monitoring
   • Traditional tools don’t cut it; Hadoop monitoring is inherently
     Hadoop specific
   • Analogous to RDBMS monitoring tools
• Job level “monitoring”
   •   More like analysis
   •   “What resources does this job use?”
   •   “How does this run compare to last run?”
   •   “How can I make this run faster, more resource efficient?”
   •   Two views we care about
        • Job perspective
        • Resource perspective (task slots, scheduler pool)
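
"How does this run compare to last run?" can be answered with a per-metric percent change, sketched below. The metric names and values are hypothetical; in practice they would come from job counters or job history.

```python
def compare_runs(previous, current):
    """Percent change per metric present in both runs (positive means it grew)."""
    changes = {}
    for metric in previous.keys() & current.keys():
        before, after = previous[metric], current[metric]
        if before == 0:
            continue  # percent change is undefined from a zero baseline
        changes[metric] = round((after - before) / before * 100, 1)
    return changes

# Hypothetical metrics for two runs of the same job.
last_run = {"duration_s": 600, "hdfs_bytes_read": 1000, "map_tasks": 40}
this_run = {"duration_s": 750, "hdfs_bytes_read": 1000, "map_tasks": 40}
changes = compare_runs(last_run, this_run)
print(changes)
```

Here the job took longer on the same input and task count, which points at resource contention or a code change rather than data growth.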

Wrapping it up

• Hadoop proper is awesome, but is only part of the
  picture
• Much of Professional Services time is filling in the blanks
• There’s still a way to go
   • Metadata management
   • Operational tools and support
   • Improvements to Hadoop core to improve stability, security,
     manageability
• Adoption and feedback drive progress
• CDH provides the infrastructure for a complete system
Cloudera Makes Hadoop Safe For the Enterprise




        Software                  Services                         Training




Cloudera Enterprise
Enterprise Support and Management Tools




 • Increases reliability and consistency of the Hadoop platform
 • Improves Hadoop’s conformance to important IT policies and procedures
 • Lowers the cost of management and administration


References / Resources

•   Cloudera documentation - http://docs.cloudera.com
•   Cloudera Groups – http://groups.cloudera.org
•   Cloudera JIRA – http://issues.cloudera.org
•   Hadoop: The Definitive Guide

• esammer@cloudera.com
• irc.freenode.net #cloudera, #hadoop
• @esammer


Poll

What other topics would you be most interested in
hearing about?
• More case studies of enterprises using Hadoop
• Technical "How to" sessions
• Industry specific applications of Hadoop
• Technical overviews of Hadoop and related components




Winner of the drawing is…




Q&A

Learn about upcoming events: www.cloudera.com/events

DBTA Webinar: Thursday, December 9th, 11am PT / 1pm ET
New Solutions for the Data Intensive Enterprise
Register at www.cloudera.com/events

Thank you for attending.




Contenu connexe

Tendances

Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
sql on hadoop
sql on hadoop sql on hadoop
sql on hadoop Jianwei Li
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revJason Shih
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environmentsDataWorks Summit
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionCloudera, Inc.
 
大数据数据安全
大数据数据安全大数据数据安全
大数据数据安全Jianwei Li
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessCloudera, Inc.
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinarCloudera, Inc.
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera Japan
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
Hadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons LearnedHadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons LearnedCloudera, Inc.
 

Tendances (20)

Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
sql on hadoop
sql on hadoop sql on hadoop
sql on hadoop
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117rev
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
 
大数据数据安全
大数据数据安全大数据数据安全
大数据数据安全
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 
20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar20100806 cloudera 10 hadoopable problems webinar
20100806 cloudera 10 hadoopable problems webinar
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
Cloudera + MicrosoftでHadoopするのがイイらしい。 #CWT2016
 
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
Hadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons LearnedHadoop World 2010: Productionizing Hadoop: Lessons Learned
Hadoop World 2010: Productionizing Hadoop: Lessons Learned
 

En vedette

Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
 
What the Enterprise Requires - Usability
What the Enterprise Requires - UsabilityWhat the Enterprise Requires - Usability
What the Enterprise Requires - UsabilityCloudera, Inc.
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Extending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIExtending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIClouderaUserGroups
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Hadoop trong triển khai Big Data
Hadoop trong triển khai Big DataHadoop trong triển khai Big Data
Hadoop trong triển khai Big DataNguyễn Duy Nhân
 

En vedette (7)

Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
 
What the Enterprise Requires - Usability
What the Enterprise Requires - UsabilityWhat the Enterprise Requires - Usability
What the Enterprise Requires - Usability
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Extending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIExtending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via API
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Inside Flume
Inside FlumeInside Flume
Inside Flume
 
Hadoop trong triển khai Big Data
Hadoop trong triển khai Big DataHadoop trong triển khai Big Data
Hadoop trong triển khai Big Data
 

Similaire à Production-izing Hadoop Lessons Learned

Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Cloudera, Inc.
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems WebinarCloudera, Inc.
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valleymarkgrover
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaMark Kerzner
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015Apekshit Sharma
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...yaevents
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 

Similaire à Production-izing Hadoop Lessons Learned (20)

Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar10 Common Hadoop-able Problems Webinar
10 Common Hadoop-able Problems Webinar
 
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon ValleyIntro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
Intro to Hadoop Presentation at Carnegie Mellon - Silicon Valley
 
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of ClouderaHouston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
Houston Hadoop Meetup Presentation by Vikram Oberoi of Cloudera
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015Introduction to HBase - NoSqlNow2015
Introduction to HBase - NoSqlNow2015
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...Контроль зверей: инструменты для управления и мониторинга распределенных сист...
Контроль зверей: инструменты для управления и мониторинга распределенных сист...
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 




Production-izing Hadoop Lessons Learned

  • 1. Welcome to Production-izing Hadoop: Lessons Learned
      Audio/Telephone: +1 916 233 3087
      Access Code: 616-465-108
      Audio PIN: Shown after joining the Webinar
  • 2. Housekeeping
      • Ask questions at any time using the questions panel
      • Problems? Use the chat panel
      • Book drawing - winner announced at the end
      • Slides and recording will be available
      Copyright 2010 Cloudera Inc. All rights reserved
  • 3. Poll: What is your interest in Hadoop?
      • Just learning about it
      • I have a problem I think Hadoop can solve
      • Using Hadoop in our labs
      • Using Hadoop in production
  • 4. Speaker: Eric Sammer
      Eric is a Solution Architect and Training Instructor for Cloudera. He has worked with dozens of customers in a variety of industries including Cloudera's largest Hadoop deployments. His experience ranges from clusters of a few nodes to clusters with hundreds of nodes with complex multi-tenant user environments.
      Prior to joining Cloudera, he held roles including System Architect, Director of Technical Operations, and Tech Lead at various New York City startups focusing on distributed data collection, processing, and reporting systems. Eric has over 12 years in development and technical operations and has contributed to various open source projects such as Gentoo Linux.
      twitter: @esammer, @cloudera
  • 5. Starting Out
      (You) “Let’s build a Hadoop cluster!”
      Image: http://www.iccs.inf.ed.ac.uk/~miles/code.html
  • 6. Starting Out
      (You)
      Image: http://www.iccs.inf.ed.ac.uk/~miles/code.html
  • 7. Where you want to be
      (You) Yahoo! Hadoop Cluster (2007)
  • 8. What is Hadoop?
      • A scalable fault-tolerant distributed system for data storage and processing (open source under the Apache license)
      • Core Hadoop has two main components
          • Hadoop Distributed File System (HDFS): self-healing high-bandwidth clustered storage
          • MapReduce: fault-tolerant distributed processing
      • Key value
          • Flexible -> store data without a schema and add it later as needed
          • Affordable -> cost / TB at a fraction of traditional options
          • Broadly adopted -> a large and active ecosystem
          • Proven at scale -> dozens of petabyte+ implementations in production today
  • 9. Cloudera’s Distribution for Hadoop, Version 3: The Industry’s Leading Hadoop Distribution
      (Diagram of components: Hue, Hue SDK, Oozie, Pig/Hive, Flume, Sqoop, HBase, Zookeeper)
      • Open source – 100% Apache licensed
      • Simplified – Component versions & dependencies managed for you
      • Integrated – All components & functions interoperate through standard APIs
      • Reliable – Patched with fixes from future releases to improve stability
      • Supported – Employs project founders and committers for >70% of components
  • 10. Overview
      • Proper planning
      • Data Ingestion
      • ETL and Data Processing Infrastructure
      • Authentication, Authorization, and Sharing
      • Monitoring
  • 11. The production data platform
      • Data storage
      • ETL / data processing / analysis infrastructure
      • Data ingestion infrastructure
      • Integration with tools
      • Data security and access control
      • Health and performance monitoring
  • 12. Proper planning
      • Know your use cases!
          • Log transformation, aggregation
          • Text mining, IR
          • Analytics
          • Machine learning
      • Critical to proper configuration
          • Hadoop
          • Network
          • OS
      • Resource utilization, deep job insight will tell you more
  • 13. HDFS Concerns
      • Name node availability
          • HA is tricky
          • Consider where Hadoop lives in the system
          • Manual recovery can be simple, fast, effective
      • Backup Strategy
          • Name node metadata – hourly, ~2 day retention
          • User data
              • Log shipping style strategies
              • DistCp
              • “Fan out” to multiple clusters on ingestion
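The name node metadata schedule on slide 13 (hourly snapshots, roughly two days of retention) can be expressed as a small retention check. This is a minimal sketch, not anything from the talk: the function names, the 90-minute staleness slack, and the idea of tracking snapshots by timestamp are all assumptions for illustration.

```python
from datetime import datetime, timedelta

# Assumption: "~2 day retention" is taken as exactly 48 hours.
RETENTION = timedelta(days=2)


def snapshots_to_prune(snapshot_times, now, retention=RETENTION):
    """Return snapshot timestamps older than the retention window, oldest first."""
    cutoff = now - retention
    return sorted(t for t in snapshot_times if t < cutoff)


def backup_is_stale(snapshot_times, now, max_age=timedelta(hours=1, minutes=30)):
    """True if the newest snapshot is older than max_age.

    max_age is hypothetical: one hourly interval plus 30 minutes of slack.
    An empty history counts as stale.
    """
    if not snapshot_times:
        return True
    return now - max(snapshot_times) > max_age
```

A monitoring check could call `backup_is_stale` on every run and alert when it returns True, while a cleanup job deletes whatever `snapshots_to_prune` returns.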
  • 14. Data Ingestion
      • Many data sources
          • Streaming data sources (log files, mostly)
          • RDBMS
          • EDW
          • Files (usually exports from 3rd party)
      • Common place we see DIY
          • You probably shouldn’t
          • Sqoop, Flume, Oozie (but I’m biased)
      • No matter what - fault tolerant, performant, monitored
  • 15. ETL and Data Processing
      • Non-interactive jobs
      • Establish a common directory structure for processes
      • Need tools to handle complex chains of jobs
      • Workflow tools support
          • Job dependencies, error handling
          • Tracking
          • Invocation based on time or events
      • Most common mistake: depending on jobs always completing successfully or within a window of time.
          • Monitor for SLA rather than pray
          • Defensive coding practices apply just as they do everywhere else!
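Slide 15's advice to "monitor for SLA rather than pray" amounts to classifying each job run against a deadline and refusing to start downstream jobs until their upstream jobs have met theirs. The sketch below illustrates that idea only; `JobRun`, `sla_status`, and `can_start` are hypothetical names, and treating a failed run as an immediate miss (no retries) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class JobRun:
    name: str
    deadline: datetime                      # SLA: must have succeeded by this time
    finished_at: Optional[datetime] = None  # None while the job is still running
    succeeded: bool = False


def sla_status(run, now):
    """Classify a run as 'met', 'missed', or 'pending' relative to its deadline."""
    finished = run.finished_at is not None
    if finished and run.succeeded and run.finished_at <= run.deadline:
        return "met"
    # Finished but failed or late, or still running past the deadline.
    if finished or now > run.deadline:
        return "missed"
    return "pending"


def can_start(job, upstream, statuses):
    """True only if every upstream job of `job` has met its SLA."""
    return all(statuses.get(dep) == "met" for dep in upstream.get(job, ()))
```

The point of the explicit "pending" state is that a downstream job never assumes its inputs exist just because the clock says they should.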
  • 16. Metadata Management
      • Tool independent metadata about…
          • Data sets we know about and their location (on HDFS)
          • Schemata
          • Authorization (currently HDFS permissions only)
          • Partitioning
          • Format and compression
          • Guarantees (consistency, timeliness, permits duplicates)
      • Currently still DIY in many ways, tool-dependent
      • Most people rely on prayer and hard coding
      • (H)OWL is interesting
  • 17. Authentication and authorization
      • Authentication
          • Don’t talk to strangers
          • Should integrate with existing IT infrastructure
          • Yahoo! security (Kerberos) patches now part of CDH3b3
      • Authorization
          • Not everyone can access everything
          • Ex. Production data sets are read-only to quants / analysts. Analysts have home or group directories for derived data sets.
          • Mostly enforced via HDFS permissions; directory structure and organization is critical
          • Not as fine grained as column level access in EDW, RDBMS
      • HUE as a gateway to the cluster
  • 18. Resource Sharing
      • Prefer one large cluster to many small clusters (unless maybe you’re Facebook)
      • “Stop hogging the cluster!”
      • Cluster resources
          • Disk space (HDFS size quotas)
          • Number of files (HDFS file count quotas)
          • Simultaneous jobs
          • Tasks – guaranteed capacity, full utilization, SLA enforcement
      • Monitor and track resource utilization across all groups
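To make "monitor and track resource utilization across all groups" on slide 18 concrete, the sketch below flags directories that are close to their HDFS size or file count quota. It works on numbers that are assumed to have been collected already; the `DirUsage` record, the 80% threshold, and the convention that a quota of 0 means "no quota set" are all assumptions of this example, not Hadoop behavior.

```python
from dataclasses import dataclass


@dataclass
class DirUsage:
    path: str
    space_used: int    # bytes consumed under this directory
    space_quota: int   # bytes allowed; 0 means no quota (convention of this sketch)
    file_count: int
    file_quota: int    # names allowed; 0 means no quota (convention of this sketch)


def near_quota(usages, threshold=0.8):
    """Return (path, resource, fraction_used) for every quota at or above threshold."""
    alerts = []
    for u in usages:
        checks = (
            ("space", u.space_used, u.space_quota),
            ("files", u.file_count, u.file_quota),
        )
        for resource, used, quota in checks:
            if quota > 0 and used / quota >= threshold:
                alerts.append((u.path, resource, round(used / quota, 2)))
    return alerts
```

Reporting both resources separately matters because a group can exhaust its file count quota with many small files long before it approaches its size quota.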
  • 19. Monitoring
      • Critical for keeping things running
      • Cluster health
          • Duh.
          • Traditional monitoring tools: Nagios, Hyperic, Zenoss
          • Host checks, service checks
          • When to alert? It’s tricky.
      • Cluster performance
          • Overall utilization in aggregate
          • 30,000ft view of utilization and performance; macro level
  • 20. Monitoring
      • Hadoop aware cluster monitoring
          • Traditional tools don’t cut it; Hadoop monitoring is inherently Hadoop specific
          • Analogous to RDBMS monitoring tools
      • Job level “monitoring”
          • More like analysis
          • “What resources does this job use?”
          • “How does this run compare to last run?”
          • “How can I make this run faster, more resource efficient?”
      • Two views we care about
          • Job perspective
          • Resource perspective (task slots, scheduler pool)
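The job-level question on slide 20, "How does this run compare to last run?", can be answered by diffing per-run metrics and reporting only the ones that moved noticeably. The following is a sketch under assumptions: the metric names in the usage example are made up, and the 10% tolerance is an arbitrary starting point.

```python
def compare_runs(previous, current, tolerance=0.10):
    """Return {metric: relative_change} for metrics that changed by more than tolerance.

    Metrics missing from the current run, or with a previous value of 0,
    are skipped because a relative change is undefined for them.
    """
    changes = {}
    for metric, old in previous.items():
        new = current.get(metric)
        if new is None or old == 0:
            continue
        delta = (new - old) / old
        if abs(delta) > tolerance:
            changes[metric] = round(delta, 2)
    return changes


# Hypothetical metrics for two runs of the same job.
last_run = {"runtime_s": 600, "map_tasks": 200, "bytes_read": 1000}
this_run = {"runtime_s": 900, "map_tasks": 205, "bytes_read": 1000}
regressions = compare_runs(last_run, this_run)
```

Here runtime grew by half while input size and task count stayed roughly flat, which points at the job or the cluster rather than the data.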
  • 21. Wrapping it up
      • Hadoop proper is awesome, but is only part of the picture
      • Much of Professional Services time is filling in the blanks
      • There’s still a way to go
          • Metadata management
          • Operational tools and support
          • Improvements to Hadoop core to improve stability, security, manageability
      • Adoption and feedback drive progress
      • CDH provides the infrastructure for a complete system
  • 22. Cloudera Makes Hadoop Safe For the Enterprise
      Software, Services, Training
  • 23. Cloudera Enterprise: Enterprise Support and Management Tools
      • Increases reliability and consistency of the Hadoop platform
      • Improves Hadoop’s conformance to important IT policies and procedures
      • Lowers the cost of management and administration
  • 24. References / Resources
      • Cloudera documentation – http://docs.cloudera.com
      • Cloudera Groups – http://groups.cloudera.org
      • Cloudera JIRA – http://issues.cloudera.org
      • Hadoop the Definitive Guide
      • esammer@cloudera.com
      • irc.freenode.net #cloudera, #hadoop
      • @esammer
  • 25. Poll: What other topics would you be most interested in hearing about?
      • More case studies of enterprises using Hadoop
      • Technical "How to" sessions
      • Industry specific applications of Hadoop
      • Technical overviews of Hadoop and related components
  • 26. Winner of the drawing is…
  • 27. Q&A
      Learn about upcoming events: www.cloudera.com/events
      DBTA Webinar: Thursday, December 9th, 11am PT / 1pm ET
      New Solutions for the Data Intensive Enterprise
      Register at www.cloudera.com/events
      Thank you for attending.