The Hadoop Community


Jeff Hammerbacher
Manager, Data
May 28 - 29, 2008
Who is Using Hadoop?
▪   Mentioned previously: Yahoo!, Powerset, Quantcast, Last.fm, Autodesk
▪   A9.com
    ▪   Build Amazon’s product search indices
    ▪   Session analytics
▪   FIM
    ▪   Log analysis and machine learning
▪   Wikia Search
    ▪   125 nodes
▪   Veoh, NetSeer, Krugle, Rapleaf, Joost, New York Times
▪   For more, check out http://wiki.apache.org/hadoop/PoweredBy
Hadoop in the ASF
▪   Began life as subproject of Lucene
▪   Now a top level project: http://hadoop.apache.org
    ▪   HBase now a subproject of Hadoop
    ▪   Pig and Mahout are related projects
    ▪   ZooKeeper and a virtual cluster management project are coming to Apache soon
▪   Apache Software Foundation provides organizational and legal support
    ▪   Membership in ASF is by invitation only
    ▪   ASF Members elect a Board of Directors
    ▪   Each top level project has a Project Management Committee (PMC)
    ▪   Each PMC has a VP who is an officer of the ASF appointed by the Board
    ▪   The VP of the Hadoop PMC is Owen O’Malley of Yahoo
The Hadoop PMC
Name                 Organization
Andrzej Bialecki     Getopt
Doug Cutting         Yahoo!
Dhruba Borthakur     Facebook
Enis Soztutar        Agmlab
Jim Kellerman        Powerset
Nigel Daley          Yahoo!
Owen O’Malley        Yahoo!
Michael Stack        Powerset
Christophe Taton     INRIA
Tom White            Independent Consultant
Apache Infrastructure for Hadoop
▪   Web site: Apache Forrest
▪   Wiki: MoinMoin
▪   Version Control: Subversion
▪   API Documentation: JavaDoc
▪   Bug Tracking: JIRA
▪   Continuous Build Server: Hudson
▪   IRC Channel: #hadoop on irc.freenode.org
▪   And of course, the mailing lists:
    ▪   core-user@hadoop.apache.org
    ▪   core-dev@hadoop.apache.org
Contributing to Hadoop
▪   Get comfortable with available documentation on the website
▪   Read through the wiki
▪   Browse the mailing list archives
▪   Dig into the JIRA!
    ▪   Open source bug tracking software from Atlassian
    ▪   “Issues”: Bugs, feature requests, documentation requests
    ▪   Issues categorized by “component” and “version”
    ▪   “Workflow”: Issue as FSM; each state is a “status”
▪   http://www.atlassian.com/software/jira/docs/latest/introduction.html
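The "issue as FSM" idea can be sketched in a few lines of Python. The status names are the ones this deck lists under ticket classification. The transition table, the `Issue` class, and the key `EXAMPLE-1` are assumptions made for illustration and do not reproduce the Hadoop project's actual JIRA workflow configuration.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    PATCH_AVAILABLE = "Patch Available"
    RESOLVED = "Resolved"
    REOPENED = "Reopened"
    CLOSED = "Closed"


# Hypothetical transition table: which statuses may follow each status.
# A real JIRA workflow is configured by the project's administrators.
TRANSITIONS = {
    Status.OPEN: {Status.IN_PROGRESS, Status.PATCH_AVAILABLE, Status.RESOLVED},
    Status.IN_PROGRESS: {Status.OPEN, Status.PATCH_AVAILABLE, Status.RESOLVED},
    Status.PATCH_AVAILABLE: {Status.OPEN, Status.RESOLVED},
    Status.RESOLVED: {Status.REOPENED, Status.CLOSED},
    Status.REOPENED: {Status.IN_PROGRESS, Status.PATCH_AVAILABLE, Status.RESOLVED},
    Status.CLOSED: {Status.REOPENED},
}


class Issue:
    """An issue whose status may only change along allowed transitions."""

    def __init__(self, key):
        self.key = key
        self.status = Status.OPEN
        self.history = [Status.OPEN]

    def move_to(self, new_status):
        # Reject any move the transition table does not allow.
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.key}: cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)


# Illustrative key, not a real ticket.
issue = Issue("EXAMPLE-1")
issue.move_to(Status.IN_PROGRESS)
issue.move_to(Status.PATCH_AVAILABLE)
issue.move_to(Status.RESOLVED)
print([s.value for s in issue.history])
```

Keeping the allowed moves in one table means a different workflow only requires editing `TRANSITIONS`, not the `Issue` class.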
Contributing to Hadoop
More About JIRA
▪   Every Issue has a unique, numbered Key
    ▪   Type
    ▪   Status
    ▪   Priority
    ▪   Assignee
    ▪   Reporter
    ▪   Votes
    ▪   Watchers
Contributing to Hadoop
More About Ticket Classification
▪   Status: Open, In Progress, Reopened, Resolved, Closed, or Patch Available
▪   Priorities: Blocker, Critical, Major, Minor, Trivial
▪   Type: Bug, Improvement, New Feature, Task
▪   Voting on an issue means you actively want to see it fixed
▪   Watching an issue means you can passively track progress
Contributing to Hadoop
More About JIRA
▪   Title, Created Time, Updated Time, Component, Affects/Fix Version, Links/Sub-Tasks, Description, Comments
Contributing to Hadoop
Filters and the Issue Navigator
▪   You can view related Issues via the Issue Navigator
    ▪   “Filter” determines what is shown in the Navigator
    ▪   Common Filters on the right-hand side of main login
        ▪   Outstanding, Assigned to Me, Reported by Me, Resolved Recently, Added
            Recently, Updated Recently, Most Important
        ▪   “Most Important” Filter just sorts by Issue Priority
        ▪   I’d recommend the “... Recently” and “Most Important” Filters first
    ▪   Can also click “Find Issues” on top nav to build your own Filters
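A Filter can be thought of as a predicate over issues, and the "Most Important" Filter as a sort by Priority. The sketch below is a rough Python model of that idea, not JIRA's implementation. The `Issue` fields mirror a few of those named in this deck. The sample issues, the user name `alice`, and the reading of "Outstanding" as "not Resolved or Closed" are assumptions made for illustration.

```python
from dataclasses import dataclass

# Priority names in the order the deck lists them, most severe first.
PRIORITY_ORDER = ["Blocker", "Critical", "Major", "Minor", "Trivial"]


@dataclass
class Issue:
    key: str
    priority: str
    status: str
    assignee: str
    votes: int = 0


def apply_filter(issues, predicate):
    """Return the issues for which predicate(issue) is true."""
    return [i for i in issues if predicate(i)]


def most_important(issues):
    """Sort issues by priority, most severe first (stable for ties)."""
    return sorted(issues, key=lambda i: PRIORITY_ORDER.index(i.priority))


# Made-up sample data; keys and field values are purely illustrative.
issues = [
    Issue("DEMO-1", "Minor", "Open", "alice", votes=2),
    Issue("DEMO-2", "Blocker", "Open", "bob", votes=8),
    Issue("DEMO-3", "Major", "Resolved", "alice", votes=1),
    Issue("DEMO-4", "Critical", "Patch Available", "alice", votes=5),
]

# Assumed meaning of "Outstanding": anything not yet Resolved or Closed.
outstanding = apply_filter(issues, lambda i: i.status not in ("Resolved", "Closed"))
assigned_to_me = apply_filter(issues, lambda i: i.assignee == "alice")
ranked = most_important(outstanding)
print([i.key for i in ranked])
```

The same `apply_filter` helper could back a "Popular Issues" view by sorting on `votes` instead of priority.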
The JIRA Issue Navigator
The JIRA Issue Navigator
    Creating a New Filter
Contributing to Hadoop
JIRA Reports and Release Notes
▪   Reports add a visualization component to Filters
    ▪   Most can be applied to any saved filter
    ▪   Some Reports have a chart configured
▪   Common Reports:
    ▪   Road Map
    ▪   Open Issues
    ▪   Popular Issues (based on number of Votes)
▪   To keep up with what’s new, the Release Notes are quite useful
Future Directions for Hadoop
HDFS
▪   For 0.18
    ▪   HADOOP-1700: Append to Files in HDFS
        ▪   Numerous blocking issues; hope to have code freeze by early June
        ▪   8 Voters, 21 Watchers
    ▪   HADOOP-3022: Fast Cluster Restart
    ▪   HADOOP-1702: Reduce buffer copies when data is written to DFS
    ▪   HADOOP-3164: Use FileChannel.transferTo() when data is read from DN
    ▪   HADOOP-3058: Hadoop DFS to report more replication metrics
    ▪   HADOOP-3246: FTP client over HDFS
Future Directions for Hadoop
HDFS
▪   Scalability
    ▪   Separate DFS into multiple volumes and have a NN per volume
    ▪   Manage volume metadata in Zookeeper
▪   Availability
    ▪   Mirroring
    ▪   Have Zookeeper manage metadata
▪   Backup and Recovery
    ▪   Synchronized global snapshot via ZFS or LVM
▪   http://wiki.apache.org/hadoop/HdfsFutures
Future Directions for Hadoop
MapReduce
▪   For 0.18
    ▪   HADOOP-544: Replace the job, tip and task ids with objects
    ▪   HADOOP-3245: Provide ability to persist running jobs
    ▪   HADOOP-3130: Shuffling takes too long to get the last map output
    ▪   HADOOP-3221: Need a “LineBasedTextInputFormat”
    ▪   HADOOP-3149: Supporting multiple outputs for M/R jobs
    ▪   HADOOP-2182: Input Split details for maps should be logged
    ▪   HADOOP-3226: Run combiner when merging spills from map output
    ▪   HADOOP-3227: Implement a binary input/output format for Streaming
Future Directions for Hadoop
MapReduce
▪   Scheduling
    ▪   Factor job and task scheduling out of code to allow for testing
        different policies (HADOOP-3412)
    ▪   Augment JobTracker to be a resource manager and job scheduler
▪   Speculative Execution Policies
    ▪   Separate logic for Mapper and Reducer
    ▪   Break Reducer into more granular tasks
▪   Allow for execution across many different data sources
    ▪   for example, MySQL
Future Directions for Hadoop
Other Interesting Tickets
▪   HADOOP-4: Tool to mount dfs on linux
▪   HADOOP-249: Improving Map -> Reduce performance and Task JVM reuse
▪   HADOOP-2510: Map-Reduce 2.0
▪   HADOOP-2864: Improve the Scalability and Robustness of IPC
▪   HADOOP-2884: Refactor Hadoop package structure and source tree
▪   HADOOP-3366: Shuffle/Merge improvements
▪   HADOOP-3421: Requirements for a Resource Manager for Hadoop
▪   HADOOP-3444: Implementing a Resource Manager (V1) for Hadoop
Contributing to Hadoop
Patch Submission
▪   http://wiki.apache.org/hadoop/HowToContribute
▪   Basically run “svn diff” on your checkout of trunk and write output to a
    “.patch” file, then attach it to the issue
▪   Hudson will pick up the patch and apply it to trunk
▪   Make sure to have tests and JavaDoc comments
▪   Performance regressions tested via DFSIO and GridMix benchmarks
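The "svn diff into a .patch file" step could be scripted roughly as follows. This is a minimal sketch that assumes `svn` is on the PATH and that the checkout directory is a Subversion working copy of trunk. The function name `write_patch`, the `run` parameter (there only so the command runner can be swapped out in tests), and naming the file after the issue key are illustrative choices rather than project requirements.

```python
import subprocess
from pathlib import Path


def write_patch(checkout_dir, issue_key, out_dir=".", run=subprocess.run):
    """Run `svn diff` in checkout_dir and save the output as <issue_key>.patch.

    Returns the path of the patch file, ready to attach to the issue.
    """
    result = run(
        ["svn", "diff"],
        cwd=checkout_dir,       # run inside the working copy
        capture_output=True,    # collect the diff text instead of printing it
        text=True,
        check=True,             # raise if svn exits with an error
    )
    patch_path = Path(out_dir) / f"{issue_key}.patch"
    patch_path.write_text(result.stdout)
    return patch_path
```

A call such as `write_patch("hadoop-trunk", "HADOOP-1234")`, where both the directory and the issue key are placeholders, would leave `HADOOP-1234.patch` in the current directory.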
Contributing to Hadoop
Project Ideas
▪   http://wiki.apache.org/hadoop/ProjectSuggestions
    ▪   Testing, Tools, and Research
▪   Security
▪   Tools
    ▪   Performance monitoring and benchmarking
    ▪   Anomaly detection
    ▪   General system management
Contributing to Hadoop
Project Ideas continued
▪   Performance
    ▪   Speculative execution policies
    ▪   Resource-aware task scheduling (instead of slot-based)
    ▪   Better failure detection algorithms
▪   Linear Algebra, Statistics, and Machine Learning
    ▪   SAS/R for massive data sets
    ▪   Vector and Matrix algebra libraries
    ▪   Common statistical functions: point estimation, hypothesis testing
    ▪   Model training and validation libraries
(c) 2008 Facebook, Inc. or its licensors. “Facebook” is a registered trademark of Facebook, Inc. All rights reserved.

20080529dublinpt1

  • 2. The Hadoop Community Jeff Hammerbacher Manager, Data May 28 - 29, 2008
  • 3. Who is Using Hadoop? ▪ Mentioned previously: Yahoo!, Powerset, Quantcast, Last.fm, Autodesk ▪ A9.com ▪ Build Amazon’s product search indices ▪ Session analytics ▪ FIM ▪ Log analysis and machine learning ▪ Wikia Search ▪ 125 nodes ▪ Veoh, NetSeer, Krugle, Rapleaf, Joost, New York Times ▪ For more, check out http://wiki.apache.org/hadoop/PoweredBy
  • 4. Hadoop in the ASF ▪ Began life as subproject of Lucene ▪ Now a top level project: http://hadoop.apache.org ▪ HBase now a subproject of Hadoop ▪ Pig and Mahout are related projects ▪ Zookeeper and virtual cluster management project in Apache soon ▪ Apache Software Foundation provides organizational and legal support ▪ Membership in ASF is by invitation only ▪ ASF Members elect a Board of Directors ▪ Each top level project has a Project Management Committee (PMC) ▪ Each PMC has a VP who is an officer of the ASF appointed by the Board ▪ The VP of the Hadoop PMC is Owen O’Malley of Yahoo
  • 5. The Hadoop PMC
       Name                 Organization
       Andrzej Bialecki     Getopt
       Doug Cutting         Yahoo!
       Dhruba Borthakur     Facebook
       Enis Soztutar        Agmlab
       Jim Kellerman        Powerset
       Nigel Daley          Yahoo!
       Owen O’Malley        Yahoo!
       Michael Stack        Powerset
       Christophe Taton     INRIA
       Tom White            Independent Consultant
  • 6. Apache Infrastructure for Hadoop ▪ Web site: Apache Forrest ▪ Wiki: MoinMoin ▪ Version Control: Subversion ▪ API Documentation: JavaDoc ▪ Bug Tracking: JIRA ▪ Continuous Build Server: Hudson ▪ IRC Channel: #hadoop on irc.freenode.org ▪ And of course, the mailing lists: ▪ core-user@hadoop.apache.org ▪ core-dev@hadoop.apache.org
  • 7. Contributing to Hadoop ▪ Get comfortable with available documentation on the website ▪ Read through the wiki ▪ Browse the mailing list archives ▪ Dig into the JIRA! ▪ Open source bug tracking software from Atlassian ▪ “Issues”: Bugs, feature requests, documentation requests ▪ Issues categorized by “component” and “version” ▪ “Workflow”: Issue as FSM; each state is a “status” ▪ http://www.atlassian.com/software/jira/docs/latest/introduction.html
  • 8. Contributing to Hadoop More About JIRA ▪ Every Issue has a unique, numbered Key ▪ Type ▪ Status ▪ Priority ▪ Assignee ▪ Reporter ▪ Votes ▪ Watchers
  • 9. Contributing to Hadoop More About Ticket Classification ▪ Status: Open, In Progress, Reopened, Resolved, Closed, or Patch Available ▪ Priorities: Blocker, Critical, Major, Minor, Trivial ▪ Type: Bug, Improvement, New Feature, Task ▪ Voting on an issue means you actively want to see it fixed ▪ Watching an issue means you can passively track progress
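
The idea of an issue as a finite state machine, where each state is a status, can be sketched in a few lines of Python. The status names below come from the slide; the transition table, the `Issue` class, and the `DEMO-1` key are hypothetical and only illustrate the concept. Real JIRA workflows are configured per project and may allow different moves.

```python
from enum import Enum


class Status(Enum):
    # Status names as listed on the slide.
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    PATCH_AVAILABLE = "Patch Available"
    RESOLVED = "Resolved"
    REOPENED = "Reopened"
    CLOSED = "Closed"


# Hypothetical transition table (an assumption, not Hadoop's actual workflow).
TRANSITIONS = {
    Status.OPEN: {Status.IN_PROGRESS, Status.PATCH_AVAILABLE, Status.RESOLVED},
    Status.IN_PROGRESS: {Status.OPEN, Status.PATCH_AVAILABLE, Status.RESOLVED},
    Status.PATCH_AVAILABLE: {Status.OPEN, Status.RESOLVED},
    Status.RESOLVED: {Status.REOPENED, Status.CLOSED},
    Status.REOPENED: {Status.IN_PROGRESS, Status.PATCH_AVAILABLE, Status.RESOLVED},
    Status.CLOSED: {Status.REOPENED},
}


class Issue:
    """A minimal issue that only tracks its key and current status."""

    def __init__(self, key):
        self.key = key
        self.status = Status.OPEN  # every new issue starts in Open

    def move_to(self, new_status):
        # Reject any move that the transition table does not list.
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"{self.key}: cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
```

A typical path under these assumed transitions would be Open, Patch Available, Resolved, Closed; trying to move a Closed issue straight back to Open raises `ValueError`.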
  • 10. Contributing to Hadoop More About JIRA ▪ Title, Created Time, Updated Time, Component, Affect/Fix Version, Links/ Sub-Tasks, Description, Comments
  • 11. Contributing to Hadoop Filters and the Issue Navigator ▪ You can view related Issues via the Issue Navigator ▪ “Filter” determines what is shown in the Navigator ▪ Common Filters on the right-hand side of main login ▪ Outstanding, Assigned to Me, Reported by Me, Resolved Recently, Added Recently, Updated Recently, Most Important ▪ “Most Important” Filter just sorts by Issue Priority ▪ I’d recommend the “... Recently” and “Most Important” Filters first ▪ Can also click “Find Issues” on top nav to build your own Filters
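
Since the "Most Important" Filter just sorts by Issue Priority, its effect can be approximated with a plain sort. This is a rough sketch, not JIRA's implementation: the priority order is taken from the ticket classification slide, while the `most_important` function, the dictionary layout, and the `DEMO-*` keys are made up for illustration.

```python
# Priority names from the slides, most severe first.
PRIORITY_ORDER = ["Blocker", "Critical", "Major", "Minor", "Trivial"]


def most_important(issues):
    """Return issues sorted from highest to lowest priority.

    Each issue is a dict with at least a "priority" entry (hypothetical layout).
    The sort is stable, so issues of equal priority keep their original order.
    """
    rank = {name: position for position, name in enumerate(PRIORITY_ORDER)}
    return sorted(issues, key=lambda issue: rank[issue["priority"]])


# Hypothetical example data; these are not real Hadoop tickets.
example_issues = [
    {"key": "DEMO-1", "priority": "Minor"},
    {"key": "DEMO-2", "priority": "Blocker"},
    {"key": "DEMO-3", "priority": "Major"},
    {"key": "DEMO-4", "priority": "Blocker"},
]

ordered_keys = [issue["key"] for issue in most_important(example_issues)]
```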
  • 12. The JIRA Issue Navigator
  • 13. The JIRA Issue Navigator Creating a New Filter
  • 14. Contributing to Hadoop JIRA Reports and Release Notes ▪ Reports add a visualization component to Filters ▪ Most can be applied to any saved filter ▪ Some Reports have a chart configured ▪ Common Reports: ▪ Road Map ▪ Open Issues ▪ Popular Issues (based on number of Votes) ▪ To keep up with what’s new, the Release Notes are quite useful
  • 15. Future Directions for Hadoop HDFS ▪ For 0.18 ▪ HADOOP-1700: Append to Files in HDFS ▪ Numerous blocking issues; hope to have code freeze by early June ▪ 8 Voters, 21 Watchers ▪ HADOOP-3022: Fast Cluster Restart ▪ HADOOP-1702: Reduce buffer copies when data is written to DFS ▪ HADOOP-3164: Use FileChannel.transferTo() when data is read from DN ▪ HADOOP-3058: Hadoop DFS to report more replication metrics ▪ HADOOP-3246: FTP client over HDFS
  • 16. Future Directions for Hadoop HDFS ▪ Scalability ▪ Separate DFS into multiple volumes and have a NN per volume ▪ Manage volume metadata in Zookeeper ▪ Availability ▪ Mirroring ▪ Have Zookeeper manage metadata ▪ Backup and Recovery ▪ Synchronized global snapshot via ZFS or LVM ▪ http://wiki.apache.org/hadoop/HdfsFutures
  • 17. Future Directions for Hadoop MapReduce ▪ For 0.18 ▪ HADOOP-544: Replace the job, tip and task ids with objects ▪ HADOOP-3245: Provide ability to persist running jobs ▪ HADOOP-3130: Shuffling takes too long to get the last map output ▪ HADOOP-3221: Need a "LineBasedTextInputFormat" ▪ HADOOP-3149: Supporting multiple outputs for M/R jobs ▪ HADOOP-2182: Input Split details for maps should be logged ▪ HADOOP-3226: Run combiner when merging spills from map output ▪ HADOOP-3227: Implement a binary input/output format for Streaming
  • 18. Future Directions for Hadoop MapReduce ▪ Scheduling ▪ Factor job and task scheduling out of code to allow for testing different policies (HADOOP-3412) ▪ Augment JobTracker to be a resource manager and job scheduler ▪ Speculative Execution Policies ▪ Separate logic for Mapper and Reducer ▪ Break Reducer into more granular tasks ▪ Allow for execution across many different data sources ▪ for example, MySQL
  • 19. Future Directions for Hadoop Other Interesting Tickets ▪ HADOOP-4: Tool to mount dfs on linux ▪ HADOOP-249: Improving Map -> Reduce performance and Task JVM reuse ▪ HADOOP-2510: Map-Reduce 2.0 ▪ HADOOP-2864: Improve the Scalability and Robustness of IPC ▪ HADOOP-2884: Refactor Hadoop package structure and source tree ▪ HADOOP-3366: Shuffle/Merge improvements ▪ HADOOP-3421: Requirements for a Resource Manager for Hadoop ▪ HADOOP-3444: Implementing a Resource Manager (V1) for Hadoop
  • 20. Contributing to Hadoop Patch Submission ▪ http://wiki.apache.org/hadoop/HowToContribute ▪ Basically run “svn diff” on your checkout of trunk and write output to a “.patch” file, then attach it to the issue ▪ Hudson will pick up patch and apply to trunk ▪ Make sure to have tests and JavaDoc comments ▪ Performance regressions tested via DFSIO and GridMix benchmarks
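
The "svn diff" step above can also be scripted. The following is a minimal sketch that assumes the `svn` command-line client is installed and that `checkout_dir` is a working copy of trunk; the `write_patch` helper, the choice to name the file after the issue key, and the placeholder key `HADOOP-0000` are assumptions for illustration, so check the HowToContribute page for the project's actual conventions.

```python
import subprocess
from pathlib import Path


def write_patch(checkout_dir, issue_key, run=subprocess.run):
    """Run `svn diff` in checkout_dir and save the output as <issue_key>.patch.

    `run` defaults to subprocess.run and is a parameter only so the helper
    can be exercised without a real Subversion checkout.
    """
    patch_path = Path(checkout_dir) / f"{issue_key}.patch"
    # With no arguments, `svn diff` prints the local modifications
    # of the working copy as a unified diff.
    result = run(
        ["svn", "diff"],
        cwd=checkout_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if svn exits with an error
    )
    patch_path.write_text(result.stdout)
    return patch_path


# Example call (requires svn and a real checkout; the key is a placeholder):
# write_patch("/path/to/hadoop-trunk", "HADOOP-0000")
```

The resulting ".patch" file is what gets attached to the JIRA issue by hand; the sketch does not upload anything.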
  • 21. Contributing to Hadoop Project Ideas ▪ http://wiki.apache.org/hadoop/ProjectSuggestions ▪ Testing, Tools, and Research ▪ Security ▪ Tools ▪ Performance monitoring and benchmarking ▪ Anomaly detection ▪ General system management
  • 22. Contributing to Hadoop Project Ideas continued ▪ Performance ▪ Speculative execution policies ▪ Resource-aware task scheduling (instead of slot-based) ▪ Better failure detection algorithms ▪ Linear Algebra, Statistics, and Machine Learning ▪ SAS/R for massive data sets ▪ Vector and Matrix algebra libraries ▪ Common statistical functions: point estimation, hypothesis testing ▪ Model training and validation libraries
  • 23. (c) 2008 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0