Apache Hadoop 0.22
and Other Versions
Konstantin V Shvachko
Principal Hadoop Architect, eBay
IBM Karmasphere Twitter
February – March, 2012
eBay Inc. confidential
Apache Hadoop Ecosystem
• Hadoop Core
– Common – communication and user facing APIs
– HDFS – distributed file system
– MapReduce – distributed computation framework
• Pig – dataflow language
• Hive – data warehouse, SQL
• Zookeeper – distributed coordination service
• HBase – columnar store
• Oozie – complex job workflow
• eBay Specific
– Cascading
– Lzo compression
Hadoop Versioning
• Straight line from 0.1 to 0.20
• Fanned out starting from 0.20.2
• Multiple distributions in 2010 based on 0.20
– Apache, Yahoo (Y), Cloudera (CDH), Facebook (FB)
– More today
• Focus on Apache Releases
– Release 0.20.2 2010-02-16
– Release 0.21.0 2010-08-13
– Release 0.20.203.0 2011-05-11 Security Stable
– Release 0.20.204.0 2011-09-05 Improvements
– Release 0.20.205.0 2011-10-17 HBase support
• Genealogy of elephants
Major Branches
• Hadoop 1.0.0 (security branch) 2011-12-27
– Rename of 0.20.205
– Beta
• Hadoop 0.22.0 2011-12-10
– Continuation of 0.21.0
– Beta
• Hadoop 0.23.0 2011-11-11
– Federation – static partitioning of HDFS namespace
– Yarn – new implementation of MapReduce
– Scalability
– Alpha
• 2011 – record number of major releases!
• No unifying release, containing all the good features
Hadoop 0.22 Branch
• Branched 2010-11-17
• Released 2011-12-10
• Many events in-between
• RM (release manager) role – started in August 2011
• Stabilization
–Hadoop Platform team, eBay
–Many contributors from the community
Features HDFS - 0.22
• New implementation of file append
• HBase support with hflush and hsync
• Symbolic links
• BackupNode and CheckpointNode
• DataNodes tolerate single disk failure. Disk-fail-in-place
• File concatenation
• SLive test
• Sticky bit
• Offline Image Viewer
Features MapReduce - 0.22
• Hierarchical job queues
• Job limits per queue / pool
• Dynamically stop / start job queues
• Advances in new MapReduce API
– Input/Output formats, ChainMapper / ChainReducer
• TaskTracker blacklisting
• DistributedCache sharing
Features not Supported in Hadoop 0.22.0
Compared to Hadoop 1.0
• Security
– LinuxTaskController removed MAPREDUCE-2767
• Optimizations (operability) of the MapReduce framework
introduced in the Hadoop 0.20.security line of releases
– Limits on per-job JobConf, Counters, StatusReport, Split-Sizes
– User / queue limits on tasks / jobs in the CapacityScheduler
• Disk-fail-in-place – MapReduce part
• JMX-based metrics v2
• Jetty workaround
• CapacityScheduler should assign multiple tasks per heartbeat
• User's task logs filling up local disks on the TaskTrackers
• FairScheduler back-port from trunk
Not in Hadoop 0.22.0 HDFS Part
• Short-circuit local reads: a local client reads a DataNode's files directly
– Important HBase optimization
– Porting is in progress
• WebHDFS: accessing HDFS over HTTP
– New experimental feature, back-ported from trunk
• NameNode startup time
– Handling block reports and missed heartbeats from DataNodes
– The rest is forward ported from 1.0
– More startup improvements in 0.22
Hadoop 0.23 Features
• HDFS Federation
– Independent NameNodes sharing a common pool of DataNodes
– Cluster is a family of volumes with shared block storage layer
– User sees volumes as isolated file systems
– ViewFS: the client-side mount table
– Federated approach provides a static partitioning of the federated namespace
• Yarn: Scalability for MapReduce framework
– Separation of JobTracker functions
1. Job scheduling and resource allocation:
• Fundamentally centralized
2. Job monitoring and job life-cycle coordination
• Delegate coordination of different jobs to other nodes
– Dynamic partitioning of cluster resources: no fixed slots
• “Apache Hadoop: The scalability update” USENIX ;login: June, 2011
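The client-side mount table mentioned above can be pictured as a fixed, longest-prefix lookup from a path to the NameNode that owns that volume. The Python sketch below is illustrative only: the mount points, the NameNode URIs, and the `resolve` function are hypothetical and do not reproduce the actual ViewFS code or its configuration keys.

```python
# Toy model of a client-side mount table (not the real ViewFS implementation).
# The mount points and NameNode URIs are made-up examples.
MOUNT_TABLE = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return the NameNode URI of the longest mount point that covers path."""
    best = None
    for mount_point in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError("no mount point covers " + path)
    return mount_table[best]


if __name__ == "__main__":
    for p in ("/user/alice/part-0", "/data/logs/2012/02", "/data/tables/t1"):
        print(p, "->", resolve(p))
```

Because the table is fixed on the client, moving a directory between NameNodes means editing the table, which is one way to read the "static partitioning" remark.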
Append and HBase
• Append means
– Reopening of existing files for appending new data
– Replica synchronization after failure
– Consistent view of file data during writing by different clients
– hflush, hsync – guarantee data delivered to DNs and persisted on NN
• First implementation of append in 0.19 HADOOP-1700
– 0.20-append branch
• Redesign of append in 0.21 HDFS-265
• HBase needs hflush and hsync only
• Hadoop 1.0 - HBase support via hflush, hsync
• Hadoop 0.22 – fully functional append, including HBase support
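The visibility side of hflush can be sketched with a toy model: bytes a writer has buffered are invisible to readers until a flush delivers them to every replica. The classes below (`ToyReplica`, `ToyWriter`, `read_visible`) are hypothetical names used for illustration, not the HDFS client API, and the model ignores pipelines, failures, and durability.

```python
class ToyReplica:
    """Stands in for a DataNode replica; stores the bytes it has received."""

    def __init__(self):
        self.data = bytearray()


class ToyWriter:
    """Toy model of a writing client; not the HDFS client API."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.buffer = bytearray()  # bytes written but not yet flushed

    def write(self, payload):
        self.buffer.extend(payload)

    def hflush(self):
        # Deliver buffered bytes to every replica, then clear the client buffer.
        for replica in self.replicas:
            replica.data.extend(self.buffer)
        self.buffer.clear()


def read_visible(replica):
    """A reader sees only the bytes that have reached the replica."""
    return bytes(replica.data)


if __name__ == "__main__":
    replicas = [ToyReplica() for _ in range(3)]
    writer = ToyWriter(replicas)
    writer.write(b"edit-1;")
    print("before hflush:", read_visible(replicas[0]))
    writer.hflush()
    print("after hflush: ", read_visible(replicas[0]))
```

This is the property a write-ahead log such as HBase's relies on: once the flush returns, a reader of any replica can see the record.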
BackupNode
• BackupNode is a read-only NameNode
– Contains all file system metadata: files and directories
excluding block locations
– Can perform NameNode operations that don’t modify namespace
• BN maintains up-to-date in-memory image of file system namespace
always synchronized with the NameNode state
– NameNode streams journal to BackupNode
• BackupNode can create a checkpoint without downloading
checkpoint and journal files from active NameNode
• Intended to evolve into hot HA HDFS-2064
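The journal-streaming idea above can be sketched as follows: every namespace change applied on the NameNode is also sent to the BackupNode, which applies it to its own in-memory image and can therefore write a checkpoint from local state. All class and method names here are hypothetical, the "journal" is reduced to create/delete records, and the JSON checkpoint format is an assumption made for the example.

```python
import json


class ToyNamespace:
    """In-memory set of paths; block locations are deliberately not modeled."""

    def __init__(self):
        self.paths = set()

    def apply(self, op, path):
        if op == "create":
            self.paths.add(path)
        elif op == "delete":
            self.paths.discard(path)
        else:
            raise ValueError("unknown op: " + op)


class ToyBackupNode:
    def __init__(self):
        self.namespace = ToyNamespace()

    def receive(self, op, path):
        # Apply each streamed journal record as it arrives.
        self.namespace.apply(op, path)

    def checkpoint(self):
        # Serialize the local image; nothing is fetched from the NameNode.
        return json.dumps(sorted(self.namespace.paths))


class ToyNameNode:
    def __init__(self, backup):
        self.namespace = ToyNamespace()
        self.backup = backup

    def mutate(self, op, path):
        self.namespace.apply(op, path)
        self.backup.receive(op, path)  # stream the journal record


if __name__ == "__main__":
    bn = ToyBackupNode()
    nn = ToyNameNode(bn)
    nn.mutate("create", "/a")
    nn.mutate("create", "/b")
    nn.mutate("delete", "/a")
    print(bn.checkpoint())
```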
Hadoop at eBay
• 2011 started with 532-node 5 PB cluster running CDH2
• eBay 0.20.203-based build (Wilma)
– Hadoop 0.20.203 – latest stable Apache release
• HDFS, MapReduce, Pig, Hive, Cascading, Mobius, lzo
– 500+ users; 2000 jobs per day
• Runs on 1000-node cluster
– 24 PB – capacity, 72 GB RAM / node
• Many smaller clusters
• Stabilization of Hadoop platform based on 0.22
Testing
• One year of testing by different groups in Hadoop ecosystem
• Extensive testing of append by HBase community
• Fully automated build and certification with BigTop
• Hadoop platform team at eBay
– Extensive stabilization effort starting September
– Most bugs found in 0.22 are also in trunk and 0.23
– All new features tested
– Stress testing
– Reliability testing
• Works with:
– Pig 0.8, Hive 0.7 – custom changes
– HBase 0.92, Oozie – open sourced
– Zookeeper, Cascading – no changes needed
Testing Tools, Examples
• TeraSort, TestDFSIO, DistCp
• GridMix, Rumen – production job traces
• SLive – adjustable mix of HDFS operations, permanent load
• Upgrade / rollback from 0.20.? and 0.20.203 to 0.22
• Oversubscribed cluster running out of memory
• Losing racks with running jobs and HBase
– Cluster survived consecutive loss of 4 racks, shrinking to single rack
with HBase still alive and MR jobs completing
• Disk-fail-in-place helps identify bad drives during hardware burn-in
Benchmarking
• TestDFSIO: 10 GB files (same as 100 GB)
• TeraSort: -5% (scheduler to blame)
• YCSB - same
• Internal eBay applications – same or better
• Lots of tuning: Hadoop, Java, OS, HW
– Gradual improvement of results
Throughput (MB/sec)
              Read  Write  Append
Hadoop-0.22   100   84     83
0.20 breed    96    66     n/a
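To put the throughput figures above in relative terms, the short Python sketch below computes the percentage change of Hadoop 0.22 over the 0.20 line. The numbers are copied from the slide; the `relative_change` helper is a hypothetical name introduced for this example.

```python
# Throughput figures (MB/sec) copied from the figures above.
THROUGHPUT = {
    "Hadoop-0.22": {"read": 100, "write": 84, "append": 83},
    "0.20 breed": {"read": 96, "write": 66, "append": None},  # append: n/a
}


def relative_change(new, old):
    """Percentage change of new over old, or None if either value is missing."""
    if new is None or old is None:
        return None
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for op in ("read", "write", "append"):
        change = relative_change(
            THROUGHPUT["Hadoop-0.22"][op], THROUGHPUT["0.20 breed"][op]
        )
        print(op, "n/a" if change is None else "%+.1f%%" % change)
```

On these figures, reads come out roughly 4% faster and writes roughly 27% faster; append has no 0.20 baseline to compare against.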
Good to have for 0.22.1
• Restore Security
• Disk Fail in place for MapReduce
• Optimizations
– Multiple tasks per heartbeat for CapacityScheduler
– CapacityScheduler preemption
• MR job and task limits
• Cluster startup time
• Add HA?
• Merge MR-1.0 into Hadoop 0.22?
Important
• Works but not 0.20
– Good new features
– Reliability is the first concern
– Performance and missing functionality can be reconstructed
• Community release
– Not distributed / advertised by commercial distributors
– Community involvement important
• Don’t try to upgrade from Hadoop 0.21 to Hadoop 1.0
It’s the other way around: 1.0 is a rename of 0.20.205
– Go to Hadoop 0.22 instead
• Forward-going release progress
– Stop porting new features, start releasing them
Thank you
Hadoop 0.22 Contributions Accepted

Apache Hadoop 0.22 and Other Versions

  • 1. Apache Hadoop 0.22 and Other Versions Konstantin V Shvachko Principal Hadoop Architect, eBay IBM Karmasphere Twitter February – March, 2012
  • 2. eBay Inc. confidential Apache Hadoop Ecosystem • Hadoop Core – Common – communication and user facing APIs – HDFS – distributed file system – MapReduce – distributed computation framework • Pig – dataflow language • Hive – data warehouse, SQL • Zookeeper – distributed coordination service • HBase – columnar store • Oozie – complex job workflow • eBay Specific – Cascading – Lzo compression 2
  • 3. eBay Inc. confidential Hadoop Versioning • Straight line from 0.1 to 0.20 • Fanned out starting from 0.20.2 • Multiple distributions in 2010 based on 0.20 – Apache, Y, CDH, FB – More today • Focus on Apache Releases – Release 0.20.2 2010-02-16 – Release 0.21.0 2010-08-13 – Release 0.20.203.0 2011-05-11 Security Stable – Release 0.20.204.0 2011-09-05 Improvements – Release 0.20.205.0 2011-10-17 HBase support • Genealogy of elephants 3
  • 5. Major Branches
      • Hadoop 1.0.0 (security branch) 2011-12-27
          – Rename of 0.20.205
          – Beta
      • Hadoop 0.22.0 2011-12-10
          – Continuation of 0.21.0
          – Beta
      • Hadoop 0.23.0 2011-11-11
          – Federation – static partitioning of HDFS namespace
          – Yarn – new implementation of MapReduce
          – Scalability
          – Alpha
      • 2011 – record number of major releases!
      • No unifying release, containing all the good features
  • 6. Hadoop 0.22 Branch
      • Branched 2010-11-17
      • Released 2011-12-10
      • Many events in-between
      • RM role – started in August 2011
      • Stabilization
          – Hadoop Platform team, eBay
          – Many contributors from the community
  • 7. Features HDFS - 0.22
      • New implementation of file append
      • HBase support with hflush and hsync
      • Symbolic links
      • BackupNode and CheckpointNode
      • DataNodes tolerate single disk failure. Disk-fail-in-place
      • File concatenation
      • SLive test
      • Sticky bit
      • Offline Image Viewer
  • 8. Features MapReduce - 0.22
      • Hierarchical job queues
      • Job limits per queue / pool
      • Dynamically stop / start job queues
      • Advances in new MapReduce API
          – Input/Output formats, ChainMapper / ChainReducer
      • TaskTracker blacklisting
      • DistributedCache sharing
  • 9. Features not Supported in Hadoop 0.22.0 Compared to Hadoop 1.0
      • Security
          – LinuxTaskController removed MAPREDUCE-2767
      • Optimizations (operability) of the MapReduce framework introduced in the Hadoop 0.20.security line of releases
          – Limits on per-job JobConf, Counters, StatusReport, Split-Sizes
          – User / queue limits on tasks / jobs in the CapacityScheduler
      • Disk-fail-in-place – MapReduce part
      • JMX-based metrics v2
      • Jetty workaround
      • CapacityScheduler should assign multiple tasks per heartbeat
      • User's task logs filling up local disks on the TaskTrackers
      • FairScheduler back-port from trunk
  • 10. Not in Hadoop 0.22.0 HDFS Part
      • Shortcut local client reads to a DataNode's files directly
          – Important HBase optimization
          – Porting is in progress
      • WebHDFS: accessing HDFS over HTTP
          – New experimental feature, back-ported from trunk
      • NameNode startup time
          – Handling block reports and missed heartbeats from DataNodes
          – The rest is forward ported from 1.0
          – More startup improvements in 0.22
  • 11. Hadoop 0.23 Features
      • HDFS Federation
          – Independent NameNodes sharing a common pool of DataNodes
          – Cluster is a family of volumes with shared block storage layer
          – User sees volumes as isolated file systems
          – ViewFS: the client-side mount table
          – Federated approach provides a static partitioning of the federated namespace
      • Yarn: Scalability for MapReduce framework
          – Separation of JobTracker functions
              1. Job scheduling and resource allocation:
                  • Fundamentally centralized
              2. Job monitoring and job life-cycle coordination
                  • Delegate coordination of different jobs to other nodes
          – Dynamic partitioning of cluster resources: no fixed slots
      • “Apache Hadoop: The scalability update” USENIX ;login: June, 2011
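The "client-side mount table" idea on slide 11 can be illustrated with a small sketch. This is not the ViewFS implementation: the mount points, NameNode host names, and the `resolve` helper below are all hypothetical, and the sketch only shows how a client could map one logical path onto one of several independent namespaces by prefix.

```python
from typing import Dict, Tuple

# Hypothetical mount table: client-side path prefix -> (NameNode, path on that NameNode).
# Host names and mount points are illustrative assumptions, not taken from the slides.
MOUNT_TABLE: Dict[str, Tuple[str, str]] = {
    "/user": ("nn1.example.com", "/user"),
    "/data": ("nn2.example.com", "/warehouse"),
}


def resolve(path: str, table: Dict[str, Tuple[str, str]] = MOUNT_TABLE) -> Tuple[str, str]:
    """Return (namenode, remote_path) for the mount point that prefixes `path`."""
    parts = [p for p in path.split("/") if p]
    # Try the longest candidate prefix first, then progressively shorter ones.
    for n in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:n])
        if prefix in table:
            namenode, target = table[prefix]
            rest = "/".join(parts[n:])
            return namenode, target + ("/" + rest if rest else "")
    raise KeyError(f"no mount point covers {path}")
```

Because the table is static and lives on the client, the partitioning of the namespace is fixed by configuration, which matches the "static partitioning" remark on the slide.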
  • 12. Append and HBase
      • Append means
          – Reopening of existing files for appending new data
          – Replica synchronization after failure
          – Consistent view of file data during writing by different clients
          – hflush, hsync – guarantee data delivered to DNs and persisted on NN
      • First implementation of append in 0.19 HADOOP-1700
          – 0.20-append branch
      • Redesign of append in 0.21 HDFS-265
      • HBase needs hflush and hsync only
      • Hadoop 1.0 – HBase support via hflush, hsync
      • Hadoop 0.22 – fully functional append, including HBase support
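As a rough local-file analogy for the append and flush semantics listed on slide 12, the sketch below reopens a file for append, pushes buffered bytes out of the process, and optionally asks the operating system to persist them. It runs against an ordinary local file, not HDFS; `append_record` and its `durable` flag are hypothetical names, and the mapping of `flush` to hflush and `os.fsync` to hsync is only an approximation of the guarantees the slide describes.

```python
import os
import tempfile


def append_record(path: str, record: bytes, durable: bool) -> None:
    """Append one record to an existing (or new) local file."""
    with open(path, "ab") as f:       # "ab" reopens the file for appending new data
        f.write(record)
        f.flush()                     # rough analog of hflush: bytes leave the process buffer
        if durable:
            os.fsync(f.fileno())      # rough analog of hsync: ask the OS to persist to disk


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "wal.log")
    append_record(log, b"put row1\n", durable=False)
    append_record(log, b"put row2\n", durable=True)
    with open(log, "rb") as f:
        print(f.read())
```

A write-ahead log such as the one HBase keeps only needs the flush and sync steps on an open stream, which is consistent with the slide's note that HBase needs hflush and hsync only.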
  • 13. BackupNode
      • BackupNode – a read-only NameNode
          – Contains all file system metadata: files and directories excluding block locations
          – Can perform NameNode operations that don’t modify namespace
      • BN maintains up-to-date in-memory image of file system namespace always synchronized with the NameNode state
          – NameNode streams journal to BackupNode
      • BackupNode can create a checkpoint without downloading checkpoint and journal files from active NameNode
      • Intended to evolve into hot HA HDFS-2064
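The journal-streaming idea on slide 13 can be sketched as a toy model: every namespace mutation is applied on the primary and forwarded to a backup that replays it into its own in-memory image, so the backup can write a checkpoint from memory. The classes, the `(operation, path)` record format, and the method names are hypothetical simplifications; real HDFS journals carry far more state.

```python
from typing import List, Set, Tuple

Edit = Tuple[str, str]  # (operation, path); a hypothetical journal record format


class Namespace:
    """Toy in-memory namespace image: just a set of paths."""

    def __init__(self) -> None:
        self.paths: Set[str] = set()

    def apply(self, edit: Edit) -> None:
        op, path = edit
        if op == "create":
            self.paths.add(path)
        elif op == "delete":
            self.paths.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")


class BackupNode:
    """Replays streamed edits so its image tracks the NameNode's state."""

    def __init__(self) -> None:
        self.image = Namespace()

    def receive(self, edit: Edit) -> None:
        self.image.apply(edit)

    def checkpoint(self) -> List[str]:
        # The image is already in memory, so nothing is downloaded from the NameNode.
        return sorted(self.image.paths)


class NameNode:
    """Applies each mutation locally, then streams the journal record to the backup."""

    def __init__(self, backup: BackupNode) -> None:
        self.image = Namespace()
        self.backup = backup

    def mutate(self, op: str, path: str) -> None:
        edit = (op, path)
        self.image.apply(edit)
        self.backup.receive(edit)
```

In this model the backup never accepts mutations from clients, only from the journal stream, which mirrors the read-only role described on the slide.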
  • 14. Hadoop at eBay
      • 2011 started with 532-node 5 PB cluster running CDH2
      • eBay 0.20.203-based build (Wilma)
          – Hadoop 0.20.203 – latest stable Apache release
      • HDFS, MapReduce, Pig, Hive, Cascading, Mobius, lzo
          – 500+ users; 2000 jobs per day
      • Runs on 1000-node cluster
          – 24 PB capacity, 72 GB RAM / node
      • Many smaller clusters
      • Stabilization of Hadoop platform based on 0.22
  • 15. Testing
      • One year of testing by different groups in Hadoop ecosystem
      • Extensive testing of append by HBase community
      • Fully automated build and certification with BigTop
      • Hadoop platform team at eBay
          – Extensive stabilization effort starting September
          – Most bugs found in 0.22 are also in trunk and 0.23
          – All new features tested
          – Stress testing
          – Reliability testing
      • Works with: Pig 0.8, Hive 0.7, custom changes HBase 0.92, Oozie, open sourced Zookeeper, Cascading no changes needed
  • 16. Testing Tools, Examples
      • TeraSort, TestDFSIO, DistCp
      • GridMix, Rumen – production job traces
      • SLive – adjustable mix of HDFS operations, permanent load
      • Upgrade / rollback from 0.20.? and 0.20.203 to 0.22
      • Oversubscribed cluster running out of memory
      • Losing racks with running jobs and HBase
          – Cluster survived consecutive loss of 4 racks, shrinking to single rack with HBase still alive and MR jobs completing
      • Disk-fail-in-place helps identify bad drives during hardware burn-in
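An "adjustable mix of HDFS operations", as slide 16 describes SLive, can be pictured as weighted random sampling over operation types. The sketch below is not SLive itself: the operation names, the weights, and the `operation_mix` helper are hypothetical, and it only generates the sequence of operations a load generator would then execute.

```python
import random
from typing import Dict, List


def operation_mix(weights: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw n operation names, each chosen with probability proportional to its weight."""
    rng = random.Random(seed)          # fixed seed keeps a test run reproducible
    ops = list(weights)
    return rng.choices(ops, weights=[weights[op] for op in ops], k=n)


if __name__ == "__main__":
    # Hypothetical mix: mostly reads, some creates and appends, no deletes.
    mix = operation_mix({"read": 60, "create": 25, "append": 15, "delete": 0}, n=10)
    print(mix)
```

Changing the weights changes the character of the load without touching the generator, which is the property that makes such a tool useful for sustained stress runs.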
  • 17. Benchmarking
      • TestDFSIO: 10 GB files (same as 100 GB)
      • TeraSort: -5% (scheduler to blame)
      • YCSB – same
      • Internal eBay applications – same or better
      • Lots of tuning: Hadoop, Java, OS, HW
          – Gradual improvement of results

      Throughput MB/sec    Read    Write    Append
      Hadoop-0.22          100     84       83
      0.20 breed           96      66       n/a
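To put the throughput figures from slide 17 in relative terms, the short calculation below computes the percentage change of Hadoop 0.22 over the 0.20 line. The numbers are copied from the slide; the dictionary layout and the `relative_change` helper are only an illustrative way to organize them.

```python
from typing import Dict, Optional

# Throughput in MB/sec as listed on the slide; None marks the "n/a" cell.
THROUGHPUT: Dict[str, Dict[str, Optional[float]]] = {
    "Hadoop-0.22": {"read": 100, "write": 84, "append": 83},
    "0.20 breed": {"read": 96, "write": 66, "append": None},
}


def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for op in ("read", "write"):
        new = THROUGHPUT["Hadoop-0.22"][op]
        old = THROUGHPUT["0.20 breed"][op]
        print(f"{op}: {relative_change(new, old):+.1f}%")
```

On these figures the read path improves by roughly 4% and the write path by roughly 27%; append has no 0.20 baseline to compare against.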
  • 18. Good to have for 0.22.1
      • Restore Security
      • Disk Fail in place for MapReduce
      • Optimizations
          – Multiple tasks per heartbeat for CapacityScheduler
          – CapacityScheduler preemption
      • MR job and task limits
      • Cluster startup time
      • Add HA?
      • Merge MR-1.0 into Hadoop 0.22?
  • 19. Important
      • Works but not 0.20
          – Good new features
          – Reliability is the first concern
          – Performance and missing functionality can be reconstructed
      • Community release
          – Not distributed / advertised by commercial distributors
          – Community involvement important
      • Don’t try to upgrade from Hadoop 0.21 to Hadoop 1.0. It’s the other way around
          – Go to Hadoop 0.22 instead
      • Forward-going release progress
          – Stop porting new features, start releasing them
  • 20. Thank you
      Hadoop 0.22 Contributions Accepted