3. Agenda
• Who we are
• Background of Hadoop and Hadoop at eBay
• What are the challenges?
• What we achieved using Hadoop-2
4. Who I am
Mayank Bansal
• Principal Engineer @ eBay
• Apache Hadoop Committer
• Apache Oozie PMC member and Committer
• Current
• Leading Hadoop core development for YARN and MapReduce @ eBay
• Past
• Worked on schedulers / resource managers
• Worked on distributed systems
• Worked on data pipeline frameworks
5. Who we are
• eBay Hadoop team
• Around 40 people developing and supporting Hadoop
• Thousands of Hadoop users @ eBay
9. Hadoop-1 Limitations
• Scalability
• Maximum cluster size: 4-5K nodes
• Maximum concurrent tasks: ~40K
• JobTracker scalability
• Availability
• JobTracker failure kills all running jobs
• Hard partition of slots into Map and Reduce
• Low cluster utilization
• Lacks support for alternate paradigms
13. Application Master
• Runs on regular NodeManager machines
• Out-of-memory errors
• Slow machines
• Flaky network
14. Application Master: Node Goes Down
• MapReduce
• Can rebuild state from job history files
• Generic applications
• Application Timeline/History Server
• YARN-321
• YARN-1530
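For generic (non-MapReduce) applications, recovering state after a node failure relies on the Application Timeline/History Server tracked in the JIRAs above. A minimal sketch of turning it on in yarn-site.xml (the hostname value is a placeholder, not from the deck):

```xml
<!-- yarn-site.xml: enable the YARN Timeline Server (example values) -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Placeholder host; point this at the machine running the Timeline Server -->
  <name>yarn.timeline-service.hostname</name>
  <value>timeline.example.com</value>
</property>
```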
15. Application Master
• Slow machines
• Automation/monitoring
• Flaky network
• Split-brain problem
• Fixed for MapReduce
• Every AppMaster has to fix this itself
16. Application Master: Out of Memory
• Physical memory errors
• yarn.app.mapreduce.am.resource.mb
• yarn.app.mapreduce.am.command-opts
• Virtual memory errors
• Default vmem-pmem ratio is 2.1 and often needs tuning
• yarn.nodemanager.vmem-check-enabled
• yarn.nodemanager.vmem-pmem-ratio
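As an illustration, the four properties above might be tuned like this (the values are examples for a sketch, not recommendations from the deck; the AM settings live in mapred-site.xml, the NodeManager vmem settings in yarn-site.xml):

```xml
<!-- mapred-site.xml: give the MR ApplicationMaster more headroom -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
<property>
  <!-- Keep the JVM heap below the container size above -->
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1638m</value>
</property>

<!-- yarn-site.xml: relax the virtual-memory check -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Raise from the 2.1 default so vmem-hungry JVMs are not killed -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4.0</value>
</property>
```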
17. Binary Compatibility
• Works well
• mapred APIs are binary compatible
• mapreduce APIs are source compatible
• BUT …
• Only works for ~70% of applications
• Why?
• Reflection
• Uber jars on the classpath
• MAPREDUCE-5108
19. Log Aggregation
• Puts a lot of data into HDFS
• 5-7 TB of data per day
• Default retention is 30 days; we reduced it to 4 days
• yarn.log-aggregation.retain-seconds
• Heavy load on the NameNode
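The 4-day retention above corresponds to the following yarn-site.xml setting (4 × 86,400 = 345,600 seconds):

```xml
<property>
  <!-- Retain aggregated logs for 4 days instead of the 30-day default -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>345600</value>
</property>
```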
20. User Engagement
• Engage all users to verify their jobs
• Test with production-like data
• Verify all jobs, not just a sample