
Hadoop Summit 2012 | BranchReduce: Distributed Branch-and-Bound on YARN


Session Abstract

Branch-and-bound is a widely used technique for efficiently searching for solutions to combinatorial optimization problems. In this session, we will introduce BranchReduce, an open-source Java library for performing distributed branch-and-bound on a Hadoop cluster under YARN. Applications only need to write code that is specific to their optimization problem (namely the branching rule, the lower bound computation, and the upper bound computation), and BranchReduce handles deploying the application to the cluster, managing the execution, and periodically rebalancing the search space across the machines. We will give an overview of how BranchReduce works and then walk through an example that solves a scheduling problem with a near-linear speedup over a single machine implementation.
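To make the three problem-specific pieces named in the abstract concrete, here is a minimal single-machine sketch for the 0-1 knapsack problem. It is not BranchReduce code: the class `KnapsackBnB`, the `Node` type, and the sample data are all hypothetical, and the upper bound is assumed to be the usual fractional relaxation. The branching rule is "take or skip the next item", the lower bound is the best feasible value found so far, and the upper bound prunes subtrees that cannot beat it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative sequential branch-and-bound for 0-1 knapsack (not the BranchReduce API). */
class KnapsackBnB {
    /** A search-tree node: items [0, next) are already decided. */
    static final class Node {
        final int next, weight, value;
        Node(int next, int weight, int value) {
            this.next = next;
            this.weight = weight;
            this.value = value;
        }
    }

    /** Upper bound: fill the remaining room greedily, allowing one fractional item. */
    static double upperBound(Node n, int[] w, int[] v, int capacity) {
        double bound = n.value;
        int room = capacity - n.weight;
        for (int i = n.next; i < w.length && room > 0; i++) {
            if (w[i] <= room) {
                bound += v[i];
                room -= w[i];
            } else {
                bound += (double) v[i] * room / w[i];
                room = 0;
            }
        }
        return bound;
    }

    /** Returns the best total value. Items must be sorted by value/weight, descending. */
    static int solve(int[] w, int[] v, int capacity) {
        int best = 0; // lower bound: value of the best feasible solution so far
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(0, 0, 0));
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            best = Math.max(best, n.value); // every node on the stack is feasible
            if (n.next == w.length) continue; // leaf: nothing left to decide
            if (upperBound(n, w, v, capacity) <= best) continue; // prune
            int i = n.next;
            // Branching rule: skip item i, or take it if it still fits.
            stack.push(new Node(i + 1, n.weight, n.value));
            if (n.weight + w[i] <= capacity) {
                stack.push(new Node(i + 1, n.weight + w[i], n.value + v[i]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] w = {10, 20, 30};
        int[] v = {60, 100, 120};
        System.out.println(solve(w, v, 50)); // prints 220
    }
}
```

A distributed version keeps the same three pieces and changes only where the pending nodes live and how the best-known value is shared.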



  1. BranchReduce: Distributed Branch-and-Bound on YARN. June 14, 2012
  2. About Me (Copyright 2012 Cloudera Inc. All rights reserved)
  3. Hadoop Distributed Processing Frameworks
  4. Lots of Other Parallel Processing Platforms
  5. Hadoop 2.0: Resource Scheduling with YARN
  6. The Data Deluge and the Cambrian Explosion
  7. Parallel Distributed Processing For Everyone
  8. Building a New Processing Framework on YARN
  9. A Terrifyingly Accurate Paraphrasing of JWZ: Some people, when confronted with a tedious problem, say, “I know, I’ll write a framework.” Now they have two tedious problems.
  10. On Designing Frameworks
  11. The Example YARN App: Distributed Shell
  12. Do We Need a New Programming Language for Developing YARN Applications?
  13. Do We Need a New Programming Language for Developing YARN Applications?
  14. Leverage Existing Frameworks
      • Popular RPC libraries with support for multiple languages
        • C++, Java, Python
      • We need to make it easy to deploy existing applications on YARN
  15. Kitten: Playing with YARN
  16. Design Pattern: The Unified Application Master
      • Contains business logic and YARN logic
      • Primary reason: Communication
      • Also: dynamic resource allocation
      • Develop our master/worker applications locally and then deploy them on YARN
  17. YARN Lifecycle Management as a Service
      • Specifically, extensions of Guava’s Service interface
        • YarnClientService
        • AppMasterService
      • Contains all of the logic for creating applications and keeping an eye on them
  18. Moving the Configuration Logic Out of Java
  19. Lua as a Configuration Language
      • Small and Simple
        • Looks like a configuration file
        • Functions are there when/if you need them
      • Inheritance
      • Don’t Repeat Yourself
      • Forgiving of undefined values
      • Java/C++ Integration
  20. First Kitten Utility: The cat Function
  21. Second Kitten Utility: The yarn Function
  22. BranchReduce
  23. Branch-and-Bound
  24. The Challenge of Parallel Branch and Bound: Unbalanced Search Space
      • Some branches are pruned quickly
      • Can be difficult to determine the best splits a priori
      • Easy to revert to a de facto single-threaded search
  25. The Solution: Work Stealing
  26. You Write Three Classes
      • A Task class that implements Writable
      • A GlobalState class that implements Writable and has a mergeWith(GlobalState other) method
      • A Processor class that defines:
        • execute(T task, BranchReduceContext<T, GlobalState> ctxt);
        • With optional initialize and cleanup methods
      • Configuration is done via BranchReduceJob
  27. Example: The Knapsack Problem
  28. 0-1 Integer Programming Problems
      • NP-Hard Resource Allocation Problem
      • Portfolio Optimization
      • Asset Securitization
  29. Problem Formulation: (Simplified) LP Format
  30. Questions? @josh_wills
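The slides on work stealing and on the three user-written classes can be illustrated with a small single-process simulation. Everything in it is a hypothetical stand-in rather than the BranchReduce API: `Task`, `GlobalState`, `Context`, and `Processor` only mirror the roles named in the slides, Hadoop's `Writable` serialization is omitted, and the round-robin loop in `run` is an assumed, simplified substitute for real workers exchanging tasks across machines. The sketch shows an idle worker stealing the oldest pending task from a busy peer, and `mergeWith` sharing the best-known value so every worker prunes against it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-process simulation of work stealing for knapsack (hypothetical names throughout). */
class WorkStealingSketch {
    /** Stand-in for the user's Task class: items [0, next) are decided. */
    static final class Task {
        final int next, weight, value;
        Task(int next, int weight, int value) {
            this.next = next;
            this.weight = weight;
            this.value = value;
        }
    }

    /** Stand-in for GlobalState: the best value seen; merging keeps the larger one. */
    static final class GlobalState {
        int best;
        void mergeWith(GlobalState other) { best = Math.max(best, other.best); }
    }

    /** Simplified per-worker context: a local task deque plus a local copy of the state. */
    static final class Context {
        final Deque<Task> pending = new ArrayDeque<>();
        final GlobalState state = new GlobalState();
        void emit(Task t) { pending.addFirst(t); }
    }

    /** Stand-in for the user's Processor: expands one task into at most two subtasks. */
    static final class Processor {
        final int[] w, v, rest; // rest[i] = v[i] + ... + v[n-1], a cheap optimistic bound
        final int capacity;
        Processor(int[] w, int[] v, int capacity) {
            this.w = w;
            this.v = v;
            this.capacity = capacity;
            rest = new int[w.length + 1];
            for (int i = w.length - 1; i >= 0; i--) rest[i] = rest[i + 1] + v[i];
        }
        void execute(Task t, Context ctx) {
            ctx.state.best = Math.max(ctx.state.best, t.value);
            // Stop at a leaf, or prune when even taking every remaining item cannot win.
            if (t.next == w.length || t.value + rest[t.next] <= ctx.state.best) return;
            ctx.emit(new Task(t.next + 1, t.weight, t.value)); // skip the item
            if (t.weight + w[t.next] <= capacity) { // take the item if it fits
                ctx.emit(new Task(t.next + 1, t.weight + w[t.next], t.value + v[t.next]));
            }
        }
    }

    static int run(int[] w, int[] v, int capacity, int workers) {
        Processor p = new Processor(w, v, capacity);
        Context[] ctxs = new Context[workers];
        for (int i = 0; i < workers; i++) ctxs[i] = new Context();
        ctxs[0].emit(new Task(0, 0, 0)); // all work starts on worker 0
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Context me : ctxs) {
                if (me.pending.isEmpty()) { // idle: steal the oldest task from a busy peer
                    for (Context other : ctxs) {
                        if (other != me && other.pending.size() > 1) {
                            me.emit(other.pending.pollLast());
                            break;
                        }
                    }
                }
                if (!me.pending.isEmpty()) {
                    p.execute(me.pending.pollFirst(), me);
                    progress = true;
                }
            }
            // Periodic sync: fold every worker's state together, then share the result.
            GlobalState merged = new GlobalState();
            for (Context c : ctxs) merged.mergeWith(c.state);
            for (Context c : ctxs) c.state.mergeWith(merged);
        }
        return ctxs[0].state.best;
    }

    public static void main(String[] args) {
        int[] w = {10, 20, 30};
        int[] v = {60, 100, 120};
        System.out.println(run(w, v, 50, 3)); // prints 220
    }
}
```

Stealing from the opposite end of the deque hands over the shallowest pending node, which tends to represent the largest unexplored subtree. That is one common way to counter the unbalanced search space described on slide 24.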
