
Writing Yarn Applications Hadoop Summit 2012


Hadoop YARN is the next-generation computing platform in Apache Hadoop, with support for programming paradigms besides MapReduce. In the world of Big Data, not every problem can be solved with the MapReduce programming model alone. Typical installations run separate programming models such as MR, MPI, and graph-processing frameworks on individual clusters. Running fewer, larger clusters is cheaper than running many small ones. Therefore, using YARN to let both MR and non-MR applications run on a common cluster becomes increasingly important from an economic and operational point of view. This talk covers the APIs and RPC protocols available to developers for implementing new application frameworks on top of YARN. We will also go through a simple application that demonstrates how to implement your own Application Master, send resource requests to the YARN ResourceManager, and then use the allocated resources to run user code on the NodeManagers.

Published in: Technology, Education


  1. Writing Application Frameworks on Apache Hadoop YARN
     Hitesh Shah, hitesh@hortonworks.com
     © Hortonworks Inc. 2011
  2. Hitesh Shah - Background
     • Member of Technical Staff at Hortonworks Inc.
     • Committer for Apache MapReduce and Ambari
     • Earlier, spent 8+ years at Yahoo! building various infrastructure pieces, all the way from data storage platforms to high-throughput online ad-serving systems.
     Architecting the Future of Big Data
  3. Agenda
     • YARN Architecture and Concepts
     • Writing a New Framework
  4. YARN Architecture
     • Resource Manager
       – Global resource scheduler
       – Hierarchical queues
     • Node Manager
       – Per-machine agent
       – Manages the life-cycle of containers
       – Container resource monitoring
     • Application Master
       – Per-application
       – Manages application scheduling and task execution
       – E.g. MapReduce Application Master
  5. YARN Architecture
     [Diagram: Client, Resource Manager, Node Manager, App Mstr, and Container boxes; arrow legend: MapReduce Status, Job Submission, Node Status, Resource Request]
  6. YARN Concepts
     • Application ID
       – Application Attempt IDs
     • Container
       – ContainerLaunchContext
     • ResourceRequest
       – Host/Rack/Any match
       – Priority
       – Resource constraints
     • Local Resource
       – File/Archive
       – Visibility: public/private/application
  7. What you need for a new Framework
     • Application Submission Client
       – For example, the MR Job Client
     • Application Master
       – The core framework library
     • Application History (optional)
       – History of all previously run instances
     • Auxiliary Services (optional)
       – Long-running application-specific services running on the NodeManager
  8. Use Case: Distributed Shell
     • Take a user-provided script or application and run it on a set of nodes in the cluster
     • Input:
       – User script to execute
       – Number of containers to run on
       – Variable arguments for each different container
       – Memory requirements for the shell script
       – Output location/dir
     [Diagram: one Node Manager hosting the DS AppMaster; two Node Managers each running the Shell Script]
  9. Client: RPC calls
     • Uses ClientRMProtocol
     • Get a new Application ID from the RM: ClientRMProtocol#getNewApplication
     • Application Submission: ClientRMProtocol#submitApplication
     • Application Monitoring: ClientRMProtocol#getApplicationReport
     • Kill the Application?: ClientRMProtocol#forceKillApplication
     [Diagram: CLIENT calling RM]
  10. Client
      • Registration with the RM
        – New Application ID
      • Application Submission
        – User information
        – Scheduler queue
        – Define the container for the Distributed Shell App Master via the ContainerLaunchContext
      • Application Monitoring
        – AppMaster host details with tokens if needed, tracking URL
        – Application Status (submitted/running/finished)
  11. Defining a Container
      • ContainerLaunchContext class
        – Can run a shell script, a Java process, or launch a VM
      • Command(s) to run
      • Local resources needed for the process to run
        – Dependent jars, native libs, data files/archives
      • Environment to set up
        – Java classpath
      • Security-related data
        – Container tokens
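The pieces listed on this slide can be pictured as a plain data holder. The sketch below is illustrative only and does not use the YARN API: `LaunchSpecSketch`, `javaCommand`, `forAppMaster`, the `-Xmx` flag, and the `app.jar` link name are all hypothetical names chosen for the example. It shows how a command line, a set of local resources, and an environment might be assembled before being handed to something like a ContainerLaunchContext.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the fields a ContainerLaunchContext carries. */
final class LaunchSpecSketch {
    /** Commands to run inside the container. */
    final List<String> commands = new ArrayList<>();
    /** Local resources: link name in the container mapped to a source path. */
    final Map<String, String> localResources = new LinkedHashMap<>();
    /** Environment variables to set up before the commands run. */
    final Map<String, String> environment = new LinkedHashMap<>();

    /** Builds a java command line; ${JAVA_HOME} is left for the shell to expand. */
    static String javaCommand(String mainClass, int heapMb, List<String> args) {
        StringBuilder sb = new StringBuilder("${JAVA_HOME}/bin/java");
        sb.append(" -Xmx").append(heapMb).append('m');
        sb.append(' ').append(mainClass);
        for (String arg : args) {
            sb.append(' ').append(arg);
        }
        return sb.toString();
    }

    /** Assembles a spec for an application master shipped as a single jar (assumption). */
    static LaunchSpecSketch forAppMaster(String jarPath, String mainClass, int heapMb) {
        LaunchSpecSketch spec = new LaunchSpecSketch();
        spec.localResources.put("app.jar", jarPath);
        spec.environment.put("CLASSPATH", "./app.jar");
        spec.commands.add(javaCommand(mainClass, heapMb, new ArrayList<String>()));
        return spec;
    }
}
```

In a real client the same three groups of information would be set on the ContainerLaunchContext record rather than on a hand-written class.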
  12. Application Master: RPC calls
      • AMRM and CM protocols
      • Register AM with RM
      • Ask RM to allocate resources
      • Launch tasks on allocated containers
      • Manage tasks to final completion
      • Inform RM of completion
      [Diagram: Client, AM, RM, and two NMs; call labels: AMRM.registerAM, AMRM.allocate, AMRM.finishAM, App-specific RPC, CM.startContainer]
  13. Application Master
      • Set up RPC to handle requests from the Client and/or tasks launched on Containers
      • Register with the RM and send it regular heartbeats
      • Request resources from the RM
      • Launch the user shell script on containers as and when they are allocated
      • Monitor the status of the user script on remote containers and manage failures by retrying if needed
      • Inform the RM of completion when the application is done
  14. AMRM#allocate
      • Request:
        – Containers needed
          – Not a delta protocol
          – Locality constraints: Host/Rack/Any
          – Resource constraints: memory
          – Priority-based assignments
        – Containers to release (extra/unwanted?)
          – Only non-launched containers
      • Response:
        – Allocated Containers
          – Launch or release
        – Completed Containers
          – Status of completion
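One way to read "not a delta protocol" is that each allocate call states the total number of containers still wanted for a given priority and location, rather than an increment on top of earlier calls. The sketch below illustrates that reading with plain Java and no YARN classes; `AskTableSketch`, `Ask`, `want`, `onAllocated`, and `buildRequest` are hypothetical names, and the real ResourceRequest record has more fields than shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical AM-side table of outstanding asks, keyed by priority and location. */
final class AskTableSketch {
    /** Simplified resource request: location is a host name, a rack name, or "*". */
    static final class Ask {
        final int priority;
        final String location;
        final int memoryMb;
        final int numContainers; // absolute count still wanted, not an increment

        Ask(int priority, String location, int memoryMb, int numContainers) {
            this.priority = priority;
            this.location = location;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    private final Map<String, Ask> outstanding = new LinkedHashMap<>();

    private static String key(int priority, String location) {
        return priority + "@" + location;
    }

    /** Sets (overwrites) the total number of containers wanted for this key. */
    void want(int priority, String location, int memoryMb, int total) {
        outstanding.put(key(priority, location), new Ask(priority, location, memoryMb, total));
    }

    /** Lowers the outstanding total by one when a matching container arrives. */
    void onAllocated(int priority, String location) {
        Ask ask = outstanding.get(key(priority, location));
        if (ask != null && ask.numContainers > 0) {
            want(ask.priority, ask.location, ask.memoryMb, ask.numContainers - 1);
        }
    }

    /** Asks to send on the next heartbeat; resending them does not request more. */
    List<Ask> buildRequest() {
        return new ArrayList<>(outstanding.values());
    }
}
```

Because the counts are absolute, sending the same table on two consecutive heartbeats leaves the request unchanged, which is what makes retransmission safe under this reading.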
  15. YARN Applications
      • Data Processing:
        – OpenMPI on Hadoop
        – Spark (UC Berkeley)
          – Shark (Hive-on-Spark)
        – Real-time data processing
          – Storm (Twitter)
          – Apache S4
        – Graph processing
          – Apache Giraph
      • Beyond data:
        – Deploying Apache HBase via YARN (HBASE-4329)
        – HBase Co-processors via YARN (HBASE-4047)
  16. References
      • Doc on writing new applications:
        – WritingYarnApplications.html (available at http://hadoop.apache.org/common/docs/r2.0.0-alpha/)
  17. Questions?
      Thank You!
      Hitesh Shah, hitesh@hortonworks.com
  18. Appendix: Code Examples
  19. Client: Registration
        // Connect to the ResourceManager's client-facing interface
        ClientRMProtocol applicationsManager;
        YarnConfiguration yarnConf = new YarnConfiguration(conf);
        InetSocketAddress rmAddress = NetUtils.createSocketAddr(
            yarnConf.get(YarnConfiguration.RM_ADDRESS));
        applicationsManager = ((ClientRMProtocol) rpc.getProxy(
            ClientRMProtocol.class, rmAddress, appsManagerServerConf));
        // Ask the RM for a new Application ID
        GetNewApplicationRequest request =
            Records.newRecord(GetNewApplicationRequest.class);
        GetNewApplicationResponse response =
            applicationsManager.getNewApplication(request);
  20. Client: App Submission
        ApplicationSubmissionContext appContext =
            Records.newRecord(ApplicationSubmissionContext.class);
        ContainerLaunchContext amContainer =
            Records.newRecord(ContainerLaunchContext.class);
        amContainer.setLocalResources(localResources);  // Map<String, LocalResource>
        amContainer.setEnvironment(env);                // Map<String, String>
        String command = "${JAVA_HOME}" + "/bin/java" + " MyAppMaster " + " arg1 arg2";
        amContainer.setCommands(commands);              // List<String> holding command
        // Memory required by the Application Master container
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(amMemory);
        amContainer.setResource(capability);
        appContext.setAMContainerSpec(amContainer);
        SubmitApplicationRequest appRequest =
            Records.newRecord(SubmitApplicationRequest.class);
        appRequest.setApplicationSubmissionContext(appContext);
        applicationsManager.submitApplication(appRequest);
  21. Client: App Monitoring
      • Get Application Status
          GetApplicationReportRequest reportRequest =
              Records.newRecord(GetApplicationReportRequest.class);
          reportRequest.setApplicationId(appId);
          GetApplicationReportResponse reportResponse =
              applicationsManager.getApplicationReport(reportRequest);
          ApplicationReport report = reportResponse.getApplicationReport();
      • Kill the application
          KillApplicationRequest killRequest =
              Records.newRecord(KillApplicationRequest.class);
          killRequest.setApplicationId(appId);
          applicationsManager.forceKillApplication(killRequest);
  22. AM: Ask RM for Containers
        ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
        rsrcRequest.setHostName("*");  // hostname, rack, or wildcard
        rsrcRequest.setPriority(pri);
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(containerMemory);
        rsrcRequest.setCapability(capability);
        rsrcRequest.setNumContainers(numContainers);
        List<ResourceRequest> requestedContainers;  // asks, including rsrcRequest
        List<ContainerId> releasedContainers;       // containers to give back
        AllocateRequest req = Records.newRecord(AllocateRequest.class);
        req.setResponseId(rmRequestID);
        req.addAllAsks(requestedContainers);
        req.addAllReleases(releasedContainers);
        req.setProgress(currentProgress);
        AllocateResponse allocateResponse = resourceManager.allocate(req);
  23. AM: Launch Containers
        AMResponse amResp = allocateResponse.getAMResponse();
        ContainerManager cm = (ContainerManager) rpc.getProxy(
            ContainerManager.class, cmAddress, conf);
        List<Container> allocatedContainers = amResp.getAllocatedContainers();
        for (Container allocatedContainer : allocatedContainers) {
          ContainerLaunchContext ctx =
              Records.newRecord(ContainerLaunchContext.class);
          ctx.setContainerId(allocatedContainer.getId());
          ctx.setResource(allocatedContainer.getResource());
          // set env, command, local resources, …
          StartContainerRequest startReq =
              Records.newRecord(StartContainerRequest.class);
          startReq.setContainerLaunchContext(ctx);
          cm.startContainer(startReq);
        }
  24. AM: Monitoring Containers
      • Running Containers
          GetContainerStatusRequest statusReq =
              Records.newRecord(GetContainerStatusRequest.class);
          statusReq.setContainerId(containerId);
          GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
      • Completed Containers
          AMResponse amResp = allocateResponse.getAMResponse();
          List<ContainerStatus> completedContainers =
              amResp.getCompletedContainerStatuses();
          for (ContainerStatus containerStatus : completedContainers) {
            // containerStatus.getContainerId()
            // containerStatus.getExitStatus()
            // containerStatus.getDiagnostics()
          }
  25. AM: I am done
        FinishApplicationMasterRequest finishReq =
            Records.newRecord(FinishApplicationMasterRequest.class);
        finishReq.setAppAttemptId(appAttemptID);
        finishReq.setFinishApplicationStatus(
            FinalApplicationStatus.SUCCEEDED);  // or FAILED
        finishReq.setDiagnostics(diagnostics);
        resourceManager.finishApplicationMaster(finishReq);
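The slides say the Application Master monitors completed containers, retries failures if needed, and finally reports SUCCEEDED or FAILED to the RM. A minimal sketch of that bookkeeping follows, in plain Java with no YARN classes. `CompletionTrackerSketch` and its methods are hypothetical, and treating exit status 0 as success and capping retries with a shared budget are assumptions made for the example, not something the deck specifies.

```java
/** Hypothetical bookkeeping an AM might keep for the containers it launched. */
final class CompletionTrackerSketch {
    enum FinalStatus { SUCCEEDED, FAILED }

    private final int wanted;      // number of tasks the application needs to finish
    private final int maxRetries;  // total retry budget shared by all tasks (assumption)
    private int succeeded;
    private int abandoned;
    private int retriesUsed;

    CompletionTrackerSketch(int wanted, int maxRetries) {
        this.wanted = wanted;
        this.maxRetries = maxRetries;
    }

    /**
     * Records one completed container.
     * Returns true if the AM should request a replacement container.
     * Assumes exit status 0 means the user script succeeded.
     */
    boolean onCompleted(int exitStatus) {
        if (exitStatus == 0) {
            succeeded++;
            return false;
        }
        if (retriesUsed < maxRetries) {
            retriesUsed++;
            return true;
        }
        abandoned++;
        return false;
    }

    /** True once every wanted task has either succeeded or been given up on. */
    boolean isDone() {
        return succeeded + abandoned == wanted;
    }

    /** Status to report when finishing: FAILED if any task was abandoned. */
    FinalStatus finalStatus() {
        return abandoned == 0 ? FinalStatus.SUCCEEDED : FinalStatus.FAILED;
    }
}
```

In the flow shown on the slides, `onCompleted` would be fed from each completed container's exit status, a `true` result would lead to a new resource request on the next allocate call, and `finalStatus` would choose the value passed when the AM finishes.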