
SRV318_Research at PNNL Powered by AWS

SRV318, Serverless breakout session, AWS re:Invent 2017 (Tuesday, November 28, 2017, 1:00 PM). Research at PNNL: Powered by AWS. Pacific Northwest National Laboratory's rich data sciences capability has produced novel solutions in numerous research areas, including image analysis, statistical modeling, and social media. See how PNNL software engineers use AWS to enable better collaboration between researchers and engineers, and to power the data processing systems this work requires, with a focus on Lambda, EC2, S3, Apache NiFi, and other technologies. Several approaches are covered, including lessons learned. Keywords: AWS re:Invent 2017, Amazon, Giardinelli, Serverless, SRV318, EC2.


  1. 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent. Research at PNNL: Powered by AWS (SRV318). Mike Giardinelli and Ralph Perko, Pacific Northwest National Laboratory. November 28, 2017. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or Battelle Memorial Institute. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
  2. 2. Senior Software Engineers Mike Giardinelli Ralph Perko
  3. 3. The national laboratory system
  4. 4. PNNL at a glance $920.4 M In R&D expenditures 104 U.S. and foreign patents granted 1,058 Peer-reviewed publications 2 FLC Awards 5 R&D 100 Awards 4,400 Scientists, engineers and non-technical staff
  5. 5. Software engineering at PNNL • Staff focus is research and innovation, not operations • Developers work with scientists to enable research • Limited space and resources for hardware • Big driver for moving to AWS! • Agile is difficult
  6. 6. Problem: isolated research • Who are the researchers? • Researchers work independently • Focus on innovation and novel concepts • Lack of collaboration with engineers • Creates long delivery times • Product usually isn’t what the customer envisioned
  7. 7. Enabling research with AWS • Research is the lifeblood of the organization • Researchers should not be troubled with environment configurations, optimizations, etc. • Software engineers provide the expertise needed to build applied solutions • Utilizing AWS has been a turning point • AWS has dramatically improved collaboration • AWS fits better with our Agile software processes. As a result, researchers can focus on the problem.
  8. 8. Moving to the cloud Our progression to AWS Drivers • Lack of resources internally (hardware and people) • Customer deliverables and demands / deadlines Concerns • Cost • Vendor lock-in Initial Approach • Fork-lift model • Missed out on AWS services • Still had operational headaches Current Approach • Serverless wherever possible
  9. 9. Image Classification Pipeline
  10. 10. Overview Goal Enable novel image classification research on live, streaming data First primarily serverless solution
  11. 11. Image retrieval and classification - requirements Requirements • Handle static and streaming media • Scalable, robust, and flexible • Easily deployed and maintained • Extensible (add additional models and instantiations) • Identify optimal ways to collaborate Research and customer requirements
  12. 12. Image retrieval and classification Research and engineering implementation
  13. 13. Image retrieval and classification How research and engineering collaborated on the effort
  14. 14. Image classification—Apache NiFi Why NiFi?
  15. 15. NiFi overview • Process and distribute data • Message / data routing is very flexible and robust • ETL is painless • Easy to install, scale, configure, and extend • Visually see what is going on with your pipelines • Backpressure and queueing are baked into the flows— excellent for systems that have brittle endpoints • Low barrier to entry; broadens user audience Where we find benefit and why we use it
  16. 16. Image cache data flow example
  17. 17. NiFi tuning on AWS • C4 and M4 EC2 instance types work well • Scaling: we go vertical, then horizontal • Keep normal CPU load at 50–60% CPU utilization • Set provenance to Volatile • General purpose SSDs work well • Follow the NiFi “Configuration Best Practices” in the admin guide
  18. 18. Data flow logic • Read data from Kafka topic: {"image":"http://somewebsite.com/puppies.jpeg"}, {"image":"http://somewebsite.com/kittens.jpeg"}, {"image":"http://somewebsite.com/koalas.jpeg"} • Filter (“We only care about koala bears!”): {"image":"http://somewebsite.com/koalas.jpeg"} • Create new message and push to Amazon SNS: { "url": "http://somewebsite.com/koalas.jpeg", "hash": "092f6b17d186adb2e121afcdc7e5470b0c6f82a5", "name": "koalas.jpeg", "type": "jpeg", "bucket": "image-classifier-test" }
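
The filter-then-enrich step on this slide can be sketched in plain Java. This is only an illustration under stated assumptions, not the NiFi flow the speakers built: the class and method names are hypothetical, the "koala" substring match stands in for whatever filter rule the real flow used, and treating the 40-character `hash` as a SHA-1 digest of the URL is a guess based on its length.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;

// Hypothetical helper; names are illustrative and not taken from the talk.
final class ImageMessageBuilder {

    // Stand-in filter rule for "We only care about koala bears!".
    static boolean accept(String imageUrl) {
        return imageUrl.toLowerCase().contains("koala");
    }

    // Assumption: the hash field is a hex-encoded SHA-1 digest of the URL.
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(text.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds the enriched message, or returns empty when the URL is filtered out.
    // Assumes the values contain no characters that need JSON escaping.
    static Optional<String> toSnsMessage(String imageUrl, String bucket) {
        if (!accept(imageUrl)) {
            return Optional.empty();
        }
        String name = imageUrl.substring(imageUrl.lastIndexOf('/') + 1);
        String type = name.substring(name.lastIndexOf('.') + 1);
        return Optional.of(String.format(
            "{\"url\":\"%s\",\"hash\":\"%s\",\"name\":\"%s\",\"type\":\"%s\",\"bucket\":\"%s\"}",
            imageUrl, sha1Hex(imageUrl), name, type, bucket));
    }
}
```

Calling `ImageMessageBuilder.toSnsMessage("http://somewebsite.com/puppies.jpeg", "image-classifier-test")` yields an empty result, while the koalas URL yields a JSON string with the same five fields as the slide.
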
  19. 19. Why SNS? Image classification
  20. 20. AWS SNS { "Records": [ { "EventVersion": "1.0", "EventSubscriptionArn": "arn:aws:sns:EXAMPLE", "EventSource": "aws:sns", "Sns": { "SignatureVersion": "1", "Timestamp": null, "Signature": "EXAMPLE", "SigningCertUrl": "EXAMPLE", "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e", "Message": "{\"hash\": \"092f6b17d186adb2e121afcdc7e5470b0c6f82a5\", \"url\": \"http://somewebsite.com/koalas.jpeg\", \"bucketname\": \"image-classifier-test\", \"name\": \"koalas.jpeg\", \"type\": \"jpeg\"}",
  21. 21. Why Lambda? Image classification
  22. 22. Why AWS Lambda? Where we went serverless instead of NiFi • Hands down best choice for scaling • Performance, cost, maintenance • Straightforward code • 60+ image requests per second • Required 4+ large EC2 instances clustered • One-month pilot • AWS Lambda: 161,490,065 requests; 61,490,065 seconds; $1,050 • Amazon EC2: 4 m4.10xlarge instances; $3,329 (reserved)
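
The pilot's Lambda bill can be roughly reproduced from the request and duration counts on this slide. The sketch below makes several assumptions that the slides do not state: that the functions ran with 1 GB of memory, that the "seconds" figure is billed duration, and that the 2017 list prices applied ($0.20 per million requests, $0.00001667 per GB-second, with a monthly free tier of 1 million requests and 400,000 GB-seconds). The class name is hypothetical.

```java
// Hypothetical estimator; the memory size and prices are assumptions, not from the talk.
final class LambdaCostEstimate {
    static final double USD_PER_MILLION_REQUESTS = 0.20;
    static final double USD_PER_GB_SECOND = 0.00001667;
    static final long FREE_REQUESTS = 1_000_000L;
    static final long FREE_GB_SECONDS = 400_000L;

    // Returns the estimated monthly charge in US dollars after the free tier.
    static double monthlyCostUsd(long requests, long billedSeconds, double memoryGb) {
        double gbSeconds = billedSeconds * memoryGb;
        double billableGbSeconds = Math.max(0.0, gbSeconds - FREE_GB_SECONDS);
        long billableRequests = Math.max(0L, requests - FREE_REQUESTS);
        return billableGbSeconds * USD_PER_GB_SECOND
            + (billableRequests / 1_000_000.0) * USD_PER_MILLION_REQUESTS;
    }
}
```

Under those assumptions, `LambdaCostEstimate.monthlyCostUsd(161_490_065L, 61_490_065L, 1.0)` comes to about $1,050, which is in line with the figure on the slide and roughly 68% below the $3,329 quoted for the reserved EC2 cluster. The request count is also consistent with the stated load: 60 requests per second over 30 days is about 155.5 million requests.
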
  23. 23. AWS Lambda Where we went serverless instead of NiFi
  24. 24. Lambda code example
      public void handleRequest(SNSEvent snsEvent, Context context) throws Exception {
          // get the JSON payload
          String message = snsEvent.getRecords().get(0).getSNS().getMessage();
          // parse JSON
          // after retrieving the URL, download the image
          BufferedImage image = ImageIO.read(imageUrl);
          // convert
          ImageIO.write(image, "jpeg", byteArrayOutputStream);
          // save to S3
          s3Client.putObject(new PutObjectRequest(bucketName, fileName, inputStream, metadata));
          // write metadata to Dynamo
          Table table = dynamoDB.getTable(dynamoDbTable);
          Item item = new Item()
              .withString("url_hash", request.getHash())
              .withString("url", request.getUrl())
              .withString("s3_bucket", s3Bucket);
          table.putItem(item);
          // create and send a notification
          SendMessageRequest sendNewImageCachedMsg = new SendMessageRequest()
              .withQueueUrl(queueUrl).withMessageBody(newImageJson);
          amazonSqs.sendMessage(sendNewImageCachedMsg);
      }
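
The slide's snippet does not show how the output of the convert step becomes the `inputStream` handed to S3. One possible way to bridge that gap, using only the JDK, is sketched here. The class and method names are hypothetical, and copying onto an RGB canvas first is a choice made for this sketch (JPEG has no alpha channel), not something the talk describes.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper; not part of the code shown in the talk.
final class JpegConverter {

    // Re-encodes any decoded image as JPEG bytes held in memory.
    static byte[] toJpegBytes(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        // Copy the pixels into an RGB image so sources with alpha can be encoded.
        BufferedImage rgb = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int[] pixels = image.getRGB(0, 0, width, height, null, 0, width);
        rgb.setRGB(0, 0, width, height, pixels, 0, width);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(rgb, "jpeg", out)) {
                throw new IOException("No JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Wraps the encoded bytes so they can be passed to an upload call.
    static InputStream asStream(byte[] jpegBytes) {
        return new ByteArrayInputStream(jpegBytes);
    }
}
```

Holding the encoded bytes in an array also gives the caller the exact length, which can be recorded in the object metadata before the upload.
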
  25. 25. How research and engineering collaborated on the effort Collaboration
  26. 26. Lessons learned Where we find benefit and why we use it • Fantastic for scaling • Obvious choice • Very performant when functions are loaded (warm) • API is easy to use • Just Java • Used for two key situations • Low cost development/pilot efforts • High volume/throughput
  27. 27. Lessons learned (continued) Where we find benefit and why we use it • Cold start performance • 30 s (cold) as opposed to 400 ms (warm) • Legacy code vs new development • Limits on jar sizes • Message size on Amazon SNS • 256 KB limit • Combine functionality in a single Lambda function! • Easier and cheaper to manage • Step functions for our use cases were too expensive
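
The 256 KB Amazon SNS limit applies to the encoded message size in bytes, not to the character count, so a guard before publishing can be useful. The sketch below is a minimal illustration; the class and method names are hypothetical, and it checks only the message body.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical guard; names are illustrative and not taken from the talk.
final class SnsSizeGuard {
    // 256 KB expressed in bytes.
    static final int MAX_SNS_MESSAGE_BYTES = 256 * 1024;

    // Size of the message once encoded as UTF-8.
    static int utf8Length(String message) {
        return message.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when the body is small enough to publish as a single SNS message.
    static boolean fitsInSns(String message) {
        return utf8Length(message) <= MAX_SNS_MESSAGE_BYTES;
    }
}
```

A limit like this is one reason to publish a small pointer, such as the image URL and bucket name used in the pipeline above, instead of the image bytes themselves.
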
  28. 28. EC2 vs Lambda - EC2 based solution
  29. 29. EC2 vs Lambda - Lambda based solution
  30. 30. Exploring Serverless
  31. 31. Amazon Athena • Great for exploring data in Amazon S3 • HQL / SQL support • Partition support • Use AWS Glue crawlers • Complements Hadoop cluster Where else can we use serverless?
  32. 32. Moving Forward with Serverless
  33. 33. Supporting research Evaluation of capabilities requires infrastructure AWS Glue
  34. 34. Lessons Learned
  35. 35. In Summary • More and more we lean on AWS serverless services • We don’t have the resources for operations and maintenance • Government customers we support prefer serverless solutions • Makes it easier to provide researchers and engineers with flexible blueprints for their implementations • Focus on solving problems, not setting up infrastructure • What are your technical needs? Do we already have something similar? • Leverage the AWS environment to provide easy access to data, services, tools, and resources • Pleased with performance • We can “brute force” solutions if we have to • Most performance tuning is trivial • Find the most cost-effective use cases for your needs • We have been able to strike a balance between serverless and managed • Periodically do spot checks on cost; upfront calculations may have been incorrect
  36. 36. In Summary Cont’d • Go-to tech stack • Apache NiFi, Amazon S3, AWS Lambda, Amazon SQS, Amazon SNS, Amazon DynamoDB, Amazon RDS, others as needed • Take advantage of built-in events / triggers when you can • Most of the time S3 + events are good enough • “Free” capability • We have abandoned Kafka in favor of Apache NiFi site-to-site or Amazon SQS • Apache Kafka is great; we just don’t have the administrative resources to support it. Use an AWS alternative when possible. • Most “streaming” requests by our customers don’t really require streaming • Request that researchers and engineers catalog their data and try to follow basic data lake practices • Keep raw and enriched / augmented data separate • Add metadata to known events and important time frames • Enable start / stop and replay to improve evaluation
  37. 37. Questions
  38. 38. Thank you!