SRV304_Building High-Throughput Serverless Data Processing Pipelines

Have a lot of real-time data piling up? Need to analyze it, transform it, and store it somewhere else real quick? What if there were an easier way to perform streaming data processing, with less setup, instant scaling, and no servers to provision and manage? With serverless computing, you can build applications to meet your real-time needs for everything from IoT data to operational logs without needing to spin up servers or install software. Come learn how to leverage AWS Lambda with Amazon Kinesis, Kinesis Firehose, and Kinesis Analytics to architect highly scalable, high-throughput pipelines that can cover all your real-time processing needs. We will cover different example architectures that handle use cases like in-line processing or data manipulation, and discuss the advantages of using an AWS managed stream.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent SRV304: Building High-Throughput Serverless Data Processing Pipelines. Cecilia Deng, Software Developer on AWS Lambda. November 28, 2017
  2. Me • Canadian • UBC • EA Canada • AWS Lambda
  3. Goal: data processing that is • High-throughput (> 1 GB/s) • Serverless (no servers to manage) • Real-time (pipeline)
  4. What to Expect • Why streams? • What’s AWS Lambda? • What’s Amazon Kinesis? • What does serverless stream processing look like? • How does Lambda process streams? • Example use cases
  5. WHY STREAMS?
  6. Stream Processing. Goal: • High-throughput (> 1 GB/s) • Serverless (managed compute) • Real-time (pipeline). Streams: • Data size constraint • Data time constraint • Have access to recent data • Processing time constraint. Batch: • No size constraint • No time constraint (not real-time) • Have access to all data • Long-running processing (reports)
  7. Stream Processing. Because you have data that is: • Generated continuously and simultaneously by thousands of data sources • Typically small in size (KBs). And needs to be processed either: • Sequentially and incrementally • Or over sliding windows under some real-time constraint
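As a rough illustration of the "sliding windows" idea on this slide, the sketch below keeps a time-based window over small records and reports a running sum. The class name `SlidingWindowSum`, the `(timestamp, value)` record shape, and the window length are assumptions made for the example; none of them come from the talk.

```python
from collections import deque


class SlidingWindowSum:
    """Running sum over the last `window_seconds` of (timestamp, value) records."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Add one record, evict expired ones, and return the current sum."""
        self._items.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.window_seconds
        # Drop records that fell out of the window.
        # Assumes timestamps arrive in non-decreasing order.
        while self._items and self._items[0][0] <= cutoff:
            _, old_value = self._items.popleft()
            self._total -= old_value
        return self._total
```

Each record is processed incrementally as it arrives, so the cost per record stays small no matter how long the stream runs.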
  8. WHAT’S LAMBDA?
  9. Lambda: What Is It? It’s your function: your libraries, your code, your executable. With a programming model: easy-to-start blueprints and tutorials, monitoring, and logging. That runs stateless: infrastructure abstracted; persist data using Amazon DynamoDB, Amazon S3, or ElastiCache. And an integrated security model: IAM resource policies and roles, VPC support. And a flexible resource model: choose your memory and we allocate proportional CPU, network bandwidth, and disk I/O.
  10. Lambda: How Do I Trigger It? HOW IT WORKS: the mapping is owned by the event source, which triggers Lambda via the Invoke APIs; permissions come from a resource-based policy. ASYNCHRONOUS PUSH MODEL (Amazon S3, Amazon SNS): Event invocation. SYNCHRONOUS PUSH MODEL (Amazon Alexa, AWS IoT): RequestResponse invocation.
  11. Lambda: How Do I Trigger It? STREAM PULL MODEL (Amazon DynamoDB, Amazon Kinesis): the mapping is owned by Lambda; Lambda polls the streams and invokes the function when new records are found on the stream; permissions come from the Lambda execution role policy; each polled batch is sent as a RequestResponse invocation.
  12. Lambda. EVENT SOURCE: AWS CloudFormation, Amazon API Gateway, Amazon SNS, Amazon Kinesis. FUNCTION: Node.js, Python, Java, C#. ENDPOINT: database, cloud service, anything.
  13. Kinesis. EVENT SOURCE: IoT data, financial data, and log data arriving through Amazon Kinesis. FUNCTION: Node.js, Python, Java, C#. ENDPOINT: database, cloud service, anything.
  14. WHAT’S KINESIS?
  15. Kinesis: What Is It? It’s storage: for real-time data that’s only stored for a limited time. Where new data is made available quickly: typically less than 1 second put-to-get delay. That uses a checkpoint model: supports multiple concurrent in-order processing. As a managed service: with APIs that let you easily create and configure the stream and put and retrieve data.
  16. Kinesis: How Do I Process It? Diagram: a source writes with PutRecords into shards; consumers read with GetRecords. • Poll for work • Checkpoint for progress • Separate checkpoints for multiple consumers • Use the Kinesis Client Library (KCL). Scale Amazon Kinesis by splitting or merging shards.
  17. Streams: How Do I Process It? Diagram: DynamoDB (DDB) events flow into shards; consumers read with GetRecords. • Poll for work • Checkpoint for progress • Separate checkpoints for multiple consumers • Use the Kinesis Client Library (KCL). Scale Amazon Kinesis by splitting or merging shards.
  18. SEEMS HARD. CAN I NOT?
  19. Processing Streams: Kinesis Firehose • Manages the stream: no shard configuration; no partition key or order • Manages stream processing: polls for records; dumps to one of Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service • Compute power: default 8 * (1 vCPU + 4 GB) KPU • Choose a Lambda transform function: JSON/CSV to whatever; Apache log to JSON/CSV; syslog to JSON/CSV
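A minimal sketch of what a Lambda transform function for Firehose might look like, here turning JSON records into CSV lines. It follows the Firehose transformation contract as I understand it: each input record carries a `recordId` and base64 `data`, and each output record returns the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64 `data`. The column names in `CSV_FIELDS` are hypothetical and not from the talk.

```python
import base64
import json

# Hypothetical column order for the CSV output; not from the talk.
CSV_FIELDS = ("timestamp", "level", "message")


def lambda_handler(event, context):
    """Turn each JSON record into one CSV line, one output record per input record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Naive join: a real transform would also escape commas and quotes.
            line = ",".join(str(payload[field]) for field in CSV_FIELDS) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, KeyError, TypeError):
            # Hand the record back unchanged and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Returning every `recordId` exactly once, even for records that fail to parse, keeps the output aligned with the batch Firehose sent.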
  20. Processing Streams: Kinesis Firehose
  21. Processing Streams: Kinesis Firehose
  22. Processing Streams: Kinesis Analytics • Does not manage the stream: you need to configure the Kinesis Stream • Manages stream processing: reads from Amazon Kinesis or Kinesis Firehose; polls for records • Uses a SQL model to continuously: map record data to internal “stream tables” (aggregation); query the internal “stream tables” for desired results (filter); output the desired results to additional internal “stream tables” (further aggregation) or to an external Kinesis Stream or Kinesis Firehose (destination store) • Compute power: default 8 * (1 vCPU + 4 GB) KPU
  23. Processing Streams: Kinesis Analytics
  24. Processing Streams: Kinesis Analytics
  25. Processing Streams: Kinesis
  26. Processing Streams: Lambda • Does not manage the stream: you need to configure the Kinesis Stream • Manages stream processing: reads from Amazon Kinesis or DynamoDB streams; polls for records; sends them to a Lambda function for invocation • Compute power: default 1000 * (configured memory and proportionally sized CPU) • Set up with Lambda createEventSourceMapping • Lambda: preserves order; soft concurrency limit of 1000 invocations * (max 3 GB memory and proportionally sized CPU); completely customizable model and functionality
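The createEventSourceMapping setup mentioned on this slide could be sketched with boto3 roughly as follows. The helper names `build_mapping_request` and `create_mapping` and the default batch size of 100 are assumptions for illustration. The keyword arguments are those of boto3's `create_event_source_mapping`. The sketch only handles the `TRIM_HORIZON` and `LATEST` starting positions.

```python
def build_mapping_request(stream_arn, function_name, batch_size=100,
                          starting_position="TRIM_HORIZON"):
    """Build the keyword arguments for Lambda's CreateEventSourceMapping call."""
    if starting_position not in ("TRIM_HORIZON", "LATEST"):
        raise ValueError("starting_position must be TRIM_HORIZON or LATEST")
    return {
        "EventSourceArn": stream_arn,           # the Kinesis or DynamoDB stream
        "FunctionName": function_name,          # the function Lambda will invoke
        "BatchSize": batch_size,                # max records per invocation
        "StartingPosition": starting_position,  # where polling starts in the stream
        "Enabled": True,
    }


def create_mapping(stream_arn, function_name, **options):
    """Create the mapping; needs AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("lambda")
    request = build_mapping_request(stream_arn, function_name, **options)
    return client.create_event_source_mapping(**request)
```

Keeping the request builder separate from the network call makes the configuration easy to inspect before anything is created.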
  27. Processing Streams: Lambda
  28. Processing Streams: Lambda. Diagram: Source, Amazon Kinesis shards, Lambda, Destination 1 and Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
  29. STREAM PROCESSING BY LAMBDA
  30. Processing Streams: Lambda
  31. Processing Streams: Lambda. Diagram: a source writing to shards, with the positions Trim horizon, Checkpoint, and Latest marked on the shards.
  32. Processing Streams: Lambda. The event received by the Lambda function is a collection of records from the stream: { "Records": [ { "kinesis": { "partitionKey": "partitionKey-3", "kinesisSchemaVersion": "1.0", "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=", "sequenceNumber": "49545115243490985018280067714973144582180062593244200961" }, "eventSource": "aws:kinesis", "eventID": "shardId-000000000000:49545115243490985018280067714973144582180062593244200961", "invokeIdentityArn": "arn:aws:iam::account-id:role/testLEBRole", "eventVersion": "1.0", "eventName": "aws:kinesis:record", "eventSourceARN": "arn:aws:kinesis:us-west-2:35667example:stream/examplestream", "awsRegion": "us-west-2" } ] }
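A handler for an event shaped like the one on this slide might look like the sketch below. The `data` field is base64-encoded, so it has to be decoded before use. The function body and the tuple it returns are illustrative choices, not the speaker's code. For stream invocations the return value is not delivered anywhere, so returning the decoded records here only makes the sketch easy to inspect.

```python
import base64


def lambda_handler(event, context):
    """Decode every Kinesis record in the batch, preserving their order."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"]).decode("utf-8")
        decoded.append((kinesis["partitionKey"], kinesis["sequenceNumber"], payload))
    return decoded
```

Applied to the sample event above, the single record decodes to the text "Hello, this is a test 123."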
  33. Processing Streams: Lambda
  34. Processing Streams: Lambda. Per shard: ▪ Lambda calls GetRecords with the maximum limit from Kinesis (10,000 records or 10 MB) ▪ If there are no records, it waits some time (1 s) ▪ It sub-batches in memory and formats records into the Lambda payload ▪ It invokes the Lambda function with a synchronous invoke. Diagram: Source, Amazon Kinesis shards, Lambda, Destination 1 and Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
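The per-shard steps on this slide can be imitated with a small in-memory sketch. This is not Lambda's internal implementation: `sub_batches` and `poll_once` are hypothetical names, and `get_records` and `invoke` are stand-in callables that the caller supplies in place of the real GetRecords and Invoke calls.

```python
def sub_batches(records, batch_size):
    """Split one polled list of records into ordered chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def poll_once(get_records, invoke, batch_size):
    """One polling pass for a single shard; returns how many records were handed over."""
    records = get_records()
    if not records:
        return 0  # the caller would wait (about 1 s on the slide) before polling again
    for batch in sub_batches(records, batch_size):
        # Synchronous call: the next sub-batch is not sent until this one returns.
        invoke({"Records": batch})
    return len(records)
```

Because each sub-batch waits for the previous one to finish, records from one shard reach the function in order.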
  35. Processing Streams: Lambda ▪ Lambda blocks on ordered processing for each individual shard ▪ Increasing the number of shards with even distribution allows increased concurrency ▪ Batch size may impact duration if the Lambda function takes longer to process more records. Diagram: Source, Amazon Kinesis shards, Lambda, Destination 1 and Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
  36. Processing Streams: Lambda. Lambda polls and blocks on a synchronous invocation per shard. Maximum theoretical throughput = (# shards * 2 MB) / Lambda function duration (s). Effective theoretical throughput = (# shards * batch size (MB)) / Lambda function duration (s). If the put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind.
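The two formulas on this slide are simple enough to check with a few lines of arithmetic. The function names below are made up for the sketch, and the example numbers in the note that follows are illustrative rather than measurements from the talk.

```python
def max_theoretical_throughput_mb_s(num_shards, function_duration_s, per_shard_mb=2.0):
    """(# shards * 2 MB) / function duration (s), as on the slide."""
    if function_duration_s <= 0:
        raise ValueError("function_duration_s must be positive")
    return num_shards * per_shard_mb / function_duration_s


def effective_throughput_mb_s(num_shards, batch_size_mb, function_duration_s):
    """(# shards * batch size (MB)) / function duration (s), as on the slide."""
    if function_duration_s <= 0:
        raise ValueError("function_duration_s must be positive")
    return num_shards * batch_size_mb / function_duration_s


def is_falling_behind(ingest_mb_s, throughput_mb_s):
    """True when data arrives faster than the pipeline can process it."""
    return ingest_mb_s > throughput_mb_s
```

For example, 10 shards with 1 MB batches and a 0.5 s function give an effective throughput of 20 MB/s, so a 25 MB/s ingestion rate would fall behind unless shards are added or the function gets faster.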
  37. Processing Streams: Lambda. Retries: Lambda will retry on execution failures until the record expires; throttles and errors increase duration and directly reduce throughput. Best practice: retry with exponential backoff. Effective theoretical throughput with retries = (# shards * batch size (MB)) / (function duration (s) * retries until expiry). Diagram: Source, Amazon Kinesis shards, Lambda, Destination 1 and Destination 2; Lambda polls a batch and receives success or error responses.
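One way to read the "retry with exponential backoff" advice is to wrap calls the function makes to a downstream service, as in this sketch. The helper name `call_with_backoff` and its default delays are assumptions for illustration. The `sleep` parameter exists only so the waiting can be swapped out in tests.

```python
import time


def call_with_backoff(operation, max_attempts=5, base_delay_s=0.1,
                      max_delay_s=5.0, sleep=time.sleep):
    """Call operation(), doubling the wait after each failure up to max_delay_s."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the error surface to the caller
            sleep(min(max_delay_s, base_delay_s * 2 ** attempt))
```

Time spent waiting counts toward the function's duration, so the cap and attempt count should be chosen with the throughput formula above in mind.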
  38. MORE EXAMPLES
  39. Real-time Ad Serving
  40. The Assembly Line
  41. Anomaly Detection
  42. Anomaly Detection
  43. Game Analytics • Store real-time player scores and stats • Send to Lambda for further aggregation, like top scores or longest runs • Surface leaderboards
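The "top scores" aggregation could be sketched as below for a batch of Kinesis records. The payload shape, a JSON object with `player` and `score` fields, is a hypothetical choice for the example, as is the function name `top_scores`; the talk does not show the actual record format.

```python
import base64
import heapq
import json


def top_scores(event, limit=3):
    """Return the `limit` best (player, score) pairs found in one batch of records."""
    best = {}
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        player, score = payload["player"], payload["score"]
        # Keep only each player's highest score within this batch.
        if score > best.get(player, float("-inf")):
            best[player] = score
    return heapq.nlargest(limit, best.items(), key=lambda item: item[1])
```

A real leaderboard would merge each batch's result into a persistent store, since Lambda functions run stateless.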
  44. Game Analytics
  45. QUESTIONS?
  46. Thank you! @cicikendiggit (mostly me complaining to airlines)
