SlideShare une entreprise Scribd logo
1  sur  24
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cecilia Deng
Software Engineer
AWS Lambda Stream Processing
Agenda
Overview of Lambda
Overview of Kinesis to Lambda event source
Configuring the event source
Lambda side processing
AWS Lambda: Overview
Lambda functions: a piece of code with stateless execution
Triggered by events:
• Direct Sync and Async API calls
• AWS Service integrations
• 3rd party triggers
• And many more …
Makes it easy to:
• Perform data-driven auditing, analysis, and notification
• Build back-end services that perform at scale
Amazon Kinesis: Overview
Real-time Ingest
Highly Scalable
Durable
Replay-able Reads
Continuous Processing
GetShardIterator and GetRecords(ShardIterator)
Allows checkpointing/ replay
Enables multi concurrent processing
KCL, Firehose, Analytics, Lambda
• Managed Service - makes it easy to provision, deploy, manage your stream
• Low end-to-end (real time) latency
• Pay as you go, no upfront cost
• Enables data movement to many stores or processing engines
Data Processing/Streaming
Architecture & Workflow
Smart
Devices
Click
Stream
Log
Data
AWS Lambda and Amazon Kinesis integration
How it Works
Pull event source model:
▪ Kinesis mapped as Event Source in Lambda
▪ Lambda polls the stream and batches available records
▪ Batches are passed for invocation to Lambda through function
parameters
What this means:
▪ Fleet of pollers running a flavor of KCL
▪ No resource policy
AWS Lambda and Amazon Kinesis integration
Event structure:
▪ Event received by Lambda function is a collection of records
from Kinesis stream
AWS Lambda and Amazon Kinesis integration
Synchronous invocation:
▪ Lambda invoked as synchronous RequestResponse type
▪ Lambda honors Kinesis at least once semantics
▪ Each shard blocks on in order synchronous invocation
AWS Lambda and Amazon Kinesis integration
• Multiple functions can be mapped to one
stream
• Multiple streams can be mapped to one
Lambda function
• Each mapping is a unique key pair
Kinesis stream to Lambda function
• Each mapping has unique shard
iterators
Amazon
Kinesis
1
AWS
Lambda
1
Amazon
CloudWatc
h
Amazon
Dynamo
DB
AWS
Lambd
a 2
Amazo
n
S3Amazon
Kinesis
2
DEMO
Creating a Kinesis stream
Streams
▪ Made up of Shards
▪ Each Shard supports writes up to 1MB/s
▪ Each Shard supports reads up to 2MB/s
across maximum 5 transactions/s
Data
▪ All data is stored and replayable for 24 hours
▪ A Partition Key is supplied by producer and used to distribute the PUTs across
Shards (using MD5 function to hash to 128-bit integers)
▪ A unique Sequence # is returned to the Producer upon a successful PUT call
▪ Make sure partition key distribution is even to optimize parallel throughput
▪ Pick a key with more groups than shards
Creating a Lambda function
Memory:
▪ CPU and disk proportional to the memory configured
▪ Increasing memory makes your code execute faster (if CPU bound)
▪ Increasing memory allows for larger record sizes processed
Timeout:
▪ Increasing timeout allows for longer functions, but more wait in case of
errors
Retries:
▪ For Kinesis, Lambda retries until the data expires (default 24 hours)
Permission model:
• The execution role defined for Lambda must have permission to access
the stream
Configuring the Event Source
Batch size:
▪ Max number of records that Lambda will send to one invocation
▪ Not equivalent to how many records Lambda will poll
▪ Effective batch size is every 250 ms
MIN(records available, batch size, 6MB)
▪ Increasing batch size allows fewer Lambda function invocations with
more data processed per function
Configuring the Event Source
Starting Position:
▪ The position in the stream where Lambda starts reading
▪ Set to “Trim Horizon” for reading from start of stream (all data)
▪ Set to “Latest” for reading most recent data (LIFO) (latest data)
Processing
Per Shard:
▪ Lambda calls GetRecords with max limit from Kinesis (10k or 10MB)
▪ If no record, wait 250 ms
▪ From in memory, sub batches and formats records into Lambda payload
▪ Invoke Lambda with synchronous invoke
… …
Source
Kinesis
Destination
1
Lambda Poller
Destination
2
Functions
Shards
Lambda will scale automaticallyScale Kinesis by adding shards
Batch sync invokesPolls
Processing
▪ Lambda blocks on ordered processing for each individual shard
▪ Increasing # of shards with even distribution allows increased concurrency
▪ Batch size may impact duration if the Lambda function takes longer to process
more records
… …
Source
Kinesis
Destination
1
Lambda Poller
Destination
2
Functions
Shards
Lambda will scale automaticallyScale Kinesis by adding shards
Batch sync invokesPolls
Processing
▪ Maximum theoretical throughput :
# shards * 2MB / (s)
▪ Effective theoretical throughput :
# shards * batch size (MB) / Lambda function duration (s)
▪ If put / ingestion rate is greater than the theoretical throughput, your processing
is at risk of falling behind
Common observations:
▪ Effective batch size may be less than configured during low throughput
▪ Effective batch size will increase during higher throughput
▪ Number of invokes and GetRecord calls will decrease with increased Lambda
duration
Processing
▪ Retry execution failures until the record is expired
▪ Retry with exponential backoff up to 60s
▪ Throttles and errors impacts duration and directly impacts throughput
▪ Effective theoretical throughput :
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry)
Kinesis
Destination
1
Destination
2
… …
Source
Functions
Shards
Lambda will scale automaticallyScale Kinesis by adding shards
Receives errorPolls
Receives error
Receives success
Lambda Poller
Monitoring Kinesis
•GetRecords (effective throughput) : bytes, latency, records etc
•PutRecord : bytes, latency, records, etc
•GetRecords.IteratorAgeMilliseconds: how old your last processed records were.
If high, processing is falling behind. If close to 24 hours, records are close to
being dropped.
Monitoring Lambda
•Monitoring: available in Amazon CloudWatch Metrics
• Invocation count
• Duration
• Error count
• Throttle count
•Debugging: available in Amazon CloudWatch Logs
• All Metrics
• Custom logs
• RAM consumed
• Search for log events
Best Practices
▪ Create enough shards for parallel processing
▪ Distribute load evenly across shards
▪ Monitor and address Lambda errors and throttles
▪ Monitor Kinesis limits and throttles
Best Practices
Create different Lambda functions for each task, associate to same Kinesis stream
Log to
CloudWatch
Logs
Push to SNS
Get Started: Data Processing with AWS
Next Steps
1. Create your first Kinesis stream. Configure hundreds of thousands
of data producers to put data into an Amazon Kinesis stream. Ex.
data from Social media feeds.
2. Create and test your first Lambda function. Use any third party
library, even native ones. First 1M requests each month are on us!
3. Read the Developer Guide, AWS Lambda and Kinesis Tutorial, and
resources on GitHub at AWS Labs
• http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
• https://github.com/awslabs/lambda-streams-to-firehose
Thank you!

Contenu connexe

En vedette

Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSAmazon Web Services
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesAmazon Web Services
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAmazon Web Services
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheAmazon Web Services
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...Amazon Web Services
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSAmazon Web Services
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAmazon Web Services
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and ProcessesAmazon Web Services
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAmazon Web Services
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAmazon Web Services
 
Fast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisFast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisAmazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
AWS Enterprise Summit Netherlands - AWS IoT
AWS Enterprise Summit Netherlands - AWS IoTAWS Enterprise Summit Netherlands - AWS IoT
AWS Enterprise Summit Netherlands - AWS IoTAmazon Web Services
 
Managing Your Infrastructure as Code
Managing Your Infrastructure as CodeManaging Your Infrastructure as Code
Managing Your Infrastructure as CodeAmazon Web Services
 
Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Amazon Web Services
 
Storage with Amazon S3 and Amazon Glacier
Storage with Amazon S3 and Amazon GlacierStorage with Amazon S3 and Amazon Glacier
Storage with Amazon S3 and Amazon GlacierAmazon Web Services
 
Mobile Software in AWS Marketplace
Mobile Software in AWS MarketplaceMobile Software in AWS Marketplace
Mobile Software in AWS MarketplaceAmazon Web Services
 

En vedette (19)

Getting Started on AWS
Getting Started on AWS Getting Started on AWS
Getting Started on AWS
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Deep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECSDeep Dive on Microservices and Amazon ECS
Deep Dive on Microservices and Amazon ECS
 
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar SeriesDeep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
Deep Dive Amazon Redshift for Big Data Analytics - September Webinar Series
 
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the CloudAWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
AWS Enterprise Summit Netherlands - Starting Your Journey in the Cloud
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
 
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
AWS Enterprise Summit Netherlands - Big Data Architectural Patterns & Best Pr...
 
Rackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWSRackspace: Best Practices for Security Compliance on AWS
Rackspace: Best Practices for Security Compliance on AWS
 
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWSAWS Enterprise Summit Netherlands - Enterprise Applications on AWS
AWS Enterprise Summit Netherlands - Enterprise Applications on AWS
 
DevOps at Amazon: A Look at Our Tools and Processes
 DevOps at Amazon: A Look at Our Tools and Processes DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
 
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at ScaleAWS Enterprise Summit Netherlands - Cost Optimisation at Scale
AWS Enterprise Summit Netherlands - Cost Optimisation at Scale
 
AWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing ZoneAWS Enterprise Summit Netherlands - Creating a Landing Zone
AWS Enterprise Summit Netherlands - Creating a Landing Zone
 
Fast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for RedisFast Data at Scale with Amazon ElastiCache for Redis
Fast Data at Scale with Amazon ElastiCache for Redis
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
AWS Enterprise Summit Netherlands - AWS IoT
AWS Enterprise Summit Netherlands - AWS IoTAWS Enterprise Summit Netherlands - AWS IoT
AWS Enterprise Summit Netherlands - AWS IoT
 
Managing Your Infrastructure as Code
Managing Your Infrastructure as CodeManaging Your Infrastructure as Code
Managing Your Infrastructure as Code
 
Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016Big Data on AWS - Toronto FSI Symposium - October 2016
Big Data on AWS - Toronto FSI Symposium - October 2016
 
Storage with Amazon S3 and Amazon Glacier
Storage with Amazon S3 and Amazon GlacierStorage with Amazon S3 and Amazon Glacier
Storage with Amazon S3 and Amazon Glacier
 
Mobile Software in AWS Marketplace
Mobile Software in AWS MarketplaceMobile Software in AWS Marketplace
Mobile Software in AWS Marketplace
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 

Dernier (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Real-Time Data Processing Using AWS Lambda - September Webinar Series

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cecilia Deng Software Engineer AWS Lambda Stream Processing
  • 2. Agenda Overview of Lambda Overview of Kinesis to Lambda event source Configuring the event source Lambda side processing
  • 3. AWS Lambda: Overview Lambda functions: a piece of code with stateless execution Triggered by events: • Direct Sync and Async API calls • AWS Service integrations • 3rd party triggers • And many more … Makes it easy to: • Perform data-driven auditing, analysis, and notification • Build back-end services that perform at scale
  • 4. Amazon Kinesis: Overview Real-time Ingest Highly Scalable Durable Replay-able Reads Continuous Processing GetShardIterator and GetRecords(ShardIterator) Allows checkpointing/ replay Enables multi concurrent processing KCL, Firehose, Analytics, Lambda • Managed Service - makes it easy to provision, deploy, manage your stream • Low end-to-end (real time) latency • Pay as you go, no upfront cost • Enables data movement to many stores or processing engines
  • 5. Data Processing/Streaming Architecture & Workflow Smart Devices Click Stream Log Data
  • 6. AWS Lambda and Amazon Kinesis integration How it Works Pull event source model: ▪ Kinesis mapped as Event Source in Lambda ▪ Lambda polls the stream and batches available records ▪ Batches are passed for invocation to Lambda through function parameters What this means: ▪ Fleet of pollers running a flavor of KCL ▪ No resource policy
  • 7. AWS Lambda and Amazon Kinesis integration Event structure: ▪ Event received by Lambda function is a collection of records from Kinesis stream
  • 8. AWS Lambda and Amazon Kinesis integration Synchronous invocation: ▪ Lambda invoked as synchronous RequestResponse type ▪ Lambda honors Kinesis at least once semantics ▪ Each shard blocks on in order synchronous invocation
  • 9. AWS Lambda and Amazon Kinesis integration • Multiple functions can be mapped to one stream • Multiple streams can be mapped to one Lambda function • Each mapping is a unique key pair Kinesis stream to Lambda function • Each mapping has unique shard iterators Amazon Kinesis 1 AWS Lambda 1 Amazon CloudWatc h Amazon Dynamo DB AWS Lambd a 2 Amazo n S3Amazon Kinesis 2
  • 10. DEMO
  • 11. Creating a Kinesis stream Streams ▪ Made up of Shards ▪ Each Shard supports writes up to 1MB/s ▪ Each Shard supports reads up to 2MB/s across maximum 5 transactions/s Data ▪ All data is stored and replayable for 24 hours ▪ A Partition Key is supplied by producer and used to distribute the PUTs across Shards (using MD5 function to hash to 128-bit integers) ▪ A unique Sequence # is returned to the Producer upon a successful PUT call ▪ Make sure partition key distribution is even to optimize parallel throughput ▪ Pick a key with more groups than shards
  • 12. Creating a Lambda function Memory: ▪ CPU and disk proportional to the memory configured ▪ Increasing memory makes your code execute faster (if CPU bound) ▪ Increasing memory allows for larger record sizes processed Timeout: ▪ Increasing timeout allows for longer functions, but more wait in case of errors Retries: ▪ For Kinesis, Lambda retries until the data expires (default 24 hours) Permission model: • The execution role defined for Lambda must have permission to access the stream
  • 13. Configuring the Event Source Batch size: ▪ Max number of records that Lambda will send to one invocation ▪ Not equivalent to how many records Lambda will poll ▪ Effective batch size is every 250 ms MIN(records available, batch size, 6MB) ▪ Increasing batch size allows fewer Lambda function invocations with more data processed per function
  • 14. Configuring the Event Source Starting Position: ▪ The position in the stream where Lambda starts reading ▪ Set to “Trim Horizon” for reading from start of stream (all data) ▪ Set to “Latest” for reading most recent data (LIFO) (latest data)
  • 15. Processing Per Shard: ▪ Lambda calls GetRecords with max limit from Kinesis (10k or 10MB) ▪ If no record, wait 250 ms ▪ From in memory, sub batches and formats records into Lambda payload ▪ Invoke Lambda with synchronous invoke … … Source Kinesis Destination 1 Lambda Poller Destination 2 Functions Shards Lambda will scale automaticallyScale Kinesis by adding shards Batch sync invokesPolls
  • 16. Processing ▪ Lambda blocks on ordered processing for each individual shard ▪ Increasing # of shards with even distribution allows increased concurrency ▪ Batch size may impact duration if the Lambda function takes longer to process more records … … Source Kinesis Destination 1 Lambda Poller Destination 2 Functions Shards Lambda will scale automaticallyScale Kinesis by adding shards Batch sync invokesPolls
  • 17. Processing ▪ Maximum theoretical throughput : # shards * 2MB / (s) ▪ Effective theoretical throughput : # shards * batch size (MB) / Lambda function duration (s) ▪ If put / ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind Common observations: ▪ Effective batch size may be less than configured during low throughput ▪ Effective batch size will increase during higher throughput ▪ Number of invokes and GetRecord calls will decrease with increased Lambda duration
  • 18. Processing ▪ Retry execution failures until the record is expired ▪ Retry with exponential backoff up to 60s ▪ Throttles and errors impacts duration and directly impacts throughput ▪ Effective theoretical throughput : ( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry) Kinesis Destination 1 Destination 2 … … Source Functions Shards Lambda will scale automaticallyScale Kinesis by adding shards Receives errorPolls Receives error Receives success Lambda Poller
  • 19. Monitoring Kinesis •GetRecords (effective throughput) : bytes, latency, records etc •PutRecord : bytes, latency, records, etc •GetRecords.IteratorAgeMilliseconds: how old your last processed records were. If high, processing is falling behind. If close to 24 hours, records are close to being dropped.
  • 20. Monitoring Lambda •Monitoring: available in Amazon CloudWatch Metrics • Invocation count • Duration • Error count • Throttle count •Debugging: available in Amazon CloudWatch Logs • All Metrics • Custom logs • RAM consumed • Search for log events
  • 21. Best Practices ▪ Create enough shards for parallel processing ▪ Distribute load evenly across shards ▪ Monitor and address Lambda errors and throttles ▪ Monitor Kinesis limits and throttles
  • 22. Best Practices Create different Lambda functions for each task, associate to same Kinesis stream Log to CloudWatch Logs Push to SNS
  • 23. Get Started: Data Processing with AWS Next Steps 1. Create your first Kinesis stream. Configure hundreds of thousands of data producers to put data into an Amazon Kinesis stream. Ex. data from Social media feeds. 2. Create and test your first Lambda function. Use any third party library, even native ones. First 1M requests each month are on us! 3. Read the Developer Guide, AWS Lambda and Kinesis Tutorial, and resources on GitHub at AWS Labs • http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html • https://github.com/awslabs/lambda-streams-to-firehose

Notes de l'éditeur

  1. To elaborate a bit more on how Lambda works. You can trigger invocations through direct API / SDK calls – sync or async. You can also use various AWS Service integrators like S3 (an event whenever an entry is created/updated/deleted) or SNS, third party triggers, etc Event sources like kinesis or ddb update streams make it easy to perform real time data-driven auditing, analysis, and notification with Lambda. However, with the help of API Gateway to create HTTP endpoints powered by Lambda functions, you can create a set of back-end microservices that perform at scale.
  2. Limits per shard, but can scale to many shards Data is replicated synchronously across AZs for resiliency and durability Multiple applications can process from the stream concurrently in order at different rates Data is available to be processed within seconds of a put Processing constructs include checkpointing and shard iteratorg.
  3. Always FIFO Trim Horizon (start of stream) latest on reading current data.
  4. Always FIFO Trim Horizon (start of stream) latest on reading current data.