Analytics on write with AWS Lambda
Unified Log London, 4th November 2015
Introducing myself
• Alex Dean
• Co-founder and technical lead at Snowplow,
the open-source event analytics platform
based here in London [1]
• Weekend writer of Unified Log Processing,
available on the Manning Early Access Program
[2]
[1] https://github.com/snowplow/snowplow
[2] http://manning.com/dean
Analytics on read, analytics on write
It’s easier to start by explaining analytics on read, which
is much more widely practised and understood
1. Write all of our events to some kind of event store
2. Read the events from our event store to perform some analysis
In analytics on write, the analysis is performed on the
events in-stream (i.e. before reaching storage)
• Read our events from our event stream
• Analyze our events using a stream processing framework
• Write the summarized output of our analysis to some storage target
• Serve the summarized output into real-time dashboards, reports etc
Analytics on write and analytics on read are good at
different things, and leverage different technologies
With a unified log powered by Kafka or Kinesis, you can
apply both analytical approaches to your event stream
• Apache Kafka and Amazon Kinesis make it easy to have multiple
consuming apps on the same event stream
• Each consuming app can maintain its own “cursor position” on the stream (see the sketch after this list)
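To make the “cursor position” idea concrete, here is a minimal sketch of a Kafka consumer in Scala. This is not part of the original deck: the topic name oops-events and the group ids are hypothetical, and it assumes the kafka-clients library (2.0 or later) and Scala 2.13. Each consuming app picks its own group.id, and Kafka tracks a separate committed offset for each group on the same stream.

import java.time.Duration
import java.util.Properties
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

// Each consuming app uses its own group.id, so Kafka keeps a separate
// committed offset ("cursor position") for it on the same topic
object DashboardFeed extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("group.id", "oops-dashboard")  // a second app would use e.g. "oops-archiver"
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(java.util.Collections.singletonList("oops-events"))

  while (true) {
    val records = consumer.poll(Duration.ofMillis(500))
    records.asScala.foreach(r => println(s"offset ${r.offset}: ${r.value}"))
    consumer.commitSync()  // advances only this app's cursor, not other consumers'
  }
}

A Kinesis consumer gets the same independence via per-application checkpoints (for example, the Kinesis Client Library keeps one checkpoint table per application name).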
Getting started with analytics on write
What are some good use cases for getting started with analytics on write?
1. Low-latency operational reporting, which must be fed from the incoming event streams in as close to real-time as possible
2. Dashboards to support thousands of simultaneous users, for example a freight company might share a parcel tracker on its website for customers
Others? Please share your thoughts!
Analytics on write is a very immature space – there are only a handful of tools and frameworks available so far…
PipelineDB
• Analytics on write (“continuous views”) using SQL
• Implemented as a Postgres fork
• Supports Kafka but no sharding yet (I believe)
amazon-kinesis-aggregators
• Reads from Kinesis streams and outputs to DynamoDB & CloudWatch
• JSON-based query recipes
• Written by Ian Meyers here in London
Druid
• Hybrid analytics on write, analytics on read
• Very rich JSON-based query language
• Supports Kafka
… or we can implement a bespoke analytics on write
solution – for example with AWS Lambda
• The central idea of AWS Lambda is that developers should be writing functions, not servers
• With Lambda, we write self-contained functions to process events, and then we publish those functions to Lambda to run
• We don’t worry about developing, deploying or managing servers – instead, Lambda takes care of auto-scaling our functions to meet the incoming event volumes
An AWS Lambda function is stateless and exists only for the side
effects that it performs
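To make that concrete, below is a minimal sketch of what such a function can look like in Scala. It is loosely modelled on the aow-lambda project used later in this deck but is not its exact code; the class and method names are illustrative, and it assumes the aws-lambda-java-core and aws-lambda-java-events libraries.

import scala.collection.JavaConverters._
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.events.KinesisEvent

// Lambda calls recordHandler with a micro-batch of Kinesis records; the
// function keeps no state between invocations and exists only for its side
// effects (ultimately, conditional writes to DynamoDB)
class LambdaFunction {
  def recordHandler(event: KinesisEvent, context: Context): Unit = {
    val payloads = event.getRecords.asScala.map { record =>
      val buf   = record.getKinesis.getData        // payload as a ByteBuffer
      val bytes = new Array[Byte](buf.remaining)
      buf.get(bytes)
      new String(bytes, "UTF-8")                   // the raw event JSON
    }
    // ... deserialize the events, aggregate them, write the summary to DynamoDB ...
    context.getLogger.log(s"Processed ${payloads.size} events")
  }
}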
Designing an analytics on write solution for OOPS
Let’s imagine that we have a global delivery company called
OOPS, which has five event types
OOPS management want a near-real-time dashboard to tell them two things:
1. Where is each of our delivery trucks now?
2. How many miles has each of our delivery trucks driven since its last oil change?
In DynamoDB, we could represent this as a simple table
All we need to do is write an AWS Lambda function to populate
this DynamoDB table from our event stream…
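As a rough sketch, one item in that table could be modelled in Scala as below. The field names follow the conditionalWrite snippet shown later in the deck; the exact types and the helper Location class are assumptions, not the deck’s own code.

import java.util.Date

// One aggregate row per truck, keyed on the VIN; the optional fields stay
// empty until the relevant event types have been seen
case class Location(latitude: Double, longitude: Double)

case class Row(
  vin:                String,                   // partition key
  mileage:            Int,                      // latest odometer reading
  mileageAtOilChange: Option[Int],              // set when an oil-change event arrives
  locationTs:         Option[(Location, Date)]  // latest known position and its timestamp
)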
… however a more efficient approach is to apply some old-school map-reduce to the micro-batch first
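A sketch of that idea, reusing the hypothetical Row model above: collapse the micro-batch to at most one aggregate per VIN before touching DynamoDB, so a batch of 100 events for the same truck costs one round of conditional writes rather than 100. This is an illustration of the technique, not the project’s actual code.

// Keep the higher mileage and the more recent oil-change / location reading
def mergeRows(a: Row, b: Row): Row = Row(
  vin                = a.vin,
  mileage            = math.max(a.mileage, b.mileage),
  mileageAtOilChange = (a.mileageAtOilChange.toList ++ b.mileageAtOilChange.toList)
                         .reduceOption((x, y) => math.max(x, y)),
  locationTs         = (a.locationTs.toList ++ b.locationTs.toList)
                         .reduceOption((x, y) => if (x._2.after(y._2)) x else y)
)

// Map each event in the micro-batch to a Row, then reduce to one Row per truck
def aggregate(rows: Seq[Row]): Seq[Row] =
  rows.groupBy(_.vin).values.map(_.reduce(mergeRows)).toSeq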
What do we mean when we talk about conditional writes in
DynamoDB?
Bonus: DynamoDB’s conditional write syntax is very readable
def conditionalWrite(row: Row) {
  val vin = AttrVal.toJavaValue(row.vin)

  // Only bump the mileage if the stored value is missing or lower
  updateIf(vin, "SET #m = :m",
    "attribute_not_exists(#m) OR #m < :m",
    Map(":m" -> AttrVal.toJavaValue(row.mileage)),
    Map("#m" -> "mileage"))

  // Only raise mileage-at-oil-change if this oil change is more recent
  for (maoc <- row.mileageAtOilChange) {
    updateIf(vin, "SET #maoc = :maoc",
      "attribute_not_exists(#maoc) OR #maoc < :maoc",
      Map(":maoc" -> AttrVal.toJavaValue(maoc)),
      Map("#maoc" -> "mileage-at-oil-change"))
  }

  // Only overwrite the location if this event carries a newer timestamp
  for ((loc, ts) <- row.locationTs) {
    updateIf(vin, "SET #ts = :ts, #lat = :lat, #long = :long",
      "attribute_not_exists(#ts) OR #ts < :ts",
      Map(":ts" -> AttrVal.toJavaValue(ts.toString),
          ":lat" -> AttrVal.toJavaValue(loc.latitude),
          ":long" -> AttrVal.toJavaValue(loc.longitude)),
      Map("#ts" -> "location-timestamp", "#lat" -> "latitude",
          "#long" -> "longitude"))
  }
}
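updateIf and AttrVal above are small helpers from the book’s aow-lambda project. For orientation, here is a plausible shape for updateIf, assuming the AWS SDK for Java v1 DynamoDB client and a hypothetical table keyed on a "vin" attribute; this is a sketch of the conditional-write pattern, not the project’s actual implementation.

import scala.collection.JavaConverters._
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.dynamodbv2.model.{AttributeValue, ConditionalCheckFailedException, UpdateItemRequest}

val dynamodb  = AmazonDynamoDBClientBuilder.defaultClient()
val tableName = "oops-trucks"  // hypothetical table name

// Apply the update only if the condition expression holds; a failed condition
// simply means a fresher value is already stored, so we swallow the exception
def updateIf(vin: AttributeValue, updateExpr: String, conditionExpr: String,
             values: Map[String, AttributeValue], names: Map[String, String]): Unit = {
  val request = new UpdateItemRequest()
    .withTableName(tableName)
    .withKey(Map("vin" -> vin).asJava)
    .withUpdateExpression(updateExpr)
    .withConditionExpression(conditionExpr)
    .withExpressionAttributeValues(values.asJava)
    .withExpressionAttributeNames(names.asJava)
  try dynamodb.updateItem(request)
  catch { case _: ConditionalCheckFailedException => () }  // out-of-order event: keep the stored item
}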
Demo
To simplify the demo, I performed some configuration steps already (1/2)
1. Downloaded the Scala code from https://github.com/alexanderdean/Unified-Log-Processing/tree/master/ch11/11.2/aow-lambda
2. Built a “fatjar” for my Lambda function ($ sbt assembly)
3. Uploaded my fatjar to Amazon S3 ($ aws s3 cp …)
4. Ran a CloudFormation template to set up permissions for my Lambda, available at https://ulp-assets.s3.amazonaws.com/ch11/cf/aow-lambda.template
5. Registered my Lambda function with AWS Lambda ($ aws lambda create-function …)
To simplify the demo, I performed some configuration steps already (2/2)
6. Created a Kinesis stream ($ aws kinesis create-stream …)
7. Created a DynamoDB table ($ aws dynamodb create-table …)
8. Configured the registered Lambda function to use the Kinesis stream as its input ($ aws lambda create-event-source-mapping --event-source-arn ${stream_arn} --function-name AowLambda --enabled --batch-size 100 --starting-position TRIM_HORIZON)
Finally, let’s feed in some OOPS events…
host$ vagrant ssh
guest$ cd /vagrant/ch11/11.1
guest$ ./generate.py
Wrote DriverDeliversPackage with timestamp 2015-01-11 00:49:00
Wrote DriverMissesCustomer with timestamp 2015-01-11 04:07:00
Wrote TruckArrivesEvent with timestamp 2015-01-11 04:56:00
Wrote DriverDeliversPackage with timestamp 2015-01-11 06:16:00
Wrote TruckArrivesEvent with timestamp 2015-01-11 07:35:00
… and check our Kinesis stream, Lambda function and
DynamoDB table
Resources and further reading
Further reading
Chapter 11, Analytics on write
Manning Deal of the Day today! Discount code: dotd110415au (50% off just today)
• https://www.pipelinedb.com/
• https://github.com/awslabs/amazon-kinesis-aggregators/
• http://druid.io/
• https://github.com/snowplow/aws-lambda-nodejs-example-project
• https://github.com/snowplow/aws-lambda-scala-example-project
• https://github.com/snowplow/spark-streaming-example-project
Questions?
http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
To meet up or chat, @alexcrdean on Twitter or
alex@snowplowanalytics.com


Speaker notes

We have a single version of the truth – together, the unified log plus the Hadoop archive represent our single version of the truth. They contain exactly the same data – our event stream – they just cover different time windows of data.

The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad-targeting systems) compute on the same truth as analysts producing management reports.

Point-to-point connections have largely been unravelled – in their place, applications can append to the unified log and other applications can read their writes.

Local loops have been unbundled – in place of local silos, applications can collaborate on near-real-time decision-making via the unified log.