StreamING models
Realtime model deployment of ML capabilities
Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity
• IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam
• Before ING: implemented enterprise software, mainly knowledge management and CRM related
• Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now... Flink
https://www.linkedin.com/in/erik-de-nooij-93ab1a/
Erik.g.de.Nooij@ing.nl
Who Am I?
2
About ING
Worldwide
• 35 million customers
• 51,000 employees
• Presence in over 40 countries
Netherlands
• 9 million customers
• Billion logins yearly on https://www.ing.nl
• 1 million transactions per day
About ING
4
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
The Netherlands
Threats
Criminal organization: individuals, small groups, worldwide groups, organized crime
Response: manual detection, rule based detection, model based detection, anomaly detection
Threats: fake ID, skimming, phishing, APT, ?
Timeline: 2008, 2010, 2012, 2014, 2017
Threats related to fraud & cybersecurity
5
Carbanak APT (Advanced Persistent Threat)
6
• This started via a phishing email…
• Support various types of (ML) models
    • Tools to create models versus tools to score models
• One codebase, SaaS deployment model
    • Pre-processor, decoupled architecture
• Make changes instantly (no downtime)
    • Use case
    • Feature extraction
    • Enriching streams
    • End user tooling
    • Demo
• Multiple domains
    • Examples
Goals
7-10
Support various types of models
Diagram: model creation (offline, on HDFS) produces a portable model (<PMML /> or {PFA}) that is handed to model execution (online, on the streaming platform).
Creating models offline, scoring online
12
• The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format
Predictive Model Markup Language (PMML)
13
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
if field1 > 500 AND field2 == 1 AND field3 > 1
14
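The rule above can be read mechanically. What follows is a minimal Python sketch that uses only the standard library. It is not the OpenScoring.io evaluator: the helper `rule_fires` is hypothetical, PMML namespaces are left out, and only the two operators and the `and` combinator from the slide are handled.

```python
import xml.etree.ElementTree as ET

# The SimpleRule from the slide, without PMML namespaces.
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Only the operators used in the slide are covered here.
OPERATORS = {
    "greaterThan": lambda actual, expected: actual > expected,
    "equal": lambda actual, expected: actual == expected,
}


def rule_fires(rule, features):
    """Return the rule's score if all predicates hold, else None (hypothetical helper)."""
    compound = rule.find("CompoundPredicate")
    if compound.get("booleanOperator") != "and":
        raise ValueError("this sketch only handles 'and'")
    for predicate in compound.findall("SimplePredicate"):
        check = OPERATORS[predicate.get("operator")]
        actual = features[predicate.get("field")]
        if not check(actual, float(predicate.get("value"))):
            return None
    return rule.get("score")


rule = ET.fromstring(RULE_XML)
print(rule_fires(rule, {"field1": 1000, "field2": 1, "field3": 2}))  # Alert
print(rule_fires(rule, {"field1": 100, "field2": 1, "field3": 2}))   # None
```

A real deployment would hand the PMML file to a full evaluator library; the sketch only shows how the XML maps onto the if-statement.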
Machine learning tools supporting PMML
15
• Parse the PMML file(s)
• Pass on the feature set to the model(s)
• Run the ‘predict’ function, which returns the output of the model(s)
16
Model scoring using the OpenScoring.io library
Diagram: a data stream of feature sets and a control stream enter the model scoring operator, which outputs a score.
Supported models
17
Supported models (*)
• Association rules
• Cluster model
• General regression
• Naive Bayes
• k-Nearest neighbours
• Neural network
• Regression
• Rule set
• Scorecard
• Support Vector Machine
• Tree model
• Ensemble model
(*) Models supported by http://openscoring.io/
Goals
18
Use of various types of models
One codebase, SaaS Deployment model
Pre-processor, Decoupled architecture
Make changes instantly (no downtime)
Multiple domains
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
One Bank Strategy
19
How flexible is this architecture?
20
Feature extraction & model scoring
Incoming formats: Amount = “42,00”, Amountincents = 4200, Amount = 42.00
Decoupled architecture
21
Diagram: business events -> pre-processor -> feature extraction & model scoring
Incoming formats: Amount = “42.00”, Amountincents = 4200, Amount = 42.00
After the pre-processor: Amountincents = 4200
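A pre-processor of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not ING's implementation: the function names `amount_in_cents` and `preprocess` are hypothetical, events are modelled as plain dictionaries, and thousands separators are not handled.

```python
from decimal import Decimal


def amount_in_cents(event):
    """Normalize the amount of a raw business event to integer cents (hypothetical helper)."""
    if "Amountincents" in event:
        return int(event["Amountincents"])
    raw = event["Amount"]
    if isinstance(raw, str):
        # Accept a decimal comma ("42,00") as well as a decimal point ("42.00").
        # Thousands separators are out of scope for this sketch.
        raw = raw.strip().replace(",", ".")
    # Going through str avoids binary floating point artifacts for float inputs.
    return int((Decimal(str(raw)) * 100).to_integral_value())


def preprocess(event):
    """Return a copy of the event in the canonical form used downstream."""
    canonical = {k: v for k, v in event.items() if k not in ("Amount", "Amountincents")}
    canonical["Amountincents"] = amount_in_cents(event)
    return canonical


for raw_event in ({"Amount": "42,00"}, {"Amountincents": 4200}, {"Amount": 42.00}):
    print(preprocess(raw_event))  # {'Amountincents': 4200} each time
```

Because every source format is reduced to one canonical field before feature extraction, the downstream operators never need to know which system produced the event.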
Goals
22
Use of various types of models
One codebase, SaaS Deployment model
Make changes instantly (no downtime)
    • Use case
    • Feature extraction
    • Enriching streams
    • End user tooling
    • Demo
Multiple domains
• Your phone with the banking app installed is stolen
• The limit on the banking app is 1,000
• Funds are transferred from your account (A) to a mule account (B)
Use case
23
Model features and model output
24
Features: Amount > 500, NrOf Trxs Last 1h, First Trx < 24h ago
Model output: Alert || OK
Stream with stateless operators
25
Diagram: event Ev.1 (A to B, amount 1000) -> feature extraction (FeX) -> model scoring (PMML)
Features (Amount, Unknown, PrevTrxs) = (1000, ?, ?): without state, only the amount is known.
Stream with stateful operators
26
STATE
Diagram: events Ev.1 and Ev.2 (each A to B, amount 1000) -> feature extraction (FeX) with state -> model scoring (PMML) -> Alert || OK
State after Ev.1:
Key | Value
(A,B, FirstTrx) | Ev.1
(A,B, HistoricalTrxs) | ev1 1000
State after Ev.2:
Key | Value
(A,B, FirstTrx) | Ev.1
(A,B, HistoricalTrxs) | ev1 1000, ev2 1000
Feature tuples shown (Amount, Unknown, PrevTrxs): (1000, true, 1) and (1000, true, 0)
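The effect of keyed state can be sketched in Python as follows. The names `PairState` and `extract_features` are hypothetical, a plain dictionary stands in for the streaming engine's keyed state, and the second feature is assumed to be "first transfer for this pair less than 24h ago" from the use case, since the slide does not define its "Unknown" column.

```python
from datetime import datetime, timedelta


class PairState:
    """Keyed state for one (source, destination) pair, as in the Key/Value tables."""

    def __init__(self):
        self.first_trx = None        # time of the first transfer seen for this pair
        self.historical_trxs = []    # (time, amount) of earlier transfers


state = {}  # stands in for the operator's keyed state: (source, dest) -> PairState


def extract_features(source, dest, amount, event_time):
    """Return (amount, first_trx_within_24h, prev_trxs), then update the state."""
    pair = state.setdefault((source, dest), PairState())
    if pair.first_trx is None:
        pair.first_trx = event_time
    first_trx_within_24h = event_time - pair.first_trx < timedelta(hours=24)
    prev_trxs = len(pair.historical_trxs)
    pair.historical_trxs.append((event_time, amount))
    return amount, first_trx_within_24h, prev_trxs


t0 = datetime(2017, 4, 10, 12, 0)
print(extract_features("A", "B", 1000, t0))                         # (1000, True, 0)
print(extract_features("A", "B", 1000, t0 + timedelta(minutes=5)))  # (1000, True, 1)
print(extract_features("A", "B", 1000, t0 + timedelta(days=2)))     # (1000, False, 2)
```

The same event (A to B, 1000) yields different feature values depending on what the state already holds, which is what a stateless operator cannot provide.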
How to perform aggregate functions on a stream?
27
Example aggregates: average amount last week: € 37.04; max amount last month: € 834.12
Diagram: event Ev.1 (A, B, IP, 1000) -> aggregation step -> calculating features
IP addresses shown: 192.x.x.1, 192.x.x.5; 192.x.x.2, 192.x.x.6; 192.x.x.3, 192.x.x.7; 192.x.x.4, …….
Enriching the stream based on multiple keys
28
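Diagram: a Split step fans event Ev.1 (A, B, IP, 1000) out into one record per key (A, B, A.B, IP); each record comes back enriched (A’, B’, A.B’, IP’), for example (A.B’, 1000). The number 3542321 is shown on each record.
Keys held per task manager: A, E, I ..; B, D, F ..; C, G, H ..; J, K ..
Accounts are distributed across the task managers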
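The split step can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the record layout, the functions `split` and `task_manager_for`, the parallelism of 4, and the use of CRC32 as a stand-in for whatever hash the streaming engine applies to keys. The IP address is made up.

```python
import zlib

NUM_TASK_MANAGERS = 4  # assumed parallelism, for illustration only


def split(event):
    """Fan one event out into one record per key it has to be enriched on."""
    keys = [
        event["source"],                          # A
        event["dest"],                            # B
        f'{event["source"]}.{event["dest"]}',     # A.B
        event["ip"],                              # IP
    ]
    return [{"key": key, "event_id": event["id"], "amount": event["amount"]} for key in keys]


def task_manager_for(key):
    """Deterministically map a key to a partition (CRC32 as a stable stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_TASK_MANAGERS


event = {"id": "Ev.1", "source": "A", "dest": "B", "ip": "192.0.2.1", "amount": 1000}
for record in split(event):
    print(task_manager_for(record["key"]), record)
```

Each record keeps the event id so that the enriched parts can be brought back together before model scoring; how the original system correlates them is not stated in the slides.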
Aggregating and model scoring
29
Diagram: event Ev.1 (A, B, IP, 1000) -> Aggregation -> Model Scoring
Features listed: 1. Amount, 2. (A.B).FirstTrx, 3. (A.B).NrTrxs
Enriched records listed: A.B’, B’ (1. B’), IP’ (1. IP’, 2. ….)
A DSL is a domain specific language. We use it to define the behaviour of our operators.
• The persist rules (which data to store within state)
• Feature calculation rules
• Model definition rules
Domain Specific Language (DSL)
30
Definition instead of code - Persist rule
31
history[double, 4weeks,100] @(sourceAccntNr.destAccntNr).Trxs := $amount
NrOf Trxs Last 1h
count(between @(sourceAccntNr.destAccntNr).Trxs, $eventtime,$eventtime-1hour));
First Trx A to B <24h
@(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24hours;
Feature Calculation rules
32
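To make the rules above concrete, here is a Python sketch of what they might compute. The DSL's exact semantics are not given in the slides, so this reads `history[double, 4weeks, 100]` as "keep at most 100 values no older than 4 weeks"; the `History` class and its methods are hypothetical, and events are assumed to arrive in time order.

```python
from collections import deque
from datetime import datetime, timedelta


class History:
    """Bounded history: keeps at most max_items values no older than retention."""

    def __init__(self, retention, max_items):
        self.retention = retention
        self.items = deque(maxlen=max_items)  # oldest entries drop off automatically

    def add(self, event_time, value):
        # Assumes events are added in non-decreasing time order.
        self.items.append((event_time, value))
        cutoff = event_time - self.retention
        while self.items and self.items[0][0] < cutoff:
            self.items.popleft()

    def count_between(self, start, end):
        return sum(1 for ts, _ in self.items if start <= ts <= end)


# Persist rule: history[double, 4weeks,100] @(sourceAccntNr.destAccntNr).Trxs := $amount
trxs = History(retention=timedelta(weeks=4), max_items=100)
now = datetime(2017, 4, 10, 12, 0)
trxs.add(now - timedelta(hours=3), 250.0)
trxs.add(now - timedelta(minutes=30), 1000.0)
trxs.add(now, 1000.0)
first_used = now - timedelta(hours=3)  # assumed to be kept in state by its own persist rule

# Feature "NrOf Trxs Last 1h"
nr_trxs_last_1h = trxs.count_between(now - timedelta(hours=1), now)
# Feature "First Trx A to B <24h"
first_trx_within_24h = first_used >= now - timedelta(hours=24)
print(nr_trxs_last_1h, first_trx_within_24h)  # 2 True
```

Bounding the history by both age and size keeps the state per key small, which matters when state is held for every account pair.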
Diagram: a data scientist with offline tooling creates the model (offline, on HDFS) and hands a portable model (<PMML /> or {PFA}) together with the DSL to model execution (online, on the streaming platform).
Creating models offline, scoring online
33
Diagram: control streams are broadcast to the Split and the FeX & model scoring operators
Streaming in the definitions
35
DSL files (broadcast):
• Model definitions
• Feature calculation rules
• Persist rules
Demo
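The idea of streaming definitions into a running job can be sketched in Python as below. This is a simplified, single-process illustration with hypothetical names (`ScoringOperator`, `run`, `high_amount`, `very_high_amount`); models are plain functions rather than PMML documents, and Flink specifics such as broadcasting to parallel operator instances are deliberately left out.

```python
class ScoringOperator:
    """Scores feature sets with whichever model definitions were last sent to it."""

    def __init__(self):
        self.models = {}  # model name -> callable(features) -> score

    def on_control(self, name, model):
        # A None model retracts the definition; anything else adds or replaces it.
        if model is None:
            self.models.pop(name, None)
        else:
            self.models[name] = model

    def on_data(self, features):
        return {name: model(features) for name, model in self.models.items()}


def run(operator, merged_stream):
    """Consume ('control', name, model) and ('data', features) messages in arrival order."""
    for message in merged_stream:
        if message[0] == "control":
            operator.on_control(message[1], message[2])
        else:
            yield operator.on_data(message[1])


def high_amount(features):
    return "Alert" if features["amount"] > 500 else "OK"


def very_high_amount(features):
    return "Alert" if features["amount"] > 5000 else "OK"


stream = [
    ("control", "amount_rule", high_amount),
    ("data", {"amount": 1000}),
    ("control", "amount_rule", very_high_amount),  # swapped while the job keeps running
    ("data", {"amount": 1000}),
]
print(list(run(ScoringOperator(), stream)))
# [{'amount_rule': 'Alert'}, {'amount_rule': 'OK'}]
```

The same event is scored differently before and after the control message, without restarting anything, which is the "make changes instantly (no downtime)" goal in miniature.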
Goals
37
Use of various types of models
One codebase, SaaS Deployment model
Make changes instantly (no downtime)
Multiple domains
We have built a feature-extraction engine and used that to make a Fraud-Risk Engine
Can we also build this?
• Customer notifications?
• Calculating RFQs for bond prices?
• Product fulfilment engine?
• Other?
Multiple domains – ponder on this
38
Takeaways
39
• Decoupled architecture with pre-processor
• Enriching events with multiple keys
• End users making changes
• Multiple domains

Contenu connexe

Tendances

Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data StreamsSujaAldrin
 
The Real Cost of Slow Time vs Downtime
The Real Cost of Slow Time vs DowntimeThe Real Cost of Slow Time vs Downtime
The Real Cost of Slow Time vs DowntimeRadware
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB
 
Oracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration ConceptsOracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration ConceptsFumiko Yamashita
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotXiang Fu
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
CISA DOMAIN 2 Governance & Management of IT
CISA DOMAIN 2 Governance & Management of ITCISA DOMAIN 2 Governance & Management of IT
CISA DOMAIN 2 Governance & Management of ITShivamSharma909
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Credit Risk Management ppt
Credit Risk Management pptCredit Risk Management ppt
Credit Risk Management pptSneha Salian
 
Information technology risks
Information technology risksInformation technology risks
Information technology riskssalman butt
 
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...Flink Forward
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Asynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsAsynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsJohan Andrén
 
Chapter 12 - Operational risk management
Chapter 12 - Operational risk managementChapter 12 - Operational risk management
Chapter 12 - Operational risk managementQuan Risk
 
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaTop 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaKai Wähner
 
Machine Learning in Banking Sector
Machine Learning in Banking SectorMachine Learning in Banking Sector
Machine Learning in Banking SectorKnoldus Inc.
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 

Tendances (20)

Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
 
The Real Cost of Slow Time vs Downtime
The Real Cost of Slow Time vs DowntimeThe Real Cost of Slow Time vs Downtime
The Real Cost of Slow Time vs Downtime
 
MongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema DesignMongoDB for Time Series Data: Schema Design
MongoDB for Time Series Data: Schema Design
 
Oracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration ConceptsOracle GoldenGate Demo and Data Integration Concepts
Oracle GoldenGate Demo and Data Integration Concepts
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
 
Apache Flink Hands On
Apache Flink Hands OnApache Flink Hands On
Apache Flink Hands On
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
CISA DOMAIN 2 Governance & Management of IT
CISA DOMAIN 2 Governance & Management of ITCISA DOMAIN 2 Governance & Management of IT
CISA DOMAIN 2 Governance & Management of IT
 
Fraud Analytics
Fraud AnalyticsFraud Analytics
Fraud Analytics
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Credit Risk Management ppt
Credit Risk Management pptCredit Risk Management ppt
Credit Risk Management ppt
 
Information technology risks
Information technology risksInformation technology risks
Information technology risks
 
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Asynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsAsynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka Streams
 
Chapter 12 - Operational risk management
Chapter 12 - Operational risk managementChapter 12 - Operational risk management
Chapter 12 - Operational risk management
 
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache KafkaTop 5 Event Streaming Use Cases for 2021 with Apache Kafka
Top 5 Event Streaming Use Cases for 2021 with Apache Kafka
 
Machine Learning in Banking Sector
Machine Learning in Banking SectorMachine Learning in Banking Sector
Machine Learning in Banking Sector
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 

Similaire à Flink Forward SF 2017: Erik de Nooij - StreamING models, how ING adds models at runtime to catch fraudsters

Visual basic 6.0
Visual basic 6.0Visual basic 6.0
Visual basic 6.0Aarti P
 
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling FrameworkEclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling FrameworkDave Steinberg
 
Machine learning on streams of data
Machine learning on streams of dataMachine learning on streams of data
Machine learning on streams of dataTomasz Sosiński
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaGoDataDriven
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Implementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoCImplementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoCjimfuller2009
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Spark Summit
 
WCF and WF in Framework 3.5
WCF and WF in Framework 3.5WCF and WF in Framework 3.5
WCF and WF in Framework 3.5ukdpe
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreDatabricks
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflowKeon Kim
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowDatabricks
 
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...Publicis Sapient Engineering
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkChester Chen
 
Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Matt Warren
 
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Soham Kulkarni
 
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest FiresAdvanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest FiresIvo Andreev
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringMohamed MEJDOUBI
 
Understanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & ConfluentUnderstanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & Confluentconfluent
 

Similaire à Flink Forward SF 2017: Erik de Nooij - StreamING models, how ING adds models at runtime to catch fraudsters (20)

Visual basic 6.0
Visual basic 6.0Visual basic 6.0
Visual basic 6.0
 
OneTeam Media Server
OneTeam Media ServerOneTeam Media Server
OneTeam Media Server
 
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling FrameworkEclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
EclipseCon 2008: Fundamentals of the Eclipse Modeling Framework
 
Machine learning on streams of data
Machine learning on streams of dataMachine learning on streams of data
Machine learning on streams of data
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Implementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoCImplementing the Genetic Algorithm in XSLT: PoC
Implementing the Genetic Algorithm in XSLT: PoC
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
 
WCF and WF in Framework 3.5
WCF and WF in Framework 3.5WCF and WF in Framework 3.5
WCF and WF in Framework 3.5
 
KFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature StoreKFServing, Model Monitoring with Apache Spark and a Feature Store
KFServing, Model Monitoring with Apache Spark and a Feature Store
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflow
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
XebiCon'17 : AxonFramework @ SGCIB (our experience) : (CQRS, Eventsourcing, A...
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11Performance is a Feature! at DDD 11
Performance is a Feature! at DDD 11
 
Compiler Construction for DLX Processor
Compiler Construction for DLX Processor Compiler Construction for DLX Processor
Compiler Construction for DLX Processor
 
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest FiresAdvanced Open IoT Platform for Prevention and Early Detection of Forest Fires
Advanced Open IoT Platform for Prevention and Early Detection of Forest Fires
 
Performance is a Feature!
Performance is a Feature!Performance is a Feature!
Performance is a Feature!
 
databricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineeringdatabricks ml flow demonstration using automatic features engineering
databricks ml flow demonstration using automatic features engineering
 
Understanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & ConfluentUnderstanding the TCO and ROI of Apache Kafka & Confluent
Understanding the TCO and ROI of Apache Kafka & Confluent
 

Plus de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Plus de Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest



Flink Forward SF 2017: Erik de Nooij - StreamING models, how ING adds models at runtime to catch fraudsters

  • 1. StreamING models: Realtime model deployment of ML capabilities. Erik de Nooij, IT Chapter Lead Fraud & Cybersecurity.
  • 2. Who am I? IT Chapter Lead within the Fraud & Cybersecurity department, based in Amsterdam. Before ING, implemented enterprise software, mainly knowledge management and CRM related. Background in: Scala, Java, C# (MCSD), Tomcat, WebSphere, Oracle, Cassandra and now Flink. https://www.linkedin.com/in/erik-de-nooij-93ab1a/ Erik.g.de.Nooij@ing.nl
  • 4. About ING. Worldwide: 35 million customers; 51,000 employees; presence in over 40 countries. Netherlands: 9 million customers; billion logins yearly on https://www.ing.nl; 1 million transactions per day. Figure labels: Market leaders Benelux, Growth markets, Commercial Banking, Challengers, The Netherlands.
  • 5. Threats related to fraud & cybersecurity (timeline: 2008, 2010, 2012, 2014, 2017). Threats: Fake ID, Skimming, Phishing, APT, ?. Criminal organization: Individuals, Small groups, Worldwide groups, Organized crime. Response: Manual detection, Rule based detection, Model based detection, Anomaly detection.
  • 6. Carbanak APT (Advanced Persistent Threat): this started via a phishing email…
  • 7. Goals: Support various types of (ML) models (tools to create models versus scoring models); One codebase, SaaS deployment model; Make changes instantly (no downtime); Multiple domains.
  • 8. Goals: Support various types of (ML) models; One codebase, SaaS deployment model (pre-processor, decoupled architecture); Make changes instantly (no downtime); Multiple domains.
  • 9. Goals: Support various types of (ML) models; One codebase, SaaS deployment model; Make changes instantly (no downtime): use case, feature extraction, enriching streams, end user tooling, demo; Multiple domains.
  • 10. Goals: Support various types of (ML) models; One codebase, SaaS deployment model; Make changes instantly (no downtime); Multiple domains (examples).
  • 12. Creating models offline, scoring online. Model creation: HDFS (offline). Model execution: streaming platform (online). Portable model: <PMML />, {PFA}.
  • 13. Predictive Model Markup Language (PMML): an XML-based predictive model interchange format. Example rule: <SimpleRule score="Alert" weight="1.0"> <CompoundPredicate booleanOperator="and"> <SimplePredicate field="field1" operator="greaterThan" value="500"/> <SimplePredicate field="field2" operator="equal" value="1"/> <SimplePredicate field="field3" operator="greaterThan" value="1"/> </CompoundPredicate> </SimpleRule> This reads as: if field1 > 500 AND field2 == 1 AND field3 > 1.
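To make the mapping from the XML rule to the `if` expression concrete, here is a minimal Python sketch that evaluates the `SimpleRule` from the slide against a dictionary of features. It is not the OpenScoring.io library used in the talk; the function name `rule_fires` and the feature dictionary are assumptions for illustration, and only the two comparison operators and the `and`/`or` combinators needed here are handled.

```python
import operator
import xml.etree.ElementTree as ET

# The rule exactly as shown on the slide.
RULE_XML = """
<SimpleRule score="Alert" weight="1.0">
  <CompoundPredicate booleanOperator="and">
    <SimplePredicate field="field1" operator="greaterThan" value="500"/>
    <SimplePredicate field="field2" operator="equal" value="1"/>
    <SimplePredicate field="field3" operator="greaterThan" value="1"/>
  </CompoundPredicate>
</SimpleRule>
"""

# Hypothetical, deliberately incomplete operator tables for this sketch.
COMPARATORS = {"greaterThan": operator.gt, "equal": operator.eq}
COMBINATORS = {"and": all, "or": any}


def rule_fires(rule_xml, features):
    """Return the rule's score if all predicates hold, otherwise None."""
    rule = ET.fromstring(rule_xml.strip())
    compound = rule.find("CompoundPredicate")
    outcomes = [
        COMPARATORS[p.get("operator")](
            float(features[p.get("field")]), float(p.get("value"))
        )
        for p in compound.findall("SimplePredicate")
    ]
    combine = COMBINATORS[compound.get("booleanOperator")]
    return rule.get("score") if combine(outcomes) else None


hit = rule_fires(RULE_XML, {"field1": 1000, "field2": 1, "field3": 2})
miss = rule_fires(RULE_XML, {"field1": 100, "field2": 1, "field3": 2})
```

With these inputs, `hit` is `"Alert"` and `miss` is `None`, because the second feature set fails the `field1 > 500` predicate.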
  • 14. Predictive Model Markup Language (PMML): an XML-based predictive model interchange format.
  • 16. Model scoring using the OpenScoring.io library: parse the PMML file(s); pass on the feature set to the model(s); run the ‘predict’ function, which returns the output of the model(s). Diagram labels: Control stream, Data stream, Feature sets, Model scoring, Score.
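The interplay of a control stream (models arriving at runtime) and a data stream (feature sets to score) can be sketched in plain Python as follows. This is only an illustration of the idea, not the talk's Flink job: the class `ModelScorer`, its method names, and the stand-in model `high_amount` (a plain function in place of a parsed PMML model) are all assumptions.

```python
class ModelScorer:
    """Keeps the currently deployed models and scores feature sets with them."""

    def __init__(self):
        # model name -> callable taking a feature dict and returning a score
        self.models = {}

    def on_control(self, name, model):
        """Control stream: add or replace a model; passing None removes it."""
        if model is None:
            self.models.pop(name, None)
        else:
            self.models[name] = model

    def on_data(self, features):
        """Data stream: run every deployed model on one feature set."""
        return {name: predict(features) for name, predict in self.models.items()}


def high_amount(features):
    # Hypothetical stand-in for the 'predict' function of a parsed PMML model.
    return "Alert" if features["amount"] > 500 else "OK"


scorer = ModelScorer()
before = scorer.on_data({"amount": 1000})      # no model deployed yet
scorer.on_control("high_amount", high_amount)  # model arrives at runtime
after = scorer.on_data({"amount": 1000})
```

Here `before` is an empty dictionary and `after` is `{"high_amount": "Alert"}`: the same event is scored differently once a model has arrived, without restarting anything. In Flink, a pattern like this is typically built with connected or broadcast streams.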
  • 17. Supported models (*): Association rules, Regression, Cluster model, Rule set, General regression, Scorecard, Naive Bayes, Support Vector Machine, k-Nearest neighbours, Tree model, Neural network, Ensemble model. (*) Models supported by http://openscoring.io/
  • 18. Goals: Use of various types of models; One codebase, SaaS deployment model (pre-processor, decoupled architecture); Make changes instantly (no downtime); Multiple domains.
  • 19. One Bank Strategy: Market leaders Benelux, Growth markets, Commercial Banking, Challengers.
  • 20. How flexible is this architecture? Feature extraction & model scoring receives the same amount in different shapes: Amount = “42,00”; Amountincents = 4200; Amount = 42.00.
  • 21. Decoupled architecture: business events (Amount = “42.00”; Amountincents = 4200; Amount = 42.00) pass through a pre-processor, which hands a single shape (Amountincents = 4200) to feature extraction & model scoring.
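A pre-processor of this kind can be sketched in a few lines of Python. The field names `Amount` and `Amountincents` come from the slides; the functions `amount_in_cents` and `preprocess` are hypothetical, and the sketch assumes a comma only ever appears as a decimal separator (a value such as "1.000,00" with a thousands separator is not handled).

```python
from decimal import Decimal


def amount_in_cents(raw):
    """Normalise "42,00", "42.00" or 42.00 to an integer number of cents."""
    text = str(raw).strip().replace(",", ".")
    return int((Decimal(text) * 100).to_integral_value())


def preprocess(event):
    """Return a copy of a business event in the one shape the engine expects."""
    canonical = {
        key: value
        for key, value in event.items()
        if key not in ("Amount", "Amountincents")
    }
    if "Amountincents" in event:
        canonical["Amountincents"] = int(event["Amountincents"])
    else:
        canonical["Amountincents"] = amount_in_cents(event["Amount"])
    return canonical


shapes = [{"Amount": "42,00"}, {"Amountincents": 4200}, {"Amount": 42.00}]
normalised = [preprocess(event) for event in shapes]
```

All three input shapes end up as `{"Amountincents": 4200}`, so feature extraction and model scoring never need to know which source system produced the event.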
  • 22. Goals: Use of various types of models; One codebase, SaaS deployment model; Make changes instantly (no downtime): use case, feature extraction, enriching streams, end user tooling, demo; Multiple domains.
  • 23. Use case: Your phone with the banking app installed is stolen. The limit on the banking app is 1,000. Funds are transferred from your account (A) to a mule account (B).
  • 24. Model features and model output. Features: Amount > 500; NrOf Trxs Last 1h; First Trx <24h ago. Model output: Alert || OK.
  • 25. Stream with stateless operators: event Ev.1 (A to B, 1000) goes through feature extraction (FeX) and model scoring (PMML). Features (Amount, Unknown, PrevTrxs) = (1000, ?, ?): without state, only the amount can be filled in.
  • 26. Stream with stateful operators: events Ev.1 and Ev.2 (both A to B, 1000) go through feature extraction (FeX) backed by keyed STATE, then model scoring (PMML), each yielding Alert || OK. State after Ev.1: (A,B, FirstTrx) = Ev.1; (A,B, HistoricalTrxs) = ev1 1000. State after Ev.2: (A,B, FirstTrx) = Ev.1; (A,B, HistoricalTrxs) = ev1 1000, ev2 1000. Feature tuples (Amount, Unknown, PrevTrxs) shown: (1000, true, 1) and (1000, true, 0).
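The role of keyed state in feature extraction can be sketched as below. This is plain Python rather than Flink, and it rests on one reading of the slide that is an assumption: the second feature is taken to be "first transaction from A to B was less than 24 hours ago" and the third to be "number of earlier A to B transactions in the last hour". The class `FeatureExtractor` and the use of integer seconds as event time are hypothetical.

```python
from collections import defaultdict

HOUR = 3600
DAY = 24 * HOUR


class FeatureExtractor:
    """Per-key state: first transaction time and transaction history."""

    def __init__(self):
        self.first_trx = {}               # (src, dst) -> time of first transaction
        self.history = defaultdict(list)  # (src, dst) -> [(time, amount), ...]

    def extract(self, src, dst, amount, event_time):
        key = (src, dst)
        # Count earlier transactions first, so the current event is excluded.
        prev_last_hour = sum(
            1 for ts, _ in self.history[key]
            if event_time - HOUR <= ts <= event_time
        )
        # Remember the first transaction for this key, then test its age.
        first = self.first_trx.setdefault(key, event_time)
        first_within_24h = first >= event_time - DAY
        # Only now persist the current event for later lookups.
        self.history[key].append((event_time, amount))
        return (amount, first_within_24h, prev_last_hour)


fex = FeatureExtractor()
ev1 = fex.extract("A", "B", 1000, event_time=0)
ev2 = fex.extract("A", "B", 1000, event_time=600)
```

Under these assumptions `ev1` is `(1000, True, 0)` and `ev2` is `(1000, True, 1)`: the second transfer to the same new beneficiary within ten minutes sees one earlier transaction in state, which a stateless operator could never report.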
  • 27. How to perform aggregate functions on a stream? Examples: average amount last week: € 37.04; max amount last month: € 834.12.
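One common way to answer such questions on an unbounded stream is to keep a time-bounded history per key and aggregate over it on demand. The sketch below assumes events arrive in event-time order and treats a month as 30 days; the class `AmountHistory` and its methods are hypothetical, not the engine from the talk.

```python
from collections import deque

DAY = 24 * 3600
WEEK = 7 * DAY
MONTH = 30 * DAY  # assumption: a month is approximated as 30 days


class AmountHistory:
    """Time-bounded list of (event_time, amount) for one key."""

    def __init__(self, retention=MONTH):
        self.retention = retention
        self.entries = deque()

    def add(self, event_time, amount):
        self.entries.append((event_time, amount))
        # Evict entries older than the retention period to bound the state.
        while self.entries and self.entries[0][0] < event_time - self.retention:
            self.entries.popleft()

    def _window(self, now, span):
        return [amount for ts, amount in self.entries if now - span <= ts <= now]

    def average(self, now, span):
        values = self._window(now, span)
        return sum(values) / len(values) if values else None

    def maximum(self, now, span):
        values = self._window(now, span)
        return max(values) if values else None


history = AmountHistory()
history.add(0, 834.12)
history.add(25 * DAY, 30.0)
history.add(26 * DAY, 50.0)
avg_last_week = history.average(now=26 * DAY, span=WEEK)
max_last_month = history.maximum(now=26 * DAY, span=MONTH)
```

In this run `avg_last_week` is `40.0` (only the two recent amounts fall in the window) and `max_last_month` is `834.12`. The retention period is what keeps the state from growing without bound.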
  • 28. Enriching the stream based on multiple keys: event Ev.1 (A, B, IP, 1000) is split by key (A, B, A.B, IP); an aggregation step calculates features per key (A’, B’, A.B’, IP’). Accounts are distributed across the task managers (A,E,I .. / B,D,F .. / C,G,H .. / J,K ..), as are IP addresses (192.x.x.1, 192.x.x.5 / 192.x.x.2, 192.x.x.6 / 192.x.x.3, 192.x.x.7 / 192.x.x.4, …).
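The split step can be illustrated with a small Python sketch: one event is fanned out into one copy per key, and each copy is routed to the partition that owns that key's state. The event field names, the example IP address, and the CRC32-based `partition_for` are assumptions made for this illustration; Flink's own key partitioning works differently in detail.

```python
import zlib


def split_by_keys(event):
    """Fan one event out into (key, event) pairs, one per key of interest."""
    a, b, ip = event["source"], event["dest"], event["ip"]
    return [(key, event) for key in (a, b, ip, f"{a}.{b}")]


def partition_for(key, parallelism):
    """Deterministically map a key to one of `parallelism` partitions."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


event = {"id": "Ev.1", "source": "A", "dest": "B", "ip": "192.0.2.4", "amount": 1000}
sub_events = split_by_keys(event)
routing = {key: partition_for(key, parallelism=4) for key, _ in sub_events}
```

Every copy of the event lands where the state for its key lives, so the per-key aggregates (A’, B’, IP’, A.B’) can be computed locally and joined back onto the event afterwards.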
  • 29. Aggregating and model scoring: event Ev.1 (A, B, IP, 1000) flows through Aggregation and then Model Scoring. Diagram labels: (A.B’, 1000); features 1. Amount, 2. (A.B).FirstTrx, 3. (A.B).NrTrxs; A.B’, B’, (B’), IP’, ….
  • 30. Domain Specific Language (DSL): A DSL is a domain-specific language. We use it to define the behaviour of our operators: the persist rules (which data to store within state); feature calculation rules; model definition rules.
  • 31. Definition instead of code, persist rule: history[double, 4weeks,100] @(sourceAccntNr.destAccntNr).Trxs := $amount
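The slide does not spell out the semantics of `history[double, 4weeks,100]`. One plausible reading, which is purely an assumption here, is "keep values of type double, at most 4 weeks old, at most 100 entries". Under that reading, a definition like this could be turned into bounded state roughly as follows; the regular expression, `parse_history_spec` and `BoundedHistory` are hypothetical and not the DSL implementation from the talk.

```python
import re
from collections import deque

UNIT_SECONDS = {"hour": 3600, "day": 86400, "week": 604800}

# Assumed shape: history[<type>, <n><unit>s, <max entries>]
SPEC = re.compile(
    r"history\[\s*(\w+)\s*,\s*(\d+)\s*(hour|day|week)s?\s*,\s*(\d+)\s*\]"
)


def parse_history_spec(spec):
    """Return (value_type, max_age_seconds, max_entries) for a history spec."""
    match = SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised history spec: {spec!r}")
    value_type, count, unit, max_entries = match.groups()
    return value_type, int(count) * UNIT_SECONDS[unit], int(max_entries)


class BoundedHistory:
    """History bounded both by entry age and by number of entries."""

    def __init__(self, max_age, max_entries):
        self.max_age = max_age
        self.entries = deque(maxlen=max_entries)  # oldest entries drop off first

    def persist(self, event_time, value):
        self.entries.append((event_time, float(value)))
        while self.entries and self.entries[0][0] < event_time - self.max_age:
            self.entries.popleft()


value_type, max_age, max_entries = parse_history_spec("history[double, 4weeks,100]")
trxs = BoundedHistory(max_age, max_entries)
for second in range(150):
    trxs.persist(second, 1000)
```

After the loop, `trxs.entries` holds only the 100 most recent amounts. The point of the definition-style rule is that changing `4weeks` or `100` changes the state bounds without touching or redeploying code.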
  • 32. Feature calculation rules. NrOf Trxs Last 1h: count(between @(sourceAccntNr.destAccntNr).Trxs, $eventtime,$eventtime-1hour)); First Trx A to B <24h: @(sourceAccntNr.destAccntNr).FirstUsed >= $eventtime-24hours;
  • 33. Creating models offline, scoring online. Model creation: HDFS (offline), by a data scientist with offline tooling. Model execution: streaming platform (online). Portable model: <PMML />, {PFA}, DSL.
  • 35. Streaming in the definitions: DSL files (model definitions, feature calculation rules, persist rules) are broadcast into the pipeline (Split; FeX & model scoring).
  • 36. Demo
  • 37. Goals: Use of various types of models; One codebase, SaaS deployment model; Make changes instantly (no downtime); Multiple domains.
  • 38. Multiple domains, ponder on this: We have built a feature-extraction engine and used that to make a Fraud-Risk Engine. Can we also build this? Customer notifications? Calculating RFQs for bond prices? Product fulfilment engine? Other?
  • 39. Takeaways: Decoupled architecture with pre-processor; Enriching events with multiple keys; End users making changes; Multiple domains.