Storm at Forter
Picture: https://www.flickr.com/photos/silentmind8/15865860242 by silentmind8, under CC BY 2.0 (http://creativecommons.org/licenses/by/2.0/)
Forter
• We detect fraud
• A lot of data is collected
• New data can introduce new data sources
• At transaction time, we do our magic. Fast.
• We deny less
What’s Storm?
• Streaming/data-pipeline infrastructure
• What’s a pipeline?
• “Topology” driven flow, static
• Runs on the JVM, and also supports Python and
Node.js
• Easy clustering
• Apache top level project, large community
Storm Lingo
• Tuples
• The basic data transfer object in Storm. Basically a dictionary (key->val).
• Spouts
• Entry points into the pipe. This is where data comes from.
• Bolts
• Components that can transform and route tuples
• Joins
• Joins are where async branches of the topology meet and join
• Streams
• Streams allow for flow control in the topology
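The vocabulary above can be illustrated with a minimal plain-Python sketch. It does not use Storm's API; the function names, field names and the two stream names are hypothetical and only show how the pieces relate.

```python
# Plain-Python illustration of the lingo; none of this is Storm's API.
from typing import Dict, Iterable, Iterator, List

StormTuple = Dict[str, object]  # a tuple is basically a key -> value mapping


def transaction_spout() -> Iterator[StormTuple]:
    """Spout: the entry point, where data comes from (hard-coded here)."""
    yield {"tx_id": 1, "amount": 20.0}
    yield {"tx_id": 2, "amount": 950.0}


def enrich_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Bolt that transforms: adds a derived field to each tuple."""
    for tup in stream:
        yield {**tup, "large": tup["amount"] > 500}


def route_bolt(stream: Iterable[StormTuple]) -> Dict[str, List[StormTuple]]:
    """Bolt that routes: emits each tuple on one of two named streams."""
    streams: Dict[str, List[StormTuple]] = {"review": [], "default": []}
    for tup in stream:
        streams["review" if tup["large"] else "default"].append(tup)
    return streams


streams = route_bolt(enrich_bolt(transaction_spout()))
```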
System challenges
• Latency should be determined by business needs -
flexible per customer (from 300ms up to customers who
just don’t care)
• Data dependencies in decision part can get very complex
• Getting data can be slow, especially 3rd party
• Data scientists write in Python
• Should be scalable, because we’re ever growing
• Should be very granularly monitored
Bird’s eye view
• Two systems:
• System 1: data prefetching & preparing
• System 2: decision engine, must have all
available data handy at TX time
System 1: high
throughput pipeline
• Stream Batching
• Prefetching / Preparing
• Common use case, lots of competitors
System 2: low latency
decision
• Dedicated everything
• Complex dependency graph
• Less common, fewer players
System 1
High Throughput
Cache and cache layering
• Storm constructs make it easy to tweak caches,
add enrichment steps transparently
• Different enrichment operations may require
different execution power
• Each operation can be replaced by a sub-topology
- layering of cache levels
• Field grouping allows the ability to maintain state in
components - local cache or otherwise
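Why field grouping makes a local cache workable can be sketched as follows. This is a plain-Python approximation, not Storm code: the hash function, the task class and the lookup are assumptions made for illustration. The point is that the same key always reaches the same task, so that task's in-process cache sees every repeat of the key.

```python
import zlib


class EnrichTask:
    """One bolt task holding its own local cache (hypothetical class)."""

    def __init__(self, lookup):
        self.lookup = lookup  # stand-in for a slow enrichment call
        self.cache = {}
        self.misses = 0

    def process(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.lookup(key)
        return self.cache[key]


def fields_grouping(key: str, num_tasks: int) -> int:
    """Pick a task from a stable hash of the grouping field."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def slow_lookup(email: str) -> dict:
    return {"email": email, "domain": email.split("@")[1]}


tasks = [EnrichTask(slow_lookup) for _ in range(4)]
events = ["a@x.com", "b@y.com", "a@x.com", "a@x.com", "b@y.com"]
results = [tasks[fields_grouping(e, len(tasks))].process(e) for e in events]
total_misses = sum(t.misses for t in tasks)  # one miss per distinct key
```

With random (shuffle-style) routing instead, each task could miss on the same key separately, which is what layering a shared cache behind the local one would compensate for.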
Maintain a stored state
• Many events coming in, some cause a state to
change
• State of a working set is saved in memory
• New/old states are fetched from an external data
source
• State updates are saved immediately
• State machine is scalable - again, field grouping
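A minimal sketch of the stored-state pattern described above, assuming a hypothetical order lifecycle and a plain dict standing in for the external data source. States enter the working set on first use, only some events change state, and every change is written through immediately.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("new", "paid"): "paid",
    ("paid", "shipped"): "shipped",
    ("paid", "refunded"): "refunded",
}


class StateBolt:
    def __init__(self, store):
        self.store = store      # stand-in for the external data source
        self.working_set = {}   # states currently held in memory

    def on_event(self, entity_id, event):
        # Fetch the state on first sight; unknown entities start as "new".
        if entity_id not in self.working_set:
            self.working_set[entity_id] = self.store.get(entity_id, "new")
        current = self.working_set[entity_id]
        nxt = TRANSITIONS.get((current, event))
        if nxt is None:
            return current      # this event does not change the state
        self.working_set[entity_id] = nxt
        self.store[entity_id] = nxt  # write through immediately
        return nxt


store = {"o2": "paid"}
bolt = StateBolt(store)
s1 = bolt.on_event("o1", "paid")
s2 = bolt.on_event("o1", "viewed")   # ignored, state stays "paid"
s3 = bolt.on_event("o2", "shipped")
```

Routing events by entity id with field grouping keeps each entity's state on a single task, which is what lets this scale out.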
And the rest…
• Batching content for writing (Storm’s tick tuples)
• Aggregating events in memory
• Throttling/Circuit-breaking external calls
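Batching writes on a timer can be sketched like this. The TICK marker is a stand-in for Storm's tick tuple, and the class and its parameters are hypothetical; the sketch only shows the flush-on-tick-or-size logic.

```python
TICK = object()  # stand-in for a tick tuple delivered at a fixed interval


class BatchingWriter:
    def __init__(self, write_batch, max_size=100):
        self.write_batch = write_batch  # callable that performs one bulk write
        self.max_size = max_size
        self.buffer = []

    def execute(self, tup):
        if tup is TICK:
            self.flush()                # time-based flush
            return
        self.buffer.append(tup)
        if len(self.buffer) >= self.max_size:
            self.flush()                # size-based flush

    def flush(self):
        if self.buffer:
            self.write_batch(list(self.buffer))
            self.buffer.clear()


written = []
writer = BatchingWriter(written.append, max_size=3)
for tup in [1, 2, TICK, 3, 4, 5, 6, TICK, TICK]:
    writer.execute(tup)
```

The tick bounds how long a tuple can sit in the buffer, while the size cap bounds memory between ticks.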
System 2: Low Latency
Unique Challenges
• Scaling. Resources need to be very dedicated,
parallelizing is bad
• Join logic is much stricter, with short timeouts
• Data validity is crucial for the stream routing
• Error handling
• Component graph is immense and hard to contain
mentally - especially considering the delicate time
window configurations.
Scalability
• Each topology is built to handle a fixed number of
parallel TXs. Storm’s max-spout-pending
• Each topology atomically polls a queue
• Trying to keep as much of the logic in the same
process to reduce network and serialization costs
• Latency is the only measure
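The effect of a max-spout-pending limit can be sketched in plain Python. The class below is hypothetical and not Storm's spout interface; it only shows that the spout stops polling the queue once a fixed number of transactions are in flight, and resumes when one is acked.

```python
from collections import deque


class BoundedSpout:
    def __init__(self, queue, max_pending):
        self.queue = queue
        self.max_pending = max_pending
        self.pending = set()  # transactions emitted but not yet acked

    def next_tuple(self):
        # Do not poll while the topology is at capacity.
        if len(self.pending) >= self.max_pending or not self.queue:
            return None
        tx = self.queue.popleft()
        self.pending.add(tx)
        return tx

    def ack(self, tx):
        self.pending.discard(tx)  # frees one slot


queue = deque(["tx1", "tx2", "tx3"])
spout = BoundedSpout(queue, max_pending=2)
first, second, third = spout.next_tuple(), spout.next_tuple(), spout.next_tuple()
spout.ack(first)
fourth = spout.next_tuple()
```

Because a transaction is only taken off the queue when there is a free slot, work that cannot start immediately stays in the queue for another topology to pick up.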
Joining and errors
• Waiting is not an option
• Tick tuples no good, break the single
thread illusion
• Static topologies are easy to analyze, edit and
intervene in at runtime
• Fallback streams are an elegant solution
to the problem, sparing developers
from explicitly defining escape routes
• Also allow for “try->finally” semantics
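A rough sketch of the fallback-stream idea, in plain Python rather than Storm code. The wrapper, the stream names and the example bolts are all assumptions. A failing branch still emits something, on a separate stream, so the downstream join always fires instead of waiting.

```python
def emit_with_fallback(bolt, tup, fallback):
    """Run a bolt; on any error emit the fallback payload on the 'fallback' stream."""
    try:
        return "default", bolt(tup)
    except Exception:
        return "fallback", dict(fallback)


def join(branches):
    """Merge branch outputs and report which branches arrived degraded."""
    merged, degraded = {}, []
    for name, (stream, payload) in branches.items():
        merged.update(payload)
        if stream == "fallback":
            degraded.append(name)
    return merged, degraded


def ip_lookup(tup):
    return {"ip_country": "US"}


def third_party_score(tup):
    raise TimeoutError("provider too slow")  # simulated slow 3rd party


tx = {"tx_id": 7}
branches = {
    "ip": emit_with_fallback(ip_lookup, tx, {"ip_country": None}),
    "score": emit_with_fallback(third_party_score, tx, {"score": None}),
}
merged, degraded = join(branches)
```

The bolt author writes no escape route; the wrapper supplies it, which is where the try->finally flavour comes from.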
Multilang
• Storm allows running bolt processes (shell-bolt)
with the builtin capability of communicating through
standard i/o
• Not hugely scalable, but works
• Implemented are: Node.js (our contribution) and
Python
• We use it for legacy code and to keep data scientists
happy
Data Validity
• Wrapping the bolts, we implemented contracts for
outputs
• Java POJOs with Hibernate Validator
• Contracts allow us to “hard-type” the links in the
topologies
• Also help minimize data flow, especially to shell-bolts
• Check out storm-data-contracts on GitHub
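The contract idea can be approximated in Python with a validated dataclass. This is an analogy only: it is not the storm-data-contracts API and not Hibernate Validator, and the contract class, its fields and the wrapper are hypothetical. The wrapper validates a bolt's output and drops any field the contract does not declare.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class ScoreOutput:
    """Hypothetical output contract for a scoring bolt."""
    tx_id: int
    score: float

    def __post_init__(self):
        if not isinstance(self.tx_id, int) or self.tx_id <= 0:
            raise ValueError("tx_id must be a positive integer")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be within [0, 1]")


def with_contract(bolt, contract):
    """Wrap a bolt so its output is validated and trimmed to the contract."""
    allowed = {f.name for f in fields(contract)}

    def wrapped(tup):
        raw = bolt(tup)
        trimmed = {k: v for k, v in raw.items() if k in allowed}
        return asdict(contract(**trimmed))  # raises if the contract is violated

    return wrapped


def good_bolt(tup):
    return {"tx_id": tup["tx_id"], "score": 0.42, "debug_blob": "x" * 1000}


def bad_bolt(tup):
    return {"tx_id": tup["tx_id"], "score": 1.7}


checked = with_contract(good_bolt, ScoreOutput)({"tx_id": 7})
```

Trimming to the declared fields is what keeps large, unneeded payloads from being serialized to the next component.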
Managing Complexity
• Complexity of the data dependencies is maintained
by literally drawing it.
• Nimbus REST APIs offer access to the topology
layout
• Timing complexity reduced by synchronizing the
joins to a shared point-in-time. Still pretty complex.
• Proves better than our previous iterative solution
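Drawing the dependency graph can be automated once the layout is available as data. The sketch below assumes the layout has already been fetched and reduced to (source, target, stream) triples; the component names are made up. It renders Graphviz DOT text that a tool such as `dot` can turn into a picture.

```python
def to_dot(edges):
    """Render (source, target, stream) triples as a Graphviz digraph."""
    lines = ["digraph topology {"]
    for src, dst, stream in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{stream}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical layout, not taken from a real topology.
edges = [
    ("tx-spout", "ip-lookup", "default"),
    ("tx-spout", "score", "default"),
    ("ip-lookup", "join", "default"),
    ("score", "join", "fallback"),
]
dot = to_dot(edges)
```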
Monitoring
• Nimbus metrics give out averages - not
good enough
• Riemann used to efficiently monitor
latencies for every tuple in the system
• Inherent low latency monitoring issue:
CPU utilization monitoring
• More at Itai Frenkel’s lecture
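Why averages are not good enough for latency monitoring can be shown with a few lines of arithmetic. The latency values are invented for illustration, and the percentile uses the simple nearest-rank definition.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer math
    return ordered[rank - 1]


# Invented sample: 98 fast transactions and two slow ones, in milliseconds.
latencies_ms = [20] * 98 + [900, 1500]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

Here the mean stays under 50 ms while the 99th percentile is 900 ms, so a per-customer latency budget would be blown for some transactions without the average ever showing it.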
Questions?
Contact info:
Re’em Bensimhon
reem@forter.com / reem.bs@gmail.com
linkedin.com/in/bensimhon
twitter: @reembs

Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunication
 
Machine Learning 5G Federated Learning.pdf
Machine Learning 5G Federated Learning.pdfMachine Learning 5G Federated Learning.pdf
Machine Learning 5G Federated Learning.pdf
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 

Storm at Forter

  • 1. Storm at Forter • Picture: https://www.flickr.com/photos/silentmind8/15865860242 by silentmind8 under CC BY 2.0 http://creativecommons.org/licenses/by/2.0/
  • 2. Forter • We detect fraud • A lot of data is collected • New data can introduce new data sources • At transaction time, we do our magic. Fast. • We deny less
  • 3. What’s Storm? • Streaming/data-pipeline infrastructure • What’s a pipeline? • “Topology”-driven flow, static • Runs on the JVM and also supports Python and Node.js • Easy clustering • Apache top-level project, large community
  • 4. Storm Lingo • Tuples • The basic data transfer object in Storm. Basically a dictionary (key->val). • Spouts • Entry points into the pipe. This is where data comes from. • Bolts • Components that can transform and route tuples • Joins • Points where async branches of the topology meet and merge • Streams • Streams allow for flow control in the topology
  • 5. System challenges • Latency should be determined by business needs - flexible per customer (from 300 ms to customers who just don’t care) • Data dependencies in the decision part can get very complex • Getting data can be slow, especially from 3rd parties • Data scientists write in Python • Should be scalable, because we’re ever growing • Should be very granularly monitored
  • 6. Bird’s eye view • Two systems: • System 1: data prefetching & preparing • System 2: decision engine, must have all available data handy at TX time
  • 7. System 1: high throughput pipeline • Stream Batching • Prefetching / Preparing • Common use case, lots of competitors
  • 8. System 2: low latency decision • Dedicated everything • Complex dependency graph • Less common, fewer players
  • 10. Cache and cache layering • Storm constructs make it easy to tweak caches and add enrichment steps transparently • Different enrichment operations may require different execution power • Each operation can be replaced by a sub-topology - layering of cache levels • Field grouping makes it possible to maintain state in components - local cache or otherwise
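The field-grouping point can be illustrated with a minimal Python sketch. It is not Storm's own hashing or API: `task_for`, `EnrichBolt` and `slow_lookup` are hypothetical names, and the CRC32 modulo scheme is only an assumption standing in for whatever partitioning the cluster uses. The idea shown is that tuples sharing a key always reach the same task, so a plain in-process dictionary is enough as a local cache.

```python
import zlib


def task_for(key: str, num_tasks: int) -> int:
    """Map a grouping-field value to a task index with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


class EnrichBolt:
    """Hypothetical enrichment step that keeps a per-task local cache."""

    def __init__(self, lookup):
        self.lookup = lookup  # slow enrichment call, e.g. a 3rd-party API
        self.cache = {}
        self.misses = 0

    def process(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.lookup(key)
        return self.cache[key]


def slow_lookup(key):
    # Stand-in for an expensive enrichment operation.
    return {"key": key, "score": len(key)}


tasks = [EnrichBolt(slow_lookup) for _ in range(4)]
events = ["alice@example.com", "bob@example.com",
          "alice@example.com", "alice@example.com"]
results = [tasks[task_for(e, len(tasks))].process(e) for e in events]
total_misses = sum(t.misses for t in tasks)  # one miss per distinct key
```

Because repeated keys land on the same task, only the first occurrence of each key pays for the slow lookup.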
  • 11.
  • 12. Maintain a stored state • Many events coming in, some cause a state to change • State of a working set is saved in memory • New/old states are fetched from an external data source • State updates are saved immediately • State machine is scalable - again, field grouping
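A rough sketch of the stored-state pattern on this slide, under stated assumptions: the external data source is modelled as a plain dict, and the `TRANSITIONS` table, state names and `StateBolt` class are invented for illustration. It shows the working set held in memory, unknown keys fetched from the store, and changes written through immediately.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("new", "checkout"): "pending",
    ("pending", "approve"): "approved",
    ("pending", "decline"): "declined",
}


class StateBolt:
    """Keeps a working set in memory, backed by an external store."""

    def __init__(self, store):
        self.store = store      # stand-in for the external data source
        self.working_set = {}   # in-memory state for keys this task has seen

    def on_event(self, key, event):
        if key not in self.working_set:
            # New or evicted key: fetch the last saved state, default to "new".
            self.working_set[key] = self.store.get(key, "new")
        current = self.working_set[key]
        nxt = TRANSITIONS.get((current, event), current)
        if nxt != current:
            self.working_set[key] = nxt
            self.store[key] = nxt  # save the update immediately (write-through)
        return nxt


store = {}
bolt = StateBolt(store)
first = bolt.on_event("tx1", "checkout")
ignored = bolt.on_event("tx1", "page_view")  # event that changes nothing
second = bolt.on_event("tx1", "approve")
```

With field grouping on the key, each task would own a disjoint slice of the working set, which is what lets the state machine scale out.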
  • 13.
  • 14. And the rest… • Batching content for writing (Storm’s tick tuples) • Aggregating events in memory • Throttling/Circuit-breaking external calls
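Batching writes with tick tuples can be sketched as follows. This is a plain-Python simulation rather than Storm code: `BatchWriterBolt`, `write_batch` and `max_size` are hypothetical, and the only Storm detail relied on is that tick tuples arrive from the `__system` component on the `__tick` stream.

```python
TICK = ("__system", "__tick")  # source component and stream of Storm tick tuples


class BatchWriterBolt:
    """Buffers tuples and flushes on a tick or when the buffer is full."""

    def __init__(self, write_batch, max_size=100):
        self.write_batch = write_batch  # hypothetical sink, e.g. a bulk DB insert
        self.max_size = max_size
        self.buffer = []

    def execute(self, source, stream, values):
        if (source, stream) == TICK:
            self.flush()  # time-based flush
            return
        self.buffer.append(values)
        if len(self.buffer) >= self.max_size:
            self.flush()  # size-based flush

    def flush(self):
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []


written = []
bolt = BatchWriterBolt(written.append, max_size=3)
bolt.execute("events", "default", {"id": 1})
bolt.execute("events", "default", {"id": 2})
bolt.execute("__system", "__tick", None)      # flushes the two buffered tuples
for i in (3, 4, 5):
    bolt.execute("events", "default", {"id": i})  # third append hits max_size
bolt.execute("__system", "__tick", None)      # nothing buffered, no write
```

The tick bounds how long a tuple can sit in the buffer, while the size limit bounds memory under bursts.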
  • 15. System 2: Low Latency
  • 16. Unique Challenges • Scaling. Resources need to be very dedicated, parallelizing is bad • Join logic is much stricter, with short timeouts • Data validity is crucial for the stream routing • Error handling • Component graph is immense and hard to contain mentally - especially considering the delicate time window configurations.
  • 17. Scalability • Each topology is built to handle a fixed number of parallel TXs (Storm’s max-spout-pending) • Each topology atomically polls a queue • We try to keep as much of the logic as possible in the same process to reduce network and serialization costs • Latency is the only measure
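The effect of capping in-flight transactions can be sketched in a few lines of Python. In Storm the framework enforces max-spout-pending by not asking the spout for more tuples; here, purely for illustration, the check is folded into a hypothetical `BoundedSpout` that polls a local `deque` standing in for the shared queue.

```python
from collections import deque


class BoundedSpout:
    """Emits from a queue only while fewer than max_pending tuples are in flight."""

    def __init__(self, queue, max_pending):
        self.queue = queue
        self.max_pending = max_pending
        self.pending = {}   # message id -> transaction still being processed
        self._next_id = 0

    def next_tuple(self):
        if len(self.pending) >= self.max_pending or not self.queue:
            return None     # at capacity or nothing to do: leave the queue alone
        tx = self.queue.popleft()
        msg_id = self._next_id
        self._next_id += 1
        self.pending[msg_id] = tx
        return msg_id, tx

    def ack(self, msg_id):
        # A finished transaction frees one slot.
        self.pending.pop(msg_id, None)


spout = BoundedSpout(deque(["tx1", "tx2", "tx3"]), max_pending=2)
a = spout.next_tuple()
b = spout.next_tuple()
blocked = spout.next_tuple()  # cap reached, tx3 stays on the queue
spout.ack(a[0])
c = spout.next_tuple()
```

Transactions left on the queue stay available to other topologies polling it, so a busy topology does not hoard work it cannot start immediately.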
  • 18. Joining and errors • Waiting is not an option • Tick tuples are no good: they break the single-thread illusion • Static topologies are easy to analyze, edit at runtime, and intervene in • Fallback streams are an elegant solution to the problem, sparing developers from explicitly defining escape routes • Also allow for “try->finally” semantics
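One way to picture fallback streams is the sketch below. It is an interpretation of the slide, not Forter's implementation: `run_with_fallback`, `join` and the branch names are all hypothetical. A failing branch still emits something, on a `fallback` stream, so the downstream join always completes and can tell which inputs are degraded.

```python
def run_with_fallback(bolt_fn, values, default):
    """Run a bolt function; on any error, emit a default on the fallback stream."""
    try:
        return ("default", bolt_fn(values))
    except Exception:
        return ("fallback", default)


def join(branch_outputs):
    """Merge branch payloads and record which branches arrived via fallback."""
    merged, degraded = {}, []
    for name, (stream, payload) in branch_outputs.items():
        merged.update(payload)
        if stream == "fallback":
            degraded.append(name)
    return merged, sorted(degraded)


def geo_lookup(tx):
    raise TimeoutError("3rd-party geo service too slow")  # simulated failure


def velocity(tx):
    return {"tx_last_hour": 3}


tx = {"id": "tx1"}
outputs = {
    "geo": run_with_fallback(geo_lookup, tx, {"country": None}),
    "velocity": run_with_fallback(velocity, tx, {"tx_last_hour": None}),
}
merged, degraded = join(outputs)
```

The join plays the role of the `finally` block: it runs whether or not each branch succeeded.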
  • 19. Multilang • Storm allows running bolt processes (shell-bolt) with the built-in capability of communicating through standard I/O • Not hugely scalable, but works • Implemented are: Node.js (our contribution) and Python • We use it for legacy code and to keep data scientists happy
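The standard I/O exchange behind shell-bolts can be sketched as follows, as far as the multilang protocol is understood here: each message is a JSON document followed by a line containing only `end`. The helper names `read_message`, `send_message` and `handle` are hypothetical, the initial handshake is omitted, and in-memory streams stand in for stdin and stdout.

```python
import io
import json


def read_message(stream):
    """Read lines until an 'end' line, then parse the accumulated JSON."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    return None  # stream closed before a full message arrived


def send_message(stream, msg):
    """Write one JSON message followed by the 'end' terminator."""
    stream.write(json.dumps(msg) + "\nend\n")
    stream.flush()


def handle(tup, out):
    """Toy bolt logic: upper-case the first field, emit it anchored, then ack."""
    word = tup["tuple"][0]
    send_message(out, {"command": "emit", "anchors": [tup["id"]],
                       "tuple": [word.upper()]})
    send_message(out, {"command": "ack", "id": tup["id"]})


incoming = io.StringIO(
    '{"id": "42", "comp": "spout", "stream": "default", '
    '"task": 1, "tuple": ["hello"]}\nend\n'
)
outgoing = io.StringIO()
handle(read_message(incoming), outgoing)
outgoing.seek(0)
emitted = read_message(outgoing)
acked = read_message(outgoing)
```

Every tuple crosses a process boundary as serialized JSON in both directions, which is one plausible reason for the "not hugely scalable" remark.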
  • 20. Data Validity • Wrapping the bolts, we implemented contracts for outputs • Java POJOs with Hibernate Validator • Contracts allow us to “hard-type” the links in the topologies • Also help minimize data flow, especially to shell-bolts • Check out storm-data-contracts on GitHub
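The deck's contracts are Java POJOs checked with Hibernate Validator. The Python sketch below is only an analogue of the idea, not the storm-data-contracts API: `GeoOutput`, its fields and `emit_checked` are hypothetical. It shows an output being validated before it leaves a bolt, and undeclared fields being dropped so less data flows downstream.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GeoOutput:
    """Hypothetical output contract for a geo-enrichment bolt."""
    country: str
    risk: float

    def __post_init__(self):
        if len(self.country) != 2:
            raise ValueError("country must be a 2-letter code")
        if not 0.0 <= self.risk <= 1.0:
            raise ValueError("risk must be within [0, 1]")


def emit_checked(contract, raw):
    """Keep only the fields the contract declares, then validate them."""
    declared = [f.name for f in fields(contract)]
    return contract(**{name: raw[name] for name in declared})


ok = emit_checked(GeoOutput, {"country": "IL", "risk": 0.2, "debug": "dropped"})

try:
    emit_checked(GeoOutput, {"country": "Israel", "risk": 0.2})
    rejected = False
except ValueError:
    rejected = True
```

Failing at the emitting bolt keeps invalid data from reaching the routing logic further down the topology.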
  • 21. Managing Complexity • Complexity of the data dependencies is maintained by literally drawing it. • Nimbus REST APIs offer access to the topology layout • Timing complexity reduced by synchronizing the joins to a shared point-in-time. Still pretty complex. • Proves better than our previous iterative solution
  • 22. Monitoring • Nimbus metrics give out averages - not good enough • Riemann used to efficiently monitor latencies for every tuple in the system • Inherent low latency monitoring issue: CPU utilization monitoring • More in Itai Frenkel’s lecture
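A small worked example of why averages are not good enough for latency monitoring. The latency figures are made up for illustration, and `percentile` is a simple nearest-rank helper written here, not part of Nimbus or Riemann.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100) in integers
    return ordered[rank - 1]


# Invented per-tuple latencies in ms: 98 fast transactions and 2 slow ones.
latencies_ms = [20] * 98 + [900, 1500]

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

Here the mean is 43.6 ms, which looks healthy against a 300 ms budget, while the 99th percentile is 900 ms. Only per-tuple measurements expose the transactions that actually miss the budget.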
  • 23. Questions? Contact info: Re’em Bensimhon reem@forter.com / reem.bs@gmail.com linkedin.com/in/bensimhon twitter: @reembs