Real-Time stream computation on
graphs using Storm, Neo4j and
Python
Sonal Raj
http://www.sonalraj.com
Presented at Pycon India 2013
Bangalore, India
Copyrights © 2013, Sonal Raj, http://www.sonalraj.com
Introduction
• With data multiplying each day, storage and
knowledge extraction are major concerns.
• Social Data Analysis, Business Intelligence
• Constraints of Real Time and Fault-Tolerant
Processing
In this Talk
• A look at Storm as a distributed
computation framework
• Neo4J as a NoSQL graph database
• Some Cool Pictures
• What are we trying to achieve ?
Disclaimer !
• This talk presents an overview of Storm and
Neo4J, with fewer of the dirty details :)
• I’m going to go pretty fast, so please hang on.
Part -1
Storm – The Hadoop
of Real Time
Don’t we have Hadoop ?
Storm v/s Hadoop
Both STORM and HADOOP offer:
• Distributed Processing
• Fault Tolerance
HADOOP
• Large but Finite Jobs
• Processes a Lot of Data at Once
• High Latency
STORM
• Infinite Computations called Topologies
• Processes Infinite Streams of data one-tuple-at-a-time
• Low Latency
So, what Storm gives us . .
• Real-Time Computations
• Guaranteed Data Processing
• Horizontal Scalability and Fault-Tolerance
• No intermediate message Brokers
• A higher abstraction than Message Passing, so it makes sense !
A little deeper . . Concepts
Streams
Tuple Tuple Tuple Tuple Tuple
An unbounded sequence of Tuples
So, what kind of
a tuple is this ?
Spouts
A source of Streams
But, what is the
source FOR the
spouts ?
Bolts
Computational units processing input
streams and producing new streams
Just 1 stream ?
Topologies
A network of spouts and bolts
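As a rough illustration of how these concepts fit together, the sketch below models a spout, a bolt, and a topology in plain Python. It does not use Storm or Petrel; the names `number_spout`, `double_bolt`, and `run_topology` are hypothetical and exist only for this example. A real spout produces an unbounded stream, so the sketch caps the run with a `limit`.

```python
from itertools import count, islice


def number_spout():
    """Spout: a source of an unbounded stream of tuples."""
    for n in count(1):
        yield (n,)  # each tuple carries a single field


def double_bolt(stream):
    """Bolt: consumes an input stream and produces a new stream."""
    for (n,) in stream:
        yield (n, n * 2)


def run_topology(limit):
    """Topology: wires the spout into the bolt; `limit` caps the infinite run."""
    return list(islice(double_bolt(number_spout()), limit))


print(run_topology(3))
```

Because both stages are generators, each tuple flows through the whole pipeline before the next one is produced, which loosely mirrors the one-tuple-at-a-time model.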
Is that it . . . ?
Tasks and Parallelism
A spout or bolt can run as
multiple tasks across the
cluster
Mr. Tuple: “Oh shoot, where do I go now?”
Groupings . . To the rescue of Mr. Tuple !
• Shuffle Grouping # picks a random task
• Fields Grouping # mod hashing on a subset of tuple fields
• All Grouping # sends to all tasks
• Global Grouping # picks the task with the lowest task id
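The four groupings can be sketched as small task-selection functions. This is a plain-Python illustration, not Storm's implementation: the function names are hypothetical, tuples are modelled as dicts, and `zlib.crc32` stands in for whatever hash Storm actually uses for fields grouping.

```python
import random
import zlib


def shuffle_grouping(tasks, tup, rng=random):
    """Pick a random task."""
    return [rng.choice(tasks)]


def fields_grouping(tasks, tup, fields):
    """Hash the chosen fields, then take the hash modulo the number of tasks."""
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    return [tasks[zlib.crc32(key) % len(tasks)]]  # crc32 is an assumed stand-in hash


def all_grouping(tasks, tup):
    """Send the tuple to every task."""
    return list(tasks)


def global_grouping(tasks, tup):
    """Send the tuple to the task with the lowest task id."""
    return [min(tasks)]
```

The useful property of fields grouping is that tuples with equal values in the chosen fields always land on the same task, so per-key state (for example, one user's friend list) can live in one place.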
A Storm Cluster
[Diagram: one NIMBUS node, three ZOOKEEPER nodes, and five SUPERVISOR nodes]
If this were Hadoop: NIMBUS would be the Job Tracker and the SUPERVISORs the Task Trackers.
But it’s NOT Hadoop ! ZOOKEEPER co-ordinates everything.
Salient Features . .
• Storm 0.7 and later supports Transactional Topologies
  - Processes tuples in small batches
  - If a failure occurs during commit, both the batch and the commit are retried
• Storm guarantees message processing using acknowledgements
• Petrel by AirSage is a Python wrapper for Storm; you can write and submit topologies in Python.
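To make the acknowledgement idea concrete, here is a minimal plain-Python sketch of a spout that keeps every emitted tuple until it is acknowledged and replays it on failure. Storm spouts do receive `ack` and `fail` callbacks carrying a message id, but the class `ReliableSpout`, its method names, and its internal bookkeeping are assumptions made for this illustration.

```python
from collections import deque


class ReliableSpout:
    """Toy spout that replays tuples until they are acknowledged."""

    def __init__(self, messages):
        # (msg_id, payload) pairs waiting to be emitted
        self.queue = deque(enumerate(messages))
        # msg_id -> payload for tuples emitted but not yet acknowledged
        self.pending = {}

    def next_tuple(self):
        """Emit the next tuple, or return None if nothing is waiting."""
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """The tuple was fully processed: forget it."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed or timed out: queue the tuple for replay."""
        if msg_id in self.pending:
            self.queue.append((msg_id, self.pending.pop(msg_id)))
```

A failed tuple goes to the back of the queue here, so replays can arrive out of order; downstream logic has to tolerate that.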
Part -2
Neo4J – “Get Graphed”
This is how
Graph Data was
represented in
RDBMS.
ENTER, NOSQL DATABASES
Types of NOSQL Databases
[Figure: types of NOSQL databases plotted by Data Size against Data Complexity: Key-Value Stores, Column-Family, Document databases, Graph databases]
Why NOSQL Databases
• Easily horizontally scalable
• Dynamic schemas; handle unstructured data really well.
• Excel in speed and volume
• Trade consistency for efficiency (except in graph databases; we’ll see why)
• Pleasure to code
• Free to use any query language ( even SQL ! )
• Downtime? What Downtime ?
The Property Graph Model of Graph Databases
• Core Abstractions
  - Nodes
  - Relationships between Nodes
  - Properties on both
• Traversal Framework: High Performance Queries on connected datasets
• Bindings: REST, Gremlin, etc.
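The core abstractions can be sketched with two small data classes. This is an in-memory illustration of the property graph model, not Neo4j's storage format or API; `Node`, `Relationship`, and `neighbours` are hypothetical names, and the sample people and the `KNOWS` type are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)  # relationships carry properties too


def neighbours(node, relationships, rel_type):
    """One traversal step: follow outgoing relationships of the given type."""
    return [r.end for r in relationships if r.start is node and r.rel_type == rel_type]


alice = Node(1, {"name": "Alice"})
bob = Node(2, {"name": "Bob"})
knows = Relationship(alice, "KNOWS", bob, {"since": 2013})

print([n.properties["name"] for n in neighbours(alice, [knows], "KNOWS")])
```

A traversal is then just repeated application of a step like `neighbours`, which is why connected-data queries avoid the join cost of a relational layout.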
Neo4J
• Fully ACID with rollback support (unbelievable!)
• Schema-less and Efficient storage of Semi Structured
Data
• Fast deep traversal instead of slow SQL queries that
span many table joins
• Whiteboard Friendly
• Very natural to express graph-related problems with
traversals (recommendation engine, shortest path, etc.)
Neo4J Pythonized !
• Py2Neo is an excellent binding for Neo4J
• Accesses Neo4J using its RESTful API
• Still under development . . Features like labels yet to be
included !
So, will Relational databases be Extinct ?
OOPS!
Categories of Graphical Data
• Social Networks
• Citations
• Product Co-Purchasing
• Internet peer-to-peer
• Road Network and Map Data
• Web Graphs
Excellent Source of Sample Graphical Data:
http://snap.Stanford.edu/data/
Part -3
Get your hands dirty !
A demo . .
• Sample Social Network data set
• Data includes people signing up, adding friends, unfriending, etc. over a month’s activity
• Neo4J
  - Store and update the social data
• Storm
  - Calculate “friendship-index”
• “friendship-index”
  - n = the number of people through whom person “A” is connected to person “B”
  - Gives an idea of how close two people are !
  - Useful while searching for friends on Social Networks (something like the friends-of-friends concept in Facebook’s Graph Search)
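One way to read this definition is as the number of intermediate people on the shortest path between two people. The sketch below computes that with a breadth-first search over an in-memory adjacency map. It is a stand-in for illustration only, not the talk's Storm and Neo4j implementation; the function name `friendship_index`, the dict-of-sets graph, and the choice to return 0 for the same person are all assumptions.

```python
from collections import deque


def friendship_index(friends, a, b):
    """Number of intermediate people on the shortest path from a to b.

    `friends` maps each person to the set of their friends.
    Returns None when a and b are not connected.
    """
    if a == b:
        return 0  # assumption: a person is trivially connected to themselves
    seen = {a}
    queue = deque([(a, 0)])  # (person, number of people passed through to reach them)
    while queue:
        person, hops = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == b:
                return hops
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


graph = {"A": {"C"}, "C": {"A", "D"}, "D": {"C", "B"}, "B": {"D"}, "E": set()}
print(friendship_index(graph, "A", "B"))
```

In this sample graph A reaches B through C and D, so the index is 2, while direct friends have an index of 0.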
The Topology . .
[Diagram: Source → Update Spout → Update Bolt; Source → Query Spout → Query Bolt]
Update Spout
Define what kind of tuples
are emitted
Gets and emits tuple streams
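The slide's code is not part of this transcript, so the following is only a plain-Python sketch of the two responsibilities named above: declaring the shape of the emitted tuples, and reading from a source and emitting them. It does not use Petrel's API; the class `UpdateSpout`, its method names, and the comma-separated `action, user, friend` line format are all hypothetical.

```python
class UpdateSpout:
    """Toy spout that turns raw update lines into tuples."""

    def __init__(self, source):
        self.source = iter(source)  # any iterable of text lines, e.g. an open file

    @staticmethod
    def declare_output_fields():
        """Define what kind of tuples are emitted."""
        return ["action", "user", "friend"]

    def next_tuple(self):
        """Get the next line from the source and emit it as a tuple.

        Returns None once the source is exhausted.
        """
        line = next(self.source, None)
        if line is None:
            return None
        action, user, friend = (part.strip() for part in line.split(","))
        return (action, user, friend)


spout = UpdateSpout(["add, alice, bob", "signup, carol,"])
print(spout.next_tuple())
```

A sign-up line has no second person, so its `friend` field is simply left empty in this assumed format.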
Update Bolt
Objects for database access
and indexing service
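In the talk the update bolt writes to Neo4J through database and index objects; that code is not reproduced here. As a hedged stand-in, the sketch below applies update tuples to an in-memory dict of sets. The class `UpdateBolt`, the `process` method, and the action names `signup`, `add`, and `unfriend` are assumptions drawn from the demo description, not the author's code.

```python
class UpdateBolt:
    """Toy bolt that applies update tuples to an in-memory friendship graph."""

    def __init__(self):
        # Stand-in for the database: person -> set of friends
        self.friends = {}

    def process(self, tup):
        action, user, friend = tup
        if action == "signup":
            self.friends.setdefault(user, set())
        elif action == "add":
            # Friendship is stored in both directions
            self.friends.setdefault(user, set()).add(friend)
            self.friends.setdefault(friend, set()).add(user)
        elif action == "unfriend":
            self.friends.get(user, set()).discard(friend)
            self.friends.get(friend, set()).discard(user)
        else:
            raise ValueError("unknown action: %r" % (action,))


bolt = UpdateBolt()
bolt.process(("signup", "carol", ""))
bolt.process(("add", "alice", "bob"))
print(sorted(bolt.friends))
```

Swapping the dict for real database calls would keep the same shape: one branch per kind of update event.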
Query Spout
The tuple to be emitted
can contain multiple
entities.
Query Bolt
Retrieve caller friend and
requested friend ids
Retrieve caller friend
and requested friend
ids as per database
Create Topology
Import all spout and
bolt files
Unfortunately, there was no option in
Petrel to turn off console debug output, so the
console view is really messy.
Topology.yaml
Configuration for the topology is
specified in this file
A little More . .
[Diagram: the same topology: Source → Update Spout → Update Bolt; Source → Query Spout → Query Bolt]
Final Thoughts
• A Storm-Neo4j framework is a boon for real-time
graph computations
• Quite flexible in Java; the Python bindings and
implementations still have a long way to go.
• If you are an admin or developer, analyse your data
and computing requirements before narrowing down
on a framework.
…to play with Storm and Neo4J
• My PyCon Talk Repo – slides, code skeletons,
etc.
http://www.sonalraj.com/neo-storm.html
• Storm documentation (official)
http://github.com/nathanmarz/storm
• Storm Book
http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
• Deployment of storm on AWS
http://github.com/nathanmarz/storm-deploy
• Neo4J Documentation
http://www.neo4j.org
Ex-terminated . . .
- That’s it
- Thanks for Listening !
- Questions

 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013

  • 1. Real-Time stream computation on graphs using Storm, Neo4j and Python
      Sonal Raj, http://www.sonalraj.com
      Presented at PyCon India 2013, Bangalore, India
      Copyrights © 2013, Sonal Raj, http://www.sonalraj.com
  • 2. Introduction
      • With data multiplying each day, storage and knowledge extraction are major concerns.
      • Social data analysis, business intelligence
      • Constraints of real-time and fault-tolerant processing
  • 3. In this Talk
      • A look at Storm as a distributed computation framework
      • Neo4J as a NoSQL graph database
      • Some cool pictures
      • What are we trying to achieve?
  • 4. Disclaimer!
      • This talk presents an overview of Storm and Neo4J, with fewer of the dirty details.
      • I'm going to go pretty fast. Please hang on.
  • 5. Part 1: Storm, the Hadoop of Real Time
  • 6. Don't we have Hadoop?
  • 7. Storm vs. Hadoop
      • Storm and Hadoop: distributed processing, fault tolerance
  • 8. Storm vs. Hadoop
      • Hadoop: large but finite jobs; processes a lot of data at once; high latency
  • 9. Storm vs. Hadoop
      • Hadoop: large but finite jobs; processes a lot of data at once; high latency
      • Storm: infinite computations called topologies; processes infinite streams of data one tuple at a time; low latency
  • 10. So, what Storm gives us
      • Real-time computations
      • Guaranteed data processing
      • Horizontal scalability and fault tolerance
      • No intermediate message brokers
      • A higher abstraction than message passing, so it makes sense!
  • 11. A little deeper: Concepts
      • Streams: an unbounded sequence of tuples
  • 12. A little deeper: Concepts
      • Streams: an unbounded sequence of tuples
      • So, what kind of a tuple is this?
  • 13. A little deeper: Concepts
      • Spouts: a source of streams
  • 14. A little deeper: Concepts
      • Spouts: a source of streams
      • But what is the source FOR the spouts?
  • 15. A little deeper: Concepts
      • Bolts: computational units processing input streams and producing new streams
  • 16. A little deeper: Concepts
      • Bolts: computational units processing input streams and producing new streams
      • Just 1 stream?
  • 17. A little deeper: Concepts
      • Topologies: a network of spouts and bolts
  • 18. Is that it?
      • Tasks and parallelism: a spout or bolt can execute as multiple tasks across the cluster
  • 19. Mr. Tuple: "Oh shoot, where do I go now?"
  • 20. Groupings, to the rescue of Mr. Tuple!
      • Shuffle grouping: picks a random task
      • Fields grouping: mod hashing on a subset of tuple fields
      • All grouping: sends to all tasks
      • Global grouping: picks the task with the lowest task id
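
The four groupings listed on slide 20 can be illustrated with a minimal sketch in plain Python. This is not Storm's implementation: the function names, the use of `zlib.crc32` as the hash, and the representation of tasks as integer ids are all assumptions made for illustration. Each function returns the list of task ids that would receive a given tuple.

```python
import random
import zlib


def shuffle_grouping(num_tasks, rng=random):
    """Pick one task at random (hypothetical helper, not a Storm API)."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, field_indexes, num_tasks):
    """Hash the selected fields and take the result modulo the task count.

    Tuples that agree on the selected fields always reach the same task.
    crc32 is an assumed stand-in hash, chosen because it is stable across runs.
    """
    key = "|".join(str(tup[i]) for i in field_indexes)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(num_tasks):
    """Send the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(task_ids):
    """Send the tuple to the task with the lowest id."""
    return [min(task_ids)]
```

With fields grouping on the first field, the tuples `("alice", "bob")` and `("alice", "carol")` land on the same task, which is what makes per-key state in a bolt possible.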
  • 21. A Storm Cluster
      • Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes
  • 22. A Storm Cluster
      • If this were Hadoop: Nimbus would be the Job Tracker and each Supervisor a Task Tracker
  • 23. A Storm Cluster
      • But it's NOT Hadoop! (diagram label: "Co-ordinates Everything")
  • 24. Salient Features
      • Storm 0.7 and later supports transactional topologies
          • Processes small batches of tuples
          • If a failure occurs during commit, both the batch and the commit are retried
      • Storm guarantees message processing using acknowledgements
      • Petrel by AirSage is a Python wrapper for Storm; you can write and submit topologies in Python.
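
The idea behind acknowledgement-based guarantees on slide 24 can be sketched as a toy, single-process simulation. This is an illustration under stated assumptions, not Storm's or Petrel's code: `ReplayingSpout` and `run` are hypothetical names, and the real system tracks tuple trees across a cluster rather than a single dictionary. The sketch only shows the principle that a tuple stays pending until it is acked and is replayed when it fails.

```python
class ReplayingSpout:
    """Toy spout that remembers each emitted message until it is acked."""

    def __init__(self, messages):
        self.queue = list(messages)  # messages waiting to be emitted
        self.pending = {}            # message id -> message awaiting an ack
        self.next_id = 0

    def next_tuple(self):
        """Emit the next message with a fresh id, or None when idle."""
        if not self.queue:
            return None
        msg = self.queue.pop(0)
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        """Processing succeeded: forget the message."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: put the message back so it is replayed."""
        self.queue.append(self.pending.pop(msg_id))


def run(spout, process):
    """Drive the spout until it is idle, acking or failing each message."""
    results = []
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            return results
        msg_id, msg = emitted
        try:
            results.append(process(msg))
            spout.ack(msg_id)
        except Exception:
            spout.fail(msg_id)
```

Because a failed message is replayed, a processing step may see the same message more than once, so downstream updates should be safe to repeat.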
  • 25. Part 2: Neo4J, "Get Graphed"
  • 26. This is how graph data was represented in an RDBMS.
  • 27. Enter NoSQL databases
  • 28. Types of NoSQL Databases
      • Graph databases, document databases, column-family stores, key-value stores
      • Chart axes: data complexity, data size
  • 29. Why NoSQL Databases
      • Easily horizontally scalable
      • Dynamic schemas; handle unstructured data really well
      • Excel in speed and volume
      • Trade off consistency for efficiency (except in graph databases; we'll see why)
      • A pleasure to code
      • Free to use any query language (even SQL!)
      • Downtime? What downtime?
  • 30. The Property Graph Model of Graph Databases
      • Core abstractions: nodes, relationships between nodes, properties of both
      • Traversal framework: high-performance queries on connected datasets
      • Bindings: REST, Gremlin, etc.
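
The core abstractions on slide 30 (nodes, relationships, and properties on both) can be sketched as a small in-memory structure. This is not Neo4j or any of its bindings: `Node`, `Relationship`, and `PropertyGraph` are hypothetical names, and the one-hop `neighbours` method is only a stand-in for a real traversal framework.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int      # id of the node the relationship leaves
    end: int        # id of the node the relationship enters
    rel_type: str   # e.g. "FRIEND"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)
        return self.nodes[node_id]

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbours(self, node_id, rel_type=None):
        """One-hop traversal: ids reachable over outgoing relationships."""
        return [
            rel.end
            for rel in self.relationships
            if rel.start == node_id
            and (rel_type is None or rel.rel_type == rel_type)
        ]
```

Note that both the node and the relationship carry their own property dictionary, which is the feature that distinguishes a property graph from a plain adjacency list.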
  • 31. Neo4J
      • Fully ACID with rollback support (unbelievable!)
      • Schema-less, with efficient storage of semi-structured data
      • Fast deep traversals instead of slow SQL queries that span many table joins
      • Whiteboard friendly
      • Very natural for expressing graph-related problems as traversals (recommendation engines, shortest path, etc.)
  • 32. Neo4J Pythonized!
      • Py2Neo is an excellent binding for Neo4J
      • Accesses Neo4J using its RESTful API
      • Still under development; features like labels are yet to be included!
  • 33. So, will relational databases be extinct? OOPS!
  • 34. Categories of Graph Data
      • Social networks
      • Citations
      • Product co-purchasing
      • Internet peer-to-peer
      • Road network and map data
      • Web graphs
      • Excellent source of sample graph data: http://snap.Stanford.edu/data/
  • 35. Part 3: Get your hands dirty!
  • 36. A demo
      • Sample social network data set
      • Data includes sign-up info, adding friends, unfriending, etc., for a month's activity
      • Neo4J: store and update the social data
      • Storm: calculate the "friendship-index"
  • 37. A demo
      • "friendship-index": n = through how many people person "A" is connected to person "B"
      • Gives an idea of how close two people are!
      • Useful while searching for friends on social networks (something like the friends-of-friends concept in Facebook's Graph Search)
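
One way to read the "friendship-index" from slide 37 is as the number of intermediate people on the shortest path between two people, which a breadth-first search computes directly. The sketch below assumes that reading and uses a plain dictionary of friend lists in place of Neo4J; the function name and the choice to return `None` for unconnected people are assumptions, not taken from the talk's code.

```python
from collections import deque


def friendship_index(friends, a, b):
    """Number of intermediate people on the shortest path from a to b.

    `friends` maps each person to an iterable of their direct friends.
    Direct friends give 0, friends of friends give 1, and so on.
    Returns None when no path exists.
    """
    if a == b:
        return 0
    visited = {a}
    queue = deque([(a, 0)])  # (person, number of hops from a)
    while queue:
        person, hops = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == b:
                # `person` is `hops` steps from a, so exactly `hops`
                # people sit between a and b on this path.
                return hops
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, hops + 1))
    return None
```

Breadth-first search visits people in order of distance, so the first time `b` is reached the path is guaranteed to be a shortest one.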
  • 38. The Topology
      • Source → Update Spout → Update Bolt
      • Source → Query Spout → Query Bolt
  • 39. Update Spout
  • 40. Update Spout: define what kind of tuples are emitted
  • 41. Update Spout: gets and emits tuple streams
  • 42. Update Bolt
  • 43. Update Bolt: objects for database access and indexing service
  • 44. Update Bolt
  • 45. Query Spout
  • 46. Query Spout: the tuple to be emitted can contain multiple entities.
  • 47. Query Bolt
  • 48. Query Bolt
  • 49. Query Bolt: retrieve caller friend and requested friend ids
  • 50. Query Bolt: retrieve caller friend and requested friend ids as per database
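
The code shown on slides 39 to 50 is not preserved in this transcript, so the following is a rough, single-process sketch of how the four components could fit together, not the talk's Petrel code. Every name here is hypothetical, a dictionary of sets stands in for Neo4J, and the event format `action,person,friend` is an assumption. The spouts are plain generators and the bolts are small classes with a `process` method.

```python
def update_spout(event_lines):
    """Emit one (action, person, friend) tuple per raw event line."""
    for line in event_lines:
        action, person, friend = line.split(",")
        yield (action, person, friend)


class UpdateBolt:
    """Apply friendship changes to the store (stand-in for Neo4J)."""

    def __init__(self, store):
        self.store = store

    def process(self, tup):
        action, person, friend = tup
        if action == "add":
            self.store.setdefault(person, set()).add(friend)
            self.store.setdefault(friend, set()).add(person)
        elif action == "remove":
            self.store.get(person, set()).discard(friend)
            self.store.get(friend, set()).discard(person)


def query_spout(requests):
    """Emit one tuple per query; a tuple carries both people involved."""
    for caller, requested in requests:
        yield (caller, requested)


class QueryBolt:
    """Look up the friend ids of the caller and the requested person."""

    def __init__(self, store):
        self.store = store

    def process(self, tup):
        caller, requested = tup
        return {
            "caller": caller,
            "requested": requested,
            "caller_friends": sorted(self.store.get(caller, ())),
            "requested_friends": sorted(self.store.get(requested, ())),
        }


def run_topology(event_lines, requests):
    """Run updates first, then queries (sequential for simplicity)."""
    store = {}
    update_bolt = UpdateBolt(store)
    for tup in update_spout(event_lines):
        update_bolt.process(tup)
    query_bolt = QueryBolt(store)
    return [query_bolt.process(tup) for tup in query_spout(requests)]
```

In a real topology the update and query paths would run concurrently against the database; the sequential loop here only keeps the sketch deterministic.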
  • 51. Create Topology
  • 52. Create Topology: import all spout and bolt files
  • 53. Create Topology: unfortunately, there was no option in Petrel to turn off console debug, so the console view is really messy.
  • 54. Topology.yaml: configurations for the topology are specified in this file
  • 55. A little more
      • Source → Update Spout → Update Bolt
      • Source → Query Spout → Query Bolt
  • 56. Final Thoughts
      • A Storm-Neo4j framework is a boon for real-time graph computations
      • Quite flexible in Java; the Python bindings and implementations still have a long way to go.
      • If you are an admin or developer, analyse your data and computing requirements before narrowing down on a framework.
  • 57. ...to play with Storm and Neo4J
      • My PyCon talk repo (slides, code skeletons, etc.): http://www.sonalraj.com/neo-storm.html
      • Storm documentation (official): http://github.com/nathanmarz/storm
      • Storm book: http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
      • Deployment of Storm on AWS: http://github.com/nathanmarz/storm-deploy
      • Neo4J documentation: http://www.neo4j.org
  • 58. Ex-terminated...
      • That's it
      • Thanks for listening!
      • Questions