Small Pieces Loosely Joined
#cgn13
or...
A practical example of processing real-time data with a distributed agent network
(Warning: does not contain real code)
Red Gate
12th October 2011
eMail Marketing
Mailchimp webhook
"type": "subscribe",
"fired_at": "2009-03-26 21:35:57",
"data[id]": "8a25ff1d98",
"data[list_id]": "a6b5da1054",
"data[email]": "api@mailchimp.com",
"data[email_type]": "html",
"data[merges][EMAIL]": "api@mailchimp.com",
"data[merges][FNAME]": "MailChimp",
"data[merges][LNAME]": "API",
"data[merges][INTERESTS]": "Group1,Group2",
"data[ip_opt]": "10.20.10.30",
"data[ip_signup]": "10.20.10.30"
Pump the callbacks into a message bus...
Messaging
mailchimp-pump.php


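// $channel is assumed to be an already-open AMQP channel (setup not shown)
// Forward the whole webhook POST as JSON, routed by event type
$json = json_encode($_POST);
$msg = new AMQPMessage($json);
$channel->basic_publish($msg, 'mailchimp', "morat.campaign.mailchimp." . $_POST['type']);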
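The valves downstream read fields such as record['data']['merges']['FNAME'], which works because PHP expands bracketed form keys into nested arrays before json_encode runs. For a pump written in a language without that behaviour, the expansion might look roughly like the sketch below. The helper name nest_params is hypothetical and not part of the talk.

```ruby
# Hypothetical helper: turn flat form keys such as "data[merges][FNAME]"
# into nested hashes, mirroring what PHP does when it populates $_POST.
def nest_params(flat)
  flat.each_with_object({}) do |(key, value), nested|
    # "data[merges][FNAME]" -> ["data", "merges", "FNAME"]
    parts = key.scan(/[^\[\]]+/)
    leaf = parts.pop
    # Walk (and create) the intermediate hashes, then set the leaf value.
    target = parts.reduce(nested) { |hash, part| hash[part] ||= {} }
    target[leaf] = value
  end
end

flat = {
  'type' => 'subscribe',
  'data[email]' => 'api@mailchimp.com',
  'data[merges][FNAME]' => 'MailChimp',
  'data[merges][LNAME]' => 'API'
}
record = nest_params(flat)
puts record['data']['merges']['FNAME']
```

The nested record can then be serialised with JSON.generate and published exactly as the PHP pump does.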
I’d like to watch the stream on IRC...
Valve

Subscribe to mailchimp exchange
morat.campaign.mailchimp.#
Translate to plain English for IRC
Inject into irc exchange with routing key morat.irc.[channel]
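The binding key morat.campaign.mailchimp.# relies on AMQP topic matching, where * stands for exactly one dot-separated word and # for zero or more. The broker does this matching itself; the sketch below only illustrates the rule, and the function names are made up for the example.

```ruby
# Illustrative only: the broker performs this matching, not the valve.
# '*' matches exactly one word, '#' matches zero or more words.
def topic_match?(pattern, routing_key)
  match_words(pattern.split('.'), routing_key.split('.'))
end

def match_words(pattern, words)
  return words.empty? if pattern.empty?

  head, *rest = pattern
  case head
  when '#'
    # '#' may absorb any number of words: try every possible split.
    (0..words.length).any? { |n| match_words(rest, words.drop(n)) }
  when '*'
    !words.empty? && match_words(rest, words.drop(1))
  else
    words.first == head && match_words(rest, words.drop(1))
  end
end

puts topic_match?('morat.campaign.mailchimp.#', 'morat.campaign.mailchimp.subscribe')
puts topic_match?('morat.irc.*', 'morat.twitter.search.cgn13')
```

One binding therefore catches every Mailchimp event type, and a new event type needs no change on the consuming side.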
mailchimp-irc-valve.rb

case record['type']
when 'subscribe'
  output :irc, "'#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']}' has joined the list"
when 'unsubscribe'
  output :irc, "'#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']}' has left the list"
...
Create a Sink to send the messages to IRC...
irc-sink.pl

$q = $amq->channel(1)->queue('morat.irc.' . $channel, {
    passive => 0, durable => 0, auto_delete => 1, exclusive => 0,
})->subscribe( sub {
    my ($payload, $meta) = @_;
    # The last dot-separated word of the queue name is the IRC channel
    my ($channel) = $meta->{'queue'} =~ /\.([^.]+)$/;

    $irc->yield('privmsg', '#' . $channel, GREEN . $payload);
});
Where have we got to?

Pump: Mailchimp webhook (HTTP POST) > morat.[campaign].mailchimp.[type] (JSON)
Valve: morat.campaign.mailchimp.[type] (JSON) > morat.irc.[campaign] (Text)
Sink: morat.irc.[campaign] (Text) > IRC server
That’s cool, but hey it would be great to see #campaign tweets as well...
twitter-search-pump.rb
TweetStream::Client.new.track(keywords.split(',')) do |status|
  keywords.split(',').each do |searchterm|
    # Plain substring test; String#match would treat the term as a regex
    next unless status.text.include?(searchterm)
    # Strip every space and '#' so the term is a single routing-key word
    key = "morat.twitter.search.#{searchterm.delete(' #')}"
    log.debug "Sending: #{status.user.screen_name} :: #{status.text} :: #{key}"
    broker.exchange.publish JSON.generate(status), :routing_key => key
  end
end
twitter-irc-valve.rb

case routing_key
when 'morat.twitter.@neildavidson.list.redgaters'
  output :irc, "RG chatter: #{record['user']['screen_name']} tweeted: #{record['text']}", :routing_key => "morat.irc.redgaters"
else
  # Dots are escaped so they match literally
  searchterm = routing_key.match(/morat\.twitter\.search\.(.+)/)[1]
  output :irc, "#{record['user']['screen_name']} tweeted: #{record['text']}", :routing_key => "morat.irc.#{searchterm}"
end
I feel the urge to graph...
Thanks @garethr
Valve
Subscribe to mailchimp exchange morat.[campaign].mailchimp.#
Translate to Graphite format: [value] [timestamp]
Inject into graphite exchange with routing key based on sample window: 10sec.[campaign].mailchimp.[action].count
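As a rough sketch of what this valve does, the function below counts events per fixed window and emits one "[value] [timestamp]" payload per window, with the metric path carried in the routing key. The event shape (:type and :at in epoch seconds) and the name batch_counts are assumptions made for the example, not the talk's code.

```ruby
# Sketch under assumptions: events are hashes with :type and :at (epoch seconds).
# Counts events per fixed window and formats a "[value] [timestamp]" payload.
def batch_counts(events, window_secs, campaign)
  counts = Hash.new(0)
  events.each do |event|
    # Align each event to the start of the window it falls in.
    window_start = event[:at] - (event[:at] % window_secs)
    counts[[event[:type], window_start]] += 1
  end
  counts.map do |(type, window_start), count|
    {
      routing_key: "#{window_secs}sec.#{campaign}.mailchimp.#{type}.count",
      # Stamp the count with the end of its window.
      payload: "#{count} #{window_start + window_secs}"
    }
  end
end

events = [
  { type: 'subscribe', at: 100 },
  { type: 'subscribe', at: 103 },
  { type: 'subscribe', at: 109 },
  { type: 'unsubscribe', at: 112 }
]
batch_counts(events, 10, 'campaign').each do |message|
  puts "#{message[:routing_key]} -> #{message[:payload]}"
end
```

Doing this by hand for several window sizes soon gets repetitive, which is the gap the CEP engine fills.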
But let’s make it cool...
Complex Event Processing
mailchimp-graphite-valve.rb
%w{ subscribe unsubscribe campaign }.each do |action|
  [ '10 sec', '1 min', '5 min', '15 min' ].each do |window|
    valve.register "SELECT count(*) from MailchimpEvent(type='#{action}').win:time_batch(#{window})", (
      Listener.new(valve) do |agent, event|
        valve.output :graphite, "#{event.get('count(*)')}", :routing_key => window.delete(' ') + ".morat.#{valve.application}.mailchimp.#{action}"
      end
    )
  end
end
Why use CEP?
# find the sum of retweets of the last 5 tweets which saw at least 10 retweets
SELECT sum(retweets) from TweetEvent(retweets >= 10).win:length(5)

# find max, min and average number of retweets for a sliding 60 second window of time
SELECT max(retweets), min(retweets), avg(retweets) FROM TweetEvent.win:time(60 sec)

# compute number of retweets for all tweets in 10 second batches
SELECT sum(retweets) from TweetEvent.win:time_batch(10 sec)

# number of retweets, grouped by timezone, buffered in 10 second increments
SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone

# compute the sum of retweets in sliding 60 second window, and emit count every 30 events
SELECT sum(retweets) from TweetEvent.win:time(60 sec) output snapshot every 30 events

# every 10 seconds, report timezones which accumulated more than 10 retweets
SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone having sum(retweets) > 10

       Courtesy @igrigorik http://www.igvita.com/2011/05/27/streamsql-event-processing-with-esper/
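To get a feel for what one of these statements saves, here is a hand-rolled approximation of the first query (a filtered length window of 5). The class name LengthWindowSum is invented for the example, and the sketch ignores everything else an engine such as Esper provides (time windows, grouping, output rate limiting).

```ruby
# Rough equivalent of:
#   SELECT sum(retweets) from TweetEvent(retweets >= 10).win:length(5)
class LengthWindowSum
  def initialize(size, &filter)
    @size = size
    @filter = filter
    @window = []
  end

  # Feed one event; returns the sum over the current window.
  def push(retweets)
    if @filter.call(retweets)
      @window << retweets
      # Keep only the most recent @size matching events.
      @window.shift if @window.length > @size
    end
    @window.sum
  end
end

win = LengthWindowSum.new(5) { |retweets| retweets >= 10 }
sums = [12, 3, 20, 10, 15, 11, 30].map { |retweets| win.push(retweets) }
puts sums.inspect
```

Each further query would need its own class like this one, whereas in EPL it is a single line.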
Is there really a correlation?
Statistical Computing
Valve

Grab raw data for window from graphite via REST
Create scatter graph using R and calculate correlation
Inject correlation into graphite exchange
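The valve delegates the statistics to R. For reference, the Pearson correlation coefficient (the default method of R's cor) can be sketched in plain Ruby as below; the function name pearson is chosen for the example and is not from the talk.

```ruby
# Pearson correlation coefficient of two equal-length numeric series.
# Returns NaN when either series has zero variance.
def pearson(xs, ys)
  if xs.empty? || xs.length != ys.length
    raise ArgumentError, 'series must be non-empty and of equal length'
  end

  n = xs.length.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  spread_x = xs.sum { |x| (x - mean_x)**2 }
  spread_y = ys.sum { |y| (y - mean_y)**2 }
  covariance / Math.sqrt(spread_x * spread_y)
end

puts pearson([1, 2, 3, 4], [2, 4, 6, 8])
puts pearson([1, 2, 3], [3, 2, 1])
```

A value near 1 or -1 suggests the two series move together; it says nothing about which one drives the other.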
twitter-correlation-valve.rb
require 'rsruby'
...

r.jpeg(filename)
r.assign('xs', data[1])
r.assign('ys', data[2])
fit = r.lm('ys ~ xs')
r.plot({
   'x' => data[1],
   'y' => data[2],
   'xlab' => label[1],
   'ylab' => label[2]
})
cor = r.cor(data[1],data[2]).to_s
r.title("Correlation: " + cor)
r.abline(fit['coefficients']['(Intercept)'],fit['coefficients']['xs'])
r.eval_R("dev.off()")
Let's add some realtime visualisation...
Websockets
Valve
Subscribe to twitter exchange
morat.twitter.search.[keyword]
Extract adjectives using engtagger
Inject adjectives into twitter exchange with routing key morat.twitter.search.[keyword].adjectives as: [adjective] [count]
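The "[adjective] [count]" payload can be produced from a list of already-extracted adjectives along these lines. The sketch skips the tagging step itself, and normalising by downcasing is an assumption made for the example (the valve in the talk stems the words instead).

```ruby
# Sketch: count adjectives and format one "[adjective] [count]" payload each.
# Downcasing stands in for stemming, purely for illustration.
def adjective_payloads(adjectives)
  counts = Hash.new(0)
  adjectives.each do |word|
    word = word.strip.downcase
    counts[word] += 1 unless word.empty?
  end
  counts.map { |adjective, count| "#{adjective} #{count}" }
end

payloads = adjective_payloads(['great', 'Great', ' ', 'slow'])
puts payloads
```

A consumer can recover the adjective with a split on the space, which is what the browser client does before handing it to the visualisation.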
twitter-sentiment-valve.rb
require 'engtagger'
...

log.debug "Received tweet from #{record['user']['screen_name']} on #{routing_key}"

adjectives = @parser.add_tags(record['text']).scan(EngTagger::ADJ).map do |n|
  @parser.strip_tags(n)
end

ret = Hash.new(0)
adjectives.each do |n|
  n = @parser.stem(n)
  # Skip empty or whitespace-only stems
  ret[n] += 1 unless n =~ /\A\s*\z/
end
Sink

Subscribe to twitter exchange
morat.twitter.search.[keyword].adjectives
Use node.js and Socket.IO to send data to web client via Websockets
Visualise with processing.js in web browser
twitter-sentiment-sink.js
io.sockets.on('connection', function (socket) {
  amqp_connection.on('ready', function () {
    var queue = amqp_connection.queue('');
    var exchange = amqp_connection.exchange('twitter', { type: 'topic', passive: false, durable: true, autoDelete: true }, function (exchange) {
      queue.bind(exchange, routing_key);
      queue.subscribe(function (message) {
        socket.emit('data', { text: message.data.toString() });
      });
    });
  });
});
twitter-sentiment-sink.html
  <H1>Twitter Sentiment</H1>
  <div id="container">
    <canvas id="twitter-sentiment-sink" data-processing-sources="twitter-sentiment-sink.pde" WIDTH=800 HEIGHT=600></canvas>
  </div>
  <script src="/socket.io/socket.io.js"></script>
  <script type="text/javascript">
    var socket = io.connect('http://localhost');
    socket.on('data', function (data) {
      var pjs = Processing.getInstanceById('twitter-sentiment-sink');
      pjs.addDatum(data.text.split(' ')[0]);
    });
  </script>
@ennui2342

www.morat.co.uk
 polis.ecafe.org

Editor's notes

  41. Take the adjectives and store a running total in Redis to create long timeline tag clouds
      Pull out @replies and RTs and throw them into Neo4j, a graph database, for post-competition analysis
      Hook an Arduino up to IRC to receive Mailchimp subscriptions and create a physical visualisation in the office (e.g. glow ball)