SlideShare une entreprise Scribd logo
1  sur  38
Real Time Analytics for Big Data
A Twitter Case Study

                                        Uri Cohen
                                   Head of Product
                                        @uri1803
Big Data Predictions



         “Over the next few years we'll see the adoption of scalable
         frameworks and platforms for handling
         streaming, or near real-time, analysis and processing. In the
         same way that Hadoop has been borne out of large-scale web
         applications, these platforms will be driven by the needs of large-
         scale location-aware mobile, social and sensor use.”
         Edd Dumbill, O’REILLY




          ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
                                                                     2
We’re Living in a Real Time World…
 Facebook Real Time                      Real Time User                        Google RT Web Analytics
 Social Analytics                        Engagement




Twitter paid tweet analytics                New Real Time                      Google Real Time Search
                                            Analytics Startups




 3                      ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
3 Flavors to Analytics




    Counting                                Correlating               Research




4              ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Counting

 How many
  signups, tweets, retweet
  s for a topic?
 What’s the average
  latency?
 Demographics
       Countries and cities
       Gender
       Age groups
       Device types
       …



5                  ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Correlating

 What devices fail at the
  same time?
 What features get user
  hooked?
 What places on the
  globe are “happening”?




6             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analytics @ Twitter – Research

 Sentiment analysis
     “Obama is popular”
 Trends
     “People like to tweet
      after watching
      American Idol”
 Spam patterns
     How can you tell when
      a user spams?




7                ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s All about Timing




      “Real time”                      Reasonably Quick                    Batch
    (< few Seconds)                   (seconds - minutes)               (hours/days)




8                ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s All about Timing

              • Event driven / stream processing
              • High resolution – every tweet gets counted



              • Ad-hoc querying          This is what
              • Medium resolution (aggregations) here
                                         we’re
                                                                 to discuss 

              • Long running batch jobs (ETL, map/reduce)
              • Low resolution (trends & patterns)

9         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



     It takes a week for users to
     send   1 billion Tweets.
                                                      Source: http://blog.twitter.com/2011/03/numbers.html

10          ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)


                  On average,
        140 million
     tweets get sent every day.
                                                    Source: http://blog.twitter.com/2011/03/numbers.html

11        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)


         The highest
     throughput to date is
6,939 tweets/sec.
                                                   Source: http://blog.twitter.com/2011/03/numbers.html

12       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers (March 2011)



     460,000 new
       accounts
        are created daily.
                                                   Source: http://blog.twitter.com/2011/03/numbers.html

13       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Twitter in Numbers




     5% of the users generate
      75% of the content.
                                                            Source: http://www.sysomos.com/insidetwitter/

14        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge 1 – Computing Reach




15       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge 1 – Computing Reach
       Tweets                 Followers                     Distinct
                                                            Followers




                                                                        Count
                                                                                Reach




16         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Challenge 2 – Word Count
       Tweets




                                                                      Count
                                                                               Word:Count




                                                                • Hottest topics
                                                                • URL mentions
                                                                • etc.
17       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
URL Mentions – Here’s One Use Case




18       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Analyze the Problem

 (Tens of) thousands of tweets per second to
  process
      Assumption: Need to process in near real time
 Aggregate counters for each word
      A few 10s of thousands of words (or hundreds of
       thousands if we include URLs)
 System needs to linearly scale


19            ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s Really Just an Online Map/Reduce




20        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Event Driven Flow

               Store
                                                                  Research
               Raw


           Tokenize                                      Filter    Count


           Store
          Counters

21       ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
It’s Not That Simple…
 (Tens) of thousands of tweets to tokenize every
  second, even more words to filter 
  CPU bottleneck
 Tens/hundreds of thousands of counters to
  update  Counters contention
 Tens/hundreds of thousands of counters to
  persist  Database bottleneck
 (Tens) of thousands of tweets to store every
  second in the database  Database bottleneck

22         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Implementation Challenges

 Twitter is growing by the day,
  how can this scale?
 Fault tolerance.
  What happens if a server fails?
 Consistency.
  Counters should be accurate!



23         ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Solutions, Solutions




24   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Dealing with the Massive Tweet Stream
 We need a system that could scale linearly
      Event partitioning to the rescue!
      What to partition by?
                                                               Tokenizer1   Filterer 1

                                                               Tokenizer2   Filterer 2

                                                                Tokenizer
                                                                            Filterer 3
                                                                    3



                                                                Tokenizer
                                                                            Filterer n
                                                                    n

25             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counters Persistence & Contention
 Why not keep things in memory?
      Treat it as a SoR
                                                                                  IMDG
                                                                       Counter
                           Tokenizer1                    Filterer 1   Updater 1

                                                                       Counter
                           Tokenizer2                    Filterer 2   Updater 2

                            Tokenizer                                  Counter
                                                         Filterer 3
                                3                                     Updater 3




                            Tokenizer                                  Counter
                                                         Filterer n
                                n                                     Updater n

26     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counters Persistence & Contention
 Why not keep things in memory (data grid)?
      Question: How to partition counters?
                                                                                  IMDG
                                                                       Counter
                           Tokenizer1                    Filterer 1   Updater 1

                                                                       Counter
                           Tokenizer2                    Filterer 2   Updater 2

                            Tokenizer                                  Counter
                                                         Filterer 3
                                3                                     Updater 3




                            Tokenizer                                  Counter
                                                         Filterer n
                                n                                     Updater n

27     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counters Persistence & Contention
 Why not keep things in memory?
      Question: How to partition counters? in
             Facebook keeps 80% of its data
             Memory (Stanford research)                                           IMDG
                                                                       Counter
                           Tokenizer1                    Filterer 1   Updater 1
                     RAM is 100-1000x faster than Disk
                                                Counter
                     (Random seek)
                       Tokenizer2  Filterer 2  Updater 2
                     • Disk: 5 -10ms
                        Tokenizer               Counter
                                   Filterer 3
                     • RAM: ~0.001msec
                            3                  Updater 3




                            Tokenizer                                  Counter
                                                         Filterer n
                                n                                     Updater n

28     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Database Bottleneck – Storing Raw Tweets
  RDBMS won’t cut it (unless you have $$$$$)
      Hadoop / NoSQL to the rescue
      Use the data grid for batching and as a reliable
       transaction log
                           Persister
                              1

                           Persister
                              2




                           Persister
                              n
29     ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counter Consistency & Resiliency
 Primary/hot backup setup, sync replication
 Transactional, atomic updates
                                    IMDG
 “On Demand” backups
                                                            P   B


                                                            P   B


                                                            P   B


                                                            P   B


30   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Counter Consistency & Resiliency
 Primary/hot backup setup, sync replication
 Transactional, atomic updates
                                    IMDG
 “On Demand” backups
                                                            P   B
                                                                P


                                                            P   B


                                                            P   B


                                                            P   B


31   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Implementing Event Flows
 Use the data grid as event bus
 Stateful objects as transactional events

     Raw   Tokenizer               Tokenized      Filterer                Filtered   Counter




                                  P                                   B



32             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Putting It All Together
                                IMDG


                            P                    B


                            P                    B


                                                 B
                            P

                                                 B
                            P




33        ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Putting It All Together

     • Use an IMDG IMDG
     • Partition events, counters
     • Collocate eventP handlers with data for low
                              B

       latency and simple scaling
     • Use atomic updates to update counters in
                      P       B

       memory
                      P       B
     • Persist asynchronously to a big data store
                                 P                     B


34             ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Reducing Counter Contention - Optimization
 Store processed                             word1
  tweets locally in                           word2
  each partition
                                              word3
 Use periodic batch
  writes to update                                             1 sec
  counters
 Process batches as
  a whole



35                                 35
            ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Keep Your Eyes Open…




36      ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
References
 Learn and fork the code on github:
     https://github.com/Gigaspaces/rt-analytics
 Detailed blog post
     http://bit.ly/gs-bigdata-analytics
 Twitter in numbers:
     http://blog.twitter.com/2011/03/numbers.html
 Twitter Storm:
     http://bit.ly/twitter-storm
 Apache S4
     http://incubator.apache.org/s4/

37               ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
38   ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Contenu connexe

Tendances

Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseAge Mooij
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big DataForwardSprint
 
Big Data
Big DataBig Data
Big DataNGDATA
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopDataWorks Summit
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityDatabricks
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design PatternsAllen Day, PhD
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrowmagda3695
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDatabricks
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
IoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldIoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldDataWorks Summit
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Big Data Spain
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKaran Desai
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Big Data Spain
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Big Data Spain
 

Tendances (20)

Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
An introduction to Big Data
An introduction to Big DataAn introduction to Big Data
An introduction to Big Data
 
Big Data
Big DataBig Data
Big Data
 
ROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on HadoopROI of Big Data Analytics Native on Hadoop
ROI of Big Data Analytics Native on Hadoop
 
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for CybersecurityEmpower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
Empower Splunk and other SIEMs with the Databricks Lakehouse for Cybersecurity
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrow
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn Creator
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
IoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected WorldIoT: How Data Science Driven Software is Eating the Connected World
IoT: How Data Science Driven Software is Eating the Connected World
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
Are we reaching a Data Science Singularity? How Cognitive Computing is emergi...
 

En vedette

Data Analytics on Twitter Feeds
Data Analytics on Twitter FeedsData Analytics on Twitter Feeds
Data Analytics on Twitter FeedsEu Jin Lok
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesKarol Chlasta
 
Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterKamran Munshi
 
Twitter Case Study: Just Falafel
Twitter Case Study: Just FalafelTwitter Case Study: Just Falafel
Twitter Case Study: Just FalafelFadi Khater
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopJames Kebinger
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Rich Heimann
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analyticsrupika08
 
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Kevin Weil
 
The twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategyThe twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategyJohn Ashcroft
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsAlbert Bifet
 
Twitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the WorldTwitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the WorldP J
 
SAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP Technology
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonKrishna Sankar
 

En vedette (20)

Twitter Big Data
Twitter Big DataTwitter Big Data
Twitter Big Data
 
Analyzing social media with Python and other tools (2/4)
Analyzing social media with Python and other tools (2/4) Analyzing social media with Python and other tools (2/4)
Analyzing social media with Python and other tools (2/4)
 
BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3BDACA1617s2 - Lecture3
BDACA1617s2 - Lecture3
 
Data Analytics on Twitter Feeds
Data Analytics on Twitter FeedsData Analytics on Twitter Feeds
Data Analytics on Twitter Feeds
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use cases
 
Discovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @TwitterDiscovery & Consumption of Analytics Data @Twitter
Discovery & Consumption of Analytics Data @Twitter
 
The Big 5- Twitter
The Big 5- TwitterThe Big 5- Twitter
The Big 5- Twitter
 
BDACA1617s2 - Lecture 1
BDACA1617s2 - Lecture 1BDACA1617s2 - Lecture 1
BDACA1617s2 - Lecture 1
 
BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1
 
Twitter Case Study: Just Falafel
Twitter Case Study: Just FalafelTwitter Case Study: Just Falafel
Twitter Case Study: Just Falafel
 
Using Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with HadoopUsing Ruby to do Map/Reduce with Hadoop
Using Ruby to do Map/Reduce with Hadoop
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analytics
 
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
Analyzing Big Data at Twitter (Web 2.0 Expo NYC Sep 2010)
 
The twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategyThe twitter case study 2014 dimensions of strategy
The twitter case study 2014 dimensions of strategy
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
Twitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the WorldTwitter Business Model - 8th Wonder of the World
Twitter Business Model - 8th Wonder of the World
 
SAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data AnalysisSAP HANA in Healthcare: Real-Time Big Data Analysis
SAP HANA in Healthcare: Real-Time Big Data Analysis
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & Python
 

Similaire à Real Time Analytics for Twitter with a Scalable Architecture

Big Data in Real Time
Big Data in Real TimeBig Data in Real Time
Big Data in Real TimeRon Zavner
 
Search Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSearch Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSematext Group, Inc.
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Big data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauBig data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauIsCoolEnt
 
VA Smalltalk Update
VA Smalltalk UpdateVA Smalltalk Update
VA Smalltalk UpdateESUG
 
One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)Ex Machina
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
Microsoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessMicrosoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessFrederik De Bruyne
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Karthik Ramasamy
 
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019Codemotion
 
Using Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementUsing Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementAmazon Web Services
 
Digital Asset Management with Alfresco
Digital Asset Management with AlfrescoDigital Asset Management with Alfresco
Digital Asset Management with Alfrescorivetlogic
 
Big data use cases in the cloud presentation
Big data use cases in the cloud presentationBig data use cases in the cloud presentation
Big data use cases in the cloud presentationTUSHAR GARG
 
Scalable olap with druid
Scalable olap with druidScalable olap with druid
Scalable olap with druidKashif Khan
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerDataWorks Summit
 
Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)Lou Bajuk
 

Similaire à Real Time Analytics for Twitter with a Scalable Architecture (20)

Big Data in Real Time
Big Data in Real TimeBig Data in Real Time
Big Data in Real Time
 
Search Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL BackendSearch Analytics Business Value & NoSQL Backend
Search Analytics Business Value & NoSQL Backend
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Big data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauBig data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteau
 
VA Smalltalk Update
VA Smalltalk UpdateVA Smalltalk Update
VA Smalltalk Update
 
One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)One Million Players Isn't Cool (at Festival of Games 2011)
One Million Players Isn't Cool (at Festival of Games 2011)
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
Microsoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessMicrosoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of business
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
 
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
Horacio Gonzalez - Monitoring OVH - Codemotion Amsterdam 2019
 
Using Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementUsing Big Data to Driving Big Engagement
Using Big Data to Driving Big Engagement
 
Digital Asset Management with Alfresco
Digital Asset Management with AlfrescoDigital Asset Management with Alfresco
Digital Asset Management with Alfresco
 
Big Data
Big DataBig Data
Big Data
 
Big data use cases in the cloud presentation
Big data use cases in the cloud presentationBig data use cases in the cloud presentation
Big data use cases in the cloud presentation
 
Scalable olap with druid
Scalable olap with druidScalable olap with druid
Scalable olap with druid
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging Manager
 
Data Mining & Engineering
Data Mining & EngineeringData Mining & Engineering
Data Mining & Engineering
 
Big Data
Big DataBig Data
Big Data
 
Big data by_mcal
Big data by_mcalBig data by_mcal
Big data by_mcal
 
Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)Extend the Reach of R to the Enterprise (for useR! 2013)
Extend the Reach of R to the Enterprise (for useR! 2013)
 

Plus de Uri Cohen

Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...Uri Cohen
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Uri Cohen
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonUri Cohen
 
Alef event - going open source
Alef event - going open source Alef event - going open source
Alef event - going open source Uri Cohen
 
GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services Uri Cohen
 
In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified! In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified! Uri Cohen
 
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14Uri Cohen
 
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14 Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14 Uri Cohen
 
Deployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and CloudifyDeployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and CloudifyUri Cohen
 
Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!Uri Cohen
 
Changing organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecaseChanging organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecaseUri Cohen
 
GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!Uri Cohen
 
Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community Uri Cohen
 
Oscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now servedOscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now servedUri Cohen
 
OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid! OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid! Uri Cohen
 
One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops Uri Cohen
 
MongoDB in the Clouds
MongoDB in the CloudsMongoDB in the Clouds
MongoDB in the CloudsUri Cohen
 
Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012Uri Cohen
 
Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes Uri Cohen
 
Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud Uri Cohen
 

Plus de Uri Cohen (20)

Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014
 
SSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax LondonSSDs, IMDGs and All the Rest - Jax London
SSDs, IMDGs and All the Rest - Jax London
 
Alef event - going open source
Alef event - going open source Alef event - going open source
Alef event - going open source
 
GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services GigaSpaces XAP for Financial Services
GigaSpaces XAP for Financial Services
 
In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified! In Memory Data Grids, Demystified!
In Memory Data Grids, Demystified!
 
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
App Centric Devops - CloudStack 2014 Collaboration Conference #CCNA14
 
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14 Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
Its the app stupid - CloudStack 2014 Collaboration Conference #CCNA14
 
Deployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and CloudifyDeployment Automation on OpenStack with TOSCA and Cloudify
Deployment Automation on OpenStack with TOSCA and Cloudify
 
Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!Cloud stack collabiration conference - It's the app, stupid!
Cloud stack collabiration conference - It's the app, stupid!
 
Changing organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecaseChanging organizational culture - a sweaty usecase
Changing organizational culture - a sweaty usecase
 
GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!GigaSpaces XAP - Don't Call Me Cache!
GigaSpaces XAP - Don't Call Me Cache!
 
Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community Oscon 2013 - Lessons from building an open source community
Oscon 2013 - Lessons from building an open source community
 
Oscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now servedOscon 2013 -Your OSS Project Is now served
Oscon 2013 -Your OSS Project Is now served
 
OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid! OpenStack Israel Summit 2013 - It’s the App, Stupid!
OpenStack Israel Summit 2013 - It’s the App, Stupid!
 
One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops One Does Not Simply Walk Into Devops
One Does Not Simply Walk Into Devops
 
MongoDB in the Clouds
MongoDB in the CloudsMongoDB in the Clouds
MongoDB in the Clouds
 
Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012Carrier Paas - CloudStack Collaboration Event 2012
Carrier Paas - CloudStack Collaboration Event 2012
 
Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes Your Apps on the Cloud - What it really takes
Your Apps on the Cloud - What it really takes
 
Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud Cassandra summit - Big Data Apps on the cloud
Cassandra summit - Big Data Apps on the cloud
 

Dernier

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Dernier (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Real Time Analytics for Twitter with a Scalable Architecture

  • 1. Real Time Analytics for Big Data A Twitter Case Study Uri Cohen Head of Product @uri1803
  • 2. Big Data Predictions “Over the next few years we'll see the adoption of scalable frameworks and platforms for handling streaming, or near real-time, analysis and processing. In the same way that Hadoop has been borne out of large-scale web applications, these platforms will be driven by the needs of large- scale location-aware mobile, social and sensor use.” Edd Dumbill, O’REILLY ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved 2
  • 3. We’re Living in a Real Time World… Facebook Real Time Real Time User Google RT Web Analytics Social Analytics Engagement Twitter paid tweet analytics New Real Time Google Real Time Search Analytics Startups 3 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 4. 3 Flavors to Analytics Counting Correlating Research 4 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 5. Analytics @ Twitter – Counting  How many signups, tweets, retweet s for a topic?  What’s the average latency?  Demographics  Countries and cities  Gender  Age groups  Device types  … 5 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 6. Analytics @ Twitter – Correlating  What devices fail at the same time?  What features get user hooked?  What places on the globe are “happening”? 6 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 7. Analytics @ Twitter – Research  Sentiment analysis  “Obama is popular”  Trends  “People like to tweet after watching American Idol”  Spam patterns  How can you tell when a user spams? 7 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 8. It’s All about Timing “Real time” Reasonably Quick Batch (< few Seconds) (seconds - minutes) (hours/days) 8 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 9. It’s All about Timing • Event driven / stream processing • High resolution – every tweet gets counted • Ad-hoc querying This is what • Medium resolution (aggregations) here we’re to discuss  • Long running batch jobs (ETL, map/reduce) • Low resolution (trends & patterns) 9 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 10. Twitter in Numbers (March 2011) It takes a week for users to send 1 billion Tweets. Source: http://blog.twitter.com/2011/03/numbers.html 10 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 11. Twitter in Numbers (March 2011) On average, 140 million tweets get sent every day. Source: http://blog.twitter.com/2011/03/numbers.html 11 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 12. Twitter in Numbers (March 2011) The highest throughput to date is 6,939 tweets/sec. Source: http://blog.twitter.com/2011/03/numbers.html 12 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 13. Twitter in Numbers (March 2011) 460,000 new accounts are created daily. Source: http://blog.twitter.com/2011/03/numbers.html 13 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 14. Twitter in Numbers 5% of the users generate 75% of the content. Source: http://www.sysomos.com/insidetwitter/ 14 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 15. Challenge 1 – Computing Reach 15 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 16. Challenge 1 – Computing Reach Tweets Followers Distinct Followers Count Reach 16 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 17. Challenge 2 – Word Count Tweets Count Word:Count • Hottest topics • URL mentions • etc. 17 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 18. URL Mentions – Here’s One Use Case 18 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 19. Analyze the Problem  (Tens of) thousands of tweets per second to process  Assumption: Need to process in near real time  Aggregate counters for each word  A few 10s of thousands of words (or hundreds of thousands if we include URLs)  System needs to linearly scale 19 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 20. It’s Really Just an Online Map/Reduce 20 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 21. Event Driven Flow Store Research Raw Tokenize Filter Count Store Counters 21 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 22. It’s Not That Simple…  (Tens) of thousands of tweets to tokenize every second, even more words to filter  CPU bottleneck  Tens/hundreds of thousands of counters to update  Counters contention  Tens/hundreds of thousands of counters to persist  Database bottleneck  (Tens) of thousands of tweets to store every second in the database  Database bottleneck 22 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 23. Implementation Challenges  Twitter is growing by the day, how can this scale?  Fault tolerance. What happens if a server fails?  Consistency. Counters should be accurate! 23 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 24. Solutions, Solutions 24 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 25. Dealing with the Massive Tweet Stream  We need a system that could scale linearly  Event partitioning to the rescue!  What to partition by? Tokenizer1 Filterer 1 Tokenizer2 Filterer 2 Tokenizer Filterer 3 3 Tokenizer Filterer n n 25 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 26. Counters Persistence & Contention  Why not keep things in memory?  Treat it as a SoR IMDG Counter Tokenizer1 Filterer 1 Updater 1 Counter Tokenizer2 Filterer 2 Updater 2 Tokenizer Counter Filterer 3 3 Updater 3 Tokenizer Counter Filterer n n Updater n 26 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 27. Counters Persistence & Contention  Why not keep things in memory (data grid)?  Question: How to partition counters? IMDG Counter Tokenizer1 Filterer 1 Updater 1 Counter Tokenizer2 Filterer 2 Updater 2 Tokenizer Counter Filterer 3 3 Updater 3 Tokenizer Counter Filterer n n Updater n 27 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 28. Counters Persistence & Contention  Why not keep things in memory?  Question: How to partition counters? in Facebook keeps 80% of its data Memory (Stanford research) IMDG Counter Tokenizer1 Filterer 1 Updater 1 RAM is 100-1000x faster than Disk Counter (Random seek) Tokenizer2 Filterer 2 Updater 2 • Disk: 5 -10ms Tokenizer Counter Filterer 3 • RAM: ~0.001msec 3 Updater 3 Tokenizer Counter Filterer n n Updater n 28 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 29. Database Bottleneck – Storing Raw Tweets  RDBMS won’t cut it (unless you have $$$$$)  Hadoop / NoSQL to the rescue  Use the data grid for batching and as a reliable transaction log Persister 1 Persister 2 Persister n 29 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 30. Counter Consistency & Resiliency  Primary/hot backup setup, sync replication  Transactional, atomic updates IMDG  “On Demand” backups P B P B P B P B 30 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 31. Counter Consistency & Resiliency  Primary/hot backup setup, sync replication  Transactional, atomic updates IMDG  “On Demand” backups P B P P B P B P B 31 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 32. Implementing Event Flows  Use the data grid as event bus  Stateful objects as transactional events Raw Tokenizer Tokenized Filterer Filtered Counter P B 32 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 33. Putting It All Together IMDG P B P B B P B P 33 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 34. Putting It All Together • Use an IMDG IMDG • Partition events, counters • Collocate eventP handlers with data for low B latency and simple scaling • Use atomic updates to update counters in P B memory P B • Persist asynchronously to a big data store P B 34 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 35. Reducing Counter Contention - Optimization  Store processed word1 tweets locally in word2 each partition word3  Use periodic batch writes to update 1 sec counters  Process batches as a whole 35 35 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 36. Keep Your Eyes Open… 36 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 37. References  Learn and fork the code on github: https://github.com/Gigaspaces/rt-analytics  Detailed blog post http://bit.ly/gs-bigdata-analytics  Twitter in numbers: http://blog.twitter.com/2011/03/numbers.html  Twitter Storm: http://bit.ly/twitter-storm  Apache S4 http://incubator.apache.org/s4/ 37 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
  • 38. 38 ® Copyright 2011 Gigaspaces Ltd. All Rights Reserved

Notes de l'éditeur

  1. ActiveInsight