SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
T : @markrittman
HOW A TWEET WENT VIRAL
Mark Rittman, Oracle ACE Director & Independent Analyst

MJR Analytics ltd (http://www.mjr-analytics.com)
BIWA SUMMIT 2017, SAN FRANCISCO
•Oracle ACE Director, now Independent Analyst
•Past ODTUG Executive Board Member
•Author of two books on Oracle BI
•Co-founder & CTO of Rittman Mead
•15+ Years in Oracle BI, DW, ETL + now Big Data
•Now working in analytics product management + strategy
•Host of the Drill to Detail Podcast (www.drilltodetail.com)
•Based in Brighton & work in London, UK
About The Presenter
2
3
HOME AUTOMATION
3
•One of my personal interests is Home Automation
•Started with Nest thermostat and Philips Hue lights
•Extended the Nest system to include 

Nest Protect and Nest Cam
•Used Apple HomeKit, Apple TV for Siri voice control
•Added Samsung Smart Things hub for Z-wave, 

Zigbee compatibility
•Linked Smart Things to Homekit using open-source
HomeBridge project to enable for Siri control
•Added Logitech Harmony for TV, Console, Roku
Home Automation and Smart ‘IoT’ Devices
Philips Hue 

Lighting
Nest Protect (X2), 

Thermostat, Cam
Withings

Smart Scales
Airplay

Speakers
Homebridge

Homekit / Smarthings 

Connector
Samsung

Smart Things
Hub (Z-Wave, Zigbee)
Door, Motion, Moisture,

Presence Sensors
Apple Homekit,

Apple TV, Siri
•Then Amazon Echo (x2) and Echo Dots (x4) to

extend voice control + add Alexa skills
•… and then Google Home + Chromecasts for

hangouts, Google Assistant + Google Search
Voice Control - When Home Automation Gets Real
Philips Hue 

Lighting
Nest Protect (X2), 

Thermostat, Cam
Withings

Smart Scales
Airplay

Speakers
Samsung

Smart Things
Hub (Z-Wave, 

Zigbee)
Door, Motion, Moisture,

Presence Sensors
•Position multiple units around the house for 

ubiquitous voice control and music playback
•Integration with smart home devices
•Use ML algorithms in the cloud, constantly
improving and leveraging cloud-scale processing
“Alexa, turn on the
kitchen lights”
“Hey Google, turn up
the heating”
Amazon Echo
Google Home
ONE DAY BACK IN SEPTEMBER 2016 …
6
7
THE FOLLOWING MORNING…
8
9
10
11
12
13
14
15
BUT WAIT…
16
THIS COULD BE INTERESTING…
17
18
All Device Data at Home Logged to Hadoop Cluster
•Data extracted or transported to target platform using LogStash, CSV file batch loads
•Landed into HDFS as JSON documents, then exposed as Hive tables using Storage Handler
•Cataloged, visualised and analysed using Oracle Big Data Discovery + Python ML
Other Personal Project : Home + Wearables Analytics
19
Data Transfer Data Access
“Personal” Data Lake
Jupyter

Web Notebook
6 Node Hadoop Cluster (CDH5.5)
Discovery & Development Labs

Oracle Big Data Discovery 1.2
Data sets and samples
Models and programs
Oracle DV

Desktop
Models
BDD Shell,

Python, 

Spark ML
Data Factory
LogStash

via HTTP
Manual

CSV U/L
Data streams
CSV, IFTTT

or API call
Raw JSON log files
in HDFS
Each document an
event, daily record or
comms message
Hive Tables

w/ Elastic

Storage Handler
Index data turned
into tabular format
Health Data
Unstructured Comms
Data
Smart Home

Sensor Data
20
21
22
THIS TIME LAST YEAR…
23
24
•Graph, spatial and raster data processing for big data
•Primarily documented + tested against Oracle BDA
•Installable on commodity cluster using CDH
•Data stored in Apache HBase or Oracle NoSQL DB
•Complements Spatial & Graph in Oracle Database
•Designed for trillions of nodes, edges etc
•Out-of-the-box spatial enrichment services
•Over 35 of most popular graph analysis functions
•Graph traversal, recommendations
•Finding communities and influencers,
•Pattern matching
Oracle Big Data Spatial & Graph
25
CAN WE USE GRAPH ANALYSIS

AND ORACLE BIG DATA TO FIND OUT…
26
HOW THIS TWEET WENT VIRAL?
27
28
AND AROUND THE WORLD IN 24 HOURS?
29
30
3454Tweets, retweets and mentions 

over 48 hours
3017Twitters users commenting
30+Number of countries ‘WiFi Kettle” 

became meme or news item
•Tweets in HDFS files processed and transformed into OBDGS file format for HBase load
Loading Tweets (Edges) And Users (Vertices)
• Unique ID for the vertex
• Integer added via sequence in ODI
• Property name (“name”, “followers”)
• Vertex Property datatype and value
Vertex File (.opv)
• Unique ID for the edge
• Leading edge vertex ID
• Trailing edge vertex ID
• Edge Type (“tweet”)
• Edge Property (“timestamp” or “location”)
• Edge Property datatype and value
Edge File (.ope)
•Data loaded from files or through Java API into HBase
•In-Memory Analytics layer runs common graph and spatial algorithms on data
•Visualised using Cytoscape, R or in this example, Tom Sawyer Perspectives
Oracle Big Data Graph And Spatial Architecture
32
Massively Scalable Graph Store
• Oracle NoSQL
• HBase
Lightning-Fast In-Memory Analytics
• YARN Container
• Standalone Server
• Embedded
cfg = GraphConfigBuilder.forPropertyGraphHbase() 
.setName("connectionsHBase") 
.setZkQuorum("bigdatalite").setZkClientPort(2181) 
.setZkSessionTimeout(120000).setInitialEdgeNumRegions(3) 
.setInitialVertexNumRegions(3).setSplitsPerRegion(1) 
.build();
opg = OraclePropertyGraph.getInstance(cfg);
opg.clearRepository();
vfile=“../../data/kettle_nodes.opv"
efile=“../../data/kettle_edges.ope"
opgdl=OraclePropertyGraphDataLoader.getInstance();
opgdl.loadData(opg, vfile, efile, 2);
// read through the vertices
opg.getVertices();
// read through the edges
opg.getEdges();
Loading Edges And Vertices Into Hbase
33
Uses “Gremlin” Shell for HBase
• Creates connection to HBase
• Sets initial configuration for database
• Builds the database ready for load
• Defines location of Vertex and Edge files
• Creates instance of 

OraclePropertyGraphDataLoader
• Loads data from files
• Prepares the property graph for use
• Loads in Edges and Vertices
• Now ready for in-memory processing
•Plugin created by Oracle to add to open-source Cytoscape analysis tool
•Connects to HBase or NoSQL property graph
•Connect to PGX analytics engine
•Run Page Rank and other analyses
•Visualize property graph on-screen
•Search for nodes and edges using

Apache Solr search engine
Visualize And Analyze Using Cytoscape Plugin
34
Top 5 Influencers Based On Mentions, Retweets
35
36
@markrittman
@erinscafe
@internetofshit
•The story was picked-up by several influential Twitter users and online news sites
•ErinsCafe, BoingBoing, Internet of Sh*t
•Featured as a “Twitter Moment” on Day 1 PM
•Guardian Newspaper website Day 2 AM
•Influencers identified in two ways
•By number of followers in Twitter profile
•By number of connecting edges in tweets

Property Graph using Page Rank algorithm
Role Of Network Influencers In Meme Propagation
37
•But … how did they hear about the story?
Understanding How A User Joined Conversation
38
Visualising Potential Story Paths To Influencer Nodes
39
@philjoneswired
But did this tweet cause, or
just comment on, the virality?
We need to see the timeline…
•Filters PGX analysis on timestamp edge or vertex property when present in property graph
•Select start date, optional end date for filter
•Supports two-sided timeline in directed graphs
•View property graph as it develops over time
New Cytoscape Plugin Feature - Timeline Analysis
40
•The Timeline Analysis plugin for Cytoscape is useful and helps us filter by date range
•Another option for visualising property graphs is Tom Sawyer Perspectives
•Timeline analysis down to the hour - 3hr periods are perfect for this analysis
•Map visualization, network visualization
•Prototype using subsets of tweets in CSV files, or connect to full HBase/NoSQL dataset
Tom Sawyer Perspectives For Social Network Analysis
42
43
Day 1 : 10am GMT
44
Day 1 : 3pm GMT
45
Day 1 : 6pm GMT
46
Day 1 : 8pm GMT
47
Day 2 : 6am GMT
IT WAS @ERINSCAFE
48
IT WAS @ERINSCAFE
49
50
51
AND USING SPATIAL CO-ORDINATES

IN THE TWEET METADATA…
52
•Tweet went viral because it was picked-up on by a very well-connected Twitter user
•And why did that happen? Probably because the story “had legs”…
•Some mentions in the Twitter-verse before this but main viral explosion due to @erinscafe
•All subsequent activity including mentions by @guardian, @internetofsh*t followed that
•You can analyse Twitter and other meme breakouts using Oracle Big Data Spatial & Graph
•New Timeline Analysis feature in Cytoscape Plugin useful for time-slice analysis of data
•Tom Sawyer Perspectives provides even more visualisation incl. mapping analysis capabilities
•Thank you to Alan Wu, Juan Francisco & Hans Viehmann from Oracle, 

and Kevin Madden & Austris Krastiņš from Tom Sawyer for their help with the demos
Conclusions
54
T : @markrittman
HOW A TWEET WENT VIRAL
Mark Rittman, Oracle ACE Director & Independent Analyst

MJR Analytics ltd (http://www.mjr-analytics.com)
BIWA SUMMIT 2017, SAN FRANCISCO

Contenu connexe

Tendances

Tendances (20)

Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
Social Network Analysis using Oracle Big Data Spatial & Graph (incl. why I di...
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
SQL-on-Hadoop for Analytics + BI: What Are My Options, What's the Future?
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
 
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
Hadoop Data Lake vs classical Data Warehouse: How to utilize best of both wor...
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 
Intelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff PollockIntelligent Integration OOW2017 - Jeff Pollock
Intelligent Integration OOW2017 - Jeff Pollock
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
 
Big Data & Data Lakes Building Blocks
Big Data & Data Lakes Building BlocksBig Data & Data Lakes Building Blocks
Big Data & Data Lakes Building Blocks
 
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 

Similaire à How a Tweet Went Viral - BIWA Summit 2017

Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
Christopher Whitaker
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
Tillmann Eitelberg
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptx
VanshGupta597842
 

Similaire à How a Tweet Went Viral - BIWA Summit 2017 (20)

Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Decode2018 report
Decode2018 reportDecode2018 report
Decode2018 report
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSABetter Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
 
Music streams
Music streamsMusic streams
Music streams
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
UI Dev in Big data world using open source
UI Dev in Big data world using open sourceUI Dev in Big data world using open source
UI Dev in Big data world using open source
 
Big data berlin
Big data berlinBig data berlin
Big data berlin
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurge
 
HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25HBase Meetup @ Cask HQ 09/25
HBase Meetup @ Cask HQ 09/25
 
New big data architecture in hadoop.pptx
New big data architecture in hadoop.pptxNew big data architecture in hadoop.pptx
New big data architecture in hadoop.pptx
 
20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge
 

Plus de Rittman Analytics

Plus de Rittman Analytics (15)

From Zero to One with Rittman Analytics
From Zero to One with Rittman AnalyticsFrom Zero to One with Rittman Analytics
From Zero to One with Rittman Analytics
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
User Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity ModelUser Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity Model
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Planning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data WarehousingPlanning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data Warehousing
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data WarehouseData Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
 
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursAnalytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and Looker
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

How a Tweet Went Viral - BIWA Summit 2017

  • 1. T : @markrittman HOW A TWEET WENT VIRAL Mark Rittman, Oracle ACE Director & Independent Analyst
 MJR Analytics ltd (http://www.mjr-analytics.com) BIWA SUMMIT 2017, SAN FRANCISCO
  • 2. •Oracle ACE Director, now Independent Analyst •Past ODTUG Executive Board Member •Author of two books on Oracle BI •Co-founder & CTO of Rittman Mead •15+ Years in Oracle BI, DW, ETL + now Big Data •Now working in analytics product management + strategy •Host of the Drill to Detail Podcast (www.drilltodetail.com) •Based in Brighton & work in London, UK About The Presenter 2
  • 4. •One of my personal interests is Home Automation •Started with Nest thermostat and Philips Hue lights •Extended the Nest system to include 
 Nest Protect and Nest Cam •Used Apple HomeKit, Apple TV for Siri voice control •Added Samsung Smart Things hub for Z-wave, 
 Zigbee compatibility •Linked Smart Things to Homekit using open-source HomeBridge project to enable for Siri control •Added Logitech Harmony for TV, Console, Roku Home Automation and Smart ‘IoT’ Devices Philips Hue 
 Lighting Nest Protect (X2), 
 Thermostat, Cam Withings
 Smart Scales Airplay
 Speakers Homebridge
 Homekit / Smarthings 
 Connector Samsung
 Smart Things Hub (Z-Wave, Zigbee) Door, Motion, Moisture,
 Presence Sensors Apple Homekit,
 Apple TV, Siri •Then Amazon Echo (x2) and Echo Dots (x4) to
 extend voice control + add Alexa skills •… and then Google Home + Chromecasts for
 hangouts, Google Assistant + Google Search
  • 5. Voice Control - When Home Automation Gets Real Philips Hue 
 Lighting Nest Protect (X2), 
 Thermostat, Cam Withings
 Smart Scales Airplay
 Speakers Samsung
 Smart Things Hub (Z-Wave, 
 Zigbee) Door, Motion, Moisture,
 Presence Sensors •Position multiple units around the house for 
 ubiquitous voice control and music playback •Integration with smart home devices •Use ML algorithms in the cloud, constantly improving and leveraging cloud-scale processing “Alexa, turn on the kitchen lights” “Hey Google, turn up the heating” Amazon Echo Google Home
  • 6. ONE DAY BACK IN SEPTEMBER 2016 … 6
  • 7. 7
  • 9. 9
  • 10. 10
  • 11. 11
  • 12. 12
  • 13. 13
  • 14. 14
  • 15. 15
  • 17. THIS COULD BE INTERESTING… 17
  • 18. 18 All Device Data at Home Logged to Hadoop Cluster
  • 19. •Data extracted or transported to target platform using LogStash, CSV file batch loads •Landed into HDFS as JSON documents, then exposed as Hive tables using Storage Handler •Cataloged, visualised and analysed using Oracle Big Data Discovery + Python ML Other Personal Project : Home + Wearables Analytics 19 Data Transfer Data Access “Personal” Data Lake Jupyter
 Web Notebook 6 Node Hadoop Cluster (CDH5.5) Discovery & Development Labs
 Oracle Big Data Discovery 1.2 Data sets and samples Models and programs Oracle DV
 Desktop Models BDD Shell,
 Python, 
 Spark ML Data Factory LogStash
 via HTTP Manual
 CSV U/L Data streams CSV, IFTTT
 or API call Raw JSON log files in HDFS Each document an event, daily record or comms message Hive Tables
 w/ Elastic
 Storage Handler Index data turned into tabular format Health Data Unstructured Comms Data Smart Home
 Sensor Data
  • 20. 20
  • 21. 21
  • 22. 22
  • 23. THIS TIME LAST YEAR… 23
  • 24. 24
  • 25. •Graph, spatial and raster data processing for big data •Primarily documented + tested against Oracle BDA •Installable on commodity cluster using CDH •Data stored in Apache HBase or Oracle NoSQL DB •Complements Spatial & Graph in Oracle Database •Designed for trillions of nodes, edges etc •Out-of-the-box spatial enrichment services •Over 35 of most popular graph analysis functions •Graph traversal, recommendations •Finding communities and influencers, •Pattern matching Oracle Big Data Spatial & Graph 25
  • 26. CAN WE USE GRAPH ANALYSIS
 AND ORACLE BIG DATA TO FIND OUT… 26
  • 27. HOW THIS TWEET WENT VIRAL? 27
  • 28. 28
  • 29. AND AROUND THE WORLD IN 24 HOURS? 29
  • 30. 30 3454Tweets, retweets and mentions 
 over 48 hours 3017Twitters users commenting 30+Number of countries ‘WiFi Kettle” 
 became meme or news item
  • 31. •Tweets in HDFS files processed and transformed into OBDGS file format for HBase load Loading Tweets (Edges) And Users (Vertices) • Unique ID for the vertex • Integer added via sequence in ODI • Property name (“name”, “followers”) • Vertex Property datatype and value Vertex File (.opv) • Unique ID for the edge • Leading edge vertex ID • Trailing edge vertex ID • Edge Type (“tweet”) • Edge Property (“timestamp” or “location”) • Edge Property datatype and value Edge File (.ope)
  • 32. •Data loaded from files or through Java API into HBase •In-Memory Analytics layer runs common graph and spatial algorithms on data •Visualised using Cytoscape, R or in this example, Tom Sawyer Perspectives Oracle Big Data Graph And Spatial Architecture 32 Massively Scalable Graph Store • Oracle NoSQL • HBase Lightning-Fast In-Memory Analytics • YARN Container • Standalone Server • Embedded
  • 33. cfg = GraphConfigBuilder.forPropertyGraphHbase() .setName("connectionsHBase") .setZkQuorum("bigdatalite").setZkClientPort(2181) .setZkSessionTimeout(120000).setInitialEdgeNumRegions(3) .setInitialVertexNumRegions(3).setSplitsPerRegion(1) .build(); opg = OraclePropertyGraph.getInstance(cfg); opg.clearRepository(); vfile=“../../data/kettle_nodes.opv" efile=“../../data/kettle_edges.ope" opgdl=OraclePropertyGraphDataLoader.getInstance(); opgdl.loadData(opg, vfile, efile, 2); // read through the vertices opg.getVertices(); // read through the edges opg.getEdges(); Loading Edges And Vertices Into Hbase 33 Uses “Gremlin” Shell for HBase • Creates connection to HBase • Sets initial configuration for database • Builds the database ready for load • Defines location of Vertex and Edge files • Creates instance of 
 OraclePropertyGraphDataLoader • Loads data from files • Prepares the property graph for use • Loads in Edges and Vertices • Now ready for in-memory processing
  • 34. •Plugin created by Oracle to add to open-source Cytoscape analysis tool •Connects to HBase or NoSQL property graph •Connect to PGX analytics engine •Run Page Rank and other analyses •Visualize property graph on-screen •Search for nodes and edges using
 Apache Solr search engine Visualize And Analyze Using Cytoscape Plugin 34
  • 35. Top 5 Influencers Based On Mentions, Retweets 35
  • 37. •The story was picked-up by several influential Twitter users and online news sites •ErinsCafe, BoingBoing, Internet of Sh*t •Featured as a “Twitter Moment” on Day 1 PM •Guardian Newspaper website Day 2 AM •Influencers identified in two ways •By number of followers in Twitter profile •By number of connecting edges in tweets
 Property Graph using Page Rank algorithm Role Of Network Influencers In Meme Propagation 37 •But … how did they hear about the story?
  • 38. Understanding How A User Joined Conversation 38
  • 39. Visualising Potential Story Paths To Influencer Nodes 39 @philjoneswired But did this tweet cause, or just comment on, the virality? We need to see the timeline…
  • 40. •Filters PGX analysis on timestamp edge or vertex property when present in property graph •Select start date, optional end date for filter •Supports two-sided timeline in directed graphs •View property graph as it develops over time New Cytoscape Plugin Feature - Timeline Analysis 40
  • 41. •The Timeline Analysis plugin for Cytoscape is useful and helps us filter by date range •Another option for visualising property graphs is Tom Sawyer Perspectives •Timeline analysis down to the hour - 3hr periods are perfect for this analysis •Map visualization, network visualization •Prototype using subsets of tweets in CSV files, or connect to full HBase/NoSQL dataset Tom Sawyer Perspectives For Social Network Analysis 42
  • 42. 43 Day 1 : 10am GMT
  • 43. 44 Day 1 : 3pm GMT
  • 44. 45 Day 1 : 6pm GMT
  • 45. 46 Day 1 : 8pm GMT
  • 46. 47 Day 2 : 6am GMT
  • 49. 50
  • 50. 51
  • 51. AND USING SPATIAL CO-ORDINATES
 IN THE TWEET METADATA… 52
  • 52.
  • 53. •Tweet went viral because it was picked-up on by a very well-connected Twitter user •And why did that happen? Probably because the story “had legs”… •Some mentions in the Twitter-verse before this but main viral explosion due to @erinscafe •All subsequent activity including mentions by @guardian, @internetofsh*t followed that •You can analyse Twitter and other meme breakouts using Oracle Big Data Spatial & Graph •New Timeline Analysis feature in Cytoscape Plugin useful for time-slice analysis of data •Tom Sawyer Perspectives provides even more visualisation incl. mapping analysis capabilities •Thank you to Alan Wu, Juan Francisco & Hans Viehmann from Oracle, 
 and Kevin Madden & Austris Krastiņš from Tom Sawyer for their help with the demos Conclusions 54
  • 54. T : @markrittman HOW A TWEET WENT VIRAL Mark Rittman, Oracle ACE Director & Independent Analyst
 MJR Analytics ltd (http://www.mjr-analytics.com) BIWA SUMMIT 2017, SAN FRANCISCO