.
Contributions
1. a brand new method for crawling social networks
2. a framework that can be used by social media to evaluate impact
◦ impact = probability for tweets to show up in hashtag streams
3. example analysis based on the above
The goal is to reverse engineer the hashtag algorithm
M.Zhanikeev -- maratishe@gmail.com -- Reverse Engineering Twitter Hashtag Algorithm -- http://bit.do/marat140614 2/21
.
Twitter Hashtags
.
Hashtag Streams
Hashtag Streams are streams of tweets that show up when people search Twitter
• a hashtag is the best way to search
• note: Twitter is trying to phase out hashtags (and mentions), so search may find tweets even without hashtags
Hashtags are important because they are used by social media to promote events, products, etc.
.
Twitter Infographics
• Twitter promotes hashtags by releasing infographics
• the content is very confusing for social media
• hard to translate into numbers, concrete actions, etc.
.
Twitter Infographics (2) : Zoom-Ins
.
Twitter Infographics (3) : Cleanup
[Flowchart: Decide → New Tag? → Will you promote it? / Will you add value? → Add to hashtag stream (YES) or Out (NO)]
• with all the garbage cleaned out, the decision algorithm is much clearer
• it still does not clarify what value or promotion mean in practice
• since Twitter does not help, we need to reverse engineer the algorithm
.
Crawling vs Sampling
.
Crawling : Practice and Problems
• traditional crawling is done on the command line using wget or curl
• problem 1: Twitter and others try to avoid being crawled and have put up fences (login, cookies, forwarding, JS post-loading, etc.)
• problem 2: official APIs are very restricted; the Twitter API does not cover search
• problem 3: hard to use other services while crawling, e.g. Twitter + YouTube
.
Snowball Sampling
• a new way to look at sampling
• done in cycles:
1. sample something
2. select a wanted subset
3. sample the subset at a higher depth
4. repeat
• snowball sampling is directly applicable to crawling Twitter
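The sampling cycle above can be sketched in Python as follows. This is only a minimal illustration, not Twaater's code: the function names `snowball_sample`, `toy_fetch` and `toy_select`, and the toy hashtag graph, are hypothetical stand-ins for whatever actually performs the sampling and the selection.

```python
from typing import Callable, Dict, Iterable, List, Set


def snowball_sample(
    seeds: Iterable[str],
    fetch: Callable[[str], List[str]],
    select: Callable[[List[str]], List[str]],
    cycles: int,
) -> Set[str]:
    """Run up to `cycles` rounds of sample -> select -> sample deeper."""
    seen: Set[str] = set(seeds)
    frontier: List[str] = sorted(seen)
    for _ in range(cycles):
        # 1. sample something: collect everything reachable from the frontier
        sampled: List[str] = []
        for item in frontier:
            sampled.extend(fetch(item))
        # 2. select a wanted subset, dropping duplicates and visited items
        wanted = list(dict.fromkeys(x for x in select(sampled) if x not in seen))
        if not wanted:
            break  # nothing new to sample, stop early
        # 3. the subset becomes the frontier that is sampled at the next depth
        seen.update(wanted)
        frontier = wanted
    return seen


# Hypothetical toy data: hashtag -> hashtags found alongside it.
TOY_GRAPH: Dict[str, List[str]] = {
    "#a": ["#b", "#c"],
    "#b": ["#d"],
    "#c": ["#spam"],
    "#d": [],
}


def toy_fetch(tag: str) -> List[str]:
    return TOY_GRAPH.get(tag, [])


def toy_select(tags: List[str]) -> List[str]:
    # keep everything except one (assumed) unwanted tag
    return [t for t in tags if t != "#spam"]


result = snowball_sample(["#a"], toy_fetch, toy_select, cycles=3)
```

In this toy run the sample grows from `#a` to `#b` and `#c`, then to `#d`, while `#spam` is never selected and so is never sampled further.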
.
Crawling : Two Approaches
• approach 1 (traditional) : use APIs (HTTP, OAuth, etc.) to get data
• approach 2 (proposed) : attach your robot to a working Twitter webapp in the browser
◦ interaction is via clicks, just like a human
◦ more natural
.
Implementation
.
Implementation : Twaater
• Chrome extension, auto-triggered by loading a Twitter page
• stores logs in one's own Dropbox drive
.
Implementation : Twaater
• https://github.com/maratishe/twaater
• personalization
1. change the Dropbox auth tokens to point to one's own drive
2. log in to Twitter under one's own account and let Twaater pick up from there
• runs continuously; close the browser to stop
.
Example Analysis
.
Twaater : Metric Space
• tweet metrics/counts: links, retweets, favorites, tags, tagstatus, mentions
• + account metrics/counts: tweets, following, followers
.
Twaater : Tweet Timeline
• all metrics change over time
• the timeline of one tweet is very important
• aggregates tweet status and its position (if any) in hashtag streams
◦ for each hashtag contained in a tweet
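One possible way to hold such a timeline in memory is sketched below. The layout is an assumption made for illustration; the slides do not describe Twaater's actual log format, and the names `Snapshot` and `TweetTimeline` and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    """One observation of a tweet at a point in time (hypothetical layout)."""
    time: float                          # e.g. seconds since the tweet was first seen
    metrics: Dict[str, int]              # e.g. {"retweets": 3, "favorites": 5}
    positions: Dict[str, Optional[int]]  # hashtag -> list position, None if not shown


@dataclass
class TweetTimeline:
    """All snapshots of one tweet, kept in time order."""
    tweet_id: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def add(self, snapshot: Snapshot) -> None:
        self.snapshots.append(snapshot)
        self.snapshots.sort(key=lambda s: s.time)

    def metric_series(self, name: str) -> List[int]:
        # time series of one metric; a missing metric counts as 0
        return [s.metrics.get(name, 0) for s in self.snapshots]

    def position_series(self, hashtag: str) -> List[Optional[int]]:
        # time series of the tweet's position in one hashtag stream
        return [s.positions.get(hashtag) for s in self.snapshots]


# Made-up observations, added out of order on purpose.
timeline = TweetTimeline("t1")
timeline.add(Snapshot(60.0, {"favorites": 4}, {"#demo": 2}))
timeline.add(Snapshot(0.0, {"favorites": 1}, {"#demo": None}))
favorites = timeline.metric_series("favorites")
positions = timeline.position_series("#demo")
```

Keeping one position per hashtag in every snapshot mirrors the last bullet: a tweet with several hashtags gets one position series per hashtag stream.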
.
Analysis : Rules and CCF
• lists : time series of metrics versus time series of positions in hashtag streams
◦ ccf( metric values, hashtag positions)
◦ note that there are "all" and "top" hashtag streams
• selection : pick the max in a time series, and filter lists by threshold
◦ thresholds are different for each metric
◦ helps to filter out noise or focus only on large (important) values
• view showing up in hashtag streams as binary (yes/no) versus analog (list position) values
• extras (future work) : analysis along the timeline, much higher complexity
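The selection and ccf steps above can be sketched as follows. This is a hedged illustration rather than the analysis code behind the slides: it assumes the threshold is a fraction of the maximum metric value, computes the cross-correlation as a Pearson correlation over the overlapping part of the two series at a given lag, and uses made-up numbers. The names `filter_by_threshold` and `ccf` are chosen here for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def filter_by_threshold(
    metric: Sequence[float], positions: Sequence[float], threshold: float
) -> Tuple[List[float], List[float]]:
    """Keep pairs whose metric value is at least `threshold` * max(metric)."""
    cutoff = threshold * max(metric)
    kept = [(m, p) for m, p in zip(metric, positions) if m >= cutoff]
    return [m for m, _ in kept], [p for _, p in kept]


def ccf(x: Sequence[float], y: Sequence[float], lag: int = 0) -> float:
    """Correlation of x[t] with y[t + lag] over the overlap; result in [-1, 1]."""
    if lag >= 0:
        xs, ys = x[: len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[: len(y) + lag]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


# Made-up data: favorites counts and list positions (1 = top of the stream).
favorites = [0, 1, 0, 2, 10, 20, 30]
stream_positions = [5, 9, 4, 8, 3, 2, 1]

raw = ccf(favorites, stream_positions)
large_m, large_p = filter_by_threshold(favorites, stream_positions, 0.3)
filtered = ccf(large_m, large_p)
```

With list positions counted from the top, a negative value means that larger metric values go together with positions closer to the top. In this toy data the filtered series is perfectly anti-correlated, while the unfiltered one is diluted by the small values.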
.
Analysis : Results
[Figure: four panels (all/binary, top/binary, all/actual, top/actual), each plotting ccf (y-axis, -1.05 to 1.05) against Threshold (% of max) (x-axis, 0 to 0.5), with one curve per metric: tags, links, mentions, retweets, favorites, tweets, following, followers, tagstatus]
• binary: useless
• analog: filtering out very low values (most) helps reveal good correlation
◦ for example, favorites contributes to tweets showing up closer to the top in lists
• account metrics: show no effect
• among large values, tagstatus (topic popularity) becomes prominent
.
Future Work
• Twaater is own-centric, which makes it possible to crowdsource/distribute crawling
◦ fits the description of snowball sampling
• 2nd order statistics (CCF) did not reveal a simple hashtag algorithm
◦ more complicated models have to be tested
• alternatively, smarter filtering can also help
◦ ... select a subset of important tweets to subject to analysis
.
That’s all, thank you ...
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Reverse Engineering Twitter Hashtag Algorithm

  • 1.
  • 2. Contributions
      1. a new method for crawling social networks
      2. a framework that social media can use to evaluate impact
          ◦ impact = probability that a tweet shows up in hashtag streams
      3. an example analysis based on the above
      The goal is to reverse engineer the hashtag algorithm.
      M.Zhanikeev -- maratishe@gmail.com -- Reverse Engineering Twitter Hashtag Algorithm -- http://bit.do/marat140614
  • 3. Twitter Hashtags
  • 4. Hashtag Streams
      Hashtag streams are streams of tweets that show up when people search Twitter.
      • a hashtag is the best way to search
      • note: Twitter is trying to phase out hashtags (and mentions), so search may find tweets even without hashtags
      Hashtags are important because social media use them to promote events, products, etc.
  • 5. Twitter Infographics
      • Twitter promotes hashtags by releasing infographics
      • the content is very confusing for social media
      • it is hard to translate into numbers, concrete actions, etc.
  • 6. Twitter Infographics (2): Zoom-Ins
  • 7. Twitter Infographics (3): Cleanup
      [Figure: decision flowchart with nodes "Decide", "New Tag?", "Will you promote it?", "Will you add value?", "Add to hashtag stream", and "Out"; branches are labeled YES/NO]
      • with the clutter removed, the decision algorithm is much clearer
      • it still does not clarify what value or promotion mean in practice
      • since Twitter does not help, we need to reverse engineer the algorithm
  • 8. Crawling vs Sampling
  • 9. Crawling: Practice and Problems
      • traditional crawling is done on the command line using wget or curl
      • problem 1: Twitter and others try to avoid being crawled and have put up fences (login, cookies, forwarding, JS post-loading, etc.)
      • problem 2: official APIs are very restricted; the Twitter API does not cover search
      • problem 3: it is hard to use other services while crawling, such as Twitter + YouTube
  • 10. Snowball Sampling
      • the new way to look at sampling
      • done in cycles:
          1. sample something
          2. select a wanted subset
          3. sample the subset at a higher depth
          4. repeat
      • snowball sampling is directly applicable to crawling Twitter
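The cycle on slide 10 can be sketched in a few lines of Python. This is only an illustration of the sample / select / sample-deeper loop, not Twaater's actual code: the function `snowball_sample`, the `expand` and `wanted` callbacks, and the toy graph are all hypothetical names introduced here.

```python
from typing import Callable, Dict, Iterable, List, Set


def snowball_sample(
    seeds: Iterable[str],
    expand: Callable[[str], List[str]],
    wanted: Callable[[str], bool],
    max_cycles: int = 3,
) -> Set[str]:
    """Repeat: sample the frontier, keep a wanted subset, sample it deeper."""
    sampled: Set[str] = set()
    frontier = list(seeds)
    for _ in range(max_cycles):
        # Step 1: sample something (skip items seen before, drop duplicates).
        fresh = list(dict.fromkeys(i for i in frontier if i not in sampled))
        if not fresh:
            break
        sampled.update(fresh)
        # Step 2: select a wanted subset.
        subset = [item for item in fresh if wanted(item)]
        # Step 3: sample the subset at a higher depth (next cycle's frontier).
        frontier = [nxt for item in subset for nxt in expand(item)]
    return sampled


# Hypothetical toy data: hashtags lead to accounts, accounts lead to hashtags.
TOY_GRAPH: Dict[str, List[str]] = {
    "#seed": ["alice", "bob"],
    "alice": ["#goal", "carol"],
    "bob": ["#spam"],
    "#goal": ["dave"],
}

result = snowball_sample(
    seeds=["#seed"],
    expand=lambda node: TOY_GRAPH.get(node, []),
    wanted=lambda node: node != "bob",  # assumed selection rule
    max_cycles=3,
)
print(sorted(result))
```

Here `bob` is sampled but not selected, so nothing reachable only through `bob` is ever visited; that pruning is what distinguishes the cycle from exhaustive crawling.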
  • 11. Crawling: Two Approaches
      • approach 1 (traditional): use APIs (HTTP, OAuth, etc.) to get data
      • approach 2 (proposed): attach your robot to a working Twitter webapp in the browser
          ◦ interaction is via clicks, just like a human
          ◦ more natural
  • 12. Implementation
  • 13. Implementation: Twaater
      • Chrome extension, auto-triggered by loading a Twitter page
      • stores logs in one's own Dropbox drive
  • 14. Implementation: Twaater
      • https://github.com/maratishe/twaater
      • personalization
          1. change the Dropbox auth tokens to point to one's own drive
          2. log in to Twitter under one's own account and let Twaater pick up from there
      • runs continuously; close the browser to stop
  • 15. Example Analysis
  • 16. Twaater: Metric Space
      • tweet metrics/counts: links, retweets, favorites, tags, tagstatus, mentions
      • plus account metrics/counts: tweets, following, followers
  • 17. Twaater: Tweet Timeline
      • all metrics change over time
      • the timeline of a single tweet is very important
      • it aggregates the tweet's status and its position (if any) in hashtag streams
          ◦ for each hashtag contained in the tweet
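One way to picture the metric space and the per-tweet timeline from slides 16 and 17 is as a small data structure. The sketch below is an assumption about the shape of the data, not Twaater's log format: the class names `Snapshot` and `TweetTimeline`, the numeric encoding of every metric, and the use of `None` for "not in the stream" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Metric names as listed on the slides.
TWEET_METRICS = ("links", "retweets", "favorites", "tags", "tagstatus", "mentions")
ACCOUNT_METRICS = ("tweets", "following", "followers")


@dataclass
class Snapshot:
    """One observation of a tweet at a point in time (assumed layout)."""
    time: float
    metrics: Dict[str, float]
    # Position in each hashtag stream; None means the tweet is not listed.
    positions: Dict[str, Optional[int]] = field(default_factory=dict)


@dataclass
class TweetTimeline:
    """Time-ordered snapshots of a single tweet."""
    tweet_id: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def add(self, snapshot: Snapshot) -> None:
        self.snapshots.append(snapshot)
        self.snapshots.sort(key=lambda s: s.time)

    def metric_series(self, name: str) -> List[float]:
        return [s.metrics.get(name, 0.0) for s in self.snapshots]

    def position_series(self, hashtag: str) -> List[Optional[int]]:
        return [s.positions.get(hashtag) for s in self.snapshots]


timeline = TweetTimeline("tweet-1")
timeline.add(Snapshot(time=2.0, metrics={"favorites": 5}, positions={"#tag": 3}))
timeline.add(Snapshot(time=1.0, metrics={"favorites": 1}))
print(timeline.metric_series("favorites"), timeline.position_series("#tag"))
```

Keeping one position series per hashtag matches the slide's note that the timeline is tracked for each hashtag contained in the tweet.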
  • 18. Analysis: Rules and CCF
      • lists: time series of metrics versus time series of positions in hashtag streams
          ◦ ccf(metric values, hashtag positions), where CCF is the cross-correlation function
          ◦ note that there are "all" and "top" hashtag streams
      • selection: pick the max in a time series, and filter lists by threshold
          ◦ thresholds are different for each metric
          ◦ helps to filter out noise or focus only on large (important) values
      • view showing up in hashtag streams as binary (yes/no) versus analog (list position) values
      • extras (future work): analysis along the timeline, at much higher complexity
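The selection and correlation rules on slide 18 can be sketched as follows. The slides do not give the exact CCF computation, so this sketch swaps in a plain Pearson correlation at lag zero; the function names, the choice to drop samples where the tweet is absent in analog mode, and the sample numbers are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def thresholded_ccf(
    metric: Sequence[float],
    positions: Sequence[Optional[int]],
    threshold: float,
    binary: bool = False,
) -> float:
    """Keep samples with metric >= threshold * max(metric), then correlate."""
    if not metric:
        return 0.0
    cutoff = threshold * max(metric)
    xs: List[float] = []
    ys: List[float] = []
    for value, pos in zip(metric, positions):
        if value < cutoff:
            continue  # filtered out as noise
        if binary:
            xs.append(value)
            ys.append(0.0 if pos is None else 1.0)  # in the stream: yes/no
        elif pos is not None:
            xs.append(value)
            ys.append(float(pos))  # analog: list position (assumed 1 = top)
    return pearson(xs, ys)


# Made-up sample data, not taken from the slides.
favorites = [1, 2, 10, 20, 40]
positions = [None, 9, 5, 3, 1]

analog = thresholded_ccf(favorites, positions, threshold=0.2)
binary = thresholded_ccf(favorites, positions, threshold=0.0, binary=True)
print(analog, binary)
```

With positions counted from the top of the list, a negative analog value means that larger metric values go together with positions closer to the top.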
  • 19. Analysis: Results
      [Figure: four panels (all/binary, top/binary, all/actual, top/actual) plotting ccf (y-axis, -1.05 to 1.05) against Threshold (% of max) (x-axis, 0 to 0.5) for tags, links, mentions, retweets, favorites, tweets, following, followers, and tagstatus]
      • binary: useless
      • analog: filtering out very low values (the majority) helps reveal good correlation
          ◦ for example, favorites contribute to tweets showing up closer to the top of lists
      • account metrics: show no effect
      • among large values, tagstatus (topic popularity) becomes prominent
  • 20. Future Work
      • Twaater is own-centric, which makes it possible to crowdsource/distribute crawling
          ◦ fits the description of snowball sampling
      • 2nd-order statistics (CCF) did not reveal a simple hashtag algorithm
          ◦ more complicated models have to be tested
      • alternatively, smarter filtering can also help
          ◦ ... select a subset of important tweets to subject to analysis
  • 21. That’s all, thank you ...