SEARCH YOUR TWEETS
SEARCH LIKE A PROFESSIONAL
Motivation
• Twitter represents a rich flow of information
• There is no effective way to query Twitter
• It is hard to monitor topics of interest in real time
Search Tweets Like a Professional
A real-time Twitter search engine that lets you search based on:
• Keywords
◦ Country
◦ Language
◦ Negative words
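
As a rough illustration of how these four search options could map onto a single Elasticsearch query, here is a minimal sketch that builds a `bool` query body. The function name and the field names (`text`, `country`, `lang`) are assumptions made for this example; the slides do not show the actual index mapping.

```python
from typing import Any, Dict, List, Optional


def build_tweet_query(
    keywords: str,
    country: Optional[str] = None,
    language: Optional[str] = None,
    negative_words: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch bool query body from the four search options.

    Field names ("text", "country", "lang") are hypothetical.
    """
    # Country and language are exact-match filters; they do not affect scoring.
    filters: List[Dict[str, Any]] = []
    if country:
        filters.append({"term": {"country": country}})
    if language:
        filters.append({"term": {"lang": language}})

    # Each negative word excludes tweets whose text matches it.
    must_not = [{"match": {"text": word}} for word in (negative_words or [])]

    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": keywords}}],
                "filter": filters,
                "must_not": must_not,
            }
        }
    }
```

For example, `build_tweet_query("world cup", country="CA", language="en", negative_words=["spam"])` returns a body that requires the keywords, filters on country and language, and excludes tweets mentioning "spam".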
Demo (http://searchyourtweet.info:5000/input)
Keep an Eye on Your Topics of Interest
• Express your interest, and we will keep you updated on the newest events
• Video (https://youtu.be/GdRmXNfukos)
Data Pipeline
[Diagram: data pipeline. Components: Query Controller, Backend Database, Percolator, Logic Layer, Frontend, Searching database, Data Backup, Pub/Sub. Arrow labels: Publish, Matching query, Register query, Searching.]
Real-Time Monitoring on Twitter
◦ Implemented using the Elasticsearch Percolator
◦ Think of it as "search in reverse"
  ◦ Users register queries with the percolator
  ◦ The percolator matches incoming documents against the registered queries
◦ Challenges:
  ◦ How to design the percolator data pipeline?
  ◦ How to decouple the backend database from the frontend server?
  ◦ Use the publish/subscribe design pattern
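
To make "search in reverse" concrete, the following toy sketch stores queries first and then matches each incoming document against them. It is an in-memory stand-in written for illustration only; it does not call Elasticsearch, and the class name, method names, and the "all keywords must appear" matching rule are assumptions.

```python
from typing import Dict, List, Set


class ToyPercolator:
    """In-memory illustration of percolation: queries are stored, documents are matched."""

    def __init__(self) -> None:
        self._queries: Dict[str, Set[str]] = {}

    def register(self, query_id: str, keywords: List[str]) -> None:
        # A registered query here is a set of lowercase keywords that must all appear.
        self._queries[query_id] = {k.lower() for k in keywords}

    def percolate(self, text: str) -> List[str]:
        # Return the ids of every registered query that the document satisfies.
        tokens = set(text.lower().split())
        return sorted(
            query_id
            for query_id, keywords in self._queries.items()
            if keywords <= tokens
        )


percolator = ToyPercolator()
percolator.register("q1", ["earthquake"])
percolator.register("q2", ["world", "cup"])
matches = percolator.percolate("Earthquake reported near Tokyo")  # ["q1"]
```

In a normal search the documents are indexed and the query arrives later; here the roles are swapped, which is why each new tweet can trigger notifications for the users whose queries it matches.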
Real-Time Monitor Data Flow
[Diagram: real-time monitor data flow. Components: Percolator, Query database, Twitter database, Controller, Pub/Sub. Labels: New incoming tweets, publish, subscribe, Open channel.]
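
One plausible reading of this flow is that matched tweets are published on a channel per registered query, and the frontend server subscribes to the channels it cares about. The sketch below shows that decoupling with a minimal in-memory broker; the `Broker` class and the channel name are hypothetical, and the actual system may use a different pub/sub service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Minimal in-memory publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        # "Open channel": the subscriber registers interest in one channel.
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Deliver the message to every subscriber; return how many received it.
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
received: List[str] = []

# Frontend side: subscribe to the channel for a registered query (hypothetical id).
broker.subscribe("query-42", received.append)

# Backend side: a new tweet matched query-42 in the percolator, so publish it.
delivered = broker.publish("query-42", "New tweet matching your topic")
```

The backend only knows the channel name, not who is listening, which is what keeps the database side independent of the frontend server.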
Challenge
How to build a high-throughput, real-time backend data pipeline?
• Use Logstash!
◦ Highly scalable
◦ Compatible with different sources and destinations
[Diagrams: "A scalable high-throughput pipeline" and "Current backend pipeline"]
Challenge
• Real-time updates on the frontend client:
• Instead of using the JavaScript "setInterval()" function, I use "socketIO" to keep a socket open between the front-end client and the Flask server
• Constructing Elasticsearch queries
• Use the Python requests library to query Elasticsearch
• Fine-tuning Elasticsearch
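
Querying Elasticsearch with the Python requests library could look roughly like the sketch below. The function name, the index name, and the URL are assumptions for illustration; the `post` parameter exists only so the HTTP call can be swapped out when no cluster is available.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def search_tweets(
    es_url: str,
    index: str,
    body: Dict[str, Any],
    post: Optional[Callable[..., Any]] = None,
) -> List[Dict[str, Any]]:
    """POST a query body to the index's _search endpoint and return the hits."""
    if post is None:
        import requests  # third-party library named on the slide

        post = requests.post

    response = post(
        f"{es_url}/{index}/_search",
        data=json.dumps(body),
        headers={"Content-Type": "application/json"},
        timeout=5,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return response.json()["hits"]["hits"]
```

With a running cluster this might be called as `search_tweets("http://localhost:9200", "tweets", {"query": {"match": {"text": "earthquake"}}})`, where both the host and the `tweets` index are placeholders.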
About Me
M.Math, University of Waterloo
◦ Field: Statistics and Machine Learning
B.S., University of Toronto
◦ Field: Applied Mathematics
Data Scientist Intern, Neon Inc., San Francisco
Back-end Model Developer, MetricAid Inc., Toronto
Experience in Deep Learning:
◦ Convolutional Network, Recurrent Network
• OS/161 (a simplified POSIX OS)
Questions?
Thank you!
Parallelization of the Percolator
• Will consume a lot of hardware: O(mn)
• Another choice: Luwak + Samza
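
The slide does not define m and n. Assuming m is the number of registered queries and n is the number of incoming tweets, the O(mn) cost and the effect of splitting the queries across workers can be estimated with a small calculation. The function below is a back-of-the-envelope sketch under that assumption, not a measurement of the real system.

```python
import math
from typing import Tuple


def percolation_cost(num_queries: int, num_docs: int, num_workers: int = 1) -> Tuple[int, int]:
    """Return (total query evaluations, evaluations per worker).

    Assumes every document is checked against every registered query and
    that the queries are split evenly across the workers.
    """
    total = num_queries * num_docs
    queries_per_worker = math.ceil(num_queries / num_workers)
    per_worker = queries_per_worker * num_docs
    return total, per_worker


# 10,000 registered queries, 1,000,000 tweets, 4 workers (illustrative numbers).
total, per_worker = percolation_cost(10_000, 1_000_000, num_workers=4)
```

Parallelizing lowers the load on each worker but leaves the total work unchanged, which is consistent with the slide's remark that this approach consumes a lot of hardware.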

Jinchao demo v6