SlideShare une entreprise Scribd logo
1  sur  25
An Efficient incremental indexing
mechanism for extracting Top-k
representative queries over continuous
data streams
Y.S. Horawalavithana, D.N. Ranasinghe
Adaptive and Reflective Middleware (ARM)
ACM/IFIP/USENIX Middleware
Vancouver, BC, Canada
December 08, 2015
1
University of Colombo School of Computing,
Sri Lanka
2
Overview
• Motivation
• Adaptive Diversification
• Incremental Top-k
• Evaluation
• Conclusion
• Future work
3
4
Diversity: Top-k representative set
Representative Top-kDrawback
(without diversity)
What we want
(with diversity)
Method to retrieve Top-k publications from matching publications
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
5
Minimum independent-dominating set
𝑝1
𝑝2
𝑝3
𝑝4
𝑝5
𝑣1
𝑣4
𝑣3
𝑣5
𝑣2
𝛼
𝑣1
𝑣4
𝑣3
𝑣5
𝑣2

𝑣1
𝑣4
𝑣3
𝑣2
𝑣5
𝑣1
𝑣4
𝑣3
𝑣2
𝑣5
  jijiji ppppdppodNeighborho  ,|)(
𝑣1
𝑣4
𝑣3𝑣2
𝑣5
Publication
space
Graph
model
Independent, dominating Independent, dominating Independent, dominating Dominating, not independent
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
6
NAÏVE Greedy argmax
𝑟(𝑝𝑖)2
𝑝 𝑗∈𝑁(𝑝 𝑖) 𝑟(𝑝𝑗) × 𝑑(𝑝𝑖, 𝑝𝑗)
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
7
Handling streaming publications
𝑝1
𝑝2
𝑝3
𝑝4
𝑝5
𝑣1
𝑣4
𝑣3
𝑣5
𝑣2𝛼
𝑝6
𝑣1
𝑣4
𝑣3
𝑣5
𝑣2𝑣6
Continuity Requirements
1. Durability
an item is selected as diversified in 𝑖 𝑡ℎ window may still have the chance to be in 𝑖 + 1 𝑡ℎ window
if it's not expired & other valid items in 𝑖 + 1 𝑡ℎ
window are failed to compete with it.
2. Order
Publication stream follow the chronological order
We avoid the selection of item j as diverse later, when we already selected an item i which is not-
older than j.
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
8
Adaptive Diversification
𝑃1 𝑃2 𝑃3 𝑃4 .. 𝑃𝑗 𝑃𝑗+1 .. .. .. ....
Matching publication stream
𝑃1 𝑃2 𝑃3 𝑃4 .. 𝑃𝑗 𝑃𝑗+1 .. .. .. ....
ith window
(i+1)th window
𝑆𝑖
∗
𝑆𝑖+1
∗
Independence
Dominance
Durability
Order
 Straightforward solution:
 Apply naïve greedy method at each instance
 Propose incremental index mechanism!
 Avoid the curse of re-calculating neighborhood
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
9
Locality Sensitive Hashing (LSH)
 Simple Idea
 if two points are close together, then after a “projection” operation these two
points will remain close together
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
10
LSH in Adaptive Diversification:
Publications as categorical data
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
11
LSH in Adaptive Diversification:
Characteristic Matrix
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
12
LSH in Adaptive Diversification:
Minhashing
 No Publications any more!
 Signature to represent
 Technique
 Randomly permute the rows at
characteristic matrix m times
 Take the number of the 1st row, in
the permuted order,
 which the column has a 1 for
the correspondent column of
publications.
First permutation of rows at characteristic matrix
 Advantage:
 Reduce the dimensions into a small
minhash signature
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
13
LSH in Adaptive Diversification:
Signature Matrix
Fast-minhashing
Select m number of random hash
functions
To model the effect of m number of
random permutation
Mathematically proved only when,
The number of rows is a prime.
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
14
LSH in Adaptive Diversification:
LSH Buckets
 Take r sized
signature vectors
 From m sized
minhash-
signature
 Map them into,
 L Hash-Tables
 Each with
arbitrary b
number of
buckets
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
15
LSH in Adaptive Diversification:
Batch-wise Top-k computation
 Bucket “Winner” – a publication which has the
highest relevancy score
 Winner is dominant to represent it's bucket
neighborhood
 Top-k "winners“ that have a majority of votes
 k winners are independent
𝑃𝐴 𝑃𝐵 𝑃𝐶 𝑃 𝐷 𝑃𝐸 𝑃𝐹 𝑃𝐺 𝑃 𝐻 . .
ith
window
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
16
LSH in Dynamic Diversification:
Incremental Top-k computation
𝑁𝑒𝑤 𝑝𝑢𝑏𝑙𝑖𝑐𝑎𝑡𝑖𝑜𝑛 𝑖 𝑈𝑝𝑑𝑎𝑡𝑒 𝑖 𝑡ℎ
𝑐ℎ𝑎𝑟𝑎𝑐𝑡𝑒𝑟𝑖𝑠𝑡𝑖𝑐 𝑣𝑒𝑐𝑡𝑜𝑟
Characteristic
Matrix
𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑒 𝑖 𝑡ℎ
𝑚𝑖𝑛ℎ𝑎𝑠ℎ 𝑠𝑖𝑔𝑛𝑎𝑡𝑢𝑟𝑒
Signature
Matrix
Map 𝑖 𝑡ℎ
signature
into L hash-tables
Update “Winner” at
bucket 𝑖 𝑡ℎ
signature
maps into
Vote 𝑇𝑜𝑝 − 𝑘 𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
17
LSH in Dynamic Diversification:
When new publication F arrives…
 Only buckets 𝐵13
, 𝐵23
, 𝐵32
, 𝐵43
will vote
 Follow continuity requirements
 Durability
 Order
𝑃𝐴 𝑃𝐵 𝑃𝐶 𝑃 𝐷 𝑃𝐸 𝑃𝐹 𝑃𝐺 𝑃 𝐻 . .
ith
window
(i+1)th
window

1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
18
LSH in Adaptive Diversification:
Analysis
For two vectors x,y
𝐽𝐷 𝑥, 𝑦 = 1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 ;
𝑤ℎ𝑒𝑟𝑒, 𝐽𝑆𝐼𝑀 𝑥, 𝑦 =
𝑥 ∩ 𝑦
𝑥 ∪ 𝑦
 For publications x & y
𝐽𝑆𝐼𝑀 𝑥, 𝑦 ∝ 𝑃𝑟𝑜𝑏 𝐻 𝑥 = 𝐻 𝑦
 At a particular hash table
 x & y map into the same bucket:
𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏
 x & y does not map into the same bucket:
1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏
 At L Hash-tables
 x & y does not map into the same bucket:
(1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏
) 𝐿 1 − (1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏) 𝐿
True near neighbors will
be unlikely to be unlucky
in all the projections
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
Publication Stream  Zipfian subscriptions
 Normalized preferences
19
Evaluation:
Dataset
Amazon on-line market place data available at 17th – 19th November 2014
𝑧𝑖𝑝𝑓 𝑘: 𝑠, 𝑁 =
1
𝑘 𝑠
𝑛=1
𝑁
(
1
𝑛 𝑠)
N - number of elements in distribution,
k - rank of element
s - value of exponent
𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑏𝑐𝑟𝑖𝑏𝑒𝑟 𝑣𝑖𝑒𝑤𝑠
=
𝑖=2
32
48 𝑐 𝑖
+ 42 𝑐 𝑖
+ 54 𝑐 𝑖
+ 66 𝑐 𝑖
+ 57 𝑐 𝑖
+ 67 𝑐 𝑖
20
Terminology
ILSH, BLSH and NAÏVE
𝑃1 𝑃2 𝑃3 𝑃4 𝑃5 𝑃6 𝑃7 𝑃8 . .
BLSH
or
NAIVE
BLSH
or
NAIVE
BLSH
or
NAIVE
BLSH
or
NAIVE
ILSH
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
21
Accuracy:
ILSH vs. NAÏVE
Probability of producing optimal diverse set of results by ILSH under Jaccard similarity threshold (s)
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
22
Performance & Efficiency:
ILSH vs. BLSH vs. NAÏVE
log (Top-k matching time) on number of publications with D=500
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
23
Conclusions
 Locality Sensitive Hashing (LSH) indexing method
 Produce diverse set of results at average 70% accuracy over naïve method
 Reduce the matching time very significantly over NAÏVE method
 Further, refine by it’s incremental version
 For handling streaming publications
 Avoid the curse of re-computing neighborhoods
 Top k to restrict the delivery of Top publications
 Given a window size & delivery method
 Model can produce best diverse set of personalized results
 To represent the set of all matching publications at given instance
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
24
Future work
 Explore other suitable use-cases to apply proposed model & develop
prototype applications, E.g.
 Personalized newspaper for every Facebook user
 Adaptive resource scheduling in large scale distributed system
 Exploit overlap among diversified results of users who have similar interest
 Develop LSH based index over multi-threaded distributed environment
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
25
Q&A
THANK YOU!

Contenu connexe

Similaire à [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Po-Chuan Chen
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Po-Chuan Chen
 
Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015
Big Data Spain
 
Group713_ProgressDraft
Group713_ProgressDraftGroup713_ProgressDraft
Group713_ProgressDraft
Sarp Uzel
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
Shrayes Ramesh
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
eXascale Infolab
 

Similaire à [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation (20)

[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
 
Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015
 
Group713_ProgressDraft
Group713_ProgressDraftGroup713_ProgressDraft
Group713_ProgressDraft
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSH
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptx
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore Environments
 
NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference
 
NICE Research -Variational inference project
NICE Research -Variational inference projectNICE Research -Variational inference project
NICE Research -Variational inference project
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
nber_slides.pdf
nber_slides.pdfnber_slides.pdf
nber_slides.pdf
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Stack and Hash Table
Stack and Hash TableStack and Hash Table
Stack and Hash Table
 
The data, they are a-changin’
The data, they are a-changin’The data, they are a-changin’
The data, they are a-changin’
 
key.net
key.netkey.net
key.net
 

Plus de Sameera Horawalavithana

Plus de Sameera Horawalavithana (16)

Data-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationData-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and Simulation
 
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
 
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubMentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
 
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
 
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
 
Duplicate Detection on Hoaxy Dataset
Duplicate Detection on Hoaxy DatasetDuplicate Detection on Hoaxy Dataset
Duplicate Detection on Hoaxy Dataset
 
Dancing with Stream Processing
Dancing with Stream ProcessingDancing with Stream Processing
Dancing with Stream Processing
 
Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015
 
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
 
Zipf distribution
Zipf distributionZipf distribution
Zipf distribution
 
Query personalization
Query personalizationQuery personalization
Query personalization
 
Dancing with publish/subscribe
Dancing with publish/subscribeDancing with publish/subscribe
Dancing with publish/subscribe
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
 

Dernier

怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 

Dernier (20)

Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 

[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

  • 1. An Efficient incremental indexing mechanism for extracting Top-k representative queries over continuous data streams Y.S. Horawalavithana, D.N. Ranasinghe Adaptive and Reflective Middleware (ARM) ACM/IFIP/USENIX Middleware Vancouver, BC, Canada December 08, 2015 1 University of Colombo School of Computing, Sri Lanka
  • 2. 2 Overview • Motivation • Adaptive Diversification • Incremental Top-k • Evaluation • Conclusion • Future work
  • 3. 3
  • 4. 4 Diversity: Top-k representative set Representative Top-kDrawback (without diversity) What we want (with diversity) Method to retrieve Top-k publications from matching publications 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 5. 5 Minimum independent-dominating set 𝑝1 𝑝2 𝑝3 𝑝4 𝑝5 𝑣1 𝑣4 𝑣3 𝑣5 𝑣2 𝛼 𝑣1 𝑣4 𝑣3 𝑣5 𝑣2  𝑣1 𝑣4 𝑣3 𝑣2 𝑣5 𝑣1 𝑣4 𝑣3 𝑣2 𝑣5   jijiji ppppdppodNeighborho  ,|)( 𝑣1 𝑣4 𝑣3𝑣2 𝑣5 Publication space Graph model Independent, dominating Independent, dominating Independent, dominating Dominating, not independent 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 6. 6 NAÏVE Greedy argmax 𝑟(𝑝𝑖)2 𝑝 𝑗∈𝑁(𝑝 𝑖) 𝑟(𝑝𝑗) × 𝑑(𝑝𝑖, 𝑝𝑗) 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 7. 7 Handling streaming publications 𝑝1 𝑝2 𝑝3 𝑝4 𝑝5 𝑣1 𝑣4 𝑣3 𝑣5 𝑣2𝛼 𝑝6 𝑣1 𝑣4 𝑣3 𝑣5 𝑣2𝑣6 Continuity Requirements 1. Durability an item is selected as diversified in 𝑖 𝑡ℎ window may still have the chance to be in 𝑖 + 1 𝑡ℎ window if it's not expired & other valid items in 𝑖 + 1 𝑡ℎ window are failed to compete with it. 2. Order Publication stream follow the chronological order We avoid the selection of item j as diverse later, when we already selected an item i which is not- older than j. 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 8. 8 Adaptive Diversification 𝑃1 𝑃2 𝑃3 𝑃4 .. 𝑃𝑗 𝑃𝑗+1 .. .. .. .... Matching publication stream 𝑃1 𝑃2 𝑃3 𝑃4 .. 𝑃𝑗 𝑃𝑗+1 .. .. .. .... ith window (i+1)th window 𝑆𝑖 ∗ 𝑆𝑖+1 ∗ Independence Dominance Durability Order  Straightforward solution:  Apply naïve greedy method at each instance  Propose incremental index mechanism!  Avoid the curse of re-calculating neighborhood 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 9. 9 Locality Sensitive Hashing (LSH)  Simple Idea  if two points are close together, then after a “projection” operation these two points will remain close together 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 10. 10 LSH in Adaptive Diversification: Publications as categorical data 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 11. 11 LSH in Adaptive Diversification: Characteristic Matrix 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 12. 12 LSH in Adaptive Diversification: Minhashing  No Publications any more!  Signature to represent  Technique  Randomly permute the rows at characteristic matrix m times  Take the number of the 1st row, in the permuted order,  which the column has a 1 for the correspondent column of publications. First permutation of rows at characteristic matrix  Advantage:  Reduce the dimensions into a small minhash signature 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 13. 13 LSH in Adaptive Diversification: Signature Matrix Fast-minhashing Select m number of random hash functions To model the effect of m number of random permutation Mathematically proved only when, The number of rows is a prime. 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 14. 14 LSH in Adaptive Diversification: LSH Buckets  Take r sized signature vectors  From m sized minhash- signature  Map them into,  L Hash-Tables  Each with arbitrary b number of buckets 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 15. 15 LSH in Adaptive Diversification: Batch-wise Top-k computation  Bucket “Winner” – a publication which has the highest relevancy score  Winner is dominant to represent it's bucket neighborhood  Top-k "winners“ that have a majority of votes  k winners are independent 𝑃𝐴 𝑃𝐵 𝑃𝐶 𝑃 𝐷 𝑃𝐸 𝑃𝐹 𝑃𝐺 𝑃 𝐻 . . ith window 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 16. 16 LSH in Dynamic Diversification: Incremental Top-k computation 𝑁𝑒𝑤 𝑝𝑢𝑏𝑙𝑖𝑐𝑎𝑡𝑖𝑜𝑛 𝑖 𝑈𝑝𝑑𝑎𝑡𝑒 𝑖 𝑡ℎ 𝑐ℎ𝑎𝑟𝑎𝑐𝑡𝑒𝑟𝑖𝑠𝑡𝑖𝑐 𝑣𝑒𝑐𝑡𝑜𝑟 Characteristic Matrix 𝐺𝑒𝑛𝑒𝑟𝑎𝑡𝑒 𝑖 𝑡ℎ 𝑚𝑖𝑛ℎ𝑎𝑠ℎ 𝑠𝑖𝑔𝑛𝑎𝑡𝑢𝑟𝑒 Signature Matrix Map 𝑖 𝑡ℎ signature into L hash-tables Update “Winner” at bucket 𝑖 𝑡ℎ signature maps into Vote 𝑇𝑜𝑝 − 𝑘 𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 17. 17 LSH in Dynamic Diversification: When new publication F arrives…  Only buckets 𝐵13 , 𝐵23 , 𝐵32 , 𝐵43 will vote  Follow continuity requirements  Durability  Order 𝑃𝐴 𝑃𝐵 𝑃𝐶 𝑃 𝐷 𝑃𝐸 𝑃𝐹 𝑃𝐺 𝑃 𝐻 . . ith window (i+1)th window  1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 18. 18 LSH in Adaptive Diversification: Analysis For two vectors x,y 𝐽𝐷 𝑥, 𝑦 = 1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 ; 𝑤ℎ𝑒𝑟𝑒, 𝐽𝑆𝐼𝑀 𝑥, 𝑦 = 𝑥 ∩ 𝑦 𝑥 ∪ 𝑦  For publications x & y 𝐽𝑆𝐼𝑀 𝑥, 𝑦 ∝ 𝑃𝑟𝑜𝑏 𝐻 𝑥 = 𝐻 𝑦  At a particular hash table  x & y map into the same bucket: 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏  x & y does not map into the same bucket: 1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏  At L Hash-tables  x & y does not map into the same bucket: (1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏 ) 𝐿 1 − (1 − 𝐽𝑆𝐼𝑀 𝑥, 𝑦 𝑏) 𝐿 True near neighbors will be unlikely to be unlucky in all the projections 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 19. Publication Stream  Zipfian subscriptions  Normalized preferences 19 Evaluation: Dataset Amazon on-line market place data available at 17th – 19th November 2014 𝑧𝑖𝑝𝑓 𝑘: 𝑠, 𝑁 = 1 𝑘 𝑠 𝑛=1 𝑁 ( 1 𝑛 𝑠) N - number of elements in distribution, k - rank of element s - value of exponent 𝑇𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑏𝑐𝑟𝑖𝑏𝑒𝑟 𝑣𝑖𝑒𝑤𝑠 = 𝑖=2 32 48 𝑐 𝑖 + 42 𝑐 𝑖 + 54 𝑐 𝑖 + 66 𝑐 𝑖 + 57 𝑐 𝑖 + 67 𝑐 𝑖
  • 20. 20 Terminology ILSH, BLSH and NAÏVE 𝑃1 𝑃2 𝑃3 𝑃4 𝑃5 𝑃6 𝑃7 𝑃8 . . BLSH or NAIVE BLSH or NAIVE BLSH or NAIVE BLSH or NAIVE ILSH 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 21. 21 Accuracy: ILSH vs. NAÏVE Probability of producing optimal diverse set of results by ILSH under Jaccard similarity threshold (s) 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 22. 22 Performance & Efficiency: ILSH vs. BLSH vs. NAÏVE log (Top-k matching time) on number of publications with D=500 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 23. 23 Conclusions  Locality Sensitive Hashing (LSH) indexing method  Produce diverse set of results at average 70% accuracy over naïve method  Reduce the matching time very significantly over NAÏVE method  Further, refine by it’s incremental version  For handling streaming publications  Avoid the curse of re-computing neighborhoods  Top k to restrict the delivery of Top publications  Given a window size & delivery method  Model can produce best diverse set of personalized results  To represent the set of all matching publications at given instance 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
  • 24. 24 Future work  Explore other suitable use-cases to apply proposed model & develop prototype applications, E.g.  Personalized newspaper for every Facebook user  Adaptive resource scheduling in large scale distributed system  Exploit overlap among diversified results of users who have similar interest  Develop LSH based index over multi-threaded distributed environment 1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work

Notes de l'éditeur

  1. each user gets exposed to more than 1,500 stories each day, but an average user would only get to see about 
  2. Since similar publications have the tendency to map into same bucket at probability 1 − d, dominance condition can be well served. Because the "winner" publication as the most relevant publication at each bucket, can cover it's neighborhood. Also two buckets represent two separate neighborhoods. That results all "winner" publications to be dis-similar from each other by at least d distance. So it also satises the independence condition
  3. Talk on ILSH update cost, because of maintaining a large characteristic matrix