Addressing scalability challenges
in peer-to-peer search
PhD seminar
4 Feb, 2014
Harisankar H,
PhD scholar,
DOS lab, Dept. of CSE
Advisor: Prof. D. Janakiram
http://harisankarh.wordpress.com
Outline
• Issues with centralized search
– Can peer-to-peer search help?

• Scalability challenges in peer-to-peer search
• Proposed architectural extensions
– Two-layered architecture for peer-to-peer concept
search
– Cloud-assisted approach to handle query spikes
Centralized search scenario
• Scenario
– Search engines crawl available content, index and
maintain it in data centers
– User queries directed to data centers, processed
internally and results sent back
– Centrally managed by single company

[Figure labels: Content, End users, Datacenters]
Some issues with centralized search
– Privacy concerns
• All user queries accessible from a single location

– Centralized control
• Individual companies decide what to index (or not), how to rank,
etc.

– Transparency
• Complete details of ranking, pre-processing etc. not
made available publicly
• Concerns of censorship and doctoring of results
Some issues with centralized search (contd.)
• Uses mostly syntactic search techniques
– Based on word or multi-word phrases
– Low quality of results due to ambiguity of natural language

• Issues with centralized semantic search
– Difficult to capture long tail of niche interests of users
• Requires rich human generated knowledge bases in numerous
niche areas
Peer-to-peer search approach
• Edge nodes in the internet participate in
providing and using the search service
• Search as a collaborative service
• Crawling, indexing and search distributed
across the peers
How could peer-to-peer search help?
• Each user query can be sent to a different peer among
millions
– Obtaining query logs in a single location difficult
– Reduced privacy concerns

• Distributed control across numerous peers
– Avoids centralized control

• Search application available with all peers
– Better transparency in ranking etc.

• Background knowledge of peers can be utilized for
effective semantic search
– Can help improve quality of results

• Has led to a lot of academic research in the area as well as
real-world p2p search engines*
* e.g., faroo.com, yacy.net; YacyPi Kickstarter project
Realizing peer-to-peer search
• Distribution of search
index
– Term partitioning
• Responsibility of individual
terms assigned to different
peers
– E.g., peer1 is currently
responsible for term
“computer”

• Term-to-peer mapping
achieved through a
structured overlay (e.g., DHT)
Image src: http://wwarodomfr.blogspot.in/2008/09/chord-routing.html
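
The term-to-peer mapping can be sketched as a successor lookup on a hash ring. This is a minimal illustration only: the class name TermRing, the 32-bit identifier space and the use of SHA-1 are assumptions made for the sketch, not details from the talk, and the sketch resolves the responsible peer locally rather than routing through an overlay in O(log N) hops.

```python
import bisect
import hashlib


def ring_id(key: str) -> int:
    """Hash a term or a peer name onto a 32-bit identifier ring (assumed size)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class TermRing:
    """Hypothetical helper: assigns each term to the first peer clockwise from it."""

    def __init__(self, peers):
        # Sorted list of (ring id, peer name) pairs.
        self._ids = sorted((ring_id(p), p) for p in peers)

    def responsible_peer(self, term: str) -> str:
        keys = [i for i, _ in self._ids]
        # First peer whose id is >= the term id, wrapping around the ring.
        pos = bisect.bisect_left(keys, ring_id(term)) % len(self._ids)
        return self._ids[pos][1]

    def join(self, peer: str) -> None:
        bisect.insort(self._ids, (ring_id(peer), peer))

    def leave(self, peer: str) -> None:
        self._ids.remove((ring_id(peer), peer))


ring = TermRing(["peer1", "peer2", "peer3", "peer4"])
print(ring.responsible_peer("computer"))
```

When a peer leaves, only the terms it was responsible for move to its successor; all other terms keep their current peer, which is what makes the mapping usable under autonomous joins and leaves.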
Scalability challenges in peer-to-peer
search
• Peers share only idle resources
• Peers join/leave autonomously
• Limited individual resources
• No SLA

leads to

– Peer bandwidth bottleneck during query processing
• Particularly queries involving multiple terms(index transfer
between multiple peers)

– Instability during query spikes

• Knowledge management issues at large scale
– Difficult to reach consensus at large scale
– Requires broad understanding and has to meet the requirements of a
large, diverse group
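
To make the bandwidth bottleneck for multi-term queries concrete, here is a small sketch of intersecting posting lists that live on different peers. The function name and the shortest-list-first strategy are assumptions chosen for illustration; the talk does not specify how postings are shipped between peers.

```python
def intersect_with_transfer(posting_lists):
    """Intersect posting lists held by different peers.

    posting_lists: dict mapping term -> set of document ids, where each
    term is assumed to be stored on a different peer.
    Returns (matching document ids, number of postings sent over the network).
    """
    if not posting_lists:
        return set(), 0
    # Start from the shortest list so the running result stays small.
    ordered = sorted(posting_lists.values(), key=len)
    current = set(ordered[0])
    transferred = 0
    for next_list in ordered[1:]:
        transferred += len(current)  # ship the running result to the next peer
        current &= next_list
    return current, transferred


docs, cost = intersect_with_transfer({
    "peer": {1, 2, 3, 4},
    "search": {2, 4, 6, 8, 10, 12},
    "scalability": {4, 5},
})
print(docs, cost)
```

Even with this ordering, every additional query term adds a network transfer, and popular terms with long posting lists make each transfer larger.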
Two-layered architecture for peer-to-peer concept search*
• Peers organized as communities based on common
interest
• Each community maintains its own background
knowledge to use in semantic search
– Maintained in a distributed manner

• A global layer with aggregate information to facilitate
search across communities
• Background knowledge bases extend from minimal
universally accepted knowledge in upper layer
• Search, indexing and knowledge management
proceeds independently in each community
*joint work with Prof. Fausto Giunchiglia and Uladzimir Kharkevich, Univ. of Trento
Two-layered architecture for peer-to-peer
concept search
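[Figure: a GLOBAL layer holding the community index and universal knowledge (UK); below it, Community-1, Community-2 and Community-3, each with its own background knowledge (BK-1, BK-2, BK-3) and document index (doc index-1, doc index-2, doc index-3)]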
Two-layered architecture
• Global layer
– retrieves relevant communities for query based on
universal knowledge

• Community layer
– retrieves relevant documents for query based on
background knowledge of community
Overcoming the shortcomings of single-layered approaches
• Search can be scoped only to the relevant
communities for a query
– Results in fewer bandwidth-related issues

• Two layers make knowledge management scalable
and interoperable
– Niche interests supported at community-level background
knowledge bases
– Minimal universal knowledge for interoperability

• Search within community based on community’s
background knowledge
– Focused interest of community helps in better term-to-concept mapping
Two-layered approach
• Index partitioning
– Uses partition-by-term
• Posting list for each term stored in different peers

– Uses Distributed Hash Table (DHT) to realize dynamic term-to-peer mapping
• O(log N) hops for each lookup

• Overlay network
– Communities and global layer maintained using two-layered overlay
• Based on our earlier work on computational grids*

– O(log N) hops for lookup even with two layers
*M.V. Reddy, A.V. Srinivas, T. Gopinath, and D. Janakiram, “Vishwa: A reconfigurable P2P
middleware for Grid Computations,” in ICPP'06
Two-layered approach
• Community management
– Similar to public communities in Flickr, Facebook,
etc.

• Search within community
– Uses Concept Search* as underlying semantic
search scheme
• Extends syntactic search with available knowledge to
realize semantic search
• Falls back to syntactic search when no knowledge is
available
*Fausto Giunchiglia, Uladzimir Kharkevich, Ilya Zaihrayeu, “Concept search”, ESWC 2009
Two-layered approach
• Knowledge representation
– Term -> concept mapping
– Concept hierarchy
• Concept relations expressed as subsumption relations

• Concepts in documents/queries extracted
– by analyzing words and natural language phrases
– Noun phrases translated into conjunctions of atomic
concepts (complex concepts)
• small-1 ∧ dog-2

– Documents/queries represented as enumerated
sequences of complex concepts
• E.g., 1: small-1 ∧ dog-2, 2: big-1 ∧ animal-3
Two-layered approach
• Relevance model
– Documents having more specific concepts than query
concepts considered relevant
• E.g., poodle-1 is relevant when searching for dog-2

– Ranking done by extending tf-idf relevance model
• Incorporates term-concept and concept-concept similarities also

• Distributed knowledge maintenance
– Each atomic concept indexed on DHT with id
– Node responsible for each atomic concept id also stores
ids of
• All immediate more specific atomic concepts
• All concepts in the path to root of the atomic concept
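
The relevance rule and the per-concept records above can be sketched as follows. An in-memory dictionary stands in for the DHT, the class name ConceptStore is hypothetical, and treating an exact concept match as relevant is an assumption added for the sketch.

```python
class ConceptStore:
    """Toy stand-in for the DHT: concept id -> record kept by the responsible node."""

    def __init__(self):
        self.children = {}   # concept id -> ids of immediate more specific concepts
        self.ancestors = {}  # concept id -> ids on the path to the root

    def add(self, concept, parent=None):
        """Register a concept; the parent (if any) must already be registered."""
        self.children.setdefault(concept, set())
        if parent is None:
            self.ancestors[concept] = []
        else:
            self.children[parent].add(concept)
            self.ancestors[concept] = [parent] + self.ancestors[parent]

    def is_relevant(self, doc_concept, query_concept):
        """True if the document concept equals or is more specific than the query concept."""
        return (doc_concept == query_concept
                or query_concept in self.ancestors[doc_concept])


store = ConceptStore()
store.add("animal-3")
store.add("dog-2", parent="animal-3")
store.add("poodle-1", parent="dog-2")
print(store.is_relevant("poodle-1", "dog-2"))  # document about poodles, query about dogs
print(store.is_relevant("dog-2", "poodle-1"))  # the reverse direction
```

Storing the path to the root with each concept means the subsumption check needs only the record of the document concept, at the price of updating descendants when the hierarchy changes.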
Two-layered approach
• Document indexing and search
– Concepts mapped to peer using DHT
– Query routed to peers responsible for the query concepts
and related concepts
– Results from multiple peers combined to give final results

• Global search
– The popularity (document frequency) of each concept
indexed in upper layer
– tf-idf extended with universal knowledge to search for
communities
– Combined score of doc = (score of community)*(score of
doc within community)
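
The combined score can be illustrated with a short sketch. The function name, the input dictionaries and the top_communities cut-off are assumptions for the example; the individual scores are taken as given rather than computed with the extended tf-idf model.

```python
def global_search(community_scores, doc_scores, top_communities=2, k=10):
    """Rank documents by (score of community) * (score of doc within community).

    community_scores: dict community -> score from the global layer.
    doc_scores: dict community -> {doc id -> score within that community}.
    Only the best `top_communities` communities are searched at all.
    """
    best = sorted(community_scores, key=community_scores.get,
                  reverse=True)[:top_communities]
    combined = [
        (community_scores[c] * score, c, doc)
        for c in best
        for doc, score in doc_scores.get(c, {}).items()
    ]
    combined.sort(reverse=True)
    return [(doc, c, s) for s, c, doc in combined[:k]]


results = global_search(
    {"pets": 0.8, "cars": 0.5, "finance": 0.1},
    {"pets": {"d1": 0.5, "d2": 0.25}, "cars": {"d3": 1.0}, "finance": {"d4": 1.0}},
)
for doc, community, score in results:
    print(doc, community, score)
```

Because low-scoring communities are never contacted, their posting lists are never transferred, which is where the scoping benefit of the two layers comes from.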
Experiments
• Single layer syntactic vs semantic: TREC ad-hoc, TREC8
(simulated with 10,000 peers)
– WordNet as knowledge base

• Single vs 2 layer
– 18 communities (doc: categories in dMoz*)
• 18*1000 = 18,000 peers simulated

– UK = domain-independent concepts and relations from WordNet
– BK = UK + WordNet domains + YAGO
– BK mapped to communities
– Queries selected as directory path to a specific subdirectory
– Standard result: documents in that subdirectory

*http://www.dmoz.org/
Experiments
• Tools
– GATE (NLP), Lucene (search library), PeerSim (peer-to-peer system simulator)

• Performance metrics
– Quality
• Precision @10, precision @20
• Mean average precision, MAP

– Network bandwidth
• Average number of postings transferred

– Response time
• s-postings, s-hops
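
For reference, the quality metrics can be computed as below. These are the standard textbook definitions of precision@k and mean average precision, written as a sketch; the exact evaluation scripts used in the experiments are not shown in the slides.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked result list, set of relevant docs), one entry per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```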
Results (1 layer syntactic vs semantic)

• Quality improved
• But, cost also increased
Results (1 layer vs 2 layer)

• Quality improved
• Cost decreased
– 94% decrease in posting transfer for opt. case
Two-layered approach results
• Proposed approach gives better quality and
performance over single-layered approaches
– Performance can be further improved using
optimizations like early termination

• But the issue of query spikes remains
Query spikes in peer-to-peer search
• Query spikes can lead to instability
– Replication/caching insufficient due to high
document creation rate*
E.g., the rate of queries related to “Bin Laden” increased by
10,000 times within one hour on Google on May 1, 2011,
after Operation Geronimo.
Some background
• Term-partitioned search
– Term/popular query responsibility assigned to individual peers
• Updates and queries are sent to the responsible peer, which processes them

– Term -> peer mapping done using a Distributed Hash
Table (DHT)

[Figure label: top-k result list of q]
Cloud-assisted p2p search (CAPS)

• Offload responsibilities of spiking queries to
public cloud
Issues in realizing CAPS
• Maintaining full index copy in cloud is very
expensive
– Storage alone will cost more than 5 million dollars per
month*

• Approach: transfer only relevant index portion to
cloud
– Needs to be performed quickly, considering the effect on user
experience (result quality, response time)

• Effect on the desirable properties of peer-to-peer
search
– Privacy, transparency, decentralized control etc.
CAPS components
• Switching decision maker
– Decide when to switch
– Simple example: “switch when query rate increases by
X% within last Y seconds”

• Switching implementor
– Switching algorithm to seamlessly transfer index
partition
– Dynamic creation of cloud instances
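
The simple switching rule can be sketched as a sliding-window comparison. The class name SpikeDetector and the choice to compare the last Y seconds against the Y seconds before them are one possible reading of the rule, assumed here for illustration; the slides do not define the baseline.

```python
from collections import deque


class SpikeDetector:
    """Hypothetical switching decision maker for one term or popular query."""

    def __init__(self, increase_pct, window_s):
        self.increase_pct = increase_pct  # X in the rule
        self.window_s = window_s          # Y in the rule
        self._times = deque()

    def record(self, t):
        """Record a query arrival at time t (seconds, non-decreasing)."""
        self._times.append(t)
        # Only the last two windows are needed for the comparison.
        while self._times and self._times[0] <= t - 2 * self.window_s:
            self._times.popleft()

    def should_switch(self, now):
        """Compare the query count in the last window with the window before it."""
        w = self.window_s
        recent = sum(1 for t in self._times if now - w < t <= now)
        previous = sum(1 for t in self._times if now - 2 * w < t <= now - w)
        if previous == 0:
            return False  # assumption: without a baseline, stay on the peers
        return (recent - previous) / previous * 100 >= self.increase_pct


detector = SpikeDetector(increase_pct=100, window_s=10)
for t in [1, 3, 5, 7, 9]:
    detector.record(t)
for i in range(12):
    detector.record(10.5 + 0.5 * i)
print(detector.should_switch(20))
```

A production rule would likely also need hysteresis for switching back from the cloud, which this sketch leaves out.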
CAPS Switching algorithm

• Ensures that result quality is not affected
• Controlled bandwidth usage at peer
Addressing additional concerns
• Transparency
– Index resides both among peers and cloud

• Centralized control
– Queries can be switched back to peers or other clouds

• Privacy
– Only spiking queries (less revealing) are forwarded to
cloud

• Cost
– Cloud used only transiently for spiking queries

• Cloud payment model
– Anonymous keyword-based advertising model*
CAPS Evaluation
• Experimental setup
– Target system consists of millions of peers
– Implemented the relevant components in a
realistic network
• Responsible peer, preceding peers, cloud instance

• Datasets
– Real datasets of queries and the corresponding
update rates are not publicly available
– Used synthetic queries and updates with expected
query/update rates and ratio
Experimental setup

• 6 heterogeneous workstations with 4-6 cores,
8-16GB RAM used
Experiments
• Two sets of experiments
1. Demonstrate effect of query spike with and
without cloud-assistance
2. Effect of switching on user experience
• Response time and result quality
• Switching time
Results-1
[Figure panels: With cloud assistance; Without cloud assistance]
Results-2 (effect of switching on user
experience)

• Result freshness

• Response time
Switching time
Conclusions
• Peer-to-peer search has many advantages by
design compared to centralized search
• But, peer-to-peer search approaches have
scalability issues
• Two-layered approach to peer-to-peer search can
improve efficiency and result quality of peer-to-peer search
• Offloading queries to cloud can be an effective
method to handle query spikes
– Desirable properties of p2p systems not lost
Publications
• Janakiram Dharanipragada and Harisankar Haridas, “Stabilizing
peer-to-peer systems using public cloud: A case study of peer-to-peer
search”, in the 11th International Symposium on Parallel
and Distributed Computing (ISPDC 2012), held at Munich, Germany.
• Janakiram Dharanipragada, Fausto Giunchiglia, Harisankar Haridas
and Uladzimir Kharkevich, “Two-layered architecture for peer-to-peer
concept search”, in the 4th International Semantic Search
Workshop located at the 20th Int. World Wide Web
Conference (WWW 2011), held at Hyderabad, India.
• Harisankar Haridas, Sriram Kailasam, Prateek Dhawalia, Prateek
Shrivastava, Santosh Kumar and Janakiram Dharanipragada, “Vcloud:
A Peer-to-peer Video Storage-Compute Cloud”, in the 21st
International ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2012), held at Delft, The
Netherlands [Poster].
THANK YOU

Questions/Suggestions

harisankarh[ at ]gmail.com

Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
 
A Method to Select e-Infrastructure Components to Sustain
A Method to Select e-Infrastructure Components to SustainA Method to Select e-Infrastructure Components to Sustain
A Method to Select e-Infrastructure Components to Sustain
 

Addressing scalability challenges in peer-to-peer search

  • 1. Addressing scalability challenges in peer-to-peer search PhD seminar 4 Feb, 2014 Harisankar H, PhD scholar, DOS lab, Dept. of CSE Advisor: Prof. D. Janakiram http://harisankarh.wordpress.com
  • 2. Outline • Issues with centralized search – Can peer-to-peer search help? • Scalability challenges in peer-to-peer search • Proposed architectural extensions – Two-layered architecture for peer-to-peer concept search – Cloud-assisted approach to handle query spikes
  • 3. Centralized search scenario • Scenario – Search engines crawl available content, index and maintain it in data centers – User queries directed to data centers, processed internally and results sent back – Centrally managed by single company Content End users Datacenters
  • 4. Some issues with centralized search – Privacy concerns • All user queries accessible from a single location – Centralized control • Individual companies decide what to index (or not), how to rank, etc. – Transparency • Complete details of ranking, pre-processing etc. not made available publicly • Concerns of censorship and doctoring of results
  • 5. Some issues with centralized search (contd.) • Uses mostly syntactic search techniques – Based on word or multi-word phrases – Low quality of results due to ambiguity of natural language • Issues with centralized semantic search – Difficult to capture long tail of niche interests of users • Requires rich human-generated knowledge bases in numerous niche areas
  • 6. Peer-to-peer search approach • Edge nodes in the internet participate in providing and using the search service • Search as a collaborative service • Crawling, indexing and search distributed across the peers
  • 7. How could peer-to-peer search help? • Each user query can be sent to a different peer among millions – Obtaining query logs in a single location difficult – Reduced privacy concerns • Distributed control across numerous peers – Avoids centralized control • Search application available with all peers – Better transparency in ranking etc. • Background knowledge of peers can be utilized for effective semantic search – Can help improve quality of results • Led to a lot of academic research in the area as well as real-world p2p search engines* * e.g., faroo.com, yacy.net; YacyPi Kickstarter project
  • 8. Realizing peer-to-peer search • Distribution of search index – Term partitioning • Responsibility for individual terms assigned to different peers – E.g., peer1 is currently responsible for term “computer” • Term-to-peer mapping achieved through a structured overlay (e.g., a DHT) Image src: http://wwarodomfr.blogspot.in/2008/09/chord-routing.html
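The term-to-peer mapping on slide 8 can be illustrated with a minimal, single-process sketch. It assumes a Chord-style ring in which the first peer clockwise from a term's hashed id is responsible for that term. The names `ring_id` and `TermRing`, the 32-bit ring size, and the peer names are hypothetical and are not taken from the slides; a real DHT would resolve the owner through O(log N) routing hops rather than a local sorted list.

```python
import hashlib
from bisect import bisect_left


def ring_id(key: str, bits: int = 32) -> int:
    """Hash a string onto a ring of size 2**bits (SHA-1 based; an assumption)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class TermRing:
    """Toy stand-in for a DHT: the successor peer of a term's id owns the term."""

    def __init__(self, peers):
        # Sort peers by their position on the ring.
        self._ring = sorted((ring_id(p), p) for p in peers)
        self._ids = [pid for pid, _ in self._ring]

    def responsible_peer(self, term: str) -> str:
        # First peer whose id is >= the term id, wrapping around the ring.
        idx = bisect_left(self._ids, ring_id(term.lower()))
        return self._ring[idx % len(self._ring)][1]


peers = [f"peer{i}" for i in range(8)]
ring = TermRing(peers)
owner = ring.responsible_peer("computer")
```

Because the mapping depends only on hashes, any peer can compute which peer holds the posting list for a term without a central directory.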
  • 9. Scalability challenges in peer-to-peer search • Peers share only idle resources • Peers join/leave autonomously • Limited individual resources • No SLA • These lead to – Peer bandwidth bottleneck during query processing • Particularly for queries involving multiple terms (index transfer between multiple peers) – Instability during query spikes • Knowledge management issues at large scale – Difficult to reach consensus at large scale – Needs wide understanding and has to meet the requirements of a large, diverse group
  • 10. Two-layered architecture for peer-to-peer concept search* • Peers organized as communities based on common interest • Each community maintains its own background knowledge to use in semantic search – Maintained in a distributed manner • A global layer with aggregate information to facilitate search across communities • Background knowledge bases extend from minimal universally accepted knowledge in upper layer • Search, indexing and knowledge management proceed independently in each community *joint work with Prof. Fausto Giunchiglia and Uladzimir Kharkevich, Univ. of Trento
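  • 11. Two-layered architecture for peer-to-peer concept search [Figure: a GLOBAL layer holding the community index and universal knowledge (UK), above Community-1, Community-2 and Community-3, each with its own background knowledge (BK-1, BK-2, BK-3) and document index (doc index-1, doc index-2, doc index-3)]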
  • 12. Two-layered architecture • Global layer – retrieves relevant communities for query based on universal knowledge • Community layer – retrieves relevant documents for query based on background knowledge of community
  • 13. Overcoming the shortcomings of single-layered approaches • Search can be scoped only to the relevant communities for a query – Results in fewer bandwidth-related issues • Two layers make knowledge management scalable and interoperable – Niche interests supported at community-level background knowledge bases – Minimal universal knowledge for interoperability • Search within community based on community’s background knowledge – Focused interest of community helps in better term-to-concept mapping
  • 14. Two-layered approach • Index partitioning – Uses partition-by-term • Posting list for each term stored in different peers – Uses Distributed Hash Table (DHT) to realize dynamic term-to-peer mapping • O(log N) hops for each lookup • Overlay network – Communities and global layer maintained using two-layered overlay • Based on our earlier work on computational grids* – O(log N) hops for lookup even with two layers *M.V. Reddy, A.V. Srinivas, T. Gopinath, and D. Janakiram, “Vishwa: A reconfigurable P2P middleware for Grid Computations,” in ICPP'06
  • 15. Two-layered approach • Community management – Similar to public communities in Flickr, Facebook etc. • Search within community – Uses Concept Search* as underlying semantic search scheme • Extends syntactic search with available knowledge to realize semantic search • Falls back to syntactic search when no knowledge is available *Fausto Giunchiglia, Uladzimir Kharkevich, Ilya Zaihrayeu, “Concept search”, ESWC 2009
  • 16. Two-layered approach • Knowledge representation – Term -> concept mapping – Concept hierarchy • Concept relations expressed as subsumption relations • Concepts in documents/queries extracted – by analyzing words and natural language phrases – Noun phrases translated into conjunctions of atomic concepts (complex concepts) • small-1 ∧ dog-2 – Documents/queries represented as enumerated sequences of complex concepts • E.g., 1: small-1 ∧ dog-2 2: big-1 ∧ animal-3
  • 17. Two-layered approach • Relevance model – Documents having more specific concepts than query concepts considered relevant • Eg: poodle-1 relevant when searching for dog-2 – Ranking done by extending tf-idf relevance model • Incorporates term-concept and concept-concept similarities also • Distributed knowledge maintenance – Each atomic concept indexed on DHT with id – Node responsible for each atomic concept id also stores ids of • All immediate more specific atomic concepts • All concepts in the path to root of the atomic concept
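The subsumption-based relevance rule on slide 17 can be sketched as follows. This is only an illustration under stated assumptions: the toy hierarchy in `PARENT` is hypothetical (built from the concept names used in the slides), each concept has a single parent, and a document concept that is equal to the query concept is also treated as relevant, which the slide does not state explicitly.

```python
# Hypothetical toy hierarchy: child -> parent (the child is the more specific concept).
PARENT = {
    "poodle-1": "dog-2",
    "dog-2": "animal-3",
    "cat-1": "animal-3",
}


def path_to_root(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_relevant(doc_concept, query_concept):
    """True if the document concept equals or is more specific than the query concept."""
    return query_concept in path_to_root(doc_concept)


print(is_relevant("poodle-1", "dog-2"))  # poodle is a kind of dog
print(is_relevant("dog-2", "poodle-1"))  # dog is more general than poodle
```

Storing the path to the root alongside each atomic concept, as the slide describes for the responsible DHT node, is what lets this check run without further lookups.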
  • 18. Two-layered approach • Document indexing and search – Concepts mapped to peer using DHT – Query routed to peers responsible for the query concepts and related concepts – Results from multiple peers combined to give final results • Global search – The popularity (document frequency) of each concept indexed in upper layer – tf-idf extended with universal knowledge to search for communities – Combined score of doc = (score of community) * (score of doc within community)
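The combined scoring rule on slide 18 (score of community multiplied by score of the document within that community) can be sketched as below. The function name `global_search`, the dictionary layout, and the example scores are hypothetical; in the described system the two scores would come from the global layer and the community layer respectively.

```python
def global_search(community_scores, doc_scores_by_community, top_k=10):
    """Rank documents across communities by community score * in-community score."""
    ranked = []
    for community, c_score in community_scores.items():
        for doc, d_score in doc_scores_by_community.get(community, {}).items():
            ranked.append((c_score * d_score, community, doc))
    # Highest combined score first.
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked[:top_k]


# Hypothetical scores for one query.
community_scores = {"pets": 0.8, "cars": 0.5}
doc_scores = {
    "pets": {"d1": 0.5, "d2": 0.25},
    "cars": {"d3": 0.5},
}
results = global_search(community_scores, doc_scores)
```

A document from a weakly matching community can still outrank one from a strongly matching community if its in-community score is high enough, as `d3` versus `d2` shows here.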
  • 19. Experiments • Single layer syntactic vs semantic: TREC ad-hoc, TREC8 (simulated with 10,000 peers) – WordNet as knowledge base • Single vs 2 layer – 18 communities (doc: categories in dMoz*) • 18*1000 = 18,000 peers simulated – UK = domain-independent concepts and relations from WordNet – BK = UK + WordNet domains + YAGO – BK mapped to communities – Queries selected as directory path to a specific subdirectory – Standard result: documents in that subdirectory *http://www.dmoz.org/
  • 20. Experiments • Tools – GATE (NLP), Lucene (search library), PeerSim (peer-to-peer system simulator) • Performance metrics – Quality • Precision @10, precision @20 • Mean average precision, MAP – Network bandwidth • Average number of postings transferred – Response time • s-postings, s-hops
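The quality metrics named on slide 20 follow the standard information-retrieval definitions, which can be sketched as below. The function names and the example ranking are hypothetical; the slides do not show how the metrics were computed in the experiments.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Sum of precision@i at each rank i holding a relevant document, divided by the number of relevant documents."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking for one query with two relevant documents.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p_at_2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
```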
  • 21. Results (1 layer syntactic vs semantic) • Quality improved • But, cost also increased
  • 22. Results (1 layer vs 2 layer) • Quality improved • Cost decreased – 94% decrease in posting transfer for opt. case
  • 23. Two-layered approach results • Proposed approach gives better quality and performance over single-layered approaches – Performance can be further improved using optimizations like early termination • But the issue of query spikes remains
  • 24. Query spikes in peer-to-peer search • Query spikes can lead to instability – Replication/caching insufficient due to high document creation rate* * The rate of queries related to “Bin Laden” increased by 10,000 times within one hour on Google on May 1, 2011, after Operation Geronimo.
  • 25. Some background • Term-partitioned search – Responsibility for each term/popular query assigned to an individual peer • Updates and queries are sent to the responsible peer, which processes them – Term -> peer mapping done using a Distributed Hash Table (DHT) [Figure label: top-k result list of q]
  • 26. Cloud-assisted p2p search(CAPS) • Offload responsibilities of spiking queries to public cloud
  • 27. Issues in realizing CAPS • Maintaining full index copy in cloud is very expensive – Storage alone will cost more than 5 million dollars per month* • Approach: transfer only relevant index portion to cloud – Needs to be performed quickly, considering the effect on user experience (result quality, response time) • Effect on the desirable properties of peer-to-peer search – Privacy, transparency, decentralized control etc.
  • 28. CAPS components • Switching decision maker – Decide when to switch – Simple e.g., “switch when query rate increases by X% within last Y seconds” • Switching implementor – Switching algorithm to seamlessly transfer index partition – Dynamic creation of cloud instances
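The example rule on slide 28 ("switch when query rate increases by X% within last Y seconds") can be sketched as a sliding-window check. This is one possible reading, not the CAPS implementation: it compares the number of queries in the most recent Y seconds against the preceding Y seconds. The class name `SpikeDetector`, the choice of baseline window, and the decision not to switch when there is no baseline are all assumptions.

```python
from collections import deque


class SpikeDetector:
    """Decide whether to switch, based on query counts in two adjacent windows."""

    def __init__(self, increase_pct, window_s):
        self.increase_pct = increase_pct  # X: required increase, in percent
        self.window_s = window_s          # Y: window length, in seconds
        self._times = deque()

    def record(self, t):
        """Record a query arrival at time t (seconds, non-decreasing)."""
        self._times.append(t)
        # Keep only the last two windows of history.
        while self._times and self._times[0] <= t - 2 * self.window_s:
            self._times.popleft()

    def should_switch(self, now):
        w = self.window_s
        recent = sum(1 for t in self._times if now - w < t <= now)
        previous = sum(1 for t in self._times if now - 2 * w < t <= now - w)
        if previous == 0:
            return False  # Assumption: without a baseline, do not switch.
        return (recent - previous) / previous * 100 >= self.increase_pct


# Hypothetical trace: 5 queries in the first 10 s, then 20 in the next 10 s.
detector = SpikeDetector(increase_pct=200, window_s=10)
for i in range(5):
    detector.record(float(i + 1))
for i in range(20):
    detector.record(10.5 + i * 0.4)
spike = detector.should_switch(now=20.0)
```

In this trace the rate rises by 300%, so a threshold of 200% triggers the switch; a steady rate would not.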
  • 29. CAPS Switching algorithm • Ensures that result quality is not affected • Controlled bandwidth usage at peer
  • 30. Addressing additional concerns • Transparency – Index resides both among peers and cloud • Centralized control – Query can be switched back to peers or other clouds • Privacy – Only spiking queries (less revealing) are forwarded to cloud • Cost – Cloud used only transiently for spiking queries • Cloud payment model – Anonymous keyword-based advertising model*
  • 31. CAPS Evaluation • Experimental setup – Target system consists of millions of peers – Implemented the relevant components in a realistic network • Responsible peer, preceding peers, cloud instance • Datasets – Real datasets on query/corresponding updates(rates) not publicly available – Used synthetic queries and updates with expected query/update rates/ratio
  • 32. Experimental setup • 6 heterogeneous workstations with 4-6 cores, 8-16GB RAM used
  • 33. Experiments • Two sets of experiments 1. Demonstrate effect of query spike with and without cloud-assistance 2. Effect of switching on user experience • Response time and result quality • Switching time
  • 35. Results-2(effect of switching on user experience) • Result freshness • Response time
  • 37. Conclusions • Peer-to-peer search has many advantages by design compared to centralized search • But peer-to-peer search approaches have scalability issues • Two-layered approach to peer-to-peer search can improve efficiency and result quality of peer-to-peer search • Offloading queries to cloud can be an effective method to handle query spikes – Desirable properties of p2p systems not lost
  • 38. Publications • Janakiram Dharanipragada and Harisankar Haridas, “Stabilizing peer-to-peer systems using public cloud: A case study of peer-to-peer search”, In the 11th International Symposium on Parallel and Distributed Computing (ISPDC 2012), held at Munich, Germany. • Janakiram Dharanipragada, Fausto Giunchiglia, Harisankar Haridas and Uladzimir Kharkevich, “Two-layered architecture for peer-to-peer concept search”, In the 4th International Semantic Search Workshop located at the 20th Int. World Wide Web Conference (WWW 2011), held at Hyderabad, India. • Harisankar Haridas, Sriram Kailasam, Prateek Dhawalia, Prateek Shrivastava, Santosh Kumar and Janakiram Dharanipragada, “Vcloud: A Peer-to-peer Video Storage-Compute Cloud”, In the 21st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2012), held at Delft, The Netherlands [Poster].