SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
FUJITSU RESTRICTED - UK & IRELAND EYES ONLY © Copyright 2014 Fujitsu (Ireland) Limited
Traffic Analytics for
Linked Data Publishers
Luca Costabello, Pierre-Yves Vandenbussche, Gofran Shukair
Fujitsu Ireland
Corine Deliot, Neil Wilson
British Library
2
The Problem: Measuring Traffic on RDF Datasets
Linked Data publishers have limited awareness of how
datasets are accessed by visitors.
No tool to mine Linked Data servers access logs
Why is this such a big deal?
Justify investment in Linked Data IT infrastructure
Cost control
Identify abuses
Interpret access peaks
3
Which traffic metrics?
 Adapt conventional web analytics metrics
 Define Linked-Data specific extensions
How to extract and compute such metrics?
 Which data sources? (client tracking? server access logs mining? both?)
 Need to support dual data access protocol (HTTP operations + SPARQL)
 How to filter noise? (i.e. robots, search engines crawlers)
 How to detect client sessions? (no client tracking, dual data access protocol)
 How to detect SPARQL activity peaks?
Challenges
4
Existing tools do not include Linked Data-specific metrics:
Linked data-specific metrics, but no platform [Moller et al, WebScience 2010]
Filling a Gap in Prior Art
5
• Traffic analytics platform for LD servers
• Metrics
• Metrics Extraction
• Visitor Sessions
• Heavy/Light SPARQL Queries
• Results & British Library Trial Insight
Our Contribution / Agenda
6
• Traffic analytics platform for LD servers
• Metrics
• Metrics Extraction
• Visitor Sessions
• Heavy/Light SPARQL Queries
• Results & British Library Trial Insight
7
8
9
• Traffic Analytics Platform for LD Servers
• Metrics
• Metrics Extraction
• Visitor Sessions
• Heavy/Light SPARQL Queries
• Results & British Library Trial Insight
10
Metrics
* Linked Data-specific
11
Metrics
* Linked Data-specific
12
Metrics
* Linked Data-specific
13
• Traffic Analytics Platform for LD Servers
• Metrics
• Metrics Extraction
• Visitor Sessions
• Heavy/Light SPARQL Queries
• Results & British Library Trial Insight
14
Visitor Session Detection
Session: sequence of requests issued with no significant
interruptions by a uniquely identified visitor. Expires after a
period of inactivity.
We use the HAC variant by [Murray et al. 2006, Mehrzadi et al. 2012]
Unsupervised, gap-based session boundary detection
Traditional web logs analysis
Benefit: visitor-specific temporal cut-off
Two-step procedure:
Set visitor-specific session cut-off as time interval that significantly
increases the variance.
Group HTTP/SPARQL requests into sessions according to the cut-off
15
• Traffic Analytics Platform for LD Servers
• Metrics
• Metrics Extraction
• Visitor Sessions
• Heavy/Light SPARQL Queries
• Results & British Library Trial Insight
16
Heavy/Light SPARQL Queries Binary Classifier
 Rough estimate of heavy and light queries with supervised
binary classification.
 Heavy SPARQL Query: if it requires considerable computational
and memory resources.
Light
Heavy
17
Heavy/Light SPARQL Queries Binary Classifier
 Feature vectors: SPARQL 1.1 syntactic features only:
18
• Traffic Analytics Platform for LD Servers
• Metrics
• Metrics Extraction
• Visitor Sessions
• Heavy/Light SPARQL Queries
• Results & British Library Trial Insight
19
British National Bibliography access logs
bnb.data.bl.uk (access logs are not public)
13 months
~ 10M HTTP requests/month
DBpedia 3.9 access logs
USEWOD 2015 Dataset
Datasets
20
Visitor Session Detection: Results
How well do we detect the beginning of a new session?
Dataset
British National Bibliography access logs (3 consecutive days)
~16k HTTP/SPARQL requests
• 32% Desktop browsers (115 visitors)
• 68% Software libraries (10 visitors)
Manually annotated records
• 1=session_start | 0=internal
Baseline: fixed-length cut-offs
HAC outperforms fixed-length cut-offs
21
Random distinct queries from DBpedia 3.9 access logs
Run the queries multiple times on local clone of DBpedia
Kept ~3.7k queries with low variance (3.1k light, 600 heavy)
Cut-off threshold: 100ms
Naïve Bayes and SVM
Grid search & randomized search w/ 10-fold CV
Heavy/Light SPARQL: Experiment Protocol
22
Heavy/Light SPARQL: Results
23
Genuine calls account for 0.6% of total traffic!
+30% of HTTP/SPARQL traffic over the observed 13 months
Sharp increase in requests from Software Libraries (95x)
SPARQL accounts for 29% of traffic
6% of heavy SPARQL queries
37 days have unusual traffic spikes
Bounce rate: 48%
Software Libraries have bigger, deeper, and longer sessions.
Some Insights on BL Traffic Logs
24
We relieve publishers from manual and time-consuming
access log mining
Support Linked Data-specific metrics
 Break down traffic by RDF content
 Capture SPARQL insights
 Properly interpret 303 patterns
Reconstruction of Linked Data visitors sessions
Heavy/light SPARQL classifier w/ SPARQL syntax +
supervised learning
Revealed hidden insights on 13 months of access logs of the
British Library
Summary
25
Statistics on noise (i.e. web crawlers)
Heavy/light classifier
 Feature set refinements
 Does it generalize to other datasets?
Enhance session detection with content-based heuristics
 Relatedness of subsequent SPARQL queries
 Structure and type of requested RDF entities
Future Work
26
Public Demo: bit.ly/ld-traffic
innovation.ie.fujitsu.com/kedi

Contenu connexe

Similaire à Traffic Analytics for Linked Data Publishers

Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl newGreenM
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study Seeling Cheung
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Datafreshdatabos
 
Improving Reporting Performance
Improving Reporting PerformanceImproving Reporting Performance
Improving Reporting PerformanceDhiren Gala
 
Introducing the IRUSdataUK pilot webinar
Introducing the IRUSdataUK pilot webinarIntroducing the IRUSdataUK pilot webinar
Introducing the IRUSdataUK pilot webinarJisc
 
HUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesHUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesJohn Mulhall
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane
 
Free Netflow analyzer training - diagnosing_and_troubleshooting
Free Netflow analyzer  training - diagnosing_and_troubleshootingFree Netflow analyzer  training - diagnosing_and_troubleshooting
Free Netflow analyzer training - diagnosing_and_troubleshootingManageEngine, Zoho Corporation
 
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...ManageEngine, Zoho Corporation
 
Tufts Research: Strategies from Data Management Leaders to Speed Clinical Trials
Tufts Research: Strategies from Data Management Leaders to Speed Clinical TrialsTufts Research: Strategies from Data Management Leaders to Speed Clinical Trials
Tufts Research: Strategies from Data Management Leaders to Speed Clinical TrialsVeeva Systems
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Maya Lumbroso
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Dataconomy Media
 
Web Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der PraxisWeb Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der PraxisFelix Gessert
 

Similaire à Traffic Analytics for Linked Data Publishers (20)

Data monstersrealtimeetl new
Data monstersrealtimeetl newData monstersrealtimeetl new
Data monstersrealtimeetl new
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
nstitutional repositories, item and research data metrics
nstitutional repositories, item and research data metricsnstitutional repositories, item and research data metrics
nstitutional repositories, item and research data metrics
 
Introduction to Tideways
Introduction to TidewaysIntroduction to Tideways
Introduction to Tideways
 
Improving Reporting Performance
Improving Reporting PerformanceImproving Reporting Performance
Improving Reporting Performance
 
Introducing the IRUSdataUK pilot webinar
Introducing the IRUSdataUK pilot webinarIntroducing the IRUSdataUK pilot webinar
Introducing the IRUSdataUK pilot webinar
 
HUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation SlidesHUG Ireland Event - HPCC Presentation Slides
HUG Ireland Event - HPCC Presentation Slides
 
Psicquic
PsicquicPsicquic
Psicquic
 
Qo comparision
Qo comparisionQo comparision
Qo comparision
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 
Free Netflow analyzer training - diagnosing_and_troubleshooting
Free Netflow analyzer  training - diagnosing_and_troubleshootingFree Netflow analyzer  training - diagnosing_and_troubleshooting
Free Netflow analyzer training - diagnosing_and_troubleshooting
 
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
 
Tufts Research: Strategies from Data Management Leaders to Speed Clinical Trials
Tufts Research: Strategies from Data Management Leaders to Speed Clinical TrialsTufts Research: Strategies from Data Management Leaders to Speed Clinical Trials
Tufts Research: Strategies from Data Management Leaders to Speed Clinical Trials
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
 
Web Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der PraxisWeb Performance – die effektivsten Techniken aus der Praxis
Web Performance – die effektivsten Techniken aus der Praxis
 
PhD Defense
PhD DefensePhD Defense
PhD Defense
 

Plus de Luca Costabello

Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph EmbeddingsMachine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph EmbeddingsLuca Costabello
 
Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...
Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...
Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...Luca Costabello
 
Context-Aware Access Control and Presentation of Linked Data
Context-Aware Access Control and Presentation of Linked DataContext-Aware Access Control and Presentation of Linked Data
Context-Aware Access Control and Presentation of Linked DataLuca Costabello
 
Access Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataAccess Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataLuca Costabello
 
Linked Data Access Goes Mobile: Context Aware Authorization for Graph Stores
Linked Data Access Goes Mobile: Context Aware Authorization for Graph StoresLinked Data Access Goes Mobile: Context Aware Authorization for Graph Stores
Linked Data Access Goes Mobile: Context Aware Authorization for Graph StoresLuca Costabello
 
PRISSMA, Towards Mobile Adaptive Presentation of the Web of Data
PRISSMA,Towards Mobile Adaptive Presentation of the Web of DataPRISSMA,Towards Mobile Adaptive Presentation of the Web of Data
PRISSMA, Towards Mobile Adaptive Presentation of the Web of DataLuca Costabello
 
Time Based Cluster Analysis for Automatic Blog Generation
Time Based Cluster Analysis for Automatic Blog GenerationTime Based Cluster Analysis for Automatic Blog Generation
Time Based Cluster Analysis for Automatic Blog GenerationLuca Costabello
 

Plus de Luca Costabello (7)

Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph EmbeddingsMachine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
Machine Learning on Knowledge Graphs: a Quick Tour of Knowledge Graph Embeddings
 
Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...
Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...
Error-Tolerant RDF Subgraph Matching for Adaptive Presentation of Linked Data...
 
Context-Aware Access Control and Presentation of Linked Data
Context-Aware Access Control and Presentation of Linked DataContext-Aware Access Control and Presentation of Linked Data
Context-Aware Access Control and Presentation of Linked Data
 
Access Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked DataAccess Control for HTTP Operations on Linked Data
Access Control for HTTP Operations on Linked Data
 
Linked Data Access Goes Mobile: Context Aware Authorization for Graph Stores
Linked Data Access Goes Mobile: Context Aware Authorization for Graph StoresLinked Data Access Goes Mobile: Context Aware Authorization for Graph Stores
Linked Data Access Goes Mobile: Context Aware Authorization for Graph Stores
 
PRISSMA, Towards Mobile Adaptive Presentation of the Web of Data
PRISSMA,Towards Mobile Adaptive Presentation of the Web of DataPRISSMA,Towards Mobile Adaptive Presentation of the Web of Data
PRISSMA, Towards Mobile Adaptive Presentation of the Web of Data
 
Time Based Cluster Analysis for Automatic Blog Generation
Time Based Cluster Analysis for Automatic Blog GenerationTime Based Cluster Analysis for Automatic Blog Generation
Time Based Cluster Analysis for Automatic Blog Generation
 

Dernier

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 

Dernier (20)

Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men  🔝Ongole🔝   Escorts S...
➥🔝 7737669865 🔝▻ Ongole Call-girls in Women Seeking Men 🔝Ongole🔝 Escorts S...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 

Traffic Analytics for Linked Data Publishers

  • 1. FUJITSU RESTRICTED - UK & IRELAND EYES ONLY © Copyright 2014 Fujitsu (Ireland) Limited Traffic Analytics for Linked Data Publishers Luca Costabello, Pierre-Yves Vandenbussche, Gofran Shukair Fujitsu Ireland Corine Deliot, Neil Wilson British Library
  • 2. 2 The Problem: Measuring Traffic on RDF Datasets Linked Data publishers have limited awareness of how datasets are accessed by visitors. No tool to mine Linked Data servers access logs Why is this such a big deal? Justify investment in Linked Data IT infrastructure Cost control Identify abuses Interpret access peaks
  • 3. 3 Which traffic metrics?  Adapt conventional web analytics metrics  Define Linked-Data specific extensions How to extract and compute such metrics?  Which data sources? (client tracking? server access logs mining? both?)  Need to support dual data access protocol (HTTP operations + SPARQL)  How to filter noise? (i.e. robots, search engines crawlers)  How to detect client sessions? (no client tracking, dual data access protocol)  How to detect SPARQL activity peaks? Challenges
  • 4. 4 Existing tools do not include Linked Data-specific metrics: Linked data-specific metrics, but no platform [Moller et al, WebScience 2010] Filling a Gap in Prior Art
  • 5. 5 • Traffic analytics platform for LD servers • Metrics • Metrics Extraction • Visitor Sessions • Heavy/Light SPARQL Queries • Results & British Library Trial Insight Our Contribution / Agenda
  • 6. 6 • Traffic analytics platform for LD servers • Metrics • Metrics Extraction • Visitor Sessions • Heavy/Light SPARQL Queries • Results & British Library Trial Insight
  • 7. 7
  • 8. 8
  • 9. 9 • Traffic Analytics Platform for LD Servers • Metrics • Metrics Extraction • Visitor Sessions • Heavy/Light SPARQL Queries • Results & British Library Trial Insight
  • 13. 13 • Traffic Analytics Platform for LD Servers • Metrics • Metrics Extraction • Visitor Sessions • Heavy/Light SPARQL Queries • Results & British Library Trial Insight
  • 14. 14 Visitor Session Detection Session: sequence of requests issued with no significant interruptions by a uniquely identified visitor. Expires after a period of inactivity. We use the HAC variant by [Murray et al. 2006, Mehrzadi et al. 2012] Unsupervised, gap-based session boundary detection Traditional web logs analysis Benefit: visitor-specific temporal cut-off Two-step procedure: Set visitor-specific session cut-off as time interval that significantly increases the variance. Group HTTP/SPARQL requests into sessions according to the cut-off
  • 15. 15 • Traffic Analytics Platform for LD Servers • Metrics • Metrics Extraction • Visitor Sessions • Heavy/Light SPARQL Queries • Results & British Library Trial Insight
  • 16. 16 Heavy/Light SPARQL Queries Binary Classifier  Rough estimate of heavy and light queries with supervised binary classification.  Heavy SPARQL Query: if it requires considerable computational and memory resources. Light Heavy
  • 17. 17 Heavy/Light SPARQL Queries Binary Classifier  Feature vectors: SPARQL 1.1 syntactic features only:
  • 18. 18 • Traffic Analytics Platform for LD Servers • Metrics • Metrics Extraction • Visitor Sessions • Heavy/Light SPARQL Queries • Results & British Library Trial Insight
  • 19. 19 British National Bibliography access logs bnb.data.bl.uk (access logs are not public) 13 months ~ 10M HTTP requests/month DBpedia 3.9 access logs USEWOD 2015 Dataset Datasets
  • 20. 20 Visitor Session Detection: Results How well do we detect the beginning of a new session? Dataset British National Bibliography access logs (3 consecutive days) ~16k HTTP/SPARQL requests • 32% Desktop browsers (115 visitors) • 68% Software libraries (10 visitors) Manually annotated records • 1=session_start | 0=internal Baseline: fixed-length cut-offs HAC outperforms fixed-length cut-offs
  • 21. 21 Random distinct queries from DBpedia 3.9 access logs Run the queries multiple times on local clone of DBpedia Kept ~3.7k queries with low variance (3.1k light, 600 heavy) Cut-off threshold: 100ms Naïve Bayes and SVM Grid search & randomized search w/ 10-fold CV Heavy/Light SPARQL: Experiment Protocol
  • 23. 23 Genuine calls account for 0.6% of total traffic! +30% of HTTP/SPARQL traffic over the observed 13 months Sharp increase in requests from Software Libraries (95x) SPARQL accounts for 29% of traffic 6% of heavy SPARQL queries 37 days have unusual traffic spikes Bounce rate: 48% Software Libraries have bigger, deeper, and longer sessions. Some Insights on BL Traffic Logs
  • 24. 24 We relieve publishers from manual and time-consuming access log mining Support Linked Data-specific metrics  Break down traffic by RDF content  Capture SPARQL insights  Properly interpret 303 patterns Reconstruction of Linked Data visitors sessions Heavy/light SPARQL classifier w/ SPARQL syntax + supervised learning Revealed hidden insights on 13 months of access logs of the British Library Summary
  • 25. 25 Statistics on noise (i.e. web crawlers) Heavy/light classifier  Feature set refinements  Does it generalize to other datasets? Enhance session detection with content-based heuristics  Relatedness of subsequent SPARQL queries  Structure and type of requested RDF entities Future Work