Statistical Analysis of Web of Data Usage
Towards (Visual) Maintenance Support for Dataset Publishers
Markus Luczak-Rösch, Markus Bischoff


Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)
Who is addressed?
• rather small/simple ontologies
  – min. effort for ontology engineering (OE)
  – “under-engineered”
• unknown user requirements
We propose: A Usage-dependent Life Cycle

Initial Release
• RDB2RDF
• Crawling & transformation
• …

USAGE: Requests and Queries
• SELECT * WHERE { ?t a:madeOf a:Plastic }
• SELECT * WHERE { ?t b:madeOf b:Wood }

Negotiate understanding
• Re-engineering
• Re-population
• …
(Very) Quick Example
• Which instruments does the band The Beatles consist of?
• Are the Beatles a “Big Band”?
• What are “British” bands?

• Is it what the user expected to see?
• Did you know that this happens, and do you know what to do now?
Survey covering approx. 25% of all cloud datasets
• size
• complexity
• engineering methodology
• …
→ Publishers of most of the datasets do not have any (structured) idea how to maintain their data.
Survey ran in October 2010, not yet published officially
Role of the dataset publisher (more general)
• use common vocabularies
• provide RDF links to other resources
• provide schema mappings

[Figure: Effort Distribution between Publisher and Consumer. Labels: “Consumer generates/data mines links”, “Publisher provides links”, “Links as hints”.]
Source: Talk of Chris Bizer, “Pay-as-you-go Data Integration” (21/9/2010)
Role of the dataset publisher (more specific)*
• Reliability → Is the data valid and complete?
• Peak-load → Temporal profiles of important data?
• Performance → Are caches and indexes optimal?
• Usefulness → What do people find and use frequently?
• Attacks → Is the data threatened by spam?

* w.r.t. Möller et al.: Learning from Linked Open Data Usage: Patterns & Metrics.
Our Usage-based Approach




digging in log files
How do people access resources on the Web of Data?

xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:00 -0600]
    "GET /page/Jeroen_Simaeys HTTP/1.1"
    200 26777 "" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:00 -0600]
    "GET /resource/Guano_Apes HTTP/1.1"
    303 0 "" "Mozilla/5.0 (compatible; Googlebot/2.1;
    +http://www.google.com/bot.html)"
xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:01 -0600]
    "GET /sparql?query=PREFIX+rdfs%3A+%..."
    200 1844 "" ""
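
Log entries like the ones above can be split into fields with a regular expression. The following is a minimal sketch, assuming the server writes the common "combined" access log layout seen in the excerpt; the names `LOG_LINE`, `parse_line` and the `kind` labels are illustrative and not part of the presented tool.

```python
import re
from urllib.parse import urlsplit

# Assumed layout: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields plus a coarse access kind, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    path = urlsplit(entry["target"]).path
    # Hypothetical classification based on the URL paths shown in the excerpt.
    if path == "/sparql":
        entry["kind"] = "sparql"
    elif path.startswith("/resource/"):
        entry["kind"] = "resource"
    elif path.startswith("/page/"):
        entry["kind"] = "page"
    else:
        entry["kind"] = "other"
    return entry

sample = ('xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:00 -0600] '
          '"GET /resource/Guano_Apes HTTP/1.1" 303 0 "" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
entry = parse_line(sample)
print(entry["kind"], entry["status"])
```

The user agent field is kept so that crawler traffic (msnbot, Googlebot) can be separated from other clients later on.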


What do they get?
• RDF-Graphs
• SPARQL Query Results XML Format
• …, HTML, JSON, … serialization of results
• …, HTML, JSON, … serialization of no results

An HTTP 204 (No Content) response for empty results would be great, but for now the usage mining process has to handle serializations of no results.
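
Because an empty answer still arrives with status 200, the response body has to be inspected. A minimal sketch for the SPARQL Query Results XML Format, assuming the standard W3C namespace; the function name `has_results` and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format, in ElementTree's {uri}tag notation.
SR = "{http://www.w3.org/2005/sparql-results#}"

def has_results(xml_text):
    """True if a SELECT response holds at least one <result>, or an ASK response is true."""
    root = ET.fromstring(xml_text)
    boolean = root.find(SR + "boolean")
    if boolean is not None:
        return (boolean.text or "").strip() == "true"
    return root.find(f"{SR}results/{SR}result") is not None

EMPTY = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="x"/></head><results/></sparql>'
)
NON_EMPTY = (
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    '<head><variable name="x"/></head><results><result>'
    '<binding name="x"><uri>http://example.org/a</uri></binding>'
    '</result></results></sparql>'
)
print(has_results(EMPTY), has_results(NON_EMPTY))
```

HTML or JSON serializations would need their own check; this sketch covers only the XML format.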
Adapted from Myra Spiliopoulou: “Web usage mining for Web site evaluation”, 2000, Commun. ACM

[Figure: usage mining process in two phases.
 Preparation Phase: Log File, Preparation Tool, Prepared Log Data; Access Methods and Patterns (Queries, Patterns, Triples, Filters).
 Mining Phase: Instructions, Mining Query, Usage Mining Methods, Mining Results, Result Patterns, Visualization Tool; Navigation Patterns (Sessions and Sequences, Statistics).]
Preparation Process
xxx.xxx.xxx.xxx - - [21/Sep/2009:00:00:01 -0600]
     "GET /sparql?query=PREFIX+rdfs%3A+%..."
     200 1844 "" ""

• Log Entry Extraction
• SPARQL Query Selection and Validation
• Basic Graph Pattern Selection
• Triple Pattern Selection

Query Partitions Database

• Query Partition Re-Execution
• Query Partition Success Determination
• Query Filter Evaluation
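
The first steps of the preparation process (query extraction and triple pattern selection) can be sketched as follows. This is a deliberately naive illustration, not the presented tool: it decodes the `query` parameter with the standard library and splits the first `{ ... }` group on whitespace, so it ignores FILTER, OPTIONAL, nested groups, literals containing spaces and the `;`/`,` abbreviations. A real pipeline would use a SPARQL parser. The function names and the sample query are assumptions.

```python
import re
from urllib.parse import urlencode, urlsplit, parse_qs

def extract_query(target):
    """Return the decoded SPARQL string from a request target, or None if there is none."""
    values = parse_qs(urlsplit(target).query).get("query")
    return values[0] if values else None

def naive_triple_patterns(query):
    """Split the first brace-delimited group into (subject, predicate, object) tuples."""
    match = re.search(r"\{([^{}]*)\}", query)
    if match is None:
        return []
    triples, current = [], []
    for token in match.group(1).split():
        if token == ".":
            # A standalone dot closes the current pattern.
            if len(current) == 3:
                triples.append(tuple(current))
            current = []
        else:
            current.append(token)
    if len(current) == 3:
        triples.append(tuple(current))
    return triples

# Hypothetical request target, built the way a client would URL-encode it.
sparql = "SELECT * WHERE { ?t a:madeOf a:Plastic . ?t rdf:type a:Toy }"
target = "/sparql?" + urlencode({"query": sparql})
patterns = naive_triple_patterns(extract_query(target))
print(patterns)
```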
Usage Analysis

• queries
   • patterns
      • triples
         • primitives
[Figure: example triple ns1:A rdf:type ns2:B]

Reference for details: M. Luczak-Rösch and H. Mühleisen, "Log File Analysis for Web of Data Endpoints," in Proc. of the 8th Extended Semantic Web Conference (ESWC) Poster-Session, 2011.
Metrics
• Ontology heat map
  – the amount a class or a predicate is used in queries
• Resource usage
  – triple combinations in which a resource is used
• Primitive usage
  – position in triples
  – triple combinations
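
These metrics boil down to counting over the extracted triple patterns. A minimal sketch, assuming patterns are available as (subject, predicate, object) string tuples with variables prefixed by `?`; treating the object of `rdf:type` as the class being used is an assumption of this example, and the sample patterns are invented.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

def heat_map(triples):
    """Count how often each predicate, and each class named as object of rdf:type, is queried."""
    heat = Counter()
    for subject, predicate, obj in triples:
        if not predicate.startswith("?"):
            heat[predicate] += 1
        if predicate == RDF_TYPE and not obj.startswith("?"):
            heat[obj] += 1
    return heat

def position_usage(triples):
    """Count in which triple position each non-variable primitive occurs."""
    usage = Counter()
    for triple in triples:
        for position, term in zip(("subject", "predicate", "object"), triple):
            if not term.startswith("?"):
                usage[(term, position)] += 1
    return usage

sample_patterns = [
    ("?b", "rdf:type", "ns:Band"),
    ("?b", "ns:instrument", "?x"),
    ("?b", "ns:genre", "?y"),
    ("?c", "rdf:type", "ns:Band"),
]
heat = heat_map(sample_patterns)
usage = position_usage(sample_patterns)
print(heat.most_common(2))
```

The resulting counts are the kind of weights that could drive node and edge sizes in a network view.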
Metrics
• Time statistics
  – hourly accesses
• Hosts statistics
  – hourly accesses per host
  – primitives and triple patterns requested by host
• Error statistics
  – triple patterns that contradict the schema but succeeded
  – triple patterns that fail due to the modelling
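
The time and host statistics can be derived directly from the log timestamps. A minimal sketch, assuming (host, timestamp) pairs in the access log's timestamp format and an English/C locale for the month abbreviation; the host labels are placeholders because the real addresses are anonymized in the logs.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 21/Sep/2009:00:00:00 -0600

def hourly_accesses(entries):
    """Return (accesses per hour, accesses per (host, hour)) for (host, timestamp) pairs."""
    per_hour, per_host_hour = Counter(), Counter()
    for host, timestamp in entries:
        # The hour is taken as logged, i.e. in the server's local time zone.
        hour = datetime.strptime(timestamp, TIME_FORMAT).hour
        per_hour[hour] += 1
        per_host_hour[(host, hour)] += 1
    return per_hour, per_host_hour

entries = [
    ("host-a", "21/Sep/2009:00:00:00 -0600"),
    ("host-a", "21/Sep/2009:00:59:59 -0600"),
    ("host-b", "21/Sep/2009:13:05:00 -0600"),
]
per_hour, per_host_hour = hourly_accesses(entries)
print(per_hour[0], per_host_hour[("host-a", 0)])
```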
Visualizations

• weighted nodes and edges (depending on the applied metric) represent the amount of usage
• network overview
• zoom in and see details
Evaluation Dataset
• DBpedia 3.3 log files
  – 1,700,000 requests from two randomly chosen days (07/2009)
  – analysis against a mirror of the 3.3 dataset (inconsistent dataset)
  – performance issues of dynamic network visualization and reprocessing of queries → limited number of analyzed logs
Starting Point for Visual Analysis
Resource Analysis
Predicate Analysis
Access Time and Hosts Analysis
    All hosts        Specific host
Hosts and Primitives Analysis
           Specific host
Inconsistencies & Weaknesses

Inconsistent data:
• ns:Band ns:instrument ?x
• ns:Band ns:genre ?y
• ns:Band ns:associatedBand ?z

Missing facts:
• ns:Band ns:knownFor ?x
• ns:Band ns:nationality ?y
• …
Complete analysis can be found at http://page.mi.fu-berlin.de/mluczak/pub/visual-analysis-of-web-of-data-usage-dbpedia33/
What to learn from usage analysis?
• ontology maintenance
  – schema evolution
  – instance population
  – ontology modularization
  – error detection




                              Image source http://mrg.bz/GgaxPB
What else to learn?
• performance scaling
  – index generation
  – store architecture based on frequent SPARQL
    patterns
  – hardware scaling at peak times
  – modularization of data for different hosts
This is ok for the beginning but…




… SONIVIS can do more
→ evaluate (with users!) various network visualizations and find the best one for specific context
More for the Future

• Generic patterns for the metrics
   + resolution/evolution patterns
• Common sense of statistics
   + Quality-of-dataset index
• Temporal analysis
• Network metrics (degree,…)
• Visualize the effects of change

Central conclusion: Calculate statistics, weaknesses and inconsistencies first and do visual editing afterwards!

Image source: http://mrg.bz/8Co9lA
Take away
• usage-dependent life cycle support for LOD vocabularies and the populated instances
• (visual) usage analysis can help to plan and perform maintenance activities
• this is a benefit for the dataset publisher and the Web of Data as a whole

Markus Luczak-Rösch (luczak@inf.fu-berlin.de)
Freie Universität Berlin, Networked Information Systems (www.ag-nbi.de)
Image source: http://mrg.bz/jlObbL



Editor's notes

  1. This is not an approach for all kinds of domains, but within LOD we find characteristic ontologies and vocabularies. Dataset hosts do not necessarily know the requirements of the dataset users.
  2. Roughly 25 per cent of all datasets were covered by the survey. That relates to the absolute number of datasets and not the amount of triples served. Some of the bigger ones replied, such as DBpedia and Bio2RDF.