Commercially empowered Linked Open Data
Ecosystems in Research
           Towards unfolding today's and tomorrow's
           scientific treasures

           Michael Granitzer
           University of Passau




           FP7 STREP No. 296150
nani gigantum humeris insidentes
   Standing on the shoulders of giants
     – Research builds on the past
     – We pass on knowledge, to create
       new knowledge




     Root of (Western) Society




Lying under a pile of text documents
   .. with varying quality
   .. with contradicting facts
   .. with missing data
   .. labour intensive to compare results
   Some examples
    – “Improvements That Don’t Add Up”
       Armstrong et al., 2009

    – “Why Most Published Research Findings Are False”
       Ioannidis, 2005




       Can we do better?


Yes, we (think) we can...
   Make facts and figures explicit, discoverable and comparable

   Given textually enCODED scientific knowledge, we can
    –   Extract facts from research papers
    –   Integrate those facts with existing knowledge
    –   Make it available for (visual) analysis
    –   Crowdsource


   Focus on
    – Empirical observations/facts
    – Linked Open Data
    – Computer Science and Biomedical Domain



That's nice, but how?

   Pipeline: Extract & Integrate → Aggregate → Analyse & Organise → Share & Commercialise
     – Extract & Integrate: text, Linked Data, experiments
     – Aggregate: Linked Scientific Fact Data Warehouse
     – Analyse & Organise: visual analytics and collaborative mind-mapping
     – Share & Commercialise: crowdsourcing and marketplace

   [Figure: dependency and frequency analysis. A graph of dependencies among Algorithm, Machine Learning, CRF, SVM, Biomedical and Data Set 1, and a pivot chart (axis 0 to 20) over Algorithms, Domain and Experiment with the entries CRF, SVM, DataSet1, DataSet2 and Biomedical.]
Extract & Integrate: Approach and Challenges
   Extracting Structural Elements
     – Tables
     – Figures
     – Sections and sub-sections
   Extracting Facts from Structural Elements
     – Entity extraction (e.g. algorithms, data sets, genes, significance levels etc.)
     – Fact extraction – <Entity, Relation, Measure>
     – Table Triplification
   Crowdsourcing Extraction
     – Extraction quality and domain knowledge remain key issues
         • Empower users to maintain their own extraction model
         • Allow users to semantically annotate research papers (e.g. entities, facts)


   Result: Semantically annotated scientific data as LOD Endpoint
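
The fact-extraction step above can be illustrated with a minimal sketch of table triplification, assuming a simple table layout in which the first column names the entity and every other column holds a numeric measure. The function name `triplify_table`, the `Fact` type and the sample table are hypothetical and are not part of the CODE project's tooling.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    entity: str     # e.g. an algorithm named in a table row
    relation: str   # e.g. the column header (metric and data set)
    measure: float  # the numeric cell value


def triplify_table(header: list[str], rows: list[list[str]]) -> list[Fact]:
    """Turn each numeric cell into one <Entity, Relation, Measure> fact.

    Assumes the first column names the entity and the remaining columns
    hold numbers; cells that do not parse as numbers are skipped.
    """
    facts = []
    for row in rows:
        entity = row[0]
        for column, cell in zip(header[1:], row[1:]):
            try:
                facts.append(Fact(entity, column, float(cell)))
            except ValueError:
                continue  # missing or non-numeric cell, e.g. "-"
    return facts


# Hypothetical results table, not taken from any real paper.
header = ["Algorithm", "F1 on DataSet1", "F1 on DataSet2"]
rows = [["CRF", "0.81", "0.77"], ["SVM", "0.79", "-"]]
facts = triplify_table(header, rows)
```

Real tables in papers have merged cells, units in headers and footnote markers, which is why the slide lists extraction quality and domain knowledge as key issues.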


Extract & Integrate: Example
   [Figure: annotated example document highlighting numerical facts, dimensions/entities, in-document context and ranking facts]
Extract & Integrate: Current Status
   TeamBeam: PDF structure extraction
     – Structural elements
     – Focusing now on tables
   Entity extraction in progress
   First prototypes for Table2RDFDataCube

   TeamBeam — Meta-Data Extraction from Scientific Literature
   By Roman Kern, Graz University of Technology; Kris Jack and Maya Hristakeva, Mendeley Ltd.; Michael Granitzer, University of Passau
Aggregate: Approach and Challenges
   Representation and Storage
     – Representation using the RDF Data Cube Vocabulary
         • Dimensions (e.g. Algorithms, Genes)
         • Measures (e.g. 0.3, 37) and Attributes (e.g. %, °)
     – Challenge 1: Ensure independence of dimensions
     – Challenge 2: Decentralized querying and aggregation
     – Reference: http://www.w3.org/TR/vocab-data-cube/#ref_qb_measureType
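
As a rough illustration of the representation above, the sketch below writes one extracted fact as an observation in the RDF Data Cube Vocabulary, serialised as N-Triples lines. The `qb:` terms (`qb:Observation`, `qb:dataSet`) come from the vocabulary itself; the `http://example.org/code/` namespace, the property IRIs under it and the function name `observation_triples` are assumptions made only for this example.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/code/"  # hypothetical namespace for this sketch


def observation_triples(obs_id, dataset, dimensions, measure, value, unit):
    """Return N-Triples lines describing one qb:Observation.

    dimensions maps a dimension name to the member it takes in this
    observation, e.g. {"algorithm": "CRF"}.
    """
    subject = f"<{EX}obs/{obs_id}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{QB}Observation> .",
        f"{subject} <{QB}dataSet> <{EX}dataset/{dataset}> .",
    ]
    # One triple per dimension (e.g. algorithm, corpus).
    for name, member in dimensions.items():
        lines.append(f"{subject} <{EX}dimension/{name}> <{EX}{name}/{member}> .")
    # The measure carries the number; the attribute carries its unit.
    lines.append(f'{subject} <{EX}measure/{measure}> "{value}"^^<{XSD_DECIMAL}> .')
    lines.append(f"{subject} <{EX}attribute/unit> <{EX}unit/{unit}> .")
    return lines


triples = observation_triples(
    "1", "benchmarks", {"algorithm": "CRF", "corpus": "DataSet1"},
    "f1", "81.0", "percent",
)
```

A complete cube would also declare a data structure definition that lists these dimension, measure and attribute properties; that part is omitted here for brevity.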




   SPARQL Data Warehousing Wizard
     – Provide a simple and intuitive wizard for creating aggregation queries
         • Google-like starting point
         • Pivot table creation similar to spreadsheets
     – Store using RDF Data Cube Vocabulary

   Goal: a Linked Scientific Fact Data Warehouse for non-IT experts
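
To make the wizard idea concrete, here is a minimal sketch of how a user's pivot-style choices (one dimension, one measure, one aggregate) could be turned into a SPARQL 1.1 aggregation query over Data Cube observations. The function name `build_aggregation_query` and the example IRIs are hypothetical; the slides do not describe how the actual wizard generates its queries.

```python
def build_aggregation_query(dimension_iri, measure_iri, aggregate="AVG"):
    """Build a SPARQL query that groups observations by one dimension.

    Only a fixed set of aggregate names is accepted, so that arbitrary
    text cannot be spliced into the query.
    """
    allowed = {"AVG", "SUM", "MIN", "MAX", "COUNT"}
    if aggregate not in allowed:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?member ({aggregate}(?value) AS ?result)
WHERE {{
  ?obs a qb:Observation ;
       <{dimension_iri}> ?member ;
       <{measure_iri}> ?value .
}}
GROUP BY ?member
ORDER BY ?member"""


# Hypothetical property IRIs, matching no real endpoint.
query = build_aggregation_query(
    "http://example.org/code/dimension/algorithm",
    "http://example.org/code/measure/f1",
)
```

The resulting query returns one row per dimension member (for example, one per algorithm), which corresponds to a single row header in a spreadsheet pivot table.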


Aggregate: Current Status
   Representation and Storage
     – Data Model implemented
     – Triplification of benchmarking data (e.g. CLEF, TPC-H)
     – We are looking for data

   SPARQL Data Warehousing Wizard




Analyse: Approach and Challenges
   Visual Analytics for Linked Scientific Facts
     – RDF based description of visualisations
         • Glue between data and single visualisations
         • Make visualisation state explicit
         • Share visualisation state

     – HTML 5 based visualisations and visualisation wizard




Share: Approach and Challenges
   Provenance
     – Who published data?
     – Who modified data?

   Share aggregated data sets and annotation models
     – Build on insights created by others
     – Re-use text annotation models

   Share visual analytics applications
     – Simple visualisations might be misleading
     – Sharing whole states of a visual analysis will reveal
       more details on certain decisions
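
The two provenance questions above (who published, who modified) can be sketched with a small record per data set version. Using the W3C PROV-O terms `prov:wasAttributedTo` and `prov:wasRevisionOf` is an assumption of this example, not something the slides specify; the class `DatasetVersion`, the helper functions and all IRIs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

PROV = "http://www.w3.org/ns/prov#"


@dataclass(frozen=True)
class DatasetVersion:
    iri: str                           # this version of the data set
    agent: str                         # who created this version
    revision_of: Optional[str] = None  # earlier version, if any


def provenance_triples(version):
    """Return N-Triples lines recording who made a version and from what."""
    lines = [f"<{version.iri}> <{PROV}wasAttributedTo> <{version.agent}> ."]
    if version.revision_of is not None:
        lines.append(
            f"<{version.iri}> <{PROV}wasRevisionOf> <{version.revision_of}> ."
        )
    return lines


def publishers_and_modifiers(versions):
    """Split agents into original publishers and later modifiers."""
    published_by = [v.agent for v in versions if v.revision_of is None]
    modified_by = [v.agent for v in versions if v.revision_of is not None]
    return published_by, modified_by


# Hypothetical history of one shared data set.
v1 = DatasetVersion("http://example.org/data/v1", "http://example.org/people/alice")
v2 = DatasetVersion(
    "http://example.org/data/v2", "http://example.org/people/bob",
    revision_of="http://example.org/data/v1",
)
published_by, modified_by = publishers_and_modifiers([v1, v2])
```

Keeping the revision link on every version lets a consumer walk back from a shared aggregate to the originally published data.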




Why should YOU do it?




Marketplace concept for research data
   Users (= researchers) will be enabled to “sell” their analysis results
    (or give them away for free)
   Several concepts to be investigated: revenue chains, roles, models
    (donations, paid subscription for data feeds, purchase etc.)
   Increased opportunities for researchers and research data
   [Figure: diagram with the labels integrate, crowdsource, extract, organise and visualise]


 Find us, join us, ask us, help us
         http://code-research.eu/
http://www.facebook.com/CODEresearchEU
           #CODEresearchEU

I-Know presentation: CODE - Commercially empowered Linked Open Data Ecosystems in Research
