CORE: Aggregating and Enriching
Content to Support Open Access
            Petr Knoth
        The Open University




Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




[Figure: Growth of items in Open Access repositories]
[Figure: Growth of Open Access repositories]
[Figure: Growth of articles in OA journals]
[Figure: Growth of OA journals]
[Figure: Green Open Access - statistics]

Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and
services, and generate new knowledge from repository content.”
                                                   [COAR manifesto]

Access to information according to the level of abstraction

[Figure: Repositories → Metadata Transfer Interoperability → Metadata and Content → Aggregation and Semantic Enrichment → OLTP and OLAP → Interfaces → three access levels: analytical information access, transaction information access, raw data access]

Who should be supported by aggregations?

• The following user groups (divided according to the level of
  abstraction of information they need):
   •   Raw data access. Developers, digital libraries (DLs), DL researchers, companies …
   •   Transaction information access. Researchers, students, life-long learners …
   •   Analytical information access. Funders, government, business intelligence …

Layers of an aggregation system

[Figure: layer stack, from top to bottom: Interfaces; OLTP and OLAP; Enrichment; Metadata and Content; Metadata Transfer Interoperability. Example labels shown in the figure: APIs (REST, SOAP, XML-RPC), UIs, dashboards; statistics; catalog records; annotations; OAI-PMH, OAI-ORE …; Dublin Core, XML, RDF …; PDF, Word …]

Related systems

Aggregation projects – BASE
Aggregation projects – OAIster/WorldCat
Aggregation projects – RepUK

[Figure, one per project: the layered aggregation diagram (Repositories → Metadata Transfer Interoperability → Metadata and Content → Enrichment → OLTP and OLAP → Interfaces → analytical, transaction and raw data access)]

Aggregations need access to content, not just metadata!

• Certain metadata types can be created only at the level of the
  aggregation
• Certain metadata can change over time
• Ensuring content:
   • accessibility
   • availability
   • validity
   • quality
   • …

Aggregation projects – CiteSeerX

[Figure: the layered aggregation diagram (Repositories → Metadata Transfer Interoperability → Metadata and Content → Enrichment → OLTP and OLAP → Interfaces → analytical, transaction and raw data access)]

Should an aggregation system support all three user types?

Support for all three can be realised by more than one system,
provided that the dataset is the same!

Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




CORE objectives
• CORE aims to provide a comprehensive technical infrastructure
  for Open Access scholarly publications that will support access
  and reuse of scholarly materials at different levels of abstraction.
• A nation-wide aggregation system that will improve the discovery
  of publications stored in British Open Access Repositories (OARs).




What does CORE provide at different aggregation levels?

[Figure: the layered aggregation diagram (Repositories → Metadata Transfer Interoperability → Metadata and Content → Enrichment → OLTP and OLAP → Interfaces → analytical, transaction and raw data access)]

CORE functionality
Step 1: Metadata and full-text harvesting

[Figure: content harvesting and processing]

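The harvesting step can be illustrated with a minimal sketch. It assumes metadata is exposed over OAI-PMH in Dublin Core (`oai_dc`), the protocol and format named on the layers slide; the slides do not describe CORE's actual harvester. The function names, the sample response and the `repo.example.org` URLs are hypothetical, and treating a `dc:identifier` that ends in `.pdf` as a full-text link is an assumption made only for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for Dublin Core metadata."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifiers = [e.text for e in rec.iterfind(".//dc:identifier", NS) if e.text]
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            # Assumption: a dc:identifier ending in .pdf points at the full text.
            "fulltext_urls": [i for i in identifiers if i.lower().endswith(".pdf")],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical response from a repository, shortened to a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>http://repo.example.org/1</dc:identifier>
          <dc:identifier>http://repo.example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://repo.example.org/oai", token)
```

A real harvester would fetch `next_url`, repeat until no resumption token is returned, and then download and process the full-text files it found.
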
What does CORE provide at different aggregation levels?

[Figure: the layered aggregation diagram, annotated with: semantic similarity, citation extraction, classification, …]

CORE functionality
Step 2: Semantic enrichment

[Figure: semantic enrichment]

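One enrichment task listed in the deck is semantic similarity between publications. The slides do not say which method CORE uses, so the sketch below substitutes a common baseline, cosine similarity over TF-IDF vectors of the full text. All function names and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document id to a sparse TF-IDF vector (term -> weight)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    n_docs = len(docs)
    vectors = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        vectors[doc_id] = {
            # Smoothed IDF keeps weights positive even for terms in every document.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(doc_id, vectors):
    """Return the other documents ranked by similarity to doc_id, best first."""
    scores = [(other, cosine(vectors[doc_id], vec))
              for other, vec in vectors.items() if other != doc_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical full texts, shortened to one sentence each.
docs = {
    "a": "open access repositories aggregate research papers",
    "b": "aggregating research papers from open access repositories",
    "c": "citation networks of physics authors",
}
vectors = tfidf_vectors(docs)
ranked = most_similar("a", vectors)
```

Here document "b" ranks first for "a" because the two share most of their terms, while "c" shares none. This kind of ranking needs the full text, which is one reason an aggregation must harvest content and not only metadata.
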
CORE functionality
Step 3: Providing a set of services on top of the aggregation

[Figure: providing services]

CORE applications

 •   CORE Portal
 •   CORE Mobile
 •   CORE Plugin
 •   CORE API
 •   Repository Analytics




CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories.
CORE Mobile – Allows searching and navigating scientific publications
aggregated from Open Access repositories.
CORE Plugin – A plugin for other systems that provides recommendations for
related items.

CORE API – Enables external systems and services to interact with the
CORE repository.

Repository Analytics – An analytical tool supporting providers of
open access content (in particular repository managers).

What does CORE provide at different aggregation levels?

   •   Analytical information access: Repository Analytics
   •   Transaction information access: CORE Portal, CORE Mobile, CORE Plugin
   •   Raw data access: CORE API

CORE statistics
• Content
   • 5.4M records
   • 192 repositories
   • 402k full-texts
• Started: February 2011
• Budget: £140k

Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)

Partners




Advisory Board



Objective


Software for exploration and analysis of very large and
fast-growing amounts of research publications stored
across Open Access Repositories (OAR).




DiggiCORE networks




Three networks: (a) semantically related papers,
(b) citation network, (c) author citation network
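
The relationship between networks (b) and (c) can be sketched as a projection: an author citation network is derivable from a paper citation network plus authorship data. This is a minimal illustration under stated assumptions, not DiggiCORE's implementation. The function name, the input shapes, the sample papers and authors, and the choice to leave out author self-citations are all hypothetical.

```python
from collections import Counter


def author_citation_network(citations, authors):
    """Project a paper citation network onto authors.

    citations: iterable of (citing_paper, cited_paper) pairs.
    authors:   dict mapping a paper id to its list of author names.
    Returns a Counter mapping (citing_author, cited_author) to the number of
    paper-level citations behind that edge.
    """
    edges = Counter()
    for citing, cited in citations:
        for a in authors.get(citing, []):
            for b in authors.get(cited, []):
                if a != b:  # assumption: author self-citations are left out
                    edges[(a, b)] += 1
    return edges


# Hypothetical data: three papers and their authors.
citations = [("p1", "p2"), ("p3", "p2"), ("p3", "p1")]
authors = {"p1": ["Ada"], "p2": ["Bob", "Cy"], "p3": ["Ada", "Dee"]}
edges = author_citation_network(citations, authors)
```

The edge weights count how many paper-level citations link two authors, which is one simple way to weight such a network.
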


DiggiCORE objectives

Allow researchers to use this platform to analyse
publications.
Why?
•   To identify patterns in the behaviour of research
    communities
•   To detect trends in research disciplines
•   To gain new insights into the citation behaviour of researchers
•   To discover features that distinguish papers with high impact

Questions the system can help answer
•   What are the attributes of high-impact publications?
•   Do these attributes differ in the humanities, social sciences and
    computer sciences?
•   What are the features of research groups within disciplines and
    how do these features relate to contributions generated by the
    group?
•   What are the attributes of high-impact authors and what is their
    role within the group?
•   What are the dynamics of successful research groups?
•   What is the mechanism of cross-fertilisation across
    disciplines, especially between the humanities and the
    sciences?
•   Who are the authors whose work is worth monitoring because
    they contribute to the achievements of their own discipline and
    also inspire other disciplines?
•   How should a novice in a discipline get acquainted with its key
    achievements?
•   How should they search for the most important publications?

Summary
•   The rapid growth of OA content provides both an opportunity and
    a challenge.
•   Aggregations should serve the needs of different user groups.
•   Aggregations need to aggregate content, not just metadata.
•   Many services can be part of the infrastructure, but they should
    all work with the same data.

Thank you!




Yes we can!

Semantometrics: Towards Fulltext-based Research Evaluationpetrknoth
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...petrknoth
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?petrknoth
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)petrknoth
 

CORE: Aggregating and Enriching Content to Support Open Access

  • 1. CORE: Aggregating and Enriching Content to Support Open Access. Petr Knoth, The Open University
  • 2. Outline: 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE)
  • 3. Outline: 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE)
  • 4. Growth of items in Open Access repositories
  • 5. Growth of Open Access repositories
  • 6. Growth of articles in OA journals
  • 7. Growth of OA journals
  • 8. Green Open Access - statistics
  • 9. Why do we need aggregations? “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” [COAR manifesto]
  • 10. Access to information according to the level of abstraction [Diagram: repositories, the aggregation layers (Metadata Transfer Interoperability, Metadata, Content, Semantic Enrichment, OLTP, OLAP, Interfaces), and three access levels: raw data access, transaction information access, analytical information access]
  • 11. Who should be supported by aggregations? The following user groups (divided according to the level of abstraction of the information they need): • Raw data access. • Transaction information access. • Analytical information access.
  • 12. Who should be supported by aggregations? The following user groups (divided according to the level of abstraction of the information they need): • Raw data access: developers, DLs, DL researchers, companies … • Transaction information access: researchers, students, life-long learners … • Analytical information access: funders, government, business intelligence …
  • 13. Layers of an aggregation system: Interfaces, OLTP, OLAP, Enrichment, Metadata, Content, Metadata Transfer Interoperability
  • 14. Layers of an aggregation system [Diagram: the layers Interfaces, OLTP, OLAP, Enrichment, Metadata, Content and Metadata Transfer Interoperability, annotated with examples: APIs (REST, SOAP, XML-RPC), UIs, dashboards; statistics; catalog records; annotations; OAI-PMH, OAI-ORE …; Dublin Core, XML, RDF …; PDF, Word …]
  • 15. Access to information according to the level of abstraction [Diagram: aggregation layer model, as on slide 10]
  • 17. Aggregation projects – BASE [Diagram: aggregation layer model, as on slide 10]
  • 18. Aggregation projects – OAIster/WorldCat [Diagram: aggregation layer model, as on slide 10]
  • 19. Aggregation projects – RepUK [Diagram: aggregation layer model, as on slide 10]
  • 20. Aggregations need access to content, not just metadata! • Certain metadata types can be created only at the level of the aggregation • Certain metadata can change over time • Ensuring content: • accessibility • availability • validity • quality • …
  • 21. Aggregation projects – CiteSeerX [Diagram: aggregation layer model, as on slide 10]
  • 22. Should an aggregation system support all three user types? This can be realised by more than one system, provided that the dataset is the same!
  • 23. Outline: 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE)
  • 24. CORE objectives • CORE aims to provide a comprehensive technical infrastructure for Open Access scholarly publications that will support access and reuse of scholarly materials at different levels of abstraction. • A nation-wide aggregation system that will improve the discovery of publications stored in British Open Access Repositories (OARs).
  • 25. What does CORE provide at different aggregation levels? [Diagram: aggregation layer model, as on slide 10]
  • 27. CORE functionality. Step 1: Metadata and full-text harvesting (content harvesting, processing)
  • 28. What does CORE provide at different aggregation levels? Semantic similarity, citation extraction, classification, … [Diagram: aggregation layer model, as on slide 10]
  • 29. CORE functionality. Step 2: Semantic enrichment
  • 30. What does CORE provide at different aggregation levels? [Diagram: aggregation layer model, as on slide 10]
  • 31. CORE functionality. Step 3: Providing a set of services on top of the aggregation
  • 32. CORE applications • CORE Portal • CORE Mobile • CORE Plugin • CORE API • Repository Analytics
  • 33. What does CORE provide at different aggregation levels? [Diagram: aggregation layer model, as on slide 10]
  • 34. CORE Applications: CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories
  • 35. CORE Applications: CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories
  • 36. CORE Applications: CORE Plugin – A plugin to a system that provides recommendations for related items.
  • 37. What does CORE provide at different aggregation levels? [Diagram: aggregation layer model, as on slide 10]
  • 38. CORE Applications: CORE API – Enables external systems and services to interact with the CORE repository.
  • 39. What does CORE provide at different aggregation levels? [Diagram: aggregation layer model, as on slide 10]
  • 40. CORE Applications: Repository Analytics – An analytical tool supporting providers of open access content (in particular repository managers).
  • 41. What does CORE provide at different aggregation levels? • Analytical information access: Repository Analytics • Transaction information access: CORE Portal, CORE Mobile, CORE Plugin • Raw data access: CORE API [Diagram: aggregation layer model, as on slide 10]
  • 42. CORE statistics • Content • 5.4M records • 192 repositories • 402k full-texts • Started: February 2011 • Budget: £140k
  • 43. Outline: 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE)
  • 45. Objective: Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR).
  • 46. DiggiCORE networks. Three networks: (a) semantically related papers, (b) citation network, (c) author citation network
  • 47. DiggiCORE objectives: Allow researchers to use this platform to analyse publications. Why? • To identify patterns in the behaviour of research communities • To detect trends in research disciplines • To gain new insights into the citation behaviour of researchers • To discover features that distinguish papers with high impact
  • 48. Questions the system can help answer • What are the attributes of high-impact publications? • Do these attributes differ in the humanities, social sciences and computer sciences? • What are the features of research groups within disciplines and how do these features relate to contributions generated by the group? • What are the attributes of high-impact authors and what is their role within the group? • What are the dynamics of successful research groups?
  • 49. Questions the system can help answer • What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences? • Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines? • How should a novice in the discipline get acquainted with its key achievements? • How should he/she search for the most important publications?
  • 50. Summary • The rapid growth of OA content provides both an opportunity and a challenge. • Aggregations should serve the needs of different user groups. • Aggregations need to aggregate content, not just metadata. • We can have many services that are part of the infrastructure, but they should all work with the same data.
  • 51. Thank you! Yes we can!
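
The harvesting step on slide 27 is only named in the deck, not specified. As a minimal sketch of what it could look like, assuming metadata arrives as an OAI-PMH ListRecords response carrying Dublin Core (both listed on slide 14), the Python below extracts one record per entry and flags identifiers that look like full-text links. The sample response, the function name `parse_list_records`, the output field names, and the ".pdf" heuristic are illustrative assumptions, not a description of how CORE is implemented.

```python
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and the Dublin Core element set.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written ListRecords response (not real repository data).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>http://repo.example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A metadata-only record</dc:title>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:identifier>http://repo.example.org/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Turn an OAI-PMH ListRecords response into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Records without a metadata section (e.g. deleted ones) are skipped.
            continue
        links = [e.text for e in dc.findall("dc:identifier", NS) if e.text]
        records.append({
            "oai_id": oai_id,
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [e.text for e in dc.findall("dc:creator", NS) if e.text],
            # Assumption: a link ending in ".pdf" is a candidate full text.
            "fulltext_candidates": [u for u in links if u.lower().endswith(".pdf")],
        })
    return records


records = parse_list_records(SAMPLE_RESPONSE)
for r in records:
    print(r["oai_id"], r["title"], r["fulltext_candidates"])
```

The second sample record has no full-text candidate, which illustrates the point of slide 20: a metadata record alone does not guarantee that the content is accessible to the aggregation.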