Biblio-transformation-engine:
An open source framework and use cases
in the digital libraries domain

       Kostas Stamatis, Nikolaos Konstantinou, Anastasia Manta,
                 Christina Paschou and Nikos Houssos

National Documentation Centre / National Hellenic Research Foundation, Greece




       7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
Agenda
• Introduction
• Motivation - the recurring need for data
  transformations
• The proposed solution
• Use cases / experience reports
• Summary – conclusions – future work



Motivation
• Data transformations are needed everywhere
  in digital libraries / scholarly communication
  systems
• Painful and tedious procedure
• Many sub-tasks of the entire procedure reoccur
  and could be reused
• Need for systematic framework for data
  transformations to accelerate the process,
  reduce errors and facilitate reuse
Analysis – basic steps in data transformations
• Retrieve source data records
• Apply processing:
   – (Optionally) Remove data records
   – (Optionally) Add/modify/delete field values within records
   – Transform data source to output format (implement the
     corresponding mapping)
• Generate desired output
   – Export to a file and/or directly update databases / external
     systems
• Need for incremental / selective data loading:
  processing and output conditions may require repeated
  execution of the loading/processing cycle
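
The steps above can be sketched as a small pipeline. This is a minimal illustration in Python, not the library's Java API; the names `run_pipeline`, `load`, `keep`, `modify` and `write`, and the sample records, are hypothetical.

```python
from typing import Callable, Iterable

Record = dict  # hypothetical stand-in: one record is a plain field -> value mapping


def run_pipeline(
    load: Callable[[], Iterable[Record]],
    keep: Callable[[Record], bool],
    modify: Callable[[Record], Record],
    write: Callable[[list], None],
) -> None:
    """Retrieve records, optionally drop or modify them, then emit output."""
    records = [r for r in load() if keep(r)]   # (optionally) remove records
    records = [modify(r) for r in records]     # add/modify/delete field values
    write(records)                             # export to a file or update a system


# Illustrative data and steps (assumptions, not taken from the slides).
source = [{"title": " Linked data ", "year": "2012"}, {"title": "", "year": "2011"}]
exported = []

run_pipeline(
    load=lambda: source,
    keep=lambda r: bool(r["title"].strip()),            # drop records without a title
    modify=lambda r: {**r, "title": r["title"].strip()},  # trim whitespace
    write=exported.extend,                              # "output" is an in-memory list
)
```

Because each step is passed in as a separate callable, any one of them can be swapped without touching the others.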
Design goals
• Customisable, non-intrusive, easy to use, integrate and
  extend (e.g. support a variety of data source types)
• Separation of concerns in development – e.g.
  development of transformation logic independent of
  data sources
   – Example: No need to be aware of MARC to develop a
     function to harmonise encoding of dates
• Support for recurring execution of the data
  loading/processing cycle according to specific criteria
  (e.g. useful for OAI-PMH)
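
The date example above could look roughly as follows. This is a sketch in Python under stated assumptions: the `Record` protocol, `DictRecord`, and the accepted date patterns are all hypothetical and only show that the function never needs to know the source format.

```python
import re
from typing import Protocol


class Record(Protocol):
    """Hypothetical format-neutral view of a record."""

    def get_values(self, field: str) -> list: ...
    def set_values(self, field: str, values: list) -> None: ...


def harmonise_dates(record: Record, field: str = "date") -> None:
    """Rewrite dates such as 13/07/2012 or 2011.9.26 as YYYY-MM-DD."""
    fixed = []
    for value in record.get_values(field):
        text = value.strip()
        dmy = re.fullmatch(r"(\d{1,2})[/.](\d{1,2})[/.](\d{4})", text)
        ymd = re.fullmatch(r"(\d{4})[/.-](\d{1,2})[/.-](\d{1,2})", text)
        if dmy:
            day, month, year = dmy.groups()
            value = f"{year}-{int(month):02d}-{int(day):02d}"
        elif ymd:
            year, month, day = ymd.groups()
            value = f"{year}-{int(month):02d}-{int(day):02d}"
        fixed.append(value)  # unrecognised values are left untouched
    record.set_values(field, fixed)


class DictRecord:
    """One possible backing store; a MARC-backed class could offer the same methods."""

    def __init__(self, data: dict):
        self.data = data

    def get_values(self, field: str) -> list:
        return list(self.data.get(field, []))

    def set_values(self, field: str, values: list) -> None:
        self.data[field] = values


rec = DictRecord({"date": ["13/07/2012", "2011.9.26", "circa 1900"]})
harmonise_dates(rec)
```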


The biblio transformation engine




Components of the engine
• Data Loader: Retrieves data from input source(s)
  according to DataLoading Spec
• ProcessingStep: Transforms input in some way
  – Filter: removes records according to specific criteria
  – Modifier: updates records according to specific criteria
  – Initializer: initializes data in processing steps (e.g. loads
    author names into a Filter)
• Output Generator: Creates the desired output (e.g.
  export file, direct update of database)
• Record abstraction: simple common interface for
  all types of records that allows complex
  transformation functions
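
One way to picture how these components relate is as a set of small interfaces. The sketch below is in Python and is not the library's actual Java API: every class and method name is an assumption, and the Initializer is simplified to an `initialize` hook on each step.

```python
from abc import ABC, abstractmethod


class Record(dict):
    """Common record abstraction: here simply field name -> list of values."""


class DataLoader(ABC):
    @abstractmethod
    def load(self, spec: dict) -> list:
        """Return the records selected by a loading specification."""


class ProcessingStep(ABC):
    def initialize(self) -> None:
        """Initializer hook, e.g. load a list of author names once."""


class Filter(ProcessingStep):
    @abstractmethod
    def accept(self, record: Record) -> bool:
        """Return False to remove the record."""


class Modifier(ProcessingStep):
    @abstractmethod
    def modify(self, record: Record) -> None:
        """Update the record in place."""


class OutputGenerator(ABC):
    @abstractmethod
    def generate(self, records: list) -> None:
        """Write records to a file, database or external system."""


class KnownAuthorFilter(Filter):
    """Keeps only records with at least one author from a preloaded list."""

    def initialize(self) -> None:
        # In practice this would come from a file or database (assumption).
        self.known = {"Houssos, N.", "Stamatis, K."}

    def accept(self, record: Record) -> bool:
        return any(author in self.known for author in record.get("author", []))


author_filter = KnownAuthorFilter()
author_filter.initialize()
candidates = [Record(author=["Houssos, N."]), Record(author=["Doe, J."])]
kept = [r for r in candidates if author_filter.accept(r)]
```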
Processing workflow
• Load data – transform input to records
• If processing conditions are met, begin
  processing – sequential execution of Filters
  and Modifiers
• If output conditions are met, begin output –
  execution of OutputGenerator(s)
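
The workflow above amounts to a loop that keeps loading until both conditions hold. A minimal Python sketch follows; `run_cycle`, its parameters, the paging scheme, and the choice to generate output once the source is exhausted are assumptions made for illustration.

```python
def run_cycle(load, spec, next_spec, processing_ok, steps, output_ok, generate,
              max_rounds=50):
    """Load, process and collect records until the output condition is met."""
    collected = []
    for _ in range(max_rounds):
        batch = load(spec)
        if not batch:
            break  # nothing left to load
        if processing_ok(batch):
            for step in steps:          # sequential Filters and Modifiers
                batch = step(batch)
            collected.extend(batch)
            if output_ok(collected):
                break
        spec = next_spec(spec)          # adjust the loading spec and go round again
    generate(collected)
    return collected


# Illustrative usage: page through ten records three at a time, keep even ids,
# and stop as soon as three records have been collected.
source = [{"id": i} for i in range(10)]


def load(spec):
    return source[spec["offset"]: spec["offset"] + spec["size"]]


def next_spec(spec):
    return {**spec, "offset": spec["offset"] + spec["size"]}


def keep_even(batch):
    return [r for r in batch if r["id"] % 2 == 0]


out = []
result = run_cycle(load, {"offset": 0, "size": 3}, next_spec,
                   processing_ok=lambda batch: True,
                   steps=[keep_even],
                   output_ok=lambda recs: len(recs) >= 3,
                   generate=out.extend)
```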



Processing workflow
• Load source data
• Processing conditions OK?
   – NO: modify LoadingSpec and load source data again
   – YES: apply Filters & Modifiers
• Output conditions OK?
   – NO: modify LoadingSpec and load source data again
   – YES: generate output

The transformation engine – data model




Implementation
• FLOSS library developed in Java (Maven used
  as a build tool)
• Configuration outside the code - dependency
  injection mechanisms of the Spring framework
  core container
  – Specification of Data Loader, Processing Steps,
    Conditions, OutputGenerator
  – Mapping from source to target format (for
    one-to-one field mappings)
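
A declarative one-to-one mapping can be as simple as a lookup table kept outside the transformation code. The sketch below uses a Python dictionary purely for illustration; in the library this role is played by Spring configuration, whose exact syntax is not reproduced here. `FIELD_MAP` and `apply_mapping` are hypothetical names, and pairing RIS tags with DSpace-style field names is an assumption.

```python
# Hypothetical one-to-one mapping from source field names to target field names.
FIELD_MAP = {
    "TI": "dc.title",
    "AU": "dc.contributor.author",
    "PY": "dc.date.issued",
}


def apply_mapping(source_record: dict, field_map: dict) -> dict:
    """Copy each mapped source field to its target name; drop unmapped fields."""
    return {
        target: source_record[source]
        for source, target in field_map.items()
        if source in source_record
    }


mapped = apply_mapping(
    {"TI": ["An open source framework"], "PY": ["2012"], "N1": ["internal note"]},
    FIELD_MAP,
)
```

Changing the target format then means editing the table, not the code that applies it.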



Example of mapping configuration




FLOSS library
• Available at
  http://code.google.com/p/biblio-transformation-engine/
• European Union Public License
• Feel free to download and use it!
• Looking forward to feedback, questions, …
  (contributions also welcome)



Use case 1 – Generate Linked Open Data
• Sources: Repository records, legacy cultural
  material records, research information in CERIF
• Corresponding data loaders reused
• Filters/Modifiers can be totally agnostic of RDF
  and input formats
• Use Jena RDF library to generate RDF triples
• Add/generate appropriate identifiers/URIs for each
  entity (either at the modifier or output generator
  level)
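
The split between minting identifiers and emitting triples can be illustrated as follows. The use case itself relies on the Jena RDF library in Java; this Python sketch writes N-Triples lines by hand only to show the division of labour. The base URI, the field names and the use of Dublin Core Terms properties are assumptions.

```python
BASE = "http://example.org/resource/"   # hypothetical URI namespace
DCT = "http://purl.org/dc/terms/"       # Dublin Core Terms namespace


def mint_uri(record: dict) -> dict:
    """Modifier-style step: attach a URI derived from the local identifier."""
    return {**record, "uri": BASE + str(record["id"])}


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples requires escaping in literals."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(record: dict) -> list:
    """Output-generator-style step: one N-Triples line per field value."""
    lines = []
    for field in ("title", "creator"):
        for value in record.get(field, []):
            literal = escape_literal(value)
            lines.append(f'<{record["uri"]}> <{DCT}{field}> "{literal}" .')
    return lines


record = {"id": 42, "title": ['The "engine"'], "creator": ["K. Stamatis"]}
triples = to_ntriples(mint_uri(record))
```

Neither function depends on where the record came from, which is what lets the same steps serve repository, cultural and CERIF sources.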

Use case 2 – Import/export data to/from repositories
• Source record formats: EndNote, RIS, BibTeX,
  UNIMARC
• Developed data loaders for each format, reused
  the output generator for DSpace
• Export to different formats and reference styles
  based on repository records
  – Implemented for DSpace
  – Reference styles are produced with the citeproc-js library
    and the Citation Style Language (CSL)
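
A data loader for one of these formats mainly has to turn tagged lines into records. Below is a minimal Python sketch for RIS, assuming the common `TAG  - value` line layout, with `TY` opening and `ER` closing each record. `load_ris` is a hypothetical name, and continuation lines are ignored for brevity.

```python
import re

# Two-character tag, two spaces, hyphen, optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def load_ris(text: str) -> list:
    """Parse RIS text into records: dicts mapping tag -> list of values."""
    records, current = [], None
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # blank or continuation lines are skipped in this sketch
        tag, value = match.groups()
        if tag == "TY":                      # start of a record
            current = {"TY": [value]}
        elif tag == "ER":                    # end of a record
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:
            current.setdefault(tag, []).append(value)
    return records


sample = (
    "TY  - CONF\n"
    "AU  - Stamatis, Kostas\n"
    "AU  - Houssos, Nikos\n"
    "TI  - Biblio-transformation-engine\n"
    "PY  - 2012\n"
    "ER  - \n"
)
records = load_ris(sample)
```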

Use case 3 – Feed the VOA3R aggregator
• Deliver records of the Hellenic National Archive of Doctoral
  Dissertations (HEDI – didaktorika.gr) to the VOA3R
  aggregator (Virtual Open Access Agriculture &
  Aquaculture Repository)
• Developed subject-based filter and injected it into an
  enhanced OAI-PMH server using the library.
• Only ~1070 of approximately 23,500 records are suitable, so
  techniques were needed to cope with the sparse distribution of
  “suitable” records in combination with resumption tokens
• Seamless on-the-fly deployment and co-existence with
  sets targeted to other aggregators (DART,
  openarchives.gr)
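
The slides do not say which techniques were used. One plausible approach, sketched here in Python as an assumption, is to keep scanning the source in batches until a response page is full and to encode the source offset in the resumption token. `list_records`, the page and batch sizes, and the sample data are all hypothetical.

```python
def list_records(source, matches, token=None, page_size=2, batch_size=5):
    """Return (page, next_token); the token is the source offset to resume from."""
    offset = int(token) if token else 0
    page = []
    while len(page) < page_size and offset < len(source):
        batch = source[offset: offset + batch_size]   # one load from the source
        for record in batch:
            offset += 1                               # count every record scanned
            if matches(record):
                page.append(record)
                if len(page) == page_size:
                    break                             # page full, stop mid-batch
    next_token = str(offset) if offset < len(source) else None
    return page, next_token


# Illustrative data: only ids 3, 9 and 15 carry the wanted subject.
source = [{"id": i, "subject": "agriculture" if i % 6 == 3 else "other"}
          for i in range(20)]


def is_agriculture(record):
    return record["subject"] == "agriculture"


page1, token1 = list_records(source, is_agriculture)
page2, token2 = list_records(source, is_agriculture, token1)
```

Tracking the offset in the source rather than in the filtered result means a harvester never receives an empty page merely because a stretch of the source held no matching records.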
Use case 4 – Feed Europeana
• Include in Europeana content from the Technical
  Chamber of Greece (TEE)
• Records in TEE library catalog (UNIMARC),
  available through a Z39.50 interface
• Developed Z39.50 data loader, appropriate filters
  and modifiers (independent of UNIMARC)
• Mapping to ESE implemented through a modifier
• ~6800 of the TEE records sent to Europeana
• Repeatable, automated procedure through an
  enhanced OAI-PMH server using the library

    International Conference on Theory and Practice of Digital Libraries (TPDL 2011), Berlin, 26-28 September 2011
Future work
• Support more types of data transformations
  (contributions welcome)
• Extend declarative specification of mappings to
  cover more sophisticated cases
• Configurable support for common data model
  to facilitate reuse of Filter and Modifier
  implementations
• Systematically study the user experience,
  identify and implement potential improvements
Thank you for your attention!
• More info:

http://code.google.com/p/biblio-transformation-engine/

                                   kstamatis AT ekt.gr
                                     nkons AT ekt.gr
                                    amanta AT ekt.gr
                                   cpaschou AT ekt.gr
                                   nhoussos AT ekt.gr


         7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012

Contenu connexe

En vedette

Using Open Technologies to Enable Digital Transformation in the Enterprise
Using Open Technologies to Enable Digital Transformation in the EnterpriseUsing Open Technologies to Enable Digital Transformation in the Enterprise
Using Open Technologies to Enable Digital Transformation in the EnterpriseAppnovation Technologies
 
Transforming Your Company with Open Source
Transforming Your Company with Open SourceTransforming Your Company with Open Source
Transforming Your Company with Open SourceKartik Subbarao
 
The City of LA Migrates to Drupal: A Digital Transformation Story
The City of LA Migrates to Drupal: A Digital Transformation StoryThe City of LA Migrates to Drupal: A Digital Transformation Story
The City of LA Migrates to Drupal: A Digital Transformation StoryAcquia
 
Second Life: 20 Lessons
Second Life: 20 LessonsSecond Life: 20 Lessons
Second Life: 20 Lessonsjeremykemp
 
ILNU Library Orientation 2016 Highlights
ILNU Library Orientation 2016 HighlightsILNU Library Orientation 2016 Highlights
ILNU Library Orientation 2016 HighlightsAtul Bhatt
 
Installation of OpenBiblio on Windows XP using EasyPHP
Installation of OpenBiblio on Windows XP using EasyPHPInstallation of OpenBiblio on Windows XP using EasyPHP
Installation of OpenBiblio on Windows XP using EasyPHPRupesh Kumar
 
Mini-Me: Creating A Digital Presence Online
Mini-Me:  Creating A Digital Presence OnlineMini-Me:  Creating A Digital Presence Online
Mini-Me: Creating A Digital Presence OnlineJan McGee
 
Social Media and Your Library: Strategies to Lead the Way
Social Media and Your Library: Strategies to Lead the WaySocial Media and Your Library: Strategies to Lead the Way
Social Media and Your Library: Strategies to Lead the WaySteven Lastres
 
Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...
Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...
Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...gardensofmeaning
 
Magento as a crucial part of Digital Transformation in B2B Industry.
Magento as a crucial part of Digital Transformation in B2B Industry.Magento as a crucial part of Digital Transformation in B2B Industry.
Magento as a crucial part of Digital Transformation in B2B Industry.Divante
 
Re-Platforming: How to Plan Your Next Multi-Site Digital Platform
Re-Platforming: How to Plan Your Next Multi-Site Digital PlatformRe-Platforming: How to Plan Your Next Multi-Site Digital Platform
Re-Platforming: How to Plan Your Next Multi-Site Digital PlatformJake Borr
 
Web-Vertise Your Classroom and Library
Web-Vertise Your Classroom and LibraryWeb-Vertise Your Classroom and Library
Web-Vertise Your Classroom and LibraryJan McGee
 

En vedette (18)

Using Open Technologies to Enable Digital Transformation in the Enterprise
Using Open Technologies to Enable Digital Transformation in the EnterpriseUsing Open Technologies to Enable Digital Transformation in the Enterprise
Using Open Technologies to Enable Digital Transformation in the Enterprise
 
Transforming Your Company with Open Source
Transforming Your Company with Open SourceTransforming Your Company with Open Source
Transforming Your Company with Open Source
 
The City of LA Migrates to Drupal: A Digital Transformation Story
The City of LA Migrates to Drupal: A Digital Transformation StoryThe City of LA Migrates to Drupal: A Digital Transformation Story
The City of LA Migrates to Drupal: A Digital Transformation Story
 
Second Life: 20 Lessons
Second Life: 20 LessonsSecond Life: 20 Lessons
Second Life: 20 Lessons
 
Social Media Tools for Libraries
Social Media Tools for LibrariesSocial Media Tools for Libraries
Social Media Tools for Libraries
 
ILNU Library Orientation 2016 Highlights
ILNU Library Orientation 2016 HighlightsILNU Library Orientation 2016 Highlights
ILNU Library Orientation 2016 Highlights
 
Installation of OpenBiblio on Windows XP using EasyPHP
Installation of OpenBiblio on Windows XP using EasyPHPInstallation of OpenBiblio on Windows XP using EasyPHP
Installation of OpenBiblio on Windows XP using EasyPHP
 
Mini-Me: Creating A Digital Presence Online
Mini-Me:  Creating A Digital Presence OnlineMini-Me:  Creating A Digital Presence Online
Mini-Me: Creating A Digital Presence Online
 
Social Media and Your Library: Strategies to Lead the Way
Social Media and Your Library: Strategies to Lead the WaySocial Media and Your Library: Strategies to Lead the Way
Social Media and Your Library: Strategies to Lead the Way
 
Open biblio
Open biblioOpen biblio
Open biblio
 
Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...
Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...
Simple Knowledge Organization System (SKOS) in the Context of Semantic Web De...
 
Magento as a crucial part of Digital Transformation in B2B Industry.
Magento as a crucial part of Digital Transformation in B2B Industry.Magento as a crucial part of Digital Transformation in B2B Industry.
Magento as a crucial part of Digital Transformation in B2B Industry.
 
Re-Platforming: How to Plan Your Next Multi-Site Digital Platform
Re-Platforming: How to Plan Your Next Multi-Site Digital PlatformRe-Platforming: How to Plan Your Next Multi-Site Digital Platform
Re-Platforming: How to Plan Your Next Multi-Site Digital Platform
 
Web-Vertise Your Classroom and Library
Web-Vertise Your Classroom and LibraryWeb-Vertise Your Classroom and Library
Web-Vertise Your Classroom and Library
 
Installation of Openbiblio using Easyphp
Installation of Openbiblio using EasyphpInstallation of Openbiblio using Easyphp
Installation of Openbiblio using Easyphp
 
Koha 2.9 Windows Staff Client and Opac
Koha 2.9 Windows Staff Client and OpacKoha 2.9 Windows Staff Client and Opac
Koha 2.9 Windows Staff Client and Opac
 
Open access to scientific databases : a new gateway for information sharing
Open access to scientific databases : a new gateway for information sharingOpen access to scientific databases : a new gateway for information sharing
Open access to scientific databases : a new gateway for information sharing
 
Open source Library Management Systems
Open source Library Management SystemsOpen source Library Management Systems
Open source Library Management Systems
 

Similaire à Biblio-transformation-engine slides in Open Repositories 2012

Intro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensIntro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensStoitsis Giannis
 
OpenAIRE and the Case of Irish Repositories
OpenAIRE and the Case of Irish RepositoriesOpenAIRE and the Case of Irish Repositories
OpenAIRE and the Case of Irish RepositoriesRIANIreland
 
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...OpenAIRE
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...rmacneil88
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014ResearchSpace
 
The European Nucleotide Archive
The European Nucleotide ArchiveThe European Nucleotide Archive
The European Nucleotide ArchiveEBI
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
IWMW 2005: Publish And Be Damned Re-purposing In The Real World
IWMW 2005: Publish And Be Damned Re-purposing In The Real WorldIWMW 2005: Publish And Be Damned Re-purposing In The Real World
IWMW 2005: Publish And Be Damned Re-purposing In The Real WorldIWMW
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
Using oracle12c pluggable databases to archive
Using oracle12c pluggable databases to archiveUsing oracle12c pluggable databases to archive
Using oracle12c pluggable databases to archiveSecure-24
 
ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013Simeon Warner
 
Towards INSPIRE environmental 5* Open Data
Towards INSPIRE environmental 5* Open Data Towards INSPIRE environmental 5* Open Data
Towards INSPIRE environmental 5* Open Data Martin Tuchyna
 
Comsode tools - pushing data to open ecosystem
Comsode tools - pushing data to open ecosystemComsode tools - pushing data to open ecosystem
Comsode tools - pushing data to open ecosystemComsode - FP7 project
 
Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...
Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...
Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...TERN Australia
 
How to expose research data in EOSC
How to expose research data in EOSCHow to expose research data in EOSC
How to expose research data in EOSCEUDAT
 

Similaire à Biblio-transformation-engine slides in Open Repositories 2012 (20)

Intro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-AthensIntro to-technologies-Green-City-Hackathon-Athens
Intro to-technologies-Green-City-Hackathon-Athens
 
OpenAIRE and the Case of Irish Repositories
OpenAIRE and the Case of Irish RepositoriesOpenAIRE and the Case of Irish Repositories
OpenAIRE and the Case of Irish Repositories
 
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
OpenAIRE and the case of Irish Repositories, by Jochen Schirrwagen (RIAN Work...
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
The European Nucleotide Archive
The European Nucleotide ArchiveThe European Nucleotide Archive
The European Nucleotide Archive
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
IWMW 2005: Publish And Be Damned Re-purposing In The Real World
IWMW 2005: Publish And Be Damned Re-purposing In The Real WorldIWMW 2005: Publish And Be Damned Re-purposing In The Real World
IWMW 2005: Publish And Be Damned Re-purposing In The Real World
 
Data stage Online Training
Data stage Online TrainingData stage Online Training
Data stage Online Training
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
Scholze liber 2015-06-25_final
Scholze liber 2015-06-25_finalScholze liber 2015-06-25_final
Scholze liber 2015-06-25_final
 
Using oracle12c pluggable databases to archive
Using oracle12c pluggable databases to archiveUsing oracle12c pluggable databases to archive
Using oracle12c pluggable databases to archive
 
ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013
 
Manchesterjan2015
Manchesterjan2015Manchesterjan2015
Manchesterjan2015
 
Towards INSPIRE environmental 5* Open Data
Towards INSPIRE environmental 5* Open Data Towards INSPIRE environmental 5* Open Data
Towards INSPIRE environmental 5* Open Data
 
Comsode tools - pushing data to open ecosystem
Comsode tools - pushing data to open ecosystemComsode tools - pushing data to open ecosystem
Comsode tools - pushing data to open ecosystem
 
Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...
Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...
Henry&Hobbs, 'Developing long-term agro-ecological trial datasets for C and N...
 
Service Integration to Enhance RDM
Service Integration to Enhance RDMService Integration to Enhance RDM
Service Integration to Enhance RDM
 
How to expose research data in EOSC
How to expose research data in EOSCHow to expose research data in EOSC
How to expose research data in EOSC
 
RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015RDM@Edinburgh_interoperation_IDCC2015
RDM@Edinburgh_interoperation_IDCC2015
 

Dernier

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 

Dernier (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Biblio-transformation-engine slides in Open Repositories 2012

  • 1. Biblio-transformation-engine: An open source framework and use cases in the digital libraries domain
      Kostas Stamatis, Nikolaos Konstantinou, Anastasia Manta, Christina Paschou and Nikos Houssos
      National Documentation Centre / National Hellenic Research Foundation, Greece
      7th International Conference on Open Repositories (OR2012), Edinburgh, 9-13 July 2012
  • 2. Agenda
      • Introduction
      • Motivation - the recurring need for data transformations
      • The proposed solution
      • Use cases / experience reports
      • Summary – conclusions – future work
  • 3. Motivation
      • Data transformations are needed everywhere in digital libraries / scholarly communication systems
      • Painful and tedious procedure
      • Many sub-tasks of the entire procedure reoccur and could be reused
      • Need for systematic framework for data transformations to accelerate the process, reduce errors and facilitate reuse
  • 4. Analysis – basic steps in data transformations
      • Retrieve source data records
      • Apply processing:
          – (Optionally) Remove data records
          – (Optionally) Add/modify/delete field values within records
          – Transform data source to output format (implement the corresponding mapping)
      • Generate desired output
          – Export to a file and/or directly update databases / external systems
      • Need for incremental / selective data loading -> processing and output conditions may require repeated execution of the loading/processing cycle
  • 5. Design goals
      • Customisable, non-intrusive, easy to use, integrate and extend (e.g. support a variety of data source types)
      • Separation of concerns in development – e.g. development of transformation logic independent of data sources
          – Example: No need to be aware of MARC to develop a function to harmonise encoding of dates
      • Support for recurring execution of the data loading/processing cycle according to specific criteria (e.g. useful for OAI-PMH)
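The separation-of-concerns goal on slide 5 can be illustrated with a minimal sketch. All names here (`SimpleRecord`, `MapRecord`, `DateHarmoniser`) are hypothetical and are not the library's API; the sketch also assumes that source dates arrive as day/month/year. The point is only that the date logic touches a generic record interface and never a MARC-specific type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical record abstraction: field names mapped to lists of string values.
interface SimpleRecord {
    List<String> getValues(String field);
    void setValues(String field, List<String> values);
}

// Trivial in-memory record, standing in for a MARC-, RIS- or BibTeX-backed one.
class MapRecord implements SimpleRecord {
    private final Map<String, List<String>> fields = new HashMap<>();

    @Override
    public List<String> getValues(String field) {
        return fields.getOrDefault(field, new ArrayList<>());
    }

    @Override
    public void setValues(String field, List<String> values) {
        fields.put(field, new ArrayList<>(values));
    }
}

// Rewrites dates to ISO 8601 (yyyy-MM-dd) without knowing the source format.
class DateHarmoniser {
    // Assumption: input dates look like d/m/yyyy or d.m.yyyy.
    private static final Pattern DMY =
            Pattern.compile("(\\d{1,2})[/.](\\d{1,2})[/.](\\d{4})");

    static String harmonise(String raw) {
        Matcher m = DMY.matcher(raw.trim());
        if (!m.matches()) {
            return raw; // leave unrecognised values untouched
        }
        return String.format(Locale.ROOT, "%s-%02d-%02d", m.group(3),
                Integer.parseInt(m.group(2)), Integer.parseInt(m.group(1)));
    }

    // Harmonises every value of the given field in place.
    static void apply(SimpleRecord record, String dateField) {
        List<String> out = new ArrayList<>();
        for (String value : record.getValues(dateField)) {
            out.add(harmonise(value));
        }
        record.setValues(dateField, out);
    }
}
```

Because `DateHarmoniser` depends only on `SimpleRecord`, the same class could in principle be applied to records coming from any loader that exposes that interface.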
  • 6. The biblio transformation engine
  • 7. Components of the engine
      • Data Loader: Retrieves data from input source(s) according to the DataLoadingSpec
      • ProcessingStep: Transforms input in some way
          – Filter: removes records according to specific criteria
          – Modifier: updates records according to specific criteria
          – Initializer: initializes data in processing steps (e.g. load author names to Filter)
      • Output Generator: Creates the desired output (e.g. export file, direct update of database)
      • Record abstraction: simple common interface for all types of records that allows complex transformation functions
  • 8. Processing workflow
      • Load data – transform input to records
      • If processing conditions are met, begin processing – sequential execution of Filters and Modifiers
      • If output conditions are met, begin output – execution of OutputGenerator(s)
  • 9. Processing workflow
      [Flowchart nodes: Load source data; Processing conditions OK? (YES/NO); Modify LoadingSpec; Apply Filters & Modifiers; Output conditions OK? (YES/NO); Generate output]
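The load/process/output cycle of slides 8 and 9 could be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's code: `LoadingSpec` here is a hypothetical offset/batch-size pair, records are plain strings, the processing condition is assumed to be "the batch is not empty", and the output condition is assumed to be "enough records collected, or source exhausted".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical loading specification: the window of source records to read next.
class LoadingSpec {
    int offset;
    final int batchSize;

    LoadingSpec(int offset, int batchSize) {
        this.offset = offset;
        this.batchSize = batchSize;
    }
}

class WorkflowSketch {
    // Stand-in for a Data Loader: reads one batch from an in-memory source.
    static List<String> load(List<String> source, LoadingSpec spec) {
        int from = Math.min(spec.offset, source.size());
        int to = Math.min(from + spec.batchSize, source.size());
        return new ArrayList<>(source.subList(from, to));
    }

    // Repeats the loading/processing cycle until at least `wanted` records
    // survive filtering or the source runs out.
    static List<String> run(List<String> source, LoadingSpec spec,
                            Predicate<String> filter,
                            UnaryOperator<String> modifier, int wanted) {
        List<String> output = new ArrayList<>();
        while (true) {
            List<String> batch = load(source, spec);
            boolean exhausted = batch.isEmpty();
            // Processing condition (assumed): there is something to process.
            if (!exhausted) {
                for (String record : batch) {
                    if (filter.test(record)) {              // Filter step
                        output.add(modifier.apply(record)); // Modifier step
                    }
                }
            }
            // Output condition (assumed): enough records, or nothing left to load.
            if (output.size() >= wanted || exhausted) {
                return output; // an Output Generator would write these out
            }
            spec.offset += spec.batchSize; // modify the LoadingSpec, then load again
        }
    }
}
```

With a source of `a1, b2, a3, b4, a5`, a batch size of 2, a filter keeping values that start with `a`, an upper-casing modifier and `wanted = 2`, the loop runs two cycles before the output condition holds.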
  • 10. The transformation engine – data model
  • 11. Implementation
      • FLOSS library developed in Java (Maven used as a build tool)
      • Configuration outside the code - dependency injection mechanisms of the Spring framework core container
          – Specification of Data Loader, Processing Steps, Conditions, OutputGenerator
          – Mapping from source to target format (for one-to-one field mappings)
  • 12. Example of mapping configuration
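The configuration shown on slide 12 is not preserved in this transcript. As a rough idea of what a one-to-one field mapping amounts to, here is a minimal sketch in plain Java. The `FieldMapper` class is hypothetical, and an ordinary `Map` stands in for the Spring-injected configuration the slides describe; the field names in the usage note are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FieldMapper {
    // Copies each mapped source field to its target name.
    // Source fields without a mapping entry are dropped.
    static Map<String, String> map(Map<String, String> source,
                                   Map<String, String> mapping) {
        Map<String, String> target = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            String value = source.get(entry.getKey());
            if (value != null) {
                target.put(entry.getValue(), value);
            }
        }
        return target;
    }
}
```

For example, with a mapping of `title` to `dc.title` and `year` to `dc.date.issued`, a source record holding `title`, `year` and `note` would yield a target record with only the two mapped fields. Keeping the mapping as data rather than code is what allows it to live in configuration.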
  • 13. FLOSS library
      • Available at http://code.google.com/p/biblio-transformation-engine/
      • European Union Public License
      • Feel free to download and use it!
      • Looking forward to feedback, questions,… (contributions also welcome)
  • 14. Use case 1 – Generate Linked Open Data
      • Sources: Repository records, legacy cultural material records, research information in CERIF
      • Corresponding data loaders reused
      • Filters/Modifiers can be totally agnostic of RDF and input formats
      • Use Jena RDF library to generate RDF triples
      • Add/generate appropriate identifiers/URI for each entity (either at the modifier or output generator level)
  • 15. Use case 2 – Import/export data to/from repositories
      • Source record formats: EndNote, RIS, BibTeX, UNIMARC
      • Developed data loaders for each format, reused output generator for DSpace
      • Export to different formats and reference styles based on repository records
          – Implemented for DSpace
          – For reference styles uses the citeproc-js library and the Citation Style Language (CSL)
  • 16. Use case 3 – Feed the VOA3R aggregator
      • Get records of the Hellenic National Archive of Doctoral Dissertations (HEDI – didaktorika.gr) to the VOA3R aggregator (Virtual Open Access Agriculture & Aquaculture Repository)
      • Developed subject-based filter and injected it into an enhanced OAI-PMH server using the library
      • Only ~1,070 of approximately 23,500 records were suitable, so techniques were needed to cope with the sparse distribution of “suitable” records in combination with resumption tokens
      • Seamless on-the-fly deployment and co-existence with sets targeted to other aggregators (DART, openarchives.gr)
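The slides do not say which techniques were used to handle sparse matches together with resumption tokens. One possible approach, sketched here purely as an assumption, is to keep scanning the source until a response page is full and to let the token record the position in the source rather than the number of matches returned. `Page` and `SparsePager` are hypothetical names, and records are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// One page of an OAI-PMH-style list response (hypothetical, simplified).
class Page {
    final List<String> records;
    final Integer resumptionToken; // source offset to resume from; null when finished

    Page(List<String> records, Integer resumptionToken) {
        this.records = records;
        this.resumptionToken = resumptionToken;
    }
}

class SparsePager {
    // Scans from startOffset until pageSize matching records are found
    // or the source is exhausted, however many records are skipped on the way.
    static Page nextPage(List<String> source, int startOffset, int pageSize,
                         Predicate<String> filter) {
        List<String> matches = new ArrayList<>();
        int i = startOffset;
        while (i < source.size() && matches.size() < pageSize) {
            String record = source.get(i++);
            if (filter.test(record)) {
                matches.add(record);
            }
        }
        // The token is the next unread source position, not a count of matches.
        Integer token = i < source.size() ? i : null;
        return new Page(matches, token);
    }
}
```

A harvester would call `nextPage` with offset 0, then repeat with each returned token until it is `null`. With this design the final page can be empty if the remaining source records contain no matches.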
  • 17. Use case 4 – Feed Europeana
      • Include in Europeana content from the Technical Chamber of Greece (TEE)
      • Records in TEE library catalog (UNIMARC), available through a Z39.50 interface
      • Developed Z39.50 data loader, appropriate filters and modifiers (independent of UNIMARC)
      • Mapping to ESE implemented through a modifier
      • ~6,800 of the TEE records sent to Europeana
      • Repeatable, automated procedure through an enhanced OAI-PMH server using the library
      International Conference on Theory and Practice of Digital Libraries (TPDL 2011), Berlin, 26-28 September 2011
  • 18. Future work
      • Support more types of data transformations (contributions welcome)
      • Extend declarative specification of mappings to cover more sophisticated cases
      • Configurable support for common data model to facilitate reuse of Filter and Modifier implementations
      • Systematically study the user experience, identify and implement potential improvements
  • 19. Thank you for your attention!
      • More info: http://code.google.com/p/biblio-transformation-engine/
      kstamatis AT ekt.gr
      nkons AT ekt.gr
      amanta AT ekt.gr
      cpaschou AT ekt.gr
      nhoussos AT ekt.gr