MediaX (Jan 2013) -- PKP XML Parsing


Alex Garnett

What do we want? XML Publishing!
• When do we want it? 2004 would’ve been
nice…

• We’ve known the value of properly marked up
documents for a few decades now
– Unfortunately, this entails hours of marking.

• Open-source publishers on limited budgets can’t
afford the outsourcing or the grad students that
normally make this possible

The Public Knowledge Project
• Developers of Open Journal Systems &
Open Monograph Press
– Open source software to
support open access
publishing.
– http://pkp.sfu.ca

• Our userbase happens to include many such
small publishers, who publish almost exclusively
in PDF, given its ease.

Nice things that PDF doesn’t have
• Well-structured text mining & indexing
• Rendering in different formats (e.g. mobile)
• Embedded dynamic content
• Citation parsing and lookup
• Reliable metadata

• So why are we still using it, again?

XML Publishing Workflows
• Are complex and underdocumented, requiring lots of manual labour, since no author will ever write in XML, and only a small fraction will use Markdown or LaTeX or some other text format that’s easy to transform, and most automated parsing tools are in deplorable condition anyhow, rant rant rant, despite the fact that there are many very good piecemeal tools available at different stages of these workflows.
• We put some of them together.

Toolchain

• External Services:
– LibreOffice – document conversion
– pdfx – fuzzy parsing
– ParsCit – fuzzy citation parsing
– citeproc/CSL – citation transformation
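
A minimal sketch of how the stages listed above might be chained, offered as an illustration rather than as PKP's actual code. The stage order is an assumption (the slide lists the tools but not their sequence), and the function names are hypothetical. The only external interface used is LibreOffice's headless conversion flags (`--headless`, `--convert-to`, `--outdir`); the sketch assumes a `soffice` binary on the PATH and only builds the command, it does not run it.

```python
from pathlib import Path
from typing import List

# Tools named on the Toolchain slide, paired with the role given there.
# The ordering is an assumption made for this sketch.
STAGES = [
    ("LibreOffice", "document conversion"),
    ("pdfx", "fuzzy parsing"),
    ("ParsCit", "fuzzy citation parsing"),
    ("citeproc/CSL", "citation transformation"),
]


def libreoffice_convert_command(source: Path, out_dir: Path,
                                target: str = "pdf") -> List[str]:
    """Build (but do not execute) a headless LibreOffice conversion command.

    Assumes the LibreOffice binary is reachable as `soffice`.
    """
    return [
        "soffice", "--headless",
        "--convert-to", target,
        "--outdir", str(out_dir),
        str(source),
    ]


def describe_pipeline() -> str:
    """Return a one-line summary of the assumed stage order."""
    return " -> ".join(f"{name} ({role})" for name, role in STAGES)


if __name__ == "__main__":
    print(describe_pipeline())
    print(" ".join(libreoffice_convert_command(Path("article.docx"), Path("out"))))
```

The command list can be handed to `subprocess.run` once LibreOffice is installed; the later stages are left as names only, since their interfaces are not described in the slides.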

Future Work
• After incorporating upstream changes from pdfx
(fixing punctuation & non-English languages)
we’re aiming to have an OJS plugin by March.
• OMP will follow soon after.

• By the end of our initial funding period in June,
we’ll have a source release (without pdfx) and
plan to be supporting a set of OJS/OMP users.

Future Work not done by us
• Collaborators at Heidelberg University are
working on a WYSIWYG in-browser XML
editor for manually revising article formatting.

• The University of Michigan’s mPach system will
add ePub generation and HathiTrust ingest.

• CrossRef will be contributing functionality to
look up, verify, and link parsed citations.

Thanks
• Damion Dooley, our primary developer
• Steve Pettifer and the University of Manchester
for allowing us to use pdfx
• Juan Alperin and the rest of the PKP team for
their support and earlier work
• Alf Eaton from the NLM for stylesheets
• MediaX for funding this project

Questions?
• If you want to use our service for document
preparation right now, contact me (Alex) at
axfelix@gmail.com.

• We’ll have a stable version available by the end
of January (probably free with registration)

• OJS/OMP integration and standalone release
(without pdfx) coming soon!


Left to Their Own Devices: Automating XML Parsing and Rendering for Scholarly Publishing
Alex Garnett & John Willinsky, Public Knowledge Project


Editor's notes

(5 minute demo happens here)
