Schematron (and other useful tools)
Stuart Myles, smyles@ap.org
An Aside: AP’s Ingestion Pipeline
  • One way we ingest content: we transform ATOM and XHTML into our internal XML (APPL) and NITF.
  • ATOM + XHTML -> XSLT Transform -> APPL + NITF
  • This is greatly simplified, obviously.
Converting from HTML to XML
  <p>The budget was just £100.</p>
  <p>How could it be done for so little money?
  <p>Luckily open source tools were available.</p>
  These are not new problems.</p>
  The solutions were even standardized.<p/>
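For comparison, here is one plausible well-formed rendering of the sample markup above. It was corrected by hand for illustration; the paragraph boundaries are an assumption, and the output of an actual HTML Tidy run may differ in detail.

```xml
<!-- Hand-corrected sketch: every paragraph is opened and closed exactly once -->
<p>The budget was just £100.</p>
<p>How could it be done for so little money?</p>
<p>Luckily open source tools were available.</p>
<p>These are not new problems.</p>
<p>The solutions were even standardized.</p>
```

Once the markup is well-formed like this, XML tooling such as XSLT can process it.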
Hard to enforce rules in the spec
  • “HeadLine - this element must contain the same value as the entry’s <title> element”
  • “summary is required for non-text content items, such as news photos and video. This element is optional for text story content items.”
  • XML structure complies with XSD… but can fail in downstream systems
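The two quoted rules could be sketched in Schematron roughly as follows. This is an illustration under assumptions, not AP's actual schema: the `ex` namespace, the `ex:Characteristics` element and the place where `HeadLine` sits are hypothetical, and the ISO Schematron namespace is assumed.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>
  <!-- Hypothetical namespace standing in for the feed's extension metadata -->
  <sch:ns prefix="ex" uri="http://example.org/ns/content"/>

  <!-- Rule 1: HeadLine must repeat the value of the entry's title -->
  <sch:pattern>
    <sch:rule context="atom:entry//ex:HeadLine">
      <sch:assert test="normalize-space(.) =
                        normalize-space(ancestor::atom:entry[1]/atom:title)">
        HeadLine must contain the same value as the entry's title element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Rule 2: summary is required for non-text items (assumes a
       hypothetical ex:Characteristics/@MediaType marks the item type) -->
  <sch:pattern>
    <sch:rule context="atom:entry[.//ex:Characteristics/@MediaType != 'Text']">
      <sch:assert test="normalize-space(atom:summary) != ''">
        summary is required for non-text content items.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The rules sit in separate patterns so that each is evaluated independently. Within a single pattern, a node is checked only against the first rule whose context matches it.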
Validate and Fix Prior to Ingestion
  • Input: original ATOM + XHTML
  • Tidy fixes sloppy HTML
  • Custom XSLT tidies up XML
  • W3C schema validates structure & syntax
  • Schematron schema validates business rules
  • Output: valid ATOM + XHTML, ready for ingestion
HTML Tidy
  • Fix sloppy HTML
  • HTML -> XHTML
Schematron
  • Fact checker for XML documents
  • Business rules that can’t be expressed in W3C XSD schema
      MediaType="Video"  Format="ANPA1312"
  • Previously, we had to inspect new feeds to catch errors
  • The risk is that feeds are approved but errors appear later
  • (Not to mention that manual checking of XML is tedious)
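The attribute pair shown on the slide can be read as a combination that passes structural validation but makes no sense as content, since ANPA 1312 is a text wire format. Assuming that reading, a co-occurrence check might look like the pattern below. The wildcard context is a placeholder, because the slide does not say which element carries these attributes.

```xml
<!-- Sketch only: place inside an sch:schema; the context is a placeholder -->
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="*[@MediaType = 'Video']">
    <!-- sch:report fires when its test is TRUE (the opposite of sch:assert) -->
    <sch:report test="@Format = 'ANPA1312'">
      MediaType "Video" cannot be combined with Format "ANPA1312".
    </sch:report>
  </sch:rule>
</sch:pattern>
```

Here `sch:report` flags the presence of a bad combination, whereas `sch:assert` would be used to demand a good one.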
Schematron
  • Small, powerful, lightweight fact-checker for XML documents
  • Specify constraints using XPath rules
  • You write the error messages
  • Schematron schema: one-time compile into an XSLT; validation then runs as an XSLT transform
  • Validate: presence or absence of specific content; relationships between elements and attributes
  • Reports: validation reports
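To make "validation as an XSLT transform" concrete, here is a hand-written, much simplified stand-in for what a compiled schema does. It assumes a single rule requiring every `atom:link` under `atom:feed` to have an `href` that starts with `http://`. Real Schematron compilers generate far more elaborate stylesheets and report formats.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:atom="http://www.w3.org/2005/Atom">
  <xsl:output method="text"/>

  <!-- The rule context becomes a template match pattern -->
  <xsl:template match="atom:feed/atom:link">
    <!-- The assert test is negated: emit the message only on failure -->
    <xsl:if test="not(starts-with(@href, 'http://'))">
      <xsl:text>The feed/link/@href must contain an http url&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

  <!-- Suppress ordinary text so the output holds only error messages -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Running this against a feed yields empty output when every link passes and one message line per failing link.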
Anatomy of a Schematron Rule
  • Establish the context of the rule with an XPath expression
  • An XSLT-style test establishes the constraint for each assert
  • You write the error message to be used if the assert fails

  <sch:rule context="atom:feed/atom:link">
    <sch:assert test="starts-with(@href, 'http://')">
      The feed/link/@href must contain an http url
    </sch:assert>
  </sch:rule>
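A rule on its own is not a complete schema. As a minimal sketch, the rule from this slide could be wrapped as follows, assuming the ISO Schematron namespace and the standard Atom namespace (the title text is made up for illustration):

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Feed link checks (illustrative)</sch:title>

  <!-- Bind the atom: prefix used in the XPath expressions below -->
  <sch:ns prefix="atom" uri="http://www.w3.org/2005/Atom"/>

  <sch:pattern>
    <sch:rule context="atom:feed/atom:link">
      <sch:assert test="starts-with(@href, 'http://')">
        The feed/link/@href must contain an http url
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With this schema, a feed-level link such as `<link href="ftp://example.org/feed"/>` would trigger the message. Note that an `https://` URL would fail the test as written too, so the test may need widening if secure links are acceptable.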
DSDL – Pipeline Validation
  • XSD
  • RELAX NG: grammar
  • Schematron: rules
  • NVDL: namespace dispatch
  • DTLL: datatype
  • CREPDL: character repertoire
  • DSRL: Document Semantics Renaming
  • Still under development
  • Declaratively specify a pipeline (using XML, naturally)
  • Similar in concept to Yahoo! Pipes and BizTalk
  • But XML-specific and a W3C standard
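The slide text does not name the pipeline language. The description matches W3C XProc, so the sketch below assumes XProc 1.0. The file names are hypothetical, and the HTML Tidy stage is assumed to have run beforehand, since the input here must already be well-formed XML.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">

  <!-- Custom XSLT tidies up the XML (hypothetical stylesheet name) -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="tidy-up.xsl"/>
    </p:input>
  </p:xslt>

  <!-- W3C schema validates structure and syntax -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="feed.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Schematron schema validates business rules -->
  <p:validate-with-schematron>
    <p:input port="schema">
      <p:document href="business-rules.sch"/>
    </p:input>
  </p:validate-with-schematron>

</p:pipeline>
```

Each step reads the primary output of the step before it, so the document flows through the stages in the order written. Both validation steps fail the pipeline by default when the document is invalid.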
Thanks!
