IPTC EXTRA Spring 2018


  1. “Extra” by Jeremy Brooks https://flic.kr/p/4aKH3c
  2. EXTRA and FRANCIS
     Stuart Myles * Associated Press * 24th April 2018
     © 2018 IPTC (www.iptc.org) All rights reserved
     https://flic.kr/p/fBshW3 https://flic.kr/p/atFSAr
  3. Rules-Based Classification
     • Rules better for breaking news than statistical methods
       – You don’t need 50 examples before you can start tagging
       – A rule for a new topic doesn’t require other rules to change
     • More consistent and scalable than hand tagging
     • Easier to explain why rules classify content
       – Machine learning methods can be “black boxes”
       – Easier to precisely explain - and correct - mistakes
  4. EXTRA
     EXTraction Rules Apparatus
     Rules-based classification of text
     Open source software https://iptc.github.io/extra/
     EXTRA was developed by the IPTC
     €50,000 grant from the Digital News Initiative https://www.digitalnewsinitiative.com/fund/
     You can use your own taxonomy, rules and formats
     - Example rules help us drive development of the EXTRA system
     - You can use the example rules to see how to develop your own
     - Rules could apply IPTC Media Topics or any other taxonomy
  5. Development Process
     The EXTRA software was developed by Infalia
     - All software is open source
     Two linguists creating rules in English and German
     - Sample rules to apply IPTC Media Topics
     Example news corpora licensed for EXTRA
     - English from Thomson Reuters
     - German from APA
  6. EXTRA Components
     Elasticsearch Percolator + Custom Code
     - Classification
     - Rule authoring
     - Corpus Testing
     - Schema Management
  7. Classification using Percolator
     • Elasticsearch
       – A sophisticated, open source full-text search engine
       – Lets you query documents stored in an index
     • Elasticsearch Percolator
       – Store queries in an index and match documents to queries
       – Classification uses the percolator to match documents to rules
     • EXTRA Rule Language
       – Rule-writer-friendly language (easier than ES DSL)
       – Access to all ES features, plus custom operators
  8. Schema and Rules Example
     • Two fields, headline and body, with body allowed to be queried by paragraph
       – Schema fields: headline, body, body_paragraph
     • A rule to require that “angela merkel” and “us elections” appear in the same paragraph
         (prox/unit=paragraph/distance=1
           (body adj "angela merkel")
           (body adj "us elections")
         )
  9. FRANCIS*
     Using machine learning to empower rule-based classification of news with semantics.
     • “aboutness” evaluation
       – Given that a story is about a topic, how much is it about it?
     • Rule suggestion
       – Suggest rules based on a pre-tagged corpus
     • Enriched rule operators
       – For example, nested “count” operators
       – Using EXTRA as the foundation
     * St Francis de Sales is the patron saint of writers and journalists
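
The percolator approach from the "Classification using Percolator" slide can be sketched as the JSON request bodies involved. This is a minimal illustration, assuming an Elasticsearch version that provides the `percolator` field type and the `percolate` query. The field names `query` and `body`, the `topic` label, and the sample rule are hypothetical and are not taken from EXTRA. The sketch only builds the bodies and does not contact a cluster.

```python
import json

# Hypothetical index mapping: one percolator field that holds stored rules,
# plus the text field that those rules are written against.
index_mapping = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "body": {"type": "text"},
        }
    }
}

# A stored rule is a document whose "query" field contains a query.
# The "topic" label is an illustrative extra field, not an EXTRA convention.
stored_rule = {
    "topic": "elections",
    "query": {"match_phrase": {"body": "us elections"}},
}


def percolate_request(document: dict) -> dict:
    """Build a search body asking which stored rules match `document`."""
    return {"query": {"percolate": {"field": "query", "document": document}}}


request = percolate_request(
    {"body": "Angela Merkel commented on the US elections."}
)
print(json.dumps(request, indent=2))
```

The direction is reversed compared with ordinary search: the rules are indexed once, and each incoming news item is sent as the `document` to find the rules, and therefore the topics, that match it.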

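The paragraph rule from the "Schema and Rules Example" slide can be read as "both phrases occur within one paragraph". The following plain-Python sketch shows that reading only. It is not how EXTRA or Elasticsearch evaluates the rule, and the function name, the blank-line paragraph splitting, and the case-insensitive whole-word matching are all assumptions made for illustration.

```python
import re


def phrases_in_same_paragraph(body: str, phrases: list[str]) -> bool:
    """Return True if a single paragraph contains every phrase."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", body)
    for paragraph in paragraphs:
        # Normalise whitespace and case so line wraps do not split a phrase.
        text = " ".join(paragraph.lower().split())
        # Word boundaries stop "us elections" matching "previous elections".
        if all(
            re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text)
            for phrase in phrases
        ):
            return True
    return False


rule_phrases = ["angela merkel", "us elections"]

same = "Angela Merkel spoke about the US elections on Tuesday."
apart = "Angela Merkel visited Paris.\n\nThe US elections dominated the news."

print(phrases_in_same_paragraph(same, rule_phrases))   # True
print(phrases_in_same_paragraph(apart, rule_phrases))  # False
```

The second text mentions both phrases, but in different paragraphs, so a paragraph-scoped rule would not tag it even though a whole-document match would.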