This paper presents research on converting non-accessible web pages containing mathematical formulae into accessible versions by means of an OCR (Optical Character Recognition) tool. The objective of this research is twofold. First, to establish criteria for evaluating the potential accessibility of mathematical websites, i.e. the feasibility of converting non-accessible (non-MathML) math sites into accessible (MathML) ones. Second, to propose a data model and a mechanism for publishing evaluation results, making them available to the educational community, which may use them as a quality measure when selecting learning material.
Results show that conversion using OCR tools is not viable for math web pages, mainly for two reasons. First, many of these pages are designed to be interactive, which makes a correct conversion difficult, if not almost impossible. Second, formulae (whether images or text) have been written without regard to the standards of mathematical writing; as a consequence, OCR tools do not properly recognize math symbols and expressions. In spite of these results, we think the proposed methodology for creating and publishing evaluation reports may be useful in other accessibility assessment scenarios.
Mathematical Webs Accessibility CSUN 2014
1. Miquel Centelles, Mireia Ribera, Inmaculada Rodríguez
Adaptabit Group – University of Barcelona
CSUN Conference 2014
2. The vision
The problem
Setting the stage
Our point of view
Our solution: MathML, OCR (InftyReader)
and linked data
Results and future work
March 2014 CSUN 2014 - Potential accessibility of mathematical webs 2
3. Teaching methodologies have shifted from content-
based to skills-based learning.
A key task for teachers is the selection of web
resources which serve as reference, extension and
motivation for their students.
This selection is mainly based on content and source
quality, but rarely considers accessibility criteria.
Accessibility criteria will be increasingly important.
4. There are still many non-accessible math websites.
Why?
Interactivity. Formulae in graphics or videos.
Automatic conversion to MathML by authoring tools:
▪ Is not fully reliable
▪ Often produces images
5. Concerning accessibility in maths, several
initiatives have been driven by publishers and
libraries.
6. Concerning the semantic meaning of
mathematical formulae, several initiatives link
MathML with the Semantic Web.
Christoph Lange: a proposal to describe generic
mathematical formulae.
OpenMath: a lightweight ontology to endorse the
meaning of mathematical symbols.
HELM: the pioneer in representing structures of
mathematical knowledge in RDF.
7. To assess the potential accessibility of
non-accessible math websites.
Definition: potential accessibility = the feasibility
of converting them into accessible websites.
To publish assessment results in a
semantically-empowered way
8. How to make a math website accessible?
Converting to MathML
Re-writing formulae
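<mrow>
<mi>a</mi> <mo>&#x2062;</mo>
<msup><mi>x</mi> <mn>2</mn></msup> <mo>+</mo> <mi>b</mi> <mo>&#x2062;</mo>
<mi>x</mi> <mo>+</mo> <mi>c</mi>
</mrow>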
Describing formulae in alternative text
“a times x squared plus b times x plus c”
Converting through OCR
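<mrow>
<mi>a</mi> <mo>&#x2062;</mo>
<msup><mi>x</mi> <mn>2</mn></msup> <mo>+</mo> <mi>b</mi> <mo>&#x2062;</mo>
<mi>x</mi> <mo>+</mo> <mi>c</mi>
</mrow>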
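As a rough illustration of what the target of any of these conversions looks like, the expression a·x² + b·x + c from this slide can be assembled as Presentation MathML from token elements (`mi` for identifiers, `mn` for numbers, `mo` for operators, with U+2062 as the invisible multiplication sign). The sketch below uses only the Python standard library; the helper names `token` and `quadratic_mathml` are made up for this example and are not part of the authors' tooling.

```python
import xml.etree.ElementTree as ET

# U+2062 INVISIBLE TIMES marks an implied multiplication in MathML.
INVISIBLE_TIMES = "\u2062"


def token(parent, tag, text):
    """Append a MathML token element (mi, mn or mo) holding `text`."""
    element = ET.SubElement(parent, tag)
    element.text = text
    return element


def quadratic_mathml():
    """Build <mrow> for a*x^2 + b*x + c (hypothetical helper)."""
    row = ET.Element("mrow")
    token(row, "mi", "a")
    token(row, "mo", INVISIBLE_TIMES)
    power = ET.SubElement(row, "msup")  # base first, exponent second
    token(power, "mi", "x")
    token(power, "mn", "2")
    token(row, "mo", "+")
    token(row, "mi", "b")
    token(row, "mo", INVISIBLE_TIMES)
    token(row, "mi", "x")
    token(row, "mo", "+")
    token(row, "mi", "c")
    return row


if __name__ == "__main__":
    # Serialize to a string that could be embedded inside a <math> element.
    print(ET.tostring(quadratic_mathml(), encoding="unicode"))
```

Wrapping every symbol in its own token element is what lets assistive technology read the formula aloud or navigate it, which an image of the same formula cannot offer.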
9. OCR is the best option:
Does not require expertise
Works with limited resources
Can be done by the student
11. MathML
First: conceived for Web pages.
Later: a general format for exchanging mathematics.
Now: provides accessible content for people with
disabilities.
12. MathML support still incomplete but…
13. InftyReader
Pros
Recognizes math symbols and converts them into
▪ LaTeX,
▪ XHTML and MathML.
Cons
Strong requirements:
▪ Pure black-and-white images
▪ High resolution (600 dpi)
▪ Notation following ISO 80000-2:2009
14. The four principles of the linked data
publishing model:
Use URIs as names for things.
Use HTTP URIs so that people can look up those
names.
When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL).
Include links to other URIs, so that they can discover
more things.
15. Opportunities for the data in accessibility
reports:
To customize answers to user queries.
To generate data-enriched reports for managers,
technicians (webmasters) and policy decision
makers.
To enrich search engine results with accessibility
results used as website quality indicators.
16. TWO KEY DECISIONS
Reuse a formal
vocabulary: EARL
Use an open-source
CMS: Drupal.
17. EARL = Evaluation and Report Language
It is a simple vocabulary that describes test
results, such as those generated by web
accessibility evaluation tools.
It uses the RDF data model to define terms for
expressing test results.
It is a W3C Working Draft.
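To make the shape of an EARL report more concrete, here is a minimal sketch that builds one assertion as a JSON-LD-style dictionary. The class and property names (`earl:Assertion`, `earl:assertedBy`, `earl:subject`, `earl:test`, `earl:result`, `earl:outcome`, `earl:info`) come from the EARL vocabulary; the function `earl_assertion` and all `example.org` URIs are assumptions made for illustration, and the sketch uses EARL's standard outcome values, not the custom result categories defined in this work.

```python
import json

EARL_NS = "http://www.w3.org/ns/earl#"

# Outcome values defined by the EARL vocabulary.
EARL_OUTCOMES = {"passed", "failed", "cantTell", "inapplicable", "untested"}


def earl_assertion(assertor, subject, test, outcome, info):
    """Return one EARL assertion as a JSON-LD-style dict (hypothetical helper)."""
    if outcome not in EARL_OUTCOMES:
        raise ValueError(f"unknown EARL outcome: {outcome}")
    return {
        "@context": {"earl": EARL_NS},
        "@type": "earl:Assertion",
        "earl:assertedBy": {"@id": assertor},
        "earl:subject": {"@id": subject, "@type": "earl:TestSubject"},
        "earl:test": {"@id": test, "@type": "earl:TestCase"},
        "earl:result": {
            "@type": "earl:TestResult",
            "earl:outcome": {"@id": "earl:" + outcome},
            "earl:info": info,
        },
    }


if __name__ == "__main__":
    # All URIs below are placeholders, not real evaluation data.
    report = earl_assertion(
        assertor="http://example.org/evaluators/1",
        subject="http://example.org/math-site",
        test="http://example.org/criteria#C2-INFTYREADER",
        outcome="failed",
        info="Formula images are below 600 dpi.",
    )
    print(json.dumps(report, indent=2))
```

Because every assertion names its subject and test by URI, reports from different evaluators can be merged and queried together.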
19. A new controlled vocabulary for the Test
Case class:
Containing 10 suitable criteria based on
requirements of both InftyReader OCR software
and ISO 80000-2:2009
Examples:
▪ C2-INFTYREADER: Image resolution must be equal to or
greater than 600 dpi.
▪ C3-ISO: An explicitly defined function not depending on
the context is printed in Roman (upright) type, e.g. sin,
exp, ln.
20. A new controlled vocabulary for the Test
Result class:
Containing 5 categories, based on the percentage
of formulae correctly converted into MathML.
Examples:
▪ Failed conversion [0%-20%]: This website has a very low
potential accessibility.
▪ Successful conversion [80%-100%]: This website has the
maximum potential accessibility.
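The mapping from a conversion percentage to a result category could be sketched as follows. Only the lowest and highest category names appear on this slide; the three intermediate labels are placeholders, and the equal 20-point bands and the convention that a boundary value belongs to the higher band are assumptions of this sketch, since the slide does not state them.

```python
def conversion_rate(converted: int, total: int) -> float:
    """Percentage of formulae correctly converted into MathML."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= converted <= total:
        raise ValueError("converted must lie between 0 and total")
    return 100.0 * converted / total


# (exclusive upper bound, label). Only the first label and the final
# "Successful conversion" come from the slide; the others are placeholders.
BANDS = [
    (20.0, "Failed conversion"),
    (40.0, "Category 2 (placeholder)"),
    (60.0, "Category 3 (placeholder)"),
    (80.0, "Category 4 (placeholder)"),
]


def result_category(rate: float) -> str:
    """Map a percentage in [0, 100] to one of the five categories."""
    if not 0.0 <= rate <= 100.0:
        raise ValueError("rate must be between 0 and 100")
    for upper_bound, label in BANDS:
        if rate < upper_bound:
            return label
    return "Successful conversion"


if __name__ == "__main__":
    # Example: 3 of 40 formulae on a page were converted correctly.
    print(result_category(conversion_rate(3, 40)))
```

Keeping the bands in a single table makes it easy to adjust the thresholds once the real category definitions are plugged in.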
22. Drupal is the first CMS with Semantic Web
services
▪ Usable by non-experts
▪ Drupal 7 publishes data in RDF format.
Data model in Drupal 7 | Data model in RDF
Content type | RDF class
Field | RDF property
Node | RDF resource
31. Most websites are not accessible at all
Interactive
Non-interactive
▪ Formula images of very poor quality and without alternative text.
▪ Formulae not following standards => OCR does not work
33. Useful methodology?
Future work:
Using math ontologies
▪ OpenMath
▪ OMDoc
Adapting formulae to RDFa