Techniques of Audibly Representing
Mathematical Formulae
for People with Visual or Cognitive Disabilities
Matt Smith
Department of Electronics and Computer Science
University of Southampton
Abstract—The portrayal of mathematical formulae has always been well suited to a visual medium. Lines and brackets silently show
the relationships between different parts of the equation and the limits of their scope. Taking this structural data
away greatly increases the amount of data that must be sent to the user, which in turn greatly increases the
amount of mental processing required. This is an issue that all visually impaired users face, and much research has
already been done into the best ways of audibly representing this information. This report will analyse the strengths and
weaknesses of these strategies, as well as examining some existing software solutions.
Index Terms—Computer Science, Assistive Technology, Audition, Mathematical Formulae, Visual Disabilities, Earcons,
Spearcons, Prosody, ChromeVox.
1 INTRODUCTION
MATHEMATICAL formulae can be portrayed very succinctly in their
printed form, allowing for non-temporal and unambiguous visuals.
The spatial structure is easily conveyed by viewing the physical
locations of the symbols relative to their surroundings, with the
medium acting as an extension of the user's memory. Users with
visual impairments are not able to use this extension and must have
the spatial information transmitted audibly, causing a vast increase
in cognitive workload. To reduce this effect, researchers have
proposed a number of potential solutions for converting these
symbols into sounds that represent them. This report will analyse
their strengths and weaknesses whilst also looking into future
possibilities.
The report will be structured into the following sections:
- An introduction to the problem;
- Methods of audibly representing information;
- Digital formats for representing mathematical data;
- A description of the browser-based tool 'ChromeVox' with an
  evaluation of its performance;
- A description of the desktop-based tool 'Central Access Reader';
- A summary comparison of these systems, along with a presentation
  of conclusions and remaining issues.
2 PROBLEMS
2.1 Visual Structure
\[ x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a} \]
For a person with full visual and cognitive abilities, and some
basic knowledge of mathematical symbols, reading the above formula
does not present a huge difficulty. The reader knows instinctively
from the visual layout that the square root applies only to part of
the numerator of the fraction, that the denominator of the fraction
applies not only to the contents of the square root but to the
whole numerator, and that the initial negation applies only to the
first variable b and not to the entirety of the equation. The
information displayed has no ambiguity due to the recognised visual
conventions of mathematics. However, once this layer of information
is lost, describing the equation becomes a much more difficult
process. The famous phrase "a picture tells a thousand words" is
very relevant here, as turning this visual data into audible
information has proved to be a highly complex process, both for the
developer and the user.
2.2 Digital Representation
It is in the nature of humans to design a system that they believe
would be easy to use without necessarily taking into account the
disabilities of others who may use it. This leads to the development
of formats and specifications which limit usability. A good example
of this is Wikipedia, which displays all maths equations as
rasterised images, with the LaTeX mark-up representation stored in
the image's alt attribute [1].
Listing 1: LaTeX representation of the quadratic formula
x = \frac{-b \pm \sqrt{b^{2}-4ac}}{2a}
While the information is still there in some
form, this mark-up is not user-friendly, requir-
ing listeners to understand a language built for
a computer. Digital formats are required that
are capable of retaining the meaning of the
content while still being understandable by a
user without additional learning.
2.3 Visual Walkthroughs
The final problem to be discussed in this report is that of
stepping through an equation. Maths formulae can reach an arbitrary
level of complexity, with each level of nesting providing more data.
As such, it can be very helpful to break down the equation into its
component parts and visually identify which are being spoken at the
current time. In many cases this is not possible due to the digital
representation issues mentioned previously, further increasing the
need for a more standardised format.
3 AUDIBLE FORMATS
There are a number of ways of presenting
information audibly, each with its own benefits.
This section will briefly explain the different
methods and determine the best situations in
which to use them.
3.1 Lexical Cues
Lexical cues are spoken terms that audibly identify the start and
end of spatial structures, for example "bfrac" to represent the
beginning of a fraction block and "efrac" to represent the exit
from that block. This process entirely removes any ambiguity from
the audio stream; however, it severely increases the cognitive load
on the user and very quickly becomes confusing when more complex
equations are spoken [2].
There is also an effect known as the Suffix Effect, in which a
spoken item appended to the end of a sequence, such as a closing
cue, can detract from a user's ability to remember what they have
just heard [2].
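As a rough illustration of lexical cues, the following Python sketch flattens a small expression tree into a word stream with begin and end markers. Only "bfrac" and "efrac" come from the text above; the tree classes, the "bsqrt" and "esqrt" markers and the separator word "over" are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frac:
    """A fraction node (hypothetical tree type for this sketch)."""
    num: object
    den: object


@dataclass
class Sqrt:
    """A square-root node (hypothetical tree type for this sketch)."""
    body: object


def lexical_cues(node) -> List[str]:
    """Flatten a tree of strings, lists, Frac and Sqrt into spoken words."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):
        words: List[str] = []
        for child in node:
            words.extend(lexical_cues(child))
        return words
    if isinstance(node, Frac):
        # "bfrac"/"efrac" bracket the fraction; "over" is an assumed separator.
        return ["bfrac", *lexical_cues(node.num), "over",
                *lexical_cues(node.den), "efrac"]
    if isinstance(node, Sqrt):
        # "bsqrt"/"esqrt" are assumed by analogy with "bfrac"/"efrac".
        return ["bsqrt", *lexical_cues(node.body), "esqrt"]
    raise TypeError(f"unsupported node: {node!r}")


quadratic = [
    "x", "equals",
    Frac(
        ["minus", "b", "plus or minus",
         Sqrt(["b", "squared", "minus", "four", "a", "c"])],
        ["two", "a"],
    ),
]

spoken = " ".join(lexical_cues(quadratic))
print(spoken)
```

Even for this single formula the stream carries five extra cue words, and the closing "efrac" arrives as a trailing utterance, which is the situation the Suffix Effect describes.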
3.2 Dynamic Sonic Trajectories
Dynamic Sonic Trajectories (DST) is the use of directional sound to
create sonic shapes that can be processed cognitively rather than
perceptually. This potentially allows the sounds to be shifted
around the listener so that the user can hear the spatial structure
of the equation. However, studies have shown this additional audio
information to have detrimental effects on the cognitive processing
of the maths due to the increased mental workload of identifying
the sonic trajectories [3]. It has also been shown that auditory
localisation is particularly inaccurate across the vertical
dimension and highly difficult to synthesise [2].
While these have shown that linear spatial
mapping from visual to audio is not hugely
useful, it is still valid that sounds from differ-
ent locations are easy to distinguish from one
another and thus can be used to provide ad-
ditional audio cues that complement the main
information stream [3].
3.3 Earcons, Spearcons and Auditory
Icons
One method of indicating spatial structures
using audition is to cognitively link each con-
struct component to a sound. These sounds can
take a number of formats.
3.3.1 Earcons
Earcons use synthesised sounds, such as the Windows error sound, to
represent different components. As these sounds are synthesised,
they can easily be altered and varied to convey similar meanings
through tone or pitch [2]. While Earcons have been shown to be
effective [4], they do require the user to first learn all their
meanings, and thus a standard must be built to unify all of these
sounds.
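A minimal sketch of how synthesised earcons can be varied through pitch is given below. It builds short sine tones using only the Python standard library. The sample rate, the frequencies and the "enter fraction" and "exit fraction" meanings are assumptions chosen for illustration and do not correspond to any published earcon set.

```python
import io
import math
import struct
import wave
from typing import List

SAMPLE_RATE = 8000  # samples per second (assumed value)


def earcon(frequency_hz: float, duration_ms: int = 150,
           volume: float = 0.5) -> List[int]:
    """Return 16-bit PCM samples for a sine tone with a linear fade-out."""
    count = SAMPLE_RATE * duration_ms // 1000
    samples = []
    for i in range(count):
        envelope = 1.0 - i / count  # fade out to avoid a click at the end
        phase = 2 * math.pi * frequency_hz * i / SAMPLE_RATE
        samples.append(int(32767 * volume * envelope * math.sin(phase)))
    return samples


def to_wav_bytes(samples: List[int]) -> bytes:
    """Pack mono 16-bit samples into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


# Hypothetical pair: a rising motif on entry, the same notes falling on exit.
ENTER_FRACTION = earcon(440.0) + earcon(660.0)
EXIT_FRACTION = earcon(660.0) + earcon(440.0)
```

Because the two motifs share the same notes in opposite order, they sound related while remaining distinguishable, which is the kind of variation through pitch described above.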
3.3.2 Spearcons
Spearcons have been developed more recently and consist of a
time-compressed spoken phrase. This compression results in a phrase
that lies between recognisable speech and abstract sound. Studies
have shown that these are easier to learn and can result in an
increase in performance in interactive interface tasks [2]. This is
mostly because they lie outside the standard speech channel, which
allows for parallel information processing [5].
3.3.3 Auditory Icons
Similar to Earcons, Auditory Icons are made with the intention of
sounding like the object that they are attempting to represent.
This makes the sounds harder to manipulate but much easier for the
user to identify, thus decreasing the mental load [2]. However,
what a mathematical character would sound like is highly
subjective, making it a much more complex process to create the
sounds.
3.3.4 Hybrid Sounds
Some of the previously explained methods can
be combined into a new format that can poten-
tially allow for the strengths of one method to
overcome the drawbacks of the other, resulting
in an overall more powerful system. An exam-
ple of this is the combination of Earcons with
Auditory Icons [5].
3.3.5 Sized Hybrids
Additional indicators can be added to hybrid
sounds to give an impression of the size of
an object. This can take the form of an extra
audible note or short melody that differs in
pitch or duration [5].
3.4 The efficiency of these methods
An experiment into the ease of recognition and the learning time
for each of these methods was performed, with relatively strong
results. The test involved selecting a number of environmental
features, for which Earcons, Spearcons, Auditory Icons, Speech,
Earcon-Icon hybrids and Sized Hybrids were then created. Users were
first given a training period to learn the selection and were then
tested on matching each sound against a grid of possible meanings.
The results showed that spearcons are as easy to learn as speech,
with identical performance across the experiment. In the majority
of cases only a single cycle of training was required for the user
to complete a test with little to no error. The results also showed
that spearcons represent data faster, making them more efficient
than speech, and that they leave the speech channel open, allowing
for two simultaneous streams of data [6]. Another study supports
these conclusions, with users showing a preference for spearcons
over speech, indicating that the sounds were more recognisable and
did not slow down the flow of data [7].
3.5 Prosody
Prosody is the use of varied patterns in vo-
cal pitch and intonation to provide additional
meaning to a word or phrase. It can also refer
to changes in parameters such as the length of
pause between words and the overall speaking
rate [8].
Existing systems have attempted to use prosodic schemes to convey
the spatial meaning of an equation, with promising results, and
this method is believed to be highly suitable due to its intuitive
understanding and its reduced cognitive workload in comparison to
lexical cues [9]. There is, however, the drawback that this
solution will not work alone for highly complex equations
consisting of nested structures.
Prosodic schemes of this kind can be expressed in mark-up using the
Speech Synthesis Markup Language (SSML), which is discussed in
Section 4.3.
4 DIGITAL FORMATS
4.1 MathML
MathML has become the W3C standard for online mathematical markup.
It comes in two major formats: Presentation MathML and Content
MathML. Presentation MathML is predominantly concerned with correct
visual formatting, whilst Content MathML ensures that semantic
correctness is maintained [10].
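Listing 2: MathML code sample of πr²
<math>
  <mrow>
    <!-- the Greek letter pi, not the two letters "p" and "i" -->
    <mi>&#x3C0;</mi>
    <!-- invisible times: marks the implied multiplication -->
    <mo>&#x2062;</mo>
    <msup>
      <mi>r</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</math>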
A good example of where this distinction is important is the three
superscript terms $A^{2}$, $f^{-1}$ and $A^{T}$. While these
represent an exponent, a function inverse and a matrix
transposition, respectively, they are all formatted in exactly the
same way in Presentation MathML, making it impossible to directly
differentiate one type of expression from another [8]. Content
MathML uses more specific mathematical markup so that the actual
meaning of the maths is maintained throughout and is
distinguishable for digital translation. However, two issues make
Presentation MathML the more popular format at this time. The first
is that Content MathML is only capable of representing basic maths,
and the second is that the majority of today's mathematical web
publications are written using Presentation MathML, so complying
with the more widely used standard allows for broader usability
[11].
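The ambiguity can be checked mechanically. The sketch below, which uses Python's standard XML parser, compares hand-written Presentation and Content MathML fragments for the three superscript terms. The fragments are illustrative examples written for this purpose and are not taken from any cited publication.

```python
import xml.etree.ElementTree as ET

# Presentation MathML: all three terms use the same <msup> layout element.
PRESENTATION = {
    "exponent": "<msup><mi>A</mi><mn>2</mn></msup>",
    "inverse": "<msup><mi>f</mi><mrow><mo>-</mo><mn>1</mn></mrow></msup>",
    "transpose": "<msup><mi>A</mi><mi>T</mi></msup>",
}

# Content MathML: the first child of <apply> names the operation itself.
CONTENT = {
    "exponent": "<apply><power/><ci>A</ci><cn>2</cn></apply>",
    "inverse": "<apply><inverse/><ci>f</ci></apply>",
    "transpose": "<apply><transpose/><ci>A</ci></apply>",
}


def presentation_tag(markup: str) -> str:
    """Return the outermost layout element of a Presentation fragment."""
    return ET.fromstring(markup).tag


def content_operator(markup: str) -> str:
    """Return the operator element applied in a Content fragment."""
    return ET.fromstring(markup)[0].tag


presentation_tags = {name: presentation_tag(m) for name, m in PRESENTATION.items()}
content_operators = {name: content_operator(m) for name, m in CONTENT.items()}

print(presentation_tags)   # every value is "msup"
print(content_operators)   # each value names a different operation
```

A speech system working from the Presentation fragments sees only "something raised above something else" and must guess the meaning from context, whereas the Content fragments state it directly.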
While the need for MathML has been long-standing, it is still not
natively supported in all web browsers [12]. It has, however,
recently been implemented in the WebKit browser engine and is
included as part of the HTML5 specification. For browsers that do
not yet support it natively, it can still be displayed through the
use of a JavaScript library called MathJax.
4.2 MathJax
MathJax is a JavaScript display engine that works across browsers,
automatically converting MathML into standard HTML and CSS markup.
This removes any issues with browser compatibility, as the browser
no longer needs to understand MathML itself. However, the resulting
markup is much less comprehensible as a mathematical equation, as
can be seen in Fig. 1. MathJax is also usable on mobile devices, to
the extent that it is now built into many e-readers so that they
can correctly display any MathML embedded within e-publications
[14].
4.3 Speech Synthesis Mark-up Language
Another format used for text-to-speech (TTS) in complex visual
structures is the Speech Synthesis Mark-up Language (SSML). SSML is
based on the Java Speech Mark-up Language (JSML) from Sun
Microsystems and is designed to allow prosody schemes to be
inserted more easily, in a readable form, directly into standard
HTML.
Listing 3: SSML code sample of πr²
<speak version="1.0" xml:lang="en-US">
  pie
  <break strength="medium"/>
  <prosody rate="fast">
    r
    <break strength="x-weak"/>
    <prosody pitch="high">
      squared
    </prosody>
  </prosody>
</speak>
SSML allows the prosodic properties to be embedded directly into
the document; these are then read by an SSML interpreter, which
combines them into a waveform appropriate for a screen reader [15].
While this is a viable system for general use and for the reading
of web pages, its documentation states that it has not been
developed for use with mathematical formulae and lists this as a
future extension. However, [15] was written in 1997 and SSML has
not progressed a great deal since, meaning it is unlikely to become
a viable alternative as a web standard soon.
Fig. 1: MathJax Markup for displaying the quadratic formula [13]
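To show how prosodic mark-up of the kind in Listing 3 might be produced programmatically, the following Python sketch assembles the same SSML structure for a base raised to a power. The function names are hypothetical, and the choice of a fast rate for the term and a high pitch for the exponent simply mirrors Listing 3 rather than any standard mapping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def superscript(base: str, exponent: str) -> str:
    """Wrap a base and its exponent in nested <prosody> elements."""
    return (
        '<prosody rate="fast">'
        f'{escape(base)}<break strength="x-weak"/>'
        f'<prosody pitch="high">{escape(exponent)}</prosody>'
        '</prosody>'
    )


def speak(*fragments: str) -> str:
    """Join SSML fragments with medium breaks inside a <speak> root."""
    body = '<break strength="medium"/>'.join(fragments)
    return f'<speak version="1.0" xml:lang="en-US">{body}</speak>'


# "pie" follows the spelling used in Listing 3.
document = speak("pie", superscript("r", "squared"))

# Parsing the result confirms that the generated mark-up is well-formed XML.
root = ET.fromstring(document)
print(document)
```

Generating the mark-up from functions rather than by hand keeps the opening and closing tags balanced, which matters once expressions are nested more deeply than this example.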
5 CHROMEVOX
This section will briefly cover the functionality
of the Chrome browser plugin, ChromeVox,
and then analyse its performance relative to the
initial problem areas mentioned in this report.
Google's ChromeVox is a prime candidate for future innovation in
the area of web accessibility. It provides an all-round platform
for web access, covering general text-to-speech (TTS), various
earcons for HyperText Markup Language (HTML) elements, prosody and
granular interaction.
ChromeVox works by providing a persistent
background service which then dynamically
injects element properties into the document
object model of the page. These properties
represent the behaviours related to the audio
and visual feedback appropriate for that spe-
cific screen element, including such features as
prosody cues, alternative tags, element order-
ing and highlighting rules. Additional elements
are also inserted over the top of the current
page to represent extra User Interface (UI) com-
ponents [16].
ChromeVox uses a combination of regular
TTS and Earcons in its navigation process.
Upon switching focus to a new object or sec-
tion, an earcon will sound, accompanied by the
spoken name of that section. Different earcons
are used to represent different situations, such
as entering or leaving a list, selecting a link, etc.
They are also used to represent different events
that occur within the browser, e.g. when the
page is fully loaded and ready for interaction
or when a dialogue box appears.
One of the distinguishing features of ChromeVox is its varying
levels of navigational granularity. Due to the intricacies of
modern web pages, screen readers can be forced to go through a
large number of elements before the user reaches the position they
desire. This also requires the user to sift through an unnecessary
amount of audible data, increasing their workload. By providing
multiple granularities, ChromeVox lets the user choose how deeply
the page is read at any one time, allowing for navigation more akin
to visual navigation, where a user scans larger groups of objects
to find the specific section they want and then focuses in on the
detailed elements contained within it.
Listing 4: HTML Object level example
This is <span>a</span> Test
Specifically, ChromeVox provides five levels of granularity: Group,
Object, Sentence, Word and Character. While the last three are
self-explanatory, the other two are more ambiguous. The Group level
represents a section of content where the elements have been
considered heuristically related. This could be an HTML paragraph
tag or a division tag sectioning a part of the document. The Object
level considers each HTML tagged element as a separate item, so the
example shown in Listing 4 would be divided into three distinct
objects and navigated through individually [17].
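The Object-level division of Listing 4 can be approximated with a short Python sketch based on the standard library HTML parser. This is only an illustration of the idea of splitting at tag boundaries. It is not ChromeVox's implementation, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class ObjectSplitter(HTMLParser):
    """Collect each run of text between tags as a separate object."""

    def __init__(self) -> None:
        super().__init__()
        self.objects: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:  # ignore whitespace-only runs
            self.objects.append(text)


def split_objects(html: str) -> List[str]:
    """Approximate Object-level granularity for a fragment of HTML."""
    splitter = ObjectSplitter()
    splitter.feed(html)
    splitter.close()
    return splitter.objects


def split_words(html: str) -> List[str]:
    """Approximate Word-level granularity by splitting every object."""
    return [word for obj in split_objects(html) for word in obj.split()]


objects = split_objects("This is <span>a</span> Test")
words = split_words("This is <span>a</span> Test")
print(objects)  # ['This is', 'a', 'Test']
print(words)    # ['This', 'is', 'a', 'Test']
```

The same fragment yields three items at the Object level and four at the Word level, which shows how the choice of granularity changes the number of steps a user must take.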
Fig. 2: Group level selection
Another key feature related to the initially mentioned problems is
that of walking through equations. These levels of granularity make
it possible to walk through equations written in MathML mark-up;
however, these walkthroughs have varying degrees of success and
still remain an issue.
Fig. 3: Object level selection, stepping into the
equation
Fig. 4: Erroneous highlighting of equation
Figures 3 and 4 show errors in the high-
lighting that occur while stepping through a
MathML example of the quadratic formula.
The elements of the equation appear almost
randomly grouped with the −b± existing as
one entity, the second b being separate from its
exponent and the line representing the frame
of the square root’s contents being selected
multiple times after having tabbed through the
content itself.
Upon selection at a group level, the equation
is read out almost flawlessly. Prosody is used
to indicate the separation of levels in the
equation, through pitch changes in the voice
to show when transitioning to a numerator
or denominator, as well as pauses to indicate
the cohesion of different parts. In this case,
the only error was in the pronunciation of the
variable a. However, when walking through
the equation the spoken text becomes:
"x equal minus b plus or minus b
two minus fourack overline overline
overline square root two a"
This is clearly not close to the desired out-
put and would leave a visually impaired user
with little idea of the true equation. The tech-
nique for this walkthrough process is defi-
nitely present within the application, however
it requires refining to allow for the optimum
experience.
The finer levels of granularity do allow the user to select each
individual element of the equation; however, at this depth the
elements lose all their relative meaning and any prosody rules that
were previously applied. Exponents become standard numbers and lose
any pitch associated with their vertical positioning, and the
location of an element within a fraction is no longer conveyed.
While it may be audibly clearer to hear everything uniformly when
stepping through slowly, the structural information is still highly
relevant to a visually impaired user, so it is appropriate to
retain all modifiers even at the finest level of granularity.
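One way to retain structural modifiers at the finest granularity is to carry the chain of enclosing structures along with each symbol during the walk. The Python sketch below does this for a hand-written Presentation MathML version of the quadratic formula. The role names and the traversal are assumptions made for illustration and do not describe how ChromeVox works internally.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

QUADRATIC = """
<math>
  <mi>x</mi><mo>=</mo>
  <mfrac>
    <mrow>
      <mo>-</mo><mi>b</mi><mo>&#xB1;</mo>
      <msqrt>
        <msup><mi>b</mi><mn>2</mn></msup>
        <mo>-</mo><mn>4</mn><mi>a</mi><mi>c</mi>
      </msqrt>
    </mrow>
    <mrow><mn>2</mn><mi>a</mi></mrow>
  </mfrac>
</math>
""".strip()


def child_role(parent_tag: str, index: int) -> Optional[str]:
    """Name the structural role of a child (assumes well-formed MathML)."""
    if parent_tag == "mfrac":
        return ("numerator", "denominator")[index]
    if parent_tag == "msup":
        return ("base", "exponent")[index]
    if parent_tag == "msqrt":
        return "square root"
    return None  # e.g. <math> and <mrow> add no context of their own


def walk(element: ET.Element,
         context: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (symbol, enclosing roles) for every leaf in reading order."""
    children = list(element)
    if not children:
        yield (element.text or "").strip(), context
        return
    for index, child in enumerate(children):
        role = child_role(element.tag, index)
        yield from walk(child, context + (role,) if role else context)


steps = list(walk(ET.fromstring(QUADRATIC)))
for symbol, roles in steps:
    print(symbol, "|", " > ".join(roles) or "top level")
```

With the roles attached to every step, a reader could keep the raised pitch for the exponent or announce "numerator" on entry, even when the user is moving one symbol at a time.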
While ChromeVox does function well in
many of its key areas, granular stepping of
maths equations still falls short of the require-
ments for impaired users.
6 CENTRAL ACCESS READER (CAR)
Another recent addition to the market is the free, open source
Central Access Reader from Central Washington University. This
software covers the reading of text and mathematical formulae from
Word documents or from text copied directly into the interface. It
is capable of dual highlighting of the read text, in that it
highlights the area of text it is reading as well as the individual
word; however, it is not able to highlight the individual
components of equations when there is more than a single level of
vertical structural complexity. It provides multiple settings for
the reading of maths, allowing it to read different types of
equations in different ways.
When its mathematical reading abilities were tested, it proved
perfectly capable of reading the quadratic formula when supplied
via a Word document, giving the output:
"x equals begin fraction negative b
plus or minus the square root of b
squared minus four a c over two a
end fraction"
However, when the equation is copied and pasted in, whether
directly from Microsoft Word, from a PDF or as MathML, it does not
function correctly: it fails to display the copied text, displays
incorrect symbols, or inserts bars as square root markers,
respectively.
While the system seems proficient when it
comes to reading text, and definitely capable
of reading mathematics when in the form of a
Word document, this is not a highly common
format to receive formulae in and would re-
quire an additional level of conversion to place
it in the appropriate format before it could be
used. This makes the system inefficient and not
ideal for users who are already impaired.
7 SYSTEM COMPARISON
Having now tested both systems with mathematical formulae in
various forms, each has shown its own strengths. Central Access
Reader has opted for the more verbose option of narratively
describing the structure through the use of begin and end
statements. This increases the overall time taken to convey the
information to the user and makes them work harder. However, it is
a more explicit approach that requires no additional learning of
prosodic methods. ChromeVox has instead opted for these prosodic
components, using pitch changes and pauses to convey extra
information without taking extra time. It also uses earcons and
spearcons to represent structural data in a faster, more
recognisable way that does not block the speech channel in the
process.
While both have standard textual highlighting systems, ChromeVox's
ability to adjust granularity allows the user to alter their
experience as they require a shallower or deeper perception of the
content. This potentially allows for precise stepping through maths
equations; however, this has been shown to not yet be the case.
A feature of CAR not present in ChromeVox is the ability to read
files. Equally, ChromeVox is capable of directly reading web pages,
while CAR requires that the contents be copied into its interface.
Finally, CAR also has the ability to output its content as an MP3
file, allowing the user to listen to it at a later date on another
device without the software installed. This provides an additional
level of usability that ChromeVox cannot match, although its only
direct benefit is being able to listen to documents when no
computer is available.
Feature               ChromeVox           CAR
Prosody               Yes                 No
Syntax Highlighting   Yes                 Yes
Formula Highlighting  Yes (with errors)   Single-level only
File Reading          No                  Yes
Browser Reading       Yes                 No
MP3 Output            No                  Yes
8 REMAINING ISSUES
8.1 Syntax highlighting in equations
While this issue has been somewhat covered
in Google’s ChromeVox, it is still clear from
the demonstration that the system is not ideal,
and stepping between different levels of the
equation’s structure can still be a complex and
confusing task for users.
8.2 How should numbers be spoken?
While the issue of saying numbers may seem
trivial in comparison to the issues of convey-
ing memorable structure, the conventions used
must also be considered and maintained [18].
This issue has not been discussed in this report,
however, it will ultimately affect the process
of developing a standardised math reading
system and is another case where agreeing on
a specification is necessary.
8.3 Limitations of MathML
While MathML has become the standard format for online mathematical
displays, it is still not able to convey all possible equations
[19]. It is also contradictory that Presentation MathML is the most
commonly used mark-up while also being the least accessible form.
This will most likely remain the case, however, until Content
MathML is capable of defining a greater range of formulae.
9 CONCLUSION
In this report, the techniques with the highest beneficial impact
have been discussed, and a number of the most prominent systems
have been compared to show where they still fall short. Given the
current trends in web technology growth and the undeniable strength
of cloud computing services, a web-based system has merit as the
strongest candidate for people with disabilities. However, these
people must live with this difficulty everywhere they go, not
simply in places where there is an active internet connection,
making a stand-alone application much preferable in terms of
usability.
In terms of performance on the issues explained, the ChromeVox
plugin for the Google Chrome web browser has shown itself to be
more proficient at the tasks and, overall, more convenient to use.
However, it is not a perfect system, with the walking through of
equations still requiring work to make it usable by those reliant
on its accuracy.
REFERENCES
[1] J. F. Sepúlveda and L. Ferres, “Improving accessibility to
mathematical formulas: the Wikipedia Math Accessor,”
New Review of Hypermedia and Multimedia, 2012.
[2] E. Bates and D. Fitzpatrick, “Spoken mathematics using
prosody, earcons and spearcons,” tech. rep., Dublin City
University, 2010.
[3] A. Hollander and T. Furness, “Perception of virtual
auditory shapes,” in Proceedings of the International
Conference on Auditory Displays, 1994.
[4] R. D. Stevens and A. D. N. Edwards, “An approach to the
evaluation of assistive technology,” tech. rep., University
of York, 1996.
[5] Learnability of Sound Cues for Environmental Features:
Auditory Icons, Earcons, Spearcons, and Speech, 2008.
[6] E. Gellenbeck and A. Stefik, “Evaluating prosodic cues as
a means to disambiguate algebraic expressions: An
empirical study,” tech. rep., Central Washington University,
2009.
[7] E. Murphy, E. Bates, and D. Fitzpatrick, “Designing
auditory cues to enhance spoken mathematics for visually
impaired users,” tech. rep., Dublin City University, 2010.
[8] E. G. et al., “Speaking MathML: Using prosody and
context-sensitive inferences to produce synthesized speech,”
tech. rep., 2005.
[9] R. D. Stevens, Principles for the Design of Auditory Interfaces
to Present Complex Information to Blind Computer Users.
PhD thesis, University of York, 1996.
[10] R. A. et al., “Mathematical Markup Language (MathML)
version 3.0,” tech. rep., W3C, 2010.
[11] H. Ferreira and D. Freitas, “AudioMath using MathML
for speaking mathematics,” tech. rep., University of Porto,
2005.
[12] Wikipedia, “Mathematical markup language (MathML).”
http://en.wikipedia.org/wiki/MathML.
[13] V. Sorge, “Accessibility to scientific material: The case of
speaking math,” tech. rep., The University of Birmingham.
[14] D. Cervone, “MathJax: A platform for mathematics on the
web,” Notices of the AMS, vol. 59, no. 2, pp. 312–316, 2012.
[15] P. Taylor and A. Isard, “SSML: A speech synthesis markup
language,” Speech Communication, vol. 21, no. 1-2,
pp. 123–133, 1997.
[16] Google, “ChromeVox source code,” 2012.
https://code.google.com/p/google-axs-chrome/.
[17] T. V. R. et al., “ChromeVox: a screen reader built using web
technology,” tech. rep., Google Inc, 2012.
[18] R. Fateman, “How can we speak math?,” tech. rep.,
University of California, 2013.
[19] K. Kofler, P. Schodl, and A. Neumaier, “Limitations in
Content MathML,” tech. rep., University of Vienna, 2009.