Techniques of Audibly Representing
Mathematical Formulae
for People with Visual or Cognitive Disabilities
Matt Smith
Department of Electronics and Computer Science
University of Southampton
Abstract—The portrayal of mathematical formulae has always been well suited to a visual medium. Lines and brackets silently show the relationships between different parts of an equation and the limits of their scope. Taking this structural data away greatly increases the amount of data that must be conveyed to the user, which in turn greatly increases the amount of mental processing required. This is an issue that all visually impaired users face, and much research has already been done into the best ways of audibly representing this information. This report analyses the strengths and weaknesses of these strategies and examines some existing software solutions.
Index Terms—Computer Science, Assistive Technology, Audition, Mathematical Formulae, Visual Disabilities, Earcons,
Spearcons, Prosody, ChromeVox.
1 INTRODUCTION
MATHEMATICAL formulae can be portrayed very succinctly in printed form, giving a non-temporal and unambiguous visual representation. The spatial structure is conveyed by the physical positions of the symbols relative to their surroundings, with the page acting as an extension of the reader's memory. Users with visual impairments cannot use this extension and must instead receive the spatial information audibly, which greatly increases their cognitive workload. To reduce this effect, researchers have proposed a number of ways of converting these symbols into sounds that represent them. This report analyses their strengths and weaknesses and also looks at future possibilities.
The report is structured into the following sections:
- an introduction to the problem;
- methods of audibly representing information;
- digital formats for representing mathematical data;
- a description of the browser-based tool ChromeVox, with an evaluation of its performance;
- a description of the desktop-based tool Central Access Reader; and
- finally, a summary comparison of these systems, along with conclusions and remaining issues.
2 PROBLEMS
2.1 Visual Structure
$$x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$
For a person with full visual and cognitive abilities and some basic knowledge of mathematical symbols, reading the above formula does not present much difficulty. The reader knows instinctively from the visual layout that the square root applies only to b² − 4ac, that the fraction's denominator divides not only the contents of the square root but the whole numerator, and that the initial negation applies only to the first variable b and not to the entire expression. Thanks to the recognised visual conventions of mathematics the formula is unambiguous; once this visual information is lost, however, describing the equation becomes much more difficult. The famous phrase "a picture is worth a thousand words" is very relevant here, as turning this visual data into audible information has proved to be a highly complex process, both for the developer and for the user.
2.2 Digital Representation
It is in the nature of humans to design a system
that they believe would be easy to use without
necessarily taking into account the disabilities
of others who may use it. As such, this leads
to the development of formats and specifica-
tions which limit usability. A good example of
this is in Wikipedia, which displays all maths
equations as rasterised images, with the LATEX
mark-up representation stored in the images alt
tag [1].
Listing 1: LaTeX representation of the quadratic formula
x = \frac{-b \pm \sqrt{b^{2}-4ac}}{2a}
While the information is still present in some form, this mark-up is not user-friendly: it requires listeners to understand a language built for a computer. Digital formats are needed that retain the meaning of the content while remaining understandable to a user without additional learning.
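Although a listener struggles with this mark-up, software can still recover the structure from it. The following is a minimal sketch, not a full LaTeX parser: it assumes only the small subset used in Listing 1 (`\frac`, `\sqrt`, `\pm`, `^{...}` and single-character symbols), and the function name `parse_latex` and the tuple-based tree are illustrative choices rather than part of any existing tool.

```python
from typing import List, Union

# A node is either a symbol (str) or a tuple:
#   ("row", [children]), ("frac", numerator, denominator),
#   ("sqrt", body) or ("sup", base, exponent).
Node = Union[str, tuple]


def parse_latex(src: str) -> Node:
    """Parse a tiny LaTeX subset into a nested tuple tree (illustrative only)."""
    pos = 0

    def parse_group() -> Node:
        # Parse "{...}" and return its contents as a row.
        nonlocal pos
        if pos >= len(src) or src[pos] != "{":
            raise ValueError(f"expected '{{' at position {pos}")
        pos += 1
        row = parse_row(stop="}")
        pos += 1  # consume the closing '}'
        return row

    def parse_row(stop: str = "") -> Node:
        nonlocal pos
        items: List[Node] = []
        while pos < len(src) and src[pos] != stop:
            ch = src[pos]
            if ch == " ":
                pos += 1
            elif ch == "\\":
                # Read a command name such as \frac, \sqrt or \pm.
                end = pos + 1
                while end < len(src) and src[end].isalpha():
                    end += 1
                name, pos = src[pos + 1:end], end
                if name == "frac":
                    numerator = parse_group()
                    items.append(("frac", numerator, parse_group()))
                elif name == "sqrt":
                    items.append(("sqrt", parse_group()))
                else:
                    items.append(name)  # e.g. "pm"
            elif ch == "^":
                pos += 1
                base = items.pop()  # the exponent attaches to the previous item
                items.append(("sup", base, parse_group()))
            else:
                items.append(ch)
                pos += 1
        return ("row", items)

    return parse_row()


tree = parse_latex(r"x = \frac{-b \pm \sqrt{b^{2}-4ac}}{2a}")
print(tree)
```

Each `("frac", ...)` and `("sqrt", ...)` node records exactly the scope that the visual layout implies, which is the information a speech renderer would need.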
2.3 Visual Walkthroughs
The final problem to be discussed in this report
is that of stepping through an equation. Maths
formulae can reach an infinite level of complex-
ity, with each level providing more data. As
such, is can be very helpful to breakdown the
equation into its component parts and visually
identify which are being spoken at the current
time. In many cases this is not possible due
to the digital representation issues mentioned
previously, further increasing the need for a
more standardised format.
3 AUDIBLE FORMATS
There are a number of ways of presenting
information audibly, each with its own benefits.
This section will briefly explain the different
methods and determine the best situations in
which to use them.
3.1 Lexical Cues
Lexical cues are spoken terms that identify the start and end of spatial structures, for example "bfrac" to mark the beginning of a fraction and "efrac" to mark the exit from it. This removes any ambiguity from the audio stream, but it severely increases the cognitive load on the user and quickly becomes confusing for more complex equations [2].
A further problem is the suffix effect: when a spoken sequence is followed by an additional utterance, such as a closing cue, listeners are less able to remember what they have just heard [2].
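To make the cost of lexical cues concrete, the sketch below reads a hand-built expression tree of the quadratic formula aloud with explicit begin/end cues. It is a hedged illustration only: the tuple tree format, the function `speak_with_cues` and the cue wording ("begin fraction", "end root" and so on, standing in for cues such as "bfrac"/"efrac") are assumptions, not taken from any particular system.

```python
from typing import List, Union

Node = Union[str, tuple]  # str symbol, or ("row"|"frac"|"sqrt"|"sup", ...)

# Spoken names for a few symbols; anything else is read as written.
WORDS = {"=": "equals", "-": "minus", "pm": "plus or minus"}


def speak_with_cues(node: Node, out: List[str]) -> None:
    """Append the spoken form of node to out, bracketing every structure with cues."""
    if isinstance(node, str):
        out.append(WORDS.get(node, node))
        return
    kind = node[0]
    if kind == "row":
        for child in node[1]:
            speak_with_cues(child, out)
    elif kind == "frac":
        out.append("begin fraction")
        speak_with_cues(node[1], out)
        out.append("over")
        speak_with_cues(node[2], out)
        out.append("end fraction")
    elif kind == "sqrt":
        out.append("begin root")
        speak_with_cues(node[1], out)
        out.append("end root")
    elif kind == "sup":
        speak_with_cues(node[1], out)
        out.append("begin superscript")
        speak_with_cues(node[2], out)
        out.append("end superscript")
    else:
        raise ValueError(f"unknown node type: {kind}")


# x = (-b ± sqrt(b^2 - 4ac)) / 2a as a nested tuple tree.
QUADRATIC = (
    "row",
    [
        "x",
        "=",
        (
            "frac",
            ("row", ["-", "b", "pm",
                     ("sqrt", ("row", [("sup", "b", ("row", ["2"])), "-", "4", "a", "c"]))]),
            ("row", ["2", "a"]),
        ),
    ],
)

words: List[str] = []
speak_with_cues(QUADRATIC, words)
print(" ".join(words))
```

The reading is unambiguous but long, and every closing cue is itself a trailing utterance of the kind the suffix effect warns about.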
3.2 Dynamic Sonic Trajectories
Dynamic Sonic Trajectories (DST) use directional sound to create sonic shapes that can be cognitively processed rather than perceptually. This potentially allows sounds to be moved around the listener so that the user can hear the equation's spatial structure. However, studies have shown that this additional audio information harms the cognitive processing of the mathematics, because identifying the sonic trajectories increases the mental workload [3]. It has also been shown that auditory localisation is particularly inaccurate in the vertical dimension and that vertical positions are very difficult to synthesise [2].
While these findings show that a linear mapping from visual to auditory space is of limited use, sounds from different locations remain easy to distinguish from one another, so spatial position can still provide additional audio cues that complement the main information stream [3].
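One hedged way to use location as a complementary cue, rather than as a full spatial map, is to give each structural role its own horizontal position and play only short cues from there. The sketch assumes standard constant-power (sine/cosine) stereo panning and an arbitrary, hypothetical role-to-position table `ROLE_POSITIONS`; it stays in the horizontal plane, given the poor vertical localisation noted above.

```python
import math
from typing import Dict, Tuple


def constant_power_pan(position: float) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position in [-1, 1].

    -1 is hard left, 0 is centre and +1 is hard right. Because
    left**2 + right**2 == 1, perceived loudness stays roughly constant
    as a cue moves across the stereo field.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


# Hypothetical mapping: supplementary cues for each structural role come from
# a different horizontal position; the main speech stream stays centred.
ROLE_POSITIONS: Dict[str, float] = {"numerator": -0.7, "speech": 0.0, "denominator": 0.7}

for role, pos in ROLE_POSITIONS.items():
    left, right = constant_power_pan(pos)
    print(f"{role:12s} L={left:.2f} R={right:.2f}")
```

Multiplying a cue's samples by these two gains produces the left and right channels; the speech itself is left untouched.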
3.3 Earcons, Spearcons and Auditory
Icons
One method of indicating spatial structures audibly is to associate each structural component with a sound. These sounds can take a number of forms.
3.3.1 Earcons
Earcons are abstract synthesised sounds, such as the Windows error sound, used to represent different components. Because they are synthesised, they can easily be altered, varying tone or pitch to convey related meanings [2]. While Earcons have been shown to be effective [4], users must first learn all of their meanings, so a standard set of sounds would need to be agreed.
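As an illustration of how synthesised earcons can be varied systematically, the sketch below builds a small earcon family with Python's standard `wave` module: a short three-note motif marks entering a fraction, its reverse marks leaving it, and the whole motif is transposed up two semitones per nesting level. The motifs, note length, sample rate and transposition rule are all hypothetical choices for illustration, not an established standard.

```python
import io
import math
import struct
import wave
from typing import List

SAMPLE_RATE = 22050   # samples per second (illustrative choice)
NOTE_SECONDS = 0.08   # length of each note in a motif (illustrative choice)


def tone(freq_hz: float, seconds: float, amplitude: float = 0.5) -> List[float]:
    """Sine tone with 10 ms linear fades at each end to avoid clicks."""
    n = int(SAMPLE_RATE * seconds)
    fade = max(1, int(0.01 * SAMPLE_RATE))
    return [
        amplitude
        * min(1.0, i / fade, (n - 1 - i) / fade)
        * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def earcon(motif: List[int], depth: int) -> List[float]:
    """Play a motif (semitone offsets from A4 = 440 Hz), raised 2 semitones per depth level."""
    samples: List[float] = []
    for semitones in motif:
        freq = 440.0 * 2 ** ((semitones + 2 * depth) / 12)
        samples.extend(tone(freq, NOTE_SECONDS))
    return samples


def to_wav_bytes(samples: List[float]) -> bytes:
    """Encode mono samples in [-1, 1] as 16-bit little-endian PCM WAV data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()


# Hypothetical earcon family: a rising arpeggio for "enter fraction",
# the same notes falling for "leave fraction".
ENTER_FRACTION = [0, 4, 7]
LEAVE_FRACTION = [7, 4, 0]

wav = to_wav_bytes(earcon(ENTER_FRACTION, depth=1))
print(len(wav), "bytes of WAV data")
```

Saving `wav` to a `.wav` file allows it to be played back. The `depth` parameter is what makes the family systematic: a listener who has learned one motif may be able to recognise its transposed variants without further training.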
3.3.2 Spearcons
Spearcons are a more recent development, consisting of a time-compressed spoken phrase. The compression produces a sound that lies between recognisable speech and abstract sound. Studies have shown that spearcons are easier to learn and can improve performance in interactive interface tasks [2]. This is mostly because they lie outside the standard speech channel, which allows information to be processed in parallel [5].
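Spearcons are made by speeding up a spoken phrase without raising its pitch. The sketch below shows the simplest such time-compression method, plain overlap-add (OLA) with a Hann window; practical systems generally use more refined pitch-preserving methods (for example SOLA), so treat this as an assumption-laden stand-in that will introduce some phase artefacts. The input is assumed to be a list of mono samples, for instance the TTS rendering of a word such as "fraction"; a test tone is used here instead.

```python
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window; at 50% overlap its copies sum to a constant."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def ola_compress(signal: List[float], ratio: float, frame: int = 512) -> List[float]:
    """Shorten signal by roughly `ratio` (>1 means faster) without resampling.

    Frames are read every `analysis_hop` samples but written every
    `synthesis_hop` samples, so the duration shrinks while each frame's
    waveform, and hence its pitch, is left unchanged.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    synthesis_hop = frame // 2
    analysis_hop = max(1, round(synthesis_hop * ratio))
    window = hann(frame)
    n_frames = max(1, (len(signal) - frame) // analysis_hop + 1)
    out_len = (n_frames - 1) * synthesis_hop + frame
    out = [0.0] * out_len
    weight = [0.0] * out_len
    for k in range(n_frames):
        src, dst = k * analysis_hop, k * synthesis_hop
        for i in range(frame):
            x = signal[src + i] if src + i < len(signal) else 0.0
            out[dst + i] += window[i] * x
            weight[dst + i] += window[i]
    # Normalise by the summed window so overlaps do not change the level.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, weight)]


# Example: a 1-second, 220 Hz tone at 22.05 kHz compressed about 2.5 times.
RATE = 22050
tone = [0.5 * math.sin(2 * math.pi * 220 * t / RATE) for t in range(RATE)]
short = ola_compress(tone, ratio=2.5)
print(len(tone), len(short))
```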
3.3.3 Auditory Icons
Like Earcons, Auditory Icons are non-speech sounds, but they are designed to sound like the object they represent. This makes them harder to manipulate than synthesised tones but much easier for the user to identify, decreasing the mental load [2]. However, mathematical symbols have no natural sound, so any claim about what a mathematical character should sound like is highly subjective, which makes these sounds much harder to design.
3.3.4 Hybrid Sounds
Some of the previously explained methods can
be combined into a new format that can poten-
tially allow for the strengths of one method to
overcome the drawbacks of the other, resulting
in an overall more powerful system. An exam-
ple of this is the combination of Earcons with
Auditory Icons [5].
3.3.5 Sized Hybrids
Additional indicators can be added to hybrid
sounds to give an impression of the size of
an object. This can take the form of an extra
audible note or short melody that differs in
pitch or duration [5].
3.4 The efficiency of these methods
An experiment into the ease of recognition and the learning time of each of these methods produced relatively clear results. A number of environmental features were selected, and Earcons, Spearcons, Auditory Icons, Speech, Earcon-Icon hybrids and Sized Hybrids were created for each. Users were first given a training period to learn the set and were then tested by matching each sound to its meaning on a grid of possibilities. The results showed that spearcons are as easy to learn as speech, with identical performance across the experiment; in most cases a single training cycle was enough for the user to complete the test with few or no errors. The results also showed that spearcons convey data faster than speech, making them more efficient and leaving the speech channel free, which allows two simultaneous streams of data [6]. Another study supports these conclusions, with users preferring spearcons to speech and reporting that the sounds were more recognisable and did not slow down the flow of data [7].
3.5 Prosody
Prosody is the use of varied patterns in vo-
cal pitch and intonation to provide additional
meaning to a word or phrase. It can also refer
to changes in parameters such as the length of
pause between words and the overall speaking
rate [8].
Existing systems have used prosodic schemes to convey the spatial meaning of an equation with promising results, and prosody is considered a highly suitable solution because it is intuitive and imposes a lower cognitive workload than lexical cues [9]. A drawback, however, is that prosody alone does not suffice for highly complex equations with nested structures.
Prosodic control of this kind can be expressed in a markup language, the Speech Synthesis Markup Language (SSML), which is discussed later.
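A prosodic reading can be sketched as a list of spoken words annotated with pitch and pause values instead of extra cue words. The rules below (numerator raised two semitones, denominator lowered two, exponent raised four, longer pauses at the end of larger structures) are hypothetical values chosen for illustration; they are not the scheme of [9] or of any particular screen reader, and a real system would pass such values on to a speech synthesiser.

```python
from dataclasses import dataclass
from typing import List, Union

Node = Union[str, tuple]  # str symbol, or ("row"|"frac"|"sqrt"|"sup", ...)
WORDS = {"=": "equals", "-": "minus", "pm": "plus or minus"}


@dataclass
class Event:
    word: str
    pitch: int            # semitones relative to the speaker's baseline
    pause_after: int = 0  # milliseconds of silence after the word


def _pause(out: List[Event], ms: int) -> None:
    """Lengthen the pause after the most recent word (never shorten it)."""
    if out:
        out[-1].pause_after = max(out[-1].pause_after, ms)


def _render(node: Node, pitch: int, out: List[Event]) -> None:
    if isinstance(node, str):
        out.append(Event(WORDS.get(node, node), pitch))
        return
    kind = node[0]
    if kind == "row":
        for child in node[1]:
            _render(child, pitch, out)
    elif kind == "frac":
        _pause(out, 150)                  # brief pause before the fraction
        _render(node[1], pitch + 2, out)  # numerator spoken higher
        out.append(Event("over", pitch))
        _pause(out, 150)
        _render(node[2], pitch - 2, out)  # denominator spoken lower
        _pause(out, 300)                  # longer pause closes the fraction
    elif kind == "sqrt":
        out.append(Event("root of", pitch))
        _render(node[1], pitch, out)
        _pause(out, 200)
    elif kind == "sup":
        _render(node[1], pitch, out)
        _render(node[2], pitch + 4, out)  # exponent raised, echoing its position
        _pause(out, 100)
    else:
        raise ValueError(f"unknown node type: {kind}")


def prosody(node: Node) -> List[Event]:
    out: List[Event] = []
    _render(node, 0, out)
    return out


QUADRATIC = (
    "row",
    [
        "x",
        "=",
        (
            "frac",
            ("row", ["-", "b", "pm",
                     ("sqrt", ("row", [("sup", "b", ("row", ["2"])), "-", "4", "a", "c"]))]),
            ("row", ["2", "a"]),
        ),
    ],
)

for e in prosody(QUADRATIC):
    print(f"{e.word:15s} pitch={e.pitch:+d} pause={e.pause_after}ms")
```

Apart from the symbols themselves, only "root of" and "over" are spoken; the rest of the structure is carried by pitch and timing.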
4 DIGITAL FORMATS
4.1 MathML
MathML has become the W3C standard for online mathematical markup. It comes in two major forms: Presentation MathML and Content MathML. Presentation MathML is predominantly concerned with correct visual formatting, whilst Content MathML encodes the semantic meaning of the mathematics [10].
Listing 2: MathML code sample of πr²
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>π</mi>
    <msup>
      <mi>r</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</math>
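The tree structure of Presentation MathML is what makes it readable by software. As a hedged sketch, the function below walks a MathML fragment with Python's `xml.etree.ElementTree` and produces a spoken string; it assumes the namespaced, `π`-based form of the πr² example and handles only a handful of elements (`mi`, `mn`, `mo`, `mrow`, `msup`, `mfrac`, `msqrt`), with illustrative phrasing rules.

```python
import xml.etree.ElementTree as ET

MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>π</mi>
    <msup>
      <mi>r</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</math>"""

# Spoken names for a few symbols; anything else is read as written.
NAMES = {"π": "pi", "=": "equals", "−": "minus", "-": "minus", "±": "plus or minus"}


def local_name(tag: str) -> str:
    """Drop an XML namespace: '{uri}msup' becomes 'msup'."""
    return tag.rsplit("}", 1)[-1]


def speak(el: ET.Element) -> str:
    """Turn a Presentation MathML element into a spoken English string."""
    tag = local_name(el.tag)
    kids = list(el)
    if tag in ("mi", "mn", "mo"):
        text = (el.text or "").strip()
        return NAMES.get(text, text)
    if tag == "msup":
        base, exponent = speak(kids[0]), speak(kids[1])
        if exponent == "2":
            return f"{base} squared"
        if exponent == "3":
            return f"{base} cubed"
        return f"{base} to the power {exponent}"
    if tag == "mfrac":
        return f"{speak(kids[0])} over {speak(kids[1])}"
    if tag == "msqrt":
        return "the square root of " + " ".join(speak(k) for k in kids)
    # <math>, <mrow> and anything unrecognised: read the children in order.
    return " ".join(speak(k) for k in kids)


print(speak(ET.fromstring(MATHML)))
```

Note that `mfrac` is read simply as "... over ..." without marking where the numerator and denominator end; this is exactly the loss of scope described earlier, and where lexical cues or prosody would have to be added.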
A good example of where this distinction matters is the three superscript terms A², f⁻¹ and Aᵀ. Although they represent an exponent, a function inverse and a matrix transposition respectively, Presentation MathML formats them in exactly the same way, making it impossible to tell one meaning from another directly from the markup [8]. Content MathML uses more specific mathematical markup, so the meaning of the mathematics is preserved and can be distinguished during digital translation. However, two issues currently make Presentation MathML the more popular format. First, Content MathML can only represent fairly basic mathematics; second, the majority of today's mathematical web publications are written using Presentation MathML, so complying with the more widely recognised standard allows for more universal usability [11].
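The ambiguity can be seen directly in the markup. The sketch below, which assumes simplified fragments without the MathML namespace declaration, parses Presentation and Content MathML for A², f⁻¹ and Aᵀ: every Presentation fragment has the same `msup` root, while the first child of each Content `apply` element names the operation (`power`, `inverse` or `transpose`). A reader could still guess the meaning heuristically from the superscript's content (for example a "T"), but the Presentation markup itself does not record it.

```python
import xml.etree.ElementTree as ET

# Presentation MathML: all three are an <msup> with a base and a superscript.
PRESENTATION = {
    "power": "<msup><mi>A</mi><mn>2</mn></msup>",
    "inverse": "<msup><mi>f</mi><mrow><mo>−</mo><mn>1</mn></mrow></msup>",
    "transpose": "<msup><mi>A</mi><mi>T</mi></msup>",
}

# Content MathML: the first child of <apply> names the operation.
CONTENT = {
    "power": "<apply><power/><ci>A</ci><cn>2</cn></apply>",
    "inverse": "<apply><inverse/><ci>f</ci></apply>",
    "transpose": "<apply><transpose/><ci>A</ci></apply>",
}


def root_tag(fragment: str) -> str:
    """The outermost element of a fragment, which is identical for all three."""
    return ET.fromstring(fragment).tag


def operator_tag(fragment: str) -> str:
    """The operator element inside a Content MathML <apply>."""
    return ET.fromstring(fragment)[0].tag


for meaning in PRESENTATION:
    print(f"{meaning:10s} presentation=<{root_tag(PRESENTATION[meaning])}> "
          f"content=<{operator_tag(CONTENT[meaning])}>")
```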
While the need for MathML has long been recognised, it is still not natively supported in all web browsers [12]. It has, however, recently been implemented in the WebKit browser engine and is included as part of the HTML5 specification. For browsers that do not yet support it natively, MathML can still be displayed using a JavaScript library called MathJax.
4.2 MathJax
MathJax is a JavaScript library that works across browsers, automatically converting MathML into standard HTML markup. This removes browser compatibility issues, because the browser no longer needs to understand MathML itself. However, the resulting markup is much less comprehensible as a mathematical equation, as can be seen in Fig. 1. MathJax is also usable on mobile devices, to the extent that it is now built into many e-readers so that they can correctly display MathML embedded within e-publications [14].
4.3 Speech Synthesis Mark-up Language
Another format used for text-to-speech (TTS) rendering of complex visual structures is the Speech Synthesis Mark-up Language (SSML). SSML is based on the Java Speech Mark-up Language (JSML) from Sun Microsystems and is designed to make it easier to embed readable prosody schemes directly into marked-up text.
Listing 3: SSML code sample of πr²
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  pie
  <break strength="medium"/>
  <prosody rate="fast">
    r
    <break strength="x-weak"/>
    <prosody pitch="high">
      squared
    </prosody>
  </prosody>
</speak>
SSML allows prosodic properties to be embedded directly in the document; these are then read by an SSML interpreter, which renders them into a waveform suitable for a screen reader [15]. While this is a viable system for general use and for reading web pages, its documentation states that it was not developed for mathematical formulae and lists this as a future extension. However, [15] was written in 1997 and SSML has not progressed a great deal in this respect since, so it is unlikely to become a viable alternative web standard for mathematics soon.

Fig. 1: MathJax Markup for displaying the quadratic formula [13]
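SSML can also be generated programmatically. The sketch below builds a simplified, flattened variant of the πr² example with `xml.etree.ElementTree`; the `Segment` format and the `to_ssml` function are hypothetical, and the namespace and `xml:lang` attributes are written as literal attribute names (a common ElementTree workaround) so that the output keeps the SSML default namespace rather than an auto-generated `ns0:` prefix.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# (text, pitch, break_after): pitch is an SSML pitch value such as "high"
# (or None for the default voice); break_after is a break strength or None.
Segment = Tuple[str, Optional[str], Optional[str]]


def _append_text(parent: ET.Element, text: str) -> None:
    """Add text after the last child (its tail), or inside parent if it has none."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def to_ssml(segments: List[Segment], lang: str = "en-US") -> str:
    # Literal "xmlns"/"xml:lang" keys are written verbatim by ElementTree.
    root = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    for text, pitch, brk in segments:
        if pitch is None:
            _append_text(root, f" {text} ")
        else:
            ET.SubElement(root, "prosody", {"pitch": pitch}).text = text
        if brk is not None:
            ET.SubElement(root, "break", {"strength": brk})
    return ET.tostring(root, encoding="unicode")


ssml = to_ssml([("pie", None, "medium"), ("r", None, "x-weak"), ("squared", "high", None)])
print(ssml)
```

Nested prosody, as in Listing 3, would simply mean creating `prosody` elements inside one another with `ET.SubElement`.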
5 CHROMEVOX
This section will briefly cover the functionality
of the Chrome browser plugin, ChromeVox,
and then analyse its performance relative to the
initial problem areas mentioned in this report.
Google's ChromeVox is a prime candidate for future innovation in the area of web accessibility. It provides an all-round platform for web access, covering general text-to-speech (TTS), various earcons for HyperText Markup Language (HTML) elements, prosody and granular interaction.
ChromeVox works by providing a persistent
background service which then dynamically
injects element properties into the document
object model of the page. These properties
represent the behaviours related to the audio
and visual feedback appropriate for that spe-
cific screen element, including such features as
prosody cues, alternative tags, element order-
ing and highlighting rules. Additional elements
are also inserted over the top of the current
page to represent extra User Interface (UI) com-
ponents [16].
ChromeVox uses a combination of regular
TTS and Earcons in its navigation process.
Upon switching focus to a new object or sec-
tion, an earcon will sound, accompanied by the
spoken name of that section. Different earcons
are used to represent different situations, such
as entering or leaving a list, selecting a link, etc.
They are also used to represent different events
that occur within the browser, e.g. when the
page is fully loaded and ready for interaction
or when a dialogue box appears.
One of the distinguishing features of ChromeVox is its varying levels of navigational granularity. Because of the intricacy of modern web pages, screen readers can be forced to go through a large number of elements before the user reaches the desired position, which also requires the user to sift through an unnecessary amount of audible data, increasing their workload. By providing multiple granularities, ChromeVox lets the user choose how deeply the page is read at any one time. This allows navigation more akin to visual navigation, where a user scans larger groups of objects to find the section they want and then focuses on the detailed elements within it.
Listing 4: HTML Object level example
This is <span>a</span> Test
Specifically, ChromeVox provides five levels of granularity: Group, Object, Sentence, Word and Character. While the last three are self-explanatory, the other two are less obvious. The Group level represents a section of content whose elements have been judged heuristically to be related; this could be an HTML paragraph tag or a division tag sectioning part of the document. The Object level considers each HTML-tagged element as a separate item, so the example shown in Listing 4 would be divided into three distinct objects and navigated through individually [17].
Fig. 2: Group level selection
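A rough approximation of the Object level can be written with Python's standard `html.parser`. It rests on a simplifying assumption, namely that each run of text delimited by tags is one object, whereas ChromeVox's real grouping heuristics [17] are more involved and are not reproduced here; `object_level` and `word_level` are hypothetical helper names.

```python
from html.parser import HTMLParser
from typing import List


class ObjectSplitter(HTMLParser):
    """Collect every non-blank run of text between tags as one navigation object."""

    def __init__(self) -> None:
        super().__init__()
        self.objects: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.objects.append(text)


def object_level(html: str) -> List[str]:
    parser = ObjectSplitter()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end
    return parser.objects


def word_level(html: str) -> List[str]:
    return [word for obj in object_level(html) for word in obj.split()]


LISTING_4 = "This is <span>a</span> Test"
print(object_level(LISTING_4))
print(word_level(LISTING_4))
```

Moving from `object_level` to `word_level` mirrors stepping down one granularity level: the same content is simply cut into smaller units.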
Another key feature, related to the problems identified at the start of this report, is walking through equations. These levels of granularity make it possible to walk through equations written in MathML mark-up; however, these walkthroughs have varying degrees of success and remain an issue.
Fig. 3: Object level selection, stepping into the
equation
Fig. 4: Erroneous highlighting of equation
Figures 3 and 4 show errors in the high-
lighting that occur while stepping through a
MathML example of the quadratic formula.
The elements of the equation appear almost
randomly grouped with the −b± existing as
one entity, the second b being separate from its
exponent and the line representing the frame
of the square root’s contents being selected
multiple times after having tabbed through the
content itself.
When selected at the Group level, the equation is read out almost flawlessly. Prosody indicates the separation of levels in the equation, through pitch changes in the voice when transitioning to a numerator or denominator and pauses that show how the different parts belong together. In this case, the only error was in the pronunciation of the variable a. However, when walking through the equation, the spoken text becomes

"x equal minus b plus or minus b two minus fourack overline overline overline square root two a"

This is clearly far from the desired output and would leave a visually impaired user with little idea of the true equation. The walkthrough mechanism is definitely present in the application, but it needs refinement before it can provide a good experience.
The finer levels of granularity do allow the user to select each individual element of the equation, but at this depth the elements lose all their relative meaning and any prosody rules that were previously applied. Exponents become standard numbers and lose the pitch associated with their vertical position, and their location within a fraction is no longer conveyed. While it may be audibly clearer to hear everything uniformly when stepping through slowly, the structural information is still highly relevant to a visually impaired user, so all modifiers should be retained, even at the finest level of granularity.
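One way to keep the structural modifiers at the finest granularity is to carry each symbol's structural context along with it during the walkthrough. In the sketch below, the tuple tree, the `walk` generator and the context phrases ("numerator", "exponent", "in square root" and so on) are assumptions for illustration; a screen reader could equally convey the same context through pitch or earcons instead of words.

```python
from typing import Iterator, Tuple, Union

Node = Union[str, tuple]  # str symbol, or ("row"|"frac"|"sqrt"|"sup", ...)
Context = Tuple[str, ...]
WORDS = {"=": "equals", "-": "minus", "pm": "plus or minus"}


def walk(node: Node, context: Context = ()) -> Iterator[Tuple[str, Context]]:
    """Yield (spoken symbol, enclosing structures) for each leaf, left to right."""
    if isinstance(node, str):
        yield WORDS.get(node, node), context
        return
    kind = node[0]
    if kind == "row":
        for child in node[1]:
            yield from walk(child, context)
    elif kind == "frac":
        yield from walk(node[1], context + ("numerator",))
        yield from walk(node[2], context + ("denominator",))
    elif kind == "sqrt":
        yield from walk(node[1], context + ("square root",))
    elif kind == "sup":
        yield from walk(node[1], context)
        yield from walk(node[2], context + ("exponent",))
    else:
        raise ValueError(f"unknown node type: {kind}")


def describe(symbol: str, context: Context) -> str:
    """Innermost structure first, e.g. '2, exponent, in square root, in numerator'."""
    if not context:
        return symbol
    outer = [f"in {c}" for c in reversed(context[:-1])]
    return ", ".join([symbol, context[-1], *outer])


QUADRATIC = (
    "row",
    [
        "x",
        "=",
        (
            "frac",
            ("row", ["-", "b", "pm",
                     ("sqrt", ("row", [("sup", "b", ("row", ["2"])), "-", "4", "a", "c"]))]),
            ("row", ["2", "a"]),
        ),
    ],
)

for symbol, context in walk(QUADRATIC):
    print(describe(symbol, context))
```

Each step still announces a single symbol, as at the Character level, but the exponent in the square root is no longer indistinguishable from an ordinary 2.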
While ChromeVox functions well in many of its key areas, granular stepping through mathematical equations still falls short of the requirements of visually impaired users.
6 CENTRAL ACCESS READER (CAR)
Another recent addition to the market is Central Access Reader, a free, open-source application from Central Washington University. It reads text and mathematical formulae from Word documents, or from text copied directly into its interface. It can highlight the text being read at two levels, marking both the current passage and the individual word; however, it cannot highlight the individual components of equations with more than a single level of vertical structure. It provides multiple settings for reading mathematics, allowing different types of equations to be read in different ways.
When tested, it read the quadratic formula correctly from a Word document, giving the output:

"x equals begin fraction negative b plus or minus the square root of b squared minus four a c over two a end fraction"

However, when the equation was copied and pasted in directly from Microsoft Word, from a PDF or as MathML, it did not work correctly, respectively failing to display the copied text, displaying incorrect symbols, or inserting bars as square root markers.
While the system is proficient at reading text, and clearly capable of reading mathematics from a Word document, Word is not a common format in which to receive formulae, so an additional conversion step would be needed before the content could be used. This makes the system inefficient and far from ideal for users who already face significant barriers.
7 SYSTEM COMPARISON
Having tested both systems with mathematical formulae in various forms, each has shown its own strengths. Central Access Reader opts for the more verbose approach of narratively describing the structure with begin and end statements. This increases the time taken to convey the information and makes the user work harder, but it is a more intuitive approach that requires no additional learning of prosodic conventions. ChromeVox instead uses prosodic components, conveying extra information through pitch changes and pauses without taking extra time. It also uses earcons and spearcons to represent structural data in a faster, more recognisable way that does not block the speech channel.
While both systems have standard text highlighting, ChromeVox's adjustable granularity lets users choose a shallower or deeper reading of the content as they require. This potentially allows precise stepping through mathematical equations, although, as shown above, this is not yet the case.
A feature of CAR not present in ChromeVox is the ability to read files. Conversely, ChromeVox can read web pages directly, while CAR requires the content to be copied into its interface.
Finally, CAR can output its content as an MP3 file, allowing the user to listen to it later on another device without the software installed. This offers an additional level of usability that ChromeVox cannot match, although its only direct benefit is being able to listen to documents when no computer is available.
Feature                ChromeVox   CAR
Prosody                Yes         No
Syntax Highlighting    Yes         Yes
Formula Highlighting   Partial     Partial
File Reading           No          Yes
Browser Reading        Yes         No
MP3 Output             No          Yes
8 REMAINING ISSUES
8.1 Syntax highlighting in equations
While this issue has been partially addressed by Google's ChromeVox, the demonstration above makes clear that the system is not ideal: stepping between different levels of an equation's structure can still be a complex and confusing task for users.
8.2 How should numbers be spoken?
While the question of how to say numbers may seem trivial compared with conveying memorable structure, the conventions used must also be considered and applied consistently [18]. Although this issue is not examined in this report, it will ultimately affect the development of a standardised mathematics reading system and is another case where agreement on a specification is necessary.
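As a small illustration of why such conventions matter, the sketch below reads the same digits in two ways: as a whole number and digit by digit. The function names are hypothetical, the range is limited to numbers below one million, and the whole-number style follows the US convention without "and"; British usage would say "one thousand two hundred and thirty-four", which is exactly the kind of choice a standard would need to fix.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n: int) -> str:
    """Words for 0 <= n < 1000, e.g. 234 -> 'two hundred thirty-four'."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    elif rest or not parts:
        parts.append(ONES[rest])
    return " ".join(parts)


def as_number(n: int) -> str:
    """Read a non-negative integer below one million as a whole number."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0 <= n < 1,000,000")
    thousands, rest = divmod(n, 1000)
    if not thousands:
        return below_thousand(rest)
    words = f"{below_thousand(thousands)} thousand"
    return words + (f" {below_thousand(rest)}" if rest else "")


def as_digits(text: str) -> str:
    """Read each digit separately, as is common for long or code-like numbers."""
    return " ".join(ONES[int(d)] for d in text)


print(as_number(1234))
print(as_digits("1234"))
```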
8.3 Limitations of MathML
While MathML has become the standard format for displaying mathematics online, it still cannot convey all possible equations [19]. There is also a contradiction in that Presentation MathML is the most commonly used mark-up while also being the least accessible form. This will most likely remain the case until Content MathML can define a greater range of formulae.
9 CONCLUSION
In this report, the techniques with the greatest beneficial impact have been discussed, and a number of the most prominent systems have been compared to show where existing systems still fall short. Given current trends in web technology and the strength of cloud computing services, a web-based system has merit as the strongest candidate for people with disabilities. However, these people must live with their difficulties everywhere they go, not only where there is an active internet connection, which makes a stand-alone application preferable in terms of usability.
In terms of performance, based on the issues explained above, the ChromeVox extension for Google's Chrome browser has shown itself to be more proficient at these tasks and, overall, more convenient to use. However, it is not a perfect system: stepping through equations still needs work before it can be relied upon by users who depend on its accuracy.
REFERENCES
[1] J. Fuentes Sepúlveda and L. Ferres, “Improving accessibility to mathematical formulas: the Wikipedia math accessor,” New Review of Hypermedia and Multimedia, 2012.
[2] E. Bates and D. Fitzpatrick, “Spoken mathematics using
prosody, earcons and spearcons,” tech. rep., Dublin City
University, 2010.
[3] A. Hollander and T. Furness, “Perception of virtual auditory shapes,” in Proceedings of the International Conference on Auditory Display, 1994.
[4] R. D. Stevens and A. D. N. Edwards, “An approach to the evaluation of assistive technology,” tech. rep., University of York, 1996.
[5] Learnability of Sound Cues for Environmental Features: Auditory Icons, Earcons, Spearcons, and Speech, 2008.
[6] E. Gellenbeck and A. Stefik, “Evaluating prosodic cues as a means to disambiguate algebraic expressions: An empirical study,” tech. rep., Central Washington University, 2009.
[7] E. Murphy, E. Bates, and D. Fitzpatrick, “Designing auditory cues to enhance spoken mathematics for visually impaired users,” tech. rep., Dublin City University, 2010.
[8] E. G. et al., “Speaking MathML: Using prosody and context-sensitive inferences to produce synthesized speech,” tech. rep., 2005.
[9] R. D. Stevens, Principles for the Design of Auditory Interfaces to Present Complex Information to Blind Computer Users. PhD thesis, University of York, 1996.
[10] R. A. et al., “Mathematical markup language (MathML) version 3.0,” tech. rep., W3C, 2010.
[11] H. Ferreira and D. Freitas, “AudioMath using MathML for speaking mathematics,” tech. rep., University of Porto, 2005.
[12] Wikipedia, “Mathematical markup language (MathML).” http://en.wikipedia.org/wiki/MathML.
[13] V. Sorge, “Accessibility to scientific material: The case of speaking math,” tech. rep., The University of Birmingham.
[14] D. Cervone, “MathJax: A platform for mathematics on the web,” Notices of the AMS, vol. 59, no. 2, pp. 312–316, 2012.
[15] P. Taylor and A. Isard, “SSML: A speech synthesis markup language,” Speech Communication, vol. 21, no. 1–2, pp. 123–133, 1997.
[16] Google, “ChromeVox source code,” 2012. https://code.google.com/p/google-axs-chrome/.
[17] T. V. R. et al., “ChromeVox: a screen reader built using web technology,” tech. rep., Google Inc., 2012.
[18] R. Fateman, “How can we speak math?,” tech. rep., University of California, 2013.
[19] K. Kofler, P. Schodl, and A. Neumaier, “Limitations in content MathML,” tech. rep., University of Vienna, 2009.

 
Opportunistic Routing in Delay Tolerant Network with Different Routing Algorithm
Opportunistic Routing in Delay Tolerant Network with Different Routing AlgorithmOpportunistic Routing in Delay Tolerant Network with Different Routing Algorithm
Opportunistic Routing in Delay Tolerant Network with Different Routing Algorithm
 
semantic text doc clustering
semantic text doc clusteringsemantic text doc clustering
semantic text doc clustering
 
52 57
52 5752 57
52 57
 
COMMUNITY DETECTION USING INTER CONTACT TIME AND SOCIAL CHARACTERISTICS BASED...
COMMUNITY DETECTION USING INTER CONTACT TIME AND SOCIAL CHARACTERISTICS BASED...COMMUNITY DETECTION USING INTER CONTACT TIME AND SOCIAL CHARACTERISTICS BASED...
COMMUNITY DETECTION USING INTER CONTACT TIME AND SOCIAL CHARACTERISTICS BASED...
 

Viewers also liked

Plumber scottsdale
Plumber scottsdalePlumber scottsdale
Plumber scottsdalearlobrown
 
COMM 125 Portfolio
COMM 125 PortfolioCOMM 125 Portfolio
COMM 125 PortfolioCole Hibbard
 
Вебинар на тему знакомство с Ansible. популярные практики и ошибки
Вебинар на тему  знакомство с Ansible. популярные практики и ошибкиВебинар на тему  знакомство с Ansible. популярные практики и ошибки
Вебинар на тему знакомство с Ansible. популярные практики и ошибкиPaul Yehorov
 
Priyanka nandwani participate kaosal vikash mela as a training partner
Priyanka nandwani participate kaosal vikash mela as a training partnerPriyanka nandwani participate kaosal vikash mela as a training partner
Priyanka nandwani participate kaosal vikash mela as a training partnerPriyanka Nandwani
 
Carlos (David) Besinaiz Safety Certs page 1
Carlos (David) Besinaiz Safety Certs page 1Carlos (David) Besinaiz Safety Certs page 1
Carlos (David) Besinaiz Safety Certs page 1David Besinaiz
 
Feedback on the COSO Enterprise Risk Management 20160929 Final
Feedback on the COSO Enterprise Risk Management 20160929 FinalFeedback on the COSO Enterprise Risk Management 20160929 Final
Feedback on the COSO Enterprise Risk Management 20160929 FinalDarius Mayhew MCMI, SIRM, CSM
 
Vieux Carre'- A brand of absinthe
Vieux Carre'- A brand of absinthe Vieux Carre'- A brand of absinthe
Vieux Carre'- A brand of absinthe Narvis Kennel
 
Gato Persa: tudo que você precisa saber antes de comprar um
Gato Persa: tudo que você precisa saber antes de comprar umGato Persa: tudo que você precisa saber antes de comprar um
Gato Persa: tudo que você precisa saber antes de comprar umVinicius Nogueira
 
statistics first paper
statistics first paperstatistics first paper
statistics first paperJackie Nelson
 
MediaRadar_WhitePaper_DigitalPolitical_FIN.PDF
MediaRadar_WhitePaper_DigitalPolitical_FIN.PDFMediaRadar_WhitePaper_DigitalPolitical_FIN.PDF
MediaRadar_WhitePaper_DigitalPolitical_FIN.PDFJesse Sherb
 
байтакова адия будущее-конкуренты
байтакова адия будущее-конкурентыбайтакова адия будущее-конкуренты
байтакова адия будущее-конкурентыAdiya Baitakova
 

Viewers also liked (17)

Memoria ram infografia 5
Memoria ram infografia 5Memoria ram infografia 5
Memoria ram infografia 5
 
Plumber scottsdale
Plumber scottsdalePlumber scottsdale
Plumber scottsdale
 
Pravees kumar
Pravees kumar Pravees kumar
Pravees kumar
 
COMM 125 Portfolio
COMM 125 PortfolioCOMM 125 Portfolio
COMM 125 Portfolio
 
Вебинар на тему знакомство с Ansible. популярные практики и ошибки
Вебинар на тему  знакомство с Ansible. популярные практики и ошибкиВебинар на тему  знакомство с Ansible. популярные практики и ошибки
Вебинар на тему знакомство с Ansible. популярные практики и ошибки
 
Priyanka nandwani participate kaosal vikash mela as a training partner
Priyanka nandwani participate kaosal vikash mela as a training partnerPriyanka nandwani participate kaosal vikash mela as a training partner
Priyanka nandwani participate kaosal vikash mela as a training partner
 
Carlos (David) Besinaiz Safety Certs page 1
Carlos (David) Besinaiz Safety Certs page 1Carlos (David) Besinaiz Safety Certs page 1
Carlos (David) Besinaiz Safety Certs page 1
 
Feedback on the COSO Enterprise Risk Management 20160929 Final
Feedback on the COSO Enterprise Risk Management 20160929 FinalFeedback on the COSO Enterprise Risk Management 20160929 Final
Feedback on the COSO Enterprise Risk Management 20160929 Final
 
emotional eating
emotional eatingemotional eating
emotional eating
 
Vieux Carre'- A brand of absinthe
Vieux Carre'- A brand of absinthe Vieux Carre'- A brand of absinthe
Vieux Carre'- A brand of absinthe
 
Gato Persa: tudo que você precisa saber antes de comprar um
Gato Persa: tudo que você precisa saber antes de comprar umGato Persa: tudo que você precisa saber antes de comprar um
Gato Persa: tudo que você precisa saber antes de comprar um
 
statistics first paper
statistics first paperstatistics first paper
statistics first paper
 
MediaRadar_WhitePaper_DigitalPolitical_FIN.PDF
MediaRadar_WhitePaper_DigitalPolitical_FIN.PDFMediaRadar_WhitePaper_DigitalPolitical_FIN.PDF
MediaRadar_WhitePaper_DigitalPolitical_FIN.PDF
 
байтакова адия будущее-конкуренты
байтакова адия будущее-конкурентыбайтакова адия будущее-конкуренты
байтакова адия будущее-конкуренты
 
Practica 1
Practica 1Practica 1
Practica 1
 
Investigación excel
Investigación excelInvestigación excel
Investigación excel
 
Pseint ejemplos
Pseint ejemplosPseint ejemplos
Pseint ejemplos
 

Similar to FinalReport

Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesTELKOMNIKA JOURNAL
 
Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...
Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...
Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...Cynthia Velynne
 
EXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATIONEXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATIONgerogepatton
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
 
A Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep LearningA Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep LearningIRJET Journal
 
Mini-Project – EECE 365, Spring 2021 You are to read an
Mini-Project – EECE 365, Spring 2021   You are to read an Mini-Project – EECE 365, Spring 2021   You are to read an
Mini-Project – EECE 365, Spring 2021 You are to read an IlonaThornburg83
 
12EEE032- text 2 voice
12EEE032-  text 2 voice12EEE032-  text 2 voice
12EEE032- text 2 voiceNsaroj kumar
 
A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...ijma
 
A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...
A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...
A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...ijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEEBEBTECHSTUDENTPROJECTS
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
Concurrency Issues in Object-Oriented Modeling
Concurrency Issues in Object-Oriented ModelingConcurrency Issues in Object-Oriented Modeling
Concurrency Issues in Object-Oriented ModelingIRJET Journal
 
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...IRJET Journal
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONcscpconf
 

Similar to FinalReport (20)

Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approaches
 
Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...
Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...
Applying User-behavior to Bandwidth Adaptations in Collaborative Workspace Ap...
 
EXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATIONEXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENTNEURAL NETWORKS FOR DIALOG GENERATION
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
 
A Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep LearningA Review Paper on Speech Based Emotion Detection Using Deep Learning
A Review Paper on Speech Based Emotion Detection Using Deep Learning
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
228-SE3001_2
228-SE3001_2228-SE3001_2
228-SE3001_2
 
Mini-Project – EECE 365, Spring 2021 You are to read an
Mini-Project – EECE 365, Spring 2021   You are to read an Mini-Project – EECE 365, Spring 2021   You are to read an
Mini-Project – EECE 365, Spring 2021 You are to read an
 
12EEE032- text 2 voice
12EEE032-  text 2 voice12EEE032-  text 2 voice
12EEE032- text 2 voice
 
A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...A robust audio watermarking in cepstrum domain composed of sample's relation ...
A robust audio watermarking in cepstrum domain composed of sample's relation ...
 
A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...
A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...
A Robust Audio Watermarking in Cepstrum Domain Composed of Sample's Relation ...
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learningIEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
IEEE 2014 MATLAB IMAGE PROCESSING PROJECTS Scale adaptive dictionary learning
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
Concurrency Issues in Object-Oriented Modeling
Concurrency Issues in Object-Oriented ModelingConcurrency Issues in Object-Oriented Modeling
Concurrency Issues in Object-Oriented Modeling
 
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
 

FinalReport

Techniques of Audibly Representing Mathematical Formulae for People with Visual or Cognitive Disabilities

Matt Smith
Department of Electronics and Computer Science
University of Southampton

Abstract—The portrayal of mathematical formulae has always been well suited to a visual medium. Lines and brackets silently show the relationships between different parts of an equation and the limits of their scope. Taking this structural information away greatly increases the amount of data that must be conveyed to the user, which in turn greatly increases the mental processing required. All visually impaired users face this issue, and much research has already been done into the best ways of audibly representing this information. This report analyses the strengths and weaknesses of these strategies and examines some existing software solutions.

Index Terms—Computer Science, Assistive Technology, Audition, Mathematical Formulae, Visual Disabilities, Earcons, Spearcons, Prosody, ChromeVox.

1 INTRODUCTION
MATHEMATICAL formulae can be portrayed very succinctly in printed form, which gives a non-temporal and unambiguous visual representation. The spatial structure is conveyed by the physical locations of the symbols relative to their surroundings, with the medium acting as an extension of the reader's memory. Users with visual impairments cannot use this extension. Instead, the spatial information must be transmitted to them audibly, and this vastly increases their cognitive workload. To reduce this effect, researchers have proposed a number of ways of converting these symbols into sounds that represent them. This report analyses the strengths and weaknesses of these approaches and also looks at future possibilities.
The report is structured into the following sections:
- An introduction to the problem;
- Methods of audibly representing information;
- Digital formats for representing mathematical data;
- A description of the browser-based tool 'ChromeVox', with an evaluation of its performance;
- A description of the desktop-based tool 'Central Access Reader'; and finally
- A summary comparison of these systems, along with conclusions and remaining issues.

2 PROBLEMS
2.1 Visual Structure

$$x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}$$

For a reader without visual or cognitive impairments, the formula above presents no great difficulty, provided they have some basic knowledge of mathematical symbols. The visual layout tells the reader instinctively that the square root is confined to the numerator and covers only b² − 4ac. It also shows that the denominator divides the whole numerator and not just the contents of the square root, and that the initial negation applies only to the first variable b rather than to the entire expression. Because of the recognised visual conventions of mathematics, the displayed information is unambiguous. Once this visual layer is lost, however, describing the equation becomes much more difficult. The phrase "a picture is worth a thousand words" is very relevant here: turning this visual data into audible information has proved to be a highly complex process for both the developer and the user.

2.2 Digital Representation
It is human nature to design systems that the designer believes are easy to use, without necessarily taking into account the disabilities of others who may use them. This leads to formats and specifications that limit usability. A good example is Wikipedia, which displays all mathematical equations as rasterised images and stores the LaTeX markup in each image's alt attribute [1].

Listing 1: LaTeX representation of the quadratic formula
x = \frac{-b \pm \sqrt{b^{2}-4ac}}{2a}

The information is still present in some form, but this markup is not user-friendly, because it requires listeners to understand a language built for computers. Digital formats are needed that retain the meaning of the content while remaining understandable to a user without additional learning.

2.3 Visual Walkthroughs
The final problem discussed in this report is that of stepping through an equation. Mathematical formulae can be nested to arbitrary depth, and each level adds more information. It can therefore be very helpful to break an equation down into its component parts and visually identify which part is being spoken at any given time. In many cases this is not possible because of the digital representation issues described above, which further increases the need for a more standardised format.

3 AUDIBLE FORMATS
There are a number of ways of presenting information audibly, each with its own benefits. This section briefly explains the different methods and identifies the situations in which each is best used.
3.1 Lexical Cues
Lexical cues are spoken terms that audibly mark the start and end of spatial structures. For example, "bfrac" represents the beginning of a fraction block and "efrac" represents the exit from that block. This removes all ambiguity from the audio stream. However, it severely increases the cognitive load on the user and quickly becomes confusing for more complex equations [2]. There is also a phenomenon known as the Suffix Effect: an additional utterance at the end of a sequence, such as a closing cue, reduces a listener's ability to remember what they have just heard [2].

3.2 Dynamic Sonic Trajectories
Dynamic Sonic Trajectories (DST) use directional sound to create sonic shapes that can be processed cognitively rather than perceptually. This potentially allows sounds to be shifted around perceptually so that the user can hear the spatial structure. However, studies have shown that this additional audio information has detrimental effects on the cognitive processing of the mathematics, because identifying the sonic trajectories increases the mental workload [3]. It has also been shown that auditory localisation is particularly inaccurate in the vertical dimension and is highly difficult to synthesise [2].

These findings show that a linear spatial mapping from visual to audio is not hugely useful. It remains true, however, that sounds from different locations are easy to distinguish from one another. They can therefore provide additional audio cues that complement the main information stream [3].

3.3 Earcons, Spearcons and Auditory Icons
One method of indicating spatial structures through audition is to cognitively link each structural component to a sound. These sounds can take a number of formats.
3.3.1 Earcons
Earcons use synthesised sounds, such as the Windows error sound, to represent different components. Because these sounds are synthesised, they can easily be altered and varied to convey related meanings through tone or pitch [2]. Earcons have been shown to be effective [4], but they require the user to learn all of their meanings first, so a standard must be built to unify these sounds.

3.3.2 Spearcons
Spearcons are a more recent development and consist of a time-compressed spoken phrase. The compression produces a sound that lies between recognisable speech and abstract sound. Studies have shown that spearcons are easier to learn and can improve performance in interactive interface tasks [2]. This is mostly because they lie outside the standard speech channel, which allows parallel information processing [5].

3.3.3 Auditory Icons
Like earcons, auditory icons are non-speech sounds, but they are made with the intention of sounding like the object they represent. This makes the sounds harder to manipulate but much easier for the user to identify, which decreases the mental load [2]. However, what a mathematical character would sound like is highly subjective, and this makes suitable sounds much more difficult to create.

3.3.4 Hybrid Sounds
Some of the methods above can be combined into a new format in which the strengths of one method potentially overcome the drawbacks of another. The result is an overall more powerful system. An example is the combination of earcons with auditory icons [5].

3.3.5 Sized Hybrids
Additional indicators can be added to hybrid sounds to give an impression of the size of an object. This can take the form of an extra audible note or short melody that differs in pitch or duration [5].
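The following Python sketch makes the earcon and sized-hybrid ideas concrete by synthesising short tone motifs using only the standard library. The motifs, frequencies and durations are hypothetical choices made for illustration and are not taken from [5]. The rule that the extra note lengthens with the number of terms in a structure is also an assumption. Because auditory icons need recorded sounds, the "hybrid" here is approximated by a plain earcon plus a size note.

```python
import math
import struct
import wave

RATE = 8000  # samples per second; a low rate keeps the sketch small

def tone(freq_hz, dur_s, amp=0.5):
    """Return 16-bit sample values for a sine tone of the given length."""
    n = round(RATE * dur_s)
    return [int(amp * 32767 * math.sin(2 * math.pi * freq_hz * i / RATE))
            for i in range(n)]

def earcon(motif):
    """Concatenate a motif of (frequency_hz, duration_s) notes into samples."""
    samples = []
    for freq, dur in motif:
        samples += tone(freq, dur)
    return samples

# Hypothetical motifs: a rising pair for "enter fraction", a falling pair for "exit".
ENTER_FRACTION = [(440, 0.08), (660, 0.08)]
EXIT_FRACTION = [(660, 0.08), (440, 0.08)]

def sized(motif, size, size_note=(330, 0.04)):
    """Sized variant (assumed rule): append one extra note whose duration
    scales with `size`, e.g. the number of terms inside the structure."""
    freq, unit = size_note
    return motif + [(freq, unit * size)]

def write_wav(path, samples):
    """Write mono 16-bit PCM so the earcon can be auditioned."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

For example, `write_wav("enter_large.wav", earcon(sized(ENTER_FRACTION, 4)))` produces the same rising motif as a small fraction, followed by a size note four times as long. Keeping the motif fixed and varying only the trailing note preserves the learned meaning while layering the size information on top.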
3.4 The efficiency of these methods
An experiment into the ease of recognition and the learning times of each of these methods produced relatively strong results. The researchers selected a number of environmental features and created earcons, spearcons, auditory icons, speech, earcon-icon hybrids and sized hybrids for them. Users were first given a training period to learn the set. They were then tested by matching each sound to one of a grid of possible meanings.

The results showed that spearcons are as easy to learn as speech, with identical performance across the experiment. In the majority of cases, a single cycle of training was enough for the user to complete a test with little to no error. The results also showed that spearcons convey data faster than speech. This makes them more efficient and leaves the speech channel open for a second, simultaneous stream of data [6]. Another study supports these conclusions. Users preferred spearcons to speech, reporting that the sounds were more recognisable and did not slow down the flow of data [7].

3.5 Prosody
Prosody is the use of varied patterns of vocal pitch and intonation to give additional meaning to a word or phrase. It can also refer to changes in parameters such as the length of pauses between words and the overall speaking rate [8]. Existing systems have used prosodic schemes to convey the spatial meaning of an equation, with promising results. This method is believed to be highly suitable because it is intuitively understood and imposes a lower cognitive workload than lexical cues [9]. One drawback, however, is that prosody alone will not work for highly complex equations consisting of nested structures.
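The following sketch contrasts lexical cues (Section 3.1) with prosody, using the quadratic formula from Section 2.1. The Python walks a small expression tree and renders it in two ways. The first uses spoken cue words: "bfrac"/"efrac" as in Section 3.1, plus hypothetical "bsqrt"/"esqrt" and "over". The second uses a prosodic scheme in which pitch levels and pauses carry the structure. The tree classes, pitch steps, pause lengths and the `to_ssml` output are all assumptions made for illustration, not the scheme evaluated in [9].

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

@dataclass
class Frac:
    num: list  # nodes in the numerator
    den: list  # nodes in the denominator

@dataclass
class Sqrt:
    body: list  # nodes under the root

# Leaves are already-spoken phrases; only the structure is modelled.
QUADRATIC = ["x", "equals",
             Frac(num=["negative b", "plus or minus",
                       Sqrt(body=["b squared", "minus", "4 a c"])],
                  den=["2 a"])]

def lexical(nodes):
    """Lexical cues: wrap every structure in spoken begin/end words."""
    out = []
    for n in nodes:
        if isinstance(n, Frac):
            out += ["bfrac"] + lexical(n.num) + ["over"] + lexical(n.den) + ["efrac"]
        elif isinstance(n, Sqrt):
            out += ["bsqrt"] + lexical(n.body) + ["esqrt"]
        else:
            out.append(n)
    return out

def _pause_after(segments, ms):
    """Lengthen the pause after the last spoken segment (never shorten it)."""
    text, pitch, pause = segments[-1]
    segments[-1] = (text, pitch, max(pause, ms))

def prosodic(nodes, pitch=0):
    """Return (phrase, pitch_level, pause_ms) triples with no cue words:
    numerators and root bodies go up one level, denominators down one,
    and pauses mark where each structure ends."""
    out = []
    for n in nodes:
        if isinstance(n, Frac):
            out += prosodic(n.num, pitch + 1)
            _pause_after(out, 250)  # boundary between numerator and denominator
            out += prosodic(n.den, pitch - 1)
            _pause_after(out, 250)  # end of the fraction
        elif isinstance(n, Sqrt):
            out += prosodic(n.body, pitch + 1)
            _pause_after(out, 150)  # end of the root
        else:
            out.append((n, pitch, 0))
    return out

def to_ssml(segments, semitones_per_level=2):
    """Serialise the triples as SSML 1.0 relative pitch changes and breaks."""
    parts = ['<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
             'xml:lang="en-US">']
    for text, pitch, pause in segments:
        parts.append(f'<prosody pitch="{pitch * semitones_per_level:+d}st">'
                     f'{escape(text)}</prosody>')
        if pause:
            parts.append(f'<break time="{pause}ms"/>')
    parts.append("</speak>")
    return "".join(parts)
```

The lexical rendering speaks 13 items, while the prosodic one speaks 8. This shows the extra load that cue words add. The sketch also shows the drawback noted above: each nested level adds another pitch step, and with deep nesting these steps quickly become too numerous to distinguish by ear.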
This prosodic scheme has been implemented in a markup language, the Speech Synthesis Markup Language (SSML), which is discussed later in Section 4.3.

4 DIGITAL FORMATS
4.1 MathML
MathML has become the W3C standard for online mathematical markup. It comes in two major forms: Presentation MathML and Content MathML. Presentation MathML is predominantly concerned with correct visual formatting, whilst Content MathML ensures that semantic correctness is maintained [10].

Listing 2: MathML code sample of πr²
<math>
  <mrow>
    <mi>π</mi>
    <msup>
      <mi>r</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</math>

The three superscript terms A², f⁻¹ and Aᵀ show why this distinction matters. They represent an exponent, a function inverse and a matrix transposition respectively. In Presentation MathML, however, all three are formatted in exactly the same way, so it is impossible to differentiate one type of expression from another directly [8]. Content MathML uses more specific mathematical markup, which preserves the actual meaning of the mathematics and keeps it distinguishable for digital translation.

Two issues currently make Presentation MathML the more popular format. First, Content MathML is only capable of representing basic mathematics. Second, the majority of today's mathematical web publications are written in Presentation MathML, and complying with the more widely recognised standard allows for more universal usability [11].

The need for MathML has been long-standing, yet it is still not natively supported in all web browsers [12]. It has, however, recently been built into the WebKit browser engine and is included as part of the HTML5 specification. Browsers that do not yet support it natively can still display it through a plugin called MathJax.
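The ambiguity of A², f⁻¹ and Aᵀ described above can be demonstrated directly. This Python sketch uses `xml.etree.ElementTree` to parse the Presentation and Content MathML for each expression and generates a naive spoken string from each. The speech rules are hypothetical, and the MathML namespace (`http://www.w3.org/1998/Math/MathML`) is omitted for brevity. `power`, `inverse` and `transpose` are standard Content MathML operator elements.

```python
import xml.etree.ElementTree as ET

# Presentation MathML: all three are an <msup> with a base and a script.
PRESENTATION = {
    "A squared": "<msup><mi>A</mi><mn>2</mn></msup>",
    "f inverse": "<msup><mi>f</mi><mrow><mo>-</mo><mn>1</mn></mrow></msup>",
    "A transpose": "<msup><mi>A</mi><mi>T</mi></msup>",
}

# Content MathML: the operator element names the meaning.
CONTENT = {
    "A squared": "<apply><power/><ci>A</ci><cn>2</cn></apply>",
    "f inverse": "<apply><inverse/><ci>f</ci></apply>",
    "A transpose": "<apply><transpose/><ci>A</ci></apply>",
}

def text_of(el):
    """Concatenate all text inside an element, e.g. <mrow><mo>-</mo><mn>1</mn></mrow> -> '-1'."""
    return "".join(el.itertext())

def speak_presentation(xml):
    """Only the layout is known, so a reader can say little beyond 'X superscript Y'."""
    el = ET.fromstring(xml)
    base, script = el[0], el[1]
    return f"{text_of(base)} superscript {text_of(script)}"

def speak_content(xml):
    """Hypothetical speech rules keyed on the Content MathML operator."""
    el = ET.fromstring(xml)
    op, args = el[0].tag, [text_of(a) for a in el[1:]]
    if op == "power":
        return f"{args[0]} squared" if args[1] == "2" else f"{args[0]} to the power {args[1]}"
    if op == "inverse":
        return f"{args[0]} inverse"
    if op == "transpose":
        return f"{args[0]} transpose"
    raise ValueError(f"unsupported operator: {op}")
```

From Presentation MathML, the sketch can only produce "A superscript 2", "f superscript -1" and "A superscript T", because all three roots are `msup`. The Content forms name their operator, so each one reads back as its meaning.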
4.2 MathJax
MathJax functions as a universal browser plugin that automatically converts MathML into standard HTML markup. This removes browser compatibility issues, because the browser no longer needs to understand MathML itself. However, the resulting markup is much less comprehensible as a mathematical equation, as can be seen in Fig. 1. MathJax can also be used on mobile devices, and it is now built into many e-readers so that they can correctly display any MathML embedded in e-publications [14].

Fig. 1: MathJax markup for displaying the quadratic formula [13]

4.3 Speech Synthesis Mark-up Language
Another format used for text-to-speech (TTS) of complex visual structures is the Speech Synthesis Markup Language (SSML). SSML is based on the Java Speech Markup Language (JSML) from Sun Microsystems. It is designed to make it easier to insert readable prosody schemes directly into standard HTML.

Listing 3: SSML code sample of πr²
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  pie
  <break strength="medium"/>
  <prosody rate="fast">
    r
    <break strength="x-weak"/>
    <prosody pitch="high">squared</prosody>
  </prosody>
</speak>

SSML allows prosodic properties to be embedded directly in the document. An SSML interpreter then reads these properties and combines them into a waveform suitable for a screen reader [15]. SSML is a viable system for general use and for reading web pages. However, its documentation states that it has not been developed for use with mathematical formulae and lists this as a future extension. [15] was written in 1997, and the specification has not progressed a great deal since, so SSML is unlikely to become a viable alternative as a web standard any time soon.

5 CHROMEVOX
This section briefly covers the functionality of the Chrome browser plugin ChromeVox. It then analyses ChromeVox's performance against the problem areas identified earlier in this report.

Google's ChromeVox is a prime candidate for future innovation in web accessibility. It provides an all-round platform for web access, covering general text-to-speech (TTS), various earcons for HyperText Markup Language (HTML) elements, prosody and granular interaction.

ChromeVox runs a persistent background service that dynamically injects element properties into the Document Object Model (DOM) of the page. These properties describe the audio and visual feedback appropriate for each specific screen element, including prosody cues, alternative text, element ordering and highlighting rules. Additional elements are also overlaid on the current page to provide extra User Interface (UI) components [16].

ChromeVox uses a combination of regular TTS and earcons in its navigation process. When focus switches to a new object or section, an earcon sounds, followed by the spoken name of that section. Different earcons represent different situations, such as entering or leaving a list or selecting a link. Earcons also signal events that occur within the browser, e.g. when the page is fully loaded and ready for interaction or when a dialogue box appears.

One of the distinguishing features of ChromeVox is its varying levels of navigational granularity. Modern web pages are intricate, so screen readers can be forced to go through a large number of elements before the user reaches the position they want.
This also forces the user to sift through an unnecessary amount of audio, increasing their workload. Because ChromeVox offers multiple granularities, the user can choose how deeply the page is read at any one time. This makes navigation more like visual scanning, where a reader skims larger groups of objects to find the section they want and then focuses on the detailed elements within it.

Listing 4: HTML Object level example

    This is <span>a</span> Test

Specifically, ChromeVox provides five levels of granularity: Group, Object, Sentence, Word and Character. The last three are self-explanatory, but the other two need some explanation. The Group level represents a section of content whose elements are heuristically judged to be related, such as an HTML paragraph tag or a division tag sectioning part of the document. The Object level treats each HTML-tagged element as a separate item, so the text on either side of a tag also becomes an object of its own; the example in Listing 4 is therefore divided into three distinct objects that are navigated individually [17].

Fig. 2: Group level selection

Another key feature, relevant to the problems identified at the start of this report, is walking through equations. The granularity levels make it possible to walk through equations written in MathML markup; however, these walkthroughs have varying degrees of success and remain an issue.

Fig. 3: Object level selection, stepping into the equation
Fig. 4: Erroneous highlighting of equation

Figures 3 and 4 show highlighting errors that occur while stepping through a MathML version of the quadratic formula. The elements of the equation appear almost randomly grouped: −b± is treated as a single entity, the second b is separated from its exponent, and the overline of the square root is selected several times after its contents have already been tabbed through.

When the equation is selected at the Group level, it is read out almost flawlessly. Prosody indicates the separation of levels in the equation: the pitch of the voice changes on moving into a numerator or denominator, and pauses indicate which parts belong together. In this case, the only error was the pronunciation of the variable a.
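To make the Object and Word levels concrete, the following Python sketch splits Listing 4 using the standard-library html.parser module. It rests on a simplifying assumption, namely that every tag boundary starts a new object; ChromeVox’s real heuristics [16], [17] are more involved, and the class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class ObjectSplitter(HTMLParser):
    """Hypothetical Object-level splitter: every tag boundary starts a new object."""

    def __init__(self):
        super().__init__()
        self.objects = []

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:  # ignore whitespace-only gaps between tags
            self.objects.append(text)


def split_objects(html):
    """Object level: one item per run of text delimited by tags."""
    parser = ObjectSplitter()
    parser.feed(html)
    parser.close()  # flush any text the parser may still be buffering
    return parser.objects


def split_words(html):
    """Word level: the same content, broken into individual words."""
    return [word for obj in split_objects(html) for word in obj.split()]


listing4 = "This is <span>a</span> Test"
print(split_objects(listing4))
print(split_words(listing4))
```

Under this assumption the Object level yields the three items "This is", "a" and "Test", matching the division described for Listing 4, while the Word level yields four.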
However, when walking through the equation, the spoken text becomes “x equal minus b plus or minus b two minus fourack overline overline overline square root two a”. This is far from the desired output and would leave a visually impaired user with little idea of the true equation. The walkthrough mechanism is clearly present in the application, but it needs refinement before it can offer a good experience.

The finer levels of granularity do allow each individual element of the equation to be selected; at this depth, however, the elements lose all relative meaning and any prosody rules that were previously applied. Exponents become ordinary numbers without the pitch associated with their vertical position, and their location within a fraction is no longer conveyed. Hearing everything uniformly may be clearer when stepping through slowly, but the structural information remains highly relevant to a visually impaired user, so all modifiers should be retained even at the finest level of granularity.

While ChromeVox functions well in many of its key areas, granular stepping through maths equations still falls short of the requirements of visually impaired users.
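The recommendation to keep every modifier at the finest granularity can be sketched in Python. The tuple-based expression tree, the semitone offsets in PITCH_ST and the function names are hypothetical illustrations, not ChromeVox’s implementation: a depth-first walk tags each leaf of the quadratic formula with its structural path, so a single-token step can still be spoken with the pitch of its exponent, numerator or denominator position. The output uses SSML relative pitch values in semitones (for example "+2st").

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Token:
    text: str
    context: tuple  # structural path from the root, e.g. ("numerator", "sqrt")


# Hypothetical pitch offsets, in semitones, for each structural context.
PITCH_ST = {"numerator": 2, "denominator": -2, "sqrt": 0, "superscript": 4}


def flatten(node, context=()):
    """Depth-first walk that tags every leaf with its structural context."""
    if isinstance(node, str):
        return [Token(node, context)]
    kind, *parts = node
    if kind == "row":
        labelled = [(p, context) for p in parts]
    elif kind == "frac":
        labelled = [(parts[0], context + ("numerator",)),
                    (parts[1], context + ("denominator",))]
    elif kind == "sqrt":
        labelled = [(parts[0], context + ("sqrt",))]
    elif kind == "sup":
        labelled = [(parts[0], context), (parts[1], context + ("superscript",))]
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return [tok for child, ctx in labelled for tok in flatten(child, ctx)]


def pitch_offset(token):
    """Offsets accumulate, so an exponent inside a numerator combines both."""
    return sum(PITCH_ST.get(c, 0) for c in token.context)


def to_ssml(token):
    """Wrap one token in SSML so its prosody survives single-step navigation."""
    return f'<prosody pitch="{pitch_offset(token):+d}st">{escape(token.text)}</prosody>'


QUADRATIC = (
    "row", "x", "equals",
    ("frac",
        ("row", "minus", "b", "plus or minus",
            ("sqrt", ("row", ("sup", "b", "2"), "minus", "4", "a", "c"))),
        ("row", "2", "a")),
)

for tok in flatten(QUADRATIC):
    where = " > ".join(tok.context) or "top level"
    print(f"{tok.text:<14} {where:<36} {to_ssml(tok)}")
```

Because the offsets accumulate along the path, the exponent 2 under the square root in the numerator is raised by six semitones while the denominator’s "2 a" is lowered by two, so the structure remains audible even when stepping one token at a time.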
6 CENTRAL ACCESS READER (CAR)
Another recent addition to the market is the free, open-source Central Access Reader from Central Washington University. It reads text and mathematical formulae from Microsoft Word documents, or from text copied directly into its interface. It supports dual highlighting, marking both the passage being read and the individual word, but it cannot highlight the individual components of an equation that has more than one level of vertical structure. It also provides multiple settings for reading maths, so that different types of equation can be read in different ways.

In testing, it read the quadratic formula perfectly when the formula was supplied in a Word document, giving the output:
“x equals begin fraction negative b plus or minus the square root of b squared minus four a c over two a end fraction”

However, when the equation was pasted in directly from Microsoft Word, from a PDF or as MathML, it did not work correctly: it respectively failed to display the copied text, displayed incorrect symbols, or inserted bars as square root markers.

Although the system seems proficient at reading text, and is clearly capable of reading mathematics supplied as a Word document, Word is not a common format in which to receive formulae; material from other sources would need an extra conversion step before it could be used. This makes the system inefficient and far from ideal for users who are already at a disadvantage.

7 SYSTEM COMPARISON
Testing both systems with mathematical formulae in various forms has shown that each has its own strengths. Central Access Reader has opted for the more verbose option of narratively describing the structure through begin and end statements.
This increases the time taken to convey the information and makes the listener work harder. However, it is a more explicit approach that requires no additional learning of prosodic conventions. ChromeVox has instead opted for prosodic cues, using pitch changes and pauses to convey extra information without adding extra time. It also uses earcons and spearcons to represent structural data in a faster, more recognisable way that does not block the speech channel.

Both systems have standard text-highlighting features, but ChromeVox’s adjustable granularity lets users switch between a shallower or deeper reading of the content as they require. In principle this allows precise stepping through maths equations, but, as shown in Section 5, this is not yet the case in practice.

CAR can read files, a feature not present in ChromeVox. Conversely, ChromeVox can read web pages directly, while CAR requires their contents to be copied into its interface. Finally, CAR can export its output as an MP3 file, so the user can listen to it later on another device that does not have the software installed. This adds a level of usability that ChromeVox cannot match, although its only direct benefit is being able to listen to documents when no computer is available.

Feature                 ChromeVox   CAR
Prosody                 Yes         No
Syntax Highlighting     Yes         Yes
Formula Highlighting    Partial     Partial
File Reading            No          Yes
Browser Reading         Yes         No
MP3 Output              No          Yes
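The trade-off between the two approaches can be illustrated with a small Python sketch that renders the same quadratic-formula tree in both styles. The tuple-based tree format and function names are hypothetical. The verbose renderer reproduces the CAR output quoted in Section 6; the prosodic renderer is only an assumed approximation of ChromeVox’s approach, emitting SSML pitch and break markup (with arbitrary ±15% pitch values) in place of spoken delimiters.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical tree format: ("row", ...items), ("frac", num, den),
# ("sqrt", body), ("sup", base, exponent); plain strings are spoken words.
QUADRATIC = (
    "row", "x", "equals",
    ("frac",
        ("row", "negative", "b", "plus or minus",
            ("sqrt", ("row", ("sup", "b", "2"), "minus", "four", "a", "c"))),
        ("row", "two", "a")),
)


def _power(base, exponent):
    return f"{base} squared" if exponent == "2" else f"{base} to the power {exponent}"


def verbose(node):
    """CAR-style: structure spoken aloud with begin/end statements."""
    if isinstance(node, str):
        return node
    kind, *parts = node
    if kind == "row":
        return " ".join(verbose(p) for p in parts)
    if kind == "frac":
        return f"begin fraction {verbose(parts[0])} over {verbose(parts[1])} end fraction"
    if kind == "sqrt":
        return f"the square root of {verbose(parts[0])}"
    if kind == "sup":
        return _power(verbose(parts[0]), verbose(parts[1]))
    raise ValueError(f"unknown node kind: {kind}")


def prosodic(node):
    """Assumed ChromeVox-like style: structure carried by SSML pitch and pauses."""
    if isinstance(node, str):
        return escape(node)
    kind, *parts = node
    if kind == "row":
        return " ".join(prosodic(p) for p in parts)
    if kind == "frac":
        return (f'<prosody pitch="+15%">{prosodic(parts[0])}</prosody> '
                f'<break strength="medium"/> '
                f'<prosody pitch="-15%">{prosodic(parts[1])}</prosody>')
    if kind == "sqrt":
        return f"square root {prosodic(parts[0])}"
    if kind == "sup":
        return _power(prosodic(parts[0]), prosodic(parts[1]))
    raise ValueError(f"unknown node kind: {kind}")


def spoken_words(text):
    """Count the words a listener actually hears, ignoring SSML tags."""
    return len(re.sub(r"<[^>]+>", " ", text).split())


v = verbose(QUADRATIC)
p = prosodic(QUADRATIC)
print(v)
print(spoken_words(v), "words vs", spoken_words(p), "words")
```

For this formula the verbose reading needs 24 spoken words against 17 for the prosodic one. The seven words saved ("begin", "fraction", "over", "end", "fraction", "the", "of") are replaced by pitch and pause cues, which is precisely the information a listener must learn to decode.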
8 REMAINING ISSUES
8.1 Syntax highlighting in equations
Although Google’s ChromeVox partially addresses this issue, the demonstration makes clear that the system is not ideal: stepping between different levels of an equation’s structure can still be a complex and confusing task for users.

8.2 How should numbers be spoken?
Although the question of how to say numbers may seem trivial compared with conveying memorable structure, the conventions used must also be considered and applied consistently [18]. This report has not discussed the issue, but it will ultimately affect the development of a standardised maths reading system and is another case where agreement on a specification is necessary.

8.3 Limitations of MathML
Although MathML has become the standard format for displaying mathematics online, it still cannot convey all possible equations [19]. There is also a contradiction: Presentation MathML is the most commonly used markup, yet it is also the least accessible form. This is likely to remain the case until Content MathML can define a wider range of formulae.

9 CONCLUSION
This report has discussed the techniques with the greatest beneficial impact and compared a number of prominent systems to show where existing solutions still fall short. Given current trends in web technology and the strength of cloud computing services, a web-based system has merit as the strongest candidate for people with disabilities. However, these users live with their difficulty everywhere they go, not only where an internet connection is available, which makes a stand-alone application preferable in terms of usability.
In terms of performance, based on the issues explained above, the ChromeVox extension for Google’s Chrome browser has proved more proficient at these tasks and, overall, more convenient to use. However, it is not a perfect system: walking through equations still requires work before it can be relied on by users who depend on its accuracy.

REFERENCES
[1] J. F. Sepúlveda and L. Ferres, “Improving accessibility to mathematical formulas: the Wikipedia Math Accessor,” New Review of Hypermedia and Multimedia, 2012.
[2] E. Bates and D. Fitzpatrick, “Spoken mathematics using prosody, earcons and spearcons,” tech. rep., Dublin City University, 2010.
[3] A. Hollander and T. Furness, “Perception of virtual auditory shapes,” in Proceedings of the International Conference on Auditory Displays, 1994.
[4] R. D. Stevens and A. D. N. Edwards, “An approach to the evaluation of assistive technology,” tech. rep., University of York, 1996.
[5] Learnability of Sound Cues for Environmental Features: Auditory Icons, Earcons, Spearcons, and Speech, 2008.
[6] E. Gellenbeck and A. Stefik, “Evaluating prosodic cues as a means to disambiguate algebraic expressions: An empirical study,” tech. rep., Central Washington University, 2009.
[7] E. Murphy, E. Bates, and D. Fitzpatrick, “Designing auditory cues to enhance spoken mathematics for visually impaired users,” tech. rep., Dublin City University, 2010.
[8] E. G. et al., “Speaking MathML: Using prosody and context-sensitive inferences to produce synthesized speech,” tech. rep., 2005.
[9] R. D. Stevens, Principles for the Design of Auditory Interfaces to Present Complex Information to Blind Computer Users. PhD thesis, University of York, 1996.
[10] R. A. et al., “Mathematical Markup Language (MathML) version 3.0,” tech. rep., W3C, 2010.
[11] H. Ferreira and D. Freitas, “AudioMath: using MathML for speaking mathematics,” tech. rep., University of Porto, 2005.
[12] Wikipedia, “Mathematical Markup Language (MathML).” http://en.wikipedia.org/wiki/MathML.
[13] V. Sorge, “Accessibility to scientific material: The case of speaking math,” tech. rep., The University of Birmingham.
[14] D. Cervone, “MathJax: A platform for mathematics on the web,” Notices of the AMS, vol. 59, no. 2, pp. 312–316, 2012.
[15] P. Taylor and A. Isard, “SSML: A speech synthesis markup language,” Speech Communication, vol. 21, no. 1–2, pp. 123–133, 1997.
[16] Google, “ChromeVox source code,” 2012. https://code.google.com/p/google-axs-chrome/.
[17] T. V. R. et al., “ChromeVox: a screen reader built using web technology,” tech. rep., Google Inc., 2012.
[18] R. Fateman, “How can we speak math?,” tech. rep., University of California, 2013.
[19] K. Kofler, P. Schodl, and A. Neumaier, “Limitations in Content MathML,” tech. rep., University of Vienna, 2009.