Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines. Lourdes Moreno, Paloma Martínez, Isabel Segura-Bedmar, and Ricardo Revert. 2015. Universidad Carlos III de Madrid
This document discusses using natural language processing (NLP) to help make text more accessible according to WCAG 2.0 and E2R guidelines. It presents NLP approaches like language detection, abbreviation detection, and topic detection that could help with text simplification. A proof of concept prototype simplifies drug package leaflets by replacing complex medical terms with simpler synonyms. The document concludes that NLP can provide semi-automatic support for making text more readable and understandable as required by accessibility guidelines.
1. Exploring language technologies to provide
support to WCAG 2.0 and E2R guidelines
Lourdes Moreno (*), Paloma Martínez, Isabel Segura-Bedmar
and Ricardo Revert
Grupo LaBDA
Departamento de Informática
Universidad Carlos III de Madrid
(*) lmoreno@inf.uc3m.es
Vilanova i la Geltrú (Universitat Politècnica de
Catalunya), September 2015
Reference ACM Digital Library:
http://dl.acm.org/citation.cfm?id=2829927&CFID=573822944&CFTOKEN=54544041
2. Contents
• Motivation and introduction
• EASY-TO-READ (E2R) Guidelines
• WCAG 2.0: readability and understandability
• Natural language processing (NLP) approaches for text
simplification
• Proof of Concept: Lexical Simplification of Drug Package
Leaflets
• Conclusions
LaBDA, Universidad Carlos III de Madrid
3. MOTIVATION
• Part of the population faces accessibility barriers when texts
contain:
long sentences
unusual words
complex linguistic structures
…
• Environment: web content
• Readability and understanding should be considered when
texts are created
4. INTRODUCTION
Target groups
• People with cognitive or learning
disabilities
• Also:
Prelingually deaf people
Older people (individual cognitive
abilities such as attention span and
memory)
People with low literacy
Immigrants (different native language)
People with aphasia, dyslexia, autism
5. INTRODUCTION
Initiatives
• Easy-to-Read (E2R)
Inclusion Europe 2009
Guidelines of IFLA 2010
• Web Content Accessibility Guidelines (WCAG) 2.0
Regulatory framework
Hard Success criteria
Conformance level AA
6. EASY-TO-READ (E2R) Guidelines
• In general terms these guidelines are:
Use the simplest and most common words
Avoid long words
Avoid abbreviations
Use the same term to refer to the same concept
Use short sentences
Avoid complex sentences with dependent clauses
Use active language and avoid passive voice
7. EASY-TO-READ (E2R) Guidelines
What can be done?
• To make online texts more accessible and readable
• Complex words or phrases are replaced with more
commonly used words
• These adaptations are carried out with the use of text
simplification techniques:
www.noticiasfacil.es
www.e-include.info/
simple.wikipedia.org/
www.simplext.es/
• Manual process? In some cases it is unfeasible
• Supporting technology is needed
8. EASY-TO-READ (E2R) Guidelines
• These E2R guidelines are aimed only at text content.
• In addition: page structure, presentation, …
=> For this reason, accessibility requirements of WCAG 2.0
must be taken into account
9. WCAG 2.0: READABILITY AND UNDERSTANDABILITY
Understandability vs. readability
“a text could be highly readable, since the syntax is extremely
simple, but extremely hard to understand because of the lexicon
used”
Readability evaluates the structure of sentences (it concerns
syntax and consequently requires syntactic simplification
approaches)
Understandability captures lexical aspects, so lexical
simplification approaches are required
10. WCAG success criteria concerning text
• 3.1 (Readable: Make text content readable and understandable)
Readability: 3.1.5 (Reading Level)
Understandability: 3.1.3 (Unusual Words) and 3.1.4 (Abbreviations)
Code (conformance level): Description
1.1.1 Non-text Content (Level A): All non-text content that is presented to the user has a text alternative that serves the equivalent purpose.
2.4.2 Page Titled (Level A): Web pages have titles that describe topic or purpose.
2.4.4 Link Purpose (In Context) (Level A; text type): The purpose of each link can be determined from the link text alone or from the link text together with its programmatically determined link context.
2.4.6 Headings and Labels (Level AA): Headings and labels describe topic or purpose.
2.4.9 Link Purpose (Link Only) (Level AAA; text type): A mechanism is available to allow the purpose of each link to be identified from link text alone, except where the purpose of the link would be ambiguous to users in general.
2.4.10 Section Headings (Level AAA): Section headings are used to organize the content.
3.1.1 Language of Page (Level A): The default human language of each Web page can be programmatically determined.
3.1.2 Language of Parts (Level AA): The human language of each passage or phrase in the content can be programmatically determined.
3.1.3 Unusual Words (Level AAA): A mechanism is available for identifying specific definitions of words or phrases used in an unusual or restricted way, including idioms and jargon.
3.1.4 Abbreviations (Level AAA): A mechanism for identifying the expanded form or meaning of abbreviations is available.
3.1.5 Reading Level (Level AAA): When text requires reading ability more advanced than the lower secondary education level after removal of proper names and titles, supplemental content, or a version that does not require reading ability more advanced than the lower secondary education level, is available.
11. WCAG 2.0: READABILITY AND UNDERSTANDABILITY
Additional accessibility requirements
• The WCAG 2.0 document does not specify guidelines on these matters
the way it does for visual or auditory accessibility
• A set of additional WCAG 2.0 success criteria has been identified regarding
presentation, navigation, structure, cognitive aspects of user tasks, …
• Some of these additional success criteria are:
1.4.8 (Visual Presentation)
2.2.3 (No Timing)
2.4.5 (Multiple Ways)
3.2.3 (Consistent Navigation)
3.2.4 (Consistent Identification)
3.3.1 (Error Identification)
3.3.2 (Labels or Instructions)
3.3.5 (Help)
12. WCAG 2.0: READABILITY AND UNDERSTANDABILITY
Discussion and conclusions
• There is no correspondence between the concepts in the E2R guidelines and
WCAG 2.0 success criteria
=> Professionals working on WCAG accessibility conformance
do not know how to meet E2R
requirements
• Beyond the WCAG 2.0 criteria regarding text, further accessibility
features should be considered
• WCAG 2.0 support is not enough
• Technology supporting the authorship of texts is required
13. WCAG 2.0: READABILITY AND UNDERSTANDABILITY
Discussion and conclusions
• Proposal:
NLP approaches that use E2R and WCAG 2.0 resources to
provide semi-automatic support
Different NLP strategies to simplify texts, depending on
whether understandability or readability is to be
addressed
14. Natural language processing (NLP)
• The discipline devoted to developing technology to understand
natural language
• Applications:
Machine translation
Information retrieval
Information extraction from unstructured data
Summarization
Question answering
…
15. NLP APPROACHES FOR TEXT SIMPLIFICATION
Support to accessibility
• NLP processes are applied with the objective of transforming
a text into an equivalent one that is more accessible to people
with any kind of cognitive disability
• Three NLP processes that could be applied to text
simplification tasks are described:
Language detection
Abbreviation detection
Topic detection
16. NLP APPROACHES FOR TEXT SIMPLIFICATION
Language detection
• Language detection consists of identifying the language of a
text
• It is helpful, for example, when screen readers are used
• Approaches:
Checking for language-specific characters or strings (e.g., Dutch if the
string “ik” appears, German if “ich” or “ß” is used, Polish if “czy” or
“ń”, “Ł”, “ź” appear in words)
Using n-gram frequency distributions: in every language some words
occur much more frequently than others (Zipf's law)
• If two texts in the same language are compared, they should
have similar n-gram frequency distributions
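The n-gram approach above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes character trigrams and cosine similarity as the comparison measure (the slide specifies neither), and the two one-sentence training samples are hypothetical and far too small for real use.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams per word, padding with spaces to mark word boundaries."""
    counts: Counter = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram frequency vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_language(text: str, profiles: dict) -> str:
    """Return the language whose n-gram profile is most similar to the text's profile."""
    target = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


# Hypothetical one-sentence training samples; real profiles need much more text.
PROFILES = {
    "en": char_ngrams("the patient should take the tablet with water and the doctor will check the dose"),
    "es": char_ngrams("el paciente debe tomar el comprimido con agua y el médico revisará la dosis"),
}

print(detect_language("el médico y el paciente", PROFILES))  # es
```

Comparing whole frequency profiles rather than single marker strings makes the decision degrade gracefully on short texts that contain none of the language-specific characters.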
17. NLP APPROACHES FOR TEXT SIMPLIFICATION
Abbreviations
• Approaches to recognizing abbreviations and their
expansions:
Pattern-matching methods based on rules and heuristics to
detect uppercase alphanumeric strings
• To identify the patterns long form (short form) or short form (long form)
If a sequence of words co-occurs frequently with an
abbreviation and does not occur with other nearby
words, there is an “abbreviation-definition”
relationship.
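A minimal rule-based sketch of the "long form (short form)" pattern, assuming a simple first-letter heuristic: a parenthesised uppercase string is accepted as a short form when the initials of the words just before it spell its letters. The heuristic and regular expressions are illustrative assumptions, not the method used by the authors; it ignores the "short form (long form)" order, and forms whose letters do not map one-to-one onto word initials are missed.

```python
import re

# Candidate short forms: an uppercase letter followed by 1-9 uppercase letters or digits, in parentheses
SHORT_FORM = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def find_abbreviations(text: str) -> dict:
    """Pair each parenthesised short form with the preceding words whose initials spell it."""
    pairs = {}
    for match in SHORT_FORM.finditer(text):
        short = match.group(1)
        letters = [c for c in short if c.isalpha()]
        # Words before the opening parenthesis; keep one word per letter of the short form
        words = re.findall(r"[\w-]+", text[:match.start()])
        window = words[-len(letters):]
        if len(window) == len(letters) and all(
            w[0].lower() == c.lower() for w, c in zip(window, letters)
        ):
            pairs[short] = " ".join(window)
    return pairs


text = ("The Web Content Accessibility Guidelines (WCAG) and "
        "natural language processing (NLP) tools.")
print(find_abbreviations(text))
# {'WCAG': 'Web Content Accessibility Guidelines', 'NLP': 'natural language processing'}
```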
18. NLP APPROACHES FOR TEXT SIMPLIFICATION
Text summarization or topic detection
• Goal: to obtain a set of sentences that reflects the content
• This technique offers accessibility support to editors of web
content to create:
Titles of paragraphs
Sections that faithfully represent the content
• Approach:
Automatic text extraction: a sentence is considered relevant
if it contains a large number of important words
The importance of a word is calculated with a measure
that depends on how frequent the word is in a document and on
how many documents of a collection it appears in.
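The word-importance measure described above corresponds to the usual TF-IDF weighting. A minimal extractive sketch, assuming TF-IDF with idf = log(N / df) and scoring each sentence by the summed weight of its words; the scoring choice and the tiny example collection are hypothetical, and the document is assumed to be part of the collection.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def inverse_document_frequency(collection: list) -> dict:
    """idf(w) = log(N / df(w)), where df(w) is the number of documents containing w."""
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))
    return {word: math.log(len(collection) / count) for word, count in df.items()}


def top_sentences(document: str, collection: list, k: int = 1) -> list:
    """Rank the document's sentences by the summed TF-IDF weight of their words."""
    idf = inverse_document_frequency(collection)
    tf = Counter(tokenize(document))
    weight = {word: tf[word] * idf.get(word, 0.0) for word in tf}
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    return sorted(sentences, key=lambda s: sum(weight[t] for t in tokenize(s)), reverse=True)[:k]


doc = "Ibuprofen may cause heartburn. Take the tablet with water."
collection = [doc, "Take the tablet with water.", "Take the capsule with water."]
print(top_sentences(doc, collection))  # ['Ibuprofen may cause heartburn.']
```

Words that occur in every document of the collection ("take", "water") get zero weight, so the sentence carrying document-specific content rises to the top and could serve as a candidate heading.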
19. NLP APPROACHES FOR TEXT SIMPLIFICATION
Text Simplification
• It is essential in several types of texts: News, Government and
administrative information, laws and rights, etc.
• There are three subtasks of text simplification
1. Syntactic simplification, which splits complex sentences into
simpler sentences
2. Lexical simplification, whose objective is to replace
complex vocabulary with common vocabulary
3. Clarification, which provides definitions and explanations.
These tasks are not completely automatic; in some cases their output
has to be reviewed manually.
20. NLP APPROACHES FOR TEXT SIMPLIFICATION
Text Simplification
Lexical simplification:
• Replace complex words and utterances with easier words or
phrases, taking the context into account.
• Heuristic: complex words have a low frequency
• Frequency-based proposals give better results than other, more
sophisticated systems [SemEval 2012]
• Resources: lexical resources such as WordNet are used to extract
synonyms as candidates to replace a complex or difficult
word.
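A minimal sketch of the frequency heuristic, assuming a hypothetical frequency table and synonym dictionary in place of real corpus counts and WordNet: a word is treated as complex when its frequency falls below an arbitrary threshold, and it is replaced by its most frequent candidate synonym only if that candidate is more frequent. The sketch ignores context, which a real system must take into account.

```python
import re

# Hypothetical frequencies (occurrences per million words) and synonym candidates,
# standing in for a reference corpus and a resource such as WordNet.
FREQUENCIES = {"the": 60000, "every": 900, "treats": 40, "physician": 12,
               "doctor": 310, "medic": 8, "ailment": 3, "illness": 95, "disease": 140}
SYNONYMS = {"physician": ["doctor", "medic"], "ailment": ["illness", "disease"]}


def simplify(sentence: str, frequencies: dict, synonyms: dict, threshold: int = 20) -> str:
    """Replace each low-frequency word by its most frequent synonym, if that synonym is more frequent."""
    def replace(match):
        word = match.group(0)
        freq = frequencies.get(word.lower(), 0)
        if freq >= threshold:
            return word  # frequent enough to be treated as simple
        best = max(synonyms.get(word.lower(), []), key=lambda c: frequencies.get(c, 0), default=None)
        if best is None or frequencies.get(best, 0) <= freq:
            return word  # no simpler candidate available
        return best

    return re.sub(r"[A-Za-z]+", replace, sentence)


print(simplify("The physician treats every ailment", FREQUENCIES, SYNONYMS))
# The doctor treats every disease
```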
21. NLP APPROACHES FOR TEXT SIMPLIFICATION
Text Simplification
Lexical simplification
• Complexity measures: the frequency of words in texts as well as
sentence length
Fog index
Flesch-Kincaid
These indexes have to be validated by end users
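The slide names the indexes without their formulas. A minimal sketch, assuming the standard English formulas for the Flesch-Kincaid grade level and the Gunning Fog index (taking the slide's Fog entry to mean the latter), a simplified Fog that counts every word of three or more syllables as complex, and a crude vowel-group syllable counter. Both formulas were calibrated for English, so Spanish leaflets would need a Spanish-specific adaptation.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic English syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # complex_words: words with three or more syllables (simplified definition)
    return 0.4 * (words / sentences + 100 * complex_words / words)


def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "flesch_kincaid_grade": flesch_kincaid_grade(len(words), len(sentences), sum(syllables)),
        "gunning_fog": gunning_fog(len(words), len(sentences), sum(1 for s in syllables if s >= 3)),
    }


print(readability("The cat sat on the mat. It was happy."))
```

Both scores grow with average sentence length and with the share of long words, which is why they capture readability rather than whether the vocabulary is familiar to the reader.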
22. NLP APPROACHES FOR TEXT SIMPLIFICATION
WCAG 2.0 success criterion: NLP approach
2.4.2 (Page Titled), 2.4.6 (Headings and Labels), 2.4.10 (Section Headings): Text summarization
3.1.4 (Abbreviations): Abbreviation detection
3.1.3 (Unusual Words): Dictionaries with definitions
3.1.5 (Reading Level): Syntactic simplification
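For success criterion 3.1.4, detected expansions can be surfaced to users through HTML's abbr element with a title attribute. A minimal sketch, assuming a hypothetical dictionary of expansions (for instance, produced by abbreviation detection) and input text that is already HTML-safe; only the title values are escaped.

```python
import html
import re


def mark_up_abbreviations(text: str, expansions: dict) -> str:
    """Wrap known abbreviations in <abbr title="..."> elements (text assumed HTML-safe)."""
    if not expansions:
        return text
    alternatives = "|".join(re.escape(a) for a in expansions)
    pattern = re.compile(rf"\b({alternatives})\b")
    return pattern.sub(
        lambda m: f'<abbr title="{html.escape(expansions[m.group(1)])}">{m.group(1)}</abbr>',
        text,
    )


expansions = {"WCAG": "Web Content Accessibility Guidelines", "E2R": "Easy-to-Read"}
print(mark_up_abbreviations("WCAG and E2R", expansions))
```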
23. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
• The main written source of information for patients
• This document provides information about a drug: its
appearance, actions, side effects and drug
interactions, contraindications and special warnings
• It is difficult for patients to understand:
Specific, technical vocabulary
Long paragraphs, especially those containing lists of
side effects
Small font size (9 points)
• Problems: Patient misunderstanding could be a
potential source of medication errors and adverse
drug reactions.
24. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
• Goal of the system:
Provide information in a way that is easy and clear to read.
• Medical terms (in particular, drug effects) are translated into
lay terms, which patients can understand.
25. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
FIRST Module:
Named Entity Recognition
(NER)
• Detects mentions of
drug effects
• Uses MedDRA (a multilingual
medical terminology
dictionary of events
associated with drugs)
• MeaningCloud integrates
MedDRA into GATE
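The prototype performed this step with MeaningCloud and GATE; the sketch below is not that pipeline. It is a minimal dictionary (gazetteer) lookup with greedy longest match, assuming a hypothetical handful of Spanish terms in place of MedDRA, which is far larger.

```python
import re


def tokenize(text: str) -> list:
    return re.findall(r"\w+", text.lower())


def find_effects(text: str, lexicon: list, max_len: int = 4) -> list:
    """Greedy longest-match lookup of lexicon terms over the text's tokens."""
    entries = {tuple(tokenize(term)): term for term in lexicon}
    tokens = tokenize(text)
    mentions, i = [], 0
    while i < len(tokens):
        # Try the longest window first so "dolor abdominal" beats "dolor"
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = entries.get(tuple(tokens[i:i + n]))
            if term is not None:
                mentions.append(term)
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this token
    return mentions


# Hypothetical lexicon standing in for MedDRA
LEXICON = ["dolor", "dolor abdominal", "náuseas", "vómitos", "dispepsia"]
print(find_effects("Frecuentes: náuseas, vómitos, dolor abdominal.", LEXICON))
# ['náuseas', 'vómitos', 'dolor abdominal']
```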
26. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
SECOND module:
Lexical Simplifier
• Identifies the effects whose
names are considered
complex, in order to
replace them with a
simpler synonym
• Two different strategies:
preferred term substitution
and most frequent term
substitution.
27. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
• Preferred Term Substitution
MedDRA allows defining sets of synonyms and provides
a preferred term for each set
• Cefalalgia (cephalalgia) would be replaced by cefalea
(headache)
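Preferred term substitution can be sketched as a lookup from every synonym to the preferred term of its set. A minimal sketch, assuming the synonym sets are already available as a dictionary; the two sets used here come from the examples in this section (cefalalgia/cefalea and indigestión/dispepsia), whereas a real index would be built from MedDRA.

```python
import re


def build_pt_index(synonym_sets: dict) -> dict:
    """Map every synonym (lower-cased) to the preferred term of its set."""
    return {syn.lower(): preferred
            for preferred, synonyms in synonym_sets.items()
            for syn in synonyms}


def substitute_preferred_terms(text: str, pt_index: dict) -> str:
    """Replace each known synonym in the text by its preferred term."""
    if not pt_index:
        return text
    # Longest synonyms first, so a multiword synonym wins over a shorter one it starts with
    alternatives = "|".join(re.escape(t) for t in sorted(pt_index, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda m: pt_index[m.group(0).lower()], text)


pt_index = build_pt_index({"cefalea": ["cefalalgia"], "dispepsia": ["indigestión"]})
print(substitute_preferred_terms("Muy frecuentes: diarrea e indigestión. Raros: cefalalgia.", pt_index))
# Muy frecuentes: diarrea e dispepsia. Raros: cefalea.
```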
28. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
• Most Frequent Term Substitution
Corpus of MedlinePlus website documents (1,536 documents)
• 939 belonging to drug package leaflets
• 597 to general health related articles about diseases, effects and
diagnoses.
Elasticsearch is used to index the MedlinePlus documents
Hypothesis: complex terms should be less frequent than simpler terms
in the corpus
1) The frequency of each effect in the corpus is calculated
2) Each effect is replaced by its synonym with the highest
frequency in the corpus (unless the effect itself is the most frequent).
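The two steps can be sketched as follows, assuming the corpus is available as a plain list of strings (a hypothetical stand-in for querying the Elasticsearch index, whose API is not shown) and counting raw phrase occurrences; the slides do not say whether occurrences or matching documents were counted. Note that with plain phrase counting, "resfriado" is also counted inside "resfriado común".

```python
import re
from collections import Counter


def phrase_frequencies(terms: list, corpus: list) -> Counter:
    """Step 1: count whole-phrase occurrences of each term across the corpus (case-insensitive)."""
    freqs = Counter()
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        freqs[term] = sum(len(pattern.findall(doc)) for doc in corpus)
    return freqs


def most_frequent_synonym(term: str, synonyms: list, freqs: dict) -> str:
    """Step 2: pick the most frequent of the term and its synonyms; ties keep the original term."""
    candidates = [term] + sorted(s for s in synonyms if s != term)
    return max(candidates, key=lambda t: freqs.get(t, 0))


# Frequencies reported for the synonyms of "catarro" in the MedlinePlus corpus
reported = {"catarro": 12, "resfriado": 48, "resfriado común": 7, "síntomas de resfriado": 6}
print(most_frequent_synonym("catarro", ["resfriado", "resfriado común", "síntomas de resfriado"], reported))
# resfriado
```

Listing the original term first means that, when `max` meets a tie, it keeps the term unchanged rather than swapping in an equally frequent synonym.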
29. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
Synonyms from MedDRA and their frequency in the MedlinePlus corpus:
• catarro (nasopharyngitis): 12
• resfriado (cold): 48
• resfriado común (common cold): 7
• síntomas de resfriado (cold symptoms): 6
The complex term is replaced by resfriado (cold).
30. PROOF OF CONCEPT
Lexical Simplification of Drug Package Leaflets
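SECOND module. Lexical Simplifier
Original
Muy frecuentes: diarrea e indigestión.
Frecuentes: náuseas, vómitos, dolor abdominal.
Poco frecuentes: hemorragia.
Raros: perforación gástrica, flatulencia, estreñimiento
PT (preferred term substitution)
Muy frecuentes: diarrea e dispepsia.
Frecuentes: náuseas, vómitos, dolor abdominal.
Poco frecuentes: hemorragia.
Raros: perforación gástrica, flatulencia, estreñimiento
freq (most frequent term substitution)
Muy frecuentes: diarrea e pirosis.
Frecuentes: náuseas, vómitos, dolor abdominal.
Poco frecuentes: sangrado.
Raros: perforación gástrica, gases, estreñimiento
(Muy frecuentes = very common, Frecuentes = common, Poco frecuentes = uncommon, Raros = rare. With PT, indigestión (indigestion) becomes dispepsia (dyspepsia). With freq, indigestión becomes pirosis (heartburn), hemorragia (haemorrhage) becomes sangrado (bleeding) and flatulencia (flatulence) becomes gases (gas).)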
31. CONCLUSIONS
• For some people, it is difficult to infer the meaning of an unusual
word or phrase from context
• Long sentences and complex linguistic structures can create barriers
to accessing text content, as indicated in the WCAG and E2R
guidelines
However, these guidelines provide neither precise methods nor
(semi-)automatic support for addressing these accessibility issues
concerning readable and understandable text
• NLP approaches that use E2R and WCAG 2.0 resources can provide
semi-automatic support
Proof of concept: a prototype that simplifies drug package leaflets and
implements a component for lexical simplification
32. CONCLUSIONS
Work in progress
• New approaches to offer support: abbreviations, summaries,
definitions of unusual words, etc.
• Evaluations by users (and also by experts)
• Taking into account other important issues such as:
Presentation elements
Page structure
Navigation structures
33. REFERENCE
Lourdes Moreno, Paloma Martínez, Isabel Segura-Bedmar, and
Ricardo Revert. 2015. Exploring language technologies to provide
support to WCAG 2.0 and E2R guidelines. In Proceedings of the XVI
International Conference on Human Computer Interaction
(Interacción '15). ACM, New York, NY, USA, Article 57, 8 pages.
DOI=http://dx.doi.org/10.1145/2829875.2829927