This document discusses accessibility, internationalization, localization, and character sets. It begins by defining each term and explaining their connections. It emphasizes that accessibility is important for many users, not just those with disabilities. It provides guidelines for making content accessible and discusses tools for internationalization and localization in WordPress like loading text domains and generating POT files to enable translations.
A11Y? I18N? L10N? UTF8? WTF? Understanding the connections between: accessibility, internationalization, localization, and character sets (long version)
1. A11Y? I18N? L10N? UTF8? WTF?
Understanding the connections between: accessibility, internationalization, localization, and character sets
Michael Toppa
@mtoppa
WordCamp Nashville
May 3, 2014
2. About me...
* I've been developing for the web since the days of HTML 1.0, when web pages were first painted on cave walls.
* This is my 7th WordCamp presentation, and I have 7 plugins at wordpress.org, dating back to 2006.
* I was previously the Director of Development for WebDevStudios. One of my assignments while there was managing the WordPress VIP project for NBC Latino.
* I've also managed the 16 person web application team at the U Penn School of Medicine, and I previously worked at Stanford, Georgetown, Ask Jeeves, and E-Trade.
3. I'm now working at PromptWorks, a small consultancy in Philadelphia. We do a lot of work in Ruby, Rails, JavaScript, and infrastructure automation. In addition to building products for our clients, we work closely with them and pair program with them when possible, so that when we leave, they can continue on the path using TDD and good object-oriented design.
4. Accessibility, internationalization, and character sets are normally presented as separate, distinct topics. But I see them as strongly interconnected, and so in this talk I'm going to discuss all of them, with a focus on how they relate to each other. This talk is by no means comprehensive, as they are each big topics. My goal is to start you thinking about how to make your web content more accessible to people who have varying levels of ability using the web, and who speak different languages.
5. Accessibility (A11Y)
Red, yellow, and green all look yellowish to many color-blind people. So how do they understand street signal lights?
They pay attention to the order instead. The signal lights communicate in two ways.
6. Accessibility (A11Y)
Like the signal lights, this wheelchair ramp is a good example of incorporating accessibility into a design without making things ugly or seeming like an afterthought.
7. Why bother?
Before going any further, why should you spend time worrying about any of this? If you're just making a web site for a small business here in Nashville, why spend time coding for accessibility, or worrying about languages other than English?
8. Reason #1
Accessibility ≠ Disability
Accessibility is important for a wide variety of people:
* older people, who often have impaired hearing, difficulty clicking on small targets, etc.
* people with low literacy or not fluent in the language
* people with low bandwidth connections or using older technologies
* new and infrequent users
* ...and persons with disabilities
9. Reason #2
More people need help than you think
* More than half of Americans over 65 are now online, and they spend a lot of time online
* About 9% of men suffer from a type of color blindness
* The number of Americans who speak a language other than English at home has tripled since 1980, to 1 in 5 Americans. That's about 60 million people.
* About 5% of Americans don't speak English fluently; that's over 15 million people.
* Another 5% live in places where the only way to get online is through slow dial-up connections.
* And that's just the US...
10. Reason #3
The cost is low
As we go along in this talk, you'll see that meeting basic accessibility needs is not that hard. It's also not hard to set up your content to be translation ready, even if you don't need to support other languages right now.
11. Reason #4
It's the right thing to do
When you don't give any thought to how people with varying abilities can use your site, the result can be a miserable experience for them.
12. Things I learned by pretending to be blind for a week
Some well-known sites, such as Facebook and Amazon, are almost unusable by blind people. The Amazon home page has over 1,000 links, few alt attributes for images, and few ARIA landmarks ("role" attributes), which help screen readers identify different regions of a page.
13. WCAG Accessibility Guidelines
1. Perceivable
<img src="smiley.gif" alt="Smiley face">
2. Operable
<input accesskey="S" type="submit" value="Submit">
3. Understandable and Predictable
<a href="news.html" target="_blank">latest news (opens new window)</a>
4. Robust and Compatible
<label for="first_name">First Name</label>
The World Wide Web Consortium (W3C) published version 2.0 of its Web Content Accessibility Guidelines in 2008. It has 4 key principles:
Perceivable - e.g. provide text alternatives for non-text content
Operable - e.g. make all functionality available from the keyboard, provide good site navigation
Understandable - e.g. help users avoid and correct mistakes, such as by clearly indicating errors in a form submission
Robust - e.g. use valid, well-structured HTML to maximize compatibility with user agents such as screen readers
14. WCAG Accessibility Guidelines
1. Perceivable
2. Operable
3. Understandable and Predictable
* Success Criterion 3.1.1 Language of Page:
* The default human language of each Web page can be programmatically determined.
4. Robust and Compatible
WCAG 2.0 lists 17 success criteria for making a web page understandable. The first one is that it should be possible to programmatically determine the language of a web page.
15. The lang attribute
* Declare the language of a WordPress theme in header.php:
<html <?php language_attributes(); ?>>
For a US English site, this renders as:
<html lang="en-US">
* In HTML5, declare the language of part of a document:
<div lang="fr">
WordPress itself has been translated into over 70 languages, and if you are developing a theme or plugin, you need to make sure you are using the lang attribute appropriately.
The language_attributes function will set a lang attribute based on the language specified in your wp-config.php file.
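To make the mapping concrete: WordPress locale codes use an underscore (en_US), while the rendered lang attribute uses a hyphen (en-US). The sketch below shows that conversion in plain PHP. The function name locale_to_lang_attribute is hypothetical and is not part of WordPress; it only illustrates the idea under that assumption.

```php
<?php
// Hypothetical helper (not a WordPress function): turn a locale code such as
// "en_US" into an HTML lang attribute such as lang="en-US".
function locale_to_lang_attribute(string $locale): string
{
    // Locale codes use an underscore; language tags in HTML use a hyphen.
    $tag = str_replace('_', '-', $locale);

    // Escape the value before printing it into markup.
    return 'lang="' . htmlspecialchars($tag, ENT_QUOTES, 'UTF-8') . '"';
}

echo '<html ' . locale_to_lang_attribute('en_US') . '>', PHP_EOL;
echo '<html ' . locale_to_lang_attribute('pt_BR') . '>', PHP_EOL;
```

In a real theme you would keep calling language_attributes() as shown on the slide; the sketch is only meant to show where the value comes from.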
16. Uses of the lang attribute
* Supports speech synthesizers and automated translators
* Supports spelling and grammar checkers
* Improves search engine results
* Helps support server content negotiation
* Allows user-agents to select language appropriate fonts
Content negotiation lets the browser tell the server what media types and languages it prefers, and the server will do its best to comply. There is a plugin to support this in WordPress.
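For the language half of content negotiation, the browser sends an Accept-Language header with optional weights (q values). Here is a minimal sketch of how a server might choose among the languages it supports. The function pick_language and its parameters are hypothetical, and this is not how the WordPress plugin mentioned above is implemented; it also assumes the supported list is given in lowercase.

```php
<?php
// Hypothetical helper: choose the best supported language for a request.
// $header is the raw Accept-Language value, e.g. "fr-CA,fr;q=0.8,en;q=0.5".
// $supported is assumed to hold lowercase tags such as ['en', 'fr'].
function pick_language(string $header, array $supported, string $default): string
{
    $preferences = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }
        // A missing q parameter means a weight of 1.0.
        $weight = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strpos($param, 'q=') === 0) {
                $weight = (float) substr($param, 2);
            }
        }
        $preferences[$tag] = $weight;
    }

    // Sort by weight, highest first, keeping each tag paired with its weight.
    arsort($preferences);

    foreach ($preferences as $tag => $weight) {
        if ($weight <= 0) {
            continue; // q=0 means "not acceptable"
        }
        // Try the full tag ("fr-ca") first, then the primary language ("fr").
        $primary = explode('-', (string) $tag)[0];
        foreach ([(string) $tag, $primary] as $candidate) {
            if (in_array($candidate, $supported, true)) {
                return $candidate;
            }
        }
    }
    return $default;
}

echo pick_language('fr-CA,fr;q=0.8,en;q=0.5', ['en', 'fr'], 'en'), PHP_EOL;
```

Falling back from a regional tag to its primary language is a design choice: a reader who asks for Canadian French is usually better served by generic French than by the site default.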
17. Language appropriate fonts
This ideographic character has the same Unicode value and meaning in Chinese, Japanese, and Korean. The character means "snow." But it is rendered differently, depending on whether the lang attribute of the page is set to Simplified Chinese, Traditional Chinese, Japanese, or Korean.
18. Unicode?
Unicode is a single character set designed to include characters from just about every writing system on the planet. This is a small section of the Unicode
character map, showing characters used in languages spoken in Myanmar.
19. Klingon for Unicode
It supports languages from off the planet as well. Although the Klingon application for incorporation into Unicode was rejected in 2001, an encoding for it was created in what's called the "private use" range of code points in Unicode. So there are web sites out there written in Klingon, and you can download Klingon fonts so you can read them.
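Private-use code points have no standard meaning; a font and its readers simply agree on what to draw there. As a small sketch, the check below tests whether a character falls in the main Private Use Area, U+E000 through U+F8FF. The function name is hypothetical, the code assumes the mbstring extension is enabled, and U+F8D0 is used only as an example of a private-use code point.

```php
<?php
// Assumes the mbstring extension is enabled (needed for mb_ord).
// Returns true when the first character of $char lies in the Basic
// Multilingual Plane's Private Use Area, U+E000 through U+F8FF.
function is_private_use(string $char): bool
{
    $codePoint = mb_ord($char, 'UTF-8');

    return $codePoint !== false && $codePoint >= 0xE000 && $codePoint <= 0xF8FF;
}

var_dump(is_private_use("\u{F8D0}")); // a private-use code point
var_dump(is_private_use('A'));        // an ordinary ASCII letter
```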
20. Solving the Unicode Puzzle: PHP Architect, May 2005
In 2005 I wrote an article on configuring Apache, Oracle, and PHP for Unicode, published in PHP Architect. At that time Unicode was just emerging as the new standard for character encoding, and configuring end-to-end support for using it in web applications was a significant undertaking. These days, Unicode support comes out of the box for the most part.
21. Before there was Unicode...
Lower ASCII
Unicode has been prevalent on the web for about 10 years now. In the 1960s, unaccented English characters, as well as various control characters for carriage returns, form feeds, etc., were each assigned a number from 0 to 127; there was general agreement on these number assignments, and so ASCII was born (American Standard Code for Information Interchange).
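The number assignments are easy to inspect from PHP, where ord() gives the numeric value of a single-byte character and chr() goes the other way. This is a small sketch; the is_ascii helper is a hypothetical name introduced here for illustration.

```php
<?php
// ord() returns the numeric byte value of a character; chr() is the reverse.
foreach (['A', 'a', '0', ' '] as $char) {
    printf("%s => %d\n", var_export($char, true), ord($char));
}
echo chr(72) . chr(105), PHP_EOL; // the bytes 72 and 105

// Hypothetical helper: every ASCII value is in the range 0 to 127, so a
// string is pure ASCII when none of its bytes falls outside that range.
function is_ascii(string $text): bool
{
    return preg_match('/^[\x00-\x7F]*$/', $text) === 1;
}

var_dump(is_ascii("Hello\r\n"));   // letters plus two control characters
var_dump(is_ascii("caf\u{E9}"));   // contains an accented e
```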
22. Before there was Unicode...
Upper ASCII: ISO 8859-1 (aka Latin 1)
The ASCII characters could fit in 7 bits, and computers used 8-bit bytes, which left an extra bit of space. This led to the proliferation of many different character sets, with each one using this extra space in a different way. Here's Latin 1, which contains special symbols and accented characters for Western languages.
23. Before there was Unicode...
Upper ASCII: ISO 8859-2
Here's the version of Upper ASCII that supports Slavic languages. There are 15 variations on this ISO standard. This means that text generated on, say, a computer in Russia would turn into gibberish if you tried to read it on a computer in the US. This happened because the number codes representing the Cyrillic characters were assigned to totally different characters on the US computer. This became a bit of a problem when everyone started using the internet.
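The gibberish effect can be reproduced in a few lines. The sketch below decodes the same byte under three ISO 8859 variants, then writes a Russian word as ISO 8859-5 (the Cyrillic variant) and reads the bytes back as Latin 1. It assumes the mbstring extension is enabled; the sample word is my own choice, not one from the talk.

```php
<?php
// Assumes the mbstring extension is enabled.
// One byte, three meanings: decode 0xE8 under three ISO 8859 variants.
$byte = "\xE8";
foreach (['ISO-8859-1', 'ISO-8859-2', 'ISO-8859-5'] as $charset) {
    $asUtf8 = mb_convert_encoding($byte, 'UTF-8', $charset);
    printf("%s: %s (U+%04X)\n", $charset, $asUtf8, mb_ord($asUtf8, 'UTF-8'));
}

// A Russian word (three Cyrillic letters), written out as ISO 8859-5 bytes...
$russian = "\u{43C}\u{438}\u{440}";
$bytes = mb_convert_encoding($russian, 'ISO-8859-5', 'UTF-8');

// ...and then misread by a computer that assumes Latin 1.
$misread = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
printf("sent: %s, bytes: %s, misread as: %s\n", $russian, bin2hex($bytes), $misread);
```

Nothing is lost in transit here: the bytes arrive intact, and only the interpretation differs, which is why the text comes out as accented Latin letters instead of Cyrillic.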
24. The Unicode slogan
"Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."
Unicode represents an effort to clean up this mess. Unicode can do this because it allows characters to occupy more than one byte, so it has enough room to store characters from languages around the world, even Asian languages that have thousands of characters. It's a character set able to support over 1 million characters.
25. So what is UTF-8?
Unicode is a character set, and there are 3 different ways to encode it: UTF-8, UTF-16, and UTF-32. UTF-8 is the Unicode encoding standard for the web because it is built from 8-bit units, using 1 to 4 bytes per character, and its first 128 values are identical to ASCII. This makes it backwards compatible with previously created documents that use only ASCII characters. Accented Latin 1 characters take two bytes in UTF-8, so documents containing them need to be converted.
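A quick way to see the variable width of UTF-8 is to compare byte counts with character counts. In the sketch below, strlen() counts bytes and mb_strlen() counts characters. It assumes the mbstring extension is enabled, and the sample characters are my own picks (the third is the "snow" ideograph, U+96EA).

```php
<?php
// Assumes the mbstring extension is enabled.
// "\u{...}" escapes produce UTF-8 bytes regardless of how this file is saved.
$samples = [
    'Latin capital A'   => 'A',
    'e with acute'      => "\u{E9}",
    'snow ideograph'    => "\u{96EA}",
    'grinning face'     => "\u{1F600}",
];

foreach ($samples as $label => $char) {
    printf(
        "%-16s %s  bytes: %d  characters: %d  hex: %s\n",
        $label,
        $char,
        strlen($char),               // counts bytes
        mb_strlen($char, 'UTF-8'),   // counts characters
        bin2hex($char)
    );
}
```

The ASCII letter keeps its single-byte value, which is the backwards compatibility described above, while the other three characters take 2, 3, and 4 bytes respectively.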
26. Learning everyday Japanese with Mangajin
UTF-8 has been the standard character encoding in WordPress since version 2.2. Here's an example from my blog, showing a multi-lingual post.
28. Localization (L10N) and Internationalization (I18N)
A multi-lingual page like that is fairly uncommon. More commonly, content is created in one language, but we want a standardized way to enable the
creation of translations into other languages. This is where localization and internationalization come in.
29. Localization
"Localization refers to the adaptation of a product, application or document content to meet the language, cultural and other requirements of a specific target market (a locale)."
This often involves more than just translation
In addition to translation, this can also involve dealing with variations in numeric, date, currency, and time formats, varying legal requirements, and awareness of things that may be misunderstood or be offensive in other cultures.
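To illustrate the format differences, here is a sketch that prints the same number and date for two locales. The $formats table and the format_for_locale function are hypothetical and hand-written for this example; a real project would more likely rely on an i18n library than maintain such a table by hand.

```php
<?php
// Hypothetical, hand-written format table for two locales.
$formats = [
    'en_US' => ['decimal' => '.', 'thousands' => ',', 'date' => 'm/d/Y'],
    'de_DE' => ['decimal' => ',', 'thousands' => '.', 'date' => 'd.m.Y'],
];

// Returns the number and the date formatted for the given locale.
function format_for_locale(array $formats, string $locale, float $amount, int $timestamp): array
{
    $f = $formats[$locale];

    return [
        'number' => number_format($amount, 2, $f['decimal'], $f['thousands']),
        // gmdate() formats in UTC, so the result does not depend on the
        // server's time zone setting.
        'date' => gmdate($f['date'], $timestamp),
    ];
}

$when = gmmktime(0, 0, 0, 5, 3, 2014); // May 3, 2014, UTC

foreach (['en_US', 'de_DE'] as $locale) {
    $out = format_for_locale($formats, $locale, 1234567.891, $when);
    printf("%s: %s  %s\n", $locale, $out['number'], $out['date']);
}
```

Note how the US date 05/03/2014 could easily be read as March 5 by a European reader, which is exactly the kind of misunderstanding localization is meant to prevent.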
30. Internationalization
"Internationalization is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language."
32. Step 1: use WordPress' I18N functions
* Wrap all your text in WordPress' I18N functions, using a custom "text domain". This is for my "shashin" plugin:
* $greeting = __( 'Howdy', 'shashin' );
* <li><?php _e( 'Howdy', 'shashin' ); ?></li>
* $string = _x( 'Buffalo', 'an animal', 'shashin' );
* $string = _x( 'Buffalo', 'a city in New York', 'shashin' );
* And others...
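To show why the text domain and the context argument matter, here is a deliberately simplified sketch of a gettext-style lookup. It is not WordPress's implementation: the $translations table, the sketch_translate function, the "context|text" key format, and the French strings are all assumptions made for illustration.

```php
<?php
// Simplified sketch of a gettext-style lookup; NOT WordPress's implementation.
// Translations are grouped by text domain so plugins do not collide.
$translations = [
    'shashin' => [
        'Howdy' => 'Bonjour',
        // Context-qualified entries, keyed as "context|text" in this sketch.
        'an animal|Buffalo' => 'Bison',
        'a city in New York|Buffalo' => 'Buffalo',
    ],
];

function sketch_translate(array $translations, string $text, string $domain, ?string $context = null): string
{
    $key = $context === null ? $text : $context . '|' . $text;

    // Fall back to the original text when no translation is available.
    return $translations[$domain][$key] ?? $text;
}

echo sketch_translate($translations, 'Howdy', 'shashin'), PHP_EOL;
echo sketch_translate($translations, 'Buffalo', 'shashin', 'an animal'), PHP_EOL;
echo sketch_translate($translations, 'Buffalo', 'shashin', 'a city in New York'), PHP_EOL;
echo sketch_translate($translations, 'Goodbye', 'shashin'), PHP_EOL;
```

The fallback in the last call is the useful property: wrapping text costs nothing when no translation exists, which is why it is worth doing even for an English-only site.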
33. Step 2: load your text domain
* For plugins:
load_plugin_textdomain(
    'shashin',
    false,
    dirname(plugin_basename(__FILE__)) . '/languages/'
);
Give it the path to your translation files, which we will create in the next steps.
34. Step 2: load your text domain
* For themes:
function custom_theme_setup() {
    load_theme_textdomain(
        'my_theme',
        get_template_directory() . '/languages'
    );
}
add_action('after_setup_theme', 'custom_theme_setup');
35. Step 3: generate a POT file
The POT file serves as a template for translating your theme or plugin into other languages. It collects all the text you wrapped in WordPress' I18N functions into a single file. If you have a plugin in the wordpress.org repository, the repository can generate a POT file for you. There are other tools available for this as well. See the references at the end of this talk for other ways to generate a POT file for themes and plugins.
36. Step 4: create translation files
This is a screenshot from POEdit. With POEdit, a translator can take your POT file and create a translation into another language. The translation is saved as a textual .po file, along with a compiled, binary version of it in a .mo file. If you include a .mo file that matches the language configuration of a WordPress site, your theme or plugin can be shown in that language.
If you include the .pot file with your theme or plugin, and it becomes popular, you'll probably start receiving unsolicited translations from people who have translated it for use in their language, and want to share the translation for others to use.
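For a .mo file to "match", its file name has to carry the site's locale code. As I understand the convention, plugin files are named with the text domain and the locale, and theme files with the locale alone. The helper below is a hypothetical sketch of that naming rule only; it does not load anything, and you should check the WordPress references before relying on it.

```php
<?php
// Hypothetical helper illustrating the file naming convention only.
// Assumed convention: plugins use "{domain}-{locale}.mo", themes "{locale}.mo".
function translation_filename(string $type, string $domain, string $locale): string
{
    if ($type === 'plugin') {
        return $domain . '-' . $locale . '.mo';
    }

    return $locale . '.mo';
}

echo translation_filename('plugin', 'shashin', 'fr_FR'), PHP_EOL;
echo translation_filename('theme', 'my_theme', 'fr_FR'), PHP_EOL;
```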
37. Step 4: create translation files
* Other translation options:
* The Codestyling Localization plugin
* For themes, the ThemeZee translation site
The Codestyling Localization plugin creates files compatible with POEdit, and works directly with the Google Translate API and Microsoft Translator API to help you translate. It has not been updated in over a year though.
ThemeZee has a collaborative online theme translation community, which you can join for free.
38. Step 5: include translation files
This shows all the different language translations available for the popular plugin, Contact Form 7.
Maintaining translations can be difficult, as you will usually need to get an updated translation for each new release of your plugin or theme. Even just changes in line numbers can throw off the translation.
40. Further reading
* W3C
  * How to meet WCAG 2.0: quick reference
  * Why use the language attribute?
  * Localization vs. Internationalization
* WordPress
  * How To Localize WordPress Themes and Plugins
  * I18n for WordPress Developers
  * Internationalization: You're probably doing it wrong
* Solving the Unicode Puzzle