Multimedia Technology
Text


  S T Nandasara
  ADMTC/UCSC




                        1
World of Languages




                     2
World of Languages –                                      Asian Countries




Source: Ethnologue: Languages of the World (the exact number of languages may never be determined exactly)
                                                                                                  3
World of Languages – Asian region




(Half of the world’s languages are spoken in only eight countries)

                                                                     4
World of Languages –                                                              Asian Countries
Country       Number of Languages         Country Population      Official or National Languages
Indonesia                           742            245,452,739    Indonesian
India                               427           1,095,351,995   Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada,
                                                                  Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Marwari,
                                                                  Nepali, Oriya, Panjabi, Sanskrit, Sindhi, Tamil, Telugu, Urdu
China                               241           1,313,973,713   Chinese, Zhuang, Uighur, Hmong, Hani
Philippines                         180             89,468,677    Filipino, English
Malaysia                            147             24,385,858    Malay
Nepal                               125             28,287,147    Nepali, Gurung, Tamang
Myanmar                             109             47,382,633    Burmese
Vietnam                             93              84,402,966    Vietnamese
Laos                                 82              6,368,481    Lao
Thailand                             75             64,631,595    Thai
Iran                                 74             68,688,433    Arabic, Farsi
Pakistan                             69            165,803,560    Urdu, Panjabi, Sindhi, English
Afghanistan                          45             31,056,997    Dari, Pashto
Bangladesh                           38            147,365,352    Bengali
Bhutan                               24              2,279,723    Dzongkha
Iraq                                 23             26,783,383    Arabic, Kurdi
Cambodia                             19             13,881,427    Khmer
Brunei                               17                379,444    Malay, English
Mongolia                             12              2,832,224    Halh Mongolian
Sri Lanka                             8             20,222,240    Sinhala, Tamil, English

                                                                                                                                        5
World of Languages –                    Script Diversity

 Three major types of script in South, South
  East & East Asia
     In East Asia - Chinese ideographic scripts
     In South Asia, around the Indian subcontinent, and in part of
      South East Asia - scripts influenced by Brahmi
     In part of South East Asia and Australasia - Roman script
 Two major types of script in West & Central
  Asia
     In Central Asia - historically Arabic script, later
      replaced by Cyrillic
     In Western Asia - Arabic script is widely used
 One major type of script in Europe and the West
     Roman script

                                                              6
World of Languages –                                                                             Script in Asia
Language                        Speakers       Name in its own script
Chinese (Mandarin)              885,000,000    普通話
English                         322,000,000    English
Arabic (Alarabia)               280,000,000    العربية
Bengali                         196,000,000    বাংলা
Hindi                           182,000,000    हिन्दी
Portuguese (Português)          182,000,000    português
Indonesian                      140,000,000    Bahasa Indonesia
Japanese (Nihongo)              125,000,000    日本語
Korean (Hangugeo)                75,000,000    한국어 [韓國語]
Telugu                           73,000,000    తెలుగు
Vietnamese                       66,897,000    Tiếng Việt
Marathi                          64,783,000    मराठी
Tamil                            62,000,000    தமிழ்
Turkish (Türkçe)                 59,000,000    Türkçe
Urdu                             54,000,000    اردو
Gujarati                         44,000,000    ગુજરાતી
Malayalam                        34,014,000    മലയാളം
Kannada                          33,663,000    ಕನ್ನಡ
Punjabi/Panjabi                  25,700,000    ਪੰਜਾਬੀ / پنجابی
Thai                             21,000,000    ภาษาไทย
Sindhi                           19,675,000    سنڌي
Uzbek (Cyrillic)                 18,386,000    Ўзбек
Bahasa Melayu (Malay)            17,600,000    Bahasa Melayu
Nepali                           16,200,000    नेपाली
Filipino (Tagalog)               14,850,000    Tagalog
Assamese                         14,604,000    অসমীয়া
Azeri/Azerbaijani (Cyrillic)     13,869,000    Азәрбајҹан дили
Sinhala                          13,218,000    සිංහල
Zhuang                           10,000,000    Saw cuengh
Pashto/Pakhto                     9,585,000    پښتو
Kazakh                            8,000,000    Қазақ / قازاق
Uighur (Uyghur)                   7,464,000    Уйғур / ئۇيغۇر
Khmer                             7,063,200    ភាសាខ្មែរ
Dari                              7,000,000    دَرِي
Tatar                             7,000,000    татарча / تاتارچا
Turkmen                           5,397,500    түркменче
Kashmiri                          4,381,000    काऽशुर / كٲشُر
Lao                               4,000,000    ພາສາລາວ
Balinese                          3,800,000    Bahasa Bali
Kyrgyz                            2,631,420    Кыргыз
Fijian                              650,000    vaka-Viti
Maldivian Dhivehi                   280,000    ދިވެހި
Sanskrit                            194,433    संस्कृतम्
Tahitian                            150,000    Te Reo Tahiti
Maori                                70,000    Te Reo Māori
Hawaiian                              8,000    ʻŌlelo Hawaiʻi
                                                                                                                                     7
World of Languages –   Script in Asia




                                    8
Nature of Text
 The most basic medium.
   Easiest to generate, store and transfer
    on a PC.
 Still the best for complex explanations.
   Using structured text/hypertext
 Lightweight
   Smallest-sized medium
 Static
 Language dependent (the biggest
  problem)
                                              9
Text – Digital Form
Input (creation)                          Digital form                        Output
  Keyboard                                  Text data (character code)          Typeface
  Handwriting                                 ASCII: 7 bit                         Bitmap font
    Handwriting recognition                   Unicode (UTF-16): 16-bit units       Vector font
  Printed documents                           Universal Character Set             Voice
    Optical Character Recognition (OCR)        (UCS-4): 32 bit                     Text-to-Speech
  Human voice
    Voice recognition
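
One way to see the character-code widths on this slide is to encode a few characters and count the bytes. The sketch below uses Python's built-in codecs; the three sample characters are chosen only for illustration. Two points it assumes: ASCII itself defines 7-bit codes (usually stored in 8-bit bytes), and 16-bit Unicode (UTF-16) needs two 16-bit units for characters outside the Basic Multilingual Plane.

```python
# Byte widths of the character codes named on this slide.
samples = ["A", "\u0d85", "\U00010346"]  # Latin A, Sinhala AYANNA, Gothic FAIHU


def widths(ch):
    """Return the encoded size in bytes per encoding (None if not encodable)."""
    try:
        ascii_len = len(ch.encode("ascii"))
    except UnicodeEncodeError:
        ascii_len = None  # ASCII covers only U+0000..U+007F
    return {
        "ascii": ascii_len,
        "utf-16": len(ch.encode("utf-16-le")),  # 16-bit code units
        "utf-32": len(ch.encode("utf-32-le")),  # one 32-bit unit per character
    }


for ch in samples:
    print(f"U+{ord(ch):04X}", widths(ch))
```

The Gothic letter takes four bytes in UTF-16 because it is stored as a surrogate pair.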
                                                          10
Indexing and Hypertext
 Indexing
   Rapid random access/search
    method for Large Text Data.
   Essential for reference type
    applications
    Dictionary, Encyclopedia
    etc.
 Hypertext
   Non-sequential navigation
    structure for Large Text Data
   Used in Web pages (HTML)
 [Figure: a block of large text data with a multi-level alphabetical index
  (a, b, c, d, e; then ad, am, bi, bot, by; then adjust, adorn)]
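
The indexing idea above can be sketched in a few lines. This is an illustration only: the sample sentence and the function names `build_index` and `words_with_prefix` are made up for the example. The index maps each word to its positions in the text, and a sorted word list gives the alphabetical drill-down (a, then ad, then adjust/adorn) by binary search.

```python
import bisect


def build_index(text):
    """Map each lowercased word to the character offsets where it starts."""
    index = {}
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        index.setdefault(word.lower().strip(".,"), []).append(start)
        pos = start + len(word)
    return index


def words_with_prefix(sorted_words, prefix):
    """Binary-search a sorted word list for entries starting with prefix."""
    lo = bisect.bisect_left(sorted_words, prefix)
    hi = bisect.bisect_left(sorted_words, prefix + "\uffff")
    return sorted_words[lo:hi]


text = "Adjust the type. Adorn the page. Bind the book by hand."
index = build_index(text)
sorted_words = sorted(index)

print(words_with_prefix(sorted_words, "ad"))
print(index["the"])
```

Looking a word up costs a dictionary access or a binary search instead of a scan of the whole text, which is what makes random access into large text practical.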
                                                                                              11
Hypertext, Hypermedia and Multimedia

[Figure: Venn diagram in which Hypermedia is the overlap of Multimedia and Hypertext]

A hypermedia system includes the non-
  linear information links of hypertext
  systems and the continuous and
  discrete media of multimedia systems.


                                               12
Typography
 Until the end of the 14th century, all writing
  was done by hand.
 Typography – the design of the
  characters that make up text and
  display type, and the way they are
  configured on the page.
 Modern software allows:
   Rotating or distorting type, wrapping text
    around images, etc.

                                           13
Typography –     Evolution of Asian Scripts
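[Figure: chart of script evolution by period (3rd century BC, 1st century, 3rd century,
 6th century, 8th century, 10th century, 12th century, modern), with Pallava as an
 intermediate stage and columns for Panjabi, Devanagari, Gujarati, Bengali, Oriya,
 Telugu, Kannada, Tamil, Malayalam and Sinhala]
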
                                                                                                                                14
Typography –       Complex Scripts


Bengali    Devanagari        Gujarati

Kannada    Malayalam         Telugu

Sinhala    Tamil             Ranjana

Gurmukhi   Oriya             Tibetan

Khmer      Lao               Thai

Jawani     Thaana            Bagini

Sanskrit




                                        15
Typography -   Complex Vowels




                                16
Typography –   ASCII & EBCDIC

       ASCII   EBCDIC




                                17
Typography –     8 Bit English and Sinhala

1989 - SLASCII




                            Wadan Tharuwa SBIOS




                                              18
The Code Page Problem
 Characters in most languages are traditionally
  represented by single-byte values
     Allows for 256 characters max
     Real limit for most encodings is 192 characters
     This includes letters, digits, punctuation, symbols
 When a system is used for a new language, the
  encoding has to be adapted to use that
  language’s characters
 Encodings proliferate
   Each language or group of languages gets its own
    encoding
   Different vendors or standards committees devise
    different encodings, so generally each language has
    several, often incompatible, encodings
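
A short sketch of the code page problem: the same single byte decodes to a different character under each single-byte encoding. The three code pages below are examples picked for illustration (Western European, Cyrillic and Greek); they are not named on the slide.

```python
raw = bytes([0xE9])  # one byte above the ASCII range

codecs = ("latin-1", "cp1251", "iso8859_7")
decoded = {codec: raw.decode(codec) for codec in codecs}

for codec, ch in decoded.items():
    print(f"{codec:10} 0x{raw[0]:02X} -> U+{ord(ch):04X} {ch}")
```

Nothing in the byte itself says which of the three readings was intended, which is why such data has to travel with a label or move to a universal encoding.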

                                                            19
Multi-byte encodings

 Some languages (Chinese, Japanese, Korean,
  etc.) have more than 256 characters
 Encoding standards for these languages use
  sequences of bytes for many characters
   In many standards, not all characters are the same
    number of bytes
   Can’t tell whether a given byte is a whole character
    or part of a character
   Corruption of one byte can corrupt the whole data
    stream
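
As a hedged illustration of these points, the sketch below uses Shift_JIS, one legacy Japanese encoding (the slide does not name a specific standard). It shows characters of different byte lengths in one string, and a second byte of a two-byte character that has the same value as an ASCII character.

```python
text = "A\u8868"  # Latin A followed by the kanji U+8868
data = text.encode("shift_jis")

print(len(text), "characters,", len(data), "bytes")
print(data.hex(" "))

# The kanji's second byte has the same value as ASCII backslash (0x5C),
# so a byte-by-byte scan cannot tell a whole character from half of one.
trail_is_backslash = data[-1] == ord("\\")
print(trail_is_backslash)

# Losing the kanji's first byte leaves a stray byte with a different meaning.
damaged = data[:1] + data[2:]
print(damaged.decode("shift_jis", errors="replace"))
```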



                                                           20
21
Interoperability problems

 Can’t easily mix languages in a document or
  system
 Data not tagged with encoding, so loss can
  occur when transferring between systems
 Most encodings are ASCII-based, so problems
  often not seen with English-only data
 Two possible solutions:
   Systematic tagging of textual data with encoding
    ID
   Universal encoding standard with all languages’
    characters
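
The loss described above can be reproduced with a minimal sketch: text is stored as UTF-8 bytes without an encoding tag, and the receiver guesses a single-byte code page. The sample strings are assumptions made for the example.

```python
original = "caf\u00e9 \u0dc3\u0dd2\u0d82\u0dc4\u0dbd"  # "café" plus "Sinhala" in Sinhala script
stored = original.encode("utf-8")  # bytes travel without an encoding label

wrong = stored.decode("latin-1")   # receiver guesses a single-byte code page
right = stored.decode("utf-8")     # receiver knows the encoding

print(wrong)
print(right)

# English-only data survives the wrong guess, which hides the problem.
ascii_only = "plain English"
print(ascii_only.encode("utf-8").decode("latin-1") == ascii_only)
```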
                                                  22
Encoding space


           An ASCII character is 7 bits wide




                                               23
Encoding space


     Most encodings press the eighth bit into service




                                                        24
Encoding space


     Early versions of Unicode used 16 bits




                                              25
Encoding space


  Unicode now uses 21 bits




                             26
Encoding space

    Plane    Row     Character
   number   number    number




                                 27
Unicode
 21-bit encoding space allows for 1,114,112
  characters
 95,156 code point values assigned to
  characters in Unicode 3.2
 137,216 code point values set aside for
  application use
 2,114 code point values set aside for non-
  character use
 879,626 code point values reserved for future
  character assignments
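
The figures on this slide can be checked with simple arithmetic. The sketch below also splits a code point into plane, row and character number. The helper name `split_code_point` and the dictionary keys are illustrative.

```python
PLANES = 17          # planes 0x00 through 0x10
PER_PLANE = 0x10000  # 256 rows of 256 characters

total = PLANES * PER_PLANE
breakdown = {
    "assigned (Unicode 3.2)": 95_156,
    "private use": 137_216,
    "non-character use": 2_114,
    "reserved": 879_626,
}


def split_code_point(cp):
    """Return (plane, row, character number) for a code point."""
    return cp >> 16, (cp >> 8) & 0xFF, cp & 0xFF


print(total, sum(breakdown.values()))
print((0x10FFFF).bit_length(), "bits for the highest code point")
print(split_code_point(0x0D85), split_code_point(0x10346))
```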



                                                  28
The Unicode Encoding Space

 [Figure sequence: the 17 planes of the encoding space, numbered 0 to 10 in
  hexadecimal, with groups of planes highlighted in turn]
    Plane 0            Basic Multilingual Plane
    Planes 1 to 10     Supplementary Planes
      Plane 1          Supplementary Multilingual Plane
      Plane 2          Supplementary Ideographic Plane
      Plane E          Supplementary Special-Purpose Plane
      Planes F, 10     Private Use Planes

                                     29-33
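The Basic Multilingual Plane
 (rows are labelled by the first hex digit of the code point)
    0-1    General Scripts Area
    2      Symbols Area; CJK punctuation begins
    3      CJK punctuation; Han begins
    4-9    Han
    A      Yi; Hangul begins
    B-C    Hangul
    D      Hangul ends; Surrogates Area
    E      Private Use Area
    F      Private Use Area ends; Compatibility Area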
                                                            34
The General Scripts Area
   00/01                      Latin
   02/03           IPA      Diacriticals         Greek
   04/05            Cyrillic             Armenian Hebrew
   06/07        Arabic             Syriac         Thaana
   08/09                         Devanagari Bengali
  0A/0B Gurmukhi     Gujarati        Oriya        Tamil
  0C/0D Telugu        Kannada Malayalam          Sinhala
  0E/0F    Thai        Lao                 Tibetan
   10/11  Myanmar Georgian                 Hangul
   12/13             Ethiopic                       Cherokee
   14/15         Canadian Aboriginal Syllabics
   16/17          Ogham  Runic      Philippine    Khmer
   18/19 Mongolian
  1A/1B
  1C/1D
  1E/1F          Latin                      Greek
                                                               35
Unicode Coverage
 European scripts
   Latin, Greek, Cyrillic, Armenian, Georgian, IPA
 Bidirectional (Middle Eastern) scripts
   Hebrew, Arabic, Syriac, Thaana
 Indic (Indian and Southeast Asian) scripts
   Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil,
      Telugu, Kannada, Malayalam, Sinhala, Thai, Lao,
      Khmer, Myanmar, Tibetan, Philippine
 East Asian scripts
   Chinese (Han) characters, Japanese (Hiragana and
      Katakana), Korean (Hangul), Yi
 Other modern scripts
   Mongolian, Ethiopic, Cherokee, Canadian Aboriginal
 Historical scripts
   Runic, Ogham, Old Italic, Gothic, Deseret
 Punctuation and symbols
   Numerals, math symbols, scientific symbols, arrows,
      blocks, geometric shapes, Braille, musical notation, etc.
                                                              36
Characters, Glyphs, and Fonts


 In computer terms, a character is a
  grouping of bits (binary ones and
  zeros) in packages of 8: one or more
  bytes
 There are two broad classes of
  characters: data characters and
  control characters


                                         37
Characters, Glyphs, and Fonts


A – Arial
A - Times New Roman
A - Courier new
A – Giddyup Standard
A - Bodoni
A - Papyrus
A - Forte
                                38
Characters, Glyphs, and Fonts


 You can run out of available characters pretty
  quick if you allow all those strange foreign,
  mathematical, scientific, engineering, currency,
  and other symbols


   (Informal Roman)

                                                     39
Unicode properties

 0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;

 Representative
 glyph                  A
              Code point: 0041
              Name: LATIN CAPITAL LETTER A
 Semantic     General category: Uppercase letter (Lu)
 properties   Canonical combining class: Standard spacing (0)
              Bidirectional category: Left-to-right (L)
              Mirrored: no (N)
              Lowercase mapping: 0061
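
The record above is one line of the Unicode Character Database file UnicodeData.txt. As a sketch, it can be split into named fields and compared with what Python's `unicodedata` module reports. The field labels in `FIELDS` are descriptive names chosen for this example.

```python
import unicodedata

FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode1_name", "comment", "uppercase", "lowercase", "titlecase",
]


def parse_record(line):
    """Split one semicolon-separated record into a field dictionary."""
    return dict(zip(FIELDS, line.split(";")))


record = parse_record("0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;")
ch = chr(int(record["code"], 16))

print(record["name"], unicodedata.name(ch))
print(record["general_category"], unicodedata.category(ch))
print(record["bidi_class"], unicodedata.bidirectional(ch))
print(record["lowercase"], f"{ord(ch.lower()):04X}")
```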
                                                        40
Combining characters

      One character…




                       41
Combining characters

      …or two?




                       42
Combining characters
Actually, either.
    Unicode is generative, with accent marks represented
    with their own code point values…

                  é = U+0065 (e) U+0301 (accent)

   …but common combinations of letters and accents are
   also given their own code points for convenience.

                  é = U+00E9


                                                           43
Combining characters

  This can be tough, because the two representations are
  to be treated as absolutely identical.


                      é       =        é
        U+0065 U+0301         =      U+00E9




                                                           44
Combining characters
Things can get really wild for characters with more
than one accent mark:

            ộ = 006F (o) 0302 (circumflex) 0323 (dot)
            ộ = 006F (o) 0323 (dot) 0302 (circumflex)
            ộ = 00F4 (o-circumflex) 0323 (dot)
            ộ = 1ECD (o-dot) 0302 (circumflex)
            ộ = 1ED9 (o-circumflex-dot)
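
In practice the equivalent spellings are compared after normalization. The sketch below uses Python's `unicodedata.normalize` on the five sequences listed above. NFC is the composed form and NFD the fully decomposed form, with combining marks put into canonical order.

```python
import unicodedata

spellings = [
    "\u006f\u0302\u0323",  # o, circumflex, dot below
    "\u006f\u0323\u0302",  # o, dot below, circumflex
    "\u00f4\u0323",        # o-circumflex, dot below
    "\u1ecd\u0302",        # o-dot-below, circumflex
    "\u1ed9",              # precomposed o-circumflex-dot
]

nfc = {unicodedata.normalize("NFC", s) for s in spellings}
nfd = {unicodedata.normalize("NFD", s) for s in spellings}

print(len(set(spellings)), "different code point sequences")
print([f"{ord(c):04X}" for c in next(iter(nfc))])
print([f"{ord(c):04X}" for c in next(iter(nfd))])
```

All five strings differ as stored data but collapse to a single string once normalized, so comparing normalized text treats them as identical.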



                                                        45
Typography -   Complex Vowels Positioning




                                            46
Smart rendering: Arabic
Keyboard:     ba, bab, babi, babib, babibu b  (typed one key at a time)
Code points:  0628 064E 0628 0650 0628 064F 0020 0628
Screen:       [image: the rendered Arabic text at each step]
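
A sketch of what is stored versus what is drawn for the Arabic example. The stored text is only the typed code points, and the renderer picks a contextual glyph for each letter. Unicode's compatibility block of presentation forms is used here purely to list the four shapes of the letter BEH; text is not normally stored with those code points.

```python
import unicodedata

# The code points on this slide, in typing (logical) order.
typed = "\u0628\u064e\u0628\u0650\u0628\u064f\u0020\u0628"
for ch in typed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Isolated, final, initial and medial presentation forms of BEH.
forms = {cp: unicodedata.decomposition(chr(cp)) for cp in range(0xFE8F, 0xFE93)}
for cp, mapping in forms.items():
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)), "->", mapping)
```

Every presentation form maps back to the single character U+0628, which is the only BEH that appears in the stored text.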




                              47
Smart rendering: Burmese

Keyboard:     kr, kru, krui  (typed one key at a time)
Code points:  1000 1039 101B 102F 102D
Screen:       [image: the rendered Burmese text at each step]




                              48
Smart rendering: Tamil
 Keyboard:     Ur rU yU NU mU kU jU
 Code points:  0B8A 0BB0 / 0BB0 0BC2 / 0BAF 0BC2 / 0BA3 0BC2 /
               0BAE 0BC2 / 0B95 0BC2 / 0B9C 0BC2
 Screen:       [image: the rendered Tamil text]




                             49
Typography -   Complex Ligature




                                  50
Canonical equivalence

          01FA
         LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE



          212B 0301
         ANGSTROM SIGN
         COMBINING ACUTE ACCENT

          00C5 0301
         LATIN CAPITAL LETTER A WITH RING ABOVE
         COMBINING ACUTE ACCENT


          0041 030A 0301
         LATIN CAPITAL LETTER A
         COMBINING RING ABOVE
         COMBINING ACUTE ACCENT
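
The chain on this slide can be followed step by step with `unicodedata.decomposition`, which returns one level of mapping at a time. The helper names `canonical_steps` and `canonically_equal` are made up for this sketch, and the step function does not reorder combining marks; `unicodedata.normalize` handles that.

```python
import unicodedata


def canonical_steps(s):
    """Apply one level of canonical decomposition at a time, recording each stage."""
    stages = [s]
    while True:
        out = []
        for ch in s:
            mapping = unicodedata.decomposition(ch)
            if mapping and not mapping.startswith("<"):  # skip compatibility mappings
                out.extend(chr(int(h, 16)) for h in mapping.split())
            else:
                out.append(ch)
        nxt = "".join(out)
        if nxt == s:
            return stages
        stages.append(nxt)
        s = nxt


def canonically_equal(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


forms = ["\u01fa", "\u212b\u0301", "\u00c5\u0301", "A\u030a\u0301"]
for stage in canonical_steps(forms[0]):
    print(" ".join(f"{ord(c):04X}" for c in stage))
print(all(canonically_equal(forms[0], f) for f in forms))
```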
                                                      51
Case mapping

 Case mapping may produce strings of different length



      ǰ → J̌           01F0 → 004A 030C
 Case mapping may depend on the locale


   English          i → I      0069 → 0049



   Turkish/Azeri    i → İ      0069 → 0130
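
Both points can be tried in Python. `str.upper` applies the default, locale-independent Unicode mappings, so the Turkish/Azeri rule has to be added by hand. `turkish_upper` below is a hypothetical helper written for this sketch, not a library function.

```python
def turkish_upper(s):
    """Hypothetical helper: uppercase using the Turkish/Azeri dotted-i rule."""
    return s.replace("i", "\u0130").upper()


small_j_caron = "\u01f0"
upper = small_j_caron.upper()
print(len(small_j_caron), "->", len(upper), [f"{ord(c):04X}" for c in upper])

print(f"{ord('i'.upper()):04X}")        # default mapping
print(f"{ord(turkish_upper('i')):04X}")  # Turkish/Azeri mapping
```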


                                                         52
Typography –                Unicode Sinhala
                                 1998 – Unicode Ver. 3.0 Sinhala
1987- Unicode Ver. 1.0 Sinhala




                                                         54
Typography -                 Complex Ligature




 ttha in Devanagari   ttha in Tamil   Tva in Malayalam   Tva in Sinhala


                                                                 55
Typography -                 Complex Ligature

 U+200C UTF8 E2 80 8C            U+200D UTF8 E2 80 8D




Tva with ZWNJ in Malayalam     Tva with ZWJ in Malayalam




Tva with ZWNJ in Sinhala       Tva with ZWJ in Sinhala



                                                           56
Typography -                          Complex Ligature-UTF 8

U+0000    .. U+007F     1 byte    0xxx xxxx
U+0080    .. U+07FF     2 bytes   110x xxxx 10xx xxxx
U+0800    .. U+FFFF     3 bytes   1110 xxxx 10xx xxxx 10xx xxxx
U+10000   .. U+10FFFF   4 bytes   1111 0xxx 10xx xxxx 10xx xxxx   10xx xxxx




 U+0026       AMPERSAND (decimal 38)
 U+0D85       SINHALA LETTER AYANNA (decimal 3,461)
 U+4E2D       CJK UNIFIED IDEOGRAPH-4E2D (decimal 20,013)
 U+10346      GOTHIC LETTER FAIHU (decimal 66,374)
 U+0E12       THAI CHARACTER THO PHUTHAO (decimal 3,602)
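
The bit layout in the table above can be turned into a small encoder. This is a sketch for illustration: the function name `utf8_encode` is made up, and it does not reject surrogate code points (U+D800..U+DFFF), which a real encoder must refuse. The results are compared with Python's built-in encoder.

```python
def utf8_encode(cp):
    """Encode one code point using the UTF-8 bit patterns from the table."""
    if cp <= 0x7F:
        return bytes([cp])
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x0026, 0x0D85, 0x4E2D, 0x10346, 0x0E12):
    encoded = utf8_encode(cp)
    print(f"U+{cp:04X}", encoded.hex(" "), encoded == chr(cp).encode("utf-8"))
```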

                                                                              57
Typography -        Complex Ligature




     Preventing Conjunct Forms in Devanagari




            Half-Consonants in Devanagari
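
A sketch of the code point sequences behind these two Devanagari cases, assuming the usual Unicode convention: consonant + virama + consonant asks for a conjunct, inserting ZWNJ after the virama prevents the conjunct, and inserting ZWJ asks for the half-consonant form. The letters KA and SSA are example choices, and the shapes actually shown depend on the font and rendering engine.

```python
import unicodedata

KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"
ZWNJ, ZWJ = "\u200c", "\u200d"

sequences = {
    "conjunct (default)": KA + VIRAMA + SSA,
    "conjunct prevented (ZWNJ)": KA + VIRAMA + ZWNJ + SSA,
    "half-consonant (ZWJ)": KA + VIRAMA + ZWJ + SSA,
}

for label, s in sequences.items():
    print(label, " ".join(f"{ord(c):04X}" for c in s), s)

for ch in (VIRAMA, ZWNJ, ZWJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```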
                                               58
Typography -   Complex Ligature




       Buddha in Sinhala


                                  59
Typography -                         Complex Ligature in DB
<html>
 <head>
  <meta charset="utf-8">
  <title>සිංහල</title></head>
 <body>
  <?php
   // connection.php is assumed to create a mysqli connection in $conn
   include("connection.php");
   // Exchange data with the server as UTF-8 (the main trick)
   $conn->set_charset("utf8");
   $result = $conn->query("SELECT * FROM sinhala");
   while ($myrow = $result->fetch_row())
   {
       echo htmlspecialchars($myrow[0]);
   }
  ?>
 </body>
</html>

-- The dump for my database storing Sinhala UTF-8 strings is
CREATE TABLE `sinhala` (
  `data` varchar(1000) character set utf8 collate utf8_bin default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `sinhala` VALUES ('අම්මා');

                                                                          60
Typography
 Typical typefaces (fonts) and type styles used
  in word processors
Typefaces
  Serif typefaces:       Times New Roman, Courier, Palatino
  Sans serif typefaces:  Arial, Impact, Arial Narrow
  Special typefaces:     symbol, free hand
Crazy fonts can be distracting!
Type styles:  Bold, Italics, Outline
                                                       61
Typography

 Special effects
   Kerning increases or decreases the spacing
    between certain pairs of letters to improve
    their appearance.
   Line spacing, or leading
   Orientation
   Anti-aliasing smooths the edges of text. The
    edges blend into the background, so the text
    looks cleaner and is more readable when it is
    large.
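
Kerning as described above can be sketched as a per-pair adjustment added to the normal letter widths. All numbers below are hypothetical font units invented for the example; real advance widths and kerning pairs come from the font file.

```python
# Hypothetical metrics in font units; real values come from the font file.
ADVANCE = {"A": 680, "V": 660, "T": 600, "o": 560}
KERN = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -70}


def text_width(text, kerning=True):
    """Sum the advance widths, optionally adding the pair adjustments."""
    width = sum(ADVANCE[ch] for ch in text)
    if kerning:
        width += sum(KERN.get(pair, 0) for pair in zip(text, text[1:]))
    return width


print(text_width("AVA", kerning=False), text_width("AVA"))
print(text_width("To", kerning=False), text_width("To"))
```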



                                                  62
Typography


Ascender height
Cap Height



X height

Base line


Descender height




                   63
Typography - Tracking & Kerning




                                  64
Typography - Orientation




                           65
Typography – Anti-alias




                          66
Typography
  Special effects cont..
    strokes, fills, effects and styles
     to text

stroke      fill        effect     style




                                          67
Typography
 Special effects cont..
   Attaching text to a path




                               68
Typography
 Special effects cont..
   Converting text to path :
   Text converted to paths retains all
    of its visual attributes, but you
    can edit it only as paths.




                                         69
Typography
 Bitmap Font
 Vector Font
   TrueType
    Fast, standard, for
    computer screens and printers
   Adobe Type 1
    Precise, professional, used
    for publishing
   [Figure: screen from “Fontographer”]
 Anti-aliased small fonts
   For LCD screens
    ClearType etc.
   [Figure: normal vs. optimized rendering]
                                                    70
Text- Cross-media Technology
 Voice Recognition
   Converts voice (sound data) to text data
   Needs real-time processing
   Specific speaker/Non-specific speaker
 Text-to-Speech (Speech Synthesis)
   Computer “dictates” text data
    Automatic information services/New
    mail dictation.


                                            71
Text- Cross-media Technology cont…

 Optical Character Recognition
   Converts a bitmap image of text to real text
    data
   Used with an image scanner
 Handwriting Recognition
   Similar to OCR, but uses writing
    order/direction for better recognition.
   Used in PIM (Personal Information
    Manager) devices (palmtop computers)


                                          72
Text- Cross-media Technology cont…
 Machine Translation
   All text-based techniques are language
    dependent
   Needs automatic translation
   Vertical market – technical document translation
    Personal market – Web browsing
   Combination of media technologies
   Automatically translate international telephone
    messages:
    Japanese voice
      → Japanese voice recognition → Japanese text data
      → machine translation → English text data
      → English speech synthesis → English voice
                                                              73
File Format

 .TXT - unformatted text (e.g. Notepad)
 .DOC - developed by Microsoft (e.g. MS
  Word)
 .RTF - Rich Text Format
 .PDF - Portable Document Format
  (Adobe)
 .PS - PostScript, a page description
  language used mainly for desktop
  publishing
                                     74

IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 

Dernier (20)

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 

Multimedia Technology - text

  • 1. Multimedia Technology: Text. S T Nandasara, ADMTC/UCSC
  • 3. World of Languages – Asian Countries. Source: Ethnologue: Languages of the World (the exact number of languages may never be determined exactly)
  • 4. World of Languages – Asian region (half of the world’s languages are spoken in only eight countries)
  • 5. World of Languages – Asian Countries
      Country       Number of Languages   Country Population   Official or National Languages
      Indonesia     742                   245,452,739          Indonesian
      India         427                   1,095,351,995        Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Marwari, Nepali, Oriya, Panjabi, Sanskrit, Sindhi, Tamil, Telugu, Urdu
      China         241                   1,313,973,713        Chinese, Zhuang, Uighur, Hmong, Hani
      Philippines   180                   89,468,677           Filipino, English
      Malaysia      147                   24,385,858           Malay
      Nepal         125                   28,287,147           Nepali, Gurung, Tamang
      Myanmar       109                   47,382,633           Burmese
      Vietnam       93                    84,402,966           Vietnamese
      Laos          82                    6,368,481            Lao
      Thailand      75                    64,631,595           Thai
      Iran          74                    68,688,433           Arabic, Farsi
      Pakistan      69                    165,803,560          Urdu, Panjabi, Sindhi, English
      Afghanistan   45                    31,056,997           Dari, Pashto
      Bangladesh    38                    147,365,352          Bengali
      Bhutan        24                    2,279,723            Dzongkha
      Iraq          23                    26,783,383           Arabic, Kurdi
      Cambodia      19                    13,881,427           Khmer
      Brunei        17                    379,444              Malay, English
      Mongolia      12                    2,832,224            Halh Mongolian
      Sri Lanka     8                     20,222,240           Sinhala, Tamil, English
  • 6. World of Languages – Script Diversity
      • Three major types of script in South, South East and East Asia
          • In East Asia: Chinese ideographic scripts
          • In South Asia, around the Indian subcontinent and part of South Asia: scripts influenced by Brahmi
          • In part of South East Asia and Australasia: Roman script
      • Two major types of script in West and Central Asia
          • In Central Asia: historically Arabic, later changed to Cyrillic
          • In Western Asia: Arabic script is widely used
      • One major type of script in Europe and the West
          • Roman script
  • 7. World of Languages – Script in Asia
      Language                        Speakers       Native name
      Chinese (Mandarin)              885,000,000    普通話
      English                         322,000,000    English
      Arabic (Alarabia)               280,000,000    العربية
      Bengali                         196,000,000    বাংলা
      Hindi                           182,000,000    हिन्दी
      Portuguese (Português)          182,000,000    português
      Indonesian                      140,000,000    Bahasa Indonesia
      Japanese (Nihongo)              125,000,000    日本語
      Hankuko (Korean)                75,000,000     한국어 [韓國語]
      Telugu                          73,000,000     తెలుగు
      Vietnamese                      66,897,000     Tiếng Việt
      Marathi                         64,783,000     मराठी
      Tamil                           62,000,000     தமிழ்
      Turkish (Türkçe)                59,000,000     Türkçe
      Urdu                            54,000,000     اردو
      Gujarati                        44,000,000     ગુજરાતી
      Malayalam                       34,014,000     മലയാളം
      Kannada                         33,663,000     ಕನ್ನಡ
      Punjabi/Panjabi                 25,700,000     ਪੰਜਾਬੀ / پنجابی
      Thai                            21,000,000     ภาษาไทย
      Sindhi                          19,675,000     سنڌي
      Uzbek (Cyrillic)                18,386,000     Ўзбек
      Bahasa Melayu (Malay)           17,600,000     Bahasa melayu
      Nepali                          16,200,000     नेपाली
      Filipino (Tagalog)              14,850,000     Tagalog
      Assamese                        14,604,000     অসমীয়া
      Azeri/Azerbaijani (Cyrillic)    13,869,000     Азәрбајҹан дили
      Sinhala                         13,218,000     සිංහල
      Zhuang                          10,000,000     Saw cuengh
      Pashto/Pakhto                   9,585,000      پښتو
      Kazakh                          8,000,000      Қазақ / قازاق
      Uighur (Uyghur)                 7,464,000      Уйғур / ئۇيغۇر
      Khmer                           7,063,200      ភាសាខ្មែរ
      Dari                            7,000,000      دری
      Tatar                           7,000,000      татарча / تاتارچا
      Turkmen                         5,397,500      түркменче
      Kashmiri                        4,381,000      कॉशुर / كٲشُر
      Lao                             4,000,000      ພາສາລາວ
      Balinese                        3,800,000      Bahasa Bali
      Kyrgyz                          2,631,420      Кыргыз
      Fijian                          650,000        vaka-Viti
      Maldivian Dhivehi               280,000        ދިވެހި
      Sanskrit                        194,433        संस्कृतम्
      Tahitian                        150,000        Te Reo Tahiti
      Maori                           70,000         Te Reo Māori
      Hawaiian                        8,000          Ōlelo Hawai'i
  • 8. World of Languages – Script in Asia
  • 9. Nature of Text
      • The most basic media.
      • Easiest to generate, store and transfer on a PC.
      • Still the best for complex explanation.
          • Using structured text/hypertext
      • Lightweight
          • Smallest sized media
      • Static
      • Language dependent (biggest problem)
  • 10. Text – Digital Form
      • Input (creation): keyboard; handwriting (handwriting recognition); printed documents (Optical Character Recognition, OCR); human voice (voice recognition)
      • Digital form: text data (character code)
          • ASCII: 7 bit, usually stored in an 8-bit byte
          • Unicode: originally 16 bit
          • Universal Character Set: 32 bit
      • Output: typeface (bitmap font, vector font); handwriting; voice (text-to-speech)
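As a rough illustration of the code widths listed on this slide, the Python sketch below encodes one Latin and one Sinhala character and counts the bytes each encoding needs. The helper name `encoded_sizes` and the choice of sample characters are assumptions made for illustration; they do not come from the slides.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding."""
    sizes = {}
    for codec in ("ascii", "utf-16-be", "utf-32-be"):
        try:
            sizes[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None  # the character does not exist in this code
    return sizes


print(encoded_sizes("A"))       # Latin letter, present in ASCII
print(encoded_sizes("\u0d85"))  # SINHALA LETTER AYANNA, absent from ASCII
```

The Sinhala letter cannot be written in ASCII at all, which is the "language dependent" problem in its simplest form.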
  • 11. Indexing and Hypertext
      • Large text data
      • Indexing
          • Rapid random access/search method for large text data
          • Essential for reference-type applications: dictionary, encyclopedia, etc.
      • Hypertext
          • Non-sequential navigation structure for large text data
          • Used in Web pages (HTML)
      • [Figure: a page of sample text with an alphabetical index (tabs a to e; entries such as ad, adjust, adorn, am, bi, bot, by)]
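The indexing idea on this slide can be sketched as a small inverted index: each word maps to the positions where it occurs, so a lookup no longer has to scan the whole text. This is a minimal sketch; the function name `build_index` and the sample sentence are assumptions for illustration only.

```python
from collections import defaultdict


def build_index(text: str) -> dict:
    """Map each lower-cased word to the list of word positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        index[word.strip(".,;:!?")].append(position)
    return dict(index)


sample = "Text is the most basic media. Text is still the best for complex explanation."
index = build_index(sample)

# Random access: jump straight to the occurrences of a word.
print(index["text"])
print(index.get("hypertext", []))  # a word that never occurs
```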
  • 12. Hypertext, Hypermedia and Multimedia
      • [Figure: overlapping circles labelled Multimedia and Hypertext, with Hypermedia as the overlap]
      • A hypermedia system includes the non-linear information links of hypertext systems and the continuous and discrete media of multimedia systems.
  • 13. Typography
      • Until the end of the 14th century, all writing was done by hand.
      • Typography: the design of the characters that make up text and display type, and the way they are configured on the page.
      • Modern software allows rotating or distorting type and wrapping it around images.
  • 14. Typography – Evolution of Asian Scripts
      • Timeline: 3rd century BC, 1st century, 3rd century, 6th century, 8th century (Pallava), 10th century, 12th century, modern (sample letter: ණ)
      • Scripts shown: Kannada, Tamil, Sinhala, Devanagari, Gujarati, Bengali, Oriya, Telugu, Malayalam, Panjabi
  • 15. Typography – Complex Scripts: Bengali, Devanagari, Gujarati, Kannada, Malayalam, Telugu, Sinhala, Tamil, Ranjana, Gurmukhi, Oriya, Tibetan, Khmer, Lao, Thai, Jawani, Thaana, Bagini, Sanskrit
  • 16. Typography - Complex Vowels
  • 17. Typography – ASCII & EBCDIC [Figures: ASCII and EBCDIC code tables]
  • 18. Typography – 8 Bit English and Sinhala: 1989 - SLASCII, Wadan Tharuwa, SBIOS
  • 19. The Code Page Problem
      • Characters in most languages are traditionally represented by single-byte values
          • Allows for 256 characters max
          • Real limit for most encodings is 192 characters
          • This includes letters, digits, punctuation, symbols
      • When a system is used for a new language, the encoding has to be adapted to use that language’s characters
      • Encodings proliferate
          • Each language or group of languages gets its own encoding
          • Different vendors or standards committees devise different encodings, so generally each language has several, often incompatible, encodings
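The proliferation of single-byte encodings means that one and the same byte stands for different characters depending on the code page assumed. The sketch below decodes the byte 0xE9 under three code pages available in Python's standard codecs; the choice of byte and of code pages is an assumption made for illustration.

```python
raw = bytes([0xE9])  # one byte, with no encoding label attached

decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1251", "iso8859_7")}

for codec, ch in decoded.items():
    # Same byte value, different character in each code page.
    print(f"{codec:10} U+{ord(ch):04X} {ch}")
```

Under Latin-1 the byte is a Latin letter with an acute accent, under Windows-1251 a Cyrillic letter, and under ISO 8859-7 a Greek letter.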
  • 20. Multi-byte encodings
      • Some languages (Chinese, Japanese, Korean, etc.) have more than 256 characters
      • Encoding standards for these languages use sequences of bytes for many characters
      • In many standards, not all characters are the same number of bytes
          • Can’t tell whether a given byte is a whole character or part of a character
          • Corruption of one byte can corrupt the whole data stream
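A short Python sketch of the ambiguity described on this slide, using Shift_JIS as one example of a legacy multi-byte encoding (the choice of encoding and of the sample string is an assumption). The second byte of the kanji 表 has the same value as the ASCII backslash, so a byte-level search finds a "character" that is not in the text, and losing one byte makes the data undecodable.

```python
text = "A表"
data = text.encode("shift_jis")

# 'A' takes one byte, the kanji takes two: 41 95 5c
print(" ".join(f"{b:02x}" for b in data))

# The trailing byte 0x5C is also the ASCII backslash.
looks_like_backslash = b"\\" in data
really_has_backslash = "\\" in text

# Dropping the last byte leaves half a character behind.
try:
    data[:-1].decode("shift_jis")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False

print(looks_like_backslash, really_has_backslash, truncated_ok)
```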
  • 22. Interoperability problems
      • Can’t easily mix languages in a document or system
      • Data not tagged with encoding, so loss can occur when transferring between systems
      • Most encodings are ASCII-based, so problems often not seen with English-only data
      • Two possible solutions:
          • Systematic tagging of textual data with encoding ID
          • Universal encoding standard with all languages’ characters
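The untagged-data problem and the tagging solution can be sketched in a few lines of Python. The dictionary layout with `charset` and `body` keys is a hypothetical message format invented for this example, not a real protocol.

```python
original = "résumé සිංහල"
payload = original.encode("utf-8")

# Untagged data: the receiver guesses Latin-1. No error is raised,
# but every non-ASCII character is silently damaged.
wrong = payload.decode("latin-1")

# Tagged data: the encoding ID travels with the bytes (hypothetical format).
message = {"charset": "utf-8", "body": payload}
right = message["body"].decode(message["charset"])

print(wrong[:8])   # the ASCII letters survive, the accented ones do not
print(right == original)
```

The ASCII letters come through intact in both cases, which is why such problems often go unnoticed with English-only data.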
  • 23. Encoding space: an ASCII character is 7 bits wide
  • 24. Encoding space: most encodings press the eighth bit into service
  • 25. Encoding space: early versions of Unicode used 16 bits
  • 26. Encoding space: Unicode now uses 21 bits
  • 27. Encoding space: plane number, row number, character number
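The plane/row/character split of a 21-bit code point can be sketched with shifts and masks. The function name `split_code_point` is an assumption for illustration; the 8-bit row and 8-bit character fields follow from a 65,536-character plane.

```python
def split_code_point(cp: int) -> tuple:
    """Split a code point into (plane, row, character) numbers."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode encoding space")
    plane = cp >> 16          # top 5 bits: plane 0x00 to 0x10
    row = (cp >> 8) & 0xFF    # next 8 bits: row within the plane
    cell = cp & 0xFF          # low 8 bits: character within the row
    return plane, row, cell


for cp in (0x0D85, 0x10346):
    plane, row, cell = split_code_point(cp)
    print(f"U+{cp:04X}: plane {plane:X}, row {row:02X}, character {cell:02X}")
```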
  • 28. Unicode
      • 21-bit encoding space allows for 1,114,112 characters
      • 95,156 code point values assigned to characters in Unicode 3.2
      • 137,216 code point values set aside for application use
      • 2,114 code point values set aside for non-character use
      • 879,626 code point values reserved for future character assignments
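The figures on this slide can be checked with a little arithmetic: 17 planes of 65,536 code points each give the total, and the four categories add up to it. The variable names below are assumptions for illustration; the numbers are the ones quoted on the slide.

```python
PLANES = 17                      # planes 0x00 through 0x10
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE

# Breakdown as quoted on the slide (Unicode 3.2).
breakdown = {
    "assigned to characters": 95_156,
    "application (private) use": 137_216,
    "non-character use": 2_114,
    "reserved for future assignment": 879_626,
}

print(f"{total:,}")
print(sum(breakdown.values()) == total)
```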
  • 29. The Unicode Encoding Space: planes 0 to 10 (hexadecimal); plane 0 is the Basic Multilingual Plane
  • 30. The Unicode Encoding Space: planes 1 to 10 are the Supplementary Planes
  • 31. The Unicode Encoding Space: plane 1 is the Supplementary Multilingual Plane, plane 2 the Supplementary Ideographic Plane, plane E the Supplementary Special-Purpose Plane
  • 32. The Unicode Encoding Space: planes F and 10 are the Private Use Planes
  • 33. The Unicode Encoding Space: planes 0 to 10 (hexadecimal); plane 0 is the Basic Multilingual Plane
  • 34. The Basic Multilingual Plane (areas by leading hexadecimal digit of the code point, as drawn on the slide)
      • 0–1: General Scripts Area
      • 2: Symbols Area
      • 3: CJK Punct.
      • 4–9: Han
      • A: Yi
      • B–C: Hangul
      • D: Surrogates Area
      • E: Private Use Area
      • F: Compatibility Area
  • 35. The General Scripts Area (by row pairs)
      • 00/01: Latin
      • 02/03: IPA, Diacriticals, Greek
      • 04/05: Cyrillic, Armenian, Hebrew
      • 06/07: Arabic, Syriac, Thaana
      • 08/09: Devanagari, Bengali
      • 0A/0B: Gurmukhi, Gujarati, Oriya, Tamil
      • 0C/0D: Telugu, Kannada, Malayalam, Sinhala
      • 0E/0F: Thai, Lao, Tibetan
      • 10/11: Myanmar, Georgian, Hangul
      • 12/13: Ethiopic, Cherokee
      • 14/15: Canadian Aboriginal Syllabics
      • 16/17: Ogham, Runic, Philippine, Khmer
      • 18/19: Mongolian
      • 1A/1B, 1C/1D: (no scripts labelled)
      • 1E/1F: Latin, Greek
  • 36. Unicode Coverage
      • European scripts
          • Latin, Greek, Cyrillic, Armenian, Georgian, IPA
      • Bidirectional (Middle Eastern) scripts
          • Hebrew, Arabic, Syriac, Thaana
      • Indic (Indian and Southeast Asian) scripts
          • Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Khmer, Myanmar, Tibetan, Philippine
      • East Asian scripts
          • Chinese (Han) characters, Japanese (Hiragana and Katakana), Korean (Hangul), Yi
      • Other modern scripts
          • Mongolian, Ethiopic, Cherokee, Canadian Aboriginal
      • Historical scripts
          • Runic, Ogham, Old Italic, Gothic, Deseret
      • Punctuation and symbols
          • Numerals, math symbols, scientific symbols, arrows, blocks, geometric shapes, Braille, musical notation, etc.
  • 37. Characters, Glyphs, and Fonts
      • In computer terms, a character is a grouping of bits (binary ones and zeros) in packages of 8: one or more bytes
      • There are two broad classes of characters: data characters and control characters
  • 38. Characters, Glyphs, and Fonts: the letter A shown in Arial, Times New Roman, Courier New, Giddyup Standard, Bodoni, Papyrus and Forte
  • 39. Characters, Glyphs, and Fonts
      • You can run out of available characters pretty quickly if you allow all those strange foreign, mathematical, scientific, engineering, currency, and other symbols (set in Informal Roman)
  • 40. Unicode properties
      • Database entry: 0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
      • Representative glyph: A
      • Code point: 0041
      • Name: LATIN CAPITAL LETTER A
      • Semantic properties
          • General category: Uppercase letter (Lu)
          • Canonical combining class: Standard spacing (0)
          • Bidirectional category: Left-to-right (L)
          • Mirrored: no (N)
          • Lowercase mapping: 0061
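The same properties can be read back from the Unicode character database that ships with Python, through the standard `unicodedata` module. This is a minimal sketch; the dictionary layout and key names are assumptions for illustration, and the values depend on the Unicode version bundled with the interpreter (they are stable for this character).

```python
import unicodedata

ch = "\u0041"

props = {
    "name": unicodedata.name(ch),
    "general category": unicodedata.category(ch),
    "combining class": unicodedata.combining(ch),
    "bidirectional": unicodedata.bidirectional(ch),
    "mirrored": bool(unicodedata.mirrored(ch)),
    "lowercase mapping": f"{ord(ch.lower()):04X}",
}

for key, value in props.items():
    print(f"{key}: {value}")
```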
  • 41. Combining characters: one character…
  • 42. Combining characters: …or two?
  • 43. Combining characters
      • Actually, either. Unicode is generative, with accent marks represented with their own code point values…
          • é = U+0065 (e) U+0301 (accent)
      • …but common combinations of letters and accents are also given their own code points for convenience.
          • é = U+00E9
  • 44. Combining characters
      • This can be tough, because the two representations are to be treated as absolutely identical.
          • é = U+0065 U+0301 = U+00E9
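In practice the two representations compare as different strings unless both are normalized first. The sketch below uses `unicodedata.normalize` from the Python standard library; the helper name `canonically_equal` is an assumption for illustration.

```python
import unicodedata

decomposed = "e\u0301"   # U+0065 followed by U+0301
precomposed = "\u00e9"   # U+00E9


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(decomposed == precomposed)                   # raw comparison
print(canonically_equal(decomposed, precomposed))  # comparison after normalization
```

NFC prefers the precomposed code point and NFD the decomposed sequence; either works for comparison as long as both sides use the same form.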
  • 45. Combining characters
      • Things can get really wild for characters with more than one accent mark:
          • ộ = 006F (o) 0302 (circumflex) 0323 (dot)
          • ộ = 006F (o) 0323 (dot) 0302 (circumflex)
          • ộ = 00F4 (o-circumflex) 0323 (dot)
          • ộ = 1ECD (o-dot) 0302 (circumflex)
          • ộ = 1ED9 (o-circumflex-dot)
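All five spellings listed on this slide collapse to a single string under normalization, including the two that differ only in the order of the marks. The sketch below checks this with the standard `unicodedata` module; the list name `spellings` is an assumption for illustration.

```python
import unicodedata

spellings = [
    "o\u0302\u0323",   # o, circumflex, dot below
    "o\u0323\u0302",   # o, dot below, circumflex
    "\u00f4\u0323",    # o-circumflex, dot below
    "\u1ecd\u0302",    # o-dot-below, circumflex
    "\u1ed9",          # o-circumflex-dot-below
]

nfc = {unicodedata.normalize("NFC", s) for s in spellings}
nfd = {unicodedata.normalize("NFD", s) for s in spellings}

# One element in each set means all five spellings are canonically equivalent.
print([f"{ord(c):04X}" for c in next(iter(nfc))])
print([f"{ord(c):04X}" for c in next(iter(nfd))])
```

In the decomposed form the dot below is ordered before the circumflex, because marks are sorted by combining class during normalization.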
  • 46. Typography - Complex Vowels Positioning
  • 47. Smart rendering: Arabic
      • Keyboard (successive keystrokes): ba, bab, babi, babib, babibu, babibu b
      • Code points: 0628 064E 0628 0650 0628 064F 0020 0628
      • Screen: [figure of the rendered text]
  • 48. Smart rendering: Burmese
      • Keyboard (successive keystrokes): kr, kru, krui
      • Code points: 1000 1039 101B 102F 102D
      • Screen: [figure of the rendered text]
  • 49. Smart rendering: Tamil
      • Keyboard: Ur, rU, yU, NU, mU, kU, jU
      • Code points: 0B8A 0BB0; 0BB0 0BC2; 0BAF 0BC2; 0BA3 0BC2; 0BAE 0BC2; 0B95 0BC2; 0B9C 0BC2
      • Screen: [figure of the rendered text]
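The point of smart rendering is that the stored text stays a plain sequence of code points (consonant, then vowel sign) while the font and rendering engine decide the shape on screen. As a small sketch, the Python standard `unicodedata` module can show what is actually stored for one of the pairs listed above, 0B95 0BC2; the variable names are assumptions for illustration.

```python
import unicodedata

stored = "\u0b95\u0bc2"  # the pair listed as 0B95 0BC2 (kU)

names = [unicodedata.name(ch) for ch in stored]
categories = [unicodedata.category(ch) for ch in stored]

for ch, name, cat in zip(stored, names, categories):
    # One letter plus one combining vowel sign; the joined shape is a rendering matter.
    print(f"U+{ord(ch):04X} {name} ({cat})")
```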
  • 50. Typography - Complex Ligature
  • 51. Canonical equivalence
      • 01FA: LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
      • 212B 0301: ANGSTROM SIGN, COMBINING ACUTE ACCENT
      • 00C5 0301: LATIN CAPITAL LETTER A WITH RING ABOVE, COMBINING ACUTE ACCENT
      • 0041 030A 0301: LATIN CAPITAL LETTER A, COMBINING RING ABOVE, COMBINING ACUTE ACCENT
  • 52. Case mapping
      • Case mapping may produce strings of different length
          • 01F0 → 004A 030C
      • Case mapping may depend on the locale
          • English: 0069 → 0049
          • Turkish/Azeri: 0069 → 0130
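Both effects can be seen from Python. The built-in `str.upper` applies the locale-independent Unicode mappings, so the length change shows up directly, while the Turkish/Azeri rule has to be added by hand. The helper `upper_turkic` below is a hypothetical illustration of that rule, not a library function.

```python
# Length change: U+01F0 has no single-code-point uppercase form.
upper = "\u01f0".upper()
print([f"{ord(c):04X}" for c in upper])


def upper_turkic(s: str) -> str:
    """Uppercase with the Turkish/Azeri rule: dotted i maps to dotted capital I (U+0130)."""
    return s.replace("i", "\u0130").upper()


print(f"{ord('i'.upper()):04X}")          # default (English-style) mapping
print(f"{ord(upper_turkic('i')):04X}")    # Turkish/Azeri mapping
```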
  • 54. Typography – Unicode Sinhala
      • 1998: Unicode Ver. 3.0 Sinhala
      • 1987: Unicode Ver. 1.0 Sinhala
  • 55. Typography - Complex Ligature
      • ttha in Devanagari
      • ttha in Tamil
      • Tva in Malayalam
      • Tva in Sinhala
  • 56. Typography - Complex Ligature
      • U+200C (ZWNJ), UTF-8: E2 80 8C
      • U+200D (ZWJ), UTF-8: E2 80 8D
      • Tva with ZWNJ in Malayalam
      • Tva with ZWJ in Malayalam
      • Tva with ZWNJ in Sinhala
      • Tva with ZWJ in Sinhala
  • 57. Typography - Complex Ligature - UTF-8
      Code point range        Length    Byte pattern
      U+0000 .. U+007F        1 byte    0xxx xxxx
      U+0080 .. U+07FF        2 bytes   110x xxxx  10xx xxxx
      U+0800 .. U+FFFF        3 bytes   1110 xxxx  10xx xxxx  10xx xxxx
      U+10000 .. U+10FFFF     4 bytes   1111 0xxx  10xx xxxx  10xx xxxx  10xx xxxx
      Examples:
      • U+0026 AMPERSAND (decimal 38)
      • U+0D85 SINHALA LETTER AYANNA (decimal 3,461)
      • U+4E2D CJK UNIFIED IDEOGRAPH-4E2D (decimal 20,013)
      • U+10346 GOTHIC LETTER FAIHU (decimal 66,374)
      • U+0E12 THAI CHARACTER THO PHUTHAO (decimal 3,602)
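The byte patterns in the table translate directly into shifts and masks. The sketch below is a hand-written encoder that follows the four rows and is compared against Python's built-in UTF-8 codec for the example characters; the function name `utf8_encode` is an assumption, and surrogate code points (U+D800 to U+DFFF) are deliberately not handled.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the four rows of the UTF-8 table."""
    if cp < 0x80:                      # 0xxx xxxx
        return bytes([cp])
    if cp < 0x800:                     # 110x xxxx 10xx xxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110 xxxx 10xx xxxx 10xx xxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("outside the Unicode encoding space")


for cp in (0x0026, 0x0D85, 0x4E2D, 0x10346, 0x0E12):
    encoded = utf8_encode(cp)
    print(f"U+{cp:04X} ->", " ".join(f"{b:02X}" for b in encoded))
```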
  • 58. Typography - Complex Ligature
      • Preventing conjunct forms in Devanagari
      • Half-consonants in Devanagari
  • 59. Typography - Complex Ligature: Buddha in Sinhala
  • 60. Typography - Complex Ligature in DB
      <html>
      <head><title>සිංහල</title></head>
      <body>
      <?php
      include("connection.php");                 // simple connection setting
      // Note: the mysql_* functions used here were removed in PHP 7;
      // mysqli or PDO is the current equivalent.
      $result = mysql_query("SET NAMES utf8");   // the main trick: talk to the server in UTF-8
      $cmd = "select * from sinhala";
      $result = mysql_query($cmd);
      while ($myrow = mysql_fetch_row($result)) {
          echo ($myrow[0]);                      // print the stored Sinhala string
      }
      ?>
      </body>
      </html>

      -- The dump for my database storing Sinhala UTF strings is:
      CREATE TABLE `sinhala` (
        `data` varchar(1000) character set utf8 collate utf8_bin default NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      INSERT INTO `sinhala` VALUES ('අම්මා');
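The same round trip can be tried without a MySQL server. The sketch below swaps in SQLite through Python's standard `sqlite3` module, which is an assumption made so that the example is self-contained; it is not the setup used on the slide. It stores the Sinhala string, reads it back, and confirms that the bytes held by the database are UTF-8.

```python
import sqlite3

word = "අම්මා"

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE sinhala (data TEXT)")
conn.execute("INSERT INTO sinhala VALUES (?)", (word,))  # parameter binding, no quoting issues

rows = [row[0] for row in conn.execute("SELECT data FROM sinhala")]
stored_bytes = conn.execute("SELECT CAST(data AS BLOB) FROM sinhala").fetchone()[0]
conn.close()

print(rows[0] == word)
print(" ".join(f"{b:02X}" for b in stored_bytes))
```

Each of the five Sinhala code points occupies three bytes, in line with the three-byte UTF-8 range U+0800 to U+FFFF.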
  • 61. Typography
      • Typical typefaces (fonts) and type styles used in word processors
      • Typefaces
          • Serif typefaces: Times New Roman, Courier, Palatino
          • Sans serif typefaces: Arial, Impact, Arial Narrow
          • Special typefaces: Symbol, free hand
          • Crazy fonts can be distracting!
      • Type styles: Bold, Italics, Outline
  • 62. Typography
      • Special effects
          • Kerning increases or decreases the spacing between certain pairs of letters to improve their appearance.
          • Line spacing or leading
          • Orientation
          • Anti-alias: to smooth out a text edge. This makes the edges of the text blend into the background so that the text is cleaner and more readable when it is large.
  • 63. Typography: ascender height, cap height, x-height, baseline, descender height
  • 64. Typography - Tracking & Kerning
  • 67. Typography
      • Special effects cont.
          • Applying strokes, fills, effects and styles to text [Figure labels: stroke, fill, effect, style]
  • 68. Typography
      • Special effects cont.
          • Attaching text to a path
  • 69. Typography
      • Special effects cont.
          • Converting text to path: text converted to paths retains all of its visual attributes, but you can edit it only as paths.
  • 70. Typography
      • Bitmap font
      • Vector font
          • TrueType: fast, standard, for computer screen and printer
          • Adobe Type 1: precise, professional, used for publishing
          • [Figure: screen from “Fontographer”]
      • Anti-aliased small font
          • For LCD screens: ClearType etc.
          • [Figure labels: Normal, Optimized]
  • 71. Text - Cross-media Technology
      • Voice Recognition
          • Converts voice (sound data) to text data
          • Needs real-time processing
          • Specific speaker/non-specific speaker
      • Text-to-Speech (Speech Synthesis)
          • Computer “dictates” text data
          • Automatic information services/new mail dictation
  • 72. Text - Cross-media Technology cont.
      • Optical Character Recognition
          • Converts a text bitmap image to real text data
          • Used with an image scanner
      • Handwriting Recognition
          • Similar to OCR, but uses writing order/direction for better recognition
          • Used in PIM (Personal Information Manager) devices (palmtop computers)
  • 73. Text - Cross-media Technology cont.
      • Machine Translation
          • All text-based techniques are language dependent
          • Needs automatic translation
              • Vertical market: technical document translation
              • Personal market: Web browsing
      • Combination of media technology
          • Automatically translate international telephone messages
          • Japanese voice → (Japanese voice recognition) → Japanese text data → (machine translation) → English text data → (English speech synthesis) → English voice
  • 74. File Format
      • .TXT: unformatted text (e.g. Notepad)
      • .DOC: developed by Microsoft (e.g. MS Word)
      • .RTF: Rich Text Format
      • .PDF: Portable Document Format (Adobe)
      • .PS: PostScript, a page description language used mainly for desktop publishing