UNICODE TRANSFORMATION
FORMAT
By ANKIT SHARMA
                  Page 1
INTRODUCTION
• Computers at their most basic level deal
  only with numbers. They store letters,
  numerals and other characters by
  assigning a number to each one.
• In the pre-Unicode environment, we
  had single-byte (8-bit) character sets, which
  limited us to 256 characters at most. No
  single encoding could contain enough
  characters to cover all languages.
• So hundreds of different encoding
  systems were developed for assigning
  numbers to characters.
                                             Page 2
INTRODUCTION (continued)

• As a result, these encoding systems
  conflict with one another. That is, two
  encodings can use the same number
  for two different characters, or different
  numbers for the same character.
• Any given computer needs to support
  many different encodings.
• Yet whenever data is passed
  between different encodings or
  platforms, it runs the
  risk of corruption.
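
The conflict described above can be illustrated with a minimal Python sketch. The two legacy encodings chosen here (ISO 8859-1 and Windows-1251, then ISO 8859-1 and code page 437) are examples picked for illustration and are not named in the slides.

```python
# One byte value, interpreted under two legacy single-byte encodings.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # Western European (ISO 8859-1)
as_cp1251 = raw.decode("cp1251")   # Cyrillic (Windows-1251)

# Same number, two different characters.
print(repr(as_latin1), repr(as_cp1251))

# One character, stored as two different numbers.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
in_latin1 = e_acute.encode("latin-1")
in_cp437 = e_acute.encode("cp437")  # original IBM PC code page
print(in_latin1.hex(), in_cp437.hex())
```

Data written under one of these encodings and read back under the other turns into the wrong letters, which is the corruption risk the slide refers to.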

                                              Page 3
Examples of Character Encoding Systems
• Morse code
• Baudot code
• The American Standard Code for
  Information Interchange (ASCII)
• Unicode


                                    Page 4
WHAT IS UNICODE?



 Unicode provides a unique number for
 every character,
 no matter what the platform,
 no matter what the program,
 no matter what the language.



 The Unicode Standard is a character coding
 system designed to support the worldwide
 interchange, processing, and display of the
 written texts of the world's diverse languages.
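
As a small illustration of "a unique number for every character", the sketch below prints the Unicode code point of a few characters using Python's built-in `ord` and `chr`. The helper name `code_point_label` and the sample characters are chosen for this example only.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"

# A Latin letter, an accented letter, the euro sign, and a CJK ideograph.
for ch in ["A", "\u00e9", "\u20ac", "\u4e2d"]:
    print(repr(ch), code_point_label(ch))

# chr() goes the other way: from the number back to the character.
print(chr(0x41))
```

The number is the same on every platform and in every program; only the byte-level encoding (UTF-8, UTF-16, UTF-32) differs.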


                                               Page 5
From ASCII to Unicode
• Most character sets and encodings of the
  1970s and 1980s were modifications or
  extensions of ASCII.
• The most common of these legacy encodings
  use a single byte per character (SBCS).
• They are therefore all limited to 256 characters.
• Because of that, no single one of them can
  cover the letters of all the
  European languages, let alone the rest of
  the world's scripts.




                                            Page 6
Where is Unicode Used?
• The Unicode Standard has been
  adopted by many software and hardware
  vendors.
• Most operating systems support Unicode.
• Unicode is required for international
  document and data interchange, the
  Internet and the WWW, and therefore by
  modern languages and standards such as:
  • Java, C#, Perl, Python
  • Markup languages such as XML,
    HTML, XHTML
  • JavaScript, LDAP, CORBA, etc.
                                           Page 7
UTF-8
• UTF-8 is the encoding of Unicode that
  uses 8-bit code units.
• It is a variable-width encoding and also
  a strict superset of ASCII.
• “Strict superset” means that every
  character in ASCII is available in UTF-8
  with the same code point and the same
  single-byte value.
• 1 character = 1 to 4 bytes in the
  encoding
• Most characters from European scripts:
  1 or 2 bytes
• Most Asian scripts: 3 bytes;
  supplementary characters: 4 bytes
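
The variable width described above can be checked with a short Python sketch. The helper `utf8_length` and the sample characters are assumptions chosen for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "ASCII letter A": "A",
    "Latin e with acute": "\u00e9",
    "Cyrillic YA": "\u042f",
    "CJK ideograph": "\u4e2d",
    "Supplementary (emoji)": "\U0001F600",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_length(ch)} byte(s)")
```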

                                             Page 8
• UTF-8 is used on UNIX platforms, in HTML,
  and by most Internet browsers.
• Main benefits of UTF-8:
  • Compact storage for European scripts:
    in general, they occupy less storage
    on disk and in memory.
  • Ease of migration: since 7-bit ASCII
    data remains the same in UTF-8, the data
    conversion effort between ASCII-based
    character sets and UTF-8 is reduced
    significantly.
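
The migration point can be sketched in Python: pure 7-bit ASCII text produces exactly the same bytes whether it is encoded as ASCII or as UTF-8. The function name and the sample string below are illustrative only.

```python
def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """True when the ASCII and UTF-8 encodings of text are byte-identical.

    Raises UnicodeEncodeError if text contains non-ASCII characters.
    """
    return text.encode("ascii") == text.encode("utf-8")

legacy_text = "Hello, Unicode!"
print(same_bytes_in_ascii_and_utf8(legacy_text))

# Every byte stays below 0x80, so an existing ASCII file is already valid UTF-8.
print(max(legacy_text.encode("utf-8")) < 0x80)
```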
                                             Page 9
UTF-16
• UTF-16 is the encoding of Unicode that
  uses 16-bit code units.
• It is essentially an extension of UCS-2.
• One Unicode character takes 2 or 4
  bytes in the encoding.
• Characters from
  European and most Asian scripts are
  represented in 2 bytes.
• Supplementary characters are
  represented in 4 bytes (a surrogate pair).
• UTF-16 has been the main Unicode encoding
  on Windows since Windows 2000.
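
A minimal Python sketch of the 2-byte versus 4-byte cases follows. It lists the 16-bit code units Python's `utf-16-be` codec produces and, for comparison, computes a surrogate pair by hand using the standard UTF-16 arithmetic. The function names are hypothetical helpers for this example.

```python
def utf16_units(ch: str) -> list[int]:
    """Return the 16-bit code units of a character in UTF-16 (big-endian)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (above U+FFFF) into a surrogate pair."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

print([hex(u) for u in utf16_units("A")])           # one unit, 2 bytes
print([hex(u) for u in utf16_units("\U0001F600")])  # two units, 4 bytes
print([hex(u) for u in to_surrogates(0x1F600)])
```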

                                         Page 10
• Main benefits of UTF-16:
  • More compact storage for
    Asian scripts (2 bytes for commonly used
    characters)
  • Well suited to text that mixes European
    and Asian scripts
  • Asian text occupies less storage on
    disk and in memory than in UTF-8 (2 bytes
    per character instead of 3)
  • A balance of efficient
    access to characters and economical
    use of storage
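
The storage trade-off can be made concrete with a short Python comparison. The sample strings are arbitrary, and `utf-16-le` is used so that no byte order mark is counted in the size.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Byte counts of text in UTF-8 and in UTF-16 (without a byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

japanese = "\u65e5\u672c\u8a9e"  # three CJK characters
english = "Unicode"              # seven ASCII characters

print("Japanese:", encoded_sizes(japanese))
print("English: ", encoded_sizes(english))
```

The comparison cuts both ways: the CJK sample is smaller in UTF-16, while the ASCII sample is smaller in UTF-8.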

                                               Page 11
UTF-32


• 32-bit encoding
• Popular when memory space is no
  concern
• Fixed width (4 bytes per character)
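
The fixed width can be verified with a brief Python sketch. The helper `utf32_size` is an illustrative name, and `utf-32-be` is used so that no byte order mark is included in the count.

```python
def utf32_size(text: str) -> int:
    """Byte count of text in UTF-32 (big-endian, no byte order mark)."""
    return len(text.encode("utf-32-be"))

# An ASCII letter, a CJK ideograph, and a supplementary character
# each take exactly 4 bytes.
for ch in ["A", "\u4e2d", "\U0001F600"]:
    print(repr(ch), utf32_size(ch))

# The size of a string is therefore always 4 times its number of code points.
mixed = "A\u4e2d\U0001F600"
print(utf32_size(mixed) == 4 * len(mixed))
```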




                                     Page 12
Unicode @ the Library



• Display all scripts and characters
• Record data in all languages
• Exchange bibliographic data
• Search in all languages …




                                            Page 13
THANK
   YOU
         Page 14

Editor's notes

  1. ANKIT & SUSHEEL