Overview of Character Encoding Duy Lam – Dec 2010
Agenda: Character Encoding, Unicode, Encoding problem
Character encoding
Definition: A character encoding (also called a character set or charset, character map, or code page) is a system that specifies: the set of codes (natural numbers or electrical pulses) that represent characters; and how to persist characters (such as “hello”) to disk as a sequence of bytes.
Common Encodings
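The two parts of the definition can be sketched in a few lines of Python. This is only an illustration: the choice of ASCII and little-endian UTF-16 as the two example encodings is an assumption, not something the slide specifies.

```python
# Part 1: each character in "hello" is represented by a numeric code.
text = "hello"
codes = [ord(ch) for ch in text]

# Part 2: an encoding turns those codes into the bytes that go to disk.
ascii_bytes = text.encode("ascii")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

print(codes)              # [104, 101, 108, 108, 111]
print(ascii_bytes.hex())  # 68656c6c6f
print(utf16_bytes.hex())  # 680065006c006c006f00
```

The same five codes produce five bytes in one encoding and ten in the other, which is why the byte sequence alone does not identify the text.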
Unicode
Life was perfect: a = 01100001, ấ = ????????, ä = ????????
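A small sketch of the situation on this slide, assuming ASCII is the only encoding available. The helper name `to_ascii_bits` is hypothetical and exists only for this illustration.

```python
def to_ascii_bits(ch: str) -> str:
    """Return the 8-bit pattern for ch in ASCII, or '????????' if ASCII has no code for it."""
    try:
        return format(ch.encode("ascii")[0], "08b")
    except UnicodeEncodeError:
        return "????????"

# "\u1ea5" is ấ and "\u00e4" is ä; escapes are used so the source file's own encoding does not matter.
for ch in ["a", "\u1ea5", "\u00e4"]:
    print(ch, "=", to_ascii_bits(ch))
```

Only `a` gets a bit pattern; the accented letters have no ASCII code at all.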
Unicode: Unicode is a computing industry standard that maps every known character to a number (its code point). Unicode is one character set that can be encoded in several different ways. Common Unicode encoding methods (Unicode Transformation Format, UTF, and Universal Character Set, UCS): UTF-8 (one to four bytes per character): maximizes compatibility with ASCII. UTF-16: variable-width encoding (one or two 16-bit code units); its predecessor UCS-2 is fixed-width and covers only the BMP. UTF-32 (UCS-4): fixed-width encoding (one 32-bit code unit).
Unicode mapping table: Unicode charts
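To make "one character set, several encodings" concrete, the sketch below takes a single code point and prints its bytes in each encoding. The big-endian variants are chosen only so that the hex output reads in the same order as the code point.

```python
ch = "\u00e4"          # ä
code_point = ord(ch)   # the single number Unicode assigns to this character

# The same code point serialized by three different encodings.
encoded = {name: ch.encode(name).hex() for name in ("utf-8", "utf-16-be", "utf-32-be")}

print(f"U+{code_point:04X}")  # U+00E4
for name, hex_bytes in encoded.items():
    print(name, hex_bytes)
# utf-8 c3a4
# utf-16-be 00e4
# utf-32-be 000000e4
```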
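A rough programmatic counterpart to looking a character up in the Unicode charts, using Python's standard `unicodedata` module. The helper name `describe` is hypothetical.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in ["a", "\u00e4", "\u1ea5"]:  # a, ä, ấ
    print(describe(ch))
# U+0061 LATIN SMALL LETTER A
# U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
# U+1EA5 LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
```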
Encoding problem
Missing understanding. Diagram labels: Application, UTF-16 encoding, UTF-8 encoding, UTF-8 encoding.
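The slide's diagram is not recoverable here, so the following is only one plausible reading of it: a writer stores text as UTF-16 while a reader assumes UTF-8. The helper `read_as` is hypothetical.

```python
def read_as(data: bytes, encoding: str):
    """Decode data with the given encoding; return None if the bytes are invalid for it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# The writer stores text as UTF-16 (little-endian, no byte order mark).
ascii_only = "hello".encode("utf-16-le")
accented = "\u1ea5".encode("utf-16-le")  # ấ

# A reader that assumes UTF-8 gets the wrong result in both cases.
print(repr(read_as(ascii_only, "utf-8")))  # NUL characters interleaved with the letters
print(repr(read_as(accented, "utf-8")))    # None: these bytes are not valid UTF-8

# A reader that uses the writer's encoding recovers the text.
print(read_as(ascii_only, "utf-16-le"))    # hello
```

The first case is the more dangerous one, because the decode succeeds silently and the corruption only shows up later.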
Demo
End


Editor's Notes

  1. ASCII encoding specifies how to store English characters in a single byte each, using the values 0-127 and leaving 128-255 unassigned. Microsoft code pages were used in pre-Windows NT systems. (Windows NT is a family of operating systems produced by Microsoft, the first version of which was released in July 1993. NT was the first fully 32-bit version of Windows; Windows 3.1x and Windows 9x were 16-bit/32-bit hybrids. Windows 2000, Windows XP, Windows Server 2003, Windows Vista, Windows Home Server, Windows Server 2008 and Windows 7 are based on Windows NT, although they are not branded as Windows NT.) Reference: http://www.nadcomm.com/fiveunit/fiveunits.htm
  2. In the early computer age, ASCII covered everything you would find on an English keyboard: upper- and lower-case letters, digits, and some common symbols. There was even room left in the 128-character ASCII mapping for control characters. But the entire world cannot get by on just these characters, so an encoding system was needed that could encode characters from other languages. In addition, many characters in the different existing Chinese, Japanese and Korean (CJK) character sets actually represent the same character, so an effort was needed to identify them.
  3. A character has to be stored in a computer as a number. Unicode tries to unify characters from different encodings that represent the same character. For instance, the A in ASCII, the A in ISO-8859-1, and the A in the Japanese encoding Shift JIS all map to the same Unicode character. A character set and a character encoding aren't necessarily the same thing: Unicode is one character set with multiple character encodings. UTF-8 is the most compact for strings containing mostly ASCII characters (common in Western countries), at one byte per character. UTF-8 and UTF-16 are approximately equivalent for strings containing mostly characters outside ASCII but inside the BMP (the Basic Multilingual Plane: characters for almost all modern languages, plus a large number of special characters): UTF-8 uses two or three bytes per character there and UTF-16 uses two. For strings containing mostly characters outside the BMP, UTF-8, UTF-16, and UTF-32 are equivalent at four bytes per character.
  4. Go to Start Menu > Accessories > System Tools > Character Map. Use this tool to show characters and their numbers in the Unicode charts.
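The size comparison in note 3 can be checked with a short sketch. The sample strings are assumptions chosen for illustration (the CJK and emoji examples do not come from the slides), and the helper `sizes` is hypothetical.

```python
def sizes(text: str) -> tuple:
    """Return the encoded length in bytes as (UTF-8, UTF-16, UTF-32), without byte order marks."""
    return tuple(len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

ascii_text = "hello"                    # ASCII only
bmp_text = "\u4f60\u597d"               # two CJK characters inside the BMP
beyond_bmp = "\U0001F600\U0001F601"     # two emoji outside the BMP

print(sizes(ascii_text))   # (5, 10, 20)
print(sizes(bmp_text))     # (6, 4, 8)
print(sizes(beyond_bmp))   # (8, 8, 8)
```

For the BMP sample UTF-8 is somewhat larger than UTF-16, which is what "approximately equivalent" covers in practice.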