Notes on a Standard:
    UNICODE
     Elena-Oana Tabaranu
  elena.tabaranu@info.uaic.ro
           UAIC, Iasi
Plan
●   Introduction
●   Design Principles
●   Code Points and Characters
●   Encoding Forms, UTF-32, UTF-16, UTF-8
●   Conclusion




Introduction
●   UNIversal character enCODing system
●   Unicode = universal character encoding standard
    for written characters and text
●   Advantages
    ●   Consistent way of encoding multilingual text
    ●   Data stability instead of proliferating character sets
    ●   Encodes all characters used in the world's written languages
        (more than 1 million code points are available)
    ●   Creates a foundation for global software

Design Principles




Characters, not Glyphs
●   The Unicode Standard draws a distinction between
    characters and glyphs.
●   Characters are the abstract representations of the
    smallest components of written language that have
    semantic value.




Logical Order
       ●   The order in which
           Unicode text is stored
           in the memory
           representation is
           called logical order
       ●   The Unicode Standard
           includes characters to
           explicitly specify
           changes in direction
           when necessary
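A minimal sketch of logical order, assuming Python and its standard `unicodedata` module (neither is mentioned on the slide). The sample string and the variable names are illustrative only. The string is stored in the order it is typed and read; each character carries a bidirectional class, and explicit directional characters exist as ordinary code points.

```python
import unicodedata

# Latin text followed by three Hebrew letters, stored in logical (reading) order.
text = "abc \u05d0\u05d1\u05d2"

# Bidirectional class of each stored character: L = left-to-right,
# WS = whitespace, R = right-to-left.
classes = [unicodedata.bidirectional(ch) for ch in text]

# Two of the characters that explicitly control direction.
rlm_name = unicodedata.name("\u200f")
rlo_name = unicodedata.name("\u202e")

print(classes)
print(rlm_name, "/", rlo_name)
```

The stored sequence never changes; reordering for display is left to the rendering layer.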
Code Points and Characters
●   Abstract characters are
    encoded internally as
    numbers
●   Codespace: 0 to 10FFFF (hexadecimal)
    => 1,114,112 code points
    available
●   Abstract character -> code
    point
●   Example:
    U+0061 LATIN SMALL LETTER A
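The mapping from abstract character to code point can be inspected directly. This is a small sketch assuming Python; the helper name `describe` and the constant `MAX_CODE_POINT` are made up for illustration.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # upper end of the codespace


def describe(ch: str) -> str:
    """Return the U+XXXX notation and the character name for one character."""
    cp = ord(ch)  # abstract character -> code point (an integer)
    return f"U+{cp:04X} {unicodedata.name(ch)}"


# The codespace runs from 0 to 10FFFF inclusive.
codespace_size = MAX_CODE_POINT + 1

print(describe("a"))
print(codespace_size)
```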

Encoding Forms
●   Encoding forms specify how
    each code point is to be
    expressed as a sequence of
    one or more code units (8-bit,
    16-bit, or 32-bit units)
●   Encoding forms for Unicode
    characters: UTF-8, UTF-16,
    UTF-32
●   Each form can be efficiently
    transformed into either of the
    other two without any loss of
    data
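The lossless conversion between the three forms can be checked with a round trip. This sketch assumes Python's built-in codecs; the helper `transcode` and the sample text are illustrative, and big-endian byte order is chosen only to keep the output free of byte order marks.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert encoded text from one encoding form to another."""
    return data.decode(source).encode(target)


# One character from each size class: U+0061, U+20AC and U+10000.
text = "a\u20ac\U00010000"

utf8 = text.encode("utf-8")
utf16 = transcode(utf8, "utf-8", "utf-16-be")
utf32 = transcode(utf16, "utf-16-be", "utf-32-be")
back = transcode(utf32, "utf-32-be", "utf-8")

# Code unit counts differ per form, but the text survives the round trip.
print(len(utf8), "8-bit units")
print(len(utf16) // 2, "16-bit units")
print(len(utf32) // 4, "32-bit units")
print(back == utf8)
```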

UTF-32
●   The simplest Unicode encoding form
●   Each Unicode code point is represented directly
    by a single 32-bit code unit (fixed-width)
●   Restricted to code points in the range
    0..10FFFF (hexadecimal)
●   Example:
    U+10000 is represented as <00010000>
●   Preferred encoding form for processing
    characters on most Unix platforms
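Because UTF-32 is fixed-width, each code unit is simply the code point value. This is a minimal sketch assuming Python; the function name `utf32_be_bytes` is hypothetical, and big-endian serialization is an assumption made to match the `<00010000>` notation above.

```python
import struct


def utf32_be_bytes(text: str) -> bytes:
    """Serialize each code point as one big-endian 32-bit code unit."""
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)


encoded = utf32_be_bytes("\U00010000")
print(encoded.hex())

# The hand-rolled version agrees with Python's built-in codec.
print(encoded == "\U00010000".encode("utf-32-be"))
```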
UTF-16
●   Code unit values can differ from the code
    point value => conversion required
●   Variable-width encoding:
    ➢   U+0000..U+FFFF (excluding the surrogate range
        U+D800..U+DFFF) are represented as a single 16-bit
        code unit
    ➢   U+10000..U+10FFFF are represented as pairs of
        16-bit code units (surrogate pairs)
●   Optimized for the BMP (Basic Multilingual Plane) =
    majority of common-use characters for all
    modern scripts of the world
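The conversion for supplementary code points can be sketched as follows, assuming Python; the function name `utf16_units` is illustrative. The arithmetic is the standard surrogate pair calculation: subtract 10000 (hex), then split the remaining 20 bits into two 10-bit halves.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the 16-bit code units that represent one code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the Unicode codespace")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0xFFFF:
        return [cp]  # BMP: the code unit equals the code point
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return [high, low]


print([f"{u:04X}" for u in utf16_units(0x0061)])
print([f"{u:04X}" for u in utf16_units(0x10000)])
print([f"{u:04X}" for u in utf16_units(0x10FFFF)])
```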
UTF-8
●   UTF-8 encodes each character (code point) in 1 to 4
    octets (8-bit bytes), with the single-octet encoding
    used only for the 128 US-ASCII characters
    ●   U+0000 to U+007F → 1 byte
    ●   U+0080 and above → 2, 3, or 4 bytes
●   Backwards compatible with ASCII
●   Default encoding for XML (including XHTML) documents
●   Example:
    U+10000 is represented as <F0 90 80 80>
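The byte layout behind this example can be sketched by hand, assuming Python; the function name `utf8_bytes` is illustrative. Each range of code points gets a lead byte that signals the length, followed by continuation bytes that carry 6 bits each.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one code point as 1 to 4 UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp <= 0x7F:        # 1 byte: identical to US-ASCII
        return bytes([cp])
    if cp <= 0x7FF:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_bytes(0x10000).hex(" ").upper())
print(utf8_bytes(0x20AC).hex(" ").upper())
```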

Conclusion
●   The Unicode Standard is a superset of all
    characters in widespread use today.
●   It contains characters from major international
    and national standards (e.g. the SGML
    standard) as well as prominent industry
    character sets (e.g. industry code sets from Apple,
    Adobe, Fujitsu, etc.).
●   It responds to changing industry demands by
    encoding important new characters (e.g. the €
    sign).
Questions?
●   Thank You!




