This presentation was shared at the PAS Digital Marketing Conference "Dig-It 2.0"
Session name: Urdu Internet - Leveraging Technologies
Presentation: Computing support for Pakistani Languages, Challenges & Practices
Speaker: Dr. Sarmad Hussain, Professor and Head, Center for Language Engineering, University of Engineering and Technology, Pakistan
1. Computing Support for Pakistani Languages – Challenges and Practice
Unlocking Information for Human Development
www.CLE.org.pk
Sarmad Hussain
Center for Language Engineering
Al-Khawarizmi Institute of Computer Science
University of Engineering and Technology
Lahore
sarmad@cantab.net
2. Need
ICTs promise significant socio-economic impact
Impact depends on the size of the population that can use ICTs
180 million citizens need access
66+ languages
10% understand English
58% literate
11% have access to computers
70% have access to mobile phones
ITU IDI: Pakistan ranked 127 of 155 nations
Human Language Technology necessary to bridge the gap
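To give a sense of scale, the percentages above can be turned into rough headcounts. This is only a sketch: it assumes every percentage applies to the full 180 million population, which the slide does not state (literacy, for example, is often measured over a narrower age group), and the variable names are illustrative.

```python
# Population figure quoted on the slide.
POPULATION = 180_000_000

# Percentages quoted on the slide, kept as whole numbers so the
# arithmetic stays exact.
PERCENT = {
    "understand English": 10,
    "literate": 58,
    "have access to computers": 11,
    "have access to mobile phones": 70,
}


def headcounts(population, percent):
    """Convert percentage figures into approximate numbers of people."""
    return {label: population * pct // 100 for label, pct in percent.items()}


counts = headcounts(POPULATION, PERCENT)
for label, n in counts.items():
    print(f"{label}: about {n / 1_000_000:.1f} million")
```

Under that assumption, the gap between mobile phone access and English comprehension is where local-language technology would matter most.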
3. Languages of Pakistan
Percent Population of Pakistan by Mother Tongue
         Urdu   Punjabi   Sindhi   Pushto   Balochi   Saraiki   Others (60)
Total    7.57    44.15    14.10    15.42     3.57      10.53       4.66
Rural    1.48    42.51    16.46    18.06     3.99      12.97       4.53
Urban   20.22    47.56     9.20     9.94     2.69       5.46       4.93
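Since each row of the mother-tongue figures is a percentage breakdown, a quick consistency check is that every row sums to about 100. The sketch below does that and ranks the languages by overall share; the data layout and function names are illustrative, not part of the presentation.

```python
LANGUAGES = ["Urdu", "Punjabi", "Sindhi", "Pushto", "Balochi", "Saraiki", "Others"]

# Percent of population by mother tongue, as given on the slide.
PERCENT_BY_MOTHER_TONGUE = {
    "Total": [7.57, 44.15, 14.10, 15.42, 3.57, 10.53, 4.66],
    "Rural": [1.48, 42.51, 16.46, 18.06, 3.99, 12.97, 4.53],
    "Urban": [20.22, 47.56, 9.20, 9.94, 2.69, 5.46, 4.93],
}


def row_sums(table):
    """Sum each row; a complete breakdown should come to roughly 100."""
    return {row: round(sum(values), 2) for row, values in table.items()}


def ranked(table, row="Total"):
    """Return (language, percent) pairs for one row, largest share first."""
    pairs = zip(LANGUAGES, table[row])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


sums = row_sums(PERCENT_BY_MOTHER_TONGUE)
for row, total in sums.items():
    print(f"{row}: {total}")

for language, percent in ranked(PERCENT_BY_MOTHER_TONGUE):
    print(f"{language}: {percent}%")
```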
4. Languages of Pakistan
Sociocultural
Economic
5. Languages of Pakistan
Sociocultural
Economic
Languages of Pakistan in Danger (UNESCO)
• Vulnerable
• Definitely endangered
• Severely endangered
6. How?
Human Language Technology
Linguistic Research
Standards
Applications
Materials
Training
Adoption
USE
Relevant Content Access
Relevant Content Generation
7. Human Language Technology – Bridging Barriers
• Interfacing
• Assisting
• Enabling
• Empowering
8. Interfacing
Language
– Character Set
• Input Methods
• Writing
• Collation
Standards
– National
– International
  • ISO 639
  • ISO 3166
  • ISO 10646/Unicode
– Terminology Translation
Technology
– Applications
  • Fonts
  • Keyboards, Keypads and Other Input Methods
  • Collation Methods
  • Localized Platform
– Platforms: Computers and Phones
  • Linux/Unix and Symbian
  • Microsoft Windows and Phone
  • iOS – iPad, iPhone, MacBook, …
  • Google – Gmail, Docs, … Android
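Collation is a good example of why language-specific support is needed on top of the character set. The sketch below, in Python, shows that sorting Urdu letters by Unicode code point does not give alphabetical order, because letters such as peh and tteh were encoded later than the basic Arabic letters. The six-letter alphabet prefix and the `collation_key` helper are illustrative assumptions, not the national collation standard or any CLE tool; production systems would normally rely on a full collation implementation with tailored rules.

```python
import unicodedata

# First six letters of the Urdu alphabet in conventional order:
# alef, beh, peh, teh, tteh, theh (written as escapes for clarity).
URDU_ALPHABET_PREFIX = "\u0627\u0628\u067e\u062a\u0679\u062b"

# Map each letter to its position in the alphabet.
RANK = {ch: i for i, ch in enumerate(URDU_ALPHABET_PREFIX)}


def collation_key(word):
    """Sort key based on alphabet position instead of code point.

    Characters outside the illustrative prefix sort after the known
    letters, ordered among themselves by code point.
    """
    return [(RANK.get(ch, len(RANK)), ord(ch)) for ch in word]


letters = list(URDU_ALPHABET_PREFIX)
by_code_point = sorted(letters)                     # naive default ordering
by_alphabet = sorted(letters, key=collation_key)    # language-aware ordering

print("Code point order:")
for ch in by_code_point:
    print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")

print("Alphabetical order:")
for ch in by_alphabet:
    print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The same key function works on whole words, since it compares them letter by letter using alphabet positions.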
16. Empowering
• ICT for ICT - Focused on infrastructure
• ICT for Development - Focused on content and applications
• ICT for Human Development - Focused on participatory process
18. LANGUAGE AND ICT TRAINING
[Bar charts: percent of teachers with a Preference for Urdu vs. a Preference for English, Before Training and After Training, shown separately for Software and for Training Material. Vertical axis: Percent Teachers, 0% to 100%.]
19. LANGUAGE AND ICT TRAINING
Icon Identification by Students
Icons                                F     M    SubTotal   Percent
Urdu                                691   656     1347       64%
English                             132   198      330       16%
English Transliterated into Urdu    150   183      333       16%
Didn't Recognize                     49    40       89        4%
Total                                             2099
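The percentages on the icon identification slide follow from the subtotals (1347, 330, 333 and 89 out of 2099). A small sketch of that arithmetic, with illustrative names:

```python
# Subtotals of icon identifications by language of the response,
# as given on the slide.
RESPONSES = {
    "Urdu": 1347,
    "English": 330,
    "English transliterated into Urdu": 333,
    "Didn't recognize": 89,
}


def percent_shares(counts):
    """Express each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


shares = percent_shares(RESPONSES)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```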
20. ACCESSING INFO ONLINE
Language Preference for Searching on the Internet
Language Used   Female Students   Male Students   Total
Urdu                  44                45          89
English                2                 2           4
Total                 46                47          93

Preferred Language for Setting a Homepage
Participant   English   Urdu
Students         0       138
Teachers         5        13
Total            5       151
22. LANGUAGE FOR CONTENT DEVELOPMENT
Language of Website
Website Competition Category                             Urdu   English   Total
School Website (by 10 School Teacher Teams)                9       1        10
Local Village Website (by 10 School Student Teams) [1]     8       0         8
Open Category (Individual Students)                       38       0        38
Total                                                     55       1        56
[1] One school did not participate, and one school website was disqualified as the team took significant external assistance.
24. Development Process of Human Language Technology
• Select Language
• Linguistic Data Collection
• Core Linguistic Analysis and Definition
• Publishing Language Computing Standards
• Development of Localization Utilities
• Detailed Linguistic Analysis
• Publishing Data Annotations Schema
• Annotation of Linguistic Data
• Development of Linguistic Utilities
• Publishing Annotated Linguistic Resources
• Localization of Existing Applications
• Development of Advanced HLT Application
• Extension of Localization Applications
25. Status of Human Language Technology
URDU
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
26. Status of Human Language Technology
SINDHI
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
27. Status of Human Language Technology
PUSHTO
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
28. Status of Human Language Technology
PUNJABI
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
29. Status of Human Language Technology
BALOCHI
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
30. Status of Human Language Technology
SARAIKI
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]
31. Status of Human Language Technology
OTHERS
[Diagram: the development process stages, from Linguistic Data Collection to Extension of Localization Applications, each shaded by level of support. Legend: Reasonable Support, Some Support, Minimal Support.]