A	29-Year	Journey	of	Thai	NLP	
MT-ED-OSS-IR-DM-DT
Virach	Sornlertlamvanich
Sirindhorn International	Institute	of	Technology	(SIIT),	Thammasat University
virach@gmail.com
28	August	2017,	ISAI-NLP	2017,	Hua	Hin,	Thailand
[Figure: research timeline, 1988 to 2015, in five phases ① to ⑤ (5-10-7-5-2). Affiliations: NEC/CICC; LINKS, NECTEC; RDI, NECTEC; TCL, NICT; IMA, NECTEC; TPA/SIIT; TITECH (PGLR). Research topics: Machine Translation; MT, NLP; NLP, Speech, Image, e-Learning, OSS; NLP, AWN, IR, OSS; Mobile Application, Digitized Thailand; NLP, AI, Data Mining, Big Data, SNS, Deep Learning.]
SNLP
When	an	engineer	developed	a	
grammar	for	the	Thai	language
Font,	Encoding,	Input	method,	POS,	Dictionary,	Verb	pattern,	Grammar,	MT
① NEC/CICC	1988-1992
Thai	Non-Logical	Order
• Non-logical	order	in	the	representation	of	consonant-vowel	
sequences.	Vowels	that	occur	to	the	left	side	of	their	consonant	are	
represented	in	visual	order	before	the	consonant	in	a	string,	even	
though	they	are	pronounced	afterward.	(Left-positioned	vowel	signs)
• Difficulty	in	Collation	(Sorting),	Grapheme	to	phoneme
Text โปรแกรม
Encoding U+0E42 U+0E1B U+0E23 U+0E41 U+0E01 U+0E23 U+0E21
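The effect of visual-order storage on sorting can be sketched in a few lines of Python. This is only an illustration: `swap_leading_vowels` is a hypothetical helper that builds a rough sort key, not a full Thai collation algorithm, and the two extra words (การ, มา) are added here just to make the ordering difference visible.

```python
# Leading (pre-posed) vowel signs: เ แ โ ใ ไ (U+0E40..U+0E44)
LEADING_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def code_points(text):
    """Return the stored code points of a string as 'U+XXXX' labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


def swap_leading_vowels(text):
    """Hypothetical sort key: move each leading vowel behind the next character."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


word = "\u0E42\u0E1B\u0E23\u0E41\u0E01\u0E23\u0E21"  # โปรแกรม
print(code_points(word))  # the vowel sign U+0E42 is stored before the consonant U+0E1B

words = [word, "\u0E21\u0E32", "\u0E01\u0E32\u0E23"]  # โปรแกรม, มา, การ
naive = sorted(words)                                  # compares raw code points
by_consonant = sorted(words, key=swap_leading_vowels)  # compares consonant first
print(naive)
print(by_consonant)
```

With the raw code-point sort, โปรแกรม is ordered by its vowel sign; with the swapped key it is ordered by its initial consonant ป, which is closer to dictionary order.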
Zero-Width	Character	in	Thai
ที่อยู่ Base line
Consonant
Vowel sign (lower)
Vowel sign (upper)
Tone mark
Text ท ที ท่ ที่
Encoding U+0E17 U+0E35 U+0E48 U+0E17 U+0E35 U+0E48
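A small Python sketch of the zero-width behaviour, assuming that Unicode general category `Mn` (nonspacing mark) is a good enough proxy for "occupies no cell of its own". The function `display_width` is a hypothetical helper and ignores other width issues a real terminal has to handle.

```python
import unicodedata


def display_width(text):
    """Count display cells, treating nonspacing marks (category Mn) as zero-width."""
    return sum(1 for ch in text if unicodedata.category(ch) != "Mn")


word = "\u0E17\u0E35\u0E48"  # ที่ = consonant + upper vowel sign + tone mark
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

print(len(word), display_width(word))  # stored characters vs. display cells

phrase = "\u0E17\u0E35\u0E48\u0E2D\u0E22\u0E39\u0E48"  # ที่อยู่
print(len(phrase), display_width(phrase))
```

The gap between the two counts is what makes cursor positioning and text wrapping harder for Thai than for scripts with one cell per character.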
X-TIS620: Store vs. Display
Example: “อยู่”
TIS (store): CD C2 D9 E8
X-TIS (display): CD B0 C2 EA
EA = B0 (base) + 38 ( อู ) + 02 ( อ่ )
[Figure: bit-pattern table used to combine the zero-width marks อ็ อ่ อ้ อ๊ อ๋ อ์ อํ อั อิ อี อึ อื อุ อู อฺ into a single X-TIS code]
Display cells: “|อ|ยู่|”
Advantages
- More	than	1,000	code-points	
prepared	for	kerning	and	
rendering
- Internal	encoding	for	terminal	
text	wrapping
- Cursor	positioning
- Base	concept	for	TCC	
(Thai	Character	Cluster:- the	
smallest	unit	of	character	
cluster	according	to	the	spelling	
rules)	
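The stored TIS-620 bytes and the arithmetic in the example above can be checked with Python's built-in `tis-620` codec. This sketch covers only the standard TIS-620 side; X-TIS is not available as a Python codec, and how it lays out its bit fields is not reproduced here.

```python
word = "\u0E2D\u0E22\u0E39\u0E48"  # อยู่ = อ + ย + lower vowel sign + tone mark

# Stored form: one TIS-620 byte per character.
tis = word.encode("tis-620")
print(tis.hex(" ").upper())

# TIS-620 places the Thai block U+0E01.. at 0xA1.., so byte = code point - 0x0E00 + 0xA0.
for byte, ch in zip(tis, word):
    assert byte == ord(ch) - 0x0E00 + 0xA0

# The composite display code from the example: base + vowel-sign part + tone-mark part.
composite = 0xB0 + 0x38 + 0x02
print(f"{composite:02X}")
```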
TCC	(Thai	Character	Cluster)
• The	smallest	unit	of	character	cluster	according	to	the	spelling	rules.
• To	cluster	Thai	text	into	undividable	units.	Character	cluster	is	defined	
to	be	the	smallest	recognizable	unit.	The	character	string	is	clustered	
for	the	sake	of	avoiding	the	processing	of	invalid	Thai	character	units.
Examples of TCC
Pre-position: เ, แ, ไ, ใ, โ ⊕ C+
Post-position: C+ ⊕ ะ, า
Upper/Lower: ที่, มี, กุ, รู, …
Sound killer: ร์, ดิ์, ตร์, ทธิ์, ถุ์
Compound: เสร็จ, เหลือ, หน่วย
Leading char: หล่น, หนัง, หวะ, ไหล่
Diphthong: ครัว, อ้วน
Character: เ - ป - อ้ - า - ห - ม - า - ย
Cluster (TCC): เป้า - หมา - ย
Word: เป้าหมาย or เป้า - หมาย
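A minimal sketch of TCC-style clustering with a regular expression. It covers only a simplified subset of the patterns listed above (pre-position vowels, upper/lower marks, the post-position vowels ะ า ำ, and a leading ห before a sonorant consonant); compound vowel patterns such as เสร็จ or เหลือ are not handled, and the rule set is an assumption made for illustration rather than the original TCC definition.

```python
import re

PRE = "[\u0E40-\u0E44]"                      # เ แ โ ใ ไ
CONS = "[\u0E01-\u0E2E]"                     # ก .. ฮ
MARK = "[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]"  # upper/lower vowel signs, tone marks, etc.
POST = "[\u0E30\u0E32\u0E33]"                # ะ า ำ
SONORANT = "[\u0E07\u0E0D\u0E19\u0E21-\u0E23\u0E25\u0E27]"  # ง ญ น ม ย ร ล ว

# Leading ห is attached only when the next consonant carries a mark or a post vowel.
LEAD_H = f"(?:\u0E2B(?={SONORANT}(?:{MARK}|{POST})))?"

# One cluster, or any single character as a fallback so no input is lost.
TCC = re.compile(f"{PRE}?{LEAD_H}{CONS}{MARK}*{POST}?|.", re.DOTALL)


def tcc(text):
    """Split text into simplified Thai character clusters."""
    return TCC.findall(text)


print(" - ".join(tcc("\u0E40\u0E1B\u0E49\u0E32\u0E2B\u0E21\u0E32\u0E22")))  # เป้าหมาย
print("|".join(tcc("\u0E17\u0E35\u0E48\u0E2D\u0E22\u0E39\u0E48")))          # ที่อยู่
```

Because clusters never split a consonant from its attached marks, later steps such as word segmentation can work on cluster boundaries instead of raw characters.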
Virach	Sornlertlamvanich	and	Tanaka	Hozumi.	The	Automatic	Extraction	of	Open	Compounds	from	Text	Corpora.	
Proceedings	of	the	16th	International	Conference	on	Computational	Linguistics	(COLING-96),	pp.	1143-1146,	Aug	1996.
Implementation	(1991-)
•X-TIS	620	for	tterm	in	UNIX
•X	bitmap	fonts
•X	Consortium:	Thai	in	X11R6
•Thai	in	UNIX/Linux	applications
• Xfig
• Mule/GNU	Emacs:	SWATH,	LEXiTRON
• Xemacs:	X-TIS
• Mozilla:	LibInThai
• LaTeX:	Babel,	Omega
• National	fonts:	Kinnari,	Garuda,	Norasi
Free	developers
POS	Tagset
• 14	categories	(N,	PRON,	V,	AUX,	
DET,	ADV,	CLAS,	CONJ,	PREP,	INT,	
PREF,	END,	NEG,	PUNC)	and	47	
sub-categories
• VACT,	VSTA,	VATT
• Transitive,	Intransitive
• AUX
• Word	order
• S	vs	NP
• No	diff	in	some	cases
No. POS Description Example
1 NPRP Proper noun วินโดวส์ 95, โคโรน่า, โค้ก, พระอาทิตย์
2 NCNM Cardinal number หนึ่ง, สอง, สาม, 1, 2, 3
3 NONM Ordinal number ที่หนึ่ง, ที่สอง, ที่สาม, ที่1, ที่2, ที่3
4 NLBL Label noun 1, 2, 3, 4, ก, ข, a, b
5 NCMN Common noun หนังสือ, อาหาร, อาคาร, คน
6 NTTL Title noun ดร., พลเอก
7 PPRS Personal pronoun คุณ, เขา, ฉัน
8 PDMN Demonstrative pronoun นี่, นั่น, ที่นั่น, ที่นี่
9 PNTR Interrogative pronoun ใคร, อะไร, อย่างไร
10 PREL Relative pronoun ที่, ซึ่ง, อัน, ผู้
11 VACT Active verb ทำงาน, ร้องเพลง, กิน
12 VSTA Stative verb เห็น, รู้, คือ
13 VATT Attributive verb อ้วน, ดี, สวย
14 XVBM Pre-verb auxiliary, before negator “ไม่” เกิด, เกือบ, กำลัง
15 XVAM Pre-verb auxiliary, after negator “ไม่” ค่อย, น่า, ได้
16 XVMM Pre-verb, before or after negator “ไม่” ควร, เคย, ต้อง
17 XVBB Pre-verb auxiliary, in imperative mood กรุณา, จง, เชิญ, อย่า, ห้าม
18 XVAE Post-verb auxiliary ไป, มา, ขึ้น
19 DDAN Definite determiner, after noun without classifier in between นี่, นั่น, โน่น, ทั้งหมด
20 DDAC Definite determiner, allowing classifier in between นี้, นั้น, โน้น, นู้น
21 DDBQ Definite determiner, between noun and classifier or preceding quantitative expression ทั้ง, อีก, เพียง
22 DDAQ Definite determiner, following quantitative expression พอดี, ถ้วน
23 DIAC Indefinite determiner, following noun; allowing classifier in between ไหน, อื่น, ต่างๆ
24 DIBQ Indefinite determiner, between noun and classifier or preceding quantitative expression บาง, ประมาณ, เกือบ
25 DIAQ Indefinite determiner, following quantitative expression กว่า, เศษ
26 DCNM Determiner, cardinal number expression หนึ่งคน, เสือ 2 ตัว
27 DONM Determiner, ordinal number expression ที่หนึ่ง, ที่สอง, ที่สุดท้าย
28 ADVN Adverb with normal form เก่ง, เร็ว, ช้า, สม่ำเสมอ
29 ADVI Adverb with iterative form เร็วๆ, เสมอๆ, ช้าๆ
30 ADVP Adverb with prefixed form โดยเร็ว
31 ADVS Sentential adverb โดยปกติ, ธรรมดา
32 CNIT Unit classifier ตัว, คน, เล่ม
33 CLTV Collective classifier คู่, กลุ่ม, ฝูง, เชิง, ทาง, ด้าน, แบบ, รุ่น
34 CMTR Measurement classifier กิโลกรัม, แก้ว, ชั่วโมง
35 CFQC Frequency classifier ครั้ง, เที่ยว
36 CVBL Verbal classifier ม้วน, มัด
37 JCRG Coordinating conjunction และ, หรือ, แต่
38 JCMP Comparative conjunction กว่า, เหมือนกับ, เท่ากับ
39 JSBR Subordinating conjunction เพราะว่า, เนื่องจาก, ที่, แม้ว่า, ถ้า
40 RPRE Preposition จาก, ละ, ของ, ใต้, บน
41 INT Interjection โอ้ย,โอ้, เออ, เอ๋, อ๋อ
42 FIXN Nominal prefix การทำงาน, ความสนุกสนาน
43 FIXV Adverbial prefix อย่างเร็ว
44 EAFF Ending for affirmative sentence จ๊ะ, จ้ะ, ค่ะ, ครับ, นะ, น่า, เถอะ
45 EITT Ending for interrogative sentence หรือ, เหรอ, ไหม, มั้ย
46 NEG Negator ไม่, มิได้, ไม่ได้, มิ
47 PUNC Punctuation (, ), “, ,, ;
Virach	Sornlertlamvanich,	Naoto	Takahashi	and	Hitoshi	Isahara.	
Building	a	Thai	Part-Of-Speech	Tagged	Corpus	(ORCHID).	
The	Journal	of	the	Acoustical	Society	of	Japan	(E),	Vol.20,	No.3,	
pp. 189-198, May 1999.
Multi-lingual	Machine	Translation	Project	(MMT)
1987-1992	(+2)
• 6-year project (1987-1992)
• Interlingual approach to MMT for CIJMT (Chinese, Indonesian, Japanese, Malay, Thai)
• R&D
− Analysis
− Generation
− Dictionary
− Interlingua
− Integration	system
• Collaboration
− Thailand	(NECTEC,	CU,	KU,	KMUTT,	
KMITL)
− Japan	(NEC,	Fujitsu,	Hitachi,	OKI,	
Sharp,	Mitsubishi,	Toshiba)
− China,	Indonesia,	Malaysia
• 1969	Computerized	Alphabetization	of	
Thai
• 1974	Thai	Transliteration	System
• 1981	ARIANE	Project
− English-Thai	MT
− Ministry	of	University	Affairs	and	Grenoble	
Univ.
• 1986	Establishment	of	NECTEC	
• 1986	TIS620-2529
− Thai	Standard	Character	Code	for	Computer	by	
TISI
• 1987-92	(+2)	NECTEC-CICC	MMT	Project
• 1992-present	Establishment	of	LINKS	at	
NECTEC
− AI	R&D	Center	at	KMITT
− NAiST at	KU
− KIND	at	SIIT
− RDI	at	NECTEC
− SLS	at	CU,	….
MMT	Project
Interlingua	in	MMT
NLP	Applications	and	Services
LEXiTRON,	Royal	Thai	Institute	Dictionary,	EZKey,	ParSit,	Sansarn
② LINKS/RD-I,	NECTEC	1993-2003
[Figure: timeline of phase ②, 1994-2002 (B.E. 2537-2545)]
LEXiTRON
• LEXiTRON version	1.1
• Corpus-based	dictionary
• Dictionary	for	writing
• Launched	in	1995
• CD-ROM	for	Windows	3.1	Thai	
Edition
• Thai	11,000	entries
• English	9,000	entries
• 6	types	of	dictionaries
− General	word	entry
− Thai	usage	dictionary	(sample	
sentence)
− Synonym-Antonym
− Thai-English	(equivalent)
− Word	class
Virach	Sornlertlamvanich,	Apichit Pittayaratsophon and	Kriangchai Chansaenwilai.	
Thai	Dictionary	Data	Base	Manipulation	using	Multi-indexed	Double	Array	Trie.	
The	5th	Annual	Conference,	NECTEC,	Bangkok.	pp.	197-206,	1993.	(in	Thai)
Thai	Electronic	Dictionary
ORCHID	POS	Tagged	
Corpus
%TTitle: การประชุมทางวิชาการ ครั้งที่ 1
%ETitle: [1st Annual Conference]
%TAuthor:
%EAuthor:
%TInbook: การประชุมทางวิชาการ ครั้งที่ 1, โครงการวิจัยและพัฒนา
อิเล็กทรอนิกส์และคอมพิวเตอร์, ปีงบประมาณ 2531, เล่ม 1
%EInbook: The 1st Annual Conference, Electronics and
Computer Research and Development Project, Fiscal Year
1988, Book 1
%TPublisher: ศูนย์เทคโนโลยีอิเล็กทรอนิกส์และคอมพิวเตอร์
แห่งชาติ, กระทรวงวิทยาศาสตร์ เทคโนโลยีและการพลังงาน
%EPublisher: National Electronics and Computer
Technology Center, Ministry of Science, Technology and
Energy
%Page:
%Year: 1989
%File:
#P1
#1
การประชุมทางวิชาการ ครั้งที่ 1//
การ/FIXN
ประชุม/VACT
ทาง/NCMN
วิชาการ/NCMN
<space>/PUNC
ครั้ง/CFQC
ที่ 1/DONM//
#2
โครงการวิจัยและพัฒนาอิเล็กทรอนิกส์และคอมพิวเตอร์//
โครงการวิจัยและพัฒนา/NCMN
อิเล็กทรอนิกส์/NCMN
และ/JCRG
คอมพิวเตอร์/NCMN//
…
• ORCHID	Corpus	(1997)	supported	
by	CRL	Japan
• Source:	NECTEC	Technical	
Report
• Size:	160	documents;	5.75	MB;	
400K	words
• Tag:	XML	tagged	paragraph,	
sentence,	word,	part-of-
speech
• Availability:	for	research
• Difficulties
• Hard	to	find	consensus	in	the	
sentence	boundary, word	
boundary,	and	POS	tag
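A minimal sketch of reading the tagged format shown in the sample above. The layout rules are inferred from that excerpt only and are assumptions: `%` lines are header fields, `#P<n>` opens a paragraph, `#<n>` opens a sentence, the next line is the raw sentence ending in `//`, and the following lines are `word/TAG` pairs, the last of which also ends in `//`. The function name `parse_orchid` is hypothetical.

```python
SAMPLE = """\
#P1
#1
การประชุมทางวิชาการ ครั้งที่ 1//
การ/FIXN
ประชุม/VACT
ทาง/NCMN
วิชาการ/NCMN
<space>/PUNC
ครั้ง/CFQC
ที่ 1/DONM//
#2
โครงการวิจัยและพัฒนาอิเล็กทรอนิกส์และคอมพิวเตอร์//
โครงการวิจัยและพัฒนา/NCMN
อิเล็กทรอนิกส์/NCMN
และ/JCRG
คอมพิวเตอร์/NCMN//
"""


def parse_orchid(text):
    """Parse sentences into dicts with 'id', 'raw', and 'tokens' [(word, tag), ...]."""
    sentences = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%") or line.startswith("#P"):
            continue  # header fields and paragraph markers are skipped in this sketch
        if line.startswith("#"):
            current = {"id": line[1:], "raw": None, "tokens": []}
            sentences.append(current)
            continue
        if current is None:
            continue
        body = line[:-2] if line.endswith("//") else line
        if current["raw"] is None:
            current["raw"] = body  # first line after the sentence marker is the raw text
            continue
        word, _, tag = body.rpartition("/")  # split on the last slash only
        current["tokens"].append((word, tag))
    return sentences


for sent in parse_orchid(SAMPLE):
    print(sent["id"], [tag for _, tag in sent["tokens"]])
```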
Virach	Sornlertlamvanich,	Thatsanee Charoenporn and	Hitoshi	Isahara.	
ORCHID:	Thai	Part-Of-Speech	Tagged	Corpus.	Technical	Report	Orchid	
TR-NECTEC-1997-001,	NECTEC,	Thailand,	pp.	5-19,	Dec	1997.
Interlingua	English-Thai	MT
Concept	Composition	and	Decomposition
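[Figure: two concept graphs for the same sentence. Composed form: nodes c#amaze, c#news ("this"), c#i; relation labels object, implement. Decomposed form: nodes c#cause, c#news ("this"), c#i, c#amazing; relation labels object, implement, a-object.]
This news amazes me. → ข่าวนี้ทำให้ฉันประหลาดใจ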
English-Thai Web Translation
http://come.to/parsit
http://www.suparsit.com/
• 51,075 visits/month
•138,748 translation-pages/month
Term	Candidate	Extraction	for	Dictionary-less	
Search	Engine
• Virach	Sornlertlamvanich	et	al.	(COLING	2000)	:
- Automatic	Corpus-Based	Thai	Word	Extraction	with	the	C4.5	Learning	
Algorithm
- C4.5-trained decision tree for determining potential word boundaries from MI, entropy, and some linguistic information
- Capable	of	discovering	new	words	in	document	without	assistance	
from	static	dictionary
Virach	Sornlertlamvanich,	Tanapong Potipiti and	Thatsanee Charoenporn.	
Automatic	Corpus-based	Thai	Word	Extraction	with	the	C4.5	Learning	Algorithm.	
Proceedings	of	the	18th	International	Conference	on	Computational	Linguistics	(COLING2000),	
Saarbrucken,	Germany,	pp	802-807,	July-August	2000.
Attributes(1) : Left	and	Right	Mutual	Information
High	mutual	information	implies	that	xyz co-occurs	more	than	expected	
by	chance.	If	xyz is	a	word,	its MIL and MIR must	be	high.
…efunction… and ...function...
\[ MI_L(xyz) = \frac{p(xyz)}{p(x)\,p(yz)}, \qquad MI_R(xyz) = \frac{p(xyz)}{p(xy)\,p(z)} \]
where
x is the leftmost character of string xyz
y is the middle substring of xyz
z is the rightmost character of string xyz
p( ) is the probability function.
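A minimal sketch of the two attributes, assuming the ratio form MI_L = p(xyz) / (p(x) p(yz)) and MI_R = p(xyz) / (p(xy) p(z)), with p( ) estimated as relative substring frequency in a toy corpus. The corpus, the estimator, and the function names are illustrative assumptions, not the setup used in the cited paper.

```python
from collections import Counter


def substring_counts(corpus, max_len):
    """Count every substring of the corpus up to max_len characters."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(corpus) - n + 1):
            counts[corpus[i:i + n]] += 1
    return counts


def prob(s, counts, corpus_len):
    """Relative frequency of s among all substrings of the same length."""
    return counts[s] / (corpus_len - len(s) + 1)


def left_right_mi(s, counts, corpus_len):
    """Return (MI_L, MI_R) for the string s = xyz."""
    x, yz = s[0], s[1:]    # leftmost character vs. the rest
    xy, z = s[:-1], s[-1]  # everything but the last vs. rightmost character
    p = lambda t: prob(t, counts, corpus_len)
    mi_left = p(s) / (p(x) * p(yz))
    mi_right = p(s) / (p(xy) * p(z))
    return mi_left, mi_right


corpus = "the function of the function is to function"
counts = substring_counts(corpus, max_len=8)
mi_left, mi_right = left_right_mi("function", counts, len(corpus))
print(mi_left, mi_right)
```

Values well above 1 mean the edge character and the rest of the string co-occur more often than chance, which is the signal used as a decision-tree attribute.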
Attributes(2) : Left	and	Right	Entropy
Entropy	shows	the	variety	of	characters	before	and	after	a	word.	If y is	
a	word,	its	left	and	right	entropy	must	be	high.
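...?function... and ...?unction...
\[ H_L(y) = -\sum_{x} p(xy \mid y)\,\log p(xy \mid y), \qquad H_R(y) = -\sum_{z} p(yz \mid y)\,\log p(yz \mid y) \]
where
x is the leftmost character of string xyz
y is the middle substring of xyz
z is the rightmost character of string xyz
p( ) is the probability function.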
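A minimal sketch of left and right entropy, assuming the usual form H = -Σ p log p taken over the characters seen immediately before (left) or after (right) each occurrence of y. The toy corpus and the function name are illustrative assumptions.

```python
import math
from collections import Counter


def left_right_entropy(y, corpus):
    """Return (left entropy, right entropy) of the neighbours of y, in bits."""
    left, right = Counter(), Counter()
    start = corpus.find(y)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(y)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(y, start + 1)

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum((n / total) * math.log2(n / total) for n in counter.values())

    return entropy(left), entropy(right)


corpus = "xfunctiona yfunctionb zfunctionc"
print(left_right_entropy("function", corpus))  # varied neighbours on both sides
print(left_right_entropy("unction", corpus))   # always preceded by "f"
```

A true word tends to appear in many contexts, so both entropies are high; a fragment such as "unction" is almost always preceded by the same character, so its left entropy drops to zero.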
EZKey
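[Figure: EZKey keyboard layout showing the Thai/English (T/E) toggle key, Shift, and the D, F, G keys with their Thai characters (ฏ/ก, โ/ด, ฌ/เ)]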
.of]dp68 computer vtwidh’jkpwxs,f_
ในโลกยุค computer อะไรก็ง่ายไปหมด_
The	Names
• LEXiTRON :-
Lexicon	+	Electron
• ParSit :-
Parse	it
• ORCHID	:-
Orchid	=	Ran	(蘭)
• Sansarn logo	:-
Frog	=	Return	of	happiness
カエルは“福帰る”, 幸運が還ってくる (Japanese: "kaeru", frog, sounds like "fuku kaeru", fortune returns; good luck comes back)
• LinuxTLE,	OfficeTLE :-
TLE	=	Ta-Le	(Sea	series	Linux	distro)
Thai	Language	Extension
• Vaja :-
Speech
Smart-Q,	EZKey,
Multi-lingualism
Language	Observatory,	Asian	WordNet
③ TCL,	NICT	2003-2008
Collaboration	Project
[Figure: Gantt chart of projects by year, 2003-2010 (phase ③)]
• Asian E-Learning Network (AEN), CICC
• Language Observatory Project (LOP), NUT
• Intercultural Collaboration Experiments (ICE), KU
• Asian Language Resource Network (ALRN), NUT
• Asian Language Resources (ALR), NEDO
• World Network on Linguistics Diversity (REDILI), UNESCO
• Open Standards Promotion, NECTEC, UNDP-APDIP
• Asian applied nlp for linguistics Diversity and language resource Development (ADD)
• KuiSci: STKC Research Community for MOST
• KuiPoll: Educational Community (BUU, NECTEC)
• KuiHerb: Collective Herbal Information (SIL, PSU, NECTEC)
• AsianWordNet: WordNet for Asian languages development and sharing
• XPLOG: Experience Log for Local Wisdom Collection
• NLP tools and corpora web services
TCL’s	Computational	Lexicon:	Representativity
Constraint based
Attribute: Value description
Logical Constraints
• Is-a (ISA): a conceptual class of a given word X
• Equal (EQU): a word having the same meaning as a given word X
• Not-equal (NEQ): a word having the opposite meaning of a given word X
• Part-of (POF): a conceptual class specifying a part of a given word X
• Whole-of (WOF): a conceptual class referring to the whole of which a given word X is a part
Semantic Constraints
• Agent (AGT): an entity initiating the action
• Object (OBJ): an entity affected by the action
• Instrument (INS): an entity used in the action
• Location (LOC): a position or place where an event occurs
• Time (TIM): a point or period of time when an event occurs
Synset	Assignment	Algorithm	(CS=4)
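• Accept the synset that includes more than one English equivalent, with confidence score 4.
[Diagram: lexical entry L0 with English equivalents E0 and E1; ∈ links show which of the synsets S0, S1, S2 contain each equivalent]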
Example:
L0:	เป้าหมาย
E0:	aim
E1:	target
S0:	purpose,	intent,	intention,	aim,	design
S1:	aim,	object,	objective,	target
S2:	aim
Synset	Assignment	Algorithm	(CS=3)
Example:
L0:	จ้อง
L1:	เพ่งมอง
E0: stare
E1: gaze
S0: stare
S1: gaze,	stare
Synonym
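• Accept the synset that includes more than one English equivalent from the synonyms of the target language, with confidence score 3.
[Diagram: lexical entry L0 and its synonym L1 with English equivalents E0 and E1; ∈ links show which of the synsets S0, S1, S2 contain each equivalent]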
Synset	Assignment	Algorithm	(CS=2)
Example:
L0:	สูติแพทย์
E0:	obstetrician
S0:	obstetrician,	accoucheur
• Accept the only synset that includes the English equivalent, with confidence score 2.
[Diagram: L0, E0 ∈ S0]
Technical term
Synset	Assignment	Algorithm	(CS=1)
Example:
L0:	ช่อง
E0:	hole
E1:	canal
S0:	hole,	hollow		
S1:	hole,	trap,	cakehole,	maw,	yap,	gap
S2:	canal,	duct,	epithelial	duct,	channel
• Accept more than one synset, each including one of the English equivalents, with confidence score 1.
[Diagram: lexical entry L0 with English equivalents E0 and E1; ∈ links show which of the synsets S0, S1, S2 contain each equivalent]
Common term
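The four confidence scores can be sketched as one cascading function. This is an interpretation of the rules as stated on the slides, not the original implementation: the function name `assign_synsets`, the argument layout, and the order in which the rules are tried are assumptions.

```python
def assign_synsets(equivalents, synsets, synonym_equivalents=()):
    """Return [(synset_id, confidence_score), ...] for one lexical entry.

    equivalents: English equivalents of the entry L0
    synsets: dict mapping synset id -> set of English words
    synonym_equivalents: English equivalents of a synonym L1 (optional)
    """
    own = set(equivalents)
    pooled = own | set(synonym_equivalents)

    # CS=4: a synset containing more than one equivalent of the entry itself.
    cs4 = [sid for sid, words in synsets.items() if len(words & own) > 1]
    if cs4:
        return [(sid, 4) for sid in cs4]

    # CS=3: more than one equivalent once the synonym's equivalents are pooled in.
    cs3 = [sid for sid, words in synsets.items() if len(words & pooled) > 1]
    if cs3:
        return [(sid, 3) for sid in cs3]

    # CS=2: exactly one synset contains an equivalent; CS=1: several do.
    candidates = [sid for sid, words in synsets.items() if words & own]
    if len(candidates) == 1:
        return [(candidates[0], 2)]
    return [(sid, 1) for sid in candidates]


# The CS=4 example: aim/target.
synsets_goal = {
    "S0": {"purpose", "intent", "intention", "aim", "design"},
    "S1": {"aim", "object", "objective", "target"},
    "S2": {"aim"},
}
print(assign_synsets(["aim", "target"], synsets_goal))

# The CS=1 example: hole/canal.
synsets_hole = {
    "S0": {"hole", "hollow"},
    "S1": {"hole", "trap", "cakehole", "maw", "yap", "gap"},
    "S2": {"canal", "duct", "epithelial duct", "channel"},
}
print(assign_synsets(["hole", "canal"], synsets_hole))
```

Trying the strictest rule first means an entry only falls back to a lower score when no stronger evidence is available.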
Asian WordNet Development Process
[Figure: process diagram. KUI functions: Correction, Voting, Lookup, Translation, Discussion, Addition. Resources: WN, GWN, AWN, and bilingual X-English dictionaries (e.g. Thai-English, Indonesian-English) combined into a merged-WN. ML Applications: Dictionary, Ontology, CL-Search, MT, Summarization, IE/IR, ….]
Asian	WordNet
http://www.asianwordnet.org/
• Asian WordNet
• Visualization of Asian WordNet
• Function
• Cross language visualization
• 3 modes of visualization
• Progress (May 3, 2010)
• Burmese (19949 senses, 11006 unique words)
• Indonesian (26175 senses, 24398 unique words)
• Japanese (58447 senses, 64678 unique words)
• Korean (42274 senses, 26009 unique words)
• Lao (38890 senses, 44032 unique words)
• Mongolian (1624 senses, 1574 unique words)
• Nepali (41 senses, 42 unique words)
• Sinhala (268 senses, 119 unique words)
• Sundanese (69 senses, 52 unique words)
• Thai (71139 senses, 69998 unique words)
• Collaboration
• TCL
• ADD	members
Digitalization
Linked	Open	Data,	Digitized	Thailand,	Thailand-1-Click
④ NECTEC	2009-2013
Semantic	Link	Generation
•Semantic	Representation	of	the	description
•Keyword	Extraction
• Extract	keywords	in	text	documents	and	link	them	to	appropriate	
articles
•Semantic	Relation	Extraction
• Extract common syntactic patterns between two keywords and generalize them to a triple (e_i, r_ij, e_j)
• Linked Data
– Set of triples (e_i, r_ij, e_j)
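A minimal sketch of turning the text between two keywords into a relation triple. The pattern table, the English example sentences, and the function name `extract_triples` are all hypothetical illustrations; the relation names ISBUILTIN and ISLOCATEDAT are borrowed from the knowledge-map example later in the deck, and the actual system's patterns are not shown in the slides.

```python
import re

# Hypothetical surface patterns that may appear between two keywords.
PATTERNS = {
    "ISBUILTIN": re.compile(r"^\s*was built in\s*$"),
    "ISLOCATEDAT": re.compile(r"^\s*is located (?:at|in)\s*$"),
}


def extract_triples(sentence, keywords):
    """Return triples (e_i, r_ij, e_j) for adjacent keyword pairs linked by a known pattern."""
    found = sorted((sentence.find(k), k) for k in keywords if k in sentence)
    triples = []
    for (i, e_i), (j, e_j) in zip(found, found[1:]):
        between = sentence[i + len(e_i):j]  # text separating the two keywords
        for relation, pattern in PATTERNS.items():
            if pattern.match(between):
                triples.append((e_i, relation, e_j))
    return triples


keywords = ["Phra Chedi Klang Nam", "Tambon Pak Nam", "B.E. 2403"]
linked_data = []
for sentence in [
    "Phra Chedi Klang Nam was built in B.E. 2403",
    "Phra Chedi Klang Nam is located at Tambon Pak Nam",
]:
    linked_data.extend(extract_triples(sentence, keywords))
print(linked_data)
```

The resulting list of triples is the "linked data" form: each entry connects two extracted keywords through a named relation.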
Virach	Sornlertlamvanich	and	Canasai Kruengkrai.	
Effectiveness	of	Keyword	and	Semantic	Relation	Extraction	for	Knowledge	Map	Generation	,	
Proceedings	of	The	Second	International	Workshop	on	Worldwide	
Language	Service	Infrastructure	(WLSI),	Kyoto	University,	Kyoto,	Japan,	January	22-23,	2015.
Types	of	Semantic	Relation
description title
tag
Knowledge	Map
Infobox
Knowledge	map
ISBUILTIN(พระเจดีย์กลางน้ำ, พ.ศ.2403)
ISLOCATEDAT(พระเจดีย์กลางน้ำ, ตำบลปากน้ำ)
Infobox
Knowledge	map
Creator
Making
Product
Shop
Semantically	Enhanced
Cultural	Database
[Place,	Person,	Artifact]
Knowledging
Digital	Content	Technology
Projects	in	Digitized	Thailand,	2009
• DT	PaaS on	the	Cloud
• Digitized	Thailand
(http://www.digitized-thailand.org/)
• Digitized	Lanna	
(http://www.digitized-lanna.com/)
• Digitized	Isan	
(http://www.digitized-isan.com/)
Digitized	Thailand:	The	Ultimate	Goal
• DT	is	a	framework	for	collaboration	in	technology	and	content	
development
• DT	is	a	platform	for	digital	content	sharing
• Toward	creative	economy,	DT	PaaS will	be	established
Data,	Data,	Data
NLP,	Big	Data,	Deep	Learning,	Social	Computing,	IoT,	AI
⑤ SIIT	2014-…
NLP	Challenges
• Internet, Big Data, Machine Learning, and Deep Learning have opened up new possibilities.
Facebook:-
Adds 0.5 petabyte (1 PB = 10^15 bytes) of data every 24 hours
Twitter:-
Adds 340 million tweets per day
YouTube:-
Adds 100 hours of new video every minute
Germin8,	Social	Intelligence
The	Evolution	of	Communication
NLP	Challenges
Data	Community	DC	(DC2)
Steven Bird, Ewan Klein, and Edward Loper (2009). Natural Language Processing with Python. O’Reilly Media Inc.
Data	Data	Data!!!
• The number of social network users is increasing drastically
• Keywords in the contents express the concepts of the talk
• Social media texts are input in a time sequence
• But social media texts are normally short, incomplete, and diverse
NLP,	Big	Data,	Deep	Learning,	Social	Computing,	IoT,	AI
