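Building a Chinese Natural Language Understanding Engine for Financial Scenarios (打造面向金融場景的中文自然語言理解引擎)
Data R&D Center (數據研究發展中心)
陳皓遠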
About me
• Member of AI group, CTBC Data R&D Center
• Past experience in
  • Cyber security and defense industry
  • Smartphone industry
• Familiar with
  • Machine learning
  • Natural language processing
  • Software development
  • Cloud native architecture design
Team
• The CTBC Data R&D Center AI group was founded in 2018
• The AI group is composed of data scientists and software developers
• Our mission is to realize AI-based solutions for banking scenarios
• We currently focus on
  • Computer Vision (CV)
  • Natural Language Processing (NLP)
Retrieved from https://www.ithome.com.tw/news/131697
Achievement
NLP
• Pluto: A Deep Learning based Watchdog for Anti Money Laundering
  • First vertical AI paradigm in the RegTech field within CTBC globally
  • Reduces daily human effort on adverse media screening by 67%
  • Publication
    • https://www.aclweb.org/anthology/W19-5515
CV
• NIST Face Recognition Vendor Test (FRVT)
  • Ranked 35th globally
  • Ranked 2nd among Taiwanese companies
• X-ATM for fraud avoidance

Rank  Company                        Country  FRR
10    SenseTime (商湯)               China    0.0092
18    Face++ (曠視)                  China    0.0145
26    CyberLink (訊連)               Taiwan   0.0195
29    Tencent Deepsea (騰訊)         China    0.0215
35    CTBC BANK (中國信託)           Taiwan   0.0250
39    Gorilla Technology (大猩猩)    Taiwan   0.0291
55    Kneron Inc. (耐能)             Taiwan   0.0902
Outline
• Background
• Proposed Solution
• Evaluation
• Prototype
• Conclusion
Digital channels play an important role
Global Views Monthly (遠見雜誌), 2018 Digital Finance Survey (2018數位金融力調查)
Retrieved from https://www.gvm.com.tw/article.html?id=54981
Abundant platforms for conversational assistants
• Messaging platforms
• Google Home
• Amazon Echo
Eno, your Capital One dialogue assistant
• A task-oriented dialogue system
• Chats in natural language
• Realized on Amazon Alexa
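Motivation
• Realize a task-oriented dialogue system in Mandarin, on heterogeneous conversational platforms, to serve customers in banking scenarios
Prerequisite
• A natural language understanding (NLU) module, consisting of
  • intent recognition (IR)
  • named entity recognition (NER)
Example: 美元定存六個月期的利率是多少 ("What is the interest rate on a six-month USD time deposit?")
• Intent
  • 查詢利率 (query interest rate)
• Entity
  • 幣別 (currency): 美元 (USD)
  • 帳戶類型 (account type): 定存 (time deposit)
  • 期數 (duration): 六個月 (six months)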
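The NLU output for the example above could be represented as a small data structure. This is only a sketch: the class name `NLUResult` and its fields are hypothetical, since the slides do not specify an output format, and the values are written out by hand rather than predicted by a model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NLUResult:
    """Hypothetical container for one parsed utterance (not from the slides)."""
    text: str
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


# The example from the slide, filled in by hand.
result = NLUResult(
    text="美元定存六個月期的利率是多少",
    intent="查詢利率",
    entities={"幣別": "美元", "帳戶類型": "定存", "期數": "六個月"},
)

# Sanity check: every entity value should be a substring of the utterance.
all_spans_found = all(value in result.text for value in result.entities.values())
print(result.intent, result.entities, all_spans_found)
```

Keeping the intent and the entities in one record makes it straightforward for a dialogue manager to decide which task to run and which slots are already filled.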
Approach
Key Components in NLU
• Intent Recognizer
  • Classification problem
• Named Entity Extractor
  • Sequence labeling problem
Pipeline
• Preprocessing: tokenizer, POS tagger
• Vectorization: embeddings
• Modeling: supervised learning methods
  • Deep Neural Network (DNN)
  • Conditional Random Field (CRF)
  • Recurrent Neural Network (RNN)
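Data Preparation
• Intent dataset
  • 1016 samples over 3 distinct classes
  • 試算匯兌 (currency exchange calculation), 查詢存款利率 (deposit interest rate query), 查詢台外幣餘額 (TWD and foreign currency balance query)
• Named entity dataset
  • 977 samples over 6 distinct entities
  • amount, money, duration, currency, acnt_type, timestamp
With grateful acknowledgment to 數位金融處 (Digital Finance Department) and 個金數位營運處 (Retail Banking Digital Operations Department)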
Intent Classification Techniques
Feature engineering
• Preprocessing
  • Tokenization (ckiptagger)
• Feature extraction
  • Bag of Words (scikit-learn), word-count encoding
Model training
• Model
  • Deep Neural Network (DNN) (tensorflow)
Example
Text: 現在美金一年期定存是多少 ("How much is the one-year USD time deposit now?")
Tokens: 現在 美金 一年期 定存 是 多少
Vocabulary: [ "現在", "台幣", "美金", "日圓", "一年期", "定存", "是", "多少" ]
Feature vector: [ 1, 0, 1, 0, 1, 1, 1, 1 ]
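The word-count encoding can be sketched in a few lines of plain Python. The vocabulary and tokens are copied from the slide; the function name `bag_of_words` is hypothetical, and a real vocabulary would be built from the whole training set. With the eight-word vocabulary, the resulting vector has eight entries.

```python
from collections import Counter
from typing import List

# Vocabulary and tokens copied from the slide example.
VOCABULARY = ["現在", "台幣", "美金", "日圓", "一年期", "定存", "是", "多少"]


def bag_of_words(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Return the word-count vector of `tokens` over `vocabulary`."""
    counts = Counter(tokens)
    # Tokens outside the vocabulary are ignored; absent words count as 0.
    return [counts[word] for word in vocabulary]


tokens = ["現在", "美金", "一年期", "定存", "是", "多少"]
vector = bag_of_words(tokens, VOCABULARY)
print(vector)  # [1, 0, 1, 0, 1, 1, 1, 1]
```

scikit-learn's `CountVectorizer` computes the same kind of count vector; the hand-written version is used here only to keep the sketch dependency-free.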
Named Entity Recognition Techniques
Model I: CRF for Word-Level Feature
Feature engineering
• Preprocessing
  • Tokenization (ckiptagger)
  • POS tagging (ckiptagger)
• Feature extraction
  • Text and POS tags within context (context window: 3 tokens)
Model training
• Model
  • Conditional Random Field (CRF) (scikit-learn)
Example
Text: 現在美金一年期定存是多少
Tokens: 現在(Nd) 美金(Na) 一年期(Na) 定存(Na) 是(SHI) 多少(Neqa)
Feature vector: …, ( -1:現在, -1:Nd, 0:美金, 0:Na, 1:一年期, 1:Na ), …
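The context-window features can be sketched as follows, assuming a window of one token on each side (three tokens in total). The function names and the `offset:word` / `offset:pos` key format are hypothetical; the slides show only the resulting features, not the extraction code.

```python
from typing import Dict, List, Tuple

# (token, POS tag) pairs copied from the slide example.
TAGGED = [("現在", "Nd"), ("美金", "Na"), ("一年期", "Na"),
          ("定存", "Na"), ("是", "SHI"), ("多少", "Neqa")]


def token_features(sent: List[Tuple[str, str]], i: int, window: int = 1) -> Dict[str, str]:
    """Collect word and POS features for token i and its neighbours."""
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(sent):  # skip positions outside the sentence
            word, pos = sent[j]
            features[f"{offset}:word"] = word
            features[f"{offset}:pos"] = pos
    return features


def sentence_features(sent: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """One feature dict per token, in sentence order."""
    return [token_features(sent, i) for i in range(len(sent))]


features = sentence_features(TAGGED)
print(features[1])
```

CRF toolkits such as sklearn-crfsuite accept one such list of feature dicts per sentence; whether the authors used that particular package is not stated in the slides.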
Named Entity Recognition Techniques
Model II: Bi-LSTM-CRF for Word-Level Embedding
• Preprocessing
  • Tokenization (ckiptagger)
• Model
  • Embedding layer (keras)
  • Long Short-Term Memory (LSTM) layer (keras)
  • CRF layer (keras)
• Stages: embedding learning, feature learning, model training
Example
Text: 現在美金一年期定存是多少
Tokens: 現在 美金 一年期 定存 是 多少
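An embedding layer takes integer token indices as input, so tokens are usually mapped to indices and padded to a fixed length first. The sketch below illustrates that step in plain Python. The special tokens `<PAD>` and `<UNK>`, the function names, and `max_len=5` are assumptions for illustration and are not taken from the slides.

```python
from typing import Dict, List

PAD, UNK = "<PAD>", "<UNK>"  # hypothetical special tokens


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer index to every token seen in training."""
    vocab = {PAD: 0, UNK: 1}
    for sent in sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to indices, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


train = [["現在", "美金", "一年期", "定存", "是", "多少"]]
vocab = build_vocab(train)

# "利率" was never seen in training, so it maps to the UNK index.
encoded = encode(["美金", "定存", "利率"], vocab, max_len=5)
print(encoded)  # [3, 5, 1, 0, 0]
```

Reserving index 0 for padding is a common convention because it lets the downstream layers mask padded positions.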
Evaluation
Methodology
Confusion Matrix

               Actual Yes            Actual No
Predicted Yes  True Positive (TP)    False Positive (FP)
Predicted No   False Negative (FN)   True Negative (TN)

Reference: https://en.wikipedia.org/wiki/Confusion_matrix
Metrics
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-Score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
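The three metrics can be computed directly from the confusion-matrix counts. The sketch below assumes that a zero denominator should yield 0.0, which is a convention and not something the slides specify. The counts in the example are made up for illustration and are not results from the talk.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up counts: 90 true positives, 10 false positives, 30 false negatives.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

True negatives do not appear in any of the three formulas, which is why these metrics remain informative when the negative class dominates, as it typically does for entity tokens.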
Evaluation
Intent classification: Precision, Recall and F1-Score

Intent                                    Precision  Recall  F1-Score
查詢台外幣餘額 (balance query)             0.91       0.94    0.93
查詢存款利率 (deposit rate query)          0.98       0.95    0.96
試算匯兌 (exchange calculation)            0.97       0.96    0.96
Evaluation
Named Entity Recognition: Precision

Entity                  CRF    BiLSTM+CRF
幣別 (currency)         0.79   0.98
期數 (duration)         0.75   0.93
時間點 (timestamp)      0.85   0.80
帳戶類型 (acnt_type)    0.74   0.89
錢 (money)              0.55   0.81
金額 (amount)           0.90   0.96
Evaluation
Named Entity Recognition: Recall

Entity                  CRF    BiLSTM+CRF
幣別 (currency)         0.82   0.95
期數 (duration)         0.55   0.67
時間點 (timestamp)      0.78   0.79
帳戶類型 (acnt_type)    0.67   0.80
錢 (money)              0.52   0.89
金額 (amount)           0.94   0.72
Evaluation
Named Entity Recognition: F1-Score

Entity                  CRF    BiLSTM+CRF
幣別 (currency)         0.81   0.97
期數 (duration)         0.64   0.71
時間點 (timestamp)      0.82   0.72
帳戶類型 (acnt_type)    0.68   0.84
錢 (money)              0.52   0.88
金額 (amount)           0.92   0.82
Prototype
Conversational AI with the Rasa framework: https://github.com/RasaHQ/rasa
Prototype
Why Rasa?
Rasa characteristics
• Own Our Data
  • Preserve privacy
  • Do not hand data over to big tech companies
• Open source
  • Transparency
  • Community support
• Extensible Architecture
  • Task-oriented dialogue architecture
  • Customizable components
CTBC strategy
• Customize Mandarin-based components
• Integration with core technology
• Compliance with security and regulation
• Customized scenarios
• Ownership of core technology
Prototype
• Intent recognition
  • CKIP Tokenizer (customized)
  • EmbeddingIntentClassifier (built-in)
• Named Entity Recognition
  • CKIP Tokenizer (customized)
  • Bi-LSTM-CRF for Word-Level Embedding (customized)
Prototype
Demo
Conclusion
Summary
• NLU is a key module in task-oriented dialogue systems
• The intent recognizer and the entity extractor are the key components of NLU; both are built with machine learning techniques and annotated data
• DNN-based models generally outperform the traditional method, but not on every task
• Rasa, an open-source framework, supports developing a conversational assistant from scratch
Conclusion
What's next
• Transfer learning by initializing with pre-trained word embeddings
• Word-based embeddings vs. char-based embeddings
• Model engineering
Q&A
