ⓒ 2018 by Bitnine Co, Ltd. All Rights Reserved.
Bitnine
Dec. 2018
Meetup
TextMiner
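Text Mining Success Story
Quiz
Which product became the overwhelming No. 1 in the dumpling market?
● Over the past three years, C Group, a large Korean conglomerate, surveyed roughly 4.2 billion social media posts about Koreans' dumpling consumption
● The survey found that posts mentioning "dumplings and beer snacks" as a keyword rose sharply year over year: 35,000 → 49,000 → 73,000
● Other positive keywords included "easy to cook", "convenient", and "the dumpling juices go well with beer"
● C Group concluded that demand for dumplings as a beer snack had grown substantially
● Through "beer snack marketing", C Group increased its dumpling sales and established itself as the overwhelming No. 1 in that market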
TextMiner
1. Overview
2. Theory
3. Practice
I. Overview
What is Mining?
Overview
The process of extracting raw ore, then polishing and processing it into valuable gems.
[Figure: Gemstone Mining, Diamond Polishing]
What is Text Mining?
Overview
The process of processing and analyzing large volumes of document data to discover useful patterns and insights (cf. The Oxford English Dictionary)
[Figure: Source Data, Mining, Text Mining, Analysis, Knowledge, Insight]
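Why use Text Mining?
Overview
The voices of customers and users are produced mostly as unstructured data, and analyzing that data is what makes it possible to deliver better value.
Growth of diverse unstructured data
● Explosive growth in internet users (figure: what happens on the internet in 60 seconds)
● Documentation is the foundation of work output (figure: share of office workers' time by task type)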
II. Theory
Offering - TextMiner
Among its five core offerings built on AgensGraph, Bitnine defines and details TextMiner as follows.
AgensGraph, AgensBrowser
Value: Discovery, Presentation, Inference
Offerings, definitions, and target industries:
● AssetManager: Visualization, monitoring and management of complex enterprise (Manufacturing; Utility & Telco)
● TextMiner: Analyzing massive documents and discovering insights from unstructured data (Government; Education)
● PatternDetector: Proactive pattern detection from network behaviors for threat analysis and crime investigation (Government; Banking, Card)
● ChainKeeper: Distributed ledger, transaction and user monitoring for operational excellence of private blockchain (Banking, Card; Logistics)
● DecisionTutor: Supporting users' decisions in specific domains with a subjective probability algorithm (Education; Retail)
Graph theory used across the offerings: Node-Edge Graph, Treemapping, Motif Extraction, Similarity, Statistical Method, Community Detection, Centrality, MST, Shortest Path, Text Mining, Time Series, Bayesian
TextMiner - Process
TextMiner is a process that extracts the target data, converts it into structured data, and then analyzes it for meaningful patterns.
TextMiner Process
Data → 1. Text Mining → 2. Graph Analysis → 3. Utilization → Knowledge
1. Text Mining (output: Structured Data)
● Data Targeting
● Text Extract
● Pre-processing
2. Graph Analysis (output: Graph Pattern)
● Graph Modelling
● Graph Analysis
● Evaluation
3. Utilization
● Visualization
● Application
TextMiner - Text Mining
Text mining consists of setting the analysis objective, then extracting and pre-processing data according to that objective.
Data Targeting
● Set the analysis purpose and goals (e.g., discovering insights in a specific field, gauging customer reaction to a new product)
Data Extract
● Various unstructured documents
● Specific databases
● SNS/blog posts
● Others
Pre-Processing
● Tokenization
● Cleaning
● PoS Tagging
● Others
Tools shown: Tweepy, Confluent Kafka, KoNLPy, MeCab (a.k.a. 은전한닢), KOSAC (Sentiment Analysis Corpus)
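The three pre-processing steps named on this slide can be illustrated with a minimal, standard-library-only Python sketch. It does not use KoNLPy or MeCab: the regex cleaning rules, the whitespace tokenizer, and the tiny `TOY_POS_LEXICON` are assumptions made up for illustration, and the sketch cleans before tokenizing for simplicity. A real Korean pipeline would likely delegate tokenization and tagging to a morphological analyzer.

```python
import re

# Hypothetical toy lexicon; a real pipeline would use a morphological analyzer.
TOY_POS_LEXICON = {
    "dumplings": "NOUN",
    "beer": "NOUN",
    "go": "VERB",
    "well": "ADV",
    "with": "ADP",
}

def clean(text: str) -> str:
    """Cleaning: drop URLs, then anything that is not a letter, digit, or space."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def tokenize(text: str) -> list[str]:
    """Tokenization: split on whitespace (sufficient only for this toy input)."""
    return text.split()

def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """PoS tagging: look each token up in the toy lexicon, 'UNK' if missing."""
    return [(tok, TOY_POS_LEXICON.get(tok, "UNK")) for tok in tokens]

raw = "Dumplings go well with beer!!! ^^ https://example.com/post"
corpus = pos_tag(tokenize(clean(raw)))
print(corpus)
```

The resulting list of (token, tag) pairs is one simple form the structured output of this stage could take.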
TextMiner - Pre-processing
Pre-processing applies various techniques, chosen according to the analysis purpose, to build a corpus.
[Figure: raw text (e.g., "ㅋㅋㅋ. ^^") is turned into a corpus of structured data. Techniques shown: Tokenization, Cleaning, PoS Tagging, Syntactic Parsing, Concept Extraction (e.g., "age")]
TextMiner - Pre-processing
During text mining, data scientists spend 80% of their time collecting and cleaning data.
Source: "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says"
TextMiner - Scope of Text Mining
Text mining has been studied broadly across many fields with the aim of finding meaningful value in text.
Source: Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications, G. Miner, D. Delen, J. Elder, A. Fast, T. Hill, and R. Nisbet, Elsevier, January 2012
TextMiner - Scope of Text Mining: Information Extraction (IE)
Identifying and extracting relevant facts and relationships from unstructured text; the task of producing structured data from unstructured or semi-structured text.
TextMiner - Scope of Text Mining: Information Retrieval (IR)
Storing and retrieving text documents, including search engines and keyword search.
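Keyword search of the kind described for Information Retrieval is commonly built on an inverted index. The following is a minimal sketch under that assumption; the function names and sample documents are hypothetical and not taken from the slides.

```python
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents.
docs = {
    "d1": "dumplings pair well with beer",
    "d2": "frozen dumplings are easy to cook",
    "d3": "craft beer sales keep growing",
}
index = build_inverted_index(docs)
print(sorted(search(index, "dumplings beer")))
```

Storing sets of document ids per term keeps a multi-term query down to a few set intersections.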
TextMiner - Scope of Text Mining: Concept Extraction (CE)
Assigning meaning to words and phrases and grouping them into similar groups.
Example: "Dr. Hong Gildong published a book on text mining in 2018." (“홍길동 박사가 2018년에 텍스트마이닝 관련 서적을 출간했다.”)
Concepts tagged in the example: person name, degree, year, scientific theory, book (via synonym), publish (via synonym)
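The idea of mapping synonyms onto shared concepts can be sketched with a small lookup table. This is an illustrative assumption rather than the deck's method: `CONCEPT_LEXICON` and `extract_concepts` are hypothetical names, the example sentence is an English rendering of the one on the slide, and person-name recognition is left out.

```python
import re

# Hypothetical concept lexicon: surface form -> concept label.
CONCEPT_LEXICON = {
    "dr.": "degree",
    "book": "book",
    "volume": "book",
    "published": "publish",
    "released": "publish",
    "text mining": "scientific theory",
}

def extract_concepts(sentence: str) -> list[tuple[str, str]]:
    """Return (surface form, concept) pairs found in the sentence."""
    lowered = sentence.lower()
    found = [(form, concept) for form, concept in CONCEPT_LEXICON.items()
             if form in lowered]
    # Any standalone four-digit number is treated as a year.
    found += [(year, "year") for year in re.findall(r"\b\d{4}\b", sentence)]
    return found

first = extract_concepts("Dr. Hong Gildong published a book on text mining in 2018.")
second = extract_concepts("Dr. Hong Gildong released a volume on text mining in 2018.")
print(first)
print(second)
```

Both sentences yield the same set of concept labels even though they use different words for "book" and "publish", which is the grouping effect the slide describes.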
TextMiner - Scope of Text Mining: Document Classification
Grouping and classifying snippets, paragraphs, or documents.
[Figure: items A to E sorted into data categories A to E]
TextMiner - Scope of Text Mining: Document Clustering
Grouping and categorizing terms, excerpted sentences, paragraphs, or documents.
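One simple way to group documents without predefined categories is to compare their term sets. The sketch below assumes Jaccard similarity and a greedy threshold rule; both choices, the 0.3 threshold, and the sample documents are illustrative assumptions, not something the slides specify.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared terms divided by all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs: dict[str, str], threshold: float = 0.3) -> list[set[str]]:
    """Greedy grouping: a document joins the first cluster that holds at
    least one member whose similarity to it reaches the threshold."""
    terms = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}
    clusters: list[set[str]] = []
    for doc_id in docs:
        for group in clusters:
            if any(jaccard(terms[doc_id], terms[m]) >= threshold for m in group):
                group.add(doc_id)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append({doc_id})
    return clusters

# Hypothetical sample documents.
docs = {
    "d1": "dumplings pair well with beer",
    "d2": "beer and dumplings pair nicely",
    "d3": "graph database stores vertices and edges",
    "d4": "graph database query with edges",
}
print(cluster(docs))
```

With these inputs the two dumpling posts end up together and the two graph database posts end up together; the result depends on the threshold and on document order, which is a known weakness of greedy grouping.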
TextMiner - Scope of Text Mining: Web Mining
Data and text mining on the internet, with a focus on the interconnectedness of the web.
Sources: web logs, hyperlinks, user activity, web content
TextMiner - Scope of Text Mining: Natural Language Processing (NLP)
Processing natural language: segmenting it into paragraphs, sentences, and words, tagging parts of speech, and so on.
[Figure: diagram placing the practice areas of text mining (Information Extraction, Information Retrieval, Concept Extraction, Document Classification, Document Clustering, Web Mining, Natural Language Processing) among the related fields (Statistics, Library and Information Sciences, Computational Linguistics, Data Mining, AI and Machine Learning, Databases)]
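TextMiner - Scope of Text Mining
Working through the six questions below leads to a solution, and most text mining problems can be solved this way.
Text Mining Foundations
1) "I want to know how customers on social media are reacting to our newly released product."
→ Requires semantic analysis of words (hashtags, body text)
→ CE (Sentiment Analysis) / NLP
2) "I want to compare the citation relationships among papers and researchers' activity in my field of interest to find out who is influential in each area."
→ Requires analysis of the links and associations between the researchers and references of documents (papers)
→ Web Mining / Clustering / Classification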
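For the first question, the simplest form of sentiment analysis sums word polarities from a lexicon. The sketch below assumes such a lexicon-based approach; the `POLARITY` table and the sample posts are invented for illustration, and a real system would more likely draw polarities from a sentiment corpus.

```python
# Hypothetical polarity lexicon; a real one would come from a sentiment corpus.
POLARITY = {"easy": 1, "convenient": 1, "juicy": 1, "bland": -1, "soggy": -1}

def sentiment_score(post: str) -> int:
    """Sum the polarity of every known word or hashtag in the post."""
    score = 0
    for token in post.lower().split():
        # Strip the hashtag marker and trailing punctuation before lookup.
        word = token.strip("#.,!?")
        score += POLARITY.get(word, 0)
    return score

# Hypothetical sample posts.
posts = [
    "These dumplings are #easy and #juicy!",
    "Soggy wrapper, bland filling.",
    "Bought dumplings today.",
]
scores = [sentiment_score(p) for p in posts]
print(scores)  # [2, -2, 0]
```

A positive total suggests a favorable post, a negative total an unfavorable one, and zero means no lexicon word was found.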
TextMiner - Graph Analysis & Utilization
Using the refined corpus, a graph is modelled to suit the analysis purpose, analyzed, and then visualized.
Graph Modelling
● Model a graph from the refined corpus to suit the analysis purpose
● Vertices are documents, words, etc.
● Edges are relations, syntax, etc.
● Properties are meanings, parts of speech, etc.
Graph Analysis
● PageRank / TextRank
● Similarity / Clustering / Classification
● Community Detection
● Others
Visualization
● Graph Pattern
● Word Topologies
● Word Cloud
● Infographic
● Others
Delivery: Web Application, Jupyter, etc.
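To make the modelling and analysis steps concrete, here is a minimal sketch that builds a word graph and ranks its vertices with PageRank, in the spirit of TextRank. It is a simplification under stated assumptions: edges link words that share a sentence (TextRank as usually described uses a sliding window), the damping factor of 0.85 and 50 iterations are conventional defaults, and the token lists are invented.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences: list[list[str]]) -> dict[str, set[str]]:
    """Vertices are words; an undirected edge links words sharing a sentence."""
    graph: dict[str, set[str]] = defaultdict(set)
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def pagerank(graph: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank on an undirected graph without isolated nodes."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Each vertex receives an equal share of every neighbour's rank.
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return rank

# Hypothetical tokenized sentences from a refined corpus.
sentences = [
    ["dumplings", "beer", "snack"],
    ["dumplings", "easy", "cook"],
    ["beer", "snack"],
]
ranks = pagerank(cooccurrence_graph(sentences))
top_word = max(ranks, key=ranks.get)
print(top_word)
```

Here "dumplings" comes out on top because it is the only word connected to every other word; in a graph database the same vertices and edges could also carry properties such as meaning or part of speech.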
TextMiner - Basic Modelling
The basic graph model for graph analysis is as follows.
Basic Model
Vertex: Document or Word
Edge: Relation / Syntax
Property:
● Meaning
● PoS Tag
● Sentiment
Example (opening line of the Korean national anthem, modelled at the sentence, morpheme, and syntax levels): "동해물과 백두산이 마르고 닳도록 하느님 보우하사 우리나라 만세"
Example properties:
● 동해: meaning "East Sea" (동쪽바다), noun
● 닳: base form 닳다 ("to wear away"), verb, negative
● 하: base form 하다 ("to do"), auxiliary verb
● Particle (조사)
TextMiner - Example: Finding the Key Sentences of a Document
A graph DB can represent all text-mined data as a single graph and use it flexibly, and because it requires less computation it is fast and simple.
Stores shown: Relational DB Store, Graph DB Store
Steps shown: Sentence-Morpheme Mapping; Sentence-Sentence Matrix; Building the Sentences' Social Network; Calculating Central Score
Central scores:
● Centrality Degree
● Closeness Centrality Degree
● Weighted Closeness Centrality Degree
[Figure: network of sentences S01 to S06 joined by edges weighted 1 or 2]
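The key-sentence example can be sketched end to end in plain Python: map sentences to morphemes, derive a sentence-sentence matrix from shared morphemes, and score each sentence by centrality. This is an assumed reading of the slide, not Bitnine's implementation: English words stand in for morphemes, the sentence ids and contents are invented, closeness is computed on hop counts, and the weighted closeness variant is not shown.

```python
from collections import deque

def sentence_matrix(morphemes: dict[str, set[str]]) -> dict[str, dict[str, int]]:
    """Sentence-sentence matrix: weight = number of shared morphemes (if > 0)."""
    ids = list(morphemes)
    return {
        a: {b: len(morphemes[a] & morphemes[b])
            for b in ids if b != a and morphemes[a] & morphemes[b]}
        for a in ids
    }

def degree_centrality(matrix: dict[str, dict[str, int]]) -> dict[str, int]:
    """Number of neighbouring sentences for each sentence."""
    return {s: len(nbrs) for s, nbrs in matrix.items()}

def closeness_centrality(matrix: dict[str, dict[str, int]]) -> dict[str, float]:
    """(reachable sentences) / (sum of hop distances), found by BFS."""
    result = {}
    for start in matrix:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in matrix[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        result[start] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical sentence-morpheme mapping.
morphemes = {
    "S01": {"graph", "database", "text"},
    "S02": {"graph", "text", "mining"},
    "S03": {"mining", "corpus"},
    "S04": {"corpus", "token"},
    "S05": {"graph", "query"},
}
matrix = sentence_matrix(morphemes)
degree = degree_centrality(matrix)
closeness = closeness_centrality(matrix)
key_sentence = max(closeness, key=closeness.get)
print(key_sentence, degree[key_sentence], closeness[key_sentence])
```

In this toy network S02 has the most neighbours and the shortest average distance to the other sentences, so both scores pick it as the key sentence.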
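TextMiner Application Areas
Marketing
Product recommendation, customer preference analysis
Education
Learning-level diagnosis and optimal pacing analysis
Integrated analysis through document clustering and classification
Investment
Analysis of the investment patterns of publicly listed global companies
Interaction
Voice assistants, chatbots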
Graph Database Meetup in Korea #7. Graph Database 5 Offerings_ TextMiner (Graph Database Use Cases: Text Mining)
