[2018 Taiwan AI Academy Alumni Annual Meeting] Textual Data Analytics in Finance / Chuan-Ju Wang (王釧茹)
1. Talk @ Taiwan AI Academy, November 17, 2018
Textual Data Analytics in Finance
Dr. Chuan-Ju Wang (王釧茹)
Research Center for Information Technology Innovation, Academia Sinica
Computational Finance and Data Analytics Laboratory (CFDA Lab)
http://cfda.csie.org
2. Chuan-Ju Wang (CITI, AS) Talk @ Taiwan AI Academy November 17, 2018
Quant — Data Scientist
Source: http://www.indeed.com/jobtrends
Source: http://www.computerweekly.com/blogs/Data-Matters/2014/06/data-scientist-the-new-quant.html
3. Data Science in Finance
4. Text Analytics
❖ Big Data
❖ Structured Data
❖ user logs, sensor logs, click through logs, …
❖ Unstructured Data
❖ web texts, user conversations, public opinions, reports, …
❖ Big Data for Unstructured Text – Text Analytics
❖ Goal — Turn text into data for analysis, via application of
natural language processing (NLP) and analytical methods
https://insidebigdata.com/2015/06/05/text-analytics-the-next-generation-of-big-data/
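As a minimal sketch of turning text into data, the snippet below tokenizes two made-up sentences and counts terms using only the Python standard library. The sample sentences and the `tokenize` helper are illustrative assumptions, not material from the talk.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical snippets standing in for unstructured text.
docs = [
    "The company may default on its loan.",
    "Management expects profit to increase; default risk is low.",
]

# One term-frequency table per document: text becomes countable data.
bags = [Counter(tokenize(d)) for d in docs]

# Document frequency: in how many documents each term appears.
doc_freq = Counter(term for bag in bags for term in bag)

print(bags[0]["default"], doc_freq["default"], doc_freq["profit"])
```

Once text is in this tabular form, the counts can feed any standard analytical method, such as regression or ranking.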
5. Textual Sentiment Analysis for Financial Risk Prediction
On the Risk Prediction and Analysis of Soft
Information in Finance Reports. European Journal of
Operational Research (EJOR), 257(1), 243-250, 2017.
6. Soft and Hard Information in Finance
❖ The growing amount of financial data makes it increasingly important
to learn how to discover valuable information for various financial
applications.
❖ In finance, there are typically two kinds of information:
❖ Soft information: text, including opinions, ideas, and market
commentary.
❖ Hard information: numerical values, such as financial measures and
historical prices.
❖ Our work aims to exploit soft information for financial risk prediction.
7. Risk Proxy: Stock Return Volatility
❖ Stock return
$$R_t = \frac{S_t - S_{t-1}}{S_{t-1}}$$
❖ Stock return volatility
❖ A common risk metric measured by the standard
deviation of returns over a period of time.
$$v_{[t-n,t]} = \sqrt{\frac{\sum_{i=t-n}^{t} (R_i - \bar{R})^2}{n}}, \quad \text{where } \bar{R} = \frac{\sum_{i=t-n}^{t} R_i}{n+1}.$$
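The return and volatility definitions on this slide can be sketched in a few lines of Python. The price series is hypothetical, and the function names `stock_returns` and `volatility` are chosen here for illustration only.

```python
import math

def stock_returns(prices):
    """R_t = (S_t - S_{t-1}) / S_{t-1} for consecutive prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

def volatility(returns):
    """Standard deviation of n + 1 returns, dividing by n."""
    n = len(returns) - 1
    mean = sum(returns) / (n + 1)
    return math.sqrt(sum((r - mean) ** 2 for r in returns) / n)

prices = [100.0, 110.0, 99.0, 108.9]  # hypothetical closing prices
rets = stock_returns(prices)
print(rets, volatility(rets))
```

Note that the window from t-n to t contains n + 1 returns, which is why the mean divides by n + 1 while the variance divides by n.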
8. Financial Sentiment Analysis
❖ In this work, we apply sentiment analysis to the
risk prediction task.
❖ A finance-specific sentiment lexicon is adopted for analysis.
❖ Two machine learning techniques are adopted for the task:
❖ Regression approach: Predict the stock return volatilities.
❖ Ranking approach: Rank the companies according to
their relative risk levels.
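To make the difference between the two approaches concrete, here is a small sketch with made-up volatilities. The pairwise formulation of ranking shown here is one common option and is an assumption, not necessarily the formulation used in the paper.

```python
from itertools import combinations

# Hypothetical future volatilities for four made-up companies.
volatility = {"A": 0.42, "B": 0.18, "C": 0.31, "D": 0.25}

# Regression view: the target is the volatility value itself.
regression_targets = list(volatility.items())

# Ranking view: only the relative order matters, so each training
# example is an ordered pair (riskier company, safer company).
ranking_pairs = [
    (a, b) if volatility[a] > volatility[b] else (b, a)
    for a, b in combinations(volatility, 2)
]

# The risk ordering that a ranking model should reproduce.
risk_order = sorted(volatility, key=volatility.get, reverse=True)
print(risk_order)
print(ranking_pairs)
```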
9. Financial Sentiment Lexicon
❖ Words often have different meanings in the finance domain than in
general usage, such as
❖ vice: immoral or wicked behavior
❖ vice: secondary (in finance context)
❖ Almost three-fourths of the words in 10-K financial reports
from 1994 to 2008 that the widely used Harvard
Psychosociological Dictionary identifies as negative are
typically not considered negative in financial contexts.
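The effect of a domain mismatch can be illustrated with two tiny word lists. Both lists below are hypothetical stand-ins invented for this sketch; they are not the actual Harvard dictionary or any published finance lexicon.

```python
# Hypothetical mini word lists, for illustration only; they are not
# the actual general-purpose or finance-specific dictionaries.
general_negative = {"vice", "liability", "tax", "cost", "loss"}
finance_negative = {"loss", "default", "delist"}

sentence = "the vice president reported a tax liability and a loss"
tokens = sentence.split()

def negative_hits(tokens, lexicon):
    """Return the tokens that appear in the given negative-word list."""
    return [t for t in tokens if t in lexicon]

print(negative_hits(tokens, general_negative))
print(negative_hits(tokens, finance_negative))
```

With the general list, four of the ten tokens are flagged as negative, although only "loss" carries negative sentiment in this financial sentence.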
10. Six Finance-Specific Lexicons
❖ Loughran and McDonald (2011)
❖ When Is a Liability Not a Liability? Textual Analysis, Dictionaries,
and 10-Ks. Journal of Finance.
11. Problem Formulation
❖ Prediction target: future stock return volatility (regression) and
future relative risk level (ranking)
❖ Features
❖ Soft textual information: all words or financial sentiment words
❖ Hard numerical information: each company's volatility over the
twelve months before the report
[Timeline: v(-12) spans 2005/3/22 to the report filing date (2006/3/22); v(+12) spans 2006/3/22 to 2007/3/22.]
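The two twelve-month windows around a filing date can be computed as follows. This is a minimal sketch: the helper name `volatility_windows` is hypothetical, and the example date is the one shown on the slide's timeline.

```python
from datetime import date

def volatility_windows(filing_date):
    """Return the 12-month windows before and after a filing date.

    Note: date.replace raises ValueError for a February 29 filing
    date; a real pipeline would need to handle that case.
    """
    before = (filing_date.replace(year=filing_date.year - 1), filing_date)
    after = (filing_date, filing_date.replace(year=filing_date.year + 1))
    return before, after

# v(-12) is measured over `before` (a feature);
# v(+12) is measured over `after` (the prediction target).
before, after = volatility_windows(date(2006, 3, 22))
print(before)
print(after)
```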
12. Corpora: The 10-K Corpus
❖ A Form 10-K is an annual report required by the U.S. Securities and Exchange Commission (SEC)
❖ Only Section 7, "Management's Discussion and Analysis of Financial Condition and Results of Operations" (MD&A), is used
❖ The Sarbanes-Oxley Act of 2002 explains the drastic increase in report length during the 2002-2003 period
13. Experimental Results
14. Financial Sentiment Terms Analysis
[Figure: stemmed terms. Legend: Fin-Neg, Fin-Pos, Fin-Lit, Fin-Unc, Non. Labels: SEN, ORG.]
Stems shown: amend, deficit, forbear, delist, default, sureti, discontinu, wherebi, unabl, disput, concern, profit, violat, regain, uncomplet, accid, abl, integr, grantor, ceg, nasdaq, gnb, coven, waiver, excelsior, rais, ebix, shelbour, nplacement, syndic, pfc, stage, same, driver, small-cap, seri, hearth, awg, libert, special, benefici, sever, breach, doubt
Word forms covered by selected stems:
❖ deficit: deficit, deficits
❖ default: default, defaulted, defaulting, defaults
❖ delist: delist, delisted, delisting, delists
❖ amend: amend, amendable, amendatory, amended, amending, amendment, amendments, amends
❖ forbear: forbear, forbearance, forbearances, forbearing, forbears
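The word families listed on this slide suggest how individual word forms collapse into a single stemmed feature. The sketch below counts occurrences per stem with an explicit lookup table; the table is abridged from the slide, while the sample sentence is a made-up assumption.

```python
from collections import Counter

# Word families as listed on the slide (stem -> word forms), abridged.
families = {
    "deficit": ["deficit", "deficits"],
    "default": ["default", "defaulted", "defaulting", "defaults"],
    "delist": ["delist", "delisted", "delisting", "delists"],
}

# Invert the mapping so each word form points to its stem.
form_to_stem = {
    form: stem for stem, forms in families.items() for form in forms
}

# Hypothetical report sentence, already lowercased and tokenized.
tokens = "the firm defaulted twice and defaults may cause delisting".split()

# Count sentiment-word occurrences per stem, ignoring all other tokens.
stem_counts = Counter(form_to_stem[t] for t in tokens if t in form_to_stem)
print(stem_counts)
```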
15. FIN10K Prototype Demo
https://cfda.csie.org/10K/
FIN10K: A Web-based Information System for
Financial Report Analysis and Visualization.
ACM CIKM (Demo paper), 2016.
16. Financial Keyword Expansion via Continuous Word Vector Representations
Discovering Finance Keywords via Continuous
Space Language Models. ACM Transactions on
Management Information Systems, 7(3), 7:1-7:17, 2016.
17. Sentiment Analysis: the Lexicon
❖ For sentiment analysis, the lexicon is one of the most
important and common resources.
❖ It usually has a great impact on results and the
corresponding analyses.
❖ In finance, the lexicon is usually semi-manually generated.
❖ This results in inadequate word coverage.
❖ In this work, we use advanced continuous space
language models to expand finance keywords automatically.
18. Continuous Space Language Models
❖ “You shall know a word by the company it keeps”
(J. R. Firth 1957)
❖ One of the most successful ideas of modern statistical NLP!
19. Continuous Space Language Models
❖ Continuous space language models
❖ a.k.a. Continuous word embeddings
❖ Words are represented as low-dimensional dense vectors.
❖ Recent studies show their superiority in capturing
syntactic and contextual regularities in language.
20. Keyword Expansion
❖ Our Proposed Keyword Expansion Method
❖ Adapt this technique to incorporate syntactic
information and capture keywords with more similar
meanings.
❖ Learn vector representations of words from a large
collection of financial reports (domain-specific).
❖ Use words in the financial sentiment lexicon as seed words, and
for each seed retrieve the top N words by cosine similarity.
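The seed-word lookup can be sketched as a nearest-neighbor search over word vectors. The three-dimensional vectors below are toy values invented for illustration, and the `expand` helper is a hypothetical name; real embeddings would be learned from a large collection of financial reports.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def expand(seed, vectors, top_n):
    """Return the top_n words most similar to the seed word."""
    scores = [
        (cosine(vectors[seed], vec), word)
        for word, vec in vectors.items()
        if word != seed
    ]
    return [word for _, word in sorted(scores, reverse=True)[:top_n]]

# Toy vectors standing in for learned word embeddings.
vectors = {
    "default": [0.9, 0.1, 0.0],
    "breach": [0.8, 0.2, 0.1],
    "delist": [0.7, 0.0, 0.3],
    "profit": [0.0, 0.9, 0.2],
}
print(expand("default", vectors, top_n=2))
```

Running the expansion for every seed word in the lexicon and merging the results yields the expanded keyword set.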
21. Keyword Expansion
❖ Keyword Expansion with Syntactic Information
22. The New 10-K Corpus
23. Four Prediction Tasks
❖ Four prediction tasks are conducted.
❖ To demonstrate that our approach is effective for
discovering keywords with predictive power
1) Post-event volatility
2) Stock volatility
3) Abnormal trading volume
4) Excess returns
24. Post-event Volatility Prediction
25. FIN10K Prototype Demo
https://cfda.csie.org/10K/
FIN10K: A Web-based Information System for Financial Report Analysis
and Visualization. ACM CIKM (Demo paper), 2016.
26. Beyond Word-Level Analysis
❖ Multi-word expression detection and analysis
❖ Beyond Word-Level to Sentence-Level Sentiment Analysis for
Financial Reports
❖ RiskFinder: A Sentence-level Risk Detector for Financial Reports,
NAACL’18
❖ https://cfda.csie.org/RiskFinder/
❖ FRIDAYS: A Financial Risk Information Detecting and Analyzing
System, AAAI’18
❖ https://cfda.csie.org/FRIDAYS/
27. Summary
❖ If structured data is big, then unstructured data is huge.
❖ 20% (structured) vs. 80% (unstructured)
❖ There is a massive potential waiting to be leveraged in
the analysis of unstructured data in the field of finance.
28. Thank You for Listening!