< sen·ti·ment >
        the prevailing attitude of investors as to anticipated
                   price development in a market.




Tim Harbers, CTO SNTMNT
DataScienceNL Meetup November 8th 2012
Tim Harbers

             Background

   BSc Computer Science
   MSc Computer Science
              Researcher
              Data Miner
     Technical Consultant
   Co-Founder and COO
    Co-Founder and CTO
The Rockstars
   Vincent van Leeuwen: Customer Development
   Kees van Nunen: Product Development
   Durk Kingma: Data Mining Expert
   Tim Harbers: Machine Learning Expert

   ‣ Balanced multidisciplinary team
   ‣ Two machine learning experts in predictive analysis and large datasets
   ‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
   ‣ Strong network in (Dutch) financial industry
   ‣ Young, enthusiastic team with a proven entrepreneurial mindset
How to select the right stock
        to invest in?
Our solution:
Predicting stock price movement
based on online buzz




        Engineered based on academic research:

        Van Leeuwen (2011)          Bollen et al. (2010)
        Sprenger and Welpe (2010)   Sehgal and Song (2007)
Why would this work?
   Very different from traditional indicators
   News travels faster via social media than via traditional media
   Tremendous amount of data
   (Almost) nobody uses it yet
Why focus on Twitter?
   Public data & easily accessible
   Structured language
   400M tweets per day
Historic Research
Bollen (2010)
Created a model based on Twitter mood states that predicted
the daily direction of the DJI with 86% accuracy.

Sprenger and Welpe (2011)
Analyzed the correlation between the stock market and
microblogs
Financial Sentiment vs Brand Sentiment

Financial Sentiment      Brand Sentiment
   Tweets relating to      Tweets relating to
    stocks                   brands
   Written by traders      Written by consumers
   Trader mumbo jumbo      Any language
   More relevant           Larger dataset
   Shorter term            Longer term
Data setup
Period
June 2010 to April 2012


Stocks
Top 15 most tweeted stocks in S&P 500


Tweets
Financial dataset from Timm Sprenger (4 million tweets)
Topsy brand tweets (100+ million tweets)


Other
Klout
Peerindex
Sentiment Scoring
Financial tweets
Commercial tweets
Sentiment analysis:
Enabling computers to derive sentiment
from natural language
Naive Approach: Dictionaries
   Use a dictionary of common positive and negative
    terms
   Count the number of positive and negative terms
   Use the difference between the two.
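
The dictionary approach above can be sketched in a few lines. This is only an illustration: the two word lists are hypothetical placeholders, not the dictionary used in the talk, and the tokenizer is a simple assumption.

```python
import re

# Hypothetical mini-lexicons (assumption); a real dictionary holds far more terms.
POSITIVE = {"buy", "bullish", "gain", "up", "strong", "beat"}
NEGATIVE = {"sell", "bearish", "loss", "down", "weak", "miss"}


def dictionary_score(tweet):
    """Return (number of positive terms) - (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


print(dictionary_score("Bullish on $AAPL, strong gain today"))
print(dictionary_score("Weak earnings, time to sell"))
# Negation is ignored, so this still scores as positive:
print(dictionary_score("not bullish"))
```

The last call shows why the approach is called naive: "not bullish" still counts one positive term.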
SNTMNT’s approach: machine learning
   Label a training set of tweets (target)
   Use preprocessing techniques
   Use several feature extractors
   Create a sparse dataset.
   Use supervised learning to train a machine learning
    model.
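
As a rough sketch of the steps above, the following trains a tiny multinomial Naive Bayes classifier on sparse unigram and bigram counts. The choice of Naive Bayes, the feature extractors, and the four training tweets are all assumptions made for illustration; the slides do not say which model or features SNTMNT uses.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(tweet):
    """Lowercase, drop URLs, and split into word, $ticker and #hashtag tokens."""
    text = re.sub(r"https?://\S+", " ", tweet.lower())
    return re.findall(r"[$#]?[a-z']+", text)


def extract_features(tokens):
    """Sparse feature vector: unigram and bigram counts."""
    features = Counter(tokens)
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


def train(labeled_tweets):
    """Collect per-class feature counts from (tweet, label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for tweet, label in labeled_tweets:
        features = extract_features(preprocess(tweet))
        class_counts[label] += 1
        feature_counts[label].update(features)
        vocabulary.update(features)
    return class_counts, feature_counts, vocabulary


def predict(model, tweet):
    """Return the label with the highest Laplace-smoothed log-probability."""
    class_counts, feature_counts, vocabulary = model
    total = sum(class_counts.values())
    features = extract_features(preprocess(tweet))
    best_label, best_score = None, -math.inf
    for label in class_counts:
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        score = math.log(class_counts[label] / total)
        for feature, count in features.items():
            if feature in vocabulary:  # unseen features are skipped
                smoothed = feature_counts[label][feature] + 1
                score += count * math.log(smoothed / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up hand-labeled examples (assumption), standing in for the training set.
TRAINING = [
    ("Buying $AAPL, looks bullish", "positive"),
    ("$GOOG strong earnings beat", "positive"),
    ("Selling $AAPL, looks bearish", "negative"),
    ("$GOOG weak earnings miss", "negative"),
]
model = train(TRAINING)
print(predict(model, "$MSFT looks bullish"))
```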
Labeling
  • 25K Financial tweets hand labeled
  • 30K Commercial tweets hand labeled
  • 1M #happy vs. #sad
Difficulties in sentiment analysis
   Authors / URLs
   Foreign languages



   Slang
       aykm
       lol
       tgsttttptct
   Negation



   Target Sentiment Analysis
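
Two of the difficulties above, authors/URLs and negation, are commonly handled during preprocessing. The sketch below is one possible treatment, not the one used in the talk: the placeholder tokens, the negation word list, and the `NOT_` prefix are assumptions.

```python
import re

# Assumed negation cues; words ending in "n't" are also treated as negations.
NEGATIONS = {"not", "no", "never", "cannot"}
PUNCTUATION = set(".,!?;")


def normalize(tweet):
    """Replace URLs and @authors with placeholder tokens, then tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    return re.findall(r"<url>|<user>|[$#]?[a-z']+|[.,!?;]", text)


def mark_negation(tokens):
    """Prefix tokens after a negation with NOT_ until the next punctuation mark."""
    marked, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False  # punctuation ends the negation scope
            continue         # and is dropped from the output
        marked.append("NOT_" + token if negated else token)
        if token in NEGATIONS or token.endswith("n't"):
            negated = True
    return marked


tweet = "@trader $AAPL is not looking good, buy http://t.co/x"
print(mark_negation(normalize(tweet)))
```

With this marking, "good" and "NOT_good" become different features, so a downstream model can learn opposite weights for them.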
Results
Financial tweets
84.3% accurate on 2-point scale (Baseline: 60.4%)
76.8% accurate on 3-point scale (Baseline: 65.0%)
Beat Lexalytics (84.3% vs. 70.3%)
Commercial tweets
84.7% accurate on 2-point scale (Baseline: 61.0%)
86.9% accurate on 3-point scale (Baseline: 81.1%)
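
To read these numbers, it helps to compare accuracy against the baseline. The slides do not define the baseline; the sketch below assumes it is a majority-class baseline (always predicting the most frequent label), and the toy labels are made up.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the hand-assigned labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def majority_baseline(actual):
    """Accuracy of always predicting the most frequent label (assumed baseline)."""
    return Counter(actual).most_common(1)[0][1] / len(actual)


# Toy labels (assumption): 6 positive and 4 negative tweets.
actual = ["pos"] * 6 + ["neg"] * 4
predicted = ["pos"] * 5 + ["neg"] + ["neg"] * 3 + ["pos"]
print(accuracy(predicted, actual), majority_baseline(actual))

# Improvement over baseline for financial tweets on the 2-point scale,
# in percentage points, using the figures reported above.
print(round(84.3 - 60.4, 1))
```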
Stock
Stock Regression
   Input:
       Sentiment scores
       Mood states
       Meta Data
       Stock
   Output:
       Trading Indication
       Confidence
Many dimensions
   Tweet period
   Trading period
   Financial Tweets or Commercial Tweets
   Tweet Crunchers
   Models
   Trading strategy
Tweet Aggregation Problem


                       Tweet volume
                       Volume positive tweets
                       Avg sentiment
                       Sentiment Growth
                       Etc.
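
The aggregates listed above can be computed per day from per-tweet sentiment scores. This is a minimal sketch under stated assumptions: scores lie in [-1, 1], a tweet counts as positive when its score is above 0, days are ISO date strings, and sentiment growth is the day-over-day change in average sentiment. None of these definitions come from the slides.

```python
from collections import defaultdict


def aggregate_daily(scored_tweets):
    """Turn (day, sentiment) pairs into one feature row per day."""
    by_day = defaultdict(list)
    for day, sentiment in scored_tweets:
        by_day[day].append(sentiment)

    rows, previous_avg = [], None
    for day in sorted(by_day):  # ISO date strings sort chronologically
        scores = by_day[day]
        avg = sum(scores) / len(scores)
        rows.append({
            "day": day,
            "tweet_volume": len(scores),
            "positive_volume": sum(s > 0 for s in scores),
            "avg_sentiment": avg,
            # No previous day to compare against on the first row.
            "sentiment_growth": None if previous_avg is None else avg - previous_avg,
        })
        previous_avg = avg
    return rows


# Made-up scores for illustration.
tweets = [
    ("2012-01-02", 0.5), ("2012-01-02", -0.5),
    ("2012-01-03", 1.0), ("2012-01-03", 0.5),
]
for row in aggregate_daily(tweets):
    print(row)
```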
Machine Learning Models
   Linear Regression
   Bayesian Approaches
   Decision Trees
   Neural Nets
   Support Vector Machines
Results
   R² < 0.01
   Not usable as an independent trading model after
    transaction costs.
   Still usable as an extra indicator to be used by proven
    trading models.
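
For reference, R² measures the share of variance in the target that a regression explains. The sketch below fits a one-feature least-squares line and computes R²; the data points are made up, and the talk's actual models and features are not specified at this level of detail.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up example: a sentiment feature that carries no linear signal
# about the next-day return, so R² comes out at zero.
sentiment = [-1, 1, -1, 1]
returns = [1, 1, -1, -1]
slope, intercept = fit_line(sentiment, returns)
predicted = [slope * s + intercept for s in sentiment]
print(r_squared(returns, predicted))
```

An R² below 0.01 means the inputs explain less than 1% of the variance in price movement, which is consistent with the conclusion that the signal works only as an extra indicator.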
Products - next steps:

 Sentiment APIs (B2B)
 ‣ Market leader and thought leader in financial sentiment analysis.
 ‣ Extend scope to further niche domains and languages.

 Stock Dashboard (B2B2C)

 Trading Indicator API (B2B)
 ‣ Getting more insights into added value of SNTMNT algorithm as
   indicator next to fundamental and technical analysis.
For more info, visit:

www.SNTMNT.com


                    Any questions?
