Presented by Vidar Brekke,
                                Social Intent LLC




SOCIAL TEXT
ANALYTICS FOR
ENTERPRISE AND
CONSUMER
APPLICATIONS
The International Association of Software
Architects. October 23, 2012




                                  @ividar #nlproc
What is Text Analytics?




                          Processes that uncover
                          business value in
                          unstructured text via the
                          application of statistical,
                          linguistic, machine
                          learning, and data analysis
                          and visualization
                          techniques


Text analytics helps answer
business questions faster and
cheaper than before, uncovering
new, hidden insights!




Text analytics is a Big Data problem




 Volume, Velocity, Variety

   •   10.2 million tweets sent during the first presidential debate
   •   Social media, help inquiries, email, texts, surveys
   •   Hundreds of languages
   •   Formal, informal or ridiculously informal
   •   Cryptic (vertical industry or criminal activity)




I’m So Intextuated With You




                              Unstructured text represents the
                              biggest opportunity and problem
                              in Big Data

                              Text, unlike most other
                              enterprise data, is very dirty
                              data




Correlating consumer confidence with mentions of “jobs” on Twitter




Yay! Steve Jobs launches a new iPhone!




You can trade on Twitter




Low Signal/Noise Ratio + Naïve Metrics Lead to Wrong Conclusions




                    •   Lack of relevance: Many conversations you think
                        are about you aren’t.

                    •   Poor accuracy: Many automated sentiment
                        solutions are no better than a coin flip.

                    •   Generic: All analysis is applied the same way
                        across domains.

                    •   Language evolves: Slang and sarcasm are rampant in
                        social media. Dictionary-based approaches are
                        largely ineffective.




Relevancy: It’s not all about you.



    Let me finish my drink before you drive me to the
    Betty Ford clinic!

    Call me a bigot, but white guys can’t sprint!
    #london2012

    My husband is such a baby. He won’t even taste raw
    food.

    Is Delta’s food prepared by Purina? So much for first
    class.

Search and Destroy (the data you’re looking for)



    Text analytics gained traction in the 1980s, but the use cases
    were different from today’s.

    “Word spotting”: no different from a Google search.

             Show me all documents containing:
                   Ford NOT Harrison


               But it doesn’t scale.

Booleans are like woodcarving with a chainsaw



   Query: Ford NOT Harrison ….

                      …would miss this tweet



   Carguy231: Me and a dozen others
   have lined up outside the Harrison, NY
   Ford dealership to test drive the new
   Fusion!



Booleans are like woodcarving with a chainsaw



   Query: Ford AND Fusion….

                      …would get this tweet



   Roadrunner123: Stuck with my dad in
   his ford listening to horrible jazz fusion
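Both Boolean failures above can be reproduced in a few lines. This is a minimal Python sketch, assuming simple lowercase tokenization; the function names `query_and` and `query_and_not` are hypothetical and not taken from any particular search engine.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; punctuation such as ',' and '!' is dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_and(text: str, *terms: str) -> bool:
    """Boolean AND: every term must appear somewhere in the text."""
    toks = tokens(text)
    return all(term.lower() in toks for term in terms)


def query_and_not(text: str, include: str, exclude: str) -> bool:
    """Boolean 'include NOT exclude': keep the text only if `exclude` is absent."""
    toks = tokens(text)
    return include.lower() in toks and exclude.lower() not in toks


DEALER_TWEET = ("Carguy231: Me and a dozen others have lined up outside the "
                "Harrison, NY Ford dealership to test drive the new Fusion!")
JAZZ_TWEET = "Roadrunner123: Stuck with my dad in his ford listening to horrible jazz fusion"

# A relevant Ford tweet is dropped because it also mentions Harrison, NY.
print(query_and_not(DEALER_TWEET, "Ford", "Harrison"))  # False
# An irrelevant tweet is matched because "ford" and "fusion" both appear.
print(query_and(JAZZ_TWEET, "Ford", "Fusion"))  # True
```

Neither query looks at context, so tightening the keywords only trades one kind of error for the other.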




Sentiment Analysis



          Early sentiment analysis tools also used word spotting.


                              “Awesome” = good

                                 “Sucks” = bad


                     What about sarcasm, slang, new words?

   Additionally, the analysis typically measures the overall contextual polarity
   of a text rather than the sentiment toward a specific target.


        “I love the new Camaro, it’s better than the Mustang”



You can’t use word spotting for sentiment detection



   “It took all morning to sign the lease papers for my new Mustang!”

     “I stood on line all morning to get the last Mustang on the lot!”



     “The brakes on the Mustang are surprisingly unpredictable.”

     “The TV ads for the Mustang are surprisingly unpredictable!”



                  “The Mustang has never been good”

               “The Mustang has never been this good”
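To see why, here is a minimal Python sketch of dictionary-based word spotting, assuming a tiny hypothetical lexicon (`LEXICON`) whose +1/-1 word scores are simply summed. It scores sentences that humans read in opposite ways identically, and finds nothing in sentences whose sentiment comes entirely from context.

```python
import re

# Hypothetical toy lexicon: +1 for "good" words, -1 for "bad" words.
LEXICON = {"awesome": 1, "good": 1, "love": 1, "sucks": -1, "hate": -1, "bad": -1}


def lexicon_score(text: str) -> int:
    """Sum the polarity of every lexicon word in the text (no context, no negation)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(score: int) -> str:
    """Map a summed score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "The Mustang has never been good",
    "The Mustang has never been this good",
    "The brakes on the Mustang are surprisingly unpredictable.",
    "The TV ads for the Mustang are surprisingly unpredictable!",
]
for sentence in examples:
    print(label(lexicon_score(sentence)), "|", sentence)
# Both "never been" sentences come out positive; both "unpredictable" ones neutral.
```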

Nu-School text analytics is based on Machine Learning



   Training data helps the system recognize patterns. The system then
   estimates the statistical probability that a sentence is
   positive, negative, etc.

   What are training data?
   Samples of text annotated by humans to show the machine what the
   right answer is.

             “I love my iPhone, but hate AT&T”

                 iPhone: Positive | AT&T: Negative


       Adding new languages is much easier and quicker than with
                    dictionary-based approaches.
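As an illustration of the statistical approach, here is a minimal multinomial Naive Bayes classifier in Python with add-one (Laplace) smoothing. Naive Bayes is swapped in as one common textbook choice; the slides do not say which model was actually used. The four training sentences are hypothetical, and this sketch labels whole sentences rather than individual targets such as iPhone and AT&T.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z&']+", text.lower())


def train(examples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def predict_proba(model, text: str) -> dict[str, float]:
    """Posterior probability of each label, using add-one smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, n_docs in doc_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            # Unseen words get count 0 + 1, so they never zero out the product.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Hypothetical human-annotated training data.
TRAIN = [
    ("I love my new phone", "positive"),
    ("great battery and great screen", "positive"),
    ("I hate the dropped calls", "negative"),
    ("terrible service and slow data", "negative"),
]
model = train(TRAIN)
print(predict_proba(model, "I love the screen"))  # positive ≈ 0.67, negative ≈ 0.33
```

Supporting a new language mainly means supplying new annotated examples; the training code itself does not change.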

Test: What’s the sentiment here?




                                   “Reuters reports that
                                   Assad continues the
                                   massacre of his own
                                   people amid sanctions
                                   from the international
                                   community.”




How to evaluate a text analytics platform



   The accuracy of a sentiment analysis system is, in
   principle, how well it agrees with human judgments.



    “I can’t believe the bar has a hidden gambling room in
                            the back!”



   An automated system can never be better than
   humans. Or can it?



Using Human Parallel Coding to Establish Gold Standards




        Confusion Matrix: Human as Gold Standard

              POSITIVE  NEGATIVE   NEUTRAL     TOTAL
  POSITIVE         365        24       159       548
  NEGATIVE          57        81        65       203
  NEUTRAL          274        60       415       749
  TOTAL            696       165       639      1500

  Raw Accuracy: 61.5%



    If humans agree with each other only about 60% of the time, a machine that
    agrees with a human about 60% of the time is performing as well as a human being.
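Raw accuracy is the share of items on which both label sets agree: the diagonal of the confusion matrix divided by the total. A minimal Python sketch, assuming rows and columns follow the POSITIVE, NEGATIVE, NEUTRAL order of the table. Applied to the counts as transcribed, it gives 861/1500 ≈ 57.4%, which differs from the 61.5% printed on the slide; the slide does not show how its figure was derived.

```python
# Counts from the confusion matrix; rows and columns both in the
# order POSITIVE, NEGATIVE, NEUTRAL.
CONFUSION = [
    [365, 24, 159],
    [57, 81, 65],
    [274, 60, 415],
]


def raw_accuracy(matrix: list[list[int]]) -> float:
    """Share of items on the diagonal, i.e. where both label sets agree."""
    agreed = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agreed / total


print(f"{raw_accuracy(CONFUSION):.1%}")  # 57.4%
```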


Using A Credit Matrix to Create Improved Measurement




  Credit Matrix

              POSITIVE  NEGATIVE   NEUTRAL
  POSITIVE        100%        0%       50%
  NEGATIVE          0%      100%       50%
  NEUTRAL          50%       50%      100%

  Partial Credit Figure of Merit: 82.3%

  Confusion Matrix: Human 1 as Gold Standard

              POSITIVE  NEGATIVE   NEUTRAL
  POSITIVE         365        24       159
  NEGATIVE          57        81        65
  NEUTRAL          274        60       415
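The partial-credit figure can be read as a weighted agreement score: each confusion-matrix cell is multiplied by the matching credit-matrix cell, and the sum is divided by the total count. A minimal Python sketch, assuming that weighted-average definition (the slide does not spell out the formula). With the counts as transcribed, it yields 1140/1500 = 76.0% rather than the 82.3% shown on the slide; the slide does not show how 82.3% was derived.

```python
# Confusion counts (order: POSITIVE, NEGATIVE, NEUTRAL).
CONFUSION = [
    [365, 24, 159],
    [57, 81, 65],
    [274, 60, 415],
]
# Credit matrix: full credit for agreement, half credit when one side
# says NEUTRAL, no credit for POSITIVE vs. NEGATIVE.
CREDIT = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]


def partial_credit(confusion: list[list[int]], credit: list[list[float]]) -> float:
    """Credit-weighted agreement divided by the total number of items."""
    earned = sum(
        count * weight
        for counts_row, credit_row in zip(confusion, credit)
        for count, weight in zip(counts_row, credit_row)
    )
    total = sum(sum(row) for row in confusion)
    return earned / total


print(f"{partial_credit(CONFUSION, CREDIT):.1%}")  # 76.0%
```

Because POSITIVE vs. NEGATIVE confusions earn nothing while NEUTRAL mix-ups earn half, this score is always at least as high as raw accuracy.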


Precision & Recall (sentiment as an example)



   Precision is the fraction of retrieved instances
   that are relevant.
   E.g., of all instances labeled positive, how many
   were actually positive?

   Recall is the fraction of relevant instances that are
   retrieved.
   E.g., of all actually positive instances, how many
   did the system detect?
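Both definitions reduce to counts of true positives, false positives, and false negatives for a single label. A minimal Python sketch; the gold and predicted label lists are hypothetical.

```python
def precision_recall(gold: list[str], predicted: list[str], label: str) -> tuple[float, float]:
    """Precision and recall for one label, given parallel gold/predicted lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correctly labeled
    fp = sum(1 for g, p in pairs if g != label and p == label)  # wrongly given `label`
    fn = sum(1 for g, p in pairs if g == label and p != label)  # missed instances of `label`
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical human (gold) labels and system predictions for six posts.
GOLD = ["positive", "positive", "positive", "positive", "negative", "neutral"]
PREDICTED = ["positive", "positive", "neutral", "negative", "positive", "neutral"]

p, r = precision_recall(GOLD, PREDICTED, "positive")
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```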




Top business applications of text/content analytics*

                                                                             *Alta Plana, 2011

   •   Brand / product / reputation management
        • Market research and social media monitoring, i.e., what are people saying
          about my brand or products?

   •   Voice of the Customer / Customer Experience Management
        • Do I need to step in and offer customer service?
        • How many people recommend my brand vs. advocate against it?

   •   Search, Information Access, or Question Answering
        • Which bloggers are negative toward Obamacare?
        • Which of the hotels on Yelp.com get great reviews for their room service?
        • What are some articles similar to this one?

   •   Competitive intelligence
        • What competing products are people considering, and why?
        • Is a competitor’s media spend generating purchase intent?


Growing areas where text analytics is being applied




                        Product development

          Intelligence and counter-terrorism, law enforcement

                    Pharmaceutical drug discovery

                   Financial services and insurance

                   Media, publishing & advertising

                          Political research

                                 CRM


Still awake?




   There is money in text analytics.

   Here’s a stock tip worth the price of admission
   alone

   (YMMV….)


Strange Bedfellows




  Whenever Anne Hathaway's
  name appeared with any
  regularity in news
  stories, Berkshire Hathaway A
  shares rose in value.




Thx & txt u l8tr

                            Vidar Brekke
                   vidar@socialintent.com
                                  @ividar





More Related Content

Similar to SOCIAL TEXT ANALYTICS FOR ENTERPRISE AND CONSUMER APPLICATIONS

#1NWebinar: Digital on the Runway
#1NWebinar: Digital on the Runway#1NWebinar: Digital on the Runway
#1NWebinar: Digital on the RunwayOne North
 
Machine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our WorldMachine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our WorldKen Tabor
 
Greenfield Effect: Patterns for Effective Disaster Delivery
Greenfield Effect: Patterns for Effective Disaster DeliveryGreenfield Effect: Patterns for Effective Disaster Delivery
Greenfield Effect: Patterns for Effective Disaster DeliveryJulian Warszawski
 
Moving beyond Vulnerability Testing
Moving beyond Vulnerability TestingMoving beyond Vulnerability Testing
Moving beyond Vulnerability TestingCapgemini
 
Social Search: A Little Help From My Friends
Social Search: A Little Help From My FriendsSocial Search: A Little Help From My Friends
Social Search: A Little Help From My FriendsBrynn Evans
 
Are You Listening? Real time data and social media
Are You Listening? Real time data and social mediaAre You Listening? Real time data and social media
Are You Listening? Real time data and social mediaAndrew Walker
 
Are you listening? Real Time Measurement and Monitoring
Are you listening? Real Time Measurement and MonitoringAre you listening? Real Time Measurement and Monitoring
Are you listening? Real Time Measurement and MonitoringKlaxon
 
How to Build Your Future in the Internet of Things Economy. Jennifer Riggins
How to Build Your Future in the Internet of Things Economy. Jennifer RigginsHow to Build Your Future in the Internet of Things Economy. Jennifer Riggins
How to Build Your Future in the Internet of Things Economy. Jennifer RigginsFuture Insights
 
AI, Machine Learning, and their Application for Growth - #GHConf18
AI, Machine Learning, and their Application for Growth - #GHConf18AI, Machine Learning, and their Application for Growth - #GHConf18
AI, Machine Learning, and their Application for Growth - #GHConf18GrowthHackers
 
AI and ChatGPT in Online Education
AI and ChatGPT in Online Education AI and ChatGPT in Online Education
AI and ChatGPT in Online Education D2L Barry
 
[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI
[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI
[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AIDataScienceConferenc1
 
Ethical Artificial Intelligence
Ethical Artificial IntelligenceEthical Artificial Intelligence
Ethical Artificial IntelligenceRudradeb Mitra
 
Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?
Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?
Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?Andrew Ferrier
 
The Need for Deep Learning Transparency
The Need for Deep Learning TransparencyThe Need for Deep Learning Transparency
The Need for Deep Learning Transparencyinside-BigData.com
 
Bigger than Any One: Solving Large Scale Data Problems with People and Machines
Bigger than Any One: Solving Large Scale Data Problems with People and MachinesBigger than Any One: Solving Large Scale Data Problems with People and Machines
Bigger than Any One: Solving Large Scale Data Problems with People and MachinesTyler Bell
 
Another Day In Paradise
Another Day In ParadiseAnother Day In Paradise
Another Day In Paradisekum72
 
Ar design reality2018
Ar design reality2018Ar design reality2018
Ar design reality2018Anselm Hook
 
Using Data for Decisions TechinAsia Singapore 2015
Using Data for Decisions TechinAsia Singapore 2015Using Data for Decisions TechinAsia Singapore 2015
Using Data for Decisions TechinAsia Singapore 2015Eli Schwartz
 
Machines are the new Digital Natives
Machines are the new Digital NativesMachines are the new Digital Natives
Machines are the new Digital NativesMiel Vander Sande
 
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationBiting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationAlex Pinto
 

Similar to SOCIAL TEXT ANALYTICS FOR ENTERPRISE AND CONSUMER APPLICATIONS (20)

#1NWebinar: Digital on the Runway
#1NWebinar: Digital on the Runway#1NWebinar: Digital on the Runway
#1NWebinar: Digital on the Runway
 
Machine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our WorldMachine Learning: Understanding the Invisible Force Changing Our World
Machine Learning: Understanding the Invisible Force Changing Our World
 
Greenfield Effect: Patterns for Effective Disaster Delivery
Greenfield Effect: Patterns for Effective Disaster DeliveryGreenfield Effect: Patterns for Effective Disaster Delivery
Greenfield Effect: Patterns for Effective Disaster Delivery
 
Moving beyond Vulnerability Testing
Moving beyond Vulnerability TestingMoving beyond Vulnerability Testing
Moving beyond Vulnerability Testing
 
Social Search: A Little Help From My Friends
Social Search: A Little Help From My FriendsSocial Search: A Little Help From My Friends
Social Search: A Little Help From My Friends
 
Are You Listening? Real time data and social media
Are You Listening? Real time data and social mediaAre You Listening? Real time data and social media
Are You Listening? Real time data and social media
 
Are you listening? Real Time Measurement and Monitoring
Are you listening? Real Time Measurement and MonitoringAre you listening? Real Time Measurement and Monitoring
Are you listening? Real Time Measurement and Monitoring
 
How to Build Your Future in the Internet of Things Economy. Jennifer Riggins
How to Build Your Future in the Internet of Things Economy. Jennifer RigginsHow to Build Your Future in the Internet of Things Economy. Jennifer Riggins
How to Build Your Future in the Internet of Things Economy. Jennifer Riggins
 
AI, Machine Learning, and their Application for Growth - #GHConf18
AI, Machine Learning, and their Application for Growth - #GHConf18AI, Machine Learning, and their Application for Growth - #GHConf18
AI, Machine Learning, and their Application for Growth - #GHConf18
 
AI and ChatGPT in Online Education
AI and ChatGPT in Online Education AI and ChatGPT in Online Education
AI and ChatGPT in Online Education
 
[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI
[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI
[DSC Europe 23] Shahab Anbarjafari - Generative AI: Impact of Responsible AI
 
Ethical Artificial Intelligence
Ethical Artificial IntelligenceEthical Artificial Intelligence
Ethical Artificial Intelligence
 
Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?
Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?
Artificial Intelligence 101: What is It and Why is it Suddenly a Big Deal Again?
 
The Need for Deep Learning Transparency
The Need for Deep Learning TransparencyThe Need for Deep Learning Transparency
The Need for Deep Learning Transparency
 
Bigger than Any One: Solving Large Scale Data Problems with People and Machines
Bigger than Any One: Solving Large Scale Data Problems with People and MachinesBigger than Any One: Solving Large Scale Data Problems with People and Machines
Bigger than Any One: Solving Large Scale Data Problems with People and Machines
 
Another Day In Paradise
Another Day In ParadiseAnother Day In Paradise
Another Day In Paradise
 
Ar design reality2018
Ar design reality2018Ar design reality2018
Ar design reality2018
 
Using Data for Decisions TechinAsia Singapore 2015
Using Data for Decisions TechinAsia Singapore 2015Using Data for Decisions TechinAsia Singapore 2015
Using Data for Decisions TechinAsia Singapore 2015
 
Machines are the new Digital Natives
Machines are the new Digital NativesMachines are the new Digital Natives
Machines are the new Digital Natives
 
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationBiting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting Automation
 

More from Meddle

Social Selling for Inside Sales Teams
Social Selling for Inside Sales TeamsSocial Selling for Inside Sales Teams
Social Selling for Inside Sales TeamsMeddle
 
Employee-powered Content Marketing for Enterprises
Employee-powered Content Marketing for EnterprisesEmployee-powered Content Marketing for Enterprises
Employee-powered Content Marketing for EnterprisesMeddle
 
Understanding the potential of the Facebook Open Graph and Graph API
Understanding the potential of the Facebook Open Graph and Graph APIUnderstanding the potential of the Facebook Open Graph and Graph API
Understanding the potential of the Facebook Open Graph and Graph APIMeddle
 
Understanding the Open Graph
Understanding the Open GraphUnderstanding the Open Graph
Understanding the Open GraphMeddle
 
Getting Started With Social Media Technologies
Getting Started With Social Media TechnologiesGetting Started With Social Media Technologies
Getting Started With Social Media TechnologiesMeddle
 
Social Media for Business - Presentation for Outsourcing Institute
Social Media for Business - Presentation for Outsourcing InstituteSocial Media for Business - Presentation for Outsourcing Institute
Social Media for Business - Presentation for Outsourcing InstituteMeddle
 
Facebook Pages 101
Facebook Pages 101Facebook Pages 101
Facebook Pages 101Meddle
 
Crowdsourcing 101 - tapping into the wisdom of crowds
Crowdsourcing 101 - tapping into the wisdom of crowdsCrowdsourcing 101 - tapping into the wisdom of crowds
Crowdsourcing 101 - tapping into the wisdom of crowdsMeddle
 
Social Apps 101
Social Apps 101Social Apps 101
Social Apps 101Meddle
 
Brands Can Make Friends Too
Brands Can Make Friends TooBrands Can Make Friends Too
Brands Can Make Friends TooMeddle
 

More from Meddle (10)

Social Selling for Inside Sales Teams
Social Selling for Inside Sales TeamsSocial Selling for Inside Sales Teams
Social Selling for Inside Sales Teams
 
Employee-powered Content Marketing for Enterprises
Employee-powered Content Marketing for EnterprisesEmployee-powered Content Marketing for Enterprises
Employee-powered Content Marketing for Enterprises
 
Understanding the potential of the Facebook Open Graph and Graph API
Understanding the potential of the Facebook Open Graph and Graph APIUnderstanding the potential of the Facebook Open Graph and Graph API
Understanding the potential of the Facebook Open Graph and Graph API
 
Understanding the Open Graph
Understanding the Open GraphUnderstanding the Open Graph
Understanding the Open Graph
 
Getting Started With Social Media Technologies
Getting Started With Social Media TechnologiesGetting Started With Social Media Technologies
Getting Started With Social Media Technologies
 
Social Media for Business - Presentation for Outsourcing Institute
Social Media for Business - Presentation for Outsourcing InstituteSocial Media for Business - Presentation for Outsourcing Institute
Social Media for Business - Presentation for Outsourcing Institute
 
Facebook Pages 101
Facebook Pages 101Facebook Pages 101
Facebook Pages 101
 
Crowdsourcing 101 - tapping into the wisdom of crowds
Crowdsourcing 101 - tapping into the wisdom of crowdsCrowdsourcing 101 - tapping into the wisdom of crowds
Crowdsourcing 101 - tapping into the wisdom of crowds
 
Social Apps 101
Social Apps 101Social Apps 101
Social Apps 101
 
Brands Can Make Friends Too
Brands Can Make Friends TooBrands Can Make Friends Too
Brands Can Make Friends Too
 

Recently uploaded

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation

SOCIAL TEXT ANALYTICS FOR ENTERPRISE AND CONSUMER APPLICATIONS

  • 1. SOCIAL TEXT ANALYTICS FOR ENTERPRISE AND CONSUMER APPLICATIONS. Presented by Vidar Brekke, Social Intent LLC. The International Association of Software Architects, October 23, 2012. @ividar #nlproc
  • 2. What is Text Analytics? Processes that uncover business value in unstructured text via the application of statistical, linguistic, machine learning, and data analysis and visualization techniques.
  • 3. Text analytics helps answer business questions faster and cheaper than before, uncovering new, hidden insights!
  • 4. Text analytics is a Big Data problem: Volume, Velocity, Variety • 10.2 million tweets sent during the first presidential debate • Social media, help inquiries, email, texts, surveys • Formal, informal, or ridiculously informal • Cryptic (vertical industry or criminal activity) • Hundreds of languages
  • 5. I’m So Intextuated With You. Unstructured text represents the biggest opportunity and problem in Big Data. Unlike most other enterprise data, text is very dirty data.
  • 6. Correlating consumer confidence with mentions of “jobs” on Twitter
  • 7. Yay! Steve Jobs launches a new iPhone!
  • 8. You can trade on Twitter
  • 9. Low Signal/Noise Ratio + Naïve Metrics Lead to Wrong Conclusions • Lack of relevance: Many conversations you think are about you aren’t. • Poor accuracy: Many automated sentiment solutions are no better than a coin flip. • Generic: All analysis is applied the same way across domains. • Language evolves: Slang and sarcasm are rampant in social media, so dictionary-based approaches are largely ineffective.
  • 10. Relevancy: It’s not all about you. • “Let me finish my drink before you drive me to the Betty Ford clinic!” • “Call me a bigot, but white guys can’t sprint! #london2012” • “My husband is such a baby. He won’t even taste raw food.” • “Is Delta’s food prepared by Purina? So much for first class.”
  • 11. Search and Destroy (the data you’re looking for). Text analytics gained traction in the 1980s, but the use cases were different from today’s. “Word spotting” is no different from a Google search: show me all documents containing Ford NOT Harrison. But it doesn’t scale.
  • 12. Booleans are like woodcarving with a chainsaw. The query Ford NOT Harrison would miss this tweet: Carguy231: “Me and a dozen others have lined up outside the Harrison, NY Ford dealership to test drive the new Fusion!”
  • 13. Booleans are like woodcarving with a chainsaw. The query Ford AND Fusion would get this tweet: Roadrunner123: “Stuck with my dad in his ford listening to horrible jazz fusion”
  • 14. Sentiment Analysis. Early sentiment analysis tools also used word spotting: “Awesome” = good, “Sucks” = bad. What about sarcasm, slang, and new words? Additionally, the analysis typically measures overall contextual polarity rather than sentiment toward a specific target: “I love the new Camaro, it’s better than the Mustang” is positive about the Camaro but not about the Mustang.
  • 15. You can’t use word spotting for sentiment detection • “It took all morning to sign the lease papers for my new Mustang!” • “I stood on line all morning to get the last Mustang on the lot!” • “The brakes on the Mustang are surprisingly unpredictable.” • “The TV ads for the Mustang are surprisingly unpredictable!” • “The Mustang has never been good” • “The Mustang has never been this good”
  • 16. Nu-School text analytics is based on Machine Learning. Training data helps the system recognize patterns, and the system estimates the statistical probability that a sentence is positive, negative, etc. What are training data? Samples of text annotated by humans to show the machine what the right answer is, e.g. “I love my iPhone, but hate AT&T” | iPhone | Positive | AT&T | Negative. Adding new languages is much easier and quicker than with dictionary-based approaches.
  • 17. Test: What’s the sentiment here? “Reuters reports that Assad continues the massacre of his own people amid sanctions from the international community.”
  • 18. How to evaluate a text analytics platform. The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments: “I can’t believe the bar has a hidden gambling room in the back!” An automated system can never be better than humans. Or can it?
  • 19. Using Human Parallel Coding to Establish Gold Standards. Confusion Matrix: Human as Gold Standard

                  POSITIVE  NEGATIVE   NEUTRAL     TOTAL
      POSITIVE         365        24       159       548
      NEGATIVE          57        81        65       203
      NEUTRAL          274        60       415       749
      TOTAL            696       165       639      1500

      Raw Accuracy: 61.5%. Two human coders agree only about this often, so if a human agrees with a machine around 60% of the time, the machine is performing about as well as a human being.
  • 20. Using a Credit Matrix to Create Improved Measurement

      Credit Matrix:
                  POSITIVE  NEGATIVE   NEUTRAL
      POSITIVE        100%        0%       50%
      NEGATIVE          0%      100%       50%
      NEUTRAL          50%       50%      100%

      Partial Credit Figure of Merit: 82.3%

      Confusion Matrix (Human 1 as Gold Standard):
                  POSITIVE  NEGATIVE   NEUTRAL
      POSITIVE         365        24       159
      NEGATIVE          57        81        65
      NEUTRAL          274        60       415
  • 21. Precision & Recall (sentiment as an example). Precision is the fraction of retrieved instances that are relevant, e.g. how many of the instances labeled positive were actually positive. Recall is the fraction of relevant instances that are retrieved, e.g. how many of all the truly positive instances the system detected.
  • 22. Top business applications of text/content analytics (source: Alta Plana, 2011) • Brand / product / reputation management • Market research and social media monitoring, i.e. what are people saying about my brand or products? • Voice of the Customer / Customer Experience Management • Do I need to step in and offer customer service? • How many people recommend my brand vs. advocate against it? • Search, Information Access, or Question Answering • Which bloggers are negative toward Obamacare? • Which of the hotels on Yelp.com get great reviews for room service? • What are some articles similar to this one? • Competitive intelligence • What competing products are people considering, and why? • Is competitors’ media spend generating purchase intent?
  • 23. Growing areas where text analytics is being applied • Product development • Intelligence and counter-terrorism, law enforcement • Pharmaceutical drug discovery • Financial services and insurance • Media, publishing & advertising • Political research • CRM
  • 24. Still awake? There is money in text analytics. Here’s a stock tip worth the price of admission alone (YMMV...)
  • 25. Strange Bedfellows. Whenever Anne Hathaway's name appeared with any regularity in news stories, Berkshire Hathaway A shares rose in value.
  • 26. Thx & txt u l8tr. Vidar Brekke, vidar@socialintent.com, @ividar
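The Boolean failures on slides 12 and 13 are easy to reproduce. Below is a minimal word-spotting sketch in Python; it assumes simple case-insensitive word tokenization, and the helper names `tokens`, `matches_and`, and `matches_not` are hypothetical, not part of any particular search product.

```python
import re


def tokens(text: str) -> set[str]:
    """Case-insensitive word tokens; punctuation is discarded."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches_and(text: str, *terms: str) -> bool:
    """True if every term appears as a word in the text (Boolean AND)."""
    toks = tokens(text)
    return all(term.lower() in toks for term in terms)


def matches_not(text: str, include: str, exclude: str) -> bool:
    """True if `include` appears and `exclude` does not (include NOT exclude)."""
    toks = tokens(text)
    return include.lower() in toks and exclude.lower() not in toks


carguy = ("Me and a dozen others have lined up outside the Harrison, NY Ford "
          "dealership to test drive the new Fusion!")
roadrunner = "Stuck with my dad in his ford listening to horrible jazz fusion"

# Relevant tweet about a Ford dealership, rejected because it mentions Harrison:
print(matches_not(carguy, "Ford", "Harrison"))    # False
# Irrelevant tweet about jazz fusion, accepted because both words occur:
print(matches_and(roadrunner, "Ford", "Fusion"))  # True
```

The query only sees which words occur, not what they refer to, so tightening it to exclude one false positive tends to exclude true positives as well.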
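Slide 15's point can be checked with a deliberately naive word-spotting scorer. The lexicon below is a hypothetical toy (one point per positive or negative word), not any vendor's dictionary.

```python
import re

# Hypothetical toy lexicon: +1 for a positive word, -1 for a negative word.
LEXICON = {"awesome": 1, "good": 1, "love": 1,
           "sucks": -1, "hate": -1, "unpredictable": -1}


def word_spot_score(text: str) -> int:
    """Sum the lexicon scores of the words in `text` (0 reads as neutral)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


SENTENCES = [
    "It took all morning to sign the lease papers for my new Mustang!",
    "I stood on line all morning to get the last Mustang on the lot!",
    "The brakes on the Mustang are surprisingly unpredictable.",
    "The TV ads for the Mustang are surprisingly unpredictable!",
    "The Mustang has never been good",
    "The Mustang has never been this good",
]

for sentence in SENTENCES:
    print(f"{word_spot_score(sentence):+d}  {sentence}")
```

The two enthusiastic purchase stories score 0, the brakes and the TV ads both score -1, and "never been good" gets the same +1 as "never been this good": the lexicon sees the words but not the context that changes their meaning.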
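Slide 16 does not name a learning algorithm. As one common illustration (not necessarily the approach behind these slides), a multinomial Naive Bayes classifier learns word statistics from human-labeled examples and outputs a probability for each label. The training sentences below are invented for the sketch; a real system needs far more annotations, and targeted annotations such as the iPhone / AT&T example call for a per-entity model that this sentence-level sketch does not attempt.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z&']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text: str) -> dict:
        """Unnormalized log P(label) + sum of log P(word | label)."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, text: str) -> dict:
        """Normalize the log scores into probabilities that sum to 1."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        z = sum(exps.values())
        return {label: e / z for label, e in exps.items()}

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data: (text, human-assigned label).
TRAIN = [
    ("I love my new phone", "positive"),
    ("great battery and awesome screen", "positive"),
    ("love the camera", "positive"),
    ("the service sucks", "negative"),
    ("I hate the dropped calls", "negative"),
    ("terrible battery, hate it", "negative"),
    ("the store opens at nine", "neutral"),
    ("picked up the phone today", "neutral"),
]

model = NaiveBayes().fit(TRAIN)
print(model.predict("I love the awesome camera"))
print(model.predict_proba("I hate this, it sucks"))
```

Extending this toy model to another language mostly means supplying labeled examples in that language rather than hand-building a new dictionary, which is the advantage slide 16 describes.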
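For slide 21, precision and recall can be read straight off a confusion matrix such as the one on slide 19: precision for a label divides its diagonal cell by that label's column total, and recall divides it by the row total. This sketch assumes rows hold the gold-standard (human) labels and columns the labels being evaluated; the slides do not state the orientation, so swap the two totals if yours is the reverse. The function name `precision_recall` is illustrative.

```python
LABELS = ["POSITIVE", "NEGATIVE", "NEUTRAL"]
# Counts from slide 19. Assumed orientation: rows = gold-standard labels,
# columns = labels being evaluated, both in LABELS order.
CONFUSION = [
    [365, 24, 159],
    [57, 81, 65],
    [274, 60, 415],
]


def precision_recall(matrix, k):
    """Precision and recall for class index k."""
    true_pos = matrix[k][k]
    labeled_k = sum(row[k] for row in matrix)  # column total: everything labeled k
    actually_k = sum(matrix[k])                # row total: everything truly k
    return true_pos / labeled_k, true_pos / actually_k


for k, label in enumerate(LABELS):
    p, r = precision_recall(CONFUSION, k)
    print(f"{label:<9} precision={p:.3f} recall={r:.3f}")
```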

Editor's Notes

  1. The green cells here (the diagonal) are where the two coders agree. We can use them to derive a “raw” accuracy score: we add up the number of instances where the two coders agree (the green cells) and divide by the total number of instances (1500) to get a raw accuracy score of 61.5%. This raw accuracy score provides the first benchmark against which we can assess machine performance. Put concretely, if we can get a machine to classify documents for sentiment such that a human would agree with its classifications around 60% of the time, our machine would be performing as well as a human being.
  2. Remember, we said before that not all mistakes are equal. It depends on the use to which you’re putting the data. In most situations, however, it’s worse to mislabel something positive as negative than it is to mislabel something positive as neutral. This is true whether the coder is a human or a machine. We can factor in these relative weights by using what is called a Credit Matrix. It says that you get 100% credit when your label agrees with the gold standard, 50% when one label is neutral and the other is not, and 0% when positive and negative are confused. Ultimately, the partial credit figure of merit (PCFM) will establish the baseline against which we measure the performance of our machine learning algorithm.
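Note 1's recipe (sum the diagonal, divide by the total) takes only a few lines of Python; the names `CONFUSION` and `raw_accuracy` are illustrative. Applied to the slide 19 counts as transcribed here, it gives 861 / 1500 ≈ 57.4%, not the 61.5% quoted on the slide and in the note, so at least one of those figures appears not to match the original table.

```python
# Counts from slide 19; rows and columns ordered POSITIVE, NEGATIVE, NEUTRAL.
CONFUSION = [
    [365, 24, 159],
    [57, 81, 65],
    [274, 60, 415],
]


def raw_accuracy(matrix):
    """Share of items on the diagonal, i.e. where both coders agree."""
    agreements = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return agreements / total


print(f"{raw_accuracy(CONFUSION):.1%}")  # 57.4% (861 / 1500)
```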
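Note 2's partial credit figure of merit weights each confusion-matrix cell by the matching credit-matrix entry and divides by the total number of items; with an identity credit matrix it reduces to raw accuracy. The sketch below applies slide 20's credit matrix to the transcribed counts (the names `CREDIT` and `partial_credit_fom` are illustrative). Those counts give 1140 / 1500 = 76.0% rather than the 82.3% printed on slide 20, so the reported figure does not reconcile with the table as transcribed.

```python
# Counts from slide 20; rows and columns ordered POSITIVE, NEGATIVE, NEUTRAL.
CONFUSION = [
    [365, 24, 159],
    [57, 81, 65],
    [274, 60, 415],
]
# Credit for each (gold label, assigned label) pair, from slide 20's credit matrix.
CREDIT = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]


def partial_credit_fom(confusion, credit):
    """Credit-weighted agreement divided by the total number of items."""
    n = len(confusion)
    earned = sum(confusion[i][j] * credit[i][j] for i in range(n) for j in range(n))
    total = sum(sum(row) for row in confusion)
    return earned / total


print(f"{partial_credit_fom(CONFUSION, CREDIT):.1%}")  # 76.0% (1140 / 1500)
```

Because the credit matrix is symmetric, the result does not depend on which coder's labels are placed on the rows.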