Intro 2 text analytics | Ben Taylor @bentaylordata
Text Analytics Are Awesome!
Thank you to our Sponsors!
HIREVUE | TALENT INTERACTION
Agenda
1 Text handling, introduction
2 Sentiment
3 SPAM
4 Levenshtein distance (word, sentence, cloud)
5 Map Reduce / Clustering
6 Interview text analytics
Text handling
Input not expected?
[Diagram: Model M, with Input and Output]
[Diagram: Model M, with Input only]
[Diagram: Model M, with Input, Output, and Stderr: "You’re an idiot & I don’t like you anymore"]
[Diagram: Input]
Need to map unstructured text to summary metric
Sentiment
How are you feeling?
Let’s make this easy.
Problem statement:
Expletives + @skullcandy mention?
Good or bad?
Negative Sentiment
 1048940088:
 "I've got two pairs of Ink'd earbuds by @Skullcandy and they both broke in two weeks. I
$#@&ing hate @Skullcandy! #$#@&You”
 1054044204:
 “$#@& only one headphone stopped working stupid $#@&ing headphones y is it only one
headphone i blame you @skullcandy”
 1376767884:
 "@skullcandy never buyin another pair of skull candy headphones this is the fourth pair in the
last 2 months that $#@&ed up”
 141343855:
 “My headphones blew $#@& you skullcandy -___-”
 16352011:
 “BAHHHHH My SkullCandys are $#@&ing up AGAIN!”
 1376767884:
 "@skullcandy $#@& skullcandy"
Positive Sentiment
 161547390:
 "Getting some skullcandy fix's. #tight #skullcandy #$#@&ingpumped"
 1306207039:
 "@skullcandy @VegasJarhead @justine_mom $#@& yeah!"
 1117713458:
 "@skullcandy $#@&in bass is badass",
 1117713458:
 "@skullcandy ur headphones are bad ass and have awsome $#@&in bass"
 1086228384:
 "Just bough a pair of Skullcandy supreme sound Hesh's $#@&ING AWSOME!!! the bass is
truly amazing :)"
 132303540:
 "@K$#@&INGP I thought you were a man not a pussy. Try Skullcandy. Hit me back and I'll
hook you up."
Neutral Sentiment
 1104061464:
 "@autoerotique @skullcandy #crushers First pair
died after 2 days. Day 2 for new pair. The Alarm is
thrashing my head, un$#@&me these rock”
Conclusion
Sentiment classification   Count
Negative                   6
Positive                   6
Neutral                    1
46% chance tweet is negative, now what?
Welcome to the majority of the sentiment solutions on
the market:
Single-word naïve Bayesian classification
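A minimal Python sketch of what single-word naïve Bayesian classification could look like. With 6 of the 13 example tweets labeled negative, the prior alone gives 6/13 ≈ 46%; the classifier then multiplies in per-word likelihoods. The tokenizer, the add-one smoothing, the function names, and the toy training set are all assumptions made for illustration and do not come from the slides.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word-like tokens, keeping @mentions and #tags."""
    return re.findall(r"[@#]?\w+", text.lower())


def train(labeled_texts):
    """Count words and documents per label from (label, text) pairs."""
    word_counts = {}
    doc_counts = Counter()
    for label, text in labeled_texts:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return (best_label, posteriors), treating every word independently."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a class.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    posteriors = {label: w / norm for label, w in weights.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Made-up toy training set, not the tweets shown on the slides.
TRAINING = [
    ("negative", "my headphones broke again"),
    ("negative", "never buying another pair they broke"),
    ("positive", "the bass is awesome"),
    ("positive", "awesome headphones love the bass"),
]

if __name__ == "__main__":
    word_counts, doc_counts = train(TRAINING)
    print(classify("they broke", word_counts, doc_counts))
```

Because each word is scored on its own, a word that shows up in both classes (such as the expletive in the tweets above) pulls the result toward a coin flip.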
Positive Sentiment (second pass)
 161547390:
 "Getting some skullcandy fix's. #tight #skullcandy #$#@&ingpumped"
 1306207039:
 "@skullcandy @VegasJarhead @justine_mom $#@& yeah!"
 1117713458:
 "@skullcandy $#@&in bass is badass",
 1117713458:
 "@skullcandy ur headphones are bad ass and have awsome $#@&in bass"
 1086228384:
 "Just bough a pair of Skullcandy supreme sound Hesh's $#@&ING AWSOME!!! the bass is truly amazing :)"
 132303540:
 "@K$#@&INGP I thought you were a man not a pussy. Try Skullcandy. Hit me back and I'll hook you up.”
 1104061464:
 "@autoerotique @skullcandy #crushers First pair died after 2 days. Day 2 for new pair. The Alarm is
thrashing my head, un$#@&me these rock”
Conclusion
Sentiment classification   Count
Negative                   6
Positive                   ~0
Neutral                    ~0
~100% chance tweet is negative with tuple assistance.
How to find complex tuples automatically!?
Bayesian bootstrap matrix
[Figure: matrix with "Unique words in training cloud" on both axes]
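The slide does not spell out how the Bayesian bootstrap matrix is built, so the following is only a rough Python sketch of one related idea: counting which pairs of words co-occur in a tweet, per label, and keeping the pairs that only ever appear under one label. The function names, the `min_count` threshold, and the example texts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_pairs(text):
    """Return the set of unordered word pairs that co-occur in one text."""
    words = sorted(set(text.lower().split()))
    return set(combinations(words, 2))


def pair_counts(labeled_texts):
    """Map each label to a Counter of the word pairs seen under that label."""
    counts = {}
    for label, text in labeled_texts:
        counts.setdefault(label, Counter()).update(word_pairs(text))
    return counts


def one_sided_pairs(counts, label, min_count=2):
    """Pairs seen at least min_count times under label and never elsewhere."""
    elsewhere = Counter()
    for other_label, other_counts in counts.items():
        if other_label != label:
            elsewhere.update(other_counts)
    return sorted(pair for pair, n in counts[label].items()
                  if n >= min_count and elsewhere[pair] == 0)


# Made-up examples; "expletive" stands in for the masked words in the tweets.
EXAMPLES = [
    ("negative", "expletive hate skullcandy"),
    ("negative", "expletive hate these headphones"),
    ("positive", "expletive awesome bass"),
    ("positive", "expletive awesome skullcandy"),
]

if __name__ == "__main__":
    counts = pair_counts(EXAMPLES)
    print(one_sided_pairs(counts, "negative"))
    print(one_sided_pairs(counts, "positive"))
```

In this toy data the expletive alone says nothing about sentiment, while the pairs it forms with "hate" or "awesome" each belong to a single class.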
Basic sentiment output
Credit: Ben Peters
Keyword    Negative   Positive
warranty   28.7       1
cant       11.8       1
back       11.8       1
break      11.8       1
after      11.1       1
what       9.1        1
never      9.1        1
Don’t      9.1        1
second     8.4        1
side       8.4        1
SPAM
I can’t handle this
Lost future customer
SPAM examples:
>80%
SPAM list
Keyword           Spam   Good
@nikesb           52.0   1
@lrgskate         52.0   1
live              34.0   1
know              1      28.8
have              1      22.3
pair              1      16.3
earbud            16.1   1
Non-ascii-chars   12.4   1
some              1      11.9
check             1      11.6
Credit: Ben Peters
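One way a keyword table like this might be used, sketched in Python. It assumes the two columns are relative spam and good weights for each keyword, which the slide does not state, and it sums the log of their ratio over the keywords found in a tweet. The function name and the whitespace tokenization are hypothetical.

```python
import math

# (spam, good) weights copied from the SPAM list table above.
SPAM_TABLE = {
    "@nikesb": (52.0, 1.0),
    "@lrgskate": (52.0, 1.0),
    "live": (34.0, 1.0),
    "know": (1.0, 28.8),
    "have": (1.0, 22.3),
    "pair": (1.0, 16.3),
    "earbud": (16.1, 1.0),
    "some": (1.0, 11.9),
    "check": (1.0, 11.6),
}
# The "Non-ascii-chars" row is a property of the whole tweet, not a token.
NON_ASCII = (12.4, 1.0)


def spam_log_odds(text):
    """Sum log(spam / good) over matched keywords; above 0 leans spam."""
    score = 0.0
    for token in set(text.lower().split()):
        if token in SPAM_TABLE:
            spam, good = SPAM_TABLE[token]
            score += math.log(spam / good)
    if any(ord(ch) > 127 for ch in text):
        spam, good = NON_ASCII
        score += math.log(spam / good)
    return score


if __name__ == "__main__":
    # Made-up example texts, not tweets from the deck.
    print(spam_log_odds("@nikesb live"))
    print(spam_log_odds("i have some"))
```

Working in log space keeps the score additive, so a tweet that matches no keyword scores exactly 0 and is left undecided.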
Training….
Where do you get your training set?
What about @#tags? Misspellings?  ? SPAM?
Manual trainer
http://54.186.199.209/
Credit: Ben Peters
Levenshtein
Now things are getting interesting
The things we take for granted
You type: Awsome
Computer: It’s actually spelled Awesome
① kitten → sitten (substitution of "s" for "k")
② sitten → sittin (substitution of "i" for "e")
③ sittin → sitting (insertion of "g" at the end)
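The three edits above give kitten → sitting a Levenshtein distance of 3. A minimal Python sketch of the standard dynamic-programming computation follows, together with a naive spelling suggestion in the spirit of the Awsome → Awesome example. The `suggest` helper and its tiny dictionary are hypothetical, added for illustration only.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def suggest(word, dictionary):
    """Return the dictionary entry closest to word, ignoring case."""
    return min(dictionary, key=lambda entry: levenshtein(word.lower(), entry.lower()))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(suggest("Awsome", ["awesome", "awful", "some"]))  # awesome
```

Only two rows of the table are kept at a time, so memory grows with the shorter dimension rather than with the product of both lengths.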
Levenshtein word level
Ref:
I am going skiing tomorrow
Hyp:
I am going skiing on Saturday
Levenshtein word-cloud level
Ref:
alphanumeric_sort(word_cloud_1)
alphanumeric_sort(unique(word_cloud_1))
Hyp:
alphanumeric_sort(word_cloud_2)
alphanumeric_sort(unique(word_cloud_2))
>> wer(str1,str1)
ans = 0
>> wer(strjoin(sort(strsplit(str1,' ')),' '),str1)
ans = 15
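The word-level and word-cloud-level comparisons can be sketched in Python by running the same edit distance over lists of words instead of characters. The `wer` function used on the slide is not shown in the deck, so `edit_distance`, `word_distance`, and `cloud_distance` below are hypothetical stand-ins, and `sorted(set(...))` is assumed to play the role of `alphanumeric_sort(unique(...))`.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, e.g. two lists of words."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            current.append(min(previous[j] + 1,        # drop ref_item
                               current[j - 1] + 1,     # add hyp_item
                               previous[j - 1] + (ref_item != hyp_item)))
        previous = current
    return previous[-1]


def word_distance(ref_text, hyp_text):
    """Word-level distance: word order matters."""
    return edit_distance(ref_text.split(), hyp_text.split())


def cloud_distance(cloud_1, cloud_2, unique=True):
    """Word-cloud distance: sort (and optionally de-duplicate) the words first."""
    def prepare(text):
        words = text.lower().split()
        return sorted(set(words) if unique else words)
    return edit_distance(prepare(cloud_1), prepare(cloud_2))


if __name__ == "__main__":
    print(word_distance("I am going skiing tomorrow",
                        "I am going skiing on Saturday"))  # 2
    print(word_distance("skiing tomorrow going", "going skiing tomorrow"))  # 2
    print(cloud_distance("skiing tomorrow going", "going skiing tomorrow"))  # 0
```

Sorting removes word order from the comparison, which is why the same three words score 2 as sentences but 0 as clouds. The slide's result of 15 depends on `str1`, which the deck does not show.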
MapReduce
Great for text processing, e.g. word counts
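A word count is the usual first MapReduce example. The Python sketch below runs the map and reduce steps in a single process to show the shape of the computation; it is not tied to any particular MapReduce framework, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(document):
    """Map step: turn one document into its own partial word counts."""
    return Counter(document.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(documents):
    """Map every document, then fold the partial counts together."""
    return reduce(reduce_phase, map(map_phase, documents), Counter())


if __name__ == "__main__":
    # Made-up documents for illustration.
    print(word_count(["my headphones broke", "headphones broke again"]))
```

Because merging counts is associative, the partial results can be combined in any grouping, which is what lets a real cluster spread the map step across machines.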
CLUSTERING
Now things are getting interesting
Group of tweets?
 Once we have categorized tweets we can build word clouds!!!
Category A (could be negative sentiment, low selling areas, etc.)
Category B (could be positive sentiment, high selling areas, etc.)
[Figure: a word cloud for each category]
Levenshtein wordcloud similarity
Cluster 1 example
Camping
Virgin Gaming, Battlefield
Cluster 2 example
Skiing
winter, Stringray
Cluster 3 example
MMA, Boxing
Skateboarding
Twitter Surgery
Training a blacklist filter
 Acting…
 Getting…
 Holding…
 Going…
 Brings…
 Turning…
Blacklist dictionary
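The deck does not say how the blacklist is trained. One plausible reading, sketched here in Python, is that words such as "going" or "getting" turn up in every category's word cloud and so carry no signal. The rule used (a word is blacklisted when it appears in at least `min_fraction` of the clouds), the function names, and the example clouds are all assumptions.

```python
from collections import Counter


def train_blacklist(clouds, min_fraction=1.0):
    """Return words that appear in at least min_fraction of the word clouds."""
    cloud_frequency = Counter()
    for words in clouds.values():
        cloud_frequency.update(set(words))  # count each word once per cloud
    threshold = min_fraction * len(clouds)
    return {word for word, n in cloud_frequency.items() if n >= threshold}


def apply_blacklist(words, blacklist):
    """Drop blacklisted words, keeping the order of the rest."""
    return [word for word in words if word not in blacklist]


# Made-up word clouds, loosely named after the cluster examples above.
CLOUDS = {
    "camping": ["going", "camping", "getting", "tent"],
    "skiing": ["going", "skiing", "getting", "snow"],
    "boxing": ["going", "boxing", "getting", "gloves"],
}

if __name__ == "__main__":
    blacklist = train_blacklist(CLOUDS)
    print(sorted(blacklist))
    print(apply_blacklist(CLOUDS["skiing"], blacklist))
```

Lowering `min_fraction` makes the filter more aggressive, at the risk of removing words that do distinguish a few of the categories.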
