Text Analytics:
From Colored Pens and Crumbly Papers to
Custom Machine Classifiers for Twitter
Dr. Stuart W. Shulman
Founder & CEO, Texifter
@stuartwshulman
“…a wealth of information creates a poverty of attention.”
- Herbert Simon, 1971
Presentation Outline
1. Moving from pen and paper to machine-learning
2. Overview of the spectrum of methods
3. Portfolio identification using the five pillars
4. How Twitter data is relevant to evaluation of the WBG
Our Core Philosophy
Emergent properties in very well read texts such as
the archetypal “extremist agent of the law”
Agenda-Setting in the Progressive Era Print Press
Relations between Classes
Rates and Terms for Credit
Farm Profitability
Cost of Living
Soil Fertility
Education
Exploration
Speculation
Coding
Validation
Qualitative Methods: Genes, Taste, or Tactic?
• Qualitative by birth or choice?
• Some look to words as an alternative to number crunching
• Others rooted in rich and meaningful interpretive traditions
• Another group is fluent in both qual & quant
• Mixed methods open up rather than limit fields of knowledge
• One central goal is valid inferences about phenomena
• Replicable and transparent methods
• Attention to error and corrective measures
• Internal and external validation of results
• Using computers for qualitative data analysis helps, but…
• Rigor still originates with the research design, not the technology
• Software makes better organization and efficiency possible
• Coders enable the researcher to step back while scaling up
Purist Pluralist Positivist
A Spectrum of Approaches to Working with Qualitative Data
Different types of knowledge claims depending on where you sit
deep immersion
closeness to data
antipathy to numbers
credible interpretation
in-depth analysis
contextual
subjective
experimental
mixed method
adaptive hybrid
flexible approach
interdisciplinary
quantitative
focus on error
measurement critical
validity and reliability
replication & objectivity
generalization
hypotheses
These choices can be philosophical, ideological, and ethical in nature
Stuart W. Shulman. 2003. "An Experiment in Digital Government at the
United States National Organic Program," Agriculture and Human Values
20(3), 253-265.
Coding Web Sites and Focus Groups to Study Agenda-Setting
Annotation to Improve Optical Character Recognition
Over 13,000 hours of video and audio were recorded in the public spaces of an LTC facility’s dementia unit in
suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting
residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22
months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an
interface designed by computer scientists at Carnegie Mellon University.
An Incredibly Important Book
Grimmer & Stewart
“Text as Data” Political Analysis (2013)
Volume is a problem for scholars
Coders are expensive
Groups struggle to accurately label text at scale
Validation of both humans and machines is essential
Some models are easier to validate than others
All models are wrong
Automated models enhance/amplify, but don’t replace humans
There is no one right way to do this
“Validate, validate, validate”
“What should be avoided then, is the blind use of
any method without a validation step.”
Free, Open-Source, Web-based Text Analytics Toolkit
Original Software Kernel: Tools for Measurement
Text Classification
A 2,500-year-old problem
Plato argued it would be frustrating; it still is
Software cannot remove the problem
Computer Science and National Science Foundation
Influences in a Nutshell: Measure Everything!
Fast?
Reliable?
Accurate?
Valid?
Interrater Reliability: A Critical Measurement
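One common way to put a number on interrater reliability is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The sketch below is illustrative only: the deck does not say which statistic the toolkit reports, and the coder labels are made-up sample data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to the same six items.
coder_1 = ["relevant", "relevant", "spam", "spam", "relevant", "spam"]
coder_2 = ["relevant", "spam", "spam", "spam", "relevant", "relevant"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on four of six items, but half of that agreement would be expected by chance, so kappa is well below the raw agreement rate.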
Adjudication: Creating a Gold Standard
CoderRank is our key innovation
Patent issued in 2016
Service Mark issued 2017
CoderRank for Enhanced Machine-Learning
CoderRank is to text analytics what PageRank was to search.
Just as Google said not all web pages are created equal,
Texifter argues that not all humans are created equal. When
training machines, it is best to rely most on the humans most
likely to create a valid observation. We proposed a unique way
to rank humans on trust and knowledge vectors.
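The general idea of relying most on the most trustworthy coders can be sketched with a simple accuracy-weighted vote. This is not the patented CoderRank method, whose details the deck does not give; it is a minimal stand-in that assumes each coder's weight is their accuracy on adjudicated gold-standard items. The coder names, items, and labels are hypothetical.

```python
from collections import defaultdict

def coder_accuracy(codings, gold):
    """Share of gold-standard items each coder labeled correctly."""
    hits, seen = defaultdict(int), defaultdict(int)
    for coder, item, label in codings:
        if item in gold:
            seen[coder] += 1
            hits[coder] += int(label == gold[item])
    return {coder: hits[coder] / seen[coder] for coder in seen}

def weighted_label(codings, item, weights):
    """Pick the label with the largest total weight among coders of one item."""
    totals = defaultdict(float)
    for coder, coded_item, label in codings:
        if coded_item == item:
            totals[label] += weights.get(coder, 0.0)
    return max(totals, key=totals.get)

# (coder, item, label) triples; t1 and t2 have adjudicated gold labels.
codings = [
    ("ana", "t1", "pos"), ("ben", "t1", "pos"), ("cy", "t1", "neg"),
    ("ana", "t2", "neg"), ("ben", "t2", "pos"), ("cy", "t2", "pos"),
    ("ana", "t3", "pos"), ("ben", "t3", "neg"), ("cy", "t3", "neg"),
]
gold = {"t1": "pos", "t2": "neg"}
weights = coder_accuracy(codings, gold)
label_t3 = weighted_label(codings, "t3", weights)
print(weights, label_t3)
```

On item t3 a plain majority vote would pick "neg", but the coder who matched the gold standard on both checked items outweighs the two who did not.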
Pronounced “tech-sifter”; the metaphor is of a sifter
Avoid Tennis Elbow
Items load to the screen and the coder hits the keystroke
Keystroke Human Coding
Human coding distributed to individuals, groups & crowds
Data
Codes
The Five Pillars of Text Analytics
Search
Metadata Filtering
De-duplication and Clustering
Human Coding
Machine-Learning
Pillar One: Search
Pillar One: Defined Multi-term Search
Pillar Two: Metadata Filters
Slicing big piles of text into smaller, more focused sets is key
All Text Analytics are Filtering Techniques
Users Drill Into Interactive Displays
Use metadata to examine sub-sets of responses and create reports
Pillar Three: Duplicate Detection & Clustering
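A minimal sketch of near-duplicate detection, assuming Jaccard similarity over three-word shingles and a greedy grouping pass. The deck does not say which algorithm the toolkit uses, so the functions, the 0.5 threshold, and the sample comments are illustrative assumptions.

```python
def shingles(text, k=3):
    """Set of k-word shingles from lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_near_duplicates(texts, threshold=0.5):
    """Greedy single pass: each text joins the first group whose
    representative is at least `threshold` similar, else starts a new group."""
    groups = []  # list of (representative shingle set, member indices)
    for i, text in enumerate(texts):
        s = shingles(text)
        for rep, members in groups:
            if jaccard(rep, s) >= threshold:
                members.append(i)
                break
        else:
            groups.append((s, [i]))
    return [members for _, members in groups]

# Hypothetical public comments; the first two differ by one word.
comments = [
    "Please protect the organic standards for small farms",
    "please protect the organic standards for small farmers",
    "I oppose the new rule on credit terms",
]
clusters = group_near_duplicates(comments)
print(clusters)
```

Grouping near-identical items first means a human coder can label one representative instead of every copy.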
Latent Dirichlet Allocation (LDA) Topic Models
Topic Composition (Sample Terms and Phrases)
• Topic 1: development; projects; support; financials
• Topic 2: education; training; standards; teaching
• Topic 3: health; coverage; ministry; information
• Topic 4: government; investment; social; policy
• Topic 5: administrative; coordination; market; procedures
• Topic 6: institutional; technical; strengthening; programs
• Topic 7: infrastructure; rehabilitation; maintenance; upgrading
• Topic 8: utility; company; privatization; restructuring; supply
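Topic lists like the one above are the usual output of an LDA model: each topic is summarized by its highest-weight terms. The sketch below is a small collapsed Gibbs sampler for LDA in plain Python, written for illustration; it is not the implementation behind the slides, and the hyperparameters (`alpha`, `beta`), iteration count, and toy documents are assumptions.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # tokens per topic, per doc
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    top_words = [sorted(vocab, key=lambda w: -topic_word[k][w])[:3]
                 for k in range(num_topics)]
    return top_words, doc_topic

# Toy documents built from terms that appear in the sample topics.
docs = [
    "education training teaching standards".split(),
    "training education standards teaching".split(),
    "infrastructure rehabilitation maintenance upgrading".split(),
    "maintenance infrastructure upgrading rehabilitation".split(),
]
top_words, doc_topic = lda_gibbs(docs, num_topics=2)
print(top_words)
```

The topics that come out depend on the random seed and the number of topics chosen, which is one reason the validation step stressed earlier in the deck matters for this kind of model.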
Pillar Four: Human Coding (Labeling or Tagging)
Human Coding Converted into Machine Classifiers
Accumulated human coding becomes
training data via machine-learning
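To illustrate how accumulated human codes can serve as training data, here is a minimal multinomial Naive Bayes classifier in plain Python. The choice of Naive Bayes is an assumption made for brevity; the deck does not name the classifier type. The coded examples and label names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(coded_items):
    """coded_items: list of (text, label) pairs produced by human coders."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in coded_items:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Most probable label under Naive Bayes with add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_items = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_items)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical human-coded items disambiguating the word "bank".
coded = [
    ("world bank funds education project", "on_topic"),
    ("bank loan supports health ministry", "on_topic"),
    ("river bank fishing trip photos", "off_topic"),
    ("fishing by the river bank today", "off_topic"),
]
model = train_naive_bayes(coded)
prediction = classify("new education loan from the bank", model)
print(prediction)
```

Four coded items are far too few for real use; the point is only that each additional human code becomes another training example.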
Simplified Coding Management
Crowdsourcing accelerates the insight generation process
Synchronous & Asynchronous Collaboration
Pillar Five: Machine-Learning
Our ActiveLearning engine and coding tools combine…
what humans do best… with what computers do best
Humans and machines learning together
Keep humans “in-the-loop” for more accurate results and better insights
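One common way to keep humans in the loop is uncertainty sampling: the machine scores every item, and the items it is least sure about are routed back to human coders. The deck does not describe how the ActiveLearning engine selects items, so this is only an assumed strategy, and the scores below are made up.

```python
def most_uncertain(scores, k=2):
    """scores maps item id -> estimated probability of the positive class.
    Returns the k item ids whose probability is closest to 0.5."""
    return sorted(scores, key=lambda item: abs(scores[item] - 0.5))[:k]

# Hypothetical classifier scores for five uncoded tweets.
machine_scores = {"t1": 0.97, "t2": 0.52, "t3": 0.08, "t4": 0.41, "t5": 0.70}
to_review = most_uncertain(machine_scores, k=2)
print(to_review)
```

Items the classifier already scores near 0 or 1 add little when coded by hand, so coder time goes to the borderline cases.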
Boolean Operators Cannot Solve Every Problem
There are language problems well-suited to machine-learning
We are all training classifiers in daily life
Spam filtering gave way to Amazon & Netflix
Humans and machines are constantly learning together
Interested in Money Banks?
Researching a Politician?
Doing Smoking Research?
Brands?
Studying a Sports Team?
Super Bowl History Versus Political History?
Twitter Can Feel Overwhelming
Full Historical Twitter Access with Free Estimates
PowerTrack Operators for Precise Queries
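As a rough illustration of what a precise query looks like, the helper below assembles a rule string from keywords plus the `lang:` and `is:retweet` operators, with `OR`, quoted phrases, and `-` negation. The helper name and its parameters are hypothetical, and operator availability should be checked against the current PowerTrack documentation before use.

```python
def build_rule(phrases, lang=None, exclude_retweets=False):
    """Combine phrases with OR and append optional filter operators."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    # Multi-word phrases are quoted so they match as exact phrases.
    terms = [f'"{p}"' if " " in p else p for p in phrases]
    rule = "(" + " OR ".join(terms) + ")" if len(terms) > 1 else terms[0]
    if lang:
        rule += f" lang:{lang}"
    if exclude_retweets:
        rule += " -is:retweet"
    return rule

rule = build_rule(["world bank", "worldbank"], lang="en", exclude_retweets=True)
print(rule)
```

Testing a rule like this against an estimate before purchasing data helps catch queries that are too broad or too narrow.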
Create and Test Rules Self Serve
Three Estimates Per Day Sent Via Email
# of Tweets
Cost
Twitter Data Should Be Human Coded
Using the Twitter Display
The rush to CSV is a mistake; data is degraded
Contents
Network
Time Series
Description
Author Description
Overall Metrics
Top Influencers
Top URLs
Top Domains
Top Hashtags
Top Words
Top Word Pairs
Top Replied-To
Top Mentioned
Top Tweeters
Network
sdonnan
WorldBank
CraigHammerd
bijancbayne
YouTube
TweetsAnup
realDonaldTrump
Nik_6996
jeremyhillman
alanBStardmp
Created with NodeXL
(http://nodexl.codeplex.com)
from the Social Media Research Foundation
(http://www.smrfoundation.org)
Special Enterprise License Keys
Everyone can have one
Request via an email: info@texifter.com
For more information
discovertext.com
@discovertext
Thank you for listening!
Dr. Stuart Shulman
@stuartwshulman

 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 

Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classifiers for Twitter

  • 1. Text Analytics: From Colored Pens and Crumbly Papers to Custom Machine Classifiers for Twitter. Dr. Stuart W. Shulman, Founder & CEO, Texifter, @stuartwshulman. “…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971
  • 2. Presentation Outline 1. Moving from pen and paper to machine-learning 2. Overview of the spectrum of methods 3. Portfolio identification using the five pillars 4. How Twitter data is relevant to evaluation of the WBG
  • 4. Emergent properties in very well read texts such as the archetypal “extremist agent of the law”
  • 5. Agenda-Setting in the Progressive Era Print Press
  • 6. Relations between Classes; Rates and Terms for Credit; Farm Profitability; Cost of Living; Soil Fertility; Education; Exploration; Speculation; Coding; Validation
  • 7. Qualitative Methods: Genes, Taste, or Tactic? • Qualitative by birth or choice? • Some look to words as an alternative to number crunching • Others rooted in rich and meaningful interpretive traditions • Another group is fluent in both qual & quant • Mixed methods open up rather than limit fields of knowledge • One central goal is valid inferences about phenomena • Replicable and transparent methods • Attention to error and corrective measures • Internal and external validation of results • Using computers for qualitative data analysis helps, but… • Rigor still originates with the research design, not the technology • Software makes better organization and efficiency possible • Coders enable the researcher to step back while scaling up
  • 8. Purist, Pluralist, Positivist: A Spectrum of Approaches to Working with Qualitative Data. Different types of knowledge claims depending where you sit: deep immersion; closeness to data; antipathy to numbers; credible interpretation; in-depth analysis; contextual; subjective; experimental; mixed method; adaptive hybrid; flexible approach; interdisciplinary; quantitative; focus on error; measurement critical; validity and reliability; replication & objectivity; generalization; hypotheses. These choices can be philosophical, ideological, and ethical in nature
  • 9. Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
  • 14. Coding Web Sites and Focus Groups to Study Agenda-Setting
  • 15. Annotation to Improve Optical Character Recognition
  • 16. Over 13,000 hours of video and audio were recorded of the public spaces in an LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
  • 19. Grimmer & Stewart, “Text as Data,” Political Analysis (2013) • Volume is a problem for scholars • Coders are expensive • Groups struggle to accurately label text at scale • Validation of both humans and machines is essential • Some models are easier to validate than others • All models are wrong • Automated models enhance/amplify, but don’t replace humans • There is no one right way to do this • “Validate, validate, validate” • “What should be avoided then, is the blind use of any method without a validation step.”
  • 22. Text Classification • A 2,500-year-old problem • Plato argued it would be frustrating; it still is • Software cannot remove the problem
  • 23. Computer Science and National Science Foundation Influences in a Nutshell: Measure Everything! Fast? Reliable? Accurate? Valid?
  • 24. Interrater Reliability: A Critical Measurement
  • 25. Adjudication: Creating a Gold Standard
  • 26. CoderRank is our key innovation • Patent issued in 2016 • Service Mark issued 2017
  • 27. CoderRank for Enhanced Machine-Learning: CoderRank is to text analytics what PageRank was to search. Just as Google said not all web pages are created equal, Texifter argues that not all humans are created equal. When training machines, it is best to rely most on the humans most likely to create a valid observation. We proposed a unique way to rank humans on trust and knowledge vectors.
  • 28. Pronounced “tech-sifter”; the metaphor is of a sifter
  • 31. Avoid Tennis Elbow: Items load to the screen and the coder hits the keystroke
  • 32. Keystroke Human Coding: Human coding distributed to individuals, groups & crowds. Data; Codes
  • 33. The Five Pillars of Text Analytics: Search; Metadata Filtering; De-duplication and Clustering; Human Coding; Machine-Learning
  • 35. Pillar One: Defined Multi-term Search
  • 41. Slicing big piles of text into smaller, more focused sets is key. All Text Analytics are Filtering Techniques
  • 43. Users Drill Into Interactive Displays: Use metadata to examine sub-sets of responses and create reports
  • 46. Latent Dirichlet Allocation (LDA) Topic Models
  • 47. Topic Composition (Sample Terms and Phrases) • Topic 1: development; projects; support; financials • Topic 2: education; training; standards; teaching • Topic 3: health; coverage; ministry; information • Topic 4: government; investment; social; policy • Topic 5: administrative; coordination; market; procedures • Topic 6: institutional; technical; strengthening; programs • Topic 7: infrastructure; rehabilitation; maintenance; upgrading • Topic 8: utility; company; privatization; restructuring; supply
  • 48. Pillar Four: Human Coding (Labeling or Tagging)
  • 49. Human Coding Converted into Machine Classifiers: Accumulated human coding becomes training data via machine-learning
  • 51. Crowdsourcing accelerates the insight generation process: Synchronous & Asynchronous Collaboration
  • 55. Our ActiveLearning engine and coding tools combine… what humans do best… with what computers do best Humans and machines learning together Keep humans “in-the-loop” for more accurate results and better insights
  • 56. Boolean Operators Cannot Solve Every Problem • There are language problems well-suited to machine-learning • We are all training classifiers in daily life • Spam filtering gave way to Amazon & Netflix • Humans and machines are constantly learning together
  • 62. Super Bowl History Versus Political History?
  • 63. Twitter Can Feel Overwhelming
  • 65. Full Historical Twitter Access with Free Estimates
  • 66. PowerTrack Operators for Precise Queries
  • 67. Create and Test Rules Self Serve
  • 68. Three Estimates Per Day Sent Via Email: # of Tweets; Cost
  • 69. Twitter Data Should Be Human Coded Using the Twitter Display. The rush to CSV is a mistake; data is degraded
  • 77. Contents: Network; Time Series Description; Author Description; Overall Metrics; Top Influencers; Top URLs; Top Domains; Top Hashtags; Top Words; Top Word Pairs; Top Replied-To; Top Mentioned; Top Tweeters. Network: sdonnan, WorldBank, CraigHammerd, bijancbayne, YouTube, TweetsAnup, realDonaldTrump, Nik_6996, jeremyhillman, alanBStardmp. Created with NodeXL (http://nodexl.codeplex.com) from the Social Media Research Foundation (http://www.smrfoundation.org)
  • 79. Special Enterprise License Keys: Everyone can have one. Request via an email: info@texifter.com
  • 80. For more information: discovertext.com, @discovertext. Thank you for listening! Dr. Stuart Shulman, @stuartwshulman
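
Slide 24 calls interrater reliability a critical measurement but does not say which statistic is used. As a minimal sketch, assuming two coders who each assign one label to every item, Cohen's kappa is one common choice: it compares observed agreement with the agreement expected by chance. The function name `cohens_kappa` and the sample labels are hypothetical and are not taken from the presentation or from any Texifter product.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders labeling the same items in the same order."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies, summed over labels.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two coders on six items.
coder_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "irrelevant"]
coder_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")
```

A value near 1 indicates agreement well beyond chance and a value near 0 indicates agreement no better than chance. Low values are usually a prompt to revisit the codebook or adjudicate disagreements before using the codes as training data.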