This presentation reports recent research into definitions of analytics, based on an analysis of related job adverts. The results suggest a new categorisation of analytics methodologies, and the presentation discusses the implications for the operational research community.
A Topic Model of Analytics Job Adverts (Operational Research Society Annual Conference, Sept 2013)
2. Agenda
Problem Summary
• Confusion about the precise definition of analytics
• Benefit of ‘practical’ definitions
• Issues with the conventional ‘practical’ model of analytics
Model Details
• Data source: ‘analytics’ job adverts
• Topic modeling & Latent Dirichlet Allocation
• Model build & data pre-processing
Implications
• Model analysis
• An alternative definition of analytics
• Implications for OR/MS
3. Analytics is …
… delivering the right decision support to the right people at the right time. (Laursen & Thorlund, 2010, p XII)
… the scientific process of transforming data into insight for making better decisions (INFORMS)
… [the] technologies, systems, practices, & applications to analyze critical business data so as to gain new insights (Lim et al, 2012)
… the extensive use of data, statistical & quantitative analysis, explanatory & predictive models, & fact-based management to drive decisions & actions. (Davenport & Harris, 2007, p 7)
… an outgrowth of what is known as business intelligence […] Today’s expansive, global enterprises generate a deluge of data that is impossible for a human to make sense of. (Varshney & Mojsilovic, 2011)
Analytics with a capital "A" is an umbrella term that represents our industry at a macro level, and analytics with a small "a" refers to technology used to analyze data. (Eckerson, 2011)
… information-intensive concepts and methods to improve business decision making. (Chiang et al, 2012)
… is the process of obtaining an optimal and realistic decision based on existing data (Hamel, 2011)
… data analysis that changes the behavior of the organization (Hackathorn, 2010)
… the science of analysis (Wikipedia)
… the method of logical analysis (Merriam-Webster)
… the brains to cloud computing’s brawn (Croll, 2011)
… the process of transforming data, from a variety of sources and of a variety of types, into insights that support, improve and/or automate business decisions, using technological, quantitative and presentation techniques (Mortenson et al, 2013)
… a group of approaches, organizational procedures and tools used in combination with one another to gain information, analyze that information, and predict outcomes of problem solutions (Trkman et al, 2010)
… the use of data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions (Evans, 2012)
• Many contrasting and often contradictory definitions
• Particularly difficult to distinguish analytics from business intelligence or similar fields
• Does it matter?
  - Potential confusion
  - As analytics is multi-disciplinary, it is important that a common language can be established
  - A shared definition helps ensure the growing job market can be met with the appropriate training
What is Analytics?
4. Analytics: Practical Definition
Source: Blackett, 2012
Advantages
• Focuses on application & generation of value
• Demonstrates the disciplines informing analytics
Issues
• Some methods suggest different purposes
• Presenting prescriptive analytics as the advanced end of a progression may not always hold
5. Job Adverts
• Analyse “analytics” job adverts – following the tradition of ‘ASP’ studies (e.g. Liberatore and Luo, 2012)
• Instead of studying a smaller pool of jobs, we access adverts through the LinkedIn API
  - Over 250k jobs online
  - 77% of all jobs are posted on LinkedIn (Dougherty, 2012)
• Scripted using Python & stored in MongoDB
  - OAuth, SimpleJSON, & PyMongo
• Need to reduce and generalise results from >6,800 adverts with >50,000 unique words
6. Topic Models
• Topic models assume that each document is a mixture of latent topics, and that the topics determine which words are used
• They are probabilistic models that identify the topics by analysing the co-occurrence of words
• The most common are Probabilistic Latent Semantic Indexing (pLSI) and Latent Dirichlet Allocation (LDA)
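To make the "documents as mixtures of topics" assumption concrete, here is a minimal sketch of the generative story LDA assumes, in plain Python. The two topics, their word probabilities and the fixed document lengths are hypothetical; Blei et al (2003) draw each document's length from a Poisson distribution, which this sketch omits for simplicity.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw topic proportions from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(topics, alpha, doc_lengths, seed=0):
    """Generate toy documents under the LDA generative story.

    topics: one dict per topic, mapping word -> probability (hypothetical values)
    alpha: Dirichlet concentration parameter for each topic
    doc_lengths: number of words to generate for each document
    """
    rng = random.Random(seed)
    corpus = []
    for n_words in doc_lengths:
        theta = sample_dirichlet(alpha, rng)  # this document's topic mix
        doc = []
        for _ in range(n_words):
            # Pick a topic z for this word position, then a word from topic z
            z = rng.choices(range(len(topics)), weights=theta)[0]
            words, probs = zip(*topics[z].items())
            doc.append(rng.choices(words, weights=probs)[0])
        corpus.append(doc)
    return corpus

# Two hypothetical topics loosely echoing the job-advert vocabulary
TOPICS = [
    {"strategy": 0.4, "planning": 0.3, "financial": 0.3},
    {"oracle": 0.4, "sap": 0.3, "architecture": 0.3},
]
docs = generate_corpus(TOPICS, alpha=[0.5, 0.5], doc_lengths=[8, 8, 8])
```

Fitting a topic model runs this story in reverse: given only the documents, it infers the topics and each document's topic mix.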
7. Latent Dirichlet Allocation (LDA)
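• Basic conception is that a collection of documents has three layers and contains:
  - Documents
  - Words (W), the observed variables
  - Topics (Z)
  - A topic distribution (θ) for each document
  - Alpha parameter (α)
  - Beta parameter (β)
[Diagram: LDA plate notation, with plates for N words and M documents; adapted from Blei et al, 2003]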
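For reference, the plate diagram corresponds to the joint distribution given by Blei et al (2003) for a single document of N words, and to the probability of a corpus D of M documents, with w and z playing the roles of the slide's W and Z. This is the standard LDA formulation, not necessarily the exact variant fitted in this research.

```latex
\begin{align}
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  &= p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \\
p(D \mid \alpha, \beta)
  &= \prod_{d=1}^{M} \int p(\theta_d \mid \alpha)
     \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) \mathrm{d}\theta_d
\end{align}
```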
8. Latent Dirichlet Allocation - Process
• Model is built by:
1. Estimating topics from the observed words
2. Using these to estimate each document's topic proportions
3. Evaluating the corpus based on the distributions suggested in (1) & (2)
4. Using (3) to improve the topic estimates in (1)
5. Repeating until the best fit is found
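The loop above reads much like the variational EM procedure offered by, for example, the R topicmodels package. As a compact stand-in, here is collapsed Gibbs sampling, a different but widely used inference method for LDA that likewise refines topic and document estimates on every sweep until they settle. It is a plain-Python sketch, not the authors' implementation; the priors `alpha` and `beta`, the iteration count and the toy documents are assumptions.

```python
import random

def gibbs_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors.

    docs: list of token lists. Returns (doc_topic, topic_word): per-document
    topic proportions and, per topic, a dict of word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # doc d: words assigned to topic k
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # topic k: count of each word
    nk = [0] * n_topics                             # topic k: total words assigned

    def update(d, k, wi, delta):
        ndk[d][k] += delta
        nkw[k][wi] += delta
        nk[k] += delta

    # Initialise: give every word a random topic
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, zs):
            update(d, k, index[w], +1)
        assign.append(zs)

    # Iterate: resample each word's topic given all other assignments
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                update(d, assign[d][i], wi, -1)
                weights = [(ndk[d][k] + alpha) * (nkw[k][wi] + beta) / (nk[k] + n_vocab * beta)
                           for k in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                update(d, k, wi, +1)

    # Point estimates of the document-topic and topic-word distributions
    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (nkw[k][index[w]] + beta) / (nk[k] + n_vocab * beta) for w in vocab}
                  for k in range(n_topics)]
    return doc_topic, topic_word

# Hypothetical toy adverts
toy_docs = [["oracle", "sap", "architecture", "sap"],
            ["strategy", "planning", "financial", "strategy"],
            ["oracle", "architecture", "sap", "oracle"],
            ["planning", "strategy", "financial", "planning"]]
doc_topic, topic_word = gibbs_lda(toy_docs, n_topics=2)
```

Gibbs sampling trades the deterministic updates of variational methods for simple counting, which keeps the sketch short at the cost of needing many sweeps on a real corpus.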
9. Latent Dirichlet Allocation - Assumptions
• Bag-of-words / exchangeability
• The number of topics (K) is known and pre-determined
  - Cross-validation is used to identify the K with the lowest perplexity
• Topic independence
  - As α is a parameter of a Dirichlet prior, each topic is assumed to be independent and not correlated
  - In this research, correlation between topics has to be assumed
  - The alternative is the correlated topic model (Blei & Lafferty, 2007), which uses a logistic normal rather than a Dirichlet distribution
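Two ideas from this slide can be sketched in plain Python: choosing K by held-out perplexity, and the logistic normal that the correlated topic model uses in place of the Dirichlet. The per-fold scores would come from fitting a model on each training split, which is not shown; the covariance matrix below is hypothetical, and the softmax uses all K components for simplicity.

```python
import math
import random

def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-(total log-likelihood) / (total word count)).
    Lower values mean the model predicts unseen adverts better."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))

def choose_k(fold_perplexities):
    """fold_perplexities: dict mapping candidate K -> perplexity on each CV fold.
    Returns the K with the lowest mean perplexity."""
    return min(fold_perplexities,
               key=lambda k: sum(fold_perplexities[k]) / len(fold_perplexities[k]))

def sample_logistic_normal(mu, chol, rng):
    """CTM-style topic proportions: eta ~ N(mu, L L^T), theta = softmax(eta).

    chol is a lower-triangular Cholesky factor L of the topic covariance; its
    off-diagonal entries let topics rise and fall together, which Dirichlet
    draws cannot do."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    eta = [m + sum(chol[i][j] * z[j] for j in range(i + 1)) for i, m in enumerate(mu)]
    top = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(e - top) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical covariance: topics 0 and 1 correlated at 0.8, topic 2 independent
L = [[1.0, 0.0, 0.0],
     [0.8, 0.6, 0.0],
     [0.0, 0.0, 1.0]]
theta = sample_logistic_normal([0.0, 0.0, 0.0], L, random.Random(42))
```

As a sanity check on `perplexity`, a model that spreads probability evenly over four words scores exactly 4 on any text drawn from them.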
10. Data Pre-Processing & Model Build
• Strip HTML / XML
• Remove stop words, numbers and punctuation
• Remove words < 3 characters
• Remove the most and least frequent words
• Tools: Python (HTMLParser, Gensim and string) and R (tm and topicmodels)
• To stem or not to stem?
  - "the job involves managing analytics projects"
  - "the job involves the management of analytical projects"
  - "has experience running projects using management science and analytics"
  - "managing a team of scientists analysing the experience of runners"
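The pre-processing steps and the stemming dilemma can be sketched with the Python standard library alone. The stop-word list, the pruning thresholds and the suffix list are hypothetical, and `crude_stem` is a deliberately simple suffix stripper used only to illustrate the trade-off; it is not the Porter stemmer or any library's stemmer.

```python
import string
from collections import Counter
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML (or simple XML) fragment, dropping tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_markup(raw):
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)

# Hypothetical, deliberately tiny stop-word list; a real run would use a full one
STOP_WORDS = {"the", "and", "has", "using", "involves", "with", "for"}
# Map punctuation and digits to spaces so that e.g. "data-driven" splits in two
_TO_SPACE = str.maketrans({c: " " for c in string.punctuation + string.digits})

def tokenize(raw, stop_words=STOP_WORDS):
    text = strip_markup(raw).lower().translate(_TO_SPACE)
    return [w for w in text.split() if len(w) >= 3 and w not in stop_words]

def prune_vocabulary(docs, min_docs=2, max_doc_share=0.5):
    """Drop rare words (in fewer than min_docs adverts) and very common words
    (in more than max_doc_share of all adverts)."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, c in doc_freq.items()
            if c >= min_docs and c / len(docs) <= max_doc_share}
    return [[w for w in doc if w in keep] for doc in docs]

# Crude suffix stripper for illustration only; it is NOT the Porter stemmer
SUFFIXES = ("ement", "ical", "ing", "ics", "ers", "s", "e")

def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def shared_terms(a, b, stem=False):
    """Terms two adverts have in common, with or without stemming."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    if stem:
        ta, tb = {crude_stem(w) for w in ta}, {crude_stem(w) for w in tb}
    return ta & tb
```

With the slide's sentences, stemming helps the first pair: their shared terms grow from {"job", "projects"} to {"job", "manag", "analyt", "project"}. It also makes the unrelated last pair look more alike, growing their overlap from {"experience"} to {"experienc", "manag", "runn"}.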
11. Topic Results
• 30 topics identified
• All topics are created equally but some are more topical than others
[Chart: Most Likely Topic per Document as % of Corpus]
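The chart's measure can be computed from a fitted model's per-document topic proportions by taking each advert's single most likely topic and counting. A minimal sketch, assuming the proportions are already available as one plain list per document (the input format is an assumption, not the deck's):

```python
from collections import Counter

def most_likely_topic_shares(doc_topic):
    """Share of documents whose single most likely topic is each topic.

    doc_topic: one list of topic proportions per document. Ties go to the
    lower topic index. Returns {topic index: share}, largest share first."""
    winners = Counter(max(range(len(row)), key=row.__getitem__) for row in doc_topic)
    return {k: count / len(doc_topic) for k, count in winners.most_common()}
```

Counting only the top topic ignores how mixed each advert is; an advert split 51/49 between two topics counts fully towards the first.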
12. Most Likely Terms in Topics
• Analysis of the 3rd, 4th & 5th most likely topics
Topic 3 (4th most likely): Digital & Web (8%)
other, media, across, working, understanding, analysis, social, projects, responsible, required, ensure, within, design, key, performance, digital, company, manager, products, their, lead, tools, role, services
Topic 13 (3rd most likely): Consultancy (17%)
working, market, develop, project, software, process, media, reporting, key, through, requirements, solutions, manager, excellent, your, strategy, multiple, more, service, opportunity, manage, well, opportunities, clients
Topic 9 (5th most likely): Technical (7%)
risk, systems, design, solutions, services, other, tools, technical, teams, related, provide, required, position, degree, such, operations, global, skills, project, opportunity, clients, service, excellent, products
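Term lists like these are typically read off a fitted model's topic-word distributions by ranking each topic's words by probability. A minimal sketch, assuming each topic is available as a plain dict of word probabilities (an assumed format); the default of 24 simply matches the number of terms listed per topic on these slides.

```python
import heapq

def top_terms(topic_word, n=24):
    """Most probable words per topic, most probable first.

    topic_word: one dict per topic mapping word -> probability."""
    return [[w for w, _ in heapq.nlargest(n, dist.items(), key=lambda item: item[1])]
            for dist in topic_word]
```

Ranking by raw probability tends to surface words that are frequent in every topic (here, for example, "required" and "manager" recur across several lists), so some analyses re-rank terms by how distinctive they are to a topic.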
13. Most Likely Terms in Topics (cont.)
• Analysis of the top two most likely topics
Topic 20 (1st most likely): Strategic (41%)
reporting, analysis, media, required, strategy, related, strategic, manager, company, degree, risk, online, products, across, drive, must, manage, responsible, well, financial, planning, industry, lead, software
Topic 21 (2nd most likely): Computing (20%)
services, solutions, technology, clients, digital, consulting, your, more, implementation, management, oracle, technical, capabilities, design, provide, advisory, strategy, integration, technologies, sap, career, enterprise, solution, architecture
14. Model Analysis
• Main five topics:
  - Technical
  - Digital/Web
  - Consultancy
  - Computing
  - Strategic
• ‘Digital/Web’ is a specialism within analytics (also ‘Financial’)
• ‘Technical’ & ‘Consultancy’ are specific job types or environments
  - However, some technical (‘hard’) skills & some consulting-type (‘soft’) skills are likely to be required in all analytics jobs
• ‘Computing’ & ‘Strategic’?
15. The Analytics of Computing?
[Diagram: Discovery Analytics, placing techniques from Basic to Advanced Analytics Capability and from Soft to Hard]
• Data Warehouses → Big Data Architecture
• Stock Market Analysis → Algorithmic Trading
• Fraud Investigation → Automatic Fraud Detection
• Customer Segmentation → Propensity Modeling
• Clickstream Analysis → Behavioural Targeting
• Qualitative Text Analysis → Natural Language Processing
• Reports & Dashboards → Advanced Visualisation
16. The Analytics of Strategy?
[Diagram: Decision Analytics, placing techniques from Basic to Advanced Analytics Capability and from Soft to Hard]
• Trial & Error, Experimentation, Optimisation, Simulation
• Basic Forecasting → ARIMA Time Series
• Performance Metrics → Data Envelopment Analysis
• A/B Testing → Multivariate Testing
• Business Analysis → Business Process Optimisation
• Requirements Gathering → Problem Structuring
17. An Alternative Definition of Analytics
• Descriptive Analytics: statistical and data modeling techniques designed to describe past events and answer “what happened?”
• Predictive Analytics: data mining and machine learning techniques used to predict future events and answer “what will happen next?”
• Prescriptive Analytics: OR/MS, advanced statistical and mathematical models used to prescribe future actions and answer “what should we do next?”
18. An Alternative Definition of Analytics
[Diagram: Discovery Analytics (technological; lower-risk decisions) alongside Decision Analytics (strategic; higher-risk decisions)]
• Basic level: Reporting & alerts; Market research; Information systems; Basic historical analysis; Performance metrics; Stakeholder consultation
• Advanced Discovery Analytics: Advanced visualisation; Real time insights; Automated decisions
• Advanced Decision Analytics: Advanced modelling; Problem structuring; Decision analysis
19. Summary & Implications for OR/MS
• Implemented a correlated topic model on 6,873 job adverts
• An alternative practical definition of analytics has been suggested: discovery and decision analytics
  - Maintains the focus on business value, application & the disciplines that inform analytics
  - However, removes the contradictions in the previous model
• OR/MS has an obvious role in advanced decision analytics, in both hard and soft applications
• Scope for further exploration (and/or promotion) of the role of OR/MS in advanced discovery analytics
20. Contact Details and Questions
Email: m.j.mortenson@lboro.ac.uk
Website: www.whatisanalytics.co.uk
Mobile: 07833 XXXXXX
LinkedIn: http://www.linkedin.com/profile/view?id=114000243&trk=tab_pro
(or search Michael Mortenson)