The document provides an overview of Thorhildur Jetzek's background and career. It summarizes her educational qualifications including a Ph.D. in Information Technology Management from CBS in 2015. It also lists some of her past roles working as an economist, IT consultant, and in various positions at CBS where she is currently a postdoctoral fellow. The document then discusses CBS' ranking and focus on industry collaboration through projects like industrial Ph.D. programs and crowdsourcing competitions for students.
2. Who am I?
• Matriculation exam (stúdent), Physics track I, MR 1991
• B.Sc. in Economics 1994
• M.Sc. in Economics 1998
• Ph.D. in Information Technology Management 2015
• Have worked as an economist, IT consultant, assistant
professor, project manager, program manager, director,
industrial PhD and now postdoctoral researcher.
• Have always focused on use of technology
Figure labels: Traditional career; My career
@Thorhildur Jetzek CBS 2|
3. • High ranking
– 2nd in Europe (behind LSE) & 22 world-wide
• Focus on collaboration with industry
– Industrial PhD (my PhD contract was at KMD)
– Engaged scholarship & collaborative research
(current project of mine sponsored by industry)
– Crowdsourcing events:
• Student competition where CBS students got access to
anonymized data on 100,000 customers of Danske Bank
and socio-economic data from KMD as well as data from
Danske Bank's public Facebook wall
• Financial prizes (DKK 75,000 1st prize)
6. Rise of Digitization
• An average decline of almost 40% a year in the cost per gigabyte of consumer hard disk drive from 1998 (OECD, 2013).
• 38% yearly decrease in the cost of shifting one bit per second since 1995 (OECD, 2013).
• More than 30 million interconnected sensors are now deployed worldwide, in areas such as security, health care, transport systems or energy control systems, and their numbers are growing by around 30% a year (McKinsey, 2011).
• 6 billion people have cellphones
• 30 billion pieces of content are shared on Facebook every month
• 2002: The year when the amount of information stored digitally surpasses non-digital information!
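As a rough worked example of what a yearly decline of this size implies, the sketch below compounds a constant annual rate. The function name `remaining_cost_fraction` and the ten-year horizon are illustrative assumptions, not figures taken from the OECD report.

```python
def remaining_cost_fraction(annual_decline: float, years: int) -> float:
    """Fraction of the starting cost left after compounding a yearly decline."""
    if not 0.0 <= annual_decline <= 1.0:
        raise ValueError("annual_decline must be between 0 and 1")
    return (1.0 - annual_decline) ** years


# Illustrative inputs only: a 40% yearly decline compounded over 10 years.
fraction = remaining_cost_fraction(0.40, 10)
print(f"Cost after 10 years: {fraction:.2%} of the starting cost")
```

With these inputs, roughly 0.6% of the starting cost remains, which is why small-sounding yearly percentages add up to very large changes over a decade.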
7. Changes...
Forbes highlights
• IT in the boardroom: Digital strategies
• Changing business models - platforms
• Big data and analytics
• Lacking skills: EU estimates a 160% increase in demand for
Big Data specialists between 2013 and 2020, to 346,000 new
jobs
IDC predicts
• Market for big data analysis services over $16 billion in 2014,
growing six times faster than the entire IT industry
• Cloud-based big data and analytics will grow three times faster
than spending for on-premises solutions in 2015
8. Global open access
FLOSS – Free/Open Source
Software
“…people's pursuit of visible
carrots is at times interrupted by
the larger quest for the invisible
gold at the end of the rainbow.”
(von Krogh et al., 2012a, p. 671)
• Collaborative projects: Wikipedia, Human Genome Project,
Open.Nasa.gov
• Open NGO data: http://data.worldbank.org/ (and multitude of
similar)
• Open Government Data: http://data.gov / http://data.gov.uk
(and 300 others)
• Open company data (open APIs): Facebook, Twitter, LinkedIn
• Platforms: CouchSurfing.com; NeighbourGoods.net
9. What is BIG data?
• The jury is still out
– Davenport: New technologies – software and
infrastructure plus the data itself
– Forrester defines Big Data as “techniques and
technologies that make handling data at extreme
scale affordable”
– McKinsey (2011): “Big data” refers to datasets
whose size is beyond the ability of typical
database software tools to capture, store,
manage, and analyze.
11. Dimensions of big data: 4 V's
Source: IBM, http://www.ibmbigdatahub.com/infographic/four-vs-big-data
12. Utilization of data
Source: @PetteriA: http://www.slideshare.net/petterialahuhta/alahuhta-big-dataandanalytics24sep2014
14. Liquid open data
@HildaJetzek
Liquidity – reflects ability to link and stream data across systems
Openness – reflects ability to use data outside of organizational boundaries
Figure axes: Illiquid data to Liquid data; Closed data to Open data
• Liquid closed data: Data are effectively reused across a variety of systems within a single organization (example: internally shared data)
• Illiquid (silo’ed) closed data: Data are stored where they originate and not reused (example: most data within organizations)
• Illiquid (silo’ed) open data: Data are used outside of organizational boundaries but offer limited potential for automation or coupling of data (example: many open government data initiatives)
• Liquid open data: Data are used outside of organizational boundaries and easily coupled with other data and integrated across systems (example: combining internal and external data for improved insights)
15. How do we identify openness?
@HildaJetzek
Dimension | Affordance | Explanation
Openness: Strategic | Availability | Data are open to all by default
Openness: Economic | Affordability | Data are free or charged for at no more than the marginal cost of reproduction
Openness: Legal | Reusability | Data are published with open licenses
Liquidity: Conceptual | Interoperability | Semantics and syntax are clear, data models and metadata are published, use of standard identifiers
Liquidity: Technical | Usability | Data are of high quality, published in machine readable and standard formats, using contextual metadata
Liquidity: Technical | Discoverability | Data are easily found through central portals or published with searchable metadata or using linked data semantics
Liquidity: Technical | Accessibility | Data are easily downloadable or "queryable" through APIs
16. Binary or continuous?
• Data are not just open or closed, or liquid
or illiquid – a continuous range
• Classification useful for strategy purposes
– Part of an organization’s data needs to be
liquid across the company (customer master)
– Other data could be open but illiquid (financial
statement)
– Some data are liquid and open (genomics data,
geospatial data)
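To illustrate how a continuous range can still feed a quadrant-style classification, here is a minimal sketch. It assumes openness and liquidity have each been scored between 0 and 1; the scores, the 0.5 threshold and the function name `classify` are all hypothetical and not part of the framework itself.

```python
def classify(openness: float, liquidity: float, threshold: float = 0.5) -> str:
    """Map two scores in [0, 1] to one of the four quadrant labels."""
    for name, value in (("openness", openness), ("liquidity", liquidity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    liquid = "Liquid" if liquidity >= threshold else "Illiquid"
    open_ = "open" if openness >= threshold else "closed"
    return f"{liquid} {open_} data"


# Hypothetical (openness, liquidity) scores for the examples on the slide.
examples = {
    "customer master": (0.1, 0.9),
    "financial statement": (0.9, 0.2),
    "geospatial data": (0.9, 0.9),
}
for name, (openness, liquidity) in examples.items():
    print(f"{name}: {classify(openness, liquidity)}")
```

Keeping the raw scores alongside the label preserves the continuous view, while the label is convenient for strategy discussions.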
17. Highlights
• Why do we have so much data?
• What are the underlying societal changes
we need to be aware of?
• Why has openness become so popular?
• Does it make sense to make more use of
data, even if it is expensive to re-think
how we handle data in the company?
19. Machine-generated big data
• Sensors/IoT devices
– From car navigation systems, smart meters,
unmanned security systems, sensors etc.
20. People-generated big data
• Social data
– Sources: Social media websites, blog sites, product reviews, search results
– Unstructured, natural language
• Data from mobile phones
– Most commonly geolocation
– for example used to analyze traffic or movement of people or to do geo-tagging
Source: Waze https://www.waze.com/
21. Measurement data
• Nature/Environment
– Sources: Measurements, such as meteorological, atmospheric and pollution Big Data
• Geospatial data
• Lifeforms
– Sources: Genetic sequencing, patient databases
Source: @Vishy Iyer, UT, "Cracking the Genetic Code of Brain Tumors", https://news.utexas.edu/2012/09/28/cracking-the-genetic-code-of-brain-tumors
23. Structure of data
• How do we define structured data?
– Very often referred to as data in structured relational
databases
– Known data model, identities and tabular formats
(columns and rows)
– Still, a lot of (big) data analytics tools/packages want
tabular formats
• R uses data frames
• Tableau wants a tabular format
• SAS/SPSS use a tabular format
25. Semi-structured data
• Typically data such as XML or JSON
– Nested, not tabular but a known
structure all the same
– Could be applied to
text-files such as logs
– Can be transformed into
a tabular structure (with
many empty cells)
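The transformation from nested to tabular data can be sketched as follows. The sample records, the dotted column names and the helper names `flatten` and `to_table` are assumptions made for illustration; lists inside records are left as they are.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested dicts into one flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def to_table(records: list) -> tuple:
    """Return (columns, rows); fields missing from a record become None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({col for r in flat_records for col in r})
    rows = [[r.get(col) for col in columns] for r in flat_records]
    return columns, rows


# Made-up semi-structured input: the two records do not share all fields.
raw = """[
  {"id": 1, "name": "Anna", "address": {"city": "Copenhagen", "zip": "2000"}},
  {"id": 2, "name": "Bo", "phone": "555-0100"}
]"""
columns, rows = to_table(json.loads(raw))
print(columns)
for row in rows:
    print(row)
```

Every record gets a cell for every column seen anywhere in the input, so records that lack a field show `None` there. That is where the many empty cells come from.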
29. Standard analytics
• Data analytics can take many different
forms
• Common forms of data analytics include:
– Static reporting: Annual reports, quarterly
reports etc.
– Dynamic reporting: Business intelligence,
ability to choose columns and rows and
reorganize data into a format that makes
sense to user
– Simple analysis: sums, filtering, pivot tables,
max and min values, averages etc.
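The simple analysis operations listed above can be illustrated on a handful of rows. The sales figures below are made up, and plain Python is used here only to keep the sketch self-contained; in practice a spreadsheet or BI tool would do the same job.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records (tabular data: one dict per row).
sales = [
    {"region": "North", "quarter": "Q1", "amount": 120},
    {"region": "North", "quarter": "Q2", "amount": 150},
    {"region": "South", "quarter": "Q1", "amount": 90},
    {"region": "South", "quarter": "Q2", "amount": 110},
]

amounts = [row["amount"] for row in sales]
total = sum(amounts)                            # sum
largest, smallest = max(amounts), min(amounts)  # max and min values
average = mean(amounts)                         # average
north_only = [row for row in sales if row["region"] == "North"]  # filtering

# Pivot table: regions as rows, quarters as columns, summed amounts as cells.
pivot = defaultdict(lambda: defaultdict(int))
for row in sales:
    pivot[row["region"]][row["quarter"]] += row["amount"]

print(total, largest, smallest, average)
print({region: dict(cells) for region, cells in pivot.items()})
```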
30. Visual analytics
• To explore and understand data by visualizing
• Most people have an easier time understanding a
chart than t-values or large numeric matrices
• Can range from „traditional“ bar charts or lines to
word clouds (highlights most used words by
making them bigger), heatmaps, placing items on
geographic maps, use of treemaps, bubble
diagrams etc.
• Visual analytics is (like all statistics really) a
combination of art and science. It is difficult to tell
a good story with one picture, but it is a very powerful
tool if you succeed!
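A word cloud is driven by simple word counts: the more often a word occurs, the bigger it is drawn. The sketch below computes only those counts. The stop-word list and the function name `word_frequencies` are illustrative assumptions, and drawing the cloud itself is left to a visualization tool.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; real tools ship much longer ones.
STOP_WORDS = frozenset({"the", "a", "of", "and", "to", "is", "in"})


def word_frequencies(text: str, top_n: int = 5) -> list:
    """Return the top_n most frequent words, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


print(word_frequencies("Open data is open. Data is big data.", top_n=2))
```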
31. Helps us understand
Source: SAS http://www.sas.com/en_nz/software/business-intelligence/visual-analytics.html
32. Basic analytics
• Use of a bit more advanced statistics using Excel
or Excel add-ins or tools such as SAS or SPSS.
• Correlational/regression analysis.
– Could be used to see if there are any interesting
correlations or explanations which can add to
company decision making
– Can be used for forecasting, for example if there is a
great weather forecast (external data), sales of
ice cream are predicted to go up 15% => stock up on
ice cream
– Be aware of the uncertainty in such models. This is
not the truth!
– Be aware of spurious correlations:
http://tylervigen.com/spurious-correlations
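The ice cream forecast can be sketched with an ordinary least-squares line fitted by hand. The temperature and sales numbers are invented for illustration, and `fit_line` and `predict` are hypothetical helper names. The output is a point estimate with no uncertainty attached, which is exactly the caveat raised above.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Point forecast from the fitted line."""
    return slope * x + intercept


# Invented history: daily high temperature (C) and ice creams sold.
temperatures = [14, 17, 20, 23, 26, 29]
ice_cream_sales = [95, 120, 138, 160, 185, 205]

slope, intercept = fit_line(temperatures, ice_cream_sales)
forecast = predict(slope, intercept, 30)  # external weather forecast: 30 C
print(f"Expected sales at 30 C: {forecast:.0f} units")
```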
33. Basic analytics
• Time series analysis
– Used to predict the future
– Makes use of historical data and looks for
trends in the data
– Seasonal changes, growth etc.
– Use of statistical methods like moving
averages to figure out long term trends
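A simple moving average, one of the methods mentioned above, might look like the sketch below. The monthly figures and the window length of 3 are assumptions chosen for illustration.

```python
def moving_average(values: list, window: int) -> list:
    """Average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Invented monthly sales with some noise around an upward trend.
monthly_sales = [100, 120, 90, 130, 150, 110, 160, 170]
print(moving_average(monthly_sales, window=3))
```

Averaging over a window smooths out short-term swings, so the long-term trend is easier to see. A longer window smooths more but reacts more slowly to real changes.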
34. Advanced analytics
• There are many more, less commonly used statistical
methods for quantitative data (numbers)
– Dimension reduction: Search for any natural
clusters in the measurements (columns) that
help us identify composite variables
– Cluster analysis: Search for any natural clusters
in the data (rows). For instance, marketers like
to cluster the general population of consumers
into market segments with different buying
behaviors
– Social network analysis: Clusters and
relationships. Which groups on Facebook are
likely to connect?
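Cluster analysis of rows can be illustrated with k-means, one common clustering algorithm among several; the slide does not prescribe a particular method. The customer data is invented, and seeding the centroids with the first k points is a simplification chosen to keep the sketch deterministic.

```python
import math


def kmeans(points: list, k: int, iterations: int = 100) -> tuple:
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Invented customers: (store visits per month, average basket in 100 DKK).
customers = [(1, 1), (1, 2), (9, 9), (8, 9), (2, 1), (9, 8)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Each label is a segment number. Interpreting what a segment means (for example "frequent big spenders") remains a human task.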
35. Advanced analytics
• Structural equation modelling (SEM):
• Simultaneously estimate multiple equations
(multivariate)
• Estimate variables and paths (relationships)
• Can be based on covariance (CB-SEM) or multiple
regression (PLS-SEM)
• Confirmatory factor analysis based on similar technique
but without estimating the paths
Variable (typically based on more than one measure to reduce risk of measurement bias)
Path – to understand nature of relationship
38. Artificial intelligence
• Neural network analysis:
– A computer program that is modeled after the human
brain and can identify patterns in a way similar to how
we do
– This technique is
particularly useful if you
have a large amount of
data, which can reveal
subtle patterns you haven’t
found or modelled ex ante
39. Machine learning
• Machine learning can use many different
algorithms
– Machine learning can use supervised, semi-
supervised or unsupervised learning
processes
– Despite the fancy connotation, some
machine learning algorithms are not that
complex
– Of course they can also be very complex
(Google's self-driving car, IBM Watson's
Jeopardy!-playing system)
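To show that a supervised learning algorithm need not be complex, here is a minimal nearest-neighbour classifier, picked as one simple example. The labelled training points and the function name `nearest_neighbour` are hypothetical.

```python
import math


def nearest_neighbour(train: list, query: tuple) -> str:
    """Return the label of the training example closest to `query`."""
    if not train:
        raise ValueError("training data must not be empty")
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label


# Invented labelled examples: (features, label). Supervised learning needs labels.
train = [
    ((1.0, 1.0), "low spender"),
    ((1.5, 2.0), "low spender"),
    ((8.0, 9.0), "high spender"),
    ((9.0, 8.5), "high spender"),
]
print(nearest_neighbour(train, (2.0, 1.5)))
```

The whole "model" is the stored training data plus a distance function. An unsupervised method would instead have to find groups without being given any labels.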
41. Data mining
• Data mining:
– A process of extracting value from large quantities
of unstructured data, including text, images, voice
and video. Includes pattern recognition, tagging
and annotation
– Data mining can really increase the value of the
data
• Sentiment analysis:
– Seeks to extract subjective opinion or sentiment
from text, video or audio data
– The basic aim is to determine the attitude of an
individual or group regarding a particular topic or
overall context
– Used to understand stakeholder opinion
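A very naive form of sentiment analysis for text counts positive and negative words. The word lists below are tiny, made-up lexicons, and the approach ignores negation, sarcasm and context, so it is a sketch of the idea rather than a usable classifier.

```python
import re

# Tiny, assumed lexicons; real sentiment lexicons hold thousands of entries.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "angry"}


def sentiment_score(text: str) -> int:
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def sentiment_label(text: str) -> str:
    """Collapse the score into positive, negative or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in ("I love this great bank", "Terrible service, bad app"):
    print(post, "->", sentiment_label(post))
```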
42. Text analysis
Source: Zimmerman, C., Stein, M. K., Hardt, D., & Vatrapu, R. (2015). Emergence of Things Felt.
In Proceedings of the Thirty Sixth International Conference on Information Systems. ICIS 2015
Own analysis based on Twitter data, query all tweets including "open data"
OR opengovdata OR opengov in March 2012/13/14, total of 100k rows
44. Data storage (NoSQL)
• Hadoop: open-source software
framework for distributed
storage of very large datasets
on computer clusters
• Cloudera: An enterprise solution to help
businesses manage their Hadoop ecosystem
• MongoDB: It’s good for managing data that
changes frequently or data that is unstructured
or semi-structured
• Apache Cassandra: Data replication, scalability
and performance
45. Middleware:
Data integration and management
• Talend: Master Data Management (MDM) offering,
which combines real-time data, applications, and process
integration with embedded data quality and stewardship
• Pentaho: A comprehensive data integration and business
analytics platform, incl. embedded analysis
• Splunk: Monitor, search and analyze massive streams of
machine data
• InfoSphere Master Data Management: Helps link
unstructured content from external sources to the
golden record for an enhanced 360-degree view
46. (Visual) Analytics
• Many of the middleware solutions reach into this space and vice
versa – most of these tools have data integration possibilities and
the others offer some analytics
• Tableau: Has focused on integration to various
data-sources (incl. Hadoop) and easy
visualization of data – very easy to use
• Qlik: Very robust, offering options to create
very nice dashboards, but has a somewhat steeper
learning curve
• IBM Watson Analytics: Can use natural
language to ask questions that are
„translated“ into a query
47. Advanced Analytics
• SPSS – tabular data only and narrow
capabilities but relatively easy to use
• SAS – dynamic (a programming language)
and a lot of options for analysis
• Matlab – a lot of flexibility for doing own
programming
• R (open source) – can apply packages but
you still have to do a lot of manual labour
(code). Many options
• For structural equation modelling: specific
packages such as SmartPLS or AMOS
49. Economics of data
• I use the economist's approach and view
data (of any size) as a resource
• Specific features of digital data
– Low marginal costs – easy to distribute and
reuse
– Can be used for many different things
– Value mostly from downstream activities
From an economic perspective, it
makes sense to reuse data as much
as possible
50. Data accounting
• Data as a resource
– What data do we have?
– Where do they originate from?
– Where are they stored, and who is responsible?
– Are they sensitive or can they be opened for reuse?
– Are they streaming or static?
– Are they mission critical or less important?
– Are we using them optimally?
– Do we have the right skills?
– Do we have the right tools?
• We know about our human resources, machines,
buildings, cars, production parts... Now we must
have the same knowledge about data
51. ...capitalizing on the benefits of digitization needs to
be a strategic imperative
Value generation
52. Business strategy
• Consider the competitive advantage
offered by your own data
• Consider the potential value from using other
available external data
• Consider costs and benefits
• Consider what other companies are doing (not
necessarily in the same industry)
Sometimes it makes sense to reuse data internally
Sometimes it makes sense to fetch and use external
data
Sometimes it makes sense to share own data
54. Model with 2-sided markets
@HildaJetzek
[Figure: model diagram; recoverable labels follow]
Societal level structures: Openness of data; Digital leadership of government; Cost of high-speed networks; Effectiveness of data and privacy protection frameworks; Ease of reaching a skilled workforce
Structure groupings: Soft infrastructure; Basic requirements; Resource; Motivation; Opportunity; Ability
Intermediaries (MSPs): Paying side (buying and selling goods and services); Non-paying side (sharing relevant content)
Information sharing + market mechanisms = Synergy
Societal level impact: Sustainable value
MSPs = Multi Sided Platforms
56. Examples of use
• Better understand and target customers
• Understand and optimize business
processes
• Improving health
• Smart cities
• Improving sports performance
62. Smart cities
• Improve security and save money
• Analyze traffic -> implement automatic
traffic controls
63. Improving sports performance
Prozone analyses over 750,000 data points per game, 300 GPS data points
per training session and 110 data points per injury to create the most
comprehensive injury database in sport. By analyzing over 36 million data
points per club/per season we aim to identify the subtle patterns in a player’s
performance that predispose them to increased relative risk of injury
allowing clubs to act to prevent injury.