Thorhildur Jetzek, Ph.D.
Postdoctoral Fellow
Department of IT management, CBS
• Matriculation exam (stúdent), Physics track I, Menntaskólinn í Reykjavík (MR) 1991
• B.Sc. in Economics 1994
• M.Sc. in Economics 1998
• Ph.D. in Information Technology Management 2015
• Have worked as an economist, IT consultant, assistant
professor, project manager, program manager, director,
industrial PhD and now postdoctoral researcher.
• Have always focused on use of technology
Who am I?
Traditional career
My career
@Thorhildur Jetzek CBS 2|
• High ranking
– 2nd in Europe (behind LSE) & 22nd worldwide
• Focus on collaboration with industry
– Industrial PhD (my PhD contract was at KMD)
– Engaged scholarship & collaborative research
(current project of mine sponsored by industry)
– Crowdsourcing events:
• Student competition where CBS students got access to
anonymized data on 100,000 customers of Danske Bank
and socio-economic data from KMD as well as data from
Danske Bank's public Facebook wall
• Cash prizes (DKK 75,000 first prize)
Societal context
Five societal megatrends
We are in the eye of the storm….
Rise of Digitization
• An average decline of almost 40% a year in the cost per gigabyte of consumer hard disk drives since 1998 (OECD, 2013).
• A 38% yearly decrease in the cost of shifting one bit per second since 1995 (OECD, 2013).
• More than 30 million interconnected sensors are now deployed worldwide, in areas such as security, health care, transport systems or energy control systems, and their numbers are growing by around 30% a year (McKinsey, 2011).
• 6 billion people have cellphones
• 30 billion pieces of content are shared on Facebook every month
• 2002: The year when the amount of information stored digitally surpassed non-digital information!
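To get a feel for what a steady yearly decline of about 40% means, the compounding can be worked through in a few lines. This is only a sketch: the starting cost of 100 and the chosen horizons are illustrative assumptions, not figures from the OECD report.

```python
def cost_after(start_cost: float, yearly_decline: float, years: int) -> float:
    """Cost left after compounding a constant yearly percentage decline."""
    return start_cost * (1.0 - yearly_decline) ** years


# Hypothetical starting cost of 100 (any currency unit) per gigabyte.
for years in (1, 5, 10):
    print(years, round(cost_after(100.0, 0.40, years), 2))
```

At this rate the cost falls below 1% of its starting level within ten years, which is why storing "everything" has become economically feasible.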
Changes...
Forbes highlights
• IT in the boardroom: Digital strategies
• Changing business models - platforms
• Big data and analytics
• Lacking skills: EU estimates a 160% increase in demand for
Big Data specialists between 2013 and 2020, to 346,000 new
jobs
IDC predicts
• Market for big data analysis services over $16 billion in 2014,
growing six times faster than the entire IT industry
• Cloud-based big data and analytics will grow three times faster
than spending for on premise solutions in 2015
Global open access
FLOSS – Free/Open Source
Software
“…people's pursuit of visible
carrots is at times interrupted by
the larger quest for the invisible
gold at the end of the rainbow.”
(von Krogh et al., 2012a, p. 671)
• Collaborative projects: Wikipedia, Human Genome Project,
Open.Nasa.gov
• Open NGO data: http://data.worldbank.org/ (and a multitude of
similar sites)
• Open Government Data: http://data.gov / http://data.gov.uk
(and 300 others)
• Open company data (open APIs): Facebook, Twitter, LinkedIn
• Platforms: CouchSurfing.com; NeighbourGoods.net
What is BIG data?
• The jury is still out
– Davenport: New technologies – software and
infrastructure plus the data itself
– Forrester defines Big Data as “techniques and
technologies that make handling data at extreme
scale affordable”
– McKinsey (2011): “Big data” refers to datasets
whose size is beyond the ability of typical
database software tools to capture, store,
manage, and analyze.
Classification of
big data
Dimensions of big data: 4 V's
Source: IBM, http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Utilization of data
Source: @PetteriA: http://www.slideshare.net/petterialahuhta/alahuhta-big-dataandanalytics24sep2014
Terminology
@HildaJetzek
Social data
Open data
Master data
• Structured – kept in relational databases
• Purposefully entered into systems
Big data
Small data
Machine data
Liquid open data
Liquidity – reflects ability to link and stream data across systems
Openness – reflects ability to use data outside of organizational boundaries
Axes: liquid data vs. illiquid data; closed data vs. open data
• Illiquid (siloed) closed data: Data are stored where they originate and not reused. Example: most data within organizations
• Liquid closed data: Data are effectively reused across a variety of systems within a single organization. Example: internally shared data
• Illiquid (siloed) open data: Data are used outside of organizational boundaries but offer limited potential for automation or coupling of data. Example: many open government data initiatives
• Liquid open data: Data are used outside of organizational boundaries and easily coupled with other data and integrated across systems. Example: combining internal and external data for improved insights
How do we identify openness?
Dimension | Affordance | Explanation
Openness
Strategic | Availability | Data are open to all by default
Economic | Affordability | Data are free or charged for at most at the marginal cost of reproduction
Legal | Reusability | Data are published with open licenses
Liquidity
Conceptual | Interoperability | Semantics and syntax are clear, data models and metadata are published, standard identifiers are used
Technical | Usability | Data are of high quality, published in machine-readable and standard formats, using contextual metadata
 | Discoverability | Data are easily found through central portals or published with searchable metadata or using linked data semantics
 | Accessibility | Data are easily downloadable or "queryable" through APIs
Binary or continuous?
• Data are not just open or closed, or liquid
or illiquid – a continuous range
• Classification useful for strategy purposes
– A part of an organization’s data need to be
liquid across the company (customer master)
– Other data could be open but illiquid (financial
statement)
– Some data are liquid and open (genomics data,
geospatial data)
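The idea that openness and liquidity form a continuous range can be sketched as a simple scoring exercise. In the sketch below, the seven affordances are taken from the slides, while the 0 to 1 scoring scale, the equal weighting, the 0.5 cut-off and the example scores are all illustrative assumptions rather than part of the original framework.

```python
from dataclasses import dataclass


@dataclass
class DataAssessment:
    # Openness affordances, each scored from 0.0 (absent) to 1.0 (fully met).
    availability: float
    affordability: float
    reusability: float
    # Liquidity affordances, scored the same way.
    interoperability: float
    usability: float
    discoverability: float
    accessibility: float

    def openness(self) -> float:
        """Unweighted mean of the three openness scores (an assumption)."""
        return (self.availability + self.affordability + self.reusability) / 3

    def liquidity(self) -> float:
        """Unweighted mean of the four liquidity scores (an assumption)."""
        return (self.interoperability + self.usability
                + self.discoverability + self.accessibility) / 4

    def quadrant(self, threshold: float = 0.5) -> str:
        """Map the two continuous scores onto the four-quadrant labels."""
        liquid = "liquid" if self.liquidity() >= threshold else "illiquid"
        open_ = "open" if self.openness() >= threshold else "closed"
        return f"{liquid} {open_} data"


# Hypothetical scores for two of the examples mentioned above.
customer_master = DataAssessment(0.0, 0.0, 0.0, 0.9, 0.8, 0.7, 0.9)
financial_statement = DataAssessment(1.0, 1.0, 0.8, 0.2, 0.3, 0.5, 0.4)

print(customer_master.quadrant())
print(financial_statement.quadrant())
```

The continuous scores are the useful part for strategy discussions; the quadrant label is only a convenient summary.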
Highlights
• Why do we have so much data?
• What are the underlying societal changes
we need to be aware of?
• Why has openness become so popular?
• Does it make sense to make more use of
data, even if it is expensive to re-think
how we handle data in the company?
Types of big data
Machine-generated big data
• Sensors/IoT devices
– From car navigation systems, smart meters,
unmanned security systems, sensors etc.
People-generated big data
• Social data
– Sources: Social media websites, blog sites, product reviews, search results
– Unstructured, natural language
• Data from mobile phones
– Most commonly geolocation
– Used, for example, to analyze traffic or movement of people, or to do geo-tagging
Source: Waze https://www.waze.com/
Measurement data
• Nature/Environment
– Sources: Measurements, such as meteorological,
atmospheric and pollution Big Data
• Geospatial data
• Lifeforms
– Sources: Genetic sequencing, patient databases
Source: @Vishy Iyer, UT, "Cracking the Genetic Code of Brain Tumors", https://news.utexas.edu/2012/09/28/cracking-the-genetic-code-of-brain-tumors
Structure
Structure of data
• How do we define structured data?
– Very often referred to as data in structured relational
databases
– Known data model, identities and tabular formats
(columns and rows)
– Still, a lot of (big) data analytics tools/packages want
tabular formats
• R uses data frames
• Tableau wants a tabular format
• SAS/SPSS use a tabular format
ARghhh
Semi-structured data
• Typically data such as XML or JSON
– Nested, not tabular but a known
structure all the same
– Could be applied to
text-files such as logs
– Can be transformed into
a tabular structure (with
many empty cells)
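The transformation from nested to tabular form can be sketched in a few lines of Python. The records, field names and dotted column naming below are hypothetical and chosen only for illustration; nested lists are not handled in this sketch.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested dictionaries into one flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical semi-structured records: not every record has every field.
raw = """[
  {"id": 1, "user": {"name": "Anna", "city": "Reykjavik"}},
  {"id": 2, "user": {"name": "Jon"}, "device": {"os": "Android"}}
]"""

rows = [flatten(record) for record in json.loads(raw)]
# The table needs the union of all columns seen in any record.
columns = sorted({column for row in rows for column in row})
# Fields that a record lacks become empty cells.
table = [[row.get(column, "") for column in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

Each record fills only some of the columns, which is where the many empty cells come from.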
Is there structure?
Unstructured files
• Photos and graphic images
• Videos
• PDF files
• PowerPoint presentations
• Emails
• Blog entries
• Wikis
• Word processing documents
Analytics
Standard analytics
• Data analytics can take many different
forms
• Common forms of data analytics include:
– Static reporting: Annual reports, quarterly
reports etc.
– Dynamic reporting: Business intelligence,
ability to choose columns and rows and
reorganize data into a format that makes
sense to the user
– Simple analysis: sums, filtering, pivot tables,
max and min values, averages etc.
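The simple analyses listed above can be illustrated with plain Python. The sales records below are made up for the example; in practice the same operations would typically be done in a spreadsheet or a BI tool.

```python
from collections import defaultdict

# Hypothetical sales records: (region, quarter, amount).
sales = [
    ("North", "Q1", 120), ("North", "Q2", 150),
    ("South", "Q1", 90), ("South", "Q2", 110),
]

total = sum(amount for _, _, amount in sales)          # sum
average = total / len(sales)                           # average
largest = max(sales, key=lambda record: record[2])     # max value
north_only = [r for r in sales if r[0] == "North"]     # filtering

# Pivot table: regions as rows, quarters as columns, summed amounts as cells.
pivot = defaultdict(dict)
for region, quarter, amount in sales:
    pivot[region][quarter] = pivot[region].get(quarter, 0) + amount

print(total, average, largest)
print(dict(pivot))
```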
Visual analytics
• To explore and understand data by visualizing
• Most people have an easier time understanding a
chart than t-values or large numeric matrices
• Can range from „traditional“ bar charts or lines to
word clouds (highlights most used words by
making them bigger), heatmaps, placing items on
geographic maps, use of treemaps, bubble
diagrams etc.
• Visual analytics is (like all statistics really) a
combination of art and science. It is difficult to tell
a good story with one picture, but it is a very powerful
tool if you succeed!
Helps us understand
Source: SAS http://www.sas.com/en_nz/software/business-intelligence/visual-analytics.html
Basic analytics
• Use of a bit more advanced statistics using Excel
or Excel add-ins or tools such as SAS or SPSS.
• Correlational/regression analysis.
– Could be used to see if there are any interesting
correlations or explanations which can add to
company decision making
– Can be used for forecasting, for example if there is a
great weather forecast (external data), sales of
ice cream are predicted to go up 15% => stock up on
ice cream
– Be aware of the uncertainty in such models. This is
not the truth!
– Be aware of spurious correlations:
http://tylervigen.com/spurious-correlations
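As a rough illustration of the ice cream example, a straight line can be fitted with ordinary least squares and used for a forecast. The temperature and sales figures are invented for the sketch, and a real analysis would also report the uncertainty around the estimate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable: y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: daily temperature (°C) and ice cream units sold.
temperature = [10, 14, 18, 22, 26]
units_sold = [200, 240, 290, 330, 390]

slope, intercept = fit_line(temperature, units_sold)

# External data: the weather forecast says 24 °C tomorrow.
forecast = intercept + slope * 24
print(round(slope, 2), round(intercept, 2), round(forecast, 1))
```

The forecast is a point estimate from five observations; it says nothing about causes and can be badly wrong outside the observed temperature range.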
Basic analytics
• Time series analysis
– Used to predict the future
– Makes use of historical data and looks for
trends in the data
– Seasonal changes, growth etc.
– Use of statistical methods like moving
averages to figure out long term trends
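A simple moving average, one of the methods mentioned above, can be sketched as follows. The monthly sales series and the window length of three are assumptions made for the example.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive observations."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical monthly sales with short-term ups and downs.
monthly_sales = [10, 12, 11, 15, 14, 18, 17, 21]
trend = moving_average(monthly_sales, window=3)
print([round(value, 2) for value in trend])
```

The smoothed series rises steadily even though the raw series dips in some months, which is what makes the long-term trend easier to see.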
Advanced analytics
• There are many more, less commonly used statistical
methods for quantitative data (numbers)
– Dimension reduction: Search for any natural
clusters in the measurements (columns) that
help us identify composite variables
– Cluster analysis: Search for any natural clusters
in the data (rows). For instance, marketers like
to cluster the general population of consumers
into market segments with different buying
behaviors
– Social network analysis: Clusters and
relationships. Which groups on Facebook are
likely to connect?
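Cluster analysis on the rows can be illustrated with k-means, one common clustering algorithm (the slides do not name a specific method). The customer data and the choice of two starting centers are assumptions for this sketch.

```python
def k_means(points, centers, iterations=10):
    """Plain k-means on 2-D points, starting from the given centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for cluster, old_center in zip(clusters, centers):
            if cluster:
                new_centers.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centers.append(old_center)
        if new_centers == centers:
            break  # assignments are stable, so stop early
        centers = new_centers
    return centers, clusters


# Hypothetical customers: (store visits per month, average basket value).
customers = [(1, 20), (2, 25), (1, 30), (8, 90), (9, 100), (10, 95)]
centers, segments = k_means(customers, centers=[customers[0], customers[3]])
print(centers)
print(segments)
```

The two resulting groups could be read as "occasional, small basket" and "frequent, large basket" segments; in real data the number of clusters and the starting centers need more care.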
Advanced analytics
• Structural equation modelling (SEM):
• Simultaneously estimate multiple equations
(multivariate)
• Estimate variables and paths (relationships)
• Can be based on covariance (CB-SEM) or multiple
regression (PLS-SEM)
• Confirmatory factor analysis based on similar technique
but without estimating the paths
Variable (typically
based on more
than one measure
to reduce risk of
measurement bias)
Path – to understand nature
of relationship
A combination
Different types of use
Artificial intelligence
• Neural network analysis:
– A computer program modeled after the human
brain that can identify patterns in a way similar to how
we do
– This technique is
particularly useful if you
have a large amount of
data, which can reveal
subtle patterns you haven’t
found or modelled ex ante
Machine learning
• Machine learning can use many different
algorithms
– Machine learning can use supervised, semi-
supervised or unsupervised learning
processes
– Despite the fancy connotation, some
machine learning algorithms are not that
complex
– Of course they can also be very complex
(Google's self-driving car, IBM Watson's
Jeopardy!-playing system)
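To show that a supervised learning algorithm need not be complex, here is a sketch of k-nearest neighbours, chosen as an example (the slides do not name a specific algorithm). The labelled customers and the value k = 3 are assumptions for illustration.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled examples."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (age, purchases per year) -> segment.
train = [
    ((22, 2), "occasional"), ((25, 3), "occasional"), ((30, 4), "occasional"),
    ((41, 20), "frequent"), ((45, 25), "frequent"), ((50, 22), "frequent"),
]

print(knn_predict(train, (43, 21)))
print(knn_predict(train, (24, 3)))
```

The "learning" here is nothing more than storing labelled examples and comparing new cases to them, which is why the algorithm counts as supervised.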
Recommendation algorithm
Source: http://www.datasciencecentral.com/profiles/blogs/collaborative-filtering-tutorials-across-languages
Data mining
• Data mining:
– A process of extracting value from large quantities
of unstructured data, including text, images, voice
and video. Includes pattern recognition, tagging
and annotation
– Data mining can really increase the value of the
data
• Sentiment analysis:
– Seeks to extract subjective opinion or sentiment
from text, video or audio data
– The basic aim is to determine the attitude of an
individual or group regarding a particular topic or
overall context
– Used to understand stakeholder opinion
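A very simple form of sentiment analysis counts positive and negative words in a text. The sketch below uses that lexicon-based approach with two tiny, made-up word lists; it ignores negation, irony and context, so it should be read as an illustration of the idea rather than a usable classifier.

```python
import re

# Hypothetical mini-lexicons; real lexicons contain thousands of scored words.
POSITIVE = {"good", "great", "love", "useful", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "useless", "terrible"}


def sentiment_score(text: str) -> int:
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_label(text: str) -> str:
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in (
    "Open data is great and really useful",
    "The portal is terrible, a useless mess",
    "The dataset was published in March",
):
    print(sentiment_label(message), "-", message)
```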
Text analysis
Source: Zimmerman, C., Stein, M. K., Hardt, D., & Vatrapu, R. (2015). Emergence of Things Felt.
In Proceedings of the Thirty Sixth International Conference on Information Systems. ICIS 2015
Own analysis based on Twitter data, query: all tweets including "open data"
OR opengovdata OR opengov in March 2012, 2013 and 2014, total of 100k rows
An example
of solutions
Data storage (NoSQL)
• Hadoop: open-source software
framework for distributed
storage of very large datasets
on computer clusters
• Cloudera: An enterprise solution to help
businesses manage their Hadoop ecosystem
• MongoDB: It’s good for managing data that
changes frequently or data that is unstructured
or semi-structured
• Apache Cassandra: Data replication, scalability
and performance
Middleware:
Data integration and management
• Talend: Master Data Management (MDM) offering,
which combines real-time data, applications, and process
integration with embedded data quality and stewardship
• Pentaho: A comprehensive data integration and business
analytics platform, incl. embedded analysis
• Splunk: Monitor, search and analyze massive streams of
machine data
• InfoSphere Master Data Management: Helps link
unstructured content from external sources to the
golden record for an enhanced 360-degree view
(Visual) Analytics
• Many of the middleware solutions reach into this space and vice
versa: most of these tools offer data integration, and
the middleware solutions offer some analytics
• Tableau: Has focused on integration to various
data-sources (incl. Hadoop) and easy
visualization of data – very easy to use
• Qlik: Very robust, offering options to create
very nice dashboards, but has a somewhat steeper
learning curve
• IBM Watson Analytics: Can use natural
language to ask questions that are
„translated“ into a query
Advanced Analytics
• SPSS – tabular data only and narrow
capabilities but relatively easy to use
• SAS – dynamic (a programming language)
and a lot of options for analysis
• Matlab – a lot of flexibility for doing own
programming
• R (open source) – can apply packages but
you still have to do a lot of manual labour
(code). Many options
• For structural equation modelling: specific
packages such as SmartPLS or AMOS
Value
generation
Economics of data
• I use the economist's approach and view
data (of any size) as a resource
• Specific features of digital data
– Low marginal costs – easy to distribute and
reuse
– Can be used for many different things
– Value mostly from downstream activities
From an economic perspective, it
makes sense to reuse data as much
as possible
Data accounting
• Data as a resource
– What data do we have?
– Where do they originate from?
– Where are they stored, who is responsible?
– Are they sensitive or can they be opened for reuse?
– Are they streaming or static?
– Are they mission critical or less important?
– Are we using them optimally?
– Do we have the right skills?
– Do we have the right tools?
• We know about our human resources, machines,
buildings, cars, production parts... Now we must
have the same knowledge about data
...capitalizing on the benefits of digitization needs to
be a strategic imperative
Value generation
Business strategy
• Consider the competitive advantage
offered by your own data
• Consider the potential value from using other
available external data
• Consider costs and benefits
• Consider what other companies are doing (not
necessarily in the same industry)
Sometimes it makes sense to reuse data internally
Sometimes it makes sense to fetch and use external
data
Sometimes it makes sense to share own data
Value generation mechanisms
Exploitation:
Good governance
Exploration:
Driving change
Economic:
Market mechanisms
Social:
Information sharing mechanisms
∆ Transparency
∆ Civic engagement
∆ Efficiency
∆ Innovation
Model with 2-sided markets
Soft infrastructure
Sustainable
value
Paying side
Buying and selling
goods and services
Non-paying side
Sharing relevant
content
Cost of high-
speed networks
Openness of data
Societal level impact
(MSPs)
Intermediaries
Information
sharing +
market
mechanisms
= Synergy
Effectiveness of data
and privacy protection
frameworks
Ease of reaching a
skilled workforce
Motivation
Ability
Basic requirements
Resource
Digital leadership of
government
Opportunity
Societal level structures
MSPs = Multi Sided Platforms
Use cases
Examples of use
• Better understand and target customers
• Understand and optimize business
processes
• Improving health
• Smart cities
• Improving sports performance
Customer dashboard I
Source: IBM
Customer Dashboard II
Source: IBM, PRNewswire: https://photos.prnewswire.com/prnvar/20150528/219085?max=1600
Increase efficiency
Source: QuantumBlack http://www.quantumblack.com/
Improving health
Health dashboards
Source: Headsuphealth http://headsuphealth.com/
Smart cities
• Improve security and save money
• Analyze traffic -> implement automatic
traffic controls
Improving sports performance
Prozone analyses over 750,000 data points per game, 300 GPS data points
per training session and 110 data points per injury to create the most
comprehensive injury database in sport. By analyzing over 36 million data
points per club per season we aim to identify the subtle patterns in a player’s
performance that predispose them to increased relative risk of injury,
allowing clubs to act to prevent injury.
THANK YOU!

Big data presentation for University of Reykjavik, Iceland, March 22

 
Malegaon Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort Service
Malegaon Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort ServiceMalegaon Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort Service
Malegaon Call Girls Service ☎ ️82500–77686 ☎️ Enjoy 24/7 Escort Service
 
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
 
Falcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business GrowthFalcon Invoice Discounting: Empowering Your Business Growth
Falcon Invoice Discounting: Empowering Your Business Growth
 
Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000
Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000
Call Girls In Majnu Ka Tilla 959961~3876 Shot 2000 Night 8000
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
BAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
BAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRLBAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
BAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
 
PHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation Final
 
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service NoidaCall Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
Phases of Negotiation .pptx
 Phases of Negotiation .pptx Phases of Negotiation .pptx
Phases of Negotiation .pptx
 

Big data presentation for University of Reykjavik, Iceland, March 22

  • 1. Thorhildur Jetzek, Ph.D. Postdoctoral Fellow Department of IT management, CBS
  • 2. • Matriculation exam (stúdent) from Physics Track I at MR 1991 • B.Sc. in Economics 1994 • M.Sc. in Economics 1998 • Ph.D. in Information Technology Management 2015 • Have worked as an economist, IT consultant, assistant professor, project manager, program manager, director, industrial PhD and now postdoctoral researcher. • Have always focused on the use of technology Who am I? Traditional career My career @Thorhildur Jetzek CBS 2|
  • 3. • High ranking – 2nd in Europe (behind LSE) & 22nd worldwide • Focus on collaboration with industry – Industrial PhD (my PhD contract was at KMD) – Engaged scholarship & collaborative research (current project of mine sponsored by industry) – Crowdsourcing events: • Student competition where CBS students got access to anonymized data on 100.000 customers of Danske Bank and socio-economic data from KMD as well as data from Danske Bank's public Facebook wall • Cash prizes (DKK 75.000 1st prize)
  • 5. Five societal megatrends We are in the eye of the storm…
  • 6. Rise of Digitization An average decline of almost 40% a year in the cost per gigabyte of consumer hard disk drives from 1998 (OECD, 2013). A 38% yearly decrease in the cost of shifting one bit per second since 1995 (OECD, 2013). More than 30 million interconnected sensors are now deployed worldwide, in areas such as security, health care, transport systems or energy control systems, and their numbers are growing by around 30% a year (McKinsey, 2011). 6 billion people have cellphones. 30 billion pieces of content are shared on Facebook every month. 2002: The year when the amount of information stored digitally surpassed non-digital information!
  • 7. Changes... Forbes highlights: • IT in the boardroom: Digital strategies • Changing business models - platforms • Big data and analytics • Lacking skills: EU estimates a 160% increase in demand for Big Data specialists between 2013 and 2020, to 346,000 new jobs IDC predicts: • Market for big data analysis services over $16 billion in 2014, growing six times faster than the entire IT industry • Cloud-based big data and analytics will grow three times faster than spending for on-premises solutions in 2015
  • 8. Global open access FLOSS – Free/Open Source Software “…people's pursuit of visible carrots is at times interrupted by the larger quest for the invisible gold at the end of the rainbow.” (von Krogh et al., 2012a, p. 671) • Collaborative projects: Wikipedia, Human Genome Project, Open.Nasa.gov • Open NGO data: http://data.worldbank.org/ (and a multitude of similar sources) • Open Government Data: http://data.gov / http://data.gov.uk (and 300 others) • Open company data (open APIs): Facebook, Twitter, LinkedIn • Platforms: CouchSurfing.com; NeighbourGoods.net
  • 9. What is BIG data? • The jury is still out – Davenport: New technologies – software and infrastructure plus the data itself – Forrester defines Big Data as “techniques and technologies that make handling data at extreme scale affordable” – McKinsey (2011): “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
  • 11. Dimensions of big data: 4 V's Source: IBM, http://www.ibmbigdatahub.com/infographic/four-vs-big-data
  • 12. Utilization of data Source: @PetteriA: http://www.slideshare.net/petterialahuhta/alahuhta-big-dataandanalytics24sep2014
  • 13. Terminology @HildaJetzek Social data Open data Master data • Structured – kept in relational databases • Purposefully entered into systems Big data Small data Machine data
  • 14. Liquid open data Liquidity – reflects ability to link and stream data across systems. Openness – reflects ability to use data outside of organizational boundaries. Axes: liquid data vs. illiquid data; closed data vs. open data. • Liquid closed data: Data are effectively reused across a variety of systems within a single organization • Illiquid (silo'ed) closed data: Data are stored where they originate and not reused • Illiquid (silo'ed) open data: Data are used outside of organizational boundaries but offer limited potential for automation or coupling of data • Liquid open data: Data are used outside of organizational boundaries and easily coupled with other data and integrated across systems. Examples shown in the matrix: combining internal and external data for improved insights; internally shared data; most data within organizations; many open government data initiatives
  • 15. How do we identify openness? (Dimension – Affordance: Explanation) Openness: • Strategic – Availability: Data are open to all by default • Economic – Affordability: Data are free or charged at no more than the marginal cost of reproduction • Legal – Reusability: Data are published with open licenses Liquidity: • Conceptual – Interoperability: Semantics and syntax are clear, data models and metadata are published, use of standard identifiers • Technical – Usability: Data are of high quality, published in machine readable and standard formats, using contextual metadata • Technical – Discoverability: Data are easily found through central portals or published with searchable metadata or using linked data semantics • Technical – Accessibility: Data are easily downloadable or queryable through APIs
  • 16. Binary or continuous? • Data are not just open or closed, or liquid or illiquid – a continuous range • Classification useful for strategy purposes – Part of an organization's data needs to be liquid across the company (customer master) – Other data could be open but illiquid (financial statement) – Some data are liquid and open (genomics data, geospatial data)
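The idea of a continuous range that is still classified for strategy purposes can be sketched in code. This is a minimal illustration only: the 0 to 1 scoring scale, the 0.5 threshold, the `Dataset` and `quadrant` names and the scores given to the example datasets are all assumptions made here, not part of the framework in the slides.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    openness: float   # assumed scale: 0.0 = fully closed, 1.0 = fully open
    liquidity: float  # assumed scale: 0.0 = siloed, 1.0 = easily linked/streamed


def quadrant(ds, threshold=0.5):
    """Map continuous scores onto one of the four quadrants (threshold is arbitrary)."""
    open_label = "open" if ds.openness >= threshold else "closed"
    liquid_label = "liquid" if ds.liquidity >= threshold else "illiquid"
    return f"{liquid_label} {open_label} data"


# Hypothetical scores for the three examples mentioned on the slide.
examples = [
    Dataset("customer master", openness=0.1, liquidity=0.9),
    Dataset("financial statement", openness=0.9, liquidity=0.2),
    Dataset("geospatial data", openness=0.9, liquidity=0.8),
]

for ds in examples:
    print(f"{ds.name}: {quadrant(ds)}")
```

Keeping the scores continuous and applying the threshold only at reporting time means the same inventory can be re-cut with a stricter or looser definition of "open" or "liquid".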
  • 17. Highlights • Why do we have so much data? • What are the underlying societal changes we need to be aware of? • Why has openness become so popular? • Does it make sense to make more use of data, even if it is expensive to re-think how we handle data in the company?
  • 18. Types of big data
  • 19. Machine-generated big data • Sensors/IoT devices – From car navigation systems, smart meters, unmanned security systems, sensors etc.
  • 20. People-generated big data • Social data – Sources: Social media websites, blog sites, product reviews, search results – Unstructured, natural language • Data from mobile phones – Most commonly geolocation – for example used to analyze traffic or movement of people or to do geo-tagging Source: Waze https://www.waze.com/
  • 21. Measurement data • Nature/Environment – Sources: Measurements, such as meteorological, atmospheric and pollution data • Geospatial data • Lifeforms – Sources: Genetic sequencing, patient databases Source: @Vishy Iyer, UT, "Cracking the Genetic Code of Brain Tumors", https://news.utexas.edu/2012/09/28/cracking-the-genetic-code-of-brain-tumors
  • 23. Structure of data • How do we define structured data? – Very often referred to as data in structured relational databases – Known data model, identities and tabular formats (columns and rows) – Still, a lot of (big) data analytics tools/packages want tabular formats • R uses data frames • Tableau wants a tabular format • SAS/SPSS use a tabular format
  • 25. Semi-structured data • Typically data such as XML or JSON – Nested, not tabular but a known structure all the same – Could be applied to text files such as logs – Can be transformed into a tabular structure (with many empty cells)
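The last point, turning nested data into a table with many empty cells, can be sketched as follows. This is a minimal illustration using only the Python standard library; the sample records, the dotted column naming and the `flatten` / `to_table` helpers are assumptions made for the example.

```python
import json

# Hypothetical nested records: not every record carries every field.
RAW = """[
  {"id": 1, "user": {"name": "Anna", "city": "Reykjavik"}},
  {"id": 2, "user": {"name": "Bjorn"}, "tags": {"source": "web"}}
]"""


def flatten(record, prefix=""):
    """Turn a nested dict into a flat dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_table(records):
    """Return (columns, rows); a missing value becomes None, i.e. an empty cell."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({col for r in flat_records for col in r})
    rows = [[r.get(col) for col in columns] for r in flat_records]
    return columns, rows


columns, rows = to_table(json.loads(RAW))
print(columns)
for row in rows:
    print(row)
```

Each record fills only the columns it has, so the more the records differ in shape, the sparser the resulting table becomes.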
  • 27. Unstructured files • Photos and graphic images • Videos • PDF files • PowerPoint presentations • Emails • Blog entries • Wikis • Word processing documents
  • 29. Standard analytics • Data analytics can take many different forms • Common forms of data analytics include: – Static reporting: Annual reports, quarterly reports etc. – Dynamic reporting: Business intelligence, ability to choose columns and rows and reorganize data into a format that makes sense to the user – Simple analysis: sums, filtering, pivot tables, max and min values, averages etc.
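The "simple analysis" operations listed above can be sketched in a few lines. This is only an illustration with made-up sales figures; the `SALES` rows and the `pivot_sum` helper are assumptions for the example, and in practice a spreadsheet or BI tool would do the same job.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, quarter, amount).
SALES = [
    ("North", "Q1", 120),
    ("North", "Q2", 150),
    ("South", "Q1", 90),
    ("South", "Q2", 110),
    ("North", "Q1", 30),
]


def pivot_sum(rows):
    """Pivot table: regions as rows, quarters as columns, summed amounts as cells."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    return {region: dict(quarters) for region, quarters in table.items()}


pivot = pivot_sum(SALES)

amounts = [amount for _, _, amount in SALES]
summary = {
    "total": sum(amounts),
    "max": max(amounts),
    "min": min(amounts),
    "average": sum(amounts) / len(amounts),
}

# Filtering: keep only the rows for one region.
north_only = [row for row in SALES if row[0] == "North"]

print(pivot)
print(summary)
```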
  • 30. Visual analytics • To explore and understand data by visualizing • Most people have an easier time understanding a chart than t-values or large numeric matrices • Can range from "traditional" bar charts or lines to word clouds (highlights most used words by making them bigger), heatmaps, placing items on geographic maps, use of treemaps, bubble diagrams etc. • Visual analytics is (like all statistics really) a combination of art and science. It is difficult to tell a good story with one picture, but a very powerful tool if you succeed!
  • 31. Helps us understand Source: SAS http://www.sas.com/en_nz/software/business-intelligence/visual-analytics.html
  • 32. Basic analytics • Use of somewhat more advanced statistics using Excel or Excel add-ins or tools such as SAS or SPSS. • Correlational/regression analysis. – Could be used to see if there are any interesting correlations or explanations which can add to company decision making – Can be used for forecasting, for example if there is a great weather forecast (external data), sales of ice cream are predicted to go up 15% => stock up on ice cream – Be aware of the uncertainty in such models. This is not the truth! – Be aware of spurious correlations: http://tylervigen.com/spurious-correlations
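The ice cream forecasting idea can be sketched with a simple least-squares regression. This is a minimal illustration on invented numbers: the temperatures, sales figures and the `fit_line` helper are assumptions for the example, and the caveats above about uncertainty and spurious correlation apply to its output.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical history: daily high temperature (°C) and ice cream units sold.
temperature = [12, 15, 18, 21, 24]
sales = [110, 135, 170, 185, 220]

slope, intercept, r = fit_line(temperature, sales)

# Use an external weather forecast (assumed: 26 °C) to predict sales.
forecast = intercept + slope * 26

print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
print(f"predicted sales at 26 °C: {forecast:.0f}")
```

A high correlation on five data points says little on its own; a real forecast would need more history and an estimate of the prediction error.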
  • 33. Basic analytics • Time series analysis – Used to predict the future – Makes use of historical data and looks for trends in the data – Seasonal changes, growth etc. – Use of statistical methods like moving averages to figure out long term trends
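As a small illustration of the moving-average point, the sketch below smooths a made-up quarterly series so that the seasonal swing averages out and the underlying trend becomes visible. The data and the `moving_average` helper are assumptions for the example.

```python
def moving_average(values, window):
    """Simple (unweighted) moving average over a sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical quarterly sales: a strong seasonal pattern on top of slow growth.
quarterly_sales = [10, 20, 30, 40, 14, 24, 34, 44]

# A window of 4 quarters covers one full season, so the seasonality cancels out.
trend = moving_average(quarterly_sales, window=4)
print(trend)
```

Choosing the window to match the length of the season (here four quarters) is what removes the seasonal component; a shorter window would leave part of it in.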
  • 34. Advanced analytics • There are many more, less commonly used statistical methods for quantitative data (numbers) – Dimension reduction: Search for any natural clusters in the measurements (columns) that help us identify composite variables – Cluster analysis: Search for any natural clusters in the data (rows). For instance, marketers like to cluster the general population of consumers into market segments with different buying behaviors – Social network analysis: Clusters and relationships. Which groups on Facebook are likely to connect?
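Cluster analysis on rows can be illustrated with k-means, one common clustering algorithm (the slides do not name a specific method, so this choice is an assumption). The sketch below is a bare-bones version with hand-picked starting centroids; the customer data and the `k_means` helper are invented for the example.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Minimal k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (purchases per month, average basket size).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

# Starting centroids are picked by hand here; real tools usually randomize them.
labels, centroids = k_means(customers, [customers[0], customers[-1]])
print(labels)
print(centroids)
```

Each label is a candidate market segment; whether the segments mean anything for buying behavior still has to be judged by someone who knows the customers.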
  • 35. Advanced analytics • Structural equation modelling (SEM): • Simultaneously estimate multiple equations (multivariate) • Estimate variables and paths (relationships) • Can be based on covariance (CB-SEM) or multiple regression (PLS-SEM) • Confirmatory factor analysis based on similar technique but without estimating the paths Variable (typically based on more than one measure to reduce risk of measurement bias) Path – to understand nature of relationship
  • 37. Different types of use
  • 38. Artificial intelligence • Neural network analysis: – A computer program modeled after the human brain that can identify patterns in a way similar to how we do – This technique is particularly useful if you have a large amount of data, which can reveal subtle patterns you haven’t found or modelled ex ante
  • 39. Machine learning • Machine learning can use many different algorithms – Machine learning can use supervised, semi-supervised or unsupervised learning processes – Despite the fancy connotation, some machine learning algorithms are not that complex – Of course they can also be very complex (Google's self-driving car, IBM Watson's Jeopardy!-playing system)
  • 41. Data mining • Data mining: – A process of extracting value from large quantities of unstructured data, including text, images, voice and video. Includes pattern recognition, tagging and annotation – Data mining can really increase the value of the data • Sentiment analysis: – Seeks to extract subjective opinion or sentiment from text, video or audio data – The basic aim is to determine the attitude of an individual or group regarding a particular topic or overall context – Used to understand stakeholder opinion
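To make the sentiment analysis idea concrete, here is a deliberately naive, lexicon-based sketch for text. The word lists and the `sentiment` function are assumptions for illustration; production systems typically use much larger lexicons or trained models and handle negation, sarcasm and context, which this does not.

```python
import re

# Tiny hypothetical lexicons; real ones contain thousands of scored terms.
POSITIVE = {"good", "great", "love", "useful", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "useless", "terrible"}


def sentiment(text):
    """Count positive minus negative words and map the score to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for message in [
    "I love open data, it is great",
    "This portal is useless and the docs are poor",
    "The data was published in March",
]:
    print(f"{sentiment(message):8s} | {message}")
```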
  • 42. Text analysis Source: Zimmerman, C., Stein, M. K., Hardt, D., & Vatrapu, R. (2015). Emergence of Things Felt. In Proceedings of the Thirty Sixth International Conference on Information Systems. ICIS 2015 Own analysis based on Twitter data, query all tweets including "open data" OR opengovdata OR opengov in March 2012/13/14, total of 100k rows
  • 44. Data storage (NoSQL) • Hadoop: open-source software framework for distributed storage of very large datasets on computer clusters • Cloudera: An enterprise solution to help businesses manage their Hadoop ecosystem • MongoDB: Good for managing data that changes frequently or data that is unstructured or semi-structured • Apache Cassandra: Data replication, scalability and performance
  • 45. Middleware: Data integration and management • Talend: Master Data Management (MDM) offering, which combines real-time data, applications, and process integration with embedded data quality and stewardship • Pentaho: A comprehensive data integration and business analytics platform, incl. embedded analysis • Splunk: Monitor, search and analyze massive streams of machine data • InfoSphere Master Data Management: Helps link unstructured content from external sources to the golden record for that enhanced 360-degree view
  • 46. (Visual) Analytics • Many of the middleware solutions reach into this space and vice versa – most of these tools have data integration possibilities and the others offer some analytics • Tableau: Has focused on integration to various data sources (incl. Hadoop) and easy visualization of data – very easy to use • Qlik: Very robust, offering options to create very nice dashboards, but has a somewhat steeper learning curve • IBM Watson Analytics: Can use natural language to ask questions that are "translated" into a query
  • 47. Advanced Analytics • SPSS – tabular data only and narrow capabilities but relatively easy to use • SAS – dynamic (a programming language) and a lot of options for analysis • Matlab – a lot of flexibility for doing own programming • R (open source) – can apply packages but you still have to do a lot of manual labour (code). Many options • For structural equation modelling: specific packages such as SmartPLS or AMOS
  • 49. Economics of data • I use the economist's approach and view data (of any size) as a resource • Specific features of digital data – Low marginal costs – easy to distribute and reuse – Can be used for many different things – Value mostly from downstream activities From an economic perspective, it makes sense to reuse data as much as possible
  • 50. Data accounting • Data as a resource – What data do we have? – Where do they originate from? – Where are they stored, who is responsible? – Are they sensitive or can they be opened for reuse? – Are they streaming or static? – Are they mission critical or less important? – Are we using them optimally? – Do we have the right skills? – Do we have the right tools? • We know about our human resources, machines, buildings, cars, production parts... Now we must have the same knowledge about data
  • 51. Value generation ...capitalizing on the benefits of digitization needs to be a strategic imperative
  • 52. Business strategy • Consider the competitive advantage offered by your own data • Consider the potential value from using other available external data • Consider costs and benefits • Consider what other companies are doing (not necessarily in the same industry) Sometimes it makes sense to reuse data internally. Sometimes it makes sense to fetch and use external data. Sometimes it makes sense to share own data.
  • 53. Value generation mechanisms Exploitation: Good governance Exploration: Driving change Economic: Market mechanisms Social: Information sharing mechanisms ∆ Transparency ∆ Civic engagement ∆ Efficiency ∆ Innovation
  • 54. Model with 2-sided markets Soft infrastructure Sustainable value Paying side Buying and selling goods and services Non-paying side Sharing relevant content Cost of high-speed networks Openness of data Societal level impact (MSPs) Intermediaries Information sharing + market mechanisms = Synergy Effectiveness of data and privacy protection frameworks Ease of reaching a skilled workforce Motivation Ability Basic requirements Resource Digital leadership of government Opportunity Societal level structures MSPs = Multi Sided Platforms
  • 56. Examples of use • Better understand and target customers • Understand and optimize business processes • Improving health • Smart cities • Improving sports performance
  • 57. Customer dashboard I Source: IBM
  • 58. Customer Dashboard II Source: IBM, PRNewswire: https://photos.prnewswire.com/prnvar/20150528/219085?max=1600
  • 59. Increase efficiency Source: QuantumBlack http://www.quantumblack.com/
  • 61. Health dashboards Source: Heads Up Health http://headsuphealth.com/
  • 62. Smart cities • Improve security and save money • Analyze traffic -> implement automatic traffic controls
  • 63. Improving sports performance Prozone analyses over 750,000 data points per game, 300 GPS data points per training session and 110 data points per injury to create the most comprehensive injury database in sport. By analyzing over 36 million data points per club per season we aim to identify the subtle patterns in a player’s performance that predispose them to increased relative risk of injury, allowing clubs to act to prevent injury.
  • 64. THANK YOU!