SlideShare une entreprise Scribd logo
1  sur  26
BigSurv18 – A Conference
and a Monograph
Lars Lyberg, Inizio
Presentation at Frimis, November 28, 2018
1
2
Top stories - Read all about them
1. Anyone can take pictures of you from a
satellite and there is nothing you can
do about it
2. Better data can drive travel and
conference savings
3. The Health and Human Services
Department will mine data from its
internal social network
4. Vietnam taps Big Data to avoid China’s
traffic chaos
5. Tweets can foretell votes
3
A black swan
A black swan is an undirected and unpredicted event.
It is rare, has an extreme impact but in retrospect we saw it coming
• Internet - yes
• 9/11 - yes
• The Lehman Brothers crash – yes
• Decreasing response rates-yes
• The advent of Big Data –not really
• New data sources other than BD-yes
• Nonprobability sampling-yes and no
4
Monograph Contents
● The new survey landscape
● Total error and data quality
● Big data in official statistics
● Combining big data with survey statistics: methods and
applications
● Combining big data with survey statistics: tools
● Regulations, ethics, privacy
5
A Couple of Giants
Sir Ronald Fisher Jerzy Neyman
6
A Stunning Statement Made by
One of the Keynote Speakers
Surveys are the last resort
7
Design, Measurement, Inference
Adaptation to Available Data
Resource
Traditional Survey Design,
Measurement and Inference
Model-Assisted Survey Design,
Inference
Survey-Assisted Modeling
Modeling, Data Mining,
“Analytics”
InformationContentofAvailableData
High
Low
Source: Heeringa 2018
Why Did Data Become “BIG”
● Technological advances associated with data science and
computational tools and methods.
● Information-based Decision Making
– “Evidence-based”, “Data-Driven”, “Analytics”, “Machine Learning”
● Focus on short-run prediction
– Business decision making
– Health risks (e.g. Google Flu)
– Financial markets
– Political processes
● Style points: “Tail Fins”
Source: Heeringa 2018
The “V” Taxonomy for Big Data: 345(?, Variability, Value)
Some Concepts
● Artificial Intelligence-machines being able to carry out
tasks in a smart way
● Machine Learning-application of AI where we give
machines access to data and let them learn for
themselves via neural networks and natural language
processing
● Data Mining-builds intuition about what is really
happening in some data
● Data Science-combines the application of computer
science, statistics, programming and business
management
11
Hype of Big Data
Gartner’s hype curve
12
Source: Wikipedia
13
An example of Big Data analytics
● Evolv has a database containing information on 984 000
hourly workers in 20 companies
● Data sources: online employee background checks, time
and attendance tracking software, and performance
ranking programs
● Selected results:
– Employees with a criminal record performed slightly
better than others
– Experienced employees did no better than the
inexperienced
14
An example of Big Data analytics (cont’d)
● Potential problems:
– Weak conceptualization
– Not clear if data sources are compatible across clients
– No background variables
– No questions asked to real persons
– Inference is based on big rather than theory
– The analysis might be business-driven
Having said that:
How can data on 984 000 workers be used effectively?
15
Happiness and Well-being
The common survey question: How satisfied are
you with your life?
BD alternative
• 10 million tweets that are coded for happiness
(rainbow, love, beauty, hope, wonderful,
wine…) and non-happiness (damn, boo, ugly,
smoke, hate, lied,…)
• Happiest states: Hawaii, Utah, Idaho, Maine,
Washington
• Saddest states: Louisiana, Mississippi,
Maryland, Michigan, Delaware
16
The Potential Use of Big
Data in Statistics
Production
● Produce statistics based on BD that
can replace surveys
● Combine BD with admin data, sample
surveys, and nonprobability sources
in order to improve statistics
● Explore new topics and concepts
● Data mining to identify new patterns
and models
Examples of Sources of Data
● Censuses
● Other survey programs
● Administrative data systems
● Medical records systems
● Commercially compiled data
● Financial data
● Satellite imagery
● GPS and GIS
● Social media
● Mobile devices
● Wearable measurement
devices
● Sensors (Internet of
Things)
● Visual data: pictures and
video
● Genetic profile data
● Transactional data systems
Adapted from Heeringa 2018
AIS data
● AIS - Automatic Identification System
● Data can be used to follow all vessels
● Messages from vessels are transmitted
with high speed
● Monitor marine traffic in real time
Source:
www.marinetraffi
c.com
Improve current statistics
and produce new statistics
(de Wit et al 2017)
● Statistics on marine traffic
to estimate emission and
identify areas with heavy
traffic
● Port statistics to monitor
how vessels move between
different ports
Combining different data
sources- example from Statistics
Sweden
The left map shows green areas in Lidingö, Stockholm
Sweden. Combining the data with data from the real estate
register we get the green areas that are accessible to the
public in general (the right map).
Source: SCB. Bakgrundskarta © Lantmäteriet
Combining Different Data Sources-
Example from Statistics Netherlands
● Solar energy - power use estimates:
– transmission grid load,
– metrological data,
– areal images,
– electricity meter readings, and
– energy efficient home
improvements.
● How to combine these data sources?
That’s the question.
Satellite imagery: LANDSAT Crop
Layer
23
80 acre sun flower field. Fargo, North Dakota.
Source: USDA
New set of legal and ethical
issues in the big data era
● Data are often collected for one purpose but combined with
other data sources and used for another purpose
● Risk of privacy and confidentiality breaches
● The old way – to get access to survey and administrative data by
using statistical disclosure control techniques and provision of
controlled access through research data centers.
● The new way – unclear legal situation – who owns the new type
of data?
● Consent statements that foresee all potential future use of data -
too complex for anyone to grasp
● There are no data stewards controlling access to individual data
● GDPR does not explicitly mention BD
Take-away points
• New survey developments are taking place
• Our industry needs innovations, less fighting
and more collaboration
• We need to merge with other research cultures
• We need to know more about combining data
sources
• We need to account for all major sources of
uncertainty that are associated with data
collection and analysis of data
• We need to develop new theories for handling
error structures and combining data sources
25
Over and Out
26

Contenu connexe

Tendances

Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analytics
JOSEPH FRANCIS
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
Trillium Software
 
Supply chain management
Supply chain managementSupply chain management
Supply chain management
muditawasthi
 
Capps programoninformationsciencebrownbag
Capps programoninformationsciencebrownbagCapps programoninformationsciencebrownbag
Capps programoninformationsciencebrownbag
Micah Altman
 

Tendances (19)

Big data, Machine learning and the Auditor
Big data, Machine learning and the AuditorBig data, Machine learning and the Auditor
Big data, Machine learning and the Auditor
 
Data science
Data scienceData science
Data science
 
Tools and techniques adopted for big data analytics
Tools and techniques adopted for big data analyticsTools and techniques adopted for big data analytics
Tools and techniques adopted for big data analytics
 
Sample
Sample Sample
Sample
 
Importance of Big data for your Business
Importance of Big data for your BusinessImportance of Big data for your Business
Importance of Big data for your Business
 
Big data-analytics-ebook
Big data-analytics-ebookBig data-analytics-ebook
Big data-analytics-ebook
 
Case 3.1 - Big data big rewards
Case 3.1 - Big data big rewardsCase 3.1 - Big data big rewards
Case 3.1 - Big data big rewards
 
Experfy Online Course - An Introduction to Diagnosing Diseases with Patient ...
Experfy  Online Course - An Introduction to Diagnosing Diseases with Patient ...Experfy  Online Course - An Introduction to Diagnosing Diseases with Patient ...
Experfy Online Course - An Introduction to Diagnosing Diseases with Patient ...
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
Big Data Analytics government healthcare
Big Data Analytics government healthcareBig Data Analytics government healthcare
Big Data Analytics government healthcare
 
Bigdatappt
BigdatapptBigdatappt
Bigdatappt
 
Big Data Analytics Proposal #1
Big Data Analytics Proposal #1Big Data Analytics Proposal #1
Big Data Analytics Proposal #1
 
Predictive analytics in uae government organizations
Predictive analytics in uae government organizationsPredictive analytics in uae government organizations
Predictive analytics in uae government organizations
 
Big data seminar at Broadridge
Big data seminar at BroadridgeBig data seminar at Broadridge
Big data seminar at Broadridge
 
BIG DATA -- The Next Big Thing!
BIG DATA -- The Next Big Thing!BIG DATA -- The Next Big Thing!
BIG DATA -- The Next Big Thing!
 
Supply chain management
Supply chain managementSupply chain management
Supply chain management
 
IOT DATA AND BIG DATA
IOT DATA AND BIG DATAIOT DATA AND BIG DATA
IOT DATA AND BIG DATA
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Capps programoninformationsciencebrownbag
Capps programoninformationsciencebrownbagCapps programoninformationsciencebrownbag
Capps programoninformationsciencebrownbag
 

Similaire à Lars Lyberg, Inizio: Rapport från konferensen BigSurv18

June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
June 2015 (142)  MIS Quarterly Executive   67The Big Dat.docxJune 2015 (142)  MIS Quarterly Executive   67The Big Dat.docx
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
croysierkathey
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
maigva
 
Big_data_analytics_for_life_insurers_published
Big_data_analytics_for_life_insurers_publishedBig_data_analytics_for_life_insurers_published
Big_data_analytics_for_life_insurers_published
Shradha Verma
 
how you can use data analytics
how you can use data analytics how you can use data analytics
how you can use data analytics
Dan Bart
 

Similaire à Lars Lyberg, Inizio: Rapport från konferensen BigSurv18 (20)

Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data Data-Ed Webinar: Demystifying Big Data
Data-Ed Webinar: Demystifying Big Data
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Opportunities and methodological challenges of Big Data for official statist...
Opportunities and methodological challenges of  Big Data for official statist...Opportunities and methodological challenges of  Big Data for official statist...
Opportunities and methodological challenges of Big Data for official statist...
 
Applications of Big Data Analytics in Businesses
Applications of Big Data Analytics in BusinessesApplications of Big Data Analytics in Businesses
Applications of Big Data Analytics in Businesses
 
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
June 2015 (142)  MIS Quarterly Executive   67The Big Dat.docxJune 2015 (142)  MIS Quarterly Executive   67The Big Dat.docx
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docx
 
BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm. BIMCV: The Perfect "Big Data" Storm.
BIMCV: The Perfect "Big Data" Storm.
 
ppt1.pptx
ppt1.pptxppt1.pptx
ppt1.pptx
 
Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...Data science and its potential to change business as we know it. The Roadmap ...
Data science and its potential to change business as we know it. The Roadmap ...
 
Big data
Big dataBig data
Big data
 
Interesting ways Big Data is used today
Interesting ways Big Data is used todayInteresting ways Big Data is used today
Interesting ways Big Data is used today
 
Big data analytics for life insurers
Big data analytics for life insurersBig data analytics for life insurers
Big data analytics for life insurers
 
Big_data_analytics_for_life_insurers_published
Big_data_analytics_for_life_insurers_publishedBig_data_analytics_for_life_insurers_published
Big_data_analytics_for_life_insurers_published
 
Emerging technologies in computer science
Emerging technologies in computer scienceEmerging technologies in computer science
Emerging technologies in computer science
 
Introduction to Data Stream Processing
Introduction to Data Stream ProcessingIntroduction to Data Stream Processing
Introduction to Data Stream Processing
 
Data institutions for climate-induced migration Scanning the local data ecosy...
Data institutions for climate-induced migration Scanning the local data ecosy...Data institutions for climate-induced migration Scanning the local data ecosy...
Data institutions for climate-induced migration Scanning the local data ecosy...
 
Big Data for Development
Big Data for DevelopmentBig Data for Development
Big Data for Development
 
big-data.pdf
big-data.pdfbig-data.pdf
big-data.pdf
 
how you can use data analytics
how you can use data analytics how you can use data analytics
how you can use data analytics
 

Plus de Alf Fyhrlund

Dwg 2012-oct-07 - european commission open data and public sector information
Dwg 2012-oct-07 - european commission open data and public sector informationDwg 2012-oct-07 - european commission open data and public sector information
Dwg 2012-oct-07 - european commission open data and public sector information
Alf Fyhrlund
 
Sveriges Riksbank - Monetary Policy Report October 2011
Sveriges Riksbank - Monetary Policy Report October 2011Sveriges Riksbank - Monetary Policy Report October 2011
Sveriges Riksbank - Monetary Policy Report October 2011
Alf Fyhrlund
 

Plus de Alf Fyhrlund (16)

3 thomas-laitila-orebro-universitet
3 thomas-laitila-orebro-universitet3 thomas-laitila-orebro-universitet
3 thomas-laitila-orebro-universitet
 
8 sophie-hedestad-meltwater
8 sophie-hedestad-meltwater8 sophie-hedestad-meltwater
8 sophie-hedestad-meltwater
 
5 gunnar-ehrnborg-ericsson
5 gunnar-ehrnborg-ericsson5 gunnar-ehrnborg-ericsson
5 gunnar-ehrnborg-ericsson
 
3 thomas-laitila-orebro-universitet
3 thomas-laitila-orebro-universitet3 thomas-laitila-orebro-universitet
3 thomas-laitila-orebro-universitet
 
2 dan-hedlin-stockholms-universitet
2 dan-hedlin-stockholms-universitet2 dan-hedlin-stockholms-universitet
2 dan-hedlin-stockholms-universitet
 
1 lilli-japec-scb
1 lilli-japec-scb1 lilli-japec-scb
1 lilli-japec-scb
 
Daniel Thorburn, SU: Är bayesianska metoder användbara?
 Daniel Thorburn, SU: Är bayesianska metoder användbara? Daniel Thorburn, SU: Är bayesianska metoder användbara?
Daniel Thorburn, SU: Är bayesianska metoder användbara?
 
Lars Lyberg, Inizio: Ett föränderligt surveylandskap
Lars Lyberg, Inizio: Ett föränderligt surveylandskapLars Lyberg, Inizio: Ett föränderligt surveylandskap
Lars Lyberg, Inizio: Ett föränderligt surveylandskap
 
Anders Holmberg, Statistisk Sentralbyrå, Norge: Att finna och utforska transa...
Anders Holmberg, Statistisk Sentralbyrå, Norge: Att finna och utforska transa...Anders Holmberg, Statistisk Sentralbyrå, Norge: Att finna och utforska transa...
Anders Holmberg, Statistisk Sentralbyrå, Norge: Att finna och utforska transa...
 
Web panel surveys patrick sturgis
Web panel surveys patrick sturgisWeb panel surveys patrick sturgis
Web panel surveys patrick sturgis
 
Web panel surveys maja fromseier petersen
Web panel surveys maja fromseier petersenWeb panel surveys maja fromseier petersen
Web panel surveys maja fromseier petersen
 
Web panel surveys maja fromseier petersen
Web panel surveys maja fromseier petersenWeb panel surveys maja fromseier petersen
Web panel surveys maja fromseier petersen
 
Dwg 2012-oct-07 - european commission open data and public sector information
Dwg 2012-oct-07 - european commission open data and public sector informationDwg 2012-oct-07 - european commission open data and public sector information
Dwg 2012-oct-07 - european commission open data and public sector information
 
Mpu oh 111220
Mpu oh 111220Mpu oh 111220
Mpu oh 111220
 
Sveriges Riksbank: Monetary Policy Update December 2011
Sveriges Riksbank: Monetary Policy Update December 2011Sveriges Riksbank: Monetary Policy Update December 2011
Sveriges Riksbank: Monetary Policy Update December 2011
 
Sveriges Riksbank - Monetary Policy Report October 2011
Sveriges Riksbank - Monetary Policy Report October 2011Sveriges Riksbank - Monetary Policy Report October 2011
Sveriges Riksbank - Monetary Policy Report October 2011
 

Dernier

Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
ZurliaSoop
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
amilabibi1
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
David Celestin
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Hung Le
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 

Dernier (17)

My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 

Lars Lyberg, Inizio: Rapport från konferensen BigSurv18

  • 1. BigSurv18 – A Conference and a Monograph Lars Lyberg, Inizio Presentation at Frimis, November 28, 2018 1
  • 2. 2
  • 3. Top stories - Read all about them 1. Anyone can take pictures of you from a satellite and there is nothing you can do about it 2. Better data can drive travel and conference savings 3. The Health and Human Services Department will mine data from its internal social network 4. Vietnam taps Big Data to avoid China’s traffic chaos 5. Tweets can foretell votes 3
  • 4. A black swan A black swan is an undirected and unpredicted event. It is rare, has an extreme impact but in retrospect we saw it coming • Internet - yes • 9/11 - yes • The Lehman Brothers crash – yes • Decreasing response rates-yes • The advent of Big Data –not really • New data sources other than BD-yes • Nonprobability sampling-yes and no 4
  • 5. Monograph Contents ● The new survey landscape ● Total error and data quality ● Big data in official statistics ● Combining big data with survey statistics: methods and applications ● Combining big data with survey statistics: tools ● Regulations, ethics, privacy 5
  • 6. A Couple of Giants Sir Ronald Fisher Jerzy Neyman 6
  • 7. A Stunning Statement Made by One of the Keynote Speakers Surveys are the last resort 7
  • 8. Design, Measurement, Inference Adaptation to Available Data Resource Traditional Survey Design, Measurement and Inference Model-Assisted Survey Design, Inference Survey-Assisted Modeling Modeling, Data Mining, “Analytics” InformationContentofAvailableData High Low Source: Heeringa 2018
  • 9. Why Did Data Become “BIG” ● Technological advances associated with data science and computational tools and methods. ● Information-based Decision Making – “Evidence-based”, “Data-Driven”, “Analytics”, “Machine Learning” ● Focus on short-run prediction – Business decision making – Health risks (e.g. Google Flu) – Financial markets – Political processes ● Style points: “Tail Fins” Source: Heeringa 2018
  • 10. The “V” Taxonomy for Big Data: 345(?, Variability, Value)
  • 11. Some Concepts ● Artificial Intelligence-machines being able to carry out tasks in a smart way ● Machine Learning-application of AI where we give machines access to data and let them learn for themselves via neural networks and natural language processing ● Data Mining-builds intuition about what is really happening in some data ● Data Science-combines the application of computer science, statistics, programming and business management 11
  • 12. Hype of Big Data Gartner’s hype curve 12 Source: Wikipedia
  • 13. 13
  • 14. An example of Big Data analytics ● Evolv has a database containing information on 984 000 hourly workers in 20 companies ● Data sources: online employee background checks, time and attendance tracking software, and performance ranking programs ● Selected results: – Employees with a criminal record performed slightly better than others – Experienced employees did no better than the inexperienced 14
  • 15. An example of Big Data analytics (cont’d) ● Potential problems: – Weak conceptualization – Not clear if data sources are compatible across clients – No background variables – No questions asked to real persons – Inference is based on big rather than theory – The analysis might be business-driven Having said that: How can data on 984 000 workers be used effectively? 15
  • 16. Happiness and Well-being The common survey question: How satisfied are you with your life? BD alternative • 10 million tweets that are coded for happiness (rainbow, love, beauty, hope, wonderful, wine…) and non-happiness (damn, boo, ugly, smoke, hate, lied,…) • Happiest states: Hawaii, Utah, Idaho, Maine, Washington • Saddest states: Louisiana, Mississippi, Maryland, Michigan, Delaware 16
  • 17. The Potential Use of Big Data in Statistics Production ● Produce statistics based on BD that can replace surveys ● Combine BD with admin data, sample surveys, and nonprobability sources in order to improve statistics ● Explore new topics and concepts ● Data mining to identify new patterns and models
  • 18. Examples of Sources of Data ● Censuses ● Other survey programs ● Administrative data systems ● Medical records systems ● Commercially compiled data ● Financial data ● Satellite imagery ● GPS and GIS ● Social media ● Mobile devices ● Wearable measurement devices ● Sensors (Internet of Things) ● Visual data: pictures and video ● Genetic profile data ● Transactional data systems Adapted from Heeringa 2018
  • 19. AIS data ● AIS - Automatic Identification System ● Data can be used to follow all vessels ● Messages from vessels are transmitted with high speed ● Monitor marine traffic in real time Source: www.marinetraffi c.com
  • 20. Improve current statistics and produce new statistics (de Wit et al 2017) ● Statistics on marine traffic to estimate emission and identify areas with heavy traffic ● Port statistics to monitor how vessels move between different ports
  • 21. Combining different data sources- example from Statistics Sweden The left map shows green areas in Lidingö, Stockholm Sweden. Combining the data with data from the real estate register we get the green areas that are accessible to the public in general (the right map). Source: SCB. Bakgrundskarta © Lantmäteriet
  • 22. Combining Different Data Sources- Example from Statistics Netherlands ● Solar energy - power use estimates: – transmission grid load, – metrological data, – areal images, – electricity meter readings, and – energy efficient home improvements. ● How to combine these data sources? That’s the question.
  • 23. Satellite imagery: LANDSAT Crop Layer 23 80 acre sun flower field. Fargo, North Dakota. Source: USDA
  • 24. New set of legal and ethical issues in the big data era ● Data are often collected for one purpose but combined with other data sources and used for another purpose ● Risk of privacy and confidentiality breaches ● The old way – to get access to survey and administrative data by using statistical disclosure control techniques and provision of controlled access through research data centers. ● The new way – unclear legal situation – who owns the new type of data? ● Consent statements that foresee all potential future use of data - too complex for anyone to grasp ● There are no data stewards controlling access to individual data ● GDPR does not explicitly mention BD
  • 25. Take-away points • New survey developments are taking place • Our industry needs innovations, less fighting and more collaboration • We need to merge with other research cultures • We need to know more about combining data sources • We need to account for all major sources of uncertainty that are associated with data collection and analysis of data • We need to develop new theories for handling error structures and combining data sources 25