Big Data Europe SC6 WS 3: Where we are and are going for Big Data in OpenScience. The perspective of European official statistics. Fernando Reis, Task-Force Big Data, European Commission (Eurostat)
Keynote talk at the Big Data Europe SC6 Workshop on 11.9.2017 in Amsterdam co-located with SEMANTiCS2017: The perspective of European official statistics by Fernando Reis, Task-Force Big Data, European Commission (Eurostat).
1. Where we are and are going for Big Data in OpenScience
The perspective of European official statistics
Fernando Reis, Task-Force Big Data
European Commission (Eurostat)
Big Data Europe Workshop
Amsterdam, 11 September 2017
2. Where we are
• Public use files for Eurostat micro-data
• They are public
• Used for training purposes and data discovery prior to getting access to scientific use files
• May be randomly generated from the real microdata so as to preserve the statistical properties of the real data
• EU-SILC and EU-LFS
• https://ec.europa.eu/eurostat/cros/content/public-use-files-eurostat-microdata-0_en
• Not big data (for now)!
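As an illustration of the idea of a randomly generated public use file, the following is a minimal sketch only, not Eurostat's actual method. It assumes a toy list of records with hypothetical variable names and draws each variable independently from its observed distribution, which preserves the marginal distributions but not the relationships between variables.

```python
import random


def synthesize(records, n, seed=0):
    """Draw n synthetic records by sampling each variable independently
    from its empirical distribution in `records` (a list of dicts)."""
    rng = random.Random(seed)
    # One list of observed values per variable.
    columns = {key: [r[key] for r in records] for key in records[0]}
    return [
        {key: rng.choice(values) for key, values in columns.items()}
        for _ in range(n)
    ]


# Hypothetical toy microdata; variable names are illustrative only.
real = [
    {"age_group": "25-34", "employed": True},
    {"age_group": "35-44", "employed": True},
    {"age_group": "25-34", "employed": False},
    {"age_group": "55-64", "employed": True},
]
synthetic = synthesize(real, n=1000)
```

A production approach would also need to preserve joint distributions and assess disclosure risk; this sketch does neither.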
3. Where we are
• Scientific use files for Eurostat micro-data
• Microdata
• Confidential data
- For statistical purposes: national statistical production; secure data exchange (within ESS)
- For scientific purposes: scientific use files (pseudonymised datasets sent to researchers on DVD); secure use files (datasets accessed in Eurostat's Safe Centre, outputs checked for confidentiality)
• Public use files (anonymised datasets; identification of statistical units is not possible)
4. Where we are
• Scientific use files for Eurostat micro-data
• Access is provided to entities that do research; Eurostat checks whether the entity can be considered a research entity (according to predefined criteria)
• The entity signs an agreement that it will use the data properly - this is a prerequisite for access
• Individual researchers submit projects in which they explain why they need access to microdata
• These projects are verified by Eurostat AND national statistical offices
- If a country disagrees, its data are removed from the file
• Around 1200 projects running using our microdata
• Not big data (for now)!
5. Where we are
• Legal study on access to big data for statistics
• Purpose
- Identify obstacles and enabling factors in current and upcoming relevant legislation (MS and EU) regarding the access and use of Big Data for official statistics (incl. production and dissemination) for four private data sources: telecom, internet, utilities, payment
• Analysis
- Statistical legislation at EU level
- National legal framework for production of official statistics, including provisions that may prevent or limit use of big data sources
- EU data protection legislation (Directive 95/46/EC and GDPR)
- National legal framework for personal data protection, including derogations in case of processing for statistical purposes
- Other relevant legislation (copyright, database legislation)
- National legal framework for traffic and location data
- Existing practices at NSIs
6. Where we are
• Legal study on access to big data for statistics
• Legal obstacles in Member States?
- Not that many true legal obstacles, neither in statistics legislation nor in sector legislation
- But there are concerns for both NSIs and data sources (mainly about personal data and confidential business information)…
- Issues: retention period for mobile network data, data minimisation (burden), transparency towards data subjects
- Is statistical confidentiality sufficiently guaranteed? Recital 162 GDPR: The statistical purpose implies that the result of processing for statistical purposes is not personal data
- Yet… the potential of big data is currently not being fully exploited
7. Where we are
• Legal study on access to big data for statistics
• NSI can often compel big data sources to communicate data to the NSI, but…
- For data sources the rules may not be clear enough
- For NSI the rules may not be strong enough
- Adopting the required legal instrument can require substantial time and effort (e.g. part of annual program)
- The national DPA may need to be consulted first and may lay down access modalities and restrictions
- Communication of aggregated data by data sources may not be possible if they identify too small subgroups (Belgian DPA: at least 30 users in case of location data from MNO)
- The need for continuous, flexible and reliable access is not guaranteed by current legal provisions
- Voluntary partnerships are concluded, mainly with MNOs and retail trade chains
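The minimum-subgroup constraint mentioned above (at least 30 users for MNO location data) can be pictured as a small-cell suppression step applied before aggregates leave the data source. The following is a minimal sketch under assumed inputs: the (user_id, area) event format and the function name are hypothetical, not part of any NSI or MNO system.

```python
from collections import defaultdict

MIN_USERS = 30  # threshold cited above for location data from MNOs


def aggregate_with_suppression(events, min_users=MIN_USERS):
    """Count distinct users per area and drop areas below the threshold.

    `events` is an iterable of (user_id, area) pairs (assumed format).
    """
    users_by_area = defaultdict(set)
    for user_id, area in events:
        users_by_area[area].add(user_id)
    return {
        area: len(users)
        for area, users in users_by_area.items()
        if len(users) >= min_users
    }


# Toy input: 45 distinct users in area_A, 12 in area_B.
events = [(f"u{i}", "area_A") for i in range(45)]
events += [(f"u{i}", "area_B") for i in range(12)]
counts = aggregate_with_suppression(events)
```

Here area_B is withheld because only 12 distinct users were observed, while area_A is released with its count.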
8. Where we are going
• Legislative initiative for data access?
• Separate law on data access?
- Obligation on private sources to license the data they hold for use by public (statistical) offices
- Right balance between public interest and citizens’ need for privacy protection
• Inclusion into specific statistical domain legislation?
- Regulation 2016/792 on consumer price indices: “upon the request of the national bodies responsible for compiling the harmonised indices, the statistical units shall provide, where available, electronic records of transactions, such as scanner data, and at the level of detail necessary in order to produce harmonised indices and to evaluate compliance with the comparability requirements and the quality of the harmonised indices”
9. Where we are going
• Open Algorithms (OPAL) Project
• Open suite of software and open algorithms providing access to statistical information extracted from anonymized, secured and formatted data
• Will start with APIs to access indicators such as population density and mobility, based on mobile network data
• Library of certified open algorithms to extract these indicators in a governed and trustworthy manner
• http://www.opalproject.org
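One way to picture a library of certified algorithms is a registry that only runs vetted functions and returns aggregate indicators rather than raw records. The sketch below is purely illustrative and is not the OPAL API: the registry, the `certified` decorator, the `run` function and the input mappings are all hypothetical names.

```python
CERTIFIED = {}  # hypothetical registry: algorithm name -> vetted function


def certified(name):
    """Register a function as a vetted algorithm under `name`."""
    def register(func):
        CERTIFIED[name] = func
        return func
    return register


@certified("population_density")
def population_density(users_per_region, area_km2):
    """Distinct users per square kilometre for regions with a known area."""
    return {
        region: users_per_region[region] / area_km2[region]
        for region in users_per_region
        if area_km2.get(region, 0) > 0
    }


def run(name, *args):
    """Run an algorithm only if it appears in the certified registry."""
    if name not in CERTIFIED:
        raise PermissionError(f"algorithm {name!r} is not certified")
    return CERTIFIED[name](*args)


# Toy inputs: distinct users and surface area per region.
density = run("population_density", {"A": 450, "B": 120}, {"A": 9.0, "B": 60.0})
```

Requests for anything outside the registry are refused, which mirrors the idea that only governed algorithms touch the underlying data.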
10.
11. Where we are going
• From Internet of Things to …
• A set of sensors, actuators, smart objects, data communications and interface technologies that
- allow information to be collected, tracked and processed across local and global network infrastructures,
- enabling the future hyper-connected society
12. Where we are going
• … Smart statistics
• Data capturing, processing and analysis will be embedded in the system itself
• Intelligence along data life-cycle enhanced with cognitive processes
13. Where we are going
• Smart statistics proof-of-concept
Proofs-of-concept
• Give life to an idea
• Provide evidence that IoT data (eco)systems can be used for official statistics
• Sandbox infrastructure
• …
Prototypes
• Functional model of producing statistics leveraging BD
• Monitored use
• Sandbox infrastructure
• Methodology under construction
• Quality under evaluation
• Limited number of NSI
Working products
• Fully operational
• Up-sized prototype
• Unmonitored use
• UI
• IT infrastructure
• Methodology
• Quality
• Integration with other statistics
• ESS
?
?
14. Thank you for your attention
Fernando Reis
Eurostat Task Force on Big Data
https://github.com/reisfe/
https://twitter.com/reisfe/
https://linkedin.com/in/reisfe/
fernando.reis@ec.europa.eu