Administrative archives for Eurostat statistics


                    Stefania Cardinaleschi, Vincenzo Spinelli
                               Istat – Istituto Nazionale di Statistica (Italian National Institute of Statistics)


                                 SIS VSP, 19-20 April 2012



Session: Statistical use of administrative archives
Outline

The context
    • Definition of the Structure of Earnings Survey (SES)
    • Process flow for the Education public sector
Focus on
    • Data integration by record linkage
    • Estimation of local units

The Context

LAbour MArket Statistics (LAMAS)
Council Regulation (EC) No 530/1999:
“…needs information on the level and composition of labour costs and
on the structure and distribution of earnings in order to assess the
economic development in the Member States…”

Labour Cost Survey (LCS): “The statistics on the level and composition of labour costs”
Structure of Earnings Survey (SES): “The statistics on the structure and distribution of earnings”

The objective of this legislation is to provide accurate and
harmonised data on earnings in EU Member States and other
countries for policy-making and research purposes (gender pay
gap, file MFR, ..).

SES - outline

Objective
• to provide accurate and harmonised microdata on earnings for
  scientific purposes
• to monitor the structure and distribution of earnings, taking
  job-related factors into account
• to provide information on several individual characteristics of
  employees, such as gender, age, occupation, education, length of
  service and others

Coverage
• reporting units: enterprises with 10 or more employees; results
  refer to local units
• NACE Rev. 1 sections C-K, plus sections M-O from 2006
• private sector, plus public sector from 2006

Definition of SES by Eurostat

Structure of the survey: SES 2010

Private sector and S13 list (excluding P.A. and Education)
• direct survey through a questionnaire
• firms chosen from the official list of firms (ASIA 2009)

Public sector
• estimates through administrative data on all institutions,
  specifically:
  • Education, which covers 11% of total employment

Private sector and S13 list

Two-stage sampling design: a sample of employees within a sample of
enterprises.

First stage: the enterprises
• 10-249 employees: stratified sample by economic activity, size and
  geographical position
• more than 249 employees: census

Second stage: the employees (in October) belonging to the selected
units. Enterprises have two options:
  1. draw a simple random sample of their employees
  2. receive a list of the VAT codes of the employees to interview

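The first stage can be sketched in Python as below, a minimal illustration under stated assumptions: the `Enterprise` record, the size classes inside `size_class`, and the single sampling `fraction` are hypothetical, since the slides give neither the size classes nor the per-stratum sample sizes. Enterprises with more than 249 employees are taken with certainty (census); those with 10-249 employees are grouped into strata by economic activity, size class and region, and a simple random sample is drawn within each stratum.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Enterprise:
    ident: str
    employees: int
    nace: str    # economic activity code
    region: str  # geographical position


def size_class(employees: int) -> str:
    # Hypothetical size classes, used only as a stratification variable.
    return "10-49" if employees < 50 else "50-249"


def first_stage(enterprises, fraction, rng):
    """Census of enterprises with >249 employees plus a stratified simple
    random sample (activity x size class x region) of those with 10-249.
    fraction: hypothetical sampling fraction in (0, 1], same in every stratum."""
    census = [e for e in enterprises if e.employees > 249]
    strata = defaultdict(list)
    for e in enterprises:
        if 10 <= e.employees <= 249:  # units below 10 employees are out of scope
            strata[(e.nace, size_class(e.employees), e.region)].append(e)
    sampled = []
    for units in strata.values():
        n = max(1, round(fraction * len(units)))  # at least one unit per stratum
        sampled.extend(rng.sample(units, n))
    return census + sampled
```

With a seeded `random.Random` the selection is reproducible; a real design would take per-stratum sample sizes from the survey's own allocation rather than a single fraction.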
SES sample design – private sector: second stage of sampling

Number of employees interviewed, by size of the enterprise to which
they belong

    Enterprise size (employees)    Employees interviewed
            10-19                          all
            20-49                           20
            50-99                           25
           100-249                          35
           250-499                          40
           500-999                          50
          1000-1999                         60
          2000-3999                         65
          4000-7499                         75
          7500-9999                        100
        10000 and over                     200

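The allocation table above translates directly into a lookup followed by a simple random sample without replacement, which is option 1 of the second stage. The sketch is hypothetical: `ALLOCATION`, `employees_to_interview` and `second_stage` are illustrative names, and the open-ended top class is read as 10,000 employees or more.

```python
import random

# (lowest size, highest size, employees to interview); None means "all"
# in the third field and "no upper bound" in the second.
ALLOCATION = [
    (10, 19, None), (20, 49, 20), (50, 99, 25), (100, 249, 35),
    (250, 499, 40), (500, 999, 50), (1000, 1999, 60), (2000, 3999, 65),
    (4000, 7499, 75), (7500, 9999, 100), (10000, None, 200),
]


def employees_to_interview(size: int) -> int:
    for low, high, n in ALLOCATION:
        if size >= low and (high is None or size <= high):
            # "all" for 10-19; otherwise never more than the enterprise has.
            return size if n is None else min(n, size)
    raise ValueError(f"size {size} is outside SES coverage (10+ employees)")


def second_stage(employee_ids, rng):
    """Simple random sample, without replacement, of one enterprise's employees."""
    return rng.sample(employee_ids, employees_to_interview(len(employee_ids)))
```

For instance, an enterprise with 120 employees falls in the 100-249 class, so 35 of its employees are drawn at random.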
SES Education - public sector

Education - public sector: estimates based on data derived from the
integration of administrative, fiscal and statistical sources.

Administrative and fiscal data:
• Form 770 tax register by MEF (2010)
• Payroll dataset by MEF/Service Personale Tesoro
• List of teaching and non-teaching employees by the Ministero
  dell’Istruzione, dell’Università e della Ricerca (MIUR) (2010-2011)

Statistical surveys (2010):
• EU-SILC: panel survey on Statistics on Income and Living Conditions
• Labour Force Survey (LFS)

The Context

Process flow: Education

  1. Data acquisition (school employment): Form 770; Payroll + List (MIUR)
  2. Estimation of the census
  3. Integration with survey data: EU-SILC, LFS
  4. Next steps (sampling, checking, ...)

The Context

Process flow: integration with survey data

Inputs: Census, EU-SILC, LFS (fl). Steps: sampling, checking.
Output: SES - Education.

Variables taken from each survey (suffix _2 = EU-SILC, _3 = LFS):

    EU-SILC                   LFS (fl)
    isco_2                    isco_3
    manag_2                   manag_3
    isced_2                   isced_3
                              anz_3
    Part-time full time_2     Part-time full time_3
    tipo_2                    tipo_3
    cittad_2                  cittad_3
    ore_2                     ore_3
                              orestra_3
    bonus_2                   bonus_3
    RLM_2                     RLM_3

Data integration in SES

The context:
   Archives coming from heterogeneous sources.

Objective:
   Assign to the statistical units (employees) in the R85 census
   some features available in the LFS data.

Problem:
   The two sources (SES and LFS) do not use the same key fields
   to identify their statistical units.

Data integration in SES

Warning:
   For the integration problem, EU-SILC can be considered a “special
   case” of the LFS; for this reason, it is not discussed further in
   this presentation.
Choice:
   The key field in the LFS is the “personal code”, which is valid
   only within the LFS, whereas the R85 census is based on the
   “fiscal code”, a well-defined key for natural persons in
   administrative and fiscal archives. This is why we define a
   mapping from “personal code” to “fiscal code”, and not vice versa.

Data integration in SES

How can we integrate these archives?
   LFS: Personal code, Birth date, Sex, Name & surname
   Census (R85): Fiscal code

The personal code (LFS) and the fiscal code (R85) cannot be
compared directly!

Data integration in SES

Hypotheses:
• The census is error-free and must be considered the benchmark for
  the LFS archives.
• In the LFS archives, the personal data are affected by random errors;
• systematic errors (bias) must be corrected outside this context;
• the errors are not uniformly distributed across the fields of the
  LFS personal data: errors in the name and surname fields are more
  likely than in birth date/place or gender.

Consequence:
• We define and assign a “fiscal code” to each personal code
  (statistical unit) in the LFS.

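Assuming the fiscal code defined here follows the standard construction rules of the Italian codice fiscale, it is built from three letters of the surname, three of the first name, the last two digits of the birth year, a month letter, the birth day (plus 40 for women), the four-character cadastral code of the birthplace, and a check letter. The Python sketch below assumes that names are already normalized to the letters A-Z and that the cadastral code (`place_code`) is available; it ignores omocodia (the digit substitutions used to resolve duplicate codes), so it is an illustration rather than a complete implementation.

```python
from datetime import date

VOWELS = set("AEIOU")
MONTH_LETTERS = "ABCDEHLMPRST"  # January .. December

# Values of characters in odd positions (1st, 3rd, ..., 15th) for the check letter.
ODD_VALUES = {
    "0": 1, "1": 0, "2": 5, "3": 7, "4": 9, "5": 13, "6": 15, "7": 17, "8": 19, "9": 21,
    "A": 1, "B": 0, "C": 5, "D": 7, "E": 9, "F": 13, "G": 15, "H": 17, "I": 19,
    "J": 21, "K": 2, "L": 4, "M": 18, "N": 20, "O": 11, "P": 3, "Q": 6, "R": 8,
    "S": 12, "T": 14, "U": 16, "V": 10, "W": 22, "X": 25, "Y": 24, "Z": 23,
}


def _consonants_vowels(word):
    consonants = [c for c in word if c not in VOWELS]
    vowels = [c for c in word if c in VOWELS]
    return consonants, vowels


def surname_part(surname):
    # Consonants first, then vowels, padded with X to three letters.
    consonants, vowels = _consonants_vowels(surname)
    return "".join(consonants + vowels + ["X"] * 3)[:3]


def name_part(name):
    # As for the surname, but with 4+ consonants take the 1st, 3rd and 4th.
    consonants, vowels = _consonants_vowels(name)
    if len(consonants) >= 4:
        consonants = [consonants[0], consonants[2], consonants[3]]
    return "".join(consonants + vowels + ["X"] * 3)[:3]


def check_letter(first15):
    total = 0
    for i, c in enumerate(first15):
        if i % 2 == 0:  # 0-based even index = 1-based odd position
            total += ODD_VALUES[c]
        else:           # even positions: digits keep their value, A=0 ... Z=25
            total += int(c) if c.isdigit() else ord(c) - ord("A")
    return chr(ord("A") + total % 26)


def fiscal_code(surname, name, birth, sex, place_code):
    """surname/name: normalized A-Z strings; sex: 'M' or 'F';
    place_code: 4-character cadastral code of the birthplace (assumed given)."""
    day = birth.day + (40 if sex == "F" else 0)
    first15 = (surname_part(surname) + name_part(name)
               + f"{birth.year % 100:02d}" + MONTH_LETTERS[birth.month - 1]
               + f"{day:02d}" + place_code.upper())
    return first15 + check_letter(first15)
```

For example, `fiscal_code("ROSSI", "MARIO", date(1980, 1, 1), "M", "H501")` returns `"RSSMRA80A01H501U"`. Since name and surname determine the first six characters, an error in those fields, which the hypotheses above consider the most likely, breaks an exact match even when the birth data are correct.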
Data integration in SES

       “Naïve” solution to the matching problem
 begin
   • normalization step for personal data in the LFS archive
   • definition of the fiscal code on the normalized personal data
   • <the archives from LFS and R85 can be joined by fiscal code>
 end

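A minimal Python sketch of the normalization step and the join, under assumptions: `normalize` uppercases, strips accents and drops apostrophes, spaces and other non-letters; the record layout (`personal_code`, `surname`, `name`) and the `make_code` callback, which is expected to compute the fiscal code from a normalized record, are hypothetical. R85 is represented simply as a set of fiscal codes.

```python
import unicodedata


def normalize(text):
    """Uppercase, strip accents (NFKD decomposition) and keep only A-Z,
    so apostrophes, spaces, hyphens and stray digits are dropped."""
    decomposed = unicodedata.normalize("NFKD", text.upper())
    return "".join(c for c in decomposed if "A" <= c <= "Z")


def link(lfs_records, r85_codes, make_code):
    """Map LFS personal codes to fiscal codes present in R85.

    lfs_records: iterable of dicts with (hypothetical) keys 'personal_code',
                 'surname', 'name', plus whatever make_code needs.
    r85_codes:   set of fiscal codes in the R85 census.
    make_code:   function computing a fiscal code from a normalized record.
    Returns (matched: personal_code -> fiscal_code, unmatched personal codes).
    """
    matched, unmatched = {}, []
    for rec in lfs_records:
        clean = dict(rec, surname=normalize(rec["surname"]), name=normalize(rec["name"]))
        code = make_code(clean)
        if code in r85_codes:
            matched[rec["personal_code"]] = code
        else:
            unmatched.append(rec["personal_code"])
    return matched, unmatched
```

The share of unmatched personal codes, `len(unmatched) / len(lfs_records)`, is an error rate of the kind reported in the results.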
Data integration in SES

                                 Results

LFS personal data (reference year 2010): 92,129 records.

Before the normalization step: error rate 27.8%
   (66,481 records can be matched in the fiscal archives, i.e. Modello 770/2010).

After the normalization step: error rate 18.5% (-9.3 percentage points)

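Reading the error rate as the share of LFS records that cannot be matched in the fiscal archive (an interpretation, but one that reproduces the reported figure), the numbers above work out as follows; the 9.3 reduction is in percentage points, which corresponds to a relative reduction of about one third.

```latex
\text{error rate}_{\text{before}}
  = 1 - \frac{66{,}481}{92{,}129}
  \approx 1 - 0.722
  = 0.278 = 27.8\%

\Delta = 27.8\% - 18.5\% = 9.3 \ \text{percentage points},
\qquad \frac{9.3}{27.8} \approx 0.33
```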
Estimation of local units

[Figure: MIUR schools (school 1 to school 5) grouped into Local unit 1 and Local unit 2]

But what if there are multi-level local units?

Estimation of local units

Local unit structures
[Figure: a local unit linked to its schools and their addresses]

A constraint must hold in this list: every school must be
“linked” to one and only one local unit.

In other words, every school must belong to a cluster
having a unique “center of mass” (itself a school).
Estimation of local units

From local units to graphs: local unit = connected component

[Figure: example graph whose vertices are the school codes AGEE00101V, AGEE001042, AGEE00100T, BIEE002018, BIEE002029, BIEE002007, BIEE00203A, BIEE00206D, BIEE00205C]

This list can be seen as a graph G: the vertices are the
school codes and the (directed) edges are the pairs in
each row of the list.
Estimation of local units

Result
  The local units in R85 are the connected components of G
  that are directed trees with a single root.

Searching for connected components in G: many algorithms
compute the connected components of a graph in linear time,
using either breadth-first search or depth-first search.

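A Python sketch of both steps under stated assumptions: `components` finds the connected components of G by breadth-first search, ignoring edge direction, in O(V + E) time; `rooted_tree_flags` then marks the components that are directed trees with a single root. It assumes that an edge (a, b) points from school a to the school b it is linked to, so the root is the only vertex without an outgoing edge; function names and test data are hypothetical. A connected component in which exactly one vertex has out-degree 0 and every other vertex has out-degree 1 has n - 1 edges, so it is a tree whose edges all lead towards the root.

```python
from collections import defaultdict, deque


def components(edges, vertices=()):
    """Connected components of G by breadth-first search, ignoring direction.

    edges: (school, linked_school) pairs, one per row of the list.
    vertices: extra school codes, so isolated schools become singletons.
    """
    adj = defaultdict(set)
    for v in vertices:
        adj.setdefault(v, set())
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, result = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            v = queue.popleft()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(comp)
    return result


def rooted_tree_flags(comps, edges):
    """For each component, True if it is a directed tree with one root.

    Assumes an edge (a, b) points from school a to the school b it is linked
    to, so the root is the single vertex without an outgoing edge.
    """
    out_degree = defaultdict(int)
    for a, _ in edges:
        out_degree[a] += 1
    flags = []
    for comp in comps:
        roots = sum(1 for v in comp if out_degree[v] == 0)
        flags.append(roots == 1 and all(out_degree[v] <= 1 for v in comp))
    return flags
```

Out-degrees are computed once for the whole edge list, so the check stays linear in the size of G rather than rescanning the edges for every component.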
Estimation of local units

                           Inputs
There are 36,923 schools (reference year 2010), i.e. vertices in G.
There are 24,031 (directed) edges in G.

      Before the clustering algorithm…
We get 11,892 connected components, of which 3,126 are singletons.

The average size of these components is 3, while the largest
component has 21 vertices.
There are 512 components with fewer than 10 employees.

                        Results
We considered 11,380 local units (i.e. 35,152 schools) in SES 2010.

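The summary statistics and the final selection can be sketched as follows. This is hypothetical: `employees` maps each school to its number of employees, and the 10-employee threshold mirrors the SES coverage of units with 10 or more employees. That the 11,380 local units are the 11,892 components minus the 512 with fewer than 10 employees is an inference from the arithmetic (11,892 - 512 = 11,380).

```python
from statistics import mean


def summarize(comps):
    """Counts of the kind reported above, computed from a list of components."""
    sizes = [len(c) for c in comps]
    return {
        "components": len(comps),
        "singletons": sum(1 for s in sizes if s == 1),
        "average_size": mean(sizes),
        "largest": max(sizes),
    }


def keep_local_units(comps, employees, min_employees=10):
    """Keep components whose schools together employ at least min_employees.

    employees: dict school code -> number of employees (hypothetical input);
    schools missing from it count as 0.
    """
    return [c for c in comps if sum(employees.get(s, 0) for s in c) >= min_employees]
```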

sisvsp2012_ sessione14_ cardinaleschi_spinelli

  • 1. Administrative archives for Eurostat statistics
      Stefania Cardinaleschi, Vincenzo Spinelli
      Istat (Istituto Nazionale di Statistica, the Italian National Institute of Statistics)
      SIS VSP, 19-20 April 2012
      Session: Statistical use of administrative archives
  • 2. Outline
      The context
        • Definition of the Structure of Earnings Survey (SES)
        • Process flow for the education public sector
      Focus on
        • Data integration by record linkage
        • Estimation of local units
  • 3. The context
      LAbour MArket Statistics (LAMAS), Council Regulation (EC) No 530/1999: “…needs information on the level and composition of labour costs and on the structure and distribution of earnings in order to assess the economic development in the Member States…”
        • Labour Cost Survey (LCS): “The statistics on the level and composition of labour costs”
        • Structure of Earnings Survey (SES): “The statistics on the structure and distribution of earnings”
      The objective of this legislation is to provide accurate and harmonised data on earnings in EU Member States and other countries for policy-making and research purposes (gender pay gap, file MFR, …).
  • 4. SES - outline
      Objective
        • to provide accurate and harmonised microdata on earnings for scientific purposes
        • to monitor the structure and distribution of earnings, taking job-related factors into account
        • to provide information on several individual characteristics of employees, such as gender, age, occupation, education, length of service and others
      Coverage
        • reporting units: enterprises with 10 or more employees; results refer to local units
        • NACE Rev. 1 sections C-K, plus sections M-O from 2006
        • private sector, plus public sector from 2006
  • 5. Definition of SES by Eurostat
      Structure of the survey, SES 2010:
        • Private sector and S13 list (excluding public administration and education): direct survey through a specific questionnaire; firms are chosen from the official list of firms (ASIA 2009).
        • Public sector (education): estimates through administrative data on all institutions; education covers 11% of total employment.
  • 6. Private sector and S13 list
      Two-stage sampling design: a sample of employees within a sample of enterprises.
        • First stage (enterprises): enterprises with 10-249 employees are drawn by stratified sampling, with strata defined by economic activity, size and geographical position; enterprises with more than 249 employees are included by census.
        • Second stage (employees in October): employees belonging to the selected units. Enterprises have two options: 1. draw a simple random sample of their employees; 2. receive a list of the fiscal codes of the employees to interview.
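The first stage described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Istat's actual procedure: the `Enterprise` record layout, the two size bands inside 10-249, the single sampling `fraction` applied to every stratum and the one-unit minimum per stratum are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Enterprise:
    # Hypothetical register record; field names are illustrative only.
    ent_id: str
    employees: int
    nace: str    # economic activity
    region: str  # geographical position


def size_class(employees: int) -> str:
    # Hypothetical size bands inside the sampled 10-249 range.
    return "10-49" if employees < 50 else "50-249"


def first_stage(enterprises, fraction, seed=0):
    """Census of enterprises with more than 249 employees plus a stratified
    simple random sample (without replacement) of those with 10-249."""
    rng = random.Random(seed)
    sample, strata = [], defaultdict(list)
    for e in enterprises:
        if e.employees < 10:
            continue  # outside SES coverage
        if e.employees > 249:
            sample.append(e)  # take-all (census) part
        else:
            strata[(e.nace, size_class(e.employees), e.region)].append(e)
    for units in strata.values():
        # At least one unit per stratum, never more than the stratum holds.
        n = min(len(units), max(1, round(fraction * len(units))))
        sample.extend(rng.sample(units, n))
    return sample


# Usage with made-up enterprises: two large ones, one too small for SES,
# and ten small manufacturing firms forming a single stratum.
firms = [Enterprise("L1", 300, "C", "North"),
         Enterprise("L2", 1200, "G", "South"),
         Enterprise("T1", 5, "C", "North")]
firms += [Enterprise(f"S{i}", 20, "C", "North") for i in range(10)]
chosen = first_stage(firms, fraction=0.3)
print(sorted(e.ent_id for e in chosen))
```

In a real design the fraction would normally vary by stratum; a single value keeps the sketch short.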
  • 7. SES sample design – private sector
      Second stage of sampling: number of employees interviewed, by size of the enterprise they belong to.

        Enterprise size (employees)   Employees interviewed
        10-19                         all
        20-49                         20
        50-99                         25
        100-249                       35
        250-499                       40
        500-999                       50
        1000-1999                     60
        2000-3999                     65
        4000-7499                     75
        7500-9999                     100
        10000 and over                200
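The second-stage quotas in the table can be turned into a lookup followed by a simple random sample of employees, which corresponds to option 1 on the previous slide. This is a sketch under a few assumptions: the "all" class means every employee is interviewed, the last class starts at 10,000 employees, and a quota never exceeds the enterprise's own headcount. `second_stage_size` and `second_stage_sample` are hypothetical names.

```python
import bisect
import random

# (lower bound of the size class, employees to interview) from the table
# above; None stands for "all employees".
SECOND_STAGE = [(10, None), (20, 20), (50, 25), (100, 35), (250, 40),
                (500, 50), (1000, 60), (2000, 65), (4000, 75),
                (7500, 100), (10000, 200)]
LOWER_BOUNDS = [bound for bound, _ in SECOND_STAGE]


def second_stage_size(n_employees: int) -> int:
    """Number of employees to interview in an enterprise of this size."""
    if n_employees < 10:
        raise ValueError("SES covers enterprises with 10 or more employees")
    # Index of the last class whose lower bound is <= n_employees.
    i = bisect.bisect_right(LOWER_BOUNDS, n_employees) - 1
    quota = SECOND_STAGE[i][1]
    return n_employees if quota is None else min(quota, n_employees)


def second_stage_sample(employee_ids, seed=0):
    """Simple random sample (without replacement) of an enterprise's staff."""
    staff = list(employee_ids)
    rng = random.Random(seed)
    return rng.sample(staff, second_stage_size(len(staff)))


print(second_stage_size(15), second_stage_size(120), second_stage_size(12000))
# → 15 35 200
```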
  • 8. SES Education - public sector
      Education, public sector: estimates based on data derived from the integration of administrative, fiscal and statistical sources.
      Administrative and fiscal data:
        • Form 770 tax register by MEF (2010)
        • Payroll dataset by MEF / Service Personale Tesoro
        • List of teaching and non-teaching employees by the Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR) (2010-2011)
      Statistical surveys (2010):
        • EU-SILC, the panel survey on Statistics on Income and Living Conditions
        • Labour Force Survey (LFS)
  • 9. The context: process flow for education
        • Data acquisition: school employment list (MIUR), Form 770, payroll data
        • Estimation of the census
        • Integration with survey data (EU-SILC, LFS)
        • Next steps (sampling, checking, ...)
  • 10. The context: process flow, integration with survey data
      Diagram: the census is integrated with EU-SILC (eusilc) and LFS (fl) data, followed by sampling and checking, to produce SES - Education.
      Variables shown: isco_2, isco_3, manag_2, manag_3, isced_2, isced_3, anz_3, Part-time full time_2, Part-time full time_3, tipo_2, tipo_3, cittad_2, cittad_3, ore_2, ore_3, orestra_3, bonus_2, bonus_3, RLM_2, RLM_3.
  • 11. Data integration in SES
      The context: archives coming from heterogeneous sources.
      Objective: assign to the statistical units (employees) in census R85 some features available in LFS data.
      Problem: the two sources (SES and LFS) do not use the same key fields to identify their statistical units.
  • 12. Data integration in SES
      Note: for the integration problem, EU-SILC can be considered a “special case” of LFS; for this reason it is not mentioned further in this presentation.
      Choice: the key field in LFS is the “personal code”, which is valid only within LFS, whereas census R85 is based on the “fiscal code”, a well-defined key for natural persons in administrative and fiscal archives. This is why we define a mapping from personal code to fiscal code and not vice versa.
  • 13. Data integration in SES
      How can we integrate these archives?
        • LFS: personal code, birth date, sex, name and surname
        • Census (R85): fiscal code
      Personal code (LFS) and fiscal code (R85) cannot be compared directly!
  • 14. Data integration in SES
      Hypotheses:
        • The census is error-free and is taken as the benchmark for the LFS archives.
        • In the LFS archives, personal data are affected by random errors; systematic errors (bias) must be corrected outside this context.
        • Errors are not uniformly distributed across the personal-data fields of LFS: errors in name and surname are more likely than in birth date, birth place or sex.
      Consequence: we define and assign a fiscal code to each personal code (statistical unit) in LFS.
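Assigning a fiscal code to an LFS unit means rebuilding it from personal data. A minimal sketch follows. It assumes the standard construction rules of the Italian codice fiscale: three letters from the surname, three from the name, the year, a month letter, the day plus 40 for women, the four-character cadastral code of the birth place, and a check character. The `normalize` function (upper case, accents stripped, non-letters dropped) is an assumed example of a normalization step, not the rules used in the study. The birth-place code is a hypothetical input `place_code`, and codes altered to resolve duplicates (omocodia) are not handled.

```python
import unicodedata
from datetime import date

VOWELS = set("AEIOU")
MONTHS = "ABCDEHLMPRST"  # month letters, January to December
# Check-character values for characters in odd positions (1st, 3rd, ...).
ODD_VALUES = dict(zip(
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    [1, 0, 5, 7, 9, 13, 15, 17, 19, 21,
     1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18, 20, 11,
     3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23]))


def even_value(c: str) -> int:
    # Characters in even positions: digits 0-9 -> 0-9, letters A-Z -> 0-25.
    return int(c) if c.isdigit() else ord(c) - ord("A")


def normalize(text: str) -> str:
    """Assumed normalization: upper case, accents stripped, letters A-Z only."""
    decomposed = unicodedata.normalize("NFKD", text).upper()
    return "".join(c for c in decomposed if "A" <= c <= "Z")


def split_letters(text: str):
    letters = normalize(text)
    return ([c for c in letters if c not in VOWELS],
            [c for c in letters if c in VOWELS])


def surname_code(surname: str) -> str:
    consonants, vowels = split_letters(surname)
    return "".join(consonants + vowels + ["X"] * 3)[:3]


def name_code(name: str) -> str:
    consonants, vowels = split_letters(name)
    if len(consonants) >= 4:  # names with 4+ consonants use the 1st, 3rd, 4th
        consonants = [consonants[0], consonants[2], consonants[3]]
    return "".join(consonants + vowels + ["X"] * 3)[:3]


def fiscal_code(surname, name, birth: date, sex: str, place_code: str) -> str:
    """Rebuild a 16-character fiscal code; place_code is the 4-character
    cadastral code of the birth place (hypothetical input here)."""
    day = birth.day + (40 if sex.upper() == "F" else 0)
    body = (surname_code(surname) + name_code(name)
            + f"{birth.year % 100:02d}" + MONTHS[birth.month - 1]
            + f"{day:02d}" + place_code.upper())
    # A 0-based even index is a 1-based odd position.
    total = sum(ODD_VALUES[c] if i % 2 == 0 else even_value(c)
                for i, c in enumerate(body))
    return body + chr(ord("A") + total % 26)


print(fiscal_code("Rossi", "Mario", date(1980, 1, 1), "M", "H501"))
# → RSSMRA80A01H501U
```

Because errors are assumed to concentrate in names and surnames, normalization mainly affects the first six characters of the rebuilt code.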
  • 15. Data integration in SES
      “Naïve” solution to the matching problem:
        begin
          • normalization step for the personal data in the LFS archive
          • definition of the fiscal code on the normalized personal data
          • <the archives from LFS and R85 can be joined by fiscal code>
        end
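The last step of the naïve solution, the join on fiscal code, can be sketched as follows. The sketch makes two assumptions. First, a fiscal code has already been computed for each LFS unit. Second, the error rate is the share of LFS units whose computed code finds no match in the census. The function name, the field names and the `isco` feature are hypothetical. Census records are copied, so the benchmark archive is never modified.

```python
def link_lfs_to_census(lfs, census, features):
    """Attach LFS features to census R85 records through the fiscal code.

    lfs:      {personal_code: {"fiscal_code": ..., feature: value, ...}}
    census:   {fiscal_code: record_dict}  (benchmark, assumed error-free)
    features: names of the LFS fields to copy onto the census records.
    Returns (enriched copy of the census, error rate), where the error rate
    is the share of LFS units whose fiscal code is not found in the census.
    """
    enriched = {fc: dict(record) for fc, record in census.items()}
    matched = 0
    for personal_code, record in lfs.items():
        target = enriched.get(record["fiscal_code"])
        if target is None:
            continue  # unmatched unit: contributes to the error rate
        matched += 1
        target["personal_code"] = personal_code  # personal code -> fiscal code
        for name in features:
            target[name] = record.get(name)
    error_rate = 1 - matched / len(lfs) if lfs else 0.0
    return enriched, error_rate


# Made-up keys: FC1 and FC2 exist in the census, FC9 does not.
lfs = {"P1": {"fiscal_code": "FC1", "isco": "23"},
       "P2": {"fiscal_code": "FC9", "isco": "41"}}
census = {"FC1": {"school": "S1"}, "FC2": {"school": "S2"}}
enriched, error_rate = link_lfs_to_census(lfs, census, ["isco"])
print(enriched["FC1"], error_rate)
# → {'school': 'S1', 'personal_code': 'P1', 'isco': '23'} 0.5
```

If two LFS units rebuild the same fiscal code, the later one overwrites the earlier one in this sketch; a production version would flag such collisions instead.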
  • 16. Data integration in SES
      Results: LFS personal data (reference year 2010), 92,129 records.
        • Before the normalization step: error rate 27.8%; 66,481 records can be matched in the fiscal archives (i.e. Modello 770/2010).
        • After the normalization step: error rate 18.5% (-9.3 percentage points).
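If the error rate is read as the share of LFS records that cannot be matched, the figures above are consistent with each other. The matched count after normalization does not appear on the slide. It is only an approximate derivation from the rounded 18.5% rate, so it is uncertain by roughly ±50 records.

```latex
\begin{aligned}
\text{error rate}_{\text{before}} &= 1 - \frac{66{,}481}{92{,}129} \approx 1 - 0.7216 = 0.2784 \approx 27.8\% \\
\text{matched}_{\text{after}} &\approx (1 - 0.185) \times 92{,}129 \approx 75{,}085 \\
\Delta &= 27.8\% - 18.5\% = 9.3 \text{ percentage points}
\end{aligned}
```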
  • 17. Estimation of local units
      Diagram: MIUR schools (school 1 to school 5) grouped into local units (local unit 1, local unit 2).
      But what if there are multi-level local units?
  • 18. Estimation of local units
      Local unit structures: a list with the columns local unit, school, addresses.
      A constraint must hold in this list: every school must be linked to one and only one local unit. In other words, every school must belong to a cluster with a unique “center of mass” (itself a school).
  • 19. Estimation of local units
      From local units to graphs: local unit = connected component.
      Example school codes: BIEE002018, BIEE002029, AGEE00101V, AGEE001042, BIEE002007, BIEE00203A, AGEE00100T, BIEE00206D, BIEE00205C.
      This list can be seen as a graph G: the vertices are the school codes and the (oriented) edges are the pairs in each row of the list.
  • 20. Estimation of local units
      Result: the local units in R85 are the connected components of G that are (oriented) trees with a single root.
      Finding the connected components of G: several algorithms compute the connected components of a graph in linear time, using either breadth-first search or depth-first search.
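The slide names breadth-first and depth-first search only generically. The following is a minimal breadth-first sketch of the idea, not the authors' implementation. Connectivity is computed on the undirected view of G. Each component is then accepted as a local unit only if it is an oriented tree with a single root. The sketch assumes that each row of the list is a pair (school, linked_school) pointing from a school towards the centre of its local unit. Under that orientation the root is the only school with no outgoing edge, and multi-level local units (chains of links) are accepted naturally. The school codes in the usage example are hypothetical.

```python
from collections import defaultdict, deque


def local_units(edges, schools=()):
    """Split the school graph into connected components (BFS on the
    undirected view, O(V + E)) and keep those that are oriented trees with
    a single root.

    edges:   (school, linked_school) pairs, one per row of the list; assumed
             to point from a school towards the centre of its local unit.
    schools: all school codes, so that isolated schools become singletons.
    Returns ({root: set_of_schools}, [rejected components]).
    """
    neighbours = defaultdict(set)  # undirected adjacency, for connectivity
    out_degree = defaultdict(int)  # oriented edges leaving each vertex
    vertices = set(schools)
    for u, v in edges:
        vertices.update((u, v))
        neighbours[u].add(v)
        neighbours[v].add(u)
        out_degree[u] += 1

    seen, units, rejected = set(), {}, []
    for start in sorted(vertices):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first search
            x = queue.popleft()
            for y in neighbours[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    queue.append(y)
        roots = [x for x in component if out_degree[x] == 0]
        n_edges = sum(out_degree[x] for x in component)
        # Oriented tree with one root: |E| = |V| - 1, exactly one vertex
        # with no outgoing edge, every other vertex with exactly one.
        if (n_edges == len(component) - 1 and len(roots) == 1
                and all(out_degree[x] == 1 for x in component if x != roots[0])):
            units[roots[0]] = component
        else:
            rejected.append(component)
    return units, rejected


edges = [("S2", "S1"), ("S3", "S1"),  # two schools linked to centre S1
         ("A1", "A2"), ("A2", "A3"),  # multi-level: A1 -> A2 -> A3
         ("B1", "B2"), ("B2", "B1")]  # cycle: no unique centre
units, rejected = local_units(edges, schools=["S9"])
print(units, rejected)
```

Rejected components, such as the two-school cycle above, violate the constraint that every school is linked to one and only one local unit. They would need manual review rather than automatic clustering.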
  • 21. Estimation of local units
      Inputs: 36,923 schools (reference year 2010), i.e. vertices of G, and 24,031 (oriented) edges in G.
      Clustering: the algorithm yields 11,892 connected components, 3,126 of which are singletons. The average component size is 3, and the largest component has 21 vertices. 512 components have fewer than 10 employees.
      Results: we considered 11,380 local units (i.e. 35,152 schools) in SES 2010.
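The descriptive statistics on this slide and the exclusion of components with fewer than 10 employees can be reproduced on any mapping of components. This is a sketch under two assumptions. The employee count of a component is the sum over its schools. The `employees` mapping is a hypothetical input, since the slide does not say where the counts come from. The toy figures in the usage example are made up and unrelated to the 2010 results.

```python
from statistics import mean


def summarize(units, employees, min_employees=10):
    """Descriptive statistics on the components and the threshold filter.

    units:     {root: set_of_schools}, one entry per component.
    employees: {school: number of employees} (hypothetical input; schools
               missing from it count as 0).
    """
    sizes = [len(schools) for schools in units.values()]
    # Keep components whose total employment reaches the threshold.
    kept = {root: schools for root, schools in units.items()
            if sum(employees.get(s, 0) for s in schools) >= min_employees}
    return {
        "components": len(sizes),
        "singletons": sum(1 for n in sizes if n == 1),
        "mean_size": mean(sizes) if sizes else 0,
        "largest": max(sizes, default=0),
        "below_threshold": len(units) - len(kept),
        "kept_units": len(kept),
        "kept_schools": sum(len(schools) for schools in kept.values()),
    }


units = {"S1": {"S1", "S2", "S3"}, "S4": {"S4"}, "S5": {"S5", "S6"}}
employees = {"S1": 5, "S2": 4, "S3": 3, "S4": 2, "S5": 10, "S6": 0}
print(summarize(units, employees))
```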
