Ws2001 sessione8 cibella_tuoto
1. Nicoletta Cibella, Tiziana Tuoto. Istituto Nazionale di Statistica (ISTAT), Direzione centrale per le tecnologie e il supporto metodologico (DCMT). RELAIS, a powerful instrument to support public statistics (Italian title: RELAIS, un valido strumento di supporto alla statistica pubblica).
2.
3.
4.
5. Possible Solutions for Record Linkage. A very fragmented picture, not only in Istat. Different approaches to deal with record linkage (RL): exact RL, deterministic RL, probabilistic RL (Fellegi and Sunter theory), Bayesian RL, machine learning, knowledge representation, … No particular technique has emerged as the best solution for all cases (maybe because such a solution does not exist…). Several software tools have been proposed, based on different approaches, free or commercial. Nicoletta Cibella, VSP, April 2011
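To make the probabilistic approach on this slide concrete, here is a minimal sketch of Fellegi and Sunter style match weights. The variable names, the m- and u-probabilities and the two thresholds are all hypothetical values chosen for illustration. They are not taken from RELAIS, where such parameters would be estimated from the data.

```python
import math

# Hypothetical m- and u-probabilities per comparison variable:
# m = P(agree | pair is a match), u = P(agree | pair is a non-match).
M_U = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.20),
}

def match_weight(agreement):
    """Sum the log2 likelihood ratios over the comparison variables."""
    total = 0.0
    for field, agrees in agreement.items():
        m, u = M_U[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def classify(weight, lower=0.0, upper=6.0):
    """Three-way decision; the thresholds are illustrative only."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"

# Agreement on surname only: weak evidence, left for clerical review.
pattern = {"surname": True, "birth_year": False, "city": False}
print(classify(match_weight(pattern)))  # prints "possible match"
```

Agreement on a variable adds a positive weight and disagreement a negative one, so pairs that fall between the two thresholds are the ones left for manual review.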
6.
7.
8.
9. 2. Choose the most appropriate techniques
10. 3. Build ad-hoc RL workflows. Phases and available techniques: Preprocessing (Normalization, UpperLowerCase); Search Space Reduction (Blocking, SNM); Comparison Function (Edit Distance, Jaro, Equality); Decision Model (Probabilistic, Deterministic). [Diagram: two example workflows, RecLink WF Appl1 and RecLink WF Appl2, each assembled from these techniques.]
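The idea of assembling a workflow from one technique per phase can be sketched as follows. This is only an illustration, not RELAIS code: the function names, the record layout (dictionaries with an `id` field) and the `max_distance` threshold are assumptions. The sketch combines normalization, blocking, edit distance and a deterministic rule.

```python
def normalize(value):
    """Preprocessing: collapse whitespace and convert to upper case."""
    return " ".join(value.split()).upper()

def edit_distance(a, b):
    """Comparison function: Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def block_pairs(file_a, file_b, key):
    """Search space reduction: pair only records sharing the blocking key."""
    index = {}
    for rec in file_b:
        index.setdefault(rec[key], []).append(rec)
    for rec_a in file_a:
        for rec_b in index.get(rec_a[key], []):
            yield rec_a, rec_b

def run_workflow(file_a, file_b, block_key, compare_field, max_distance=1):
    """Decision model: deterministic rule on the edit distance."""
    links = []
    for a, b in block_pairs(file_a, file_b, block_key):
        d = edit_distance(normalize(a[compare_field]), normalize(b[compare_field]))
        if d <= max_distance:
            links.append((a["id"], b["id"]))
    return links

file_a = [{"id": "a1", "name": "Rossi  Mario", "year": 1970},
          {"id": "a2", "name": "Bianchi Anna", "year": 1982}]
file_b = [{"id": "b1", "name": "ROSSI MARIO", "year": 1970},
          {"id": "b2", "name": "bianchi anne", "year": 1982},
          {"id": "b3", "name": "Verdi Luca", "year": 1970}]
print(run_workflow(file_a, file_b, "year", "name"))
# prints [('a1', 'b1'), ('a2', 'b2')]
```

Swapping `block_pairs` for another reduction method, or the deterministic rule for a probabilistic one, would give a different workflow without touching the other phases, which is the point of the modular design.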
11.
12. RELAIS and the open-source EUPL (European Union Public Licence). Adopting the open-source philosophy and moving beyond ad-hoc approaches has proved a winning choice. Sharing experiences and solutions with the NSIs of Spain, UK, Tunisia, Brazil, … On-the-job training in the UK in January 2011 and in Latvia in July. Thanks to the modular approach and the open-source licence, adding new techniques to the pool already available is really easy.
13.
14. Add-ons in RELAIS 2.0. Relational database architecture, to optimize performance when managing huge amounts of data throughout the whole record linkage project (input, intermediate phases and output). Two modes for processing blocks: a) step-by-step execution, when blocks are few or in an exploratory phase, and b) one-shot execution, to deal with a large number of blocks (at the suggestion of the Spanish NSI). Explicit management of the output and residual files, to iterate several processes, and backup management.
15. RELAIS 2.1 in May 2010. RELAIS 2.1 is already available on the OSOR and Istat websites. Relational database support: data input from Oracle or MySQL databases. New default input values for the parameter estimation of the probabilistic model and a new definition of the candidate pairs for the optimal 1:1 reduction. More than one variable can be used for search space reduction by the sorted neighborhood method. Minor bugs have been fixed.
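As an illustration of the sorted neighborhood method with more than one sorting variable, the sketch below merges the two files, sorts them on a composite key and keeps only the cross-file pairs that fall inside a sliding window. The function name, record layout and window size are assumptions made for the example and do not reflect how RELAIS implements the method.

```python
def sorted_neighborhood_pairs(file_a, file_b, sort_keys, window=3):
    """Yield (id_a, id_b) for cross-file records within the same window."""
    tagged = [("A", r) for r in file_a] + [("B", r) for r in file_b]
    # Composite sort key built from several variables, compared as
    # upper-case strings (adequate for same-length values such as years).
    tagged.sort(key=lambda t: tuple(str(t[1][k]).upper() for k in sort_keys))
    for i, (src_i, rec_i) in enumerate(tagged):
        # Each record is compared with the next (window - 1) records.
        for src_j, rec_j in tagged[i + 1 : i + window]:
            if src_i == src_j:
                continue  # keep only pairs made of one record per file
            a, b = (rec_i, rec_j) if src_i == "A" else (rec_j, rec_i)
            yield a["id"], b["id"]

file_a = [{"id": "a1", "surname": "Rossi", "year": 1970},
          {"id": "a2", "surname": "Bianchi", "year": 1982}]
file_b = [{"id": "b1", "surname": "Rossi", "year": 1971},
          {"id": "b2", "surname": "Bianco", "year": 1982},
          {"id": "b3", "surname": "Verdi", "year": 1965}]
pairs = list(sorted_neighborhood_pairs(file_a, file_b, ["surname", "year"], window=2))
print(pairs)  # prints [('a2', 'b2'), ('a1', 'b2'), ('a1', 'b1')]
```

With a window of 2 only adjacent records are paired, so 3 of the 6 possible cross-file pairs survive. A larger window trades more comparisons for a lower risk of missing true matches.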