Methodological innovations to estimate illegal economy
Maria Francesca Romano
Institute of Economics & EMbeDS - Scuola Superiore Sant’Anna
Aims
Starting point
o Research directed by Guido M. Rey resulted in the volume «La mafia come impresa. Analisi del sistema economico criminale e delle politiche di contrasto» (2017).
o The chapter «Dalle parole ai numeri: estrarre dati dalle sentenze della magistratura» presents the results obtained from the analysis of about 5,000 judgments issued by the Corte di Cassazione.
Goals
o Extend the results obtained from the text mining of judgments through the interaction of multiple data sources.
o Evaluate the completeness and reliability of the data.
o Organize database(s) aimed at estimating statistical models.
Exercise:
Integration of data from multiple sources
① Judgments issued by the Corte di Cassazione (www.italgiure.it): Open Data PA
② Orbis: database of economic enterprises, accessible with the resources of EMbeDS (Economics and Management in the era of Data Science), a project selected in the MIUR call for Departments of Excellence 2018-2022, http://embeds.santannapisa.it/
A subset of 308 judgments was extracted from the 4,632 selected judgments (from 2012 to September 2016) containing one or more of the words “corruzione”, “concussione”, “turbativa” and “appalto”. The subset consists of judgments:
• Issued in 2014
• Containing references to professional roles held in the Public Administration
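The selection described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the `Judgment` record, the `select_subset` function and the `role_terms` list (standing in for "references to professional roles held in the Public Administration") are hypothetical names introduced here.

```python
import re
from dataclasses import dataclass

# Keywords listed on the slide.
KEYWORDS = ("corruzione", "concussione", "turbativa", "appalto")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


@dataclass
class Judgment:
    """Hypothetical minimal record for one judgment."""
    judgment_id: str
    year: int
    text: str


def select_subset(judgments, year=2014, role_terms=("pubblico ufficiale",)):
    """Keep judgments of the given year that contain at least one keyword
    and at least one term denoting a role in the Public Administration.
    role_terms is an assumed, illustrative list."""
    selected = []
    for j in judgments:
        text = j.text.lower()
        has_keyword = KEYWORD_RE.search(text) is not None
        has_role = any(term in text for term in role_terms)
        if has_keyword and j.year == year and has_role:
            selected.append(j)
    return selected
```

In practice the list of role terms would have to come from the semantic tagging step rather than from a fixed tuple.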
Step 1:
Import texts of judgments and text mining
Through the TalTaC2 package:
o Creation of a corpus with the texts of the judgments
o Vocabulary (words and lemmas)
o Grammatical and semantic tagging
o Identification of multiwords and segments
o Text mining
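To give a feel for the vocabulary and multiword steps, here is a minimal Python sketch. It does not reproduce TalTaC2: the tokenizer, `build_vocabulary` and `repeated_segments` are hypothetical helpers, and counting repeated word pairs is a simplified stand-in for multiword identification.

```python
import re
from collections import Counter

# Runs of letters (accented letters included); digits and punctuation split tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def tokenize(text):
    """Lower-cased word forms of one text."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_vocabulary(corpus):
    """Frequency of each word form over the whole corpus."""
    vocab = Counter()
    for text in corpus:
        vocab.update(tokenize(text))
    return vocab


def repeated_segments(corpus, n=2, min_count=2):
    """Sequences of n consecutive words occurring at least min_count times:
    candidate multiwords such as 'pubblico ufficiale'."""
    segments = Counter()
    for text in corpus:
        tokens = tokenize(text)
        segments.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(s): c for s, c in segments.items() if c >= min_count}
```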
Information chart
As the following figure shows, the centre of the information is the single criminal event, which involves actors (individuals or groups), is detected / sanctioned, takes place in a specific geographical location, on a definite date (or period), in a specific manner, and with a definite economic value.
[Figure: diagram centred on the criminal event (WHAT). It involves persons (WHO) who belong to / work for a criminal association, a public body or a company, acting together with others or to the detriment of others; it is sanctioned / detected by a court or the police; it is linked to a period (WHEN), a place (WHERE), a manner (HOW) and an economic value in euro.]
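The information chart can be read as a record layout. The sketch below is one possible Python rendering of it; the class and field names are assumptions made for illustration, not the schema used in the project.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actor:
    """WHO: a person or an aggregate involved in the event."""
    name: str
    kind: str   # e.g. "person", "criminal association", "public body", "company"
    role: str   # e.g. "defendant", "injured party"


@dataclass
class CriminalEvent:
    """One criminal event extracted from a judgment (hypothetical layout)."""
    judgment_id: str
    what: str                                   # type of offence
    how: Optional[str] = None                   # manner
    where: Optional[str] = None                 # geographical location
    when: Optional[str] = None                  # date or period
    economic_value_eur: Optional[float] = None
    detected_by: Optional[str] = None           # e.g. "court", "police"
    who: List[Actor] = field(default_factory=list)

    def missing_fields(self):
        """Names of the dimensions not filled in: a simple completeness check."""
        missing = [name for name in
                   ("how", "where", "when", "economic_value_eur", "detected_by")
                   if getattr(self, name) is None]
        if not self.who:
            missing.append("who")
        return missing
```

A completeness check of this kind makes it visible which dimensions of the chart a given judgment actually supplies.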
Guidelines followed for matching with Orbis
-- The matching procedure must be automatic or automatable: repeatable with lists obtained from a larger number of judgments, without any "manual" choices.
-- The presence of data / information on natural persons in clear text does not pose privacy problems, because this information is not extracted "per se": it is the premise for obtaining a correct and reliable match, and the data are still treated statistically (anonymously).
Step 2:
Matching with Orbis (1)
«Batch search» (automatic) in two consecutive steps:
• Companies: list obtained from TalTaC2 by exporting the company name and the identifier of the judgment
• Persons (defendants): list of defendants obtained from TalTaC2 by exporting the graphic forms with the semantic tag «defendants» (multiword graphic forms with name and surname, or surname and name) and the date of birth
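As an illustration of how the two input lists could be prepared, the following Python sketch writes them as CSV text. The column names and the layout are assumptions (the actual Orbis batch-search input format is not described in the slides), and `name_key` is a hypothetical helper that makes "name surname" and "surname name" comparable.

```python
import csv
import io


def name_key(full_name):
    """Order-insensitive key: 'Mario Rossi' and 'ROSSI MARIO' give the same key."""
    return " ".join(sorted(full_name.upper().split()))


def company_batch(companies):
    """companies: iterable of (company_name, judgment_id) pairs."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["company_name", "judgment_id"])
    writer.writerows(companies)
    return out.getvalue()


def person_batch(defendants):
    """defendants: iterable of (full_name, date_of_birth) pairs.
    Repetitions of the same person (same name key and date) are dropped."""
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "date_of_birth"])
    for name, dob in defendants:
        key = (name_key(name), dob)
        if key in seen:
            continue
        seen.add(key)
        writer.writerow([name, dob])
    return out.getvalue()
```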
Step 2:
Matching with Orbis (2)
RESULTS of the «Batch search» (automatic) in two consecutive steps:
• Company input: 400 companies, of which 228 with an A score, corresponding to 186 unique companies (a company name may appear in several judgments, or be written by the judges in several variants)
• Person input (defendants): 408 defendants (unique, no repetitions), yielding 16 validated records (automatic comparison between the date of birth and part of the social security number) + 6 individual companies
The automated process produces a matching score for each record. Our quality indicator uses the following scoring criteria:
A Excellent: total score >= 95%
B Good: total score between 85 and 94%
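The two checks mentioned above (grading the matching score and comparing the date of birth with part of the identification number) can be sketched as follows. The sketch assumes that the "social security number" is the Italian codice fiscale, whose characters 7 to 11 encode year, month and day of birth; the function names are hypothetical, the check character is not verified, and codes altered for homocody are simply rejected.

```python
from datetime import date

# Month letters used in positions 9 of the codice fiscale (January..December).
MONTH_CODES = "ABCDEHLMPRST"


def grade(score):
    """Quality class from the matching score (in percent), as on the slide."""
    if score >= 95:
        return "A"
    if score >= 85:
        return "B"
    return None  # below the two classes reported on the slide


def birth_matches_fiscal_code(fiscal_code, birth):
    """True if the birth date agrees with characters 7-11 of the code."""
    code = fiscal_code.strip().upper()
    if len(code) != 16:
        return False
    year, month, day = code[6:8], code[8], code[9:11]
    if not (year.isdigit() and day.isdigit()) or month not in MONTH_CODES:
        return False
    day_number = int(day)
    if day_number > 40:          # 40 is added to the day for women
        day_number -= 40
    return (int(year) == birth.year % 100
            and MONTH_CODES.index(month) + 1 == birth.month
            and day_number == birth.day)
```

The code in the usage below is invented for illustration: `birth_matches_fiscal_code("RSSMRA60A15H501X", date(1960, 1, 15))` compares only the birth-date segment `60A15`.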
Step 3:
Information contribution from Orbis: variables with high information potential
What data do we add to those already available?
• Company status
• Business size
• Statistical classification of activities
• Start year
• Balance sheet data
• ….
BUT ALSO THE NAMES OF THE TOP MANAGEMENT AND OWNERS
Again with a view to anonymous treatment, these names can be used to identify a network of companies.
They are not of interest "per se" (we are not a detective agency), but as holders of other individual and / or family companies (founded after the outcome of the judgment).
NB: the names of the defendants appear in clear text in the Corte di Cassazione source, as it is the last court level.
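One way to identify a network of companies while keeping names out of the result is sketched below in Python. It is an illustration under assumptions: `pseudonym` (a salted hash) and `company_networks` are hypothetical helpers, and linking companies that share at least one manager or owner is only one possible definition of "network".

```python
import hashlib
from collections import defaultdict


def pseudonym(name, salt="demo-salt"):
    """Replace a name with a salted hash so that names never appear in the output."""
    return hashlib.sha256((salt + name.upper()).encode("utf-8")).hexdigest()[:12]


def company_networks(links):
    """links: iterable of (company_id, person_name).
    Returns groups of companies connected through shared persons."""
    by_person = defaultdict(set)
    companies = set()
    for company, person in links:
        by_person[pseudonym(person)].add(company)
        companies.add(company)

    parent = {c: c for c in companies}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for group in by_person.values():
        first, *rest = sorted(group)
        for other in rest:
            parent[find(other)] = find(first)

    components = defaultdict(set)
    for c in companies:
        components[find(c)].add(c)
    return sorted((sorted(g) for g in components.values()), key=lambda g: g[0])
```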
TalTaC results:
The automatic classification of judgments
How to interpret clusters
First 11 words characterizing the 2 main identified clusters:

Cluster 1 (n=119):                        Cluster 2 (n=177):
presence of organized crime               concussione (extortion by a public official) / corruption in the PA
cosca (mafia clan)                        pubblico ufficiale (public official)
associazione mafiosa (mafia association)  concussione (extortion by a public official)
associazione (association)                privato (private party)
Nome1                                     costrizione (coercion)
sodalizio (criminal fellowship)           corruzione (corruption)
partecipazione (participation)            induzione (inducement)
conversazione (conversation)              servizio (service)
estorsione (extortion)                    CP (criminal code)
ndrangheta                                ufficio (office)
clan                                      abuso (abuse)
Nome2                                     prescrizione (statute of limitations)
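A simple way to obtain words that characterize a cluster is to compare each word's relative frequency inside the cluster with its relative frequency in the whole corpus. The Python sketch below uses that ratio; it is a simplified stand-in and not the specificity measure computed by TalTaC2, and `characteristic_words` is a hypothetical function name.

```python
from collections import Counter


def characteristic_words(docs_by_cluster, top=3, min_count=2):
    """docs_by_cluster: {cluster_id: [list of tokens per document]}.
    Ranks words by (relative frequency in cluster) / (relative frequency overall)."""
    cluster_counts = {
        cluster: Counter(word for doc in docs for word in doc)
        for cluster, docs in docs_by_cluster.items()
    }
    total = Counter()
    for counts in cluster_counts.values():
        total.update(counts)
    grand_total = sum(total.values())

    result = {}
    for cluster, counts in cluster_counts.items():
        size = sum(counts.values())
        scored = []
        for word, n in counts.items():
            if n < min_count:
                continue  # ignore rare words, whose ratios are unstable
            ratio = (n / size) / (total[word] / grand_total)
            scored.append((ratio, word))
        # Highest ratio first; ties broken alphabetically.
        scored.sort(key=lambda item: (-item[0], item[1]))
        result[cluster] = [word for _, word in scored[:top]]
    return result
```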
Not just text mining but help in the interpretation
The novelty of the approach presented here is the interaction between the results of the textual analysis and the new information that can be acquired from other databases (administrative or not).
The questions we would like to answer:
Do the companies present in the judgments have different characteristics from those not present?
Do the companies present in the judgments differ according to the cluster they belong to?
Example: do they differ by company size, economic sector, geographical location?
Regions and companies by cluster
Provisional and partial data
Cluster 1: crimes + organized crime. Cluster 2: crimes and PA.

Region           Cluster 1                   Cluster 2                   Total
                 # judgments   # companies   # judgments   # companies   # judgments   # companies
Abruzzo          -             -             1             1             1             1
Calabria         11            33            1             1             12            34
Campania         6             21            8             13            14            34
Emilia-Romagna   -             -             2             7             2             7
Lazio            1             1             5             6             6             7
Liguria          1             1             1             2             2             3
Lombardia        1             13            6             17            7             30
Marche           -             -             3             8             3             8
Molise           -             -             1             1             1             1
Piemonte         -             -             1             10            1             10
Puglia           -             -             5             14            5             14
Sardegna         -             -             1             1             1             1
Sicilia          5             13            5             11            10            24
Toscana          1             1             4             10            5             11
Veneto           -             -             4             17            4             17
Total            26            83            48            119           74            202
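As a small worked example on the totals of the table above, the average number of companies per judgment can be computed for each cluster. The figures are the provisional totals reported in the table; the dictionary and function names are introduced here only for illustration.

```python
# (judgments, companies) totals from the table: provisional and partial data.
TOTALS = {
    "Cluster 1": (26, 83),
    "Cluster 2": (48, 119),
    "Total": (74, 202),
}


def companies_per_judgment(totals):
    """Average number of companies mentioned per judgment, by cluster."""
    return {name: companies / judgments
            for name, (judgments, companies) in totals.items()}


for name, value in companies_per_judgment(TOTALS).items():
    print(f"{name}: {value:.2f} companies per judgment")
```

On these totals the judgments in Cluster 1 involve about 3.19 companies each, against about 2.48 in Cluster 2.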
Companies by national legal form
Provisional and partial data

National legal form                                 Number of companies
Consortium + Consortium with external activity      4
Cooperative company (SCARL + SCARLPA)               4
Joint stock company - SPA                           25
Limited liability company - SRL                     121
Limited partnership - SAS                           2
One-person company with limited liability - SRLU    21
One-person joint stock company - SPA                3
Sole proprietorship                                 2
n.d.                                                4
Total                                               186

To be added: 22 one-person companies obtained from the list of defendants.
Companies by status
Provisional and partial data

Status                             Number of companies
Active                             135
Active (default of payment)        1
Bankruptcy                         1
Dissolved                          5
Dissolved (bankruptcy)             16
Dissolved (liquidation)            5
Dissolved (merger or take-over)    6
In liquidation                     11
Status unknown                     6
Total                              186
Companies by geographical area and status
Provisional and partial data

Area                   Active   Others   Status unknown   Total
ITC - Northwest        29       12       1                42
ITH - Northeast        22       12       -                34
ITI - Centre           33       9        -                42
ITF - South            26       8        4                38
ITG - Insular Italy    15       4        1                20
(blank)                10       0        -                10
Total                  135      45       6                186

Others: Active (default of payment), Bankruptcy, Dissolved, Dissolved (bankruptcy), Dissolved (liquidation), Dissolved (merger or take-over), In liquidation.
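As a worked example on the table by geographical area and status, the share of active companies in each area can be computed and the row totals checked for consistency. The counts are those reported in the table (provisional and partial); the variable and function names are assumptions made for this sketch, and an empty "status unknown" cell is read as 0.

```python
# (active, others, status unknown) per area, from the table above.
AREAS = {
    "ITC - Northwest": (29, 12, 1),
    "ITH - Northeast": (22, 12, 0),
    "ITI - Centre": (33, 9, 0),
    "ITF - South": (26, 8, 4),
    "ITG - Insular Italy": (15, 4, 1),
    "(blank)": (10, 0, 0),
}


def active_share(areas):
    """Share of active companies over all companies in each area."""
    return {area: active / (active + others + unknown)
            for area, (active, others, unknown) in areas.items()}


def column_totals(areas):
    """Totals of the three status columns, to check against the 'Total' row."""
    return tuple(sum(row[i] for row in areas.values()) for i in range(3))


for area, share in active_share(AREAS).items():
    print(f"{area}: {share:.1%} active")
```

Overall, 135 of the 186 companies (about 72.6%) are active.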
Discussion
The potential sources of data and information are many, and each one is organized according to its own purposes.
Using them for statistical purposes requires taking into account some aspects that are sometimes neglected when talking about Big Data or Open Data:
• The completeness of the information
• The time reference of the information acquired or that could be acquired
Final goal: the «statistical» database
The database thus obtained will allow reconstructions and analyses starting from any element (company, public body, persons, period, place, etc.), provided that it is correctly identified as such within the texts of the judgments.
It is therefore necessary to use several tools:
• Text mining, to process the information contained in the texts of the judgments and transform it into data that can be analysed statistically
• Validation and integration of these data with other information and data from other administrative databases / records
The greater the completeness and reliability of the other databases, the greater the information value of the statistical analysis carried out on the statistical database.
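A minimal relational layout for such a database can be sketched with Python's built-in `sqlite3` module. The table and column names below are assumptions chosen for illustration, not the structure adopted in the project; the query shows a reconstruction that starts from one element (the cluster of the judgment) and reaches another (the status of the linked companies).

```python
import sqlite3

# Hypothetical schema: judgments, companies, and the links between them.
SCHEMA = """
CREATE TABLE judgment (judgment_id TEXT PRIMARY KEY, year INTEGER, cluster INTEGER);
CREATE TABLE company (company_id TEXT PRIMARY KEY, region TEXT, status TEXT);
CREATE TABLE judgment_company (
    judgment_id TEXT REFERENCES judgment(judgment_id),
    company_id TEXT REFERENCES company(company_id),
    PRIMARY KEY (judgment_id, company_id)
);
"""


def build_database(judgments, companies, links):
    """Create an in-memory database and load the three lists of tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO judgment VALUES (?, ?, ?)", judgments)
    conn.executemany("INSERT INTO company VALUES (?, ?, ?)", companies)
    conn.executemany("INSERT INTO judgment_company VALUES (?, ?)", links)
    return conn


def companies_by_cluster_and_status(conn):
    """Number of distinct companies for each (cluster, status) pair."""
    query = """
        SELECT j.cluster, c.status, COUNT(DISTINCT c.company_id)
        FROM judgment j
        JOIN judgment_company jc ON jc.judgment_id = j.judgment_id
        JOIN company c ON c.company_id = jc.company_id
        GROUP BY j.cluster, c.status
        ORDER BY j.cluster, c.status
    """
    return conn.execute(query).fetchall()
```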
Credits
Thanks to:
Fabrizio Alboni
Daniela Arlia
Antonella Baldassarini
Lorenzo Bartalini
Pietro Battiston
Sergio Bolasco
Alberto di Martino
Giuseppe Di Vetta
Pasquale Pavone
Guido M. Rey

Contenu connexe

Similaire à Maria F.Romano, Methodological innovations to estimate illegal economy

Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY Indiagauravmiishra701
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY Indiaaparnatikekar4
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY Indiasathish kriishnan
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaNishantSisodiya
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaNina Yadav
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaNishantSisodiya
 
Forensic Technology & Discovery Services: The Intelligent Connection - EY India
Forensic Technology & Discovery Services: The Intelligent Connection - EY IndiaForensic Technology & Discovery Services: The Intelligent Connection - EY India
Forensic Technology & Discovery Services: The Intelligent Connection - EY Indiasathish kriishnan
 
Managing Information Risk in Financial Services
Managing Information Risk in Financial Services Managing Information Risk in Financial Services
Managing Information Risk in Financial Services Andrew Smart
 
Factors of Doing Business, Case Study Kosovo
Factors of Doing Business, Case Study KosovoFactors of Doing Business, Case Study Kosovo
Factors of Doing Business, Case Study KosovoAJHSSR Journal
 
2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...
2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...
2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...Bruce Collins
 
Estimating The Size of the Irish Population
Estimating The Size of the Irish PopulationEstimating The Size of the Irish Population
Estimating The Size of the Irish PopulationAlan McSweeney
 
Information Innovation: Turning Insights into Opportunities
Information Innovation: Turning Insights into OpportunitiesInformation Innovation: Turning Insights into Opportunities
Information Innovation: Turning Insights into OpportunitiesHubbard One
 
Tisski Ltd Freedom of Information White Paper
Tisski Ltd Freedom of Information White PaperTisski Ltd Freedom of Information White Paper
Tisski Ltd Freedom of Information White PaperKatie Weir
 
Out of the shadows with fiscal compliance technology White Paper Retail Innov...
Out of the shadows with fiscal compliance technology White Paper Retail Innov...Out of the shadows with fiscal compliance technology White Paper Retail Innov...
Out of the shadows with fiscal compliance technology White Paper Retail Innov...Marie Ivarsson
 
Forward thinking q42016
Forward thinking q42016Forward thinking q42016
Forward thinking q42016Nichole Jordan
 
VAT fraud detection : the mysterious case of the missing trader
VAT fraud detection : the mysterious case of the missing traderVAT fraud detection : the mysterious case of the missing trader
VAT fraud detection : the mysterious case of the missing traderLinkurious
 

Similaire à Maria F.Romano, Methodological innovations to estimate illegal economy (20)

Peta Pilot
Peta PilotPeta Pilot
Peta Pilot
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY India
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY India
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY India
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY India
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY India
 
Evolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY IndiaEvolution of Forensic Data Analytics - EY India
Evolution of Forensic Data Analytics - EY India
 
Forensic Technology & Discovery Services: The Intelligent Connection - EY India
Forensic Technology & Discovery Services: The Intelligent Connection - EY IndiaForensic Technology & Discovery Services: The Intelligent Connection - EY India
Forensic Technology & Discovery Services: The Intelligent Connection - EY India
 
Managing Information Risk in Financial Services
Managing Information Risk in Financial Services Managing Information Risk in Financial Services
Managing Information Risk in Financial Services
 
Factors of Doing Business, Case Study Kosovo
Factors of Doing Business, Case Study KosovoFactors of Doing Business, Case Study Kosovo
Factors of Doing Business, Case Study Kosovo
 
2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...
2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...
2016 - TIA NSW 9th Annual Tax Forum - Using data and analytics to support pub...
 
Estimating The Size of the Irish Population
Estimating The Size of the Irish PopulationEstimating The Size of the Irish Population
Estimating The Size of the Irish Population
 
CASE Network Studies and Analyses 340 - The Polish tax system - What has been...
CASE Network Studies and Analyses 340 - The Polish tax system - What has been...CASE Network Studies and Analyses 340 - The Polish tax system - What has been...
CASE Network Studies and Analyses 340 - The Polish tax system - What has been...
 
Just in case, December 2016
Just in case, December 2016Just in case, December 2016
Just in case, December 2016
 
Information Innovation: Turning Insights into Opportunities
Information Innovation: Turning Insights into OpportunitiesInformation Innovation: Turning Insights into Opportunities
Information Innovation: Turning Insights into Opportunities
 
Tisski Ltd Freedom of Information White Paper
Tisski Ltd Freedom of Information White PaperTisski Ltd Freedom of Information White Paper
Tisski Ltd Freedom of Information White Paper
 
Out of the shadows with fiscal compliance technology White Paper Retail Innov...
Out of the shadows with fiscal compliance technology White Paper Retail Innov...Out of the shadows with fiscal compliance technology White Paper Retail Innov...
Out of the shadows with fiscal compliance technology White Paper Retail Innov...
 
Forward thinking q42016
Forward thinking q42016Forward thinking q42016
Forward thinking q42016
 
VAT fraud detection : the mysterious case of the missing trader
VAT fraud detection : the mysterious case of the missing traderVAT fraud detection : the mysterious case of the missing trader
VAT fraud detection : the mysterious case of the missing trader
 
Getting to grips with the BEPS Action Plan
Getting to grips with the BEPS Action PlanGetting to grips with the BEPS Action Plan
Getting to grips with the BEPS Action Plan
 

Plus de Istituto nazionale di statistica

Plus de Istituto nazionale di statistica (20)

Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
14a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica1414a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica14
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 

Dernier

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 

Dernier (20)

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 

Maria F.Romano, Methodological innovations to estimate illegal economy

  • 1. Methodological innovations to estimate illegal economy Maria Francesca Romano Institute of Economics & EMbeDS - Scuola Superiore Sant’Anna 0
  • 2. o A research directed by Guido M. Rey has resulted in the volume «La mafia come impresa. Analisi del sistema economico criminale e dele politiche di contrasto» (2017) o In the chapter «Dalle parole ai numeri : estrarre dati dalle sentenze della magistratura» the results obtained from the analysis of about 5,000 judgements issued by the Corte di Cassazione are presented. o Increase the results obtained from the text mining of sentences through the interaction of multiple data sources. o Evaluation of completeness and reliability of data. o Organize database(s) aimed at estimating statistical models 1 Maria Francesca Romano Institute of Economics & EMbeDS - Scuola Superiore Sant’Anna 1 Aims Starting point Goals
  • 3. Exercise: Integration of data from multiple sources
      ① Judgments issued by the Corte di Cassazione (www.italgiure.it): Open Data PA
      ② Orbis: database of economic enterprises, accessible with the resources of EMbeDS (Economics and Management in the era of Data Science), a project selected in the MIUR call for Departments of Excellence 2018-2022, http://embeds.santannapisa.it/
      A subset of 308 judgments was extracted from the 4,632 selected judgments (from 2012 to September 2016) that contain one or more of the words “corruzione”, “concussione”, “turbativa” and “appalto”. The subset consists of judgments:
      • issued in 2014
      • containing references to professional roles held in the Public Administration
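The keyword and year criteria on this slide can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used (the slides rely on TalTaC2): the `Judgment` record, the `select_judgments` helper and the sample texts are hypothetical, and the third criterion (references to professional roles in the Public Administration) is not covered.

```python
import re
from dataclasses import dataclass

# The four keywords quoted on the slide.
KEYWORDS = ("corruzione", "concussione", "turbativa", "appalto")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


@dataclass
class Judgment:
    """Hypothetical record layout for one judgment."""
    ident: str
    year: int
    text: str


def select_judgments(judgments, year=None):
    """Keep judgments with at least one keyword, optionally from a single year."""
    return [
        j for j in judgments
        if KEYWORD_RE.search(j.text) and (year is None or j.year == year)
    ]


# Made-up sample texts, for illustration only.
sample = [
    Judgment("a", 2014, "Il reato di corruzione ..."),
    Judgment("b", 2014, "Furto aggravato ..."),
    Judgment("c", 2013, "Turbativa d'asta e appalto ..."),
]
selected = select_judgments(sample, year=2014)
```

Whole-word matching is a design choice here: it does not match inflected forms such as “appalti”, which a lemma-based tool would catch.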
  • 4. Step 1: Import texts of judgments and text mining
      Through the TalTaC2 package:
      o Creation of a corpus with the texts of the judgments
      o Vocabulary (words and lemmas)
      o Grammatical and semantic tagging
      o Identification of multiwords and segments
      o Text mining
  • 5. Information chart
      As the figure shows, the information is centred on the single criminal event, which involves actors (individuals or groups), is detected / sanctioned, takes place in a specific geographical location, on a definite date (or period), in a particular manner, and with a given economic value.
      [Figure: information chart centred on the criminal event (WHAT). Links: involves (WHO: person, criminal association, public body, company; a person is part of / works for an organization), together with, to the detriment of, sanctioned / detected by (court, police), when (WHEN: period), where (WHERE: place), how (HOW), economic value (euro).]
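The event-centred structure of the information chart can be sketched as a small data model. All class and field names below are hypothetical and simply mirror the WHO / WHEN / WHERE / WHAT / HOW / economic value labels; the slides do not specify the actual database layout, and the example values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actor:
    name: str
    kind: str  # "person", "criminal association", "public body" or "company"


@dataclass
class CriminalEvent:
    what: str                                          # type of criminal event
    who: List[Actor] = field(default_factory=list)     # actors involved
    harmed: List[Actor] = field(default_factory=list)  # "to the detriment of"
    detected_by: Optional[str] = None                  # "court" or "police"
    when: Optional[str] = None                         # date or period
    where: Optional[str] = None                        # geographical location
    how: Optional[str] = None                          # manner
    value_eur: Optional[float] = None                  # economic value in euro


# Made-up example record: unknown attributes simply stay empty.
event = CriminalEvent(
    what="corruzione",
    who=[Actor("Company X", "company")],
    when="2014",
    where="Toscana",
)
```

Keeping every attribute optional reflects the point that a judgment may not state all of them.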
  • 6. Guidelines followed for matching with Orbis
      -- The matching procedure must be automatic or automatable: repeatable with lists obtained from a larger number of judgments, and without "manual" choices.
      -- The presence of data / information on natural persons in clear text does not pose privacy problems, because this information is not extracted "per se" but is the premise for a correct and reliable matching: the data are still treated statistically (anonymously).
  • 7. Step 2: Matching with Orbis (1)
      «Batch search» (automatic) in two consecutive steps:
      o Companies: list obtained from TalTaC2 by exporting the company name and the identifier of the judgment
      o Persons (defendants): list obtained from TalTaC2 by exporting the graphic forms with the semantic tag «defendants» (multiword graphic forms with name and surname, or surname and name) and the date of birth
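Since defendants appear either as name and surname or as surname and name, an automatic comparison needs a key that ignores word order. A minimal sketch follows, assuming simple whitespace-separated names; `name_key` is a hypothetical helper, not part of TalTaC2 or Orbis, and the names are invented.

```python
import unicodedata


def name_key(full_name):
    """Order-insensitive, case-insensitive and accent-insensitive key for a name."""
    # Decompose accented letters, then drop the combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return " ".join(sorted(ascii_name.lower().split()))


same_person = name_key("Mario Rossi") == name_key("ROSSI Mario")
```

Such a key cannot tell homonyms apart, which is why the slide pairs the name with the date of birth.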
  • 8. Step 2: Matching with Orbis (2)
      RESULTS of the «Batch search» (automatic) in two consecutive steps:
      o Companies. Input: 400 companies, of which 228 with an A score, corresponding to 186 unique companies (the same company name appears in several judgments, or is written by the judges in several variants)
      o Persons (defendants). Input: 408 defendants (unique, no repetitions); 16 validated records (automatic comparison between the date of birth and part of the social security number) + 6 individual companies
      The automated process produces a matching score for each record. Our quality indicator uses the following scoring criteria:
      A  Excellent: total score >= 95%
      B  Good: total score between 85 and 94%
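The two quality classes and the date-of-birth check can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: it assumes that the "social security number" is the Italian codice fiscale, whose characters 7 to 11 encode the year, month and day of birth (with 40 added to the day for women), and that those characters are available. Both function names are hypothetical, and the sample codes are textbook-style examples, not real data.

```python
from datetime import date

# Month letters of the codice fiscale, January to December.
MONTH_LETTERS = "ABCDEHLMPRST"


def grade(score):
    """Map a total matching score (percent) to the quality classes on the slide."""
    if score >= 95:
        return "A"
    if score >= 85:
        return "B"  # scores between 94 and 95 also fall here (assumption)
    return None     # below the two classes listed on the slide


def birth_matches_fiscal_code(birth, fiscal_code):
    """Compare a date of birth with characters 7-11 of a codice fiscale."""
    part = fiscal_code.upper()[6:11]
    if len(part) < 5 or not (part[:2].isdigit() and part[3:].isdigit()):
        return False  # too short, or letter substitutions not handled here
    if part[2] not in MONTH_LETTERS:
        return False
    year2 = int(part[:2])
    month = MONTH_LETTERS.index(part[2]) + 1
    day = int(part[3:])
    if day > 40:      # 40 is added to the day of birth for women
        day -= 40
    return (birth.year % 100, birth.month, birth.day) == (year2, month, day)


ok = birth_matches_fiscal_code(date(1980, 1, 1), "RSSMRA80A01H501U")
```

Only two digits of the year are encoded, so the check cannot distinguish birth years a century apart.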
  • 9. Step 3: Information contribution from Orbis: variables with high information potential
      What data do we add to those already available?
      o Company status
      o Business size
      o Statistical classification of activities
      o Start year
      o Financial statement data
      o ….
      BUT ALSO THE NAMES OF THE TOP MANAGEMENT AND OWNERS
      Again with a view to anonymous treatment, they can be used to identify a network of companies. The names are not of interest "per se" (we are not a detective agency), but as holders of other individual and / or family companies (founded after the outcome of the judgment).
      NB: the names of the defendants appear in clear text in the Corte di Cassazione source, as it is the court of last instance.
  • 10. TalTaC results: The automatic classification of judgments
  • 11. How to interpret clusters
      First 11 words characterizing the 2 main identified clusters:
      Cluster 1 (n=119): presence of organized crime | Cluster 2 (n=177): concussione / corruption in the PA
      cosca                 | pubblico ufficiale
      associazione mafiosa  | concussione
      associazione          | privato
      Nome1                 | costrizione
      sodalizio             | corruzione
      partecipazione        | induzione
      conversazione         | servizio
      estorsione            | CP
      ndrangheta            | ufficio
      clan                  | abuso
      Nome2                 | prescrizione
  • 12. Not just text mining, but help in interpretation
      The novelty of the approach presented here is the interaction between the results of the textual analysis and the new information that can be acquired from other databases (administrative or not).
      The questions we would like to answer:
      o Do companies that appear in judgments have different characteristics from those that do not?
      o Do the companies that appear in the judgments differ by cluster?
      Example: do they differ by company size, economic sector, geographical location?
  • 13. Regions and companies by cluster (provisional and partial data)
      Cluster 1 = crimes + organized crime; Cluster 2 = crimes and PA
      Region         | Cluster 1 # judgments | Cluster 1 # companies | Cluster 2 # judgments | Cluster 2 # companies | Total # judgments | Total # companies
      Abruzzo        |    |    |  1 |   1 |  1 |   1
      Calabria       | 11 | 33 |  1 |   1 | 12 |  34
      Campania       |  6 | 21 |  8 |  13 | 14 |  34
      Emilia-Romagna |    |    |  2 |   7 |  2 |   7
      Lazio          |  1 |  1 |  5 |   6 |  6 |   7
      Liguria        |  1 |  1 |  1 |   2 |  2 |   3
      Lombardia      |  1 | 13 |  6 |  17 |  7 |  30
      Marche         |    |    |  3 |   8 |  3 |   8
      Molise         |    |    |  1 |   1 |  1 |   1
      Piemonte       |    |    |  1 |  10 |  1 |  10
      Puglia         |    |    |  5 |  14 |  5 |  14
      Sardegna       |    |    |  1 |   1 |  1 |   1
      Sicilia        |  5 | 13 |  5 |  11 | 10 |  24
      Toscana        |  1 |  1 |  4 |  10 |  5 |  11
      Veneto         |    |    |  4 |  17 |  4 |  17
      Total          | 26 | 83 | 48 | 119 | 74 | 202
  • 14. Companies by national legal form (provisional and partial data)
      National legal form                                | Number of companies
      Consortium + Consortium with external activity     |   4
      Cooperative company (SCARL + SCARLPA)              |   4
      Joint stock company - SPA                          |  25
      Limited liability company - SRL                    | 121
      Limited partnership - SAS                          |   2
      One-person company with limited liability - SRLU   |  21
      One-person joint stock company - SPA               |   3
      Sole proprietorship                                |   2
      n.d.                                               |   4
      Total                                              | 186
      To be added: 22 one-person companies obtained from the list of defendants.
  • 15. Companies by status (provisional and partial data)
      Status                           | Number of companies
      Active                           | 135
      Active (default of payment)      |   1
      Bankruptcy                       |   1
      Dissolved                        |   5
      Dissolved (bankruptcy)           |  16
      Dissolved (liquidation)          |   5
      Dissolved (merger or take-over)  |   6
      In liquidation                   |  11
      Status unknown                   |   6
      Total                            | 186
  • 16. Companies by geographical area and status (provisional and partial data)
      Area                | Active | Others | Status unknown | Total
      ITC - Northwest     |  29 | 12 | 1 |  42
      ITH - Northeast     |  22 | 12 |   |  34
      ITI - Centre        |  33 |  9 |   |  42
      ITF - South         |  26 |  8 | 4 |  38
      ITG - Insular Italy |  15 |  4 | 1 |  20
      (blank)             |  10 |  0 |   |  10
      Total               | 135 | 45 | 6 | 186
      Others: Active (default of payment), Bankruptcy, Dissolved, Dissolved (bankruptcy), Dissolved (liquidation), Dissolved (merger or take-over), In liquidation.
  • 17. Discussion
      The potential sources of data and information are many, and each is organized according to its own purposes. Using them for statistical purposes requires taking into account some aspects that are sometimes neglected when talking about Big Data or Open Data:
      • The completeness of the information
      • The time reference of the information acquired, or that may be acquired
  • 18. Final goal: the «statistical» database
      The database thus obtained will allow reconstructions and analyses starting from any element (company, public body, persons, period, place, etc.), provided that it is correctly identified as such within the texts of the judgments.
      It is therefore necessary to use several tools:
      o Text mining, to process the information contained in the texts of the judgments and transform it into data that can be analysed statistically
      o Validation and integration of these data with information and data from other administrative databases / records
      The greater the completeness and reliability of the other databases, the greater the information value of the statistical analysis carried out on the statistical database.
  • 19. Credits
      Thanks to: Fabrizio Alboni, Daniela Arlia, Antonella Baldassarini, Lorenzo Bartalini, Pietro Battiston, Sergio Bolasco, Alberto di Martino, Giuseppe Di Vetta, Pasquale Pavone, Guido M. Rey