A clustering-based approach to
detect probable outcomes of lawsuits
Undergraduate thesis/final project
Escola de Informática Aplicada - UNIRIO
Author: Daniel Lemes Gribel <daniel.gribel@uniriotec.br>
Committee:
Leonardo G. Azevedo ¹,² (supervisor)
Maíra A. C. Gatti ² (supervisor)
Adriana C. de F. Alvim ¹
Sean W. M. Siqueira ¹
¹ UNIRIO, ² IBM Research
December 19, 2014
The project idea
IBM Research, 2013: inspired by a Social Media Simulator
(SMSim project) developed to predict Twitter users' behavior.
First idea: model judges' behavior and then predict lawsuit
outcomes through multi-agent simulation, as in SMSim.
New proposal: develop an approach that suggests possible
outcomes for a given lawsuit based on modelling, similarity
detection and clustering.
Project contributions
Results showed that, by analysing past data, it was possible to
identify the most likely outcome and to estimate its degree of
uncertainty.
Problem statement
Large amount of unstructured data coming from numerous
lawsuits ⇒ large amount of hidden or unknown information
★ How do we know which similar lawsuits can serve as a reference
for a new lawsuit?
★ How do we estimate the time needed to reach a decision?
★ How do we estimate the likelihood of each possible
outcome?
The STF and its responsibilities
The Brazilian Supreme Court (STF) is a body of the Brazilian
judiciary system, responsible for safeguarding and interpreting
the Constitution. The STF decides matters related to the
Constitution, or cases in which there is doubt or controversy
regarding legal actions ².
² STF. Institucional. 2011. Available from internet: http://www.stf.jus.br/portal/cms/verTexto.asp?servico=sobreStfConhecaStfInstitucional
STF judgement configuration
The STF currently consists of 11 judges, who sit on its Panels as
well as in its Plenary.
1. Monocratic: decision taken by a single judge.
2. Collegial: one of the judges acts as rapporteur, each judge
votes individually, and the majority decision prevails.
a. First Panel (Primeira Turma): 5 judges.
b. Second Panel (Segunda Turma): 5 judges.
c. Plenary: 11 judges (currently, one position is open).
Law classes
There are several lawsuit classes in the Brazilian judicial system:
Habeas Corpus, Interlocutory Appeal, Extraordinary Appeal, etc.
In this work, only lawsuits belonging to the Appeal class are
considered *.
* The choice of the Appeal class was supported by conversations with a professor and a student of
the Law School at Fundação Getúlio Vargas (FGV).
Law classes
Appeal: “the instrument to cause a review of a decision by the
same judicial authority, or other hierarchically higher, in order to
obtain their reform or modification” ³
● Appeals account for more than 50% of the ~1.5M lawsuits judged
by the STF, which is important for the heterogeneity of the data.
● Appeals have similar dynamics in their life cycles, which is
important for pattern detection.
³ Moacyr Amaral Santos, professor, lawyer and minister of the Supreme Court.
Mental modelling
1. Look up an appeal lawsuit page on the STF website and
identify its metadata: lawsuit id, period (start and end date),
state of origin, rapporteur, author, defendant, type (area of
Law) and subjects associated with the lawsuit.
2. Identify the summary and the claim of the lawsuit, found in
a document called “Acórdão”.
3. Extract decisions and votes from the “Acórdão”.
Classification and clustering
Clustering goals ⁴:
1. Development of a typology or classification.
2. Investigation of conceptual schemes for grouping entities.
3. Hypothesis generation through data exploration.
4. Hypothesis testing, or the attempt to determine if types
defined through other procedures are in fact present in a
dataset.
⁴ ALDENDERFER, M. S.; BLASHFIELD, R. K. Cluster Analysis. Beverly Hills: Sage, 1984.
Classification and clustering
[Figure adapted from WOOYOUNG, K. Parallel Clustering Algorithms: Survey. Available from internet: http://www.solver.com/hierarchical-clustering-intro]
Hierarchical clustering
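[Figure: dendrogram over the elements A, B, C, D, E, with the merges {A,B}, {D,E}, {C,D,E} and {A,B,C,D,E}. The agglomerative direction builds the tree bottom-up and the divisive direction splits it top-down; a tree cut selects the final clusters.]
Adapted from Frontline Solvers. Cluster Analysis. Available from internet: http://www.solver.com/hierarchical-clustering-intro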
Hierarchical clustering
+ Advantages:
● Does not require a pre-defined number of clusters.
● Accepts any valid measure of distance.
● Less influenced by cluster shapes and less sensitive to
clusters with different densities.
- Disadvantages:
● Complexity is in general at least O(n²), which makes these
algorithms too slow for large datasets.
Ward’s algorithm
In Ward’s minimum variance criterion, a special case of Ward’s
general method, the objective is to minimize the total
within-cluster variance.
As a general result, Ward’s minimum variance method tends to
produce compact, roughly spherical clusters.
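A minimal sketch of the criterion, assuming items are points in Euclidean space (the thesis itself clusters lawsuits from a similarity matrix, so this is only an illustration). At each step it merges the pair of clusters whose fusion increases the total within-cluster sum of squares the least. The function names and the toy points are hypothetical.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cluster(points, k):
    """Naive Ward clustering: merge the cheapest pair until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

# Hypothetical toy data: two tight groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
clusters = ward_cluster(points, 2)
```

This naive loop recomputes every pair cost at each step, so it is only suitable for small inputs such as the 16-lawsuit dataset.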
Single-linkage algorithm
In single-linkage clustering, the distance between two clusters
is defined by the two elements (one in each cluster) that
are closest to each other.
The shortest of these links causes the fusion of the two
clusters whose elements are involved.
Complete-linkage algorithm
In complete-linkage clustering, the distance between two clusters
is defined by the two elements (one in each cluster) that
are farthest away from each other.
The shortest of these farthest-pair links causes the fusion of
the two clusters whose elements are involved.
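The two linkage rules can be compared with a short sketch. It assumes a precomputed distance matrix (for similarities in [0, 1], a common conversion is distance = 1 - similarity), and the item positions are made up so that the two rules disagree. All names are illustrative.

```python
from itertools import combinations

def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b under single or complete linkage."""
    pair_distances = [dist[i][j] for i in a for j in b]
    return min(pair_distances) if method == "single" else max(pair_distances)

def agglomerate(dist, k, method="single"):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda xy: linkage_distance(
                clusters[xy[0]], clusters[xy[1]], dist, method),
        )
        clusters[x] = sorted(clusters[x] + clusters[y])  # x < y
        del clusters[y]
    return clusters

# Hypothetical items placed on a line; distance is the absolute difference.
positions = [0, 2, 8, 12, 19]
dist = [[abs(p - q) for q in positions] for p in positions]

single = agglomerate(dist, 2, method="single")      # [[0, 1, 2, 3], [4]]
complete = agglomerate(dist, 2, method="complete")  # [[0, 1], [2, 3, 4]]
```

Single linkage chains the first four items together because each neighbouring gap is small, while complete linkage avoids the wide cluster that this chaining would create.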
Proposed solution
Similarity calculation
From the modelled dataset, calculate the similarities between
lawsuits:
1. Each pair of lawsuits receives a similarity coefficient for
each property.
2. Then, a mean (resultant) matrix is obtained from the
per-property matrices.
Output: Similarity matrix
Similarity calculation
Similarity metric - Jaccard index: J(A, B) = |A ∩ B| / |A ∪ B|
Mean similarity:
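As a rough sketch of this step, the code below computes one Jaccard similarity matrix per property and then averages them element-wise. The unweighted mean, the property names and the sample records are assumptions made for illustration; the thesis may weight properties differently.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty sets count as identical
    return len(a & b) / len(a | b)

def similarity_matrix(lawsuits, prop):
    """Pairwise Jaccard similarity of one property across all lawsuits."""
    return [[jaccard(x[prop], y[prop]) for y in lawsuits] for x in lawsuits]

def mean_matrix(matrices):
    """Element-wise unweighted mean of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]

# Hypothetical records: each property is stored as a set of tokens.
lawsuits = [
    {"rapporteur": {"judge_a"}, "subjects": {"tax", "pension"}},
    {"rapporteur": {"judge_a"}, "subjects": {"tax"}},
    {"rapporteur": {"judge_b"}, "subjects": {"labor"}},
]
per_property = [similarity_matrix(lawsuits, p) for p in ("rapporteur", "subjects")]
mean_similarity = mean_matrix(per_property)
```

Here the first two lawsuits share the rapporteur (similarity 1.0) and half of their subjects (0.5), so their mean similarity is 0.75.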
Lawsuits clustering
From the similarities observed, run the hierarchical clustering
algorithm.
Output: lawsuits classified into clusters.
Lawsuit instance assigning
Given the detected clusters, calculate the similarities between
the new lawsuit instance and the lawsuits already
clustered.
Output: new instance assigned to the most similar cluster.
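One way to sketch the assignment, assuming "most similar cluster" means the cluster with the highest average similarity to the new instance (the slides do not say how per-lawsuit similarities are aggregated). Property names and records are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def lawsuit_similarity(x, y, props):
    """Unweighted mean of the per-property Jaccard similarities."""
    return sum(jaccard(x[p], y[p]) for p in props) / len(props)

def assign_to_cluster(new, clusters, props):
    """Index of the cluster whose members are, on average, most similar to new."""
    def average_similarity(members):
        return sum(lawsuit_similarity(new, m, props) for m in members) / len(members)
    return max(range(len(clusters)), key=lambda i: average_similarity(clusters[i]))

props = ("rapporteur", "subjects")
# Hypothetical clusters produced by an earlier clustering step.
clusters = [
    [{"rapporteur": {"judge_a"}, "subjects": {"tax", "pension"}},
     {"rapporteur": {"judge_a"}, "subjects": {"tax"}}],
    [{"rapporteur": {"judge_b"}, "subjects": {"labor"}}],
]
new_lawsuit = {"rapporteur": {"judge_b"}, "subjects": {"labor", "pension"}}
best = assign_to_cluster(new_lawsuit, clusters, props)  # 1
```

Other aggregation rules, such as taking the single nearest lawsuit, would fit the same interface by swapping `average_similarity`.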
Decisions compilation
Considering a list of judges that will decide the lawsuit:
1. Collect their past votes observed in the cluster.
2. Compute the degree of agreement between them.
For each judge j_x, compare his/her decisions with each decision taken by
every other judge in the input, lawsuit by lawsuit.
The ratio (number of common votes) / (number of common decisions)
determines the degree of agreement for each judge.
Output: the likely outcome, a number between 0 and 1
indicating the probable decision.
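The agreement ratio can be sketched as follows, reading "common decisions" as decisions in which both judges voted and "common votes" as those where their votes match. That reading, the vote labels and the judge names are assumptions. How the degrees are combined into the final 0 to 1 outcome is not specified on the slide, so the sketch stops at the per-judge ratio.

```python
def agreement(judge, others, decisions):
    """Degree of agreement of one judge with the other judges in the input.

    decisions: list of dicts mapping judge name -> vote, one dict per decision.
    Returns None when the judge shares no decision with any other judge.
    """
    common_decisions = 0  # (decision, other judge) pairs where both voted
    common_votes = 0      # pairs among those where the two votes match
    for votes in decisions:
        if judge not in votes:
            continue
        for other in others:
            if other != judge and other in votes:
                common_decisions += 1
                common_votes += votes[judge] == votes[other]
    return common_votes / common_decisions if common_decisions else None

# Hypothetical past decisions observed in one cluster.
decisions = [
    {"judge_a": "grant", "judge_b": "grant", "judge_c": "deny"},
    {"judge_a": "deny", "judge_b": "deny"},
    {"judge_b": "grant", "judge_c": "grant"},
]
panel = ["judge_a", "judge_b", "judge_c"]
degrees = {j: agreement(j, panel, decisions) for j in panel}
```

With these votes, judge_b agrees with colleagues in 3 of 4 shared votes (0.75), while judge_c agrees in only 1 of 3.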
Datasets
lawsuit_16.csv: 16 lawsuits
decision_16.csv: 24 decisions
Lawsuits: lawsuit id, start/end date of lawsuit, state of origin,
rapporteur, defendant, author, type, subjects, summary and
claim.
Decisions: associated lawsuit id, decision id, type of decision,
date, votes tuple <judge name, vote> and resulting decision.
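The two record types could be modelled as below. The field names mirror the properties listed above, but the actual CSV column names and value formats are not given in the slides, so every identifier and sample value here is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

@dataclass
class Lawsuit:
    lawsuit_id: str
    start_date: date
    end_date: date
    state_of_origin: str
    rapporteur: str
    defendant: str
    author: str
    type: str            # area of Law
    subjects: List[str]
    summary: str
    claim: str

@dataclass
class Decision:
    lawsuit_id: str      # links the decision to its lawsuit
    decision_id: str
    decision_type: str
    decision_date: date
    votes: List[Tuple[str, str]]  # (judge name, vote) pairs
    resulting_decision: str

# Hypothetical example record; values are placeholders, not real data.
decision = Decision(
    lawsuit_id="L1",
    decision_id="D1",
    decision_type="collegial",
    decision_date=date(2014, 1, 1),
    votes=[("judge_a", "grant"), ("judge_b", "deny")],
    resulting_decision="grant",
)
```

Keeping votes as (judge name, vote) pairs matches the tuple described for the decisions dataset and makes per-judge lookups straightforward with `dict(decision.votes)`.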
Similarity analysis
[Figure: similarity matrices for the Rapporteur and Summary properties, on a scale from completely different to completely similar.]
[Figure: mean similarity and mean similarity (Pearson correlation) matrices, on a scale from completely different to completely similar.]
Clustering analysis
[Figure: clustering result, on a scale from completely different to completely similar.]
Performance of the agglomerative algorithms
Prediction results
The prediction results reveal an optimization problem:
● Choosing the correct number k of clusters is not trivial: it depends on the distribution of
points in the dataset and on the desired clustering resolution.
● Possible approach: define a search space, overestimate k, and then develop optimization
heuristics to determine a new stopping point (k₂) when the algorithm finds a good solution.
● A stopping point, in this case, could be when the algorithm finds a cluster that is similar
enough to the instance being tested and struggles to improve on this best rate.
Main contributions
● By analysing past data, it is possible to find similar cases
that were already judged.
● Results showed that it was possible to identify the most likely
outcome and to estimate the degree of uncertainty of the
outcome.
● Prediction results were satisfactory: lawsuit instances were
correctly assigned to clusters and the similarity comparison
revealed a good coefficient between lawsuits.
Future work
● Use more sophisticated machine learning techniques.
● Investigate a more efficient clustering method than the
hierarchical clustering - consider optimization issues.
● Discriminate decisions by type.
● Develop a better mechanism to find the weights of lawsuit
properties.
● Use separate training and testing datasets. Then, use evaluation
metrics to check whether predictions match real outcomes.
● Investigate stochastic simulation approaches.
Code and datasets are in a Git repository at bitbucket.org.
Contact daniel.gribel@uniriotec.br for access!
Thank you! Questions?

More Related Content

Viewers also liked

Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondFrank Kelly
 
Document Classification and Clustering
Document Classification and ClusteringDocument Classification and Clustering
Document Classification and ClusteringAnkur Shrivastava
 
QAP: Metodos construtivos, 2-opt, Busca tabu
QAP: Metodos construtivos, 2-opt, Busca tabuQAP: Metodos construtivos, 2-opt, Busca tabu
QAP: Metodos construtivos, 2-opt, Busca tabuDaniel Gribel
 
Cluster spss week7
Cluster spss week7Cluster spss week7
Cluster spss week7Birat Sharma
 
Text clustering
Text clusteringText clustering
Text clusteringKU Leuven
 
Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification Mahmoud Alfarra
 
Academic Plan GFC Presentation March 21, 2011
Academic Plan GFC Presentation March 21, 2011Academic Plan GFC Presentation March 21, 2011
Academic Plan GFC Presentation March 21, 2011universityofalberta
 
comunicacion social y periodismo CUN
comunicacion social y periodismo CUNcomunicacion social y periodismo CUN
comunicacion social y periodismo CUNcesarAmontenegroV
 
How i use internet!
How i use internet!How i use internet!
How i use internet!margietzo
 
愛搜索招商創業
愛搜索招商創業愛搜索招商創業
愛搜索招商創業Rich Shien
 
Uses and gratifications
Uses and gratifications Uses and gratifications
Uses and gratifications sarahlambe
 
Wacom PL in Sport
Wacom PL in SportWacom PL in Sport
Wacom PL in Sporttmccool7
 
Enterprise Mobility Guide 2011 from Sybase, an SAP Company
Enterprise Mobility Guide 2011 from Sybase, an SAP CompanyEnterprise Mobility Guide 2011 from Sybase, an SAP Company
Enterprise Mobility Guide 2011 from Sybase, an SAP CompanySybase, an SAP Company
 
The Mobile Learning Tipping Point
The Mobile Learning Tipping PointThe Mobile Learning Tipping Point
The Mobile Learning Tipping PointAllen Partridge
 
Federal Statutes, Codes, & Regulations: LexisNexis Academic
Federal Statutes, Codes, & Regulations: LexisNexis AcademicFederal Statutes, Codes, & Regulations: LexisNexis Academic
Federal Statutes, Codes, & Regulations: LexisNexis Academicstaffordlibrary
 
Federal & State Cases: LexisNexis
Federal & State Cases: LexisNexisFederal & State Cases: LexisNexis
Federal & State Cases: LexisNexisstaffordlibrary
 
Introduction to websites
Introduction to websitesIntroduction to websites
Introduction to websitesUCTI
 

Viewers also liked (20)

Hierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyondHierarchical clustering in Python and beyond
Hierarchical clustering in Python and beyond
 
Document Classification and Clustering
Document Classification and ClusteringDocument Classification and Clustering
Document Classification and Clustering
 
QAP: Metodos construtivos, 2-opt, Busca tabu
QAP: Metodos construtivos, 2-opt, Busca tabuQAP: Metodos construtivos, 2-opt, Busca tabu
QAP: Metodos construtivos, 2-opt, Busca tabu
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
Cluster spss week7
Cluster spss week7Cluster spss week7
Cluster spss week7
 
Text clustering
Text clusteringText clustering
Text clustering
 
Document clustering and classification
Document clustering and classification Document clustering and classification
Document clustering and classification
 
Academic Plan GFC Presentation March 21, 2011
Academic Plan GFC Presentation March 21, 2011Academic Plan GFC Presentation March 21, 2011
Academic Plan GFC Presentation March 21, 2011
 
comunicacion social y periodismo CUN
comunicacion social y periodismo CUNcomunicacion social y periodismo CUN
comunicacion social y periodismo CUN
 
How i use internet!
How i use internet!How i use internet!
How i use internet!
 
愛搜索招商創業
愛搜索招商創業愛搜索招商創業
愛搜索招商創業
 
Uses and gratifications
Uses and gratifications Uses and gratifications
Uses and gratifications
 
N5 v
N5 vN5 v
N5 v
 
Wacom PL in Sport
Wacom PL in SportWacom PL in Sport
Wacom PL in Sport
 
PRESENT PERFECT
PRESENT PERFECTPRESENT PERFECT
PRESENT PERFECT
 
Enterprise Mobility Guide 2011 from Sybase, an SAP Company
Enterprise Mobility Guide 2011 from Sybase, an SAP CompanyEnterprise Mobility Guide 2011 from Sybase, an SAP Company
Enterprise Mobility Guide 2011 from Sybase, an SAP Company
 
The Mobile Learning Tipping Point
The Mobile Learning Tipping PointThe Mobile Learning Tipping Point
The Mobile Learning Tipping Point
 
Federal Statutes, Codes, & Regulations: LexisNexis Academic
Federal Statutes, Codes, & Regulations: LexisNexis AcademicFederal Statutes, Codes, & Regulations: LexisNexis Academic
Federal Statutes, Codes, & Regulations: LexisNexis Academic
 
Federal & State Cases: LexisNexis
Federal & State Cases: LexisNexisFederal & State Cases: LexisNexis
Federal & State Cases: LexisNexis
 
Introduction to websites
Introduction to websitesIntroduction to websites
Introduction to websites
 

Similar to A clustering-based approach to detect probable outcomes of lawsuits

An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396Nida Rashid
 
An Algorithm Analysis on Data Mining
An Algorithm Analysis on Data MiningAn Algorithm Analysis on Data Mining
An Algorithm Analysis on Data Miningpaperpublications3
 
LearningAG.ppt
LearningAG.pptLearningAG.ppt
LearningAG.pptbutest
 
10 Algorithms in data mining
10 Algorithms in data mining10 Algorithms in data mining
10 Algorithms in data miningGeorge Ang
 
Undergraduated Thesis
Undergraduated ThesisUndergraduated Thesis
Undergraduated ThesisVictor Li
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection IJSTA
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problemcsandit
 
Document clustering for forensic analysis an approach for improving computer ...
Document clustering for forensic analysis an approach for improving computer ...Document clustering for forensic analysis an approach for improving computer ...
Document clustering for forensic analysis an approach for improving computer ...JPINFOTECH JAYAPRAKASH
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)Pratik Meshram
 
How To Begin An Ap English Essay. Online assignment writing service.
How To Begin An Ap English Essay. Online assignment writing service.How To Begin An Ap English Essay. Online assignment writing service.
How To Begin An Ap English Essay. Online assignment writing service.Kathleen Ward
 
soft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxsoft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxnaveen356604
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectValerii Klymchuk
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
 

Similar to A clustering-based approach to detect probable outcomes of lawsuits (20)

ICBAI Paper (1)
ICBAI Paper (1)ICBAI Paper (1)
ICBAI Paper (1)
 
Purely Procedural Preferences - Beyond Procedural Equity and Reciprocity
Purely Procedural Preferences - Beyond Procedural Equity and ReciprocityPurely Procedural Preferences - Beyond Procedural Equity and Reciprocity
Purely Procedural Preferences - Beyond Procedural Equity and Reciprocity
 
An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396An Algorithm Analysis on Data Mining-396
An Algorithm Analysis on Data Mining-396
 
An Algorithm Analysis on Data Mining
An Algorithm Analysis on Data MiningAn Algorithm Analysis on Data Mining
An Algorithm Analysis on Data Mining
 
Introduction to data mining and machine learning
Introduction to data mining and machine learningIntroduction to data mining and machine learning
Introduction to data mining and machine learning
 
Final proj 2 (1)
Final proj 2 (1)Final proj 2 (1)
Final proj 2 (1)
 
LearningAG.ppt
LearningAG.pptLearningAG.ppt
LearningAG.ppt
 
10 Algorithms in data mining
10 Algorithms in data mining10 Algorithms in data mining
10 Algorithms in data mining
 
Undergraduated Thesis
Undergraduated ThesisUndergraduated Thesis
Undergraduated Thesis
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
3 classification
3  classification3  classification
3 classification
 
Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection
 
Dont vote evolve
Dont vote evolveDont vote evolve
Dont vote evolve
 
Multilevel techniques for the clustering problem
Multilevel techniques for the clustering problemMultilevel techniques for the clustering problem
Multilevel techniques for the clustering problem
 
Document clustering for forensic analysis an approach for improving computer ...
Document clustering for forensic analysis an approach for improving computer ...Document clustering for forensic analysis an approach for improving computer ...
Document clustering for forensic analysis an approach for improving computer ...
 
pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)pratik meshram-Unit 5 (contemporary mkt r sch)
pratik meshram-Unit 5 (contemporary mkt r sch)
 
How To Begin An Ap English Essay. Online assignment writing service.
How To Begin An Ap English Essay. Online assignment writing service.How To Begin An Ap English Essay. Online assignment writing service.
How To Begin An Ap English Essay. Online assignment writing service.
 
soft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptxsoft computing BTU MCA 3rd SEM unit 1 .pptx
soft computing BTU MCA 3rd SEM unit 1 .pptx
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support Project
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...
 

Recently uploaded

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

A clustering-based approach to detect probable outcomes of lawsuits

  • 1. A clustering-based approach to detect probable outcomes of lawsuits Undergraduate thesis/final project Escola de Informática Aplicada - UNIRIO Author: Daniel Lemes Gribel <daniel.gribel@uniriotec.br> Comission: Leonardo G. Azevedo 1,2 (supervisor) Maíra A. C. Gatti 2 (supervisor) Adriana C. de F. Alvim 1 Sean W. M. Siqueira 1 1 UNIRIO, 2 IBM Research December 19, 2014 1
  • 2. The project idea IBM Research, 2013: inspired from a Social Media Simulator (SMSim project) developed to predict Twitter users behavior. First idea: to model judges behavior and then predict lawsuits outcomes through multi-agent simulation, as SMSim. New proposal: develop an approach to suggest possible outcomes for a given lawsuit based on modelling, similarity detection and clustering. 2
  • 3. Project contributions Results shown that, by analysing past data, was possible to verify the most likely outcome and to detect its uncertainty degree. 3
  • 4. Problem statement Large amount of unstructured data coming from the numerous lawsuits ⇒ Large number of hidden or unknown information ★ How do we know which similar lawsuits can be a reference to a new lawsuit? ★ How do we estimate the time for taking the decisions? ★ How do we estimate a likelihood for the possible emergent results? 4
  • 5. The STF and its responsibilities The Brazilian Supreme Court (STF) is an organism part of the Brazilian Judiciary System, responsible for the safeguarding and interpreting of the Constitution. STF decides matters related to the Constitution or when there is doubt or controversy regarding legal actions ². ² STF. Institucional. 2011. Available from internet: http://www.stf.jus.br/portal/cms/verTexto.asp? servico=sobreStfConhecaStfInstitucional 5
  • 6. STF judgement configuration The STF currently consists of 11 judges, who act in its Panels as well as in its Plenary. 1. Monocratic: decision taken by a single judge. 2. Collegial: there is a rapporteur (one of the judges), and each judge votes individually, with the majority decision prevailing. a. First Panel (Primeira Turma): 5 judges. b. Second Panel (Segunda Turma): 5 judges. c. Plenary: 11 judges (currently, there is an open position).
  • 7. Law classes There are several lawsuit classes in the Brazilian judicial system: Habeas Corpus, Interlocutory Appeal, Extraordinary Appeal, etc. In this work, only lawsuits belonging to the Appeal class are considered *. * The choice of the Appeal class was supported by conversations with a professor and a student of the Law School at Fundação Getúlio Vargas (FGV).
  • 8. Law classes Appeal: “the instrument to cause a review of a decision by the same judicial authority, or other hierarchically higher, in order to obtain their reform or modification” ³ ● Appeals account for more than 50% of the ~1.5M lawsuits judged by the STF, which is important for the heterogeneity of the data. ● Appeals have similar dynamics in their life cycles, which is important for pattern detection. ³ Moacyr Amaral Santos, professor, lawyer and minister of the Supreme Court.
  • 9. Mental modelling 1. Look up an appeal lawsuit page on the STF website and identify its metadata: lawsuit id, period (start and end date), state of origin, rapporteur, author, defendant, type (area of Law) and subjects associated with the lawsuit. 2. Identify the summary and the claim of the lawsuit, found in a document called “Acórdão”. 3. Extract decisions and votes from the “Acórdão”.
  • 11. Classification and clustering Clustering goals ⁴: 1. Development of a typology or classification. 2. Investigation of conceptual schemes for grouping entities. 3. Hypothesis generation through data exploration. 4. Hypothesis testing, or the attempt to determine if types defined through other procedures are in fact present in a dataset. ⁴ ALDENDERFER, M. S.; BLASHFIELD, R. K. Cluster Analysis. Beverly Hills: Sage, 1984.
  • 12. Classification and clustering [Figure] Adapted from WOOYOUNG, K. Parallel Clustering Algorithms: Survey. Available from internet: http://www.solver.com/hierarchical-clustering-intro
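  • 13. Hierarchical clustering [Figure: dendrogram over the elements A, B, C, D, E with the groups {A,B}, {D,E}, {C,D,E} and {A,B,C,D,E}; the agglomerative direction merges groups, the divisive direction splits them, and tree cuts select the clusters.] Adapted from Frontline Solvers. Cluster Analysis. Available from internet: http://www.solver.com/hierarchical-clustering-intro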
  • 14. Hierarchical clustering Advantages: ● Does not require a pre-defined number of clusters. ● Accepts any valid distance measure. ● Is less influenced by cluster shapes and less sensitive to clusters with different densities. Disadvantages: ● Time complexity is in general at least O(n²), which makes these algorithms too slow for large datasets.
  • 15. Ward’s algorithm In Ward’s minimum variance criterion, a special case of Ward’s general method, the objective is to minimize the total within-cluster variance. As a general result, Ward’s minimum variance method leads to compact and spherical clusters.
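For reference, the criterion is usually written as below. The notation is introduced here and is not taken from the slides: C_k are the K clusters, μ_k their centroids, W the total within-cluster sum of squared deviations, and Δ(A, B) the increase in W caused by merging clusters A and B. At each step, Ward's method merges the pair with the smallest Δ.

```latex
W = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|}\, \lVert \mu_A - \mu_B \rVert^2
```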
  • 16. Single-linkage algorithm In single-linkage clustering, the objective function is defined by those two elements (one in each cluster) that are closest to each other. The shortest of these links causes the fusion of the two clusters whose elements are involved.
  • 17. Complete-linkage algorithm In complete-linkage clustering, the objective function is defined by those two elements (one in each cluster) that are farthest away from each other. The shortest of these links causes the fusion of the two clusters whose elements are involved.
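The two linkage criteria differ only in which pair of elements defines the distance between two clusters. A minimal sketch, assuming a precomputed pairwise distance matrix `dist` and clusters given as lists of row indices (both names are illustrative, not from the original project):

```python
from itertools import product
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def single_linkage(a: Sequence[int], b: Sequence[int], dist: Matrix) -> float:
    """Distance between clusters a and b = distance of their closest pair."""
    return min(dist[i][j] for i, j in product(a, b))


def complete_linkage(a: Sequence[int], b: Sequence[int], dist: Matrix) -> float:
    """Distance between clusters a and b = distance of their farthest pair."""
    return max(dist[i][j] for i, j in product(a, b))


# Toy data: four points on a line, with absolute difference as the distance.
points = [0.0, 1.0, 5.0, 7.0]
dist = [[abs(p - q) for q in points] for p in points]

left, right = [0, 1], [2, 3]
closest = single_linkage(left, right, dist)     # pair (1.0, 5.0)
farthest = complete_linkage(left, right, dist)  # pair (0.0, 7.0)
```

In both cases the agglomerative step then merges the pair of clusters for which this linkage value is smallest.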
  • 19. Similarity calculation From the modelled dataset, calculate the similarities between lawsuits: 1. For each property, each pair of lawsuits receives a similarity coefficient, giving one similarity matrix per property. 2. A mean (resultant) matrix is then obtained from the property matrices. Output: similarity matrix.
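  • 20. Similarity calculation Similarity metric, Jaccard index: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ for two sets of property values A and B. Mean similarity: [formula not preserved in the transcript].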
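The similarity step can be sketched as follows. The Jaccard index is standard; the unweighted average across properties, the treatment of two empty sets as identical, and the sample values are assumptions made for illustration, since the slide's mean-similarity formula is not available here.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # assumption: two empty value sets count as identical
    return len(a & b) / len(a | b)


def property_matrix(values: Sequence[Set[str]]) -> List[List[float]]:
    """Pairwise Jaccard similarity of all lawsuits for one property."""
    return [[jaccard(x, y) for y in values] for x in values]


def mean_matrix(matrices: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Element-wise unweighted mean of the per-property matrices (assumed)."""
    n = len(matrices[0])
    m = len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


# Hypothetical values for three lawsuits and two properties.
subjects = [{"tax", "pension"}, {"tax"}, {"labour"}]
rapporteurs = [{"Judge A"}, {"Judge A"}, {"Judge B"}]

similarity = mean_matrix([property_matrix(subjects),
                          property_matrix(rapporteurs)])
```

Single-valued properties such as the rapporteur are wrapped in one-element sets here so that the same metric applies to every property.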
  • 21. Lawsuits clustering From the similarities observed, run the hierarchical clustering algorithm. Output: lawsuits classified into clusters.
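A minimal sketch of this step, assuming an agglomerative procedure that converts similarity to distance as `1 - similarity` and stops when `k` clusters remain. The function name, the stopping rule and the sample matrix are illustrative; the original implementation may differ.

```python
from typing import Callable, Iterable, List, Sequence


def agglomerative(sim: Sequence[Sequence[float]], k: int,
                  linkage: Callable[[Iterable[float]], float] = min) -> List[List[int]]:
    """Merge the two closest clusters until only k clusters remain.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(1.0 - sim[i][j]
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical similarity matrix for four lawsuits.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = agglomerative(sim, k=2)
```

This naive version recomputes every cluster-to-cluster distance at each merge, which is acceptable for a 16-lawsuit dataset but not for large ones.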
  • 22. Assigning a lawsuit instance From the detected clusters, calculate the similarities between the new lawsuit instance and the lawsuits already classified. Output: new instance assigned to the most similar cluster.
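The slides do not say how "most similar cluster" is measured. The sketch below assumes the cluster with the highest mean similarity to the new instance wins; `assign_to_cluster` and its arguments are hypothetical names.

```python
from typing import Sequence


def assign_to_cluster(sims_to_known: Sequence[float],
                      clusters: Sequence[Sequence[int]]) -> int:
    """Return the index of the cluster most similar to the new instance.

    sims_to_known[i] is the similarity between the new lawsuit and the
    already classified lawsuit i. Cluster similarity is taken as the mean
    similarity to its members (an assumption, not stated in the slides).
    """
    def mean_similarity(cluster: Sequence[int]) -> float:
        return sum(sims_to_known[i] for i in cluster) / len(cluster)

    return max(range(len(clusters)),
               key=lambda c: mean_similarity(clusters[c]))


# Hypothetical example: four classified lawsuits in two clusters.
clusters = [[0, 1], [2, 3]]
chosen = assign_to_cluster([0.8, 0.6, 0.1, 0.3], clusters)
```

Other choices, such as the similarity to the closest member only, would fit the same interface.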
  • 23. Decisions compilation Considering the list of judges that will decide the lawsuit: 1. Collect their past votes observed in the cluster. 2. Compute the degree of agreement between them. For each judge jx, compare his/her decisions with each decision taken by every other judge in the input list, lawsuit by lawsuit. The ratio (number of common votes) / (number of common decisions) determines the degree of agreement for each judge. Output: the likely outcome, a number between 0 and 1 indicating the probable decision.
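The degree of agreement described above can be sketched as follows, assuming votes are stored per judge as a mapping from decision id to a boolean vote. The data layout and names are illustrative, and the step that turns agreement degrees into the final outcome value is not specified in the slides, so it is left out.

```python
from typing import Dict, Optional, Sequence

Votes = Dict[str, Dict[str, bool]]  # judge name -> {decision id -> vote}


def agreement_degree(judge: str, panel: Sequence[str],
                     votes: Votes) -> Optional[float]:
    """Ratio of common votes to common decisions for one judge.

    A decision is "common" when both judges voted on it; a vote is
    "common" when they voted the same way. Returns None when the judge
    shares no decision with any other panel member.
    """
    common_decisions = 0
    common_votes = 0
    own = votes.get(judge, {})
    for other in panel:
        if other == judge:
            continue
        theirs = votes.get(other, {})
        shared = own.keys() & theirs.keys()
        common_decisions += len(shared)
        common_votes += sum(own[d] == theirs[d] for d in shared)
    if common_decisions == 0:
        return None
    return common_votes / common_decisions


# Hypothetical past votes observed in one cluster.
past_votes: Votes = {
    "J1": {"d1": True, "d2": False, "d3": True},
    "J2": {"d1": True, "d2": True},
    "J3": {"d2": False, "d3": True},
}
panel = ["J1", "J2", "J3"]
degrees = {j: agreement_degree(j, panel, past_votes) for j in panel}
```

In this toy data J1 shares four decisions with the other judges and agrees on three of them.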
  • 24. Datasets lawsuit_16.csv: 16 lawsuits. decision_16.csv: 24 decisions. Lawsuits: lawsuit id, start/end date of lawsuit, state of origin, rapporteur, defendant, author, type, subjects, summary and claim. Decisions: associated lawsuit id, decision id, type of decision, date, votes tuple <judge name, vote> and resultant decision.
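The two record types listed above could be modelled as below. This is a sketch only: the field names are chosen here, the actual CSV column names are not given in the slides, and encoding a vote as a boolean is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Lawsuit:
    lawsuit_id: str
    start_date: date
    end_date: Optional[date]   # None if the lawsuit has not ended
    state_of_origin: str
    rapporteur: str
    defendant: str
    author: str
    law_type: str              # area of Law
    subjects: FrozenSet[str]
    summary: str = ""
    claim: str = ""


@dataclass(frozen=True)
class Decision:
    lawsuit_id: str            # lawsuit this decision belongs to
    decision_id: str
    decision_type: str
    decision_date: date
    votes: Tuple[Tuple[str, bool], ...]  # (judge name, vote), bool is assumed
    resultant: bool            # resultant decision, same assumed encoding

    def votes_in_favour(self) -> int:
        """Number of votes recorded as True."""
        return sum(1 for _, vote in self.votes if vote)


# Placeholder values, not taken from the real datasets.
example = Decision(
    lawsuit_id="L-001",
    decision_id="D-001",
    decision_type="collegial",
    decision_date=date(2014, 1, 15),
    votes=(("Judge A", True), ("Judge B", False), ("Judge C", True)),
    resultant=True,
)
```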
  • 26. Similarity analysis [Figure: mean similarity and mean similarity (Pearson correlation) between lawsuits, on a scale from completely different to completely similar.]
  • 30. Prediction results The prediction results reveal an optimization problem: ● The correct choice of the number k of clusters is not trivial; it depends on the distribution of points in the dataset and on the desired clustering resolution. ● Possible approach: define a search space, start from an overestimated k, and then develop optimization heuristics to determine a new stopping point (k₂) when the algorithm finds a good solution. ● A stopping point, in this case, could be when the algorithm finds a cluster that is similar enough to the instance being tested and can hardly improve on the best rate found.
  • 31. Main contributions ● By analysing past data, it is possible to find similar cases that were already judged. ● Results showed that it was possible to identify the most likely outcome and to detect the degree of uncertainty of that outcome. ● Prediction results were satisfactory: lawsuit instances were correctly assigned to clusters, and the similarity comparison revealed a good coefficient between lawsuits.
  • 32. Future work ● Use more sophisticated machine learning techniques. ● Investigate a clustering method more efficient than hierarchical clustering, taking optimization issues into account. ● Discriminate decisions by type. ● Develop a better mechanism to find the weights of lawsuit properties. ● Split the data into a training and a testing dataset, then use evaluation metrics to check whether predictions match real outcomes. ● Investigate stochastic simulation approaches.
  • 33. Code and datasets are in a Git repository at bitbucket.org. Contact daniel.gribel@uniriotec.br for access. Thank you! Questions?