NLP applied to French legal
decisions
Demonstration of a significant bias of some French court
of appeal judges in decisions about the right of asylum.
Feb 17th, 2016
www.supralegem.fr
Team
● Anthony Sypniewski @ Google (NYC)
○ Software Engineer, Data Scientist
○ Supra Legem: Dev the website (back end, front end)
● Michaël Benesty @ Deloitte (Paris) [this is me]
○ Tax law associate, former CPA/Financial auditor
○ XGBoost + FeatureHashing R packages co-author, DMLC member
○ Supra Legem: Dev ETL & machine learning
None of the team members' employers is linked in any way to this personal project.
Opinions expressed here are the team's only.
Plan
1. Brief overview of the French legal system
2. The intuition behind word2vec
3. How we designed our fancy neural network
4. Presentation of the dataset and our bias analysis result
French legal system… in 1 schema
ORDINARY COURTS (civil law, criminal law)
● Supreme court: Cour de cassation, chambers: Labour, Commercial, 3 Civil chambers, Criminal
● 2nd degree: Cour d’appel, chambers: Labour, Commercial, Civil, Criminal; Cour d’assises
● 1st degree: Tribunal de Commerce, Tribunal de Grande Instance, Tribunal Correctionnel, Cour d’assises, Conseil de Prud’hommes, Tribunal d’Instance, Tribunal de Police, Juge de proximité
ADMINISTRATIVE COURTS
● Supreme court: Conseil d’Etat, Litigation division
● 2nd degree: Cour administrative d’appel
● 1st degree: Tribunal administratif
A simple asylum seeker journey (or not)
When an asylum seeker (or any other undocumented person) receives a deportation order (dark red boxes), one of the options is to ask an administrative court judge to cancel it. Those judges' decisions are the ones we will analyze in the next slides.
For the most intrepid, a readable version of this schema is available on the Senate website
Basic intuition behind word2vec : feed forward (1/3)
Famous algorithm which assigns similar vectors to similar words from a corpus (what does "similar" mean?). Below is a
simplified theoretical version of word2vec, corresponding to Continuous Bag of Words (C-BOW) with 1 context word.
The task is to predict a word from its context.
Ex.: The court of appeal judge is Mr. Toto.
Based on the distributional hypothesis (Harris, 1954): you shall know a word by the company it keeps (Firth, 1957)
● Input layer: 1-hot encoded context word (index in the dictionary)
● W: dense context word matrix (whole dictionary)
● Hidden layer: dense context word vector
● W’: dense output word matrix (whole dictionary)
● Output layer: P(output | context word)
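The layers listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the original word2vec code: the toy vocabulary, the embedding size `N`, the random initialisation and the helper name `forward` are all assumptions made for the example.

```python
import numpy as np

# Toy vocabulary (hypothetical, built from the example sentence).
vocab = ["the", "court", "of", "appeal", "judge", "is", "mr", "toto"]
V, N = len(vocab), 4                          # dictionary size, embedding size

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))        # context word matrix
W_prime = rng.normal(scale=0.1, size=(N, V))  # output word matrix


def forward(context_index):
    """Return (hidden vector, P(output | context word)) for one context word."""
    x = np.zeros(V)
    x[context_index] = 1.0        # input layer: 1-hot encoded context word
    h = x @ W                     # hidden layer: the row of W for that word
    u = h @ W_prime               # one score per word of the dictionary
    y = np.exp(u - u.max())       # softmax, shifted for numerical stability
    return h, y / y.sum()


h, y = forward(vocab.index("judge"))
print(y.round(3))
```

With untrained random weights the printed distribution is close to uniform; training is what makes the probability of words seen in this context go up.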
Basic intuition behind word2vec : feed forward (2/3)
Objective :
Loss : minimize E with
Back propagation :
With :
e can be interpreted as the prediction error
Continue back propagation
Output vector update :
Continue back propagation
Parameter update
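The formulas of this slide did not survive extraction. As a hedged reconstruction, the block below restates the corresponding steps in the notation of word2vec Parameter Learning Explained (Xin Rong), which the deck names as its source. The symbols ($u_j$, $y_j$, $t_j$, $\mathbf{h}$, $\eta$, $\mathrm{EH}$) come from that paper and may differ from the ones used on the original slide.

```latex
\begin{align*}
\text{Objective:}\quad & \max\; p(w_O \mid w_I) = \max\; y_{j^*},
  \qquad y_j = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})},
  \quad u_j = {\mathbf{v}'_{w_j}}^{\top}\mathbf{h},
  \quad \mathbf{h} = \mathbf{v}_{w_I} \\
\text{Loss:}\quad & E = -\log p(w_O \mid w_I) = -u_{j^*} + \log \sum_{j'=1}^{V} \exp(u_{j'}) \\
\text{Back propagation:}\quad & \frac{\partial E}{\partial u_j} = y_j - t_j := e_j,
  \qquad t_j = \mathbb{1}(j = j^*) \\
& \frac{\partial E}{\partial w'_{ij}} = e_j\, h_i \\
\text{Output vector update:}\quad & {\mathbf{v}'_{w_j}}^{(\text{new})} = {\mathbf{v}'_{w_j}}^{(\text{old})} - \eta\, e_j\, \mathbf{h} \\
\text{Continue back propagation:}\quad & \frac{\partial E}{\partial h_i} = \sum_{j=1}^{V} e_j\, w'_{ij} := \mathrm{EH}_i \\
\text{Parameter update:}\quad & \mathbf{v}_{w_I}^{(\text{new})} = \mathbf{v}_{w_I}^{(\text{old})} - \eta\, \mathrm{EH}^{\top}
\end{align*}
```

Here $j^*$ is the index of the word to predict, $V$ the dictionary size and $\eta$ the learning rate.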
Basic intuition behind word2vec : feed forward (3/3)
Possible intuitions* from previous slide (during back propagation):
● e is the prediction error;
● The output word vector (w’) tends to look like its context vector (w);
● The context vector is adjusted by the sum of the output vectors (w’) weighted by the prediction error (e). Therefore, if a low probability is given to the word to predict, the adjustment makes the context vector look more like that output vector: the context vector gets closer to it (cosine distance). Conversely, a high probability given to a wrong word pushes the context vector away from that word's output vector.
● Combined, words sharing the same distribution of contexts end up with similar vectors
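The intuitions above can be checked on a toy example. The sketch below repeats the update on a single (context, word) pair and prints the loss and the cosine similarity between the context vector and the output vector before and after. Sizes, learning rate, indices and helper names are assumptions chosen for illustration, and plain softmax is used rather than the negative sampling of real word2vec implementations.

```python
import numpy as np

rng = np.random.default_rng(0)
V, N, lr = 5, 3, 0.1                          # toy sizes and learning rate (assumed)
W = rng.normal(scale=0.1, size=(V, N))        # context vectors w (rows)
W_prime = rng.normal(scale=0.1, size=(N, V))  # output vectors w' (columns)
context, target = 1, 3                        # one hypothetical (context, word) pair


def softmax(u):
    z = np.exp(u - u.max())
    return z / z.sum()


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def train_step():
    """One gradient step on the pair; returns the loss before the update."""
    h = W[context].copy()
    y = softmax(h @ W_prime)
    e = y.copy()
    e[target] -= 1.0                    # prediction error e = y - t
    EH = W_prime @ e                    # output vectors weighted by the error
    W_prime[:] -= lr * np.outer(h, e)   # output vector j moves by -lr * e_j * h
    W[context] -= lr * EH               # context vector moves by -lr * EH
    return -np.log(y[target])


cos_before = cosine(W[context], W_prime[:, target])
losses = [train_step() for _ in range(100)]
cos_after = cosine(W[context], W_prime[:, target])
print(losses[0], losses[-1], cos_before, cos_after)
```

Since e is negative only for the word to predict, that word's output vector is pulled toward the context vector while the other output vectors are pushed away from it.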
Some interesting readings about Word2Vec may include:
● word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (2014,
Yoav Goldberg, Omer Levy)
● Neural word embedding as implicit matrix factorization (2014, Yoav Goldberg, Omer Levy)
*: this is a bad and partial summary from word2vec Parameter Learning Explained (2016, Xin Rong)
Recurrent Neural Network (RNN) structure is not that far
from a shallow neural network
● U matrix is initialized with vectors learned by word2vec
● x is just a 1-hot encoded vector of the word index in the whole corpus dictionary
● The model is basically stored in W
● W is shared among steps -> RNN is like deep learning with shared weights.
● Because of noise on long sequences, the vanishing / exploding gradient issue is exacerbated
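A minimal NumPy sketch of the recurrence described above, assuming the common formulation h_t = tanh(U x_t + W h_(t-1)). The sizes and the random values are placeholders: in the deck, U would be initialised with word2vec vectors, and the output layer is left out here.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10, 4                             # toy sizes (assumed)
U = rng.normal(scale=0.1, size=(vocab_size, dim))   # one vector per word (word2vec in the deck)
W = rng.normal(scale=0.1, size=(dim, dim))          # recurrent weights, shared by every step


def rnn_forward(word_indices):
    """Return the hidden state after each word of the sequence."""
    h = np.zeros(dim)
    states = []
    for idx in word_indices:
        x = np.zeros(vocab_size)
        x[idx] = 1.0                    # 1-hot vector of the word index
        h = np.tanh(x @ U + h @ W)      # the same W is reused at every step
        states.append(h)
    return np.stack(states)


states = rnn_forward([3, 1, 4, 1, 5])
print(states.shape)   # (5, 4)
```

The word with index 1 appears twice but yields two different hidden states, because each state also depends on everything read before it.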
Gated Recurrent Unit (GRU) handles long dependencies better
GRU has 2 gates:
● Reset gate r: determines how to combine the new input with the
previous memory
● Update gate z: how much of the previous memory to keep around
Unlike a plain RNN, a GRU can choose what to learn from the present and what to
throw away from the past.
Advantages:
● Important features are protected against being overwritten
● During back propagation, there are shortcut paths that bypass
multiple temporal steps -> avoid vanishing gradients as a result of
passing through multiple bounded nonlinearities
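The two gates can be written out as a single GRU step. This is a sketch under assumptions: the parameter names and the dictionary layout are invented for the example, and the convention used (z close to 1 keeps the previous memory) matches the description above, although some papers and libraries swap the roles of z and 1 - z.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, p):
    """One GRU step; `p` maps (assumed) parameter names to weights and biases."""
    z = sigmoid(x @ p["Uz"] + h_prev @ p["Wz"] + p["bz"])             # update gate
    r = sigmoid(x @ p["Ur"] + h_prev @ p["Wr"] + p["br"])             # reset gate
    h_cand = np.tanh(x @ p["Uh"] + (r * h_prev) @ p["Wh"] + p["bh"])  # candidate memory
    return z * h_prev + (1.0 - z) * h_cand    # z close to 1 keeps the previous memory


rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = {k: rng.normal(scale=0.1, size=(d_in, d_h)) for k in ("Uz", "Ur", "Uh")}
params.update({k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in ("Wz", "Wr", "Wh")})
params.update({k: np.zeros(d_h) for k in ("bz", "br", "bh")})

x, h_prev = rng.normal(size=d_in), rng.normal(size=d_h)
h_new = gru_step(x, h_prev, params)

# A large positive bias saturates the update gate, so the memory is copied through.
keep = dict(params, bz=np.full(d_h, 50.0))
h_kept = gru_step(x, h_prev, keep)
print(h_new, h_kept)
```

The `z * h_prev` term is the shortcut path: when z is close to 1 the previous state reaches the next step without going through the tanh nonlinearity.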
Generalizing the model with multi-input and multi-task learning
Why multi-input?
● Splitting the text of long documents into different parts (in a way which makes sense)
○ Possible to have parameters specific to each part (if parts are very different)
○ Splitting the text avoids the long-distance dependency issue
● Provide non-time-series data to help understand the text document
○ Categorical data related to the document context
Why multi-task?
● Better model generalization as each task has its own bias
○ Kind of ensemble approach, but during the learning process
● One gradient descent process per task: it is like* having a bigger dataset
*: OK, this is not the same thing, but still, multi-task learning may improve the model more than just increasing the number of epochs.
What the final model looks like
Blocks shown in the diagram:
● Inputs: Text 1, Text 2, Context
● Bidirectional GRU (x 2)
● Merge + Dropout (x 2)
● Dense + PReLU + Dropout (* 3)
● Outputs: Softmax Task 1, Softmax Task 2, Softmax Task 3
Legend:
● Task 1: learn category of the applicant
● Task 2: learn category of the defendant
● Task 3: learn category of the decision solution
● Text 1: claims from the applicant
● Text 2: solution of the decision
● Context: categorical information about the decision (court, judge name, theme, …)
Text 1 and Text 2 are extracted from the full decision by a separate learned model.
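The multi-input / multi-task idea can be sketched in NumPy: several inputs are merged into one shared representation, and each task gets its own softmax on top of it. The exact wiring of the original diagram is not recoverable from this transcript, so everything below is an assumption: random vectors stand in for the bidirectional GRU outputs, a single ReLU layer replaces the Dense + PReLU + Dropout stack, and the labels are made up.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(u):
    z = np.exp(u - u.max())
    return z / z.sum()


# Stand-ins for the encoder outputs (bidirectional GRUs in the deck).
text1_vec = rng.normal(size=8)       # claims from the applicant
text2_vec = rng.normal(size=8)       # solution of the decision
context_vec = np.eye(5)[2]           # one hypothetical categorical feature, 1-hot

# Multi-input: merge every input into one shared representation.
merged = np.concatenate([text1_vec, text2_vec, context_vec])
W_shared = rng.normal(scale=0.1, size=(merged.size, 16))
shared = np.maximum(0.0, merged @ W_shared)   # plain ReLU used here instead of PReLU

# Multi-task: one softmax head per task on top of the shared representation.
n_classes = {"applicant": 6, "defendant": 6, "solution": 11}
heads = {t: rng.normal(scale=0.1, size=(16, k)) for t, k in n_classes.items()}
labels = {"applicant": 0, "defendant": 3, "solution": 7}   # made-up labels

probs = {t: softmax(shared @ heads[t]) for t in heads}
task_losses = {t: -np.log(probs[t][labels[t]]) for t in heads}
total_loss = sum(task_losses.values())   # gradients from all tasks reach W_shared
print(task_losses, total_loss)
```

Because the three losses are summed, an update of `W_shared` is driven by all tasks at once, which is the sense in which multi-task learning acts like an ensemble during training.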
Results
The dataset is probably quite easy for a model to learn: the vocabulary is stable, there is no irony or double negation, and set phrases and structure are reused across decisions.
The fancy model helps too. A simple GRU on task 2 only, with default values (from Keras) and without word2vec, gave slightly more than 80% accuracy.
Task 3 is the easiest task as the structure of Text 2 is very stable. On the other hand, it has 11 categories, which is more than Tasks 1 & 2.
Learning on: training: 70%, validation: 10%, test: 20%
Task                          Accuracy   Accuracy after binarization
Task 1 (multiclass, 6 cat)    0.971      0.978
Task 2 (multiclass, 6 cat)    0.918      0.946
Task 3 (multiclass, 11 cat)   0.945      Not performed
Binarization means that Tasks 1 and 2 have been recast as a binary classification: the applicant (or defendant) may be a private person or the administration. Basically there were 5 categories of administration, and they have been merged into one.
The same model is used in both cases. Therefore all classification errors between the 5 administration categories disappear mechanically. This entirely explains the accuracy improvement.
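The mechanical effect of binarization can be reproduced on a handful of labels. The category names and the predictions below are made up for illustration; only the principle (merging the 5 administration categories after prediction, with the same model) comes from the slide.

```python
# Hypothetical category names: one private-person class, five administration classes.
PRIVATE = "private_person"
ADMIN = {"admin_a", "admin_b", "admin_c", "admin_d", "admin_e"}


def binarize(label):
    """Merge the five administration categories into a single class."""
    return "administration" if label in ADMIN else label


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions, for illustration only.
y_true = [PRIVATE, "admin_a", "admin_b", "admin_c", PRIVATE, "admin_d"]
y_pred = [PRIVATE, "admin_b", "admin_b", "admin_c", "admin_a", "admin_d"]

acc_multi = accuracy(y_true, y_pred)
acc_binary = accuracy([binarize(t) for t in y_true], [binarize(p) for p in y_pred])
print(acc_multi, acc_binary)
```

The admin_a / admin_b confusion stops counting as an error once both map to the same class, while the private person / administration confusion remains, so the binary accuracy can only be equal or higher.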
Legal decisions from administrative courts of appeal are *partially* available in open data
● Open data decisions are provided by @Etalab there
● Supralegem.fr website covers [2000-2015], which represents 250K decisions
● Over 2012-2015
○ ⅓ to ⅔ of the decisions issued by each appeal court per year are available
○ Court of appeal judge names are provided* for > 98% of the decisions (before 2012 < 95%)
● Analysis periods:
○ per judge: [2012-2015] because too many judge names are missing before 2012
○ per court: [2009-2015] because data are missing for some courts before 2009
● Important questions about open data decisions:
○ How are they selected?
○ Who selects them?
○ Why are some of the decisions not distributed?
*: judge names are included in the distributed open data and are not learned.
Basic statistics about asylum rejection rate per court
Decision selection criteria:
● from a court of appeal
● marked as asylum category
● contain “quitter le territoire” &
“étranger” & “asile”
● applicant: natural person,
defendant: administration
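The selection criteria above amount to a simple filter. In the sketch below the `Decision` fields and their values are hypothetical, since the actual schema of the open data is not described in the slides; only the three required expressions are taken from the list.

```python
from dataclasses import dataclass

REQUIRED_TERMS = ("quitter le territoire", "étranger", "asile")


@dataclass
class Decision:
    # Field names and values are hypothetical; the real open-data schema may differ.
    court_type: str
    category: str
    applicant: str
    defendant: str
    text: str


def is_selected(d: Decision) -> bool:
    """Apply the five selection criteria to one decision."""
    text = d.text.lower()
    return (
        d.court_type == "court_of_appeal"
        and d.category == "asylum"
        and all(term in text for term in REQUIRED_TERMS)
        and d.applicant == "natural_person"
        and d.defendant == "administration"
    )


kept = Decision("court_of_appeal", "asylum", "natural_person", "administration",
                "Obligation de quitter le territoire ; étranger ; demande d'asile")
dropped = Decision("court_of_appeal", "asylum", "natural_person", "administration",
                   "Litige fiscal sans rapport")
print(is_selected(kept), is_selected(dropped))   # True False
```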
4 courts have an increasing rejection rate, 3 are stable, 1 is decreasing. This seems to match the political context.
Is there a bias with some judges* regarding asylum?
● Rate of rejection of requests to cancel a deportation order, per judge/year
● Selected the 3 highest & 3 lowest rates among the top 20% of judges by quantity of decisions [2012-2015]
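The two steps above (rate per judge/year, then selection among the most active judges) can be sketched with the standard library. The records are synthetic and the helper names are assumptions; the actual pipeline behind the slides is not shown in the deck.

```python
from collections import defaultdict


def rejection_rates(records):
    """records: iterable of (judge, year, rejected). Returns {(judge, year): rate in %}."""
    counts = defaultdict(lambda: [0, 0])          # [rejections, decisions]
    for judge, year, rejected in records:
        counts[(judge, year)][0] += int(rejected)
        counts[(judge, year)][1] += 1
    return {key: 100.0 * rej / n for key, (rej, n) in counts.items()}


def most_active_judges(records, share=0.2):
    """Judges in the top `share` by number of decisions over the whole period."""
    totals = defaultdict(int)
    for judge, _year, _rejected in records:
        totals[judge] += 1
    ranked = sorted(totals, key=lambda j: (-totals[j], j))
    return ranked[: max(1, int(len(ranked) * share))]


def overall_rate(records, judge):
    own = [rejected for j, _year, rejected in records if j == judge]
    return 100.0 * sum(own) / len(own)


# Synthetic data: judge "J<i>" issued 20 - i decisions in 2014 and rejected
# the first 2 * i of them (capped by the number of decisions).
records = [(f"J{i}", 2014, n < 2 * i) for i in range(10) for n in range(20 - i)]

rates = rejection_rates(records)
active = most_active_judges(records)              # top 20% of 10 judges: 2 judges
by_rate = sorted(active, key=lambda j: overall_rate(records, j))
lowest, highest = by_rate[:3], by_rate[-3:]       # overlap here: only 2 active judges
print(active, lowest, highest)
```

Filtering on the most active judges first keeps rates computed on very few decisions out of the ranking, which is the same concern as the footnote on non-significant rates.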
Case documents are never public but would be required for a deep analysis of this apparent bias.
Not shown here: judges from the same court may have very different rejection rates in asylum cases. However, there may be some (unknown) good reasons to explain these gaps.
On the right, tweets from a judge's assistant about the practice of some judges of systematically pushing to refuse to cancel deportation orders (before hearing the case). “OQTF” means Obligation de Quitter le Territoire Français. Storify link
*: from administrative courts of appeal
Judge             Adm. court of appeal   % 2012   % 2013   % 2014   % 2015   Decision quantity [12-15]
Guerrive          Marseille              78       41       43       60       453
Cherrier          Marseille              NA       NA       67       63       233
Krulic            Paris                  NA       NA       60       73       198
Tandonnet Turot   Paris                  90       97       98       100*     227
Pellissier        Nancy                  NA       93       92       96       304
Helmholtz         Versailles             NA       93       92       91       201
*: Tandonnet Turot 2015 rate corresponds to few decisions and is not significant.
The (im)possible legal consequences of a bias
● Article 6.1 of the European Convention on Human Rights (ECHR)
In the determination of his civil rights and obligations or of any criminal charge against him,
everyone is entitled to a fair and public hearing within a reasonable time by an independent and
impartial tribunal established by law…
● ECHR, Remli v. France, 23/04/96: France is convicted for subjective partiality (racism of a juror)
● Article L721-1 of the administrative justice code: if one has doubts about the impartiality of a judge, one can ask for his or her recusal
● Interesting report from the French private law Supreme Court about judge impartiality
The truth is that there is (very) little chance that a French judge will recognize a bias based on statistics related to a colleague
SupraLegem.fr
The result of this work is available to everyone on a dedicated website.
Learned tags are provided as filters.
A visualization of the results is generated on each search to show patterns, if any.
As of Feb 17th, 2016, the 2015 data is not complete, but this should be fixed in the coming days.
Thanks!
Contact us
contact@supralegem.fr
www.supralegem.fr

Nt1330 Unit 4 Dthm Paper
 
Decision CAMP 2014 - Benjamin Grosof Janine Bloomfield - Explanation-based E-...
Decision CAMP 2014 - Benjamin Grosof Janine Bloomfield - Explanation-based E-...Decision CAMP 2014 - Benjamin Grosof Janine Bloomfield - Explanation-based E-...
Decision CAMP 2014 - Benjamin Grosof Janine Bloomfield - Explanation-based E-...
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
Fault Tolerant Leader Election in Distributed Systems
Fault Tolerant Leader Election in Distributed SystemsFault Tolerant Leader Election in Distributed Systems
Fault Tolerant Leader Election in Distributed Systems
 
Fault Tolerant Leader Election in Distributed Systems
Fault Tolerant Leader Election in Distributed SystemsFault Tolerant Leader Election in Distributed Systems
Fault Tolerant Leader Election in Distributed Systems
 
FAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMS
FAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMSFAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMS
FAULT TOLERANT LEADER ELECTION IN DISTRIBUTED SYSTEMS
 
Essay On Moving Truck
Essay On Moving TruckEssay On Moving Truck
Essay On Moving Truck
 
Understanding RFID Counting Protocols.ppt
Understanding RFID Counting Protocols.pptUnderstanding RFID Counting Protocols.ppt
Understanding RFID Counting Protocols.ppt
 
dss
dssdss
dss
 
A Tool For Helping Teach A Programming Method
A Tool For Helping Teach A Programming MethodA Tool For Helping Teach A Programming Method
A Tool For Helping Teach A Programming Method
 
Survey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data MiningSurvey of the Euro Currency Fluctuation by Using Data Mining
Survey of the Euro Currency Fluctuation by Using Data Mining
 
Chapter 11c coordination agreement
Chapter 11c coordination agreementChapter 11c coordination agreement
Chapter 11c coordination agreement
 
Supreme court dialogue classification using machine learning models
Supreme court dialogue classification using machine learning models Supreme court dialogue classification using machine learning models
Supreme court dialogue classification using machine learning models
 
Trustless off chain computing on the blockchain
Trustless off chain computing on the blockchainTrustless off chain computing on the blockchain
Trustless off chain computing on the blockchain
 
Submit by 21918Phase IProject SelectionThe first step w.docx
Submit by 21918Phase IProject SelectionThe first step w.docxSubmit by 21918Phase IProject SelectionThe first step w.docx
Submit by 21918Phase IProject SelectionThe first step w.docx
 
Recognize, assess, reduce, and manage technical debt
Recognize, assess, reduce, and manage technical debtRecognize, assess, reduce, and manage technical debt
Recognize, assess, reduce, and manage technical debt
 
Workplace Violence Analysis
Workplace Violence AnalysisWorkplace Violence Analysis
Workplace Violence Analysis
 

Dernier

SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 

Dernier (17)

SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 

NLP applied to French legal decisions

  • 1. NLP applied to French legal decisions Demonstration of a significant bias of some French court of appeal judges in decisions about the rights of asylum. Feb 17th, 2016 www.supralegem.fr
  • 2. Team ● Anthony Sypniewski @ Google (NYC) ○ Software Engineer, Data Scientist ○ Supra Legem: developed the website (back end, front end) ● Michaël Benesty @ Deloitte (Paris) [this is me] ○ Tax law associate, former CPA/Financial auditor ○ XGBoost + FeatureHashing R packages co-author, DMLC member ○ Supra Legem: developed the ETL & machine learning None of the team members' employers is linked in any way to this personal project. Opinions expressed here are the team’s only.
  • 3. Plan 1. Brief overview of the French legal system 2. The intuition behind word2vec 3. How we designed our fancy neural network 4. Presentation of the dataset and our bias analysis result
  • 4. French legal system… in 1 schema
    ORDINARY COURTS (civil law and criminal law)
      Supreme court: Cour de cassation, chambers: Labour, Commercial, 3 Civil chambers, Criminal
      2nd degree: Cour d’appel, chambers: Labour, Commercial, Civil, Criminal; Cour d’assises
      1st degree: Conseil de Prud’hommes, Tribunal de Commerce, Tribunal de Grande Instance, Tribunal d’Instance, Juge de proximité, Tribunal Correctionnel, Tribunal de Police, Cour d’assises
    ADMINISTRATIVE COURTS
      Supreme court: Conseil d’Etat (Litigation division)
      2nd degree: Cour administrative d’appel
      1st degree: Tribunal administratif
  • 5. A simple asylum seeker journey (or not) When an asylum seeker (or any other undocumented person) receives a deportation order (dark red boxes), one of the options is to ask an administrative court judge to cancel it. The decisions of these judges are the ones we analyze in the next slides. For the most intrepid, a readable version of this schema is available on the Senate website
  • 6. Basic intuition behind word2vec : feed forward (1/3) Famous algorithm which assigns similar vectors to similar words from a corpus (what does “similar” mean?). Below is a simplified theoretical version of word2vec, corresponding to Continuous Bag of Words (C-BOW) with 1 context word. The task is to predict a word from its context. Ex.: The court of appeal judge is Mr. Toto. Based on the distributional hypothesis (Harris, 1954): you shall know a word by the company it keeps (Firth, 1957) ● Input layer: one-hot encoded context word (index in the dictionary) ● W: dense context word matrix (whole dictionary) ● Hidden layer: dense vector of the context word ● W’: dense output word matrix (whole dictionary) ● Output layer: P(output | context word)
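  • 7. Basic intuition behind word2vec : feed forward (2/3) Notation (as in word2vec Parameter Learning Explained, Xin Rong, 2016): $u_j$ is the score of word $j$, $y_j$ its softmax probability, $h$ the hidden layer, $j^*$ the index of the actual output word and $\eta$ the learning rate. Objective: maximize $P(w_O \mid w_I) = y_{j^*}$. Loss: minimize $E$ with $E = -\log P(w_O \mid w_I) = -u_{j^*} + \log \sum_{j'} \exp(u_{j'})$. Back propagation: $\partial E / \partial u_j = y_j - t_j := e_j$, with $t_j = 1$ if $j = j^*$ and $t_j = 0$ otherwise; $e_j$ can be interpreted as the prediction error. Continue back propagation: $\partial E / \partial w'_{ij} = e_j h_i$. Output vector update: $w'^{(new)}_{ij} = w'^{(old)}_{ij} - \eta \, e_j h_i$. Continue back propagation: $\partial E / \partial h_i = \sum_j e_j w'_{ij} := EH_i$. Parameter update: $v^{(new)}_{w_I} = v^{(old)}_{w_I} - \eta \, EH^T$.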
  • 8. Basic intuition behind word2vec : feed forward (3/3) Possible intuitions* from previous slide (during back propagation): ● e is the prediction error; ● The output word vector (w’) tends to look like its context vector (w); ● The context word vector will be adjusted by the sum of the output vectors (w’) weighted by the prediction error (e). Therefore, if a low probability is given to the word to predict, the adjustment will make the context vector look more like that output vector. In some way the context vector will get closer to it (cosine distance). The converse holds when a high probability is given to a wrong word. ● Combined, words sharing the same distribution of contexts end up with similar vectors Some interesting readings about Word2Vec may include: ● word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (2014, Yoav Goldberg, Omer Levy) ● Neural word embedding as implicit matrix factorization (2014, Yoav Goldberg, Omer Levy) *: this is a bad and partial summary of word2vec Parameter Learning Explained (2016, Xin Rong)
  • 9. Recurrent Neural Network (RNN) structure is not that far from a shallow neural network ● U matrix is initialized with vectors learned by word2vec ● x is just a one-hot encoded vector of the word index in the whole corpus dictionary ● The model is basically stored in W ● W is shared among steps -> RNN is like deep learning with shared weights. ● Because of noise on long sequences, the vanishing / exploding gradient issue is exacerbated
  • 10. Gated Recurrent Unit (GRU) improves the handling of long dependencies GRU has 2 gates: ● Reset gate r: determines how to combine the new input with the previous memory ● Update gate z: determines how much of the previous memory to keep around Unlike a plain RNN, a GRU can choose what to learn from the present and what to throw away from the past. Advantages: ● Important features are protected against being overwritten ● During back propagation, there are shortcut paths that bypass multiple temporal steps -> this avoids gradients vanishing as a result of passing through multiple bounded nonlinearities
  • 11. Generalizing the model by multi-input and multi-task learning Why multi-input? ● Splitting the text in different parts for long documents (in a way which makes sense) ○ Possible to have parameters specific to each part (if parts are very different) ○ Split text to avoid the long distance dependency issue ● Provide non time-series data to help understand the text document ○ Categorical data related to the document context Why multi-task? ● Better model generalization as each task has its own bias ○ Kind of an ensemble approach, but during the learning process ● One gradient descent process per task: it is like* having a bigger dataset *: Ok this is not the same thing but still, multitasking may provide a better improvement than just increasing the number of epochs.
  • 12. What the final model looks like Diagram components: inputs Text 1 and Text 2 (Bidirectional GRU, twice) and Context (Dense + PReLU + Dropout, * 3); Merge + Dropout (twice); outputs Softmax Task 1, Softmax Task 2, Softmax Task 3. ● Task 1: learn category of the applicant ● Task 2: learn category of the defendant ● Task 3: learn category of the decision solution ● Text 1: claims from the applicant ● Text 2: solution of the decision ● Context: categorical information about the decision (court, judge name, theme, …) Text 1 and Text 2 are extracted from the full decision by another learned model.
  • 13. Results The dataset is probably quite easy for a model to learn: the vocabulary is stable, there is no irony or double negation, and stock phrases and structure are reused among decisions. The fancy model helps too. A simple GRU on task 2 only, with default values (from Keras) and no use of word2vec, gave slightly more than 80% accuracy. Task 3 is the easiest task as the structure of Text 2 is very stable. On the other hand, it has 11 categories, which is more than Tasks 1 & 2. Learning on: training: 70%, validation: 10%, test: 20%
    Task                          Accuracy   Accuracy after binarization
    Task 1 (multiclass, 6 cat)    0.971      0.978
    Task 2 (multiclass, 6 cat)    0.918      0.946
    Task 3 (multiclass, 11 cat)   0.945      Not performed
    Binarization means that tasks 1 and 2 have been recast as a binary classification: the applicant (or defendant) may be a private person or the administration. Basically there were 5 categories of administration, and they have been merged into one. The same model is used in both cases. Therefore all classification errors between the 5 administration categories disappear mechanically. This entirely explains the accuracy improvement.
  • 14. Legal decisions from administrative courts of appeal are *partially* available in open data ● Open data decisions are provided by @Etalab ● Supralegem.fr website covers [2000-2015], which represents 250K decisions ● For 2012-2015 ○ ⅓ to ⅔ of the decisions issued by each appeal court per year are available ○ Court of appeal judge names are provided* for > 98% of the decisions (before 2012 < 95%) ● Analysis periods: ○ per judge: [2012-2015] because too many judge names are missing before 2012 ○ per court: [2009-2015] because data are missing for some courts before 2009 ● Important questions about open data decisions: ○ How are they selected? ○ Who selects them? ○ Why are some of the decisions not distributed? *: judge names are included in the distributed open data and are not learned.
  • 15. Basic statistics about asylum rejection rate per court Decision selection criteria: ● from a court of appeal ● marked as asylum category ● contains “quitter le territoire” & “étranger” & “asile” ● applicant: natural person, defendant: administration 4 courts have an increasing rejection rate, 3 are stable, 1 is decreasing. This seems to match the political context.
  • 16. Is there a bias with some judges* regarding asylum? ● Rejection rates of deportation order per judge/year ● Selected the 3 highest & 3 lowest rates among the top 20% of judges by quantity of decisions [2012-2015] Case documents are never public but would be required for a deep analysis of this apparent bias. Not shown here: judges from the same court may have very different rejection rates in asylum. However there may be some (unknown) good reasons to explain these gaps. On the right, tweets from a judge assistant about the practice of some judges of systematically pushing to refuse to cancel deportation orders (before hearing the case). “OQTF” means Ordre de Quitter le Territoire Français. Storify link *: from administrative courts of appeal
    Judge             Adm. court of appeal   % 2012   % 2013   % 2014   % 2015   Decision quantity [12-15]
    Guerrive          Marseille              78       41       43       60       453
    Cherrier          Marseille              NA       NA       67       63       233
    Krulic            Paris                  NA       NA       60       73       198
    Tandonnet Turot   Paris                  90       97       98       100*     227
    Pellissier        Nancy                  NA       93       92       96       304
    Helmholtz         Versailles             NA       93       92       91       201
    *: Tandonnet Turot's 2015 rate corresponds to few decisions and is not significant.
  • 17. The (im)possible legal consequences of a bias ● Article 6.1 of the European Convention on Human Rights (ECHR) In the determination of his civil rights and obligations or of any criminal charge against him, everyone is entitled to a fair and public hearing within a reasonable time by an independent and impartial tribunal established by law… ● ECHR, Remli v. France, 23/04/96: France was found in violation for subjective partiality (racism of a juror) ● Article L721-1 of the administrative justice code: if one has doubts about the impartiality of a judge, one can ask for his or her recusal ● Interesting report from the French private law Supreme court about judge impartiality The truth is that there is (very) little chance that a French judge would recognize a bias from statistics related to a colleague
  • 18. SupraLegem.fr The result of this work is provided to everyone on a dedicated website. The learned tags are provided as filters. A visualization of the results is generated on each search to show patterns, if any. As of Feb 17th, 2016, the year 2015 is not complete, but this should be fixed in the coming days.
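
The word2vec intuition from slides 6 to 8 (one context word, C-BOW, softmax output, prediction error pushed back to both matrices) can be sketched in plain Python. This is a minimal illustration, not the code used for Supra Legem: the toy vocabulary, the vector size, the learning rate and the names `cbow_step`, `W_out` and `EH` are all assumptions made for the example.

```python
import math
import random

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def cbow_step(W, W_out, context_idx, target_idx, lr=0.1):
    """One gradient step for one (context word, output word) pair.

    W is V x N (one context vector per row), W_out is N x V
    (one output vector per column). Returns the loss before the update.
    """
    V, N = len(W), len(W[0])
    h = list(W[context_idx])  # hidden layer = context word vector
    u = [sum(h[i] * W_out[i][j] for i in range(N)) for j in range(V)]
    y = softmax(u)            # P(output | context word)
    # Prediction error: predicted probability minus one-hot target.
    e = [y[j] - (1.0 if j == target_idx else 0.0) for j in range(V)]
    # Error propagated to the hidden layer, using the old output vectors.
    EH = [sum(e[j] * W_out[i][j] for j in range(V)) for i in range(N)]
    for i in range(N):        # update output vectors
        for j in range(V):
            W_out[i][j] -= lr * e[j] * h[i]
    for i in range(N):        # update the context vector
        W[context_idx][i] -= lr * EH[i]
    return -math.log(y[target_idx])

def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
vocab = ["court", "appeal", "judge", "toto"]  # hypothetical toy vocabulary
W = make_matrix(len(vocab), 3, rng)
W_out = make_matrix(3, len(vocab), rng)
losses = [
    cbow_step(W, W_out, vocab.index("judge"), vocab.index("toto"))
    for _ in range(50)
]
print(losses[0], losses[-1])
```

Repeating the step on the same pair drives the loss down, which is the "context vector moves toward the output vector" effect described on slide 8.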
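
The two gates from slide 10 can also be written out for a single time step. The sketch below uses the common GRU formulation in which the update gate z weights the previous memory and (1 - z) weights the new candidate; the slides do not give the exact equations, so this convention, the parameter names (`Wz`, `Uz`, ...), the absence of bias terms and the toy dimensions are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(x, h_prev, p):
    """One GRU step without bias terms; p maps names to weight matrices."""
    # Update gate z: how much of the previous memory to keep.
    z = [sigmoid(a + b)
         for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h_prev))]
    # Reset gate r: how to combine the new input with the previous memory.
    r = [sigmoid(a + b)
         for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h_prev))]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate memory computed from the input and the reset-gated past.
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated))]
    # Shortcut path: part of h_prev flows through unchanged.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(1)
input_dim, hidden_dim = 3, 2  # hypothetical toy sizes
params = {
    "Wz": rand_matrix(hidden_dim, input_dim, rng),
    "Uz": rand_matrix(hidden_dim, hidden_dim, rng),
    "Wr": rand_matrix(hidden_dim, input_dim, rng),
    "Ur": rand_matrix(hidden_dim, hidden_dim, rng),
    "Wh": rand_matrix(hidden_dim, input_dim, rng),
    "Uh": rand_matrix(hidden_dim, hidden_dim, rng),
}
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)] for _ in range(4)]
h = [0.0] * hidden_dim
for x in sequence:
    h = gru_step(x, h, params)
print(h)
```

The last line of `gru_step` is the shortcut mentioned on the slide: when z is close to 1 the previous state passes through almost untouched, so the gradient does not have to cross a tanh at every step.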