Application of Machine Learning to Predict Outcome of US Court of Appeals
Krishna Mohan
Thomson Reuters
krishna.mohan3@tr.com
Nitin Hosurkar
Arrow Electronics
nitin.hosurkar@gmail.com
Pradeepta Mishra
Ma Foi
Pradeepta.mishra1@gmail.com
Abstract:
In 2004, Theodore Ruger et al. (Theodore W. Ruger, 2004) made the bold claim that data analytics models can predict the outcomes of US Supreme Court cases better than experts in the legal domain. Historical data were used to develop decision trees that predicted whether the US Supreme Court would confirm or reverse the lower court ruling – a binary outcome. This excited many in the legal community, while many others received it with cautious skepticism.
In this project, we aim to take the prediction of court rulings further and look at the next lower level in the US court system hierarchy – namely, the US Court of Appeals. Unlike the Supreme Court study, which has a binomial outcome, the US Court of Appeals has 12 possible outcomes. Therefore, the methods, techniques and interpretation required to develop a predictive model are very different and challenging. Data covering a 7-year period, sourced from the public domain, were cleansed and their dimensionality reduced using Chi-Square analysis and the Boruta package in R. Classification techniques used include Random Forest, Neural Network, XGBoost and Ensemble. Prediction accuracy of the models ranges from 36% to 98%, requiring identification of parameters that ensure robustness of the models. Although there are no benchmarks available in the legal domain against which to compare our accuracy levels, the results are highly encouraging. By applying the models to similar data collected from other courts and over longer durations, there is an opportunity to make them more robust and reliable. With rapid digitization, we see opportunities to apply similar techniques in India in the near future.
Keywords:
Legal predictive analytics, multinomial classification, judicial analytics, random forest, neural network
Introduction (Section 1):
Every year tens of thousands of cases work their way through the US judicial system. Very often the parties involved turn to higher courts to obtain a ruling in their favor, based on expert advice from lawyers drawing on prior experience and intuition. The clients’ tangible and intangible stakes also cloud their decision to pursue the case further. Only later do they realize that an out-of-court settlement would have resulted in the best outcome for all parties involved, including the courts.
From the lawyers’ standpoint, it is not enough merely to research previous rulings in the strategic
and tactical preparation of their cases. Rather, it is important to understand the factors or
variables that courts rely on to arrive at their decisions.
Courts tend to document several parameters related to their functioning, such as the parties involved,
hearing dates, nature of the case, rulings from earlier courts, laws applied, etc. By applying data
analytics techniques, this data can be leveraged to predict the outcome of future cases. Such a
data-based approach is far more objective than the intuition- and experience-based speculation
that has been the norm. Both clients and lawyers can make decisions with far more confidence.
With a reduction in frivolous and outlier cases, courts would be able to save precious resources,
which could be repurposed to gain efficiencies within the system.
In this study, we have attempted to predict the outcome of the US Court of Appeals. The outcome
can assume 12 possible ruling values. This multi-class output therefore challenged us to go
well beyond Logistic Regression to techniques such as Random Forest, Neural Network, XGBoost
and Ensemble.
This paper defines the problem statement in Section 2. Related literature and previous work in
this area are discussed in Section 3. Data sources are identified in Section 4. Next, we look
into the nature of the data and its engineering in Sections 5 and 6 respectively. With the data
prepared, we focus on selection criteria for model-building techniques in Section 7. The results
are discussed in Section 8 and the overall conclusions drawn in Section 9.
Problem Statement (Section 2):
Develop models that can predict the outcome or treatment of a case by the US Court of Appeals
from historical data on basic case characteristics, participants, nature of the case, judges
and votes. Today, experience and intuition are used to make such predictions. This project
involves data exploration, data engineering and building appropriate predictive models using
techniques such as Random Forest, Neural Networks, XGBoost and Ensemble.
Literature Review (Section 3):
Prof. Frank B. Cross of the University of Texas at Austin studied the decision-making process in the US Court
of Appeals (Cross, 2003). He explains that there are four primary theoretical models that determine the
outcome of cases the court handles. The first is the Legal Model, wherein the decision is made
strictly in accordance with the law. The second is the Political Model, in which the ideology
of judges may be a factor. Third is the Strategic Model, in which decisions are adapted to the preferences of the US
Supreme Court. The fourth and last is the Litigant-driven Model, in which the strategic decisions of
the parties involved can drive the outcome of a case.
Prof. Cross concludes that legal and political factors are statistically significant determinants of decisions,
while strategic and litigant-driven factors have no significance. This leaves a litigant with little
ammunition or tooling to influence the outcome in his/her favor. It is possible that litigants simply did not
have the tools to formulate their strategy to the point where litigant-driven factors also become
significant. This may primarily be due to over-reliance on a lawyer’s intuition, experience and
expertise. A more objective approach for a litigant would be to use data as an instrument for strategizing.
In this paper, we focus on building one such strategy tool. Being able to predict the outcome of a case in
the US Court of Appeals becomes an important input for a litigant to better determine his/her options or
strategy. While similar work has been done in the past on predicting whether the US Supreme
Court would confirm or overturn the ruling of a lower court – a binary outcome – in this project we try to
predict multiple outcomes in the US Court of Appeals.
Data Sources (Section 4):
The Judicial Research Initiative (JuRI) at the University of South Carolina, Columbia took up the Appeals
Court Database Project to create an extensive dataset that would facilitate empirical analysis of the
judges’ votes and the overall ruling of the Appellate Court. Data on a broad range of variables of theoretical
significance to public-law scholars were coded and published. The 1997-2002 database (JuRI_data,
2003) and codebook (JuRI_Codebook, 2003) effort was led by Dr. Ashlyn K. Kuersten of Western
Michigan University and Susan B. Haire of the University of Georgia.
Data source links relevant to this project are provided below:
• Website: http://artsandsciences.sc.edu/poli/juri/appct.htm
• Codebook: http://artsandsciences.sc.edu/poli/juri/KH_update_codebook.pdf
• Data (Stata format): http://www.cas.sc.edu/poli/juri/KH_update_stata.zip
Data and Variables (Section 5):
The raw data file in CSV format consists of 2,160 rows and 244 columns. Almost all of the variables are
categorical. Variables with more than 15% missing values were removed; all retained variables
had less than 5% missing values. The data also contained 5-digit composite nominal values, in which each digit
represented a categorical value with as many as 12 sub-category levels. Composite data was decomposed
into separate fields, which were renamed for better understanding. Many of the categorical variables required
dummy coding, vastly enlarging the size of our dataset.
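To make the cleaning steps above concrete, the following is a minimal sketch in Python/pandas (the study's own pipeline was in R; the column names and codes here are hypothetical, not taken from the JuRI codebook):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw file: one composite 5-digit field and one
# variable with too many missing values (names are illustrative).
df = pd.DataFrame({
    "CASE_TYPE_CODE": ["10234", "20113"],   # 5-digit composite field
    "MOSTLY_MISSING": [np.nan, np.nan],     # > 15% missing -> dropped
})

# Remove variables with more than 15% missing values.
df = df.loc[:, df.isna().mean() <= 0.15]

# Decompose the composite field into one categorical column per digit,
# then dummy-code (one-hot) the resulting columns, which vastly
# enlarges the dataset.
digits = df["CASE_TYPE_CODE"].astype(str).str.zfill(5)
for i in range(5):
    df[f"CASE_TYPE_CODE_d{i + 1}"] = digits.str[i]
dummies = pd.get_dummies(df[[f"CASE_TYPE_CODE_d{i + 1}" for i in range(5)]])
```

Each of the five digit columns expands into one indicator column per observed level, so even this two-row toy example yields nine dummy columns.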
The dependent or outcome variable for our study is ‘Treatment’. According to the Codebook, Treatment
can assume one of the 12 possible values that are coded as follows: 0= stay petition or motion granted,
1=affirmed, 2=reversed, 3=reversed and remanded, 4=vacated and remanded, 5=affirmed in part and
reversed in part, 6=affirmed in part, reversed in part and remanded, 7=vacated, 8=petition denied or
appeal dismissed, 9=certification to another court, 10=not ascertained, 11=affirmed, vacated and
remanded.
As illustrated in Fig. 1, 5 of the Treatment values constitute nearly 90% of the outcomes. After careful
study of their commonalities and distinct features, the number of Treatment outcomes was
consolidated to 7, as shown in Fig. 1a. For easier understanding, the nominal values were replaced with
outcome descriptions.
Fig 1: Distribution of Treatment outcome BEFORE consolidation
Fig 1a: Distribution of Treatment outcome AFTER consolidation
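Such a consolidation amounts to a simple recode of the 12 Treatment codes into 7 groups. The sketch below is only illustrative; the actual grouping used in the study is the one shown in Fig. 1a:

```python
# Hypothetical mapping of the 12 codebook Treatment codes to 7
# consolidated outcome groups (the study's exact grouping may differ).
consolidate = {
    0: "Other",            1: "Affirmed",
    2: "Reversed",         3: "Reversed",
    4: "Vacated",          5: "Affirmed in part",
    6: "Affirmed in part", 7: "Vacated",
    8: "Petition denied",  9: "Other",
    10: "Not ascertained", 11: "Other",
}
outcomes = [consolidate[code] for code in [1, 3, 4, 10]]
```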
Data Engineering (Section 6):
Given that the original dataset had 244 columns, vastly expanded after decomposing
composite data and converting categorical variables to dummy variables, it was necessary to organize
them (see Fig. 2) in a manner that was easier to comprehend and that supported further analysis such as
dimensionality reduction.
Fig 2: Data organization BEFORE Dimension Reduction
Chi-Square Analysis was performed on the categorical variables to identify predictor variables that
significantly affected the case outcome. The Chi-squared results were additionally corroborated by
performing feature selection using the “Boruta” package in R.
The plot from the Boruta package in Fig. 3 shows the variables (on the x-axis) plotted against their importance
(on the y-axis). The variables marked in GREEN are the most important features selected by the package.
Although Chi-Square analysis and Boruta helped us arrive at the most significant predictor variables
to consider for model building, the list was not final yet. Based on domain knowledge, we decided to
make the following changes:
• The field PRIOR_COURT is simply a description of ORIGIN_NUMBER. Therefore, we retain PRIOR_COURT and drop ORIGIN_NUMBER.
• Once we know the CIRCUIT_COURT, it is not necessary to use the States under its jurisdiction. Therefore, we retain CIRCUIT_COURT and drop CIRCUIT_STATES.
• Replace STATE_VAL with STATE so that we know which State is being referred to. Similarly, we replaced DISTRICT_VAL with DISTRICT.
Neither Chi-Square nor Boruta selected Judges as a significant predictor variable. However, we
believe NUM_JUDGES should be included in the model.
After completing Feature Engineering steps described above, the significant variables identified were
organized as shown in Fig. 4.
Fig. 3: Boruta Package output graph – Most significant variables
Fig. 4: Data organization AFTER Dimension Reduction
Model Selection (Section 7):
Being a multi-class classification problem, there was an intuitive inclination to use Multinomial
Logistic Regression. On closer inspection, however, Multinomial Logistic Regression under the
surface still functions as a binomial model: the outcome is re-categorized as A versus B, C, D and so on
for each possible outcome. This results in an inevitable loss of information and can lead to misleading
conclusions. Therefore, it was decided to set aside Multinomial Logistic Regression while more suitable
predictive models were explored using the following selection parameters:
• Size of data – is it large enough to adequately train the model?
• Dimensionality – with 264 columns, do we keep all of them or only the significant variables?
• Would these algorithms be able to effectively handle independent categorical variables?
• What precautions need to be taken to avoid over-fitting?
• Do we have enough machine power – speed, performance and memory – to run these complicated algorithms?
Eventually, the classification models considered were Random Forest, Neural Network and XGBoost.
Upon developing Random Forest models, we found a tendency towards over-fitting, with accuracy
rates as high as 99%. However, with its randomized selection of rows and columns, Random Forest
is expected to be resistant to over-fitting. It is possible that our relatively small dataset of 2,160 rows
was a significant contributor to this outcome. We decided that further exploration, including use of a
larger dataset, is required before publishing conclusions on the performance of the Random Forest model.
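A direct way to detect this kind of over-fitting is to compare training accuracy with held-out accuracy. The sketch below (Python; synthetic random labels stand in for the 2,160-row court dataset) shows how a Random Forest can memorize noise, producing exactly the large train/test gap described above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data: 7 outcome classes with labels unrelated to features,
# so any accuracy above chance on training data is pure memorization.
rng = np.random.RandomState(0)
X = rng.rand(400, 12)
y = rng.randint(0, 7, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
train_acc = rf.score(X_tr, y_tr)  # near 1.0: the forest memorizes noise
test_acc = rf.score(X_te, y_te)   # near chance (~1/7) on held-out rows
gap = train_acc - test_acc        # a large gap signals over-fitting
```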
In this paper, we will be focusing primarily on Neural Network and XGBoost.
Neural Network: Using the caret package in R, the data was split into Training and Test datasets. For
building the Neural Network model, we used the ‘nnet’ package in R. Since our data predominantly
consisted of categorical variables, softmax was set to TRUE. The softmax function is a gradient-log
normalizer of the categorical probability distribution, used in various probabilistic multiclass
classification methods (Softmax function, 2016). Similarly, entropy was also set to TRUE.
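The softmax normalization itself is a one-liner, turning a vector of raw scores into a categorical probability distribution (shown here in Python for illustration):

```python
import numpy as np

def softmax(z):
    """Normalize a score vector into a categorical probability
    distribution (the gradient-log normalizer mentioned above)."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))  # probabilities summing to 1
```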
Starting with the full Training dataset, as outlined in Fig. 6, steps were taken to progressively improve the
model accuracy by:
• Trimming the data to select only the most significant variables
• Balancing the trimmed Training data to have adequate representation of variables
• Oversampling the outcome Treatment variable to ensure the nnet Neural Network model has enough of an opportunity to learn the characteristics of each outcome. This learning is important for the model to correctly classify the outcome of such cases.
• Finally, applying the model developed on the Training dataset to the Test dataset. The results obtained were used for further analysis.
Fig 6: Neural Network Model Tuning
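These tuning steps can be sketched end to end. The study used R's caret and nnet; the Python equivalent below (with synthetic, imbalanced data) shows the split, the oversampling of minority outcomes in the training set only, and the multiclass network fit:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

# Synthetic imbalanced data standing in for the engineered dataset.
rng = np.random.RandomState(0)
X = rng.rand(300, 10)
y = np.where(rng.rand(300) < 0.8, "Affirmed", "Reversed")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Oversample each minority class in the training set up to the size of
# the majority class, so every outcome is equally represented.
n_max = max(np.sum(y_tr == c) for c in np.unique(y_tr))
parts_X, parts_y = [], []
for c in np.unique(y_tr):
    Xc, yc = resample(X_tr[y_tr == c], y_tr[y_tr == c],
                      replace=True, n_samples=n_max, random_state=0)
    parts_X.append(Xc)
    parts_y.append(yc)
X_bal, y_bal = np.vstack(parts_X), np.concatenate(parts_y)

# Fit a small multiclass network on the balanced data, then apply it
# to the untouched Test set.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                    random_state=0).fit(X_bal, y_bal)
pred = clf.predict(X_te)
```

Note that the Test set is never oversampled; balancing only the Training data is what lets the model learn rare outcomes without inflating the evaluation.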
XGBoost: As for the Neural Network model, Training and Test datasets were created using the caret
package in R; in addition, the xgboost package in R was used. The model-tuning approach used for the
Neural Network was also used for XGBoost, as depicted in Fig. 7. Among the parameters used to develop
the XGBoost model, the objective was specified as “multi:softprob” for multiclass classification.
Fig 7: XGBoost Model Tuning
Results and Discussion (Section 8):
The confusion matrices for the two multinomial classification models, given in Fig. 8 and Fig. 9, help us
make the following observations:
• The most prominent observation, looking at both models, is the impact that oversampling had on their performance. It prepared the models to encounter all types of possible outcomes.
• Both machine learning techniques performed very well when the outcome is ‘Affirmed’. This is mainly because the high proportion of this outcome gave the models enough opportunity to learn over several iterations.
• At the same time, outcomes such as ‘Reversed’ and ‘Vacated’ were predicted as ‘Affirmed’ – a nearly diametrically opposite classification.
• This behavior leads us to think that there is a fine line between a case being classified as Affirmed versus Reversed or Vacated. It could be influenced by one or two critical variables which, if identified, could significantly simplify the models. We would like to pursue this in our future efforts.
• Studying the most significant variables indicated by both Neural Network and XGBoost, we were able to make the following observations:
o The Appeals Court (there are 13 Appeals Courts in the USA) currently hearing the case significantly affects the outcome. Understanding which Appeals Court is more likely to rule in an Appellant’s favor would be valuable in working out the Appellant’s strategy.
o If the previous court was unable to decide on the case and the outcome was ‘Not Ascertained’, the Appeals Court is likely to give a more decisive ruling.
o The nature of the Appellant also plays a significant role in the outcome. An Appellant who is a ‘Natural Citizen’ has a greater influence on the Appeals Court outcome.
o In a panel of judges, the directionality of the 3rd judge has a significant effect on the overall outcome of the case.
o Among these, the judge’s assertion of the broadest interpretation of First Amendment protections, including freedom of speech, religion and the right to protest peacefully, is highly significant.
Fig 8: Neural Network - Confusion Matrix
Fig 9: XG Boost – Confusion Matrix
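The kind of off-diagonal mass discussed above is easy to read off a confusion matrix, where rows are true outcomes and columns are predictions. A small illustrative example (the counts are made up, not the study's results):

```python
from sklearn.metrics import confusion_matrix

# Toy true/predicted outcomes: note the 'Reversed' and 'Vacated' cases
# that the model calls 'Affirmed' -- the near-opposite misclassification
# observed in Fig. 8 and Fig. 9.
labels = ["Affirmed", "Reversed", "Vacated"]
y_true = ["Affirmed", "Affirmed", "Reversed", "Vacated", "Reversed"]
y_pred = ["Affirmed", "Affirmed", "Affirmed", "Affirmed", "Reversed"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
# Row i, column j counts cases with true label i predicted as label j.
```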
Conclusion (Section 9):
We have made an effort to develop multinomial predictive models that can predict the outcome of cases
handled by the US Court of Appeals. These models would enable litigants and lawyers to make decisions
more objectively, using historical data rather than experience and intuition. After extensive cleansing of the
data and organizing it for better understanding, classification techniques such as Neural Network and
XGBoost were used.
The biggest limitation was the size of the available data – a total of 2,160 rows. The machine learning
techniques we applied, such as Neural Network and XGBoost, had restricted opportunities to refine their
weights for all possible outcomes. To address this shortcoming, oversampled data was used, which
significantly improved model performance.
Overall, the results obtained from these models were very encouraging. The models’ accuracy,
resource usage and consistency validated our intention to demonstrate the use of analytics in the legal domain.
As part of future studies, we plan to determine the characteristics of each outcome using Decision Trees
and also to simplify the models to use fewer variables.
In light of initiatives such as Digital India, we expect large amounts of legal data to become available
in the coming years. Analytics can bring efficiencies to the Indian legal system and help reduce the
more than 3 crore (30 million) cases pending court decisions.
References (Section 10):
Cross, F. B. (2003, December). Decision Making in the US Courts of Appeals. California Law Review. Retrieved from http://scholarship.law.berkeley.edu/cgi/viewcontent.cgi?article=1351&context=californialawreview
JuRI. (n.d.). Retrieved from http://artsandsciences.sc.edu/poli/juri/appct.htm
JuRI_Codebook. (2003). KH_codebook. Retrieved from http://artsandsciences.sc.edu/poli/juri/KH_update_codebook.pdf
JuRI_data. (2003). KH_update. Retrieved from http://www.cas.sc.edu/poli/juri/KH_update_stata.zip
Softmax function. (2016, October 9). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Softmax_function
Theodore W. Ruger, et al. (2004). The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking. Retrieved from http://scholarship.law.berkeley.edu/cgi/viewcontent.cgi?article=1018&context=facpubs
IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...IRJET-  	  Predicting Outcome of Judicial Cases and Analysis using Machine Le...
IRJET- Predicting Outcome of Judicial Cases and Analysis using Machine Le...
 
1115 track 3 gopalan_using our laptop
1115 track 3 gopalan_using our laptop1115 track 3 gopalan_using our laptop
1115 track 3 gopalan_using our laptop
 
Legal Informatics Research Today: Implications for Legal Prediction, 3D Print...
Legal Informatics Research Today: Implications for Legal Prediction, 3D Print...Legal Informatics Research Today: Implications for Legal Prediction, 3D Print...
Legal Informatics Research Today: Implications for Legal Prediction, 3D Print...
 
Comparative Study of Classification Method on Customer Candidate Data to Pred...
Comparative Study of Classification Method on Customer Candidate Data to Pred...Comparative Study of Classification Method on Customer Candidate Data to Pred...
Comparative Study of Classification Method on Customer Candidate Data to Pred...
 
Application of Secondary Data in Epidemiological Study, Design Protocol and S...
Application of Secondary Data in Epidemiological Study, Design Protocol and S...Application of Secondary Data in Epidemiological Study, Design Protocol and S...
Application of Secondary Data in Epidemiological Study, Design Protocol and S...
 

ICBAI Paper

Introduction (Section 1):

Every year, tens of thousands of cases work their way through the US judicial system. Parties often turn to higher courts for a ruling in their favor, based on the expert advice lawyers provide from prior experience and intuition. The clients' tangible and intangible stakes also cloud their decision to pursue a case further; only later do they realize that an out-of-court settlement would have produced the best outcome for all parties involved, including the courts. From the lawyers' standpoint, it is not enough to research previous rulings when preparing their cases strategically and tactically. It is equally important to understand the factors, or variables, that courts rely on to arrive at their decisions.
Courts tend to document several parameters related to their functioning, such as the parties involved, hearing dates, the nature of the case, rulings from earlier courts, and the laws applied. By applying data analytics techniques, this data can be leveraged to predict the outcome of future cases. A data-based approach is far more objective than the intuitive, experience-based speculation that has been the norm: both clients and lawyers can make decisions with much more confidence, and with a reduction in frivolous and outlier cases, courts can save precious resources and repurpose them to gain efficiencies within the system.

In this study, we have attempted to predict the outcome of the US Court of Appeals. The outcome can assume 12 possible ruling values, so the multinomial output challenged us to go well beyond Logistic Regression to techniques such as Random Forest, Neural Network, XGBoost and Ensemble.

This paper defines the problem statement in Section 2. Related literature and previous work in this area are discussed in Section 3. Data sources are identified in Section 4. We then look into the nature of the data and its engineering in Sections 5 and 6 respectively. With the data ready, we focus on selection criteria for model-building techniques in Section 7. The results are discussed in Section 8 and the overall conclusions are drawn in Section 9.

Problem Statement (Section 2):

Develop models that can predict the outcome, or treatment, of a case by the US Court of Appeals from historical data, using basic case characteristics, participants, the nature of the case, judges and votes. Today, experience and intuition are used to make such predictions. This project involves data exploration, data engineering and building predictive models using techniques such as Random Forest, Neural Networks, XGBoost and Ensemble.

Literature Review (Section 3):

Prof. Frank B. Cross of the University of Texas at Austin studied the decision-making process in the US Court of Appeals (Cross, 2003). He explains that there are four primary theoretical models that determine the outcome of the cases the court handles. The first is the Legal Model, wherein the decision is made strictly in accordance with the law. The second is the Political Model, in which the ideology of judges may be a factor. The third is the Strategic Model, in which decisions are adapted to the preferences of the US Supreme Court. The fourth is the Litigant-driven Model, in which the strategic decisions of the parties involved can drive the outcome of a case. Prof. Cross concludes that legal and political factors are statistically significant determinants of decisions, while strategic and litigant-driven factors have no significance.

This leaves a litigant with little ammunition to influence the outcome in his or her favor. It is possible that litigants simply have not had the tools to formulate a strategy to the point where litigant-driven factors also become significant, primarily because of over-reliance on a lawyer's intuition, experience and expertise. A more objective approach for a litigant would be to use data as an instrument for strategizing, and in this paper we focus on building one such strategy tool. Being able to predict the outcome of a case in the US Court of Appeals becomes an important input for a litigant in determining his or her options and strategy. While similar work has been done in the past on predicting whether the US Supreme Court would confirm or overturn a lower-court ruling (a binary outcome), in this project we try to predict multiple outcomes in the US Court of Appeals.
Data Sources (Section 4):

The Judicial Research Initiative (JuRI) at the University of South Carolina, Columbia took up the Appeals Court Database Project to create an extensive dataset that would facilitate empirical analysis of judges' votes and the overall rulings of the Appellate Court. Data on a broad range of variables of theoretical significance to public-law scholars were coded and published. The 1997-2002 database (JuRI_data, 2003) and codebook (JuRI_Codebook, 2003) effort was led by Dr. Ashlyn K. Kuersten of Western Michigan University and Susan B. Haire of the University of Georgia. Data source links relevant to this project are provided below:

 Website: http://artsandsciences.sc.edu/poli/juri/appct.htm
 Codebook: http://artsandsciences.sc.edu/poli/juri/KH_update_codebook.pdf
 Data (Stata format): http://www.cas.sc.edu/poli/juri/KH_update_stata.zip

Data and Variables (Section 5):

The raw data file, in CSV format, consists of 2,160 rows and 244 columns. Almost all the variables are categorical. Variables with more than 15% missing values were removed; all retained variables had less than 5% missing values. The data also contained 5-digit nominal values, where each digit represented a categorical value with as many as 12 sub-category levels. Such composite data was decomposed into separate fields, which were renamed for better understanding. Many of the categorical variables required dummy coding, which vastly enlarged the dataset.

The dependent, or outcome, variable for our study is 'Treatment'. According to the Codebook, Treatment can assume one of 12 possible values, coded as follows: 0 = stay petition or motion granted, 1 = affirmed, 2 = reversed, 3 = reversed and remanded, 4 = vacated and remanded, 5 = affirmed in part and reversed in part, 6 = affirmed in part, reversed in part and remanded, 7 = vacated, 8 = petition denied or appeal dismissed, 9 = certification to another court, 10 = not ascertained, 11 = affirmed, vacated and remanded.
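The decomposition of composite codes and the dummy coding described above were done in R; the following is a minimal stdlib-Python sketch of the same idea. The field names, level names and the example code value are hypothetical illustrations, not the paper's actual data.

```python
# Sketch (not the paper's R code): decompose a 5-digit composite nominal
# value into per-digit categorical fields, then dummy-code one of them.
# Field and level names here are hypothetical illustrations.

def decompose(code):
    """Split a 5-digit nominal code into five single-digit category fields."""
    s = str(code).zfill(5)
    return {"digit_%d" % (i + 1): int(ch) for i, ch in enumerate(s)}

def dummy_code(value, levels):
    """One-hot (dummy) encode a categorical value against its known levels."""
    return {"level_%d" % lv: int(value == lv) for lv in levels}

fields = decompose(40217)   # five separate categorical fields
dummies = dummy_code(fields["digit_1"], range(12))  # up to 12 sub-category levels
```

Since every decomposed digit gets its own set of dummy columns, a single composite variable can easily expand into dozens of columns, which is why the dataset grew so much.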
As illustrated in Fig. 1, five of the Treatment values constitute nearly 90% of the outcomes. After careful study of their commonalities and distinct features, the Treatment outcomes were consolidated to 7, as shown in Fig. 1a. For easier understanding, the nominal values were replaced with outcome descriptions.

Fig 1: Distribution of Treatment outcome BEFORE consolidation
Fig 1a: Distribution of Treatment outcome AFTER consolidation

Data Engineering (Section 6):

The original dataset had 244 columns, which expanded vastly after decomposing composite data and converting categorical variables to dummy variables. It was therefore necessary to organize them (see Fig. 2) in a manner that was easier to comprehend and that supported further analysis such as dimensionality reduction.

Fig 2: Data organization BEFORE Dimension Reduction

Chi-Square analysis was performed on the categorical variables to identify predictor variables that significantly affect the case outcome. The Chi-Square results were corroborated by performing feature selection with the "Boruta" package in R.
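The screening test above is the standard Pearson chi-square test of independence between a categorical predictor and the outcome. The paper ran it in R; this is an illustrative stdlib-Python sketch on a hypothetical 2x2 contingency table, not the paper's data.

```python
# Sketch of the Pearson chi-square statistic of independence used to screen
# categorical predictors against the Treatment outcome. The contingency
# table below holds hypothetical counts for illustration only.

def chi_square(table):
    """Chi-square statistic for a 2-D contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Predictor level vs. outcome (e.g., Affirmed / Not Affirmed), toy counts
stat = chi_square([[30, 10],
                   [20, 40]])
# Compare stat to the chi-square critical value with df = (rows-1)*(cols-1)
```

A predictor is kept when its statistic exceeds the critical value for the chosen significance level, i.e., when the predictor and outcome are unlikely to be independent.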
The plot from the Boruta package in Fig. 3 shows the variables (on the x-axis) plotted against their importance (on the y-axis). The variables marked in GREEN are the most important features selected by the package.

Although Chi-Square analysis and Boruta helped us arrive at the most significant predictor variables for model building, the list was not yet final. Based on domain knowledge, we decided to make the following changes:

 The field PRIOR_COURT is simply a description of ORIGIN_NUMBER. Therefore, we retained PRIOR_COURT and dropped ORIGIN_NUMBER.
 Once the CIRCUIT_COURT is known, it is not necessary to use the States under its jurisdiction. Therefore, we retained CIRCUIT_COURT and dropped CIRCUIT_STATES.
 We replaced STATE_VAL with STATE so that the State being referred to is explicit. Similarly, we replaced DISTRICT_VAL with DISTRICT.

Neither Chi-Square nor Boruta selected the judges as a significant predictor variable. However, we believe NUM_JUDGES should be included in the model. After completing the feature engineering steps described above, the significant variables were organized as shown in Fig. 4.

Fig. 3: Boruta Package output graph – Most significant variables
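The domain-driven adjustments above amount to a handful of column operations. As a sketch (column names are from the paper, but the starting selection and the set-based representation are illustrative, not the paper's R code):

```python
# Sketch of the domain-driven adjustments as operations on column names.
# The starting set is an illustrative subset of the selected features.
selected = {"PRIOR_COURT", "ORIGIN_NUMBER", "CIRCUIT_COURT",
            "CIRCUIT_STATES", "STATE_VAL", "DISTRICT_VAL"}

selected.discard("ORIGIN_NUMBER")   # PRIOR_COURT already describes it
selected.discard("CIRCUIT_STATES")  # implied by CIRCUIT_COURT
renames = {"STATE_VAL": "STATE", "DISTRICT_VAL": "DISTRICT"}
selected = {renames.get(col, col) for col in selected}
selected.add("NUM_JUDGES")          # retained on domain grounds
```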
Fig. 4: Data organization AFTER Dimension Reduction

Model Selection (Section 7):

As a multiclass classification problem, there was initially an inclination to use Multinomial Logistic Regression. On closer inspection, however, Multinomial Logistic Regression still functions as a binomial model under the surface: the outcome is re-categorized as A versus B, C, D and so on for each possible outcome. This results in an inevitable loss of information and can lead to misleading conclusions. It was therefore decided to set Multinomial Logistic Regression aside while more suitable predictive models were explored using the following selection parameters:

 Size of data – is it large enough to adequately train the model?
 Dimensionality – with 264 columns, do we keep all of them or only the significant variables?
 Can the algorithm effectively handle independent categorical variables?
 What precautions need to be taken to avoid over-fitting?
 Do we have enough machine power, namely speed, performance and memory, to run these complex algorithms?

The classification models eventually considered were Random Forest, Neural Network and XGBoost. Upon developing Random Forest models, we found a tendency towards over-fitting, with accuracy rates as high as 99%. With its randomized selection of rows and columns, Random Forest is normally expected to be resistant to over-fitting; it is possible that our relatively small dataset of 2,160 rows was a significant contributor to this result. We decided that further exploration, including the use of a larger dataset, is required before publishing conclusions on the performance of the Random Forest model. In this paper, we focus primarily on Neural Network and XGBoost.

Neural Network: Using the caret package in R, the data was split into Training and Test datasets. For building the Neural Network model, we used the 'nnet' package in R.
Since our data predominantly consists of categorical variables, softmax was set to TRUE. The softmax function is a gradient-log normalizer of the categorical probability distribution and is used in various probabilistic multiclass classification methods (Softmax function, 2016). Similarly, entropy was also set to TRUE. Starting with the full Training dataset, as outlined in Fig. 6, steps were taken to progressively improve model accuracy by:
 Trimming the data to select only the most significant variables
 Balancing the trimmed Training data to have adequate representation of variables
 Oversampling the outcome Treatment variable so that the nnet Neural Network model has enough opportunity to learn the characteristics of each outcome – learning the model needs to correctly classify such cases
 Finally, applying the model developed on the Training dataset to the Test dataset; the results obtained were used for further analysis

Fig 6: Neural Network Model Tuning

XGBoost: As with the Neural Network model, Training and Test datasets were created using the caret package in R; in addition, the xgboost package in R was used. The model tuning approach used for the Neural Network was also used for XGBoost, as depicted in Fig. 7. Among the parameters used to develop the XGBoost model, the objective was specified as "multi:softprob" for multiclass classification.
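The oversampling step above was central to both models. As a minimal stdlib-Python sketch of the idea (the paper balanced its R training data; the toy rows and the `treat` key below are illustrative), minority outcome classes are randomly duplicated until each class matches the size of the largest:

```python
import random
from collections import defaultdict

def oversample(rows, label_key):
    """Randomly duplicate minority-class rows until every class
    matches the size of the largest class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(random.choices(members, k=target - len(members)))
    return balanced

random.seed(0)  # reproducible illustration
data = [{"treat": "Affirmed"}] * 8 + [{"treat": "Reversed"}] * 2
balanced = oversample(data, "treat")  # 8 'Affirmed' + 8 'Reversed' rows
```

Note that oversampling must be applied only to the Training split, never the Test split, or the accuracy estimates become optimistic.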
Fig 7: XGBoost Model Tuning

Results and Discussion (Section 8):

The confusion matrices for the two multinomial classification models, given in Fig. 8 and Fig. 9, support the following observations:

 The most prominent observation across both models is the impact Oversampling had on their performance: it prepared the models to encounter all types of possible outcomes.
 Both machine learning techniques performed very well when the outcome is 'Affirmed'. This is mainly because the high proportion of this outcome gave the models ample opportunity to learn over many iterations.
 At the same time, outcomes such as 'Reversed' and 'Vacated' were sometimes predicted as 'Affirmed' – a nearly diametrically opposite classification.
 This behavior leads us to think that there is a fine line between a case being classified as Affirmed versus Reversed or Vacated. It could be influenced by one or two critical variables which, if identified, could significantly simplify the models. We would like to pursue this in future work.
 Studying the most significant variables indicated by both Neural Network and XGBoost, we made the following observations:
o The Appeals Court currently hearing the case (there are 13 Appeals Courts in the USA) significantly affects the outcome. Understanding which Appeals Court is more likely to rule in an appellant's favor would be valuable in working out the appellant's strategy.
o If the previous court was unable to decide the case and the outcome was 'Not Ascertained', the Appeals Court is likely to give a more decisive ruling.
o The nature of the appellant also plays a significant role in the outcome; in particular, whether the appellant is a 'Natural Citizen' has a greater significance on the Appeals Court outcome.
o In a panel of judges, the directionality of the third judge has a significant effect on the overall outcome of the case.
o Among these, the judge's assertion of the broadest interpretation of First Amendment protections, including freedom of speech, freedom of religion and the right to protest peacefully, is highly significant.

Fig 8: Neural Network - Confusion Matrix

Fig 9: XG Boost – Confusion Matrix

Conclusion (Section 9):

We have made an effort to develop multinomial predictive models that can predict the outcome of cases handled by the US Court of Appeals. These models would enable litigants and lawyers to make decisions more objectively, using historical data rather than experience and intuition alone. After extensive cleansing of the data and organizing it for better understanding, classification techniques such as Neural Network and XGBoost were applied. The biggest limitation was the size of the available data – 2,160 rows in total – which restricted the opportunity for these techniques to refine their weights for all possible outcomes. To address this shortcoming, oversampled data was used, which significantly improved model performance. Overall, the results obtained from these models were very encouraging: the models' accuracy, resource usage and consistency validated our intention to demonstrate the use of analytics in the legal domain. In future studies, we plan to determine the characteristics of each outcome using Decision Trees and to simplify the models to use fewer variables.
In the light of initiatives such as Digital India, we expect large amounts of legal data to become available in the coming years. Analytics can bring efficiencies into the Indian legal system and help reduce the more than 3 crore (30 million) cases pending court decisions.

References (Section 10):

Cross, F. B. (2003, December). Decision Making in the US Courts of Appeals. California Law Review. Retrieved from http://scholarship.law.berkeley.edu/cgi/viewcontent.cgi?article=1351&context=californialawreview

JuRI. (n.d.). Retrieved from http://artsandsciences.sc.edu/poli/juri/appct.htm

JuRI_Codebook. (2003). KH_codebook. Retrieved from http://artsandsciences.sc.edu/poli/juri/KH_update_codebook.pdf

JuRI_data. (2003). KH_update. Retrieved from http://www.cas.sc.edu/poli/juri/KH_update_stata.zip

Softmax function. (2016, October 9). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Softmax_function

Theodore W. Ruger, P. T. Kim, A. D. Martin, & K. M. Quinn. (2004). The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking. Retrieved from Berkeley Law: http://scholarship.law.berkeley.edu/cgi/viewcontent.cgi?article=1018&context=facpubs