Predictive Coding and E-Discovery in 2015 and Beyond - LegalTechNYC 2013 ( Daniel Martin Katz + Michael J. Bommarito II )


This document discusses predictive coding and machine learning methods for e-discovery. It begins by explaining that predictive coding relies on supervised machine learning methods, which use human-coded training sets to classify documents as relevant or non-relevant. The document then discusses different types of supervised and unsupervised machine learning algorithms that could be used for predictive coding. It suggests that using prior cases to inform feature selection and weightings could make algorithms "smarter." Finally, it considers whether discovery costs will eventually be reduced due to scaling relationships for cost per gigabyte and long-term rates of electronic data growth.


@ computational
computationallegalstudies.com
Predictive Coding
and E-Discovery
in 2015 and
Beyond
daniel martin katz
michael j bommarito ii

The 2x2 Machine Learning Spectrum
Info Viz & Pattern Detection
Rates of Scaling

In Order to
Understand Where
We Are Heading ...

it is necessary to have
insight regarding how
predictive coding
actually works

Predictive Coding
Relies Upon a
Particular Class of
Machine Learning
Methods

The Current Approach
is drawn from
the family of so-called
“supervised methods”

What is the difference
between supervised
and unsupervised?

Develop a Training Set
using human experts

In the simple case,
assign objects to
two piles

yellow = relevant
white = non-relevant
And Return This ...

What Allows A Human
To Separate These
Two Classes of
Documents?

that precise human
process is what
predictive coding is
trying to mimic

Humans are selecting
upon features of
documents

to place those
documents in their
respective bins
(i.e. relevant, non-relevant)

features =?
text,
author,
date,
other metadata
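A minimal sketch of turning one document into features, assuming a hypothetical Document record with only text, author, and sent date. The field names and the "weekend" feature are illustrative assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical record for one document under review."""
    text: str
    author: str
    sent: date


def extract_features(doc):
    """Map a document to a dictionary of named features."""
    features = {}
    # Text features: how often each word occurs.
    for word in doc.text.lower().split():
        key = "word=" + word
        features[key] = features.get(key, 0) + 1
    # Metadata features: author and date information.
    features["author=" + doc.author.lower()] = 1
    features["year=" + str(doc.sent.year)] = 1
    # weekday() returns 5 for Saturday and 6 for Sunday.
    features["weekend"] = 1 if doc.sent.weekday() >= 5 else 0
    return features


example = Document("Merger terms and merger pricing", "A. Smith", date(2012, 6, 2))
features = extract_features(example)
```

Each key is one signal a reviewer might also be using when sorting documents into the two piles.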

supervised methods
“learn” from the
training data
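As one hedged illustration of "learning" from a training set, here is a small Naive Bayes text classifier. The four coded documents are invented for the example, and Naive Bayes is only one of many supervised methods; nothing here claims to be what any particular predictive coding product does.

```python
import math
from collections import Counter

# Hypothetical, hand-coded training set: (document text, label).
TRAINING_SET = [
    ("merger agreement draft attached for review", "relevant"),
    ("please review the merger pricing terms", "relevant"),
    ("lunch menu for the office party", "non-relevant"),
    ("office party parking reminder", "non-relevant"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log posterior (add-one smoothing)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, doc_counts = train(TRAINING_SET)
prediction = classify("draft merger terms", word_counts, doc_counts)
```

The human coding decisions are the only source of knowledge here: change the labels in the training set and the predictions change with them.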

but there are different
forms of learning by
machines ...

There Is Learning
Within a Matter
(i.e. learning from a
specific training set)

But what about using
prior matters to inform
both feature selection
and the weighting of
those features?

In other words, it is
possible to learn from
the experience of
having processed
documents in the past

both inside a given
company but also
across companies ...

It comes from
data aggregation / reusing data

This is Learning and
Rule Propagation
Across Matters
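One possible way to sketch learning across matters, under stated assumptions: term counts from previously coded matters are added as scaled pseudo-counts when a term is weighted in a new matter. The documents, the prior_strength value, and the log-odds weighting are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def count_terms(coded_docs):
    """Count, per label, how many documents contain each term."""
    counts = {"relevant": Counter(), "non-relevant": Counter()}
    for text, label in coded_docs:
        counts[label].update(set(text.lower().split()))
    return counts


def log_odds(term, counts, prior_counts=None, prior_strength=0.5):
    """Smoothed log-odds that a term signals relevance.

    Counts from prior matters, scaled by prior_strength, are added as
    pseudo-counts so that past experience informs the weight.
    """
    relevant = counts["relevant"][term] + 1.0
    non_relevant = counts["non-relevant"][term] + 1.0
    if prior_counts is not None:
        relevant += prior_strength * prior_counts["relevant"][term]
        non_relevant += prior_strength * prior_counts["non-relevant"][term]
    return math.log(relevant / non_relevant)


# Hypothetical coded documents from earlier matters and from a new one.
PRIOR_MATTERS = [
    ("privileged memo from counsel", "relevant"),
    ("counsel advice on pricing", "relevant"),
    ("counsel memo on merger", "relevant"),
    ("cafeteria schedule", "non-relevant"),
]
NEW_MATTER = [
    ("pricing spreadsheet", "relevant"),
    ("holiday schedule", "non-relevant"),
]

prior = count_terms(PRIOR_MATTERS)
current = count_terms(NEW_MATTER)
within_only = log_odds("counsel", current)        # term unseen in the new matter
with_prior = log_odds("counsel", current, prior)  # informed by prior matters
```

With only the new matter's training set, "counsel" carries no weight; with the prior matters folded in, it starts out as a signal of relevance before any reviewer has coded a document containing it.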

feedback loops are the
best friends of algorithms

feedback loops can help
make algorithms become
much smarter ...

Machine Learning Methods: 2 x 2
Columns: Supervised, Unsupervised
Rows: Informed, Naive
Labeled cells: Predictive Coding, Basic Clustering Algorithm, The Future

Some Machine Learning Algorithms
Supervised
  Statistical models
    Bayesian, e.g., Naïve Bayes Classification
    Frequentist, e.g., Ordinary Least Squares
  Neural Networks (NN)
  Support Vector Machines (SVM)
  Random Forests (RF)
  Genetic Algorithms (GA)
Semi/unsupervised
  Neural Networks (NN)
  Clustering
    K-means
    Hierarchical
    Radial Basis (RBF)
    Graph
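To contrast with the supervised case, here is a rough sketch of k-means, one of the clustering methods listed above. No human labels are used. The two-dimensional points are hypothetical document vectors (for instance, counts of two terms), and the starting centroids are picked by hand to keep the example deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Squared Euclidean distance from the point to every centroid.
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical document vectors: (count of term A, count of term B).
POINTS = [(5, 0), (4, 1), (6, 0), (0, 5), (1, 4), (0, 6)]
centroids, clusters = kmeans(POINTS, centroids=[(5, 0), (0, 5)])
```

The algorithm groups similar documents but cannot say which group is relevant; that judgment still has to come from somewhere else.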

Think about the task faced
by the intelligence
community ...

how are those
intelligence
analysts aided?

The Visual Cortex is a very
powerful CPU ...

because there are significant
efficiency gains to be
obtained from applications of
sophisticated data
visualization techniques

This Next Generation of
EDiscovery Software is
viz intensive ...

including an even more
enriched notion of time
dynamics ...

Will Discovery Costs
Eventually Be Reduced?

Two Scaling Relationships
that are in question ...

“[I]n 2001, a 300 Gb legal matter would take 200 attorneys a full
year to review, at a cost of about $15 million.
In 2003, a similar-sized matter took 100 attorneys 3 weeks to
complete, at a cost of $6 million.
And in 2006, a 300 Gb investigation took 65 attorneys only 2.5
days to complete, at a cost of $2 million.
And now, cases with several hundreds of Gbs are routine.”
Improving Document Review in E-Discovery
FTI Consulting

Daniel Martin Katz
Michigan State University
Associate Professor of Law
@ computational
computationallegalstudies.com
reinventlaw.com
http://about.me/daniel.martin.katz
