Legal Informatics Research Today: Implications for Legal Prediction, 3D Printing, & eDiscovery
1. Legal Informatics
Research Today:
Implications for Legal
Prediction, 3D Printing &
eDiscovery
Robert Richards
Penn State University
CICL 2013: Conference on Innovation and Communications
Law
2. Agenda
Legal Informatics:
Overview
eDiscovery:
Methods, Recent Research
3D Printing:
How legal tech could apply
Legal Prediction
Methods, Recent Research
3. Legal Informatics: Definition
Legal informatics is:
(1) the study of legal information /
communication systems
(2) the application of ICT
(information / communication
technology) to legal information
5. What is legal information?
Structured data that express:
1. Legal Rules
2. Information about Legal Rules
(1st, 2nd, 3rd, etc. order legal metadata)
3. Evidence
Non-legal data used to support an
assertion about a legal rule
6. What is a legal information /
communication system?
A set of interrelated entities that
receive, process, or output legal
information
Examples:
A law office time/billing system
A database of court decisions
A statistical model predicting a legal
outcome
7. Legal Informatics Viewpoint:
4 Levels
In a domain
Addressing an application area
From one or more sub-disciplines, by
Employing one or more methodologies
9. Legal Informatics: Application Areas
Litigation
Compliance
Planning
Interviewing/
Counseling
Negotiation
Education
Governance /
Policy making
10. Legal Informatics: Sub-Disciplines
Artificial Intelligence
Information Retrieval
Text Processing / NLP
Metadata/ Knowledge
Representation
Databases / Storage
Linguistics /
Communication
Human-Computer
Interaction / Information
Behavior
Management /
Sociology of Info
11. Legal Informatics: Methodologies
Prototyping
Statistics /
Probability
Experimentation
Network Analysis
Survey Research
Case Study
Cost-Benefit
Analysis
Ethnography
Interviewing
Doctrinal Analysis
12. Example
Much eDiscovery research
involves…
Law Practice (Domain)
Litigation / Evidence (Application Area)
Information retrieval + text analysis +
knowledge representation /metadata +
management (Sub-Disciplines)
Prototyping + experimentation + statistical
analysis + cost-benefit analysis
(Methodologies)
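The 4-level viewpoint can be read as a small record type. The sketch below is illustrative only: the class name `ResearchProfile` and the helper `overlap` are hypothetical, and the two profiles are transcribed (with lightly normalized labels) from the eDiscovery example above and the Scherer et al. public policy slide later in this deck.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ResearchProfile:
    """One research activity described at the four levels (hypothetical type)."""
    domain: str
    application_area: str
    sub_disciplines: FrozenSet[str]
    methodologies: FrozenSet[str]


def overlap(a: ResearchProfile, b: ResearchProfile) -> Dict[str, FrozenSet[str]]:
    """Return the sub-disciplines and methodologies two profiles share."""
    return {
        "sub_disciplines": a.sub_disciplines & b.sub_disciplines,
        "methodologies": a.methodologies & b.methodologies,
    }


ediscovery = ResearchProfile(
    domain="law practice",
    application_area="litigation / evidence",
    sub_disciplines=frozenset({"information retrieval", "text analysis",
                               "knowledge representation", "management"}),
    methodologies=frozenset({"prototyping", "experimentation",
                             "statistical analysis", "cost-benefit analysis"}),
)

scherer_policy = ResearchProfile(
    domain="public policy (e-participation)",
    application_area="translating non-legal language to legal concepts",
    sub_disciplines=frozenset({"artificial intelligence", "linguistics",
                               "text analysis", "knowledge representation"}),
    methodologies=frozenset({"prototyping", "case study"}),
)

shared = overlap(ediscovery, scherer_policy)
```

Comparing profiles level by level is one way to surface relationships between research activities that sit in different domains.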
13. 4-Level Approach Reveals Relationships Between
(Apparently) Dissimilar Research Activities
Scherer, S., Wimmer, M. A., &
Markisic, S. (2013). Bridging
narrative scenario texts and
formal policy modeling through
conceptual policy modeling.
Artificial Intelligence and Law.
doi:10.1007/s10506-013-9142-2
14. Scherer et al. (2013)
[Diagram: ICT linking Citizen’s Narrative and Legal Doctrine/Rule]
15. Scherer et al.: Public Policy Domain
Methodologies:
Prototyping + Case study
Sub-Disciplines:
Artificial intelligence + Linguistics + Text
Analysis + Knowledge Representation
Application area:
Translating non-legal language to legal
concepts
Domain:
Public policy (e-Participation)
16. Scherer et al.: Law Practice Domain
Methodologies:
Prototyping + Case study
Sub-Disciplines:
Artificial intelligence + Linguistics + Text
Analysis + Knowledge Representation
Application area:
Translating non-legal language to legal
concepts
Domain:
Law practice (Counseling, Interviewing)
22. Cost Motivation
Big Data leads to prohibitive costs of
traditional relevance and privilege
review
With data sets of > 10^6 objects, linear
manual review and privilege review
become unsustainably expensive
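A back-of-the-envelope sketch of why linear review scales poorly. The throughput (50 documents per hour) and the hourly rate (60.0) are placeholder assumptions chosen for illustration; they are not figures from this talk or from the cited studies.

```python
def linear_review_cost(num_docs: int, docs_per_hour: float, hourly_rate: float) -> float:
    """Cost of reviewing every document once, at a fixed pace and rate."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    hours = num_docs / docs_per_hour
    return hours * hourly_rate


# Hypothetical inputs: 10^6 documents, 50 docs/hour, 60.0 per reviewer hour.
cost = linear_review_cost(10**6, docs_per_hour=50, hourly_rate=60.0)
```

Because the cost is linear in the number of documents, every tenfold increase in collection size multiplies the bill by ten under these assumptions.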
26. TREC & EDI: Key Findings
Initial Search & Second-Step Relevance
Feedback:
Automated relevance ranking > Boolean query
in re: recall
Interactive Evaluation:
Technology-Assisted Review > Manual
Review
in re: overall results + precision
High Precision + High Recall are possible with
certain topics
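Recall and precision, the two measures used in these findings, can be computed as follows. This is a minimal sketch with made-up document IDs; the function name `precision_recall` is an illustrative choice.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: 4 documents retrieved, 3 documents actually relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).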
27. TREC Key Findings (cont’d)
Predictive coding produced high recall
But most machine learning systems could not choose the
sample size needed to maximize precision and
recall.
Machine learning systems that yielded highly
relevant results also yielded highly material
docs
Privilege Review Remains a Key Cost Driver &
Is Under-Automated (Pace & Zakaras, 2012)
Automated privilege review yielded high recall in one
study (but method was not disclosed)
28. eDiscovery: Measurement Error
Low rates of inter-assessor agreement
Found in TREC & EDI studies
Cooperation between parties on evaluation in
tech-assisted review likely to lower measurement
error
This is an emerging best practice (see, e.g., Da
Silva Moore)
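Inter-assessor agreement can be quantified in several ways. The sketch below uses Cohen's kappa, one common chance-corrected statistic; it is not necessarily the measure used in the TREC or EDI studies, and the two label lists are invented.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two assessors' labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Agreement expected if each assessor labeled independently
    # at their own observed base rates.
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both assessors used a single identical label
    return (observed - expected) / (1 - expected)


# "R" = relevant, "N" = not relevant (toy judgments on 8 documents).
assessor_1 = ["R", "R", "N", "N", "R", "N", "N", "N"]
assessor_2 = ["R", "N", "N", "N", "R", "R", "N", "N"]
kappa = cohens_kappa(assessor_1, assessor_2)
```

The assessors agree on 6 of 8 documents (75%), but after correcting for chance the kappa is only about 0.47, which illustrates how raw agreement can overstate reliability.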
29. eDiscovery: Recent Emphases (Baron, 2011)
Process Quality Standards & Best
Practices
Metrics & certification (DESI IV, 2011)
Cooperation between Parties
Sedona Conference (2009)
Improved Search, including Predictive
Coding
DESI V, 2013
Results of TREC & EDI research
Courts are implementing all of these
30. eDiscovery: Recent Emphases:
Sub-Disciplines
Process Quality Standards & Best
Practices
Management
Cooperation between Parties
Management, Information
Retrieval, Knowledge Representation
Improved Search, including Predictive
Coding
Information Retrieval, Text
Analysis, Knowledge
32. Predictive Coding: Diverse Methods
Support Vector
Machines
Latent Semantic
Analysis
Naïve Bayesian
Classifiers
Decision Trees
Neural Networks
Association Rule
Learning
Rule Induction
Genetic Algorithms
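As an illustration of one method on this list, here is a minimal multinomial Naïve Bayes text classifier with add-one smoothing. It is a teaching sketch, not the implementation of any eDiscovery product; the class name, the training documents, and the labels are all invented.

```python
import math
from collections import Counter, defaultdict
from typing import List


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over whitespace tokens (toy sketch)."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Seed set coded by a (hypothetical) reviewer.
train_docs = [
    "merger pricing agreement draft",
    "pricing schedule for merger",
    "lunch menu friday",
    "office party friday lunch",
]
train_labels = ["responsive", "responsive", "nonresponsive", "nonresponsive"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
prediction = model.predict("merger pricing memo")
```

In a predictive coding workflow, a reviewer-coded seed set plays the role of `train_docs`, and the model's predictions rank or classify the unreviewed remainder.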
33. Predictive Coding: Courts Reading, Citing, &
Applying Legal Informatics Research
Da Silva Moore v. Publicis Groupe
EORHB v. HOA Holdings
Global Aerospace Inc. v. Landow
Aviation
Kleen Products v. Packaging Corp. of
America
34. eDiscovery: Future Research Directions
Evaluation Standards & Certification
Threshold point estimates
Relevance threshold
Sample size threshold
Confidence level, confidence intervals
Typology of Production Requests
Electronic Discovery Institute plans 2nd
study on real e-discovery materials
testing TREC conclusions, with higher ecological validity
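The sample size, confidence level, and confidence interval thresholds listed above are related by standard formulas. The sketch below uses the textbook sample-size formula for a proportion and the Wilson score interval; the function names and the example numbers are illustrative assumptions, not proposed standards.

```python
import math
from statistics import NormalDist
from typing import Tuple


def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Documents to sample so a proportion is estimated within +/- margin.
    p = 0.5 is the most conservative (largest) choice."""
    z = z_value(confidence)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score confidence interval for a proportion such as recall."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = z_value(confidence)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


n_needed = sample_size(margin=0.05, confidence=0.95)
low, high = wilson_interval(successes=308, n=385)  # observed proportion 0.8
```

With a 5% margin and 95% confidence, the conservative formula calls for 385 sampled documents, regardless of how large the collection is.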
35. eDiscovery: Future Research Directions
(cont’d)
Measurement Error:
Modeling it & correcting for it
Designing re-usable test collections
Automated privilege review
Identifying effective methods
Designing test collections to evaluate those methods
36. eDiscovery: Future Research Directions
(cont’d)
Evaluating de-duplication methods
Improved privacy measures to enable
experiments on real-life data sets
Apply other sub-disciplines, including
Information behavior
Diversify methods, including social
network analysis
More research on Early Case
Assessment
37. 3D Printing
Definition
Expected Effects
Lawyers’ Value-Add
Short-Term Application of Legal
Technology
Long-Term Application of Legal Technology
38. 3D Printing: Definition
The generation of physical objects
from computer models, by a layering
process
Also called Additive Manufacturing
(Gibson, Rosen, & Stucker, 2010)
39. 3D Printing: Some Expected Effects
Democratizing manufacturing
More inventors
More innovation
More infringement
More demand for legal compliance
services
More demand for patent legal
40. Patent Lawyers’ Value-Add for
Entrepreneurs / New Inventors
Patent Search
Claim Interpretation
Currency of Information
Customization of Information to
Client’s Circumstances
Strategic Advice (Law + Business)
41. How Might Legal Informatics Affect 3D
Printing?
Legal Informatics is likely to interact
with 3D Printing in two ways:
Short-Term: Unbundling of patent
legal services (Mosten, 1994)
Long-Term: Automated patent
search & Modeling of claim
interpretations incorporated into
CAD software
42. Unbundling of Patent Legal Services
Selling (outdated) patent search
results
Selling (outdated) memoranda
containing claim interpretations
Offering (remotely) updated &
customized search results and
counseling for an extra fee
43. Patent Legal Services Unbundling: 4-Levels
Domain:
Business
Application Areas:
Compliance, Counseling
Sub-Disciplines:
Management, Information Retrieval, Knowledge
Representation
Methodologies:
Prototyping, Case Studies, Doctrinal
Analysis, Cost-Benefit Analysis
44. Automated patent search & modeling of claim
interpretations (Hulicki, 2013; Mulligan & Lee, 2012)
User inputs simulation/design/image
of invention
CAD software analyzes
input, determines domain & patent
search parameters
CAD Software executes patent
search, retrieves relevant patents in
force
CAD software analyzes claims of
45. Automated patent search & modeling of claim
interpretations (cont’d)
CAD Software translates claims into
simulation parameters
For each simulation model, CAD software
calculates probability of liability for patent
infringement & possible exposure
Output displays liability probability +
potential exposure
Lawyer offers (remote) legal counseling for
an extra fee
46. Automated Patent Search & Modeling of
Claim Interpretations: 4-Levels
Domain:
Business
Application Areas:
Compliance, Counseling
Sub-Disciplines:
Artificial Intelligence, Information
Retrieval, Knowledge Representation, Human-
Computer Interaction
Methodologies:
Prototyping, Statistical Modeling, Case
Studies, Experimentation, Ethnography, Interviewing
47. Implications of Both Scenarios
More small-scale
inventors/entrepreneurs will have
access to legal compliance
information at an affordable price
Clients can choose to pay more for
higher levels of service
Reform of legal ethics rules may be
required to implement either scenario
48. Legal Prediction
Definition
4-Level View
Temporal Dimensions
Research Results
Possible Effects
Future Research Directions
49. Legal Prediction: Definitions
(1) Methods for calculating the
probability of the occurrence or
non-occurrence of law-related events or
circumstances at a point in time, on
the basis of data acquired at an
earlier point in time
(2) Methods for inferring law-related
attributes of a population from a
sample
50. Legal Prediction: Application Areas
Case Outcome / Litigation
Management
(Blackman et al., 2012; Ruger et
al., 2004; Ribstein, 2012)
Imputing Default Terms in Contracts &
Wills
(Porat & Strahilevitz, 2013)
Legislative Bill Passage
(Tauberer, 2012; Yano et al., 2012)
55. Legal Prediction:
Model vs. Crowdsourcing
Blackman’s FantasySCOTUS vs. Martin, Ruger
et al.
Complementary approaches
56. Legal Prediction:
Three Temporal Dimensions
Synchronic:
Inference from sample to parameters of a static population
Predictive coding, machine learning
Used to collect data set for model
Diachronic Future:
Inference from sample at t to observations at t + 1, where
t + 1 is later than today
Forward prediction (Katz)
Often performed on the data set gathered using Synchronic
prediction
Diachronic Past:
Retrospective prediction
Inference from sample at t to observations at t + 1, where
t + 1 is earlier than today
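The two diachronic dimensions differ only in where t + 1 falls relative to today. A minimal sketch, assuming records are (decision date, outcome) pairs; the function names and the sample records are hypothetical, and a target date equal to today is treated here as past.

```python
from datetime import date
from typing import List, Tuple

Record = Tuple[date, str]  # (decision date, outcome), hypothetical schema


def prediction_type(target_date: date, today: date) -> str:
    """Label a diachronic prediction by where its target falls relative to today."""
    return "diachronic future" if target_date > today else "diachronic past"


def chronological_split(records: List[Record], cutoff: date) -> Tuple[List[Record], List[Record]]:
    """Train on observations before the cutoff (t); evaluate on the rest (t + 1).
    For retrospective prediction, the evaluation set is already-known history."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


records = [
    (date(2010, 3, 1), "affirm"),
    (date(2011, 6, 15), "reverse"),
    (date(2012, 1, 20), "reverse"),
    (date(2012, 11, 5), "affirm"),
]
train, test = chronological_split(records, cutoff=date(2012, 1, 1))
kind = prediction_type(date(2012, 11, 5), today=date(2013, 5, 1))
```

Splitting by date rather than at random keeps later outcomes out of the training data, which is what makes a retrospective prediction a fair stand-in for a forward one.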
57. Legal Prediction:
Some Research Results
Decision Tree > Domain Experts (Ruger et al.)
Crowdsourcing > Domain Experts (Blackman et
al.)
Crowdsourcing = Decision Tree (Blackman et al.)
Stochastic Block Models > case-content based
algorithms (Guimerà & Sales-Pardo)
Stochastic Block Models > Domain Experts
(Guimerà & Sales-Pardo)
58. Legal Prediction: Possible Effects
Lawyer disintermediation (Katz, 2013;
Ribstein, 2012)
Client empowerment (Ribstein, 2012)
Reduction in legal costs (Katz, 2013;
Ribstein, 2012)
Within businesses, distribution of legal
tasks to non-legal personnel
(Ribstein, 2012)
59. Legal Prediction: Future Research
Directions
Analogical reasoning: development of
improved models (Katz)
Crowdsourced prediction markets for
lower-level courts (Blackman et al.)
Automated prediction engines for
lower-level courts (Blackman et al.)
60. References
Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. (1996). Fast
discovery of association rules. Advances in Knowledge Discovery and Data
Mining, 12:307–328.
Ashley, K. D., & Brüninghaus, S. (2009). Automatically classifying case texts and
predicting outcomes. Artificial Intelligence and Law, 17, 125-165.
doi:10.1007/s10506-009-9077-9
Ashley, K. D., & Bridewell, W. (2010). Emerging AI & Law approaches to automating
analysis and retrieval of electronically stored information in discovery proceedings.
Artificial Intelligence and Law, 18, 311-320. doi:10.1007/s10506-010-9098-4
Barnett, T., Godjevac, S., Renders, J.-M., Privault, C., Schneider, J., & Wickstrom, R.
(2009, June). Machine learning classification for document review. Paper presented at
the DESI III Global E-Discovery/E-Disclosure Workshop: A Pre-Conference Workshop
at the twelfth International Conference on Artificial Intelligence and Law, ICAIL
2009, Barcelona, Spain.
Baron, J. (2011). Law in the age of exabytes: Some further thoughts on ‘information
inflation’ and current issues in e-discovery search. Richmond Journal of Law and
Technology, 17(3), Article 9. Retrieved from http://jolt.richmond.edu/v17i3/article9.pdf
Blackman, J., Aft, A., & Carpenter, C. (2012). FantasySCOTUS: Crowdsourcing a
prediction market for the Supreme Court. Northwestern Journal of Technology and
Intellectual Property, 10(3), Article 3. Retrieved from
http://scholarlycommons.law.northwestern.edu/njtip/vol10/iss3/3
Cohen, W. W. (1995). Fast effective rule induction. In Machine learning: Proceedings
of the twelfth international conference, ML95.
61. References (cont’d)
Conrad, J. (2010). E-discovery revisited: the need for artificial intelligence beyond information
retrieval. Artificial Intelligence and Law, 18, 321-345. doi:10.1007/s10506-010-9096-6
Cormack, G. V., Grossman, M. R., Hedin, B., & Oard, D. W. (2011). Overview of the TREC 2010
legal track. In The Nineteenth Text Retrieval Conference (TREC 2010) Proceedings. N.p.: NIST.
Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012).
DESI IV (2011). [Call for papers:] ICAIL 2011 workshop on setting standards for searching
electronically stored information in discovery proceedings (DESI IV Workshop), June
6, 2011, University of Pittsburgh, Pittsburgh, PA.
DESI V (2013). [Call for papers:] ICAIL 2013 workshop on standards for using predictive
coding, machine learning, and other advanced search and review methods in e-discovery (DESI V
workshop), June 14, 2013, Consiglio Nazionale delle Ricerche, Rome, Italy.
Dimmock, S. G., & Gerken, W. C. (2012). Predicting fraud by investment managers. Journal of
Financial Economics, 105, 153-173. doi:10.1016/j.jfineco.2012.01.002
EORHB, Inc. v. HOA Holdings LLC, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012).
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131-163.
Gibson, I., Rosen, D. W., & Stucker, B. (2010). Additive manufacturing technologies: Rapid
prototyping to direct digital manufacturing. New York: Springer.
Global Aerospace, Inc., v. Landow Aviation, L.P., No. CL 61040 (Va. Cir., Apr. 23, 2012).
Grossman, M. R., & Cormack, G. V. (2011). Technology-assisted review in e-discovery can be
more effective and more efficient than exhaustive manual review. Richmond Journal of Law and
Technology, 17(3), Article 11. Retrieved from http://jolt.richmond.edu/v17i3/article11.pdf
Grossman, M. R., Cormack, G. V., Hedin, B., & Oard, D. W. (2011). Overview of the TREC 2011
legal track. In The Twentieth Text Retrieval Conference (TREC 2011) Proceedings. N.p.: NIST.
62. References (cont’d)
Guimerà, R., & Sales-Pardo, M. (2011). Justice blocks and predictability of U.S. Supreme Court
votes. PLOS ONE, 6(11), e27188. doi:10.1371/journal.pone.0027188
Hulicki, M. (2013, May). Recent judgments of the highest court as a step towards objectification of
patentability. Paper presented at CICL 2013: Conference on Innovation and Communication
Law, Glen Arbor, MI.
In re Actos (Pioglitazone) Products, No. 6:11-md-2299 (W.D. La., July 27, 2012).
Joachims, T. (1998). Text categorization with support vector machines: Learning with many
relevant features. In C. Nédellec & C. Rouveirol (Eds.), Proceedings of the 10th European
Conference on Machine Learning (pp. 137–142).
Katz, D. M. (2013). Quantitative legal prediction—Or—How I learned to stop worrying and start
preparing for the data-driven future of the legal service industry. Emory Law Journal, 62, 101-158.
Kleen Prods. LLC v. Packaging Corp. of Am., No. 10 C 5711 (N.D. Ill., Sept. 28, 2012).
LexMachina. (n.d.). About, technology. Retrieved from https://lexmachina.com/about/
Martin, A. D., & Quinn, K. M. (2002). Dynamic ideal point estimation via Markov chain Monte Carlo
for the U.S. Supreme Court, 1953–1999. Political Analysis, 10, 134-153. doi:10.1093/pan/10.2.134
McShane, B. B., Watson, O. P., Baker, T., & Griffith, S. J. (2012). Predicting securities fraud
settlements and amounts: A hierarchical Bayesian model of federal securities class action lawsuits.
Journal of Empirical Legal Studies, 9, 482-510. doi:10.1111/j.1740-1461.2012.01260.x
Mosten, F. S. (1994). Unbundling of legal services and the family lawyer. Family Law
Quarterly, 28, 421-449.
Mulligan, C., & Lee, T. B. (forthcoming). Scaling the patent system. N.Y.U. Annual Survey of
American Law. Retrieved from http://www.ssrn.com/abstract=2016968
Oard, D. W., Baron, J. R., Hedin, B., Lewis, D. D., & Tomlinson, S. (2010). Evaluation of
information retrieval for e-discovery. Artificial Intelligence and Law, 18, 347-386.
doi:10.1007/s10506-010-9093-9
63. References (cont’d)
Oard, D. W., & Webber, W. (2013). Information retrieval for e-discovery.
Foundations and Trends in Information Retrieval, 7, 1-141. Retrieved from
http://ediscovery.umiacs.umd.edu/pub/ow12fntir.pdf
Pace, N. M., & Zakaras, L. (2012). Where the money goes: Understanding
litigant expenditures for producing electronic discovery. Santa Monica, CA:
Rand Institute for Civil Justice.
Porat, A., & Strahilevitz, L. J. (2013). Personalizing default rules and
disclosure with big data (University of Chicago Coase-Sandor Institute for
Law and Economics working paper no. 634, 2nd series). Retrieved from
http://www.law.uchicago.edu/Lawecon/index.html
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
Ribstein, L. (2012). Delawyering the corporation. Wisconsin Law
Review, 2012, 305-332.
Richards, R. (2009, June). What is legal information? Paper presented at the
Conference on Legal Information: Scholarship and Teaching, at the
University of Colorado School of Law, Boulder, CO. Retrieved from
http://legalinformatics.wordpress.com/2009/05/31/what-is-legal-information-conference-paper/
64. References (cont’d)
Roitblat, H. L., Kershaw, A., & Oot, P. (2010). Document categorization in legal
electronic discovery: Computer classification vs. manual review. Journal of the
American Society for Information Science and Technology, 61, 70-80.
doi:10.1002/asi.21233
Ruger, T. W., Kim, P. T., Martin, A. D., & Quinn, K. M. (2004). The Supreme Court
forecasting project: Legal and political science approaches to predicting Supreme
Court decisionmaking. Columbia Law Review, 104, 1150-1210.
Scherer, S., Wimmer, M. A., & Markisic, S. (2013). Bridging narrative scenario texts
and formal policy modeling through conceptual policy modeling. Artificial Intelligence
and Law. doi:10.1007/s10506-013-9142-2
The Sedona Conference. (2009). Commentary on achieving quality in e-discovery.
N.p.: The Sedona Conference.
Tauberer, J. (2012, December 7). Bill prognosis gets a few improvements. GovTrack
Blog [web log post]. Retrieved from
http://www.govtrack.us/blog/2012/12/007/bill-prognosis-gets-a-few-improvements
Webber, W. (2011, July). Re-examining the effectiveness of manual review. Paper
presented at SIGIR 2011 Information Retrieval for E-Discovery (SIRE)
Workshop, Beijing, China.
Yano, T., Smith, N. A., & Wilkerson, J. D. (2012, October). Textual predictors of bill
survival in congressional committees. Paper presented at New Directions in Analyzing
Text as Data 2012, Harvard University, Cambridge, MA. Retrieved from
http://projects.iq.harvard.edu/ptr/files/yanosmithwilkersonbillsurvival.pdf