I, Robot, Esquire 
Information Extraction and Summarization in Legal Documents 
jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 
Jacob Mundt – MLConf ATL
Who we are 
Commercializing machine learning technology developed at Columbia University to make legal document review more efficient, accurate, and cost-effective.
– One of four national winners in the Startup America DEMO Competition
– One of CIO.com’s top ten enterprise products at DEMO Fall 2012
– Most Promising Software Product of the Year award from the Connecticut Technology Council
– Completed Connecticut Innovations’ TechStart Fund Program
Management Team 
Large law firm experience; tech startup experience; sales & business development experience
Harvard Law
Led R&D team at tech company extracting data in medical industry
Columbia Masters; NLP researcher
Founder of Ivy Link (20+ staff); Chief of Staff of 350-person real estate private equity firm
Harvard Law; law firm & in-house experience
Ned Gannon, CEO
Adam Nguyen, COO
Jake Mundt, CTO
The Future of Law 
“In contrast, in looking 25 years ahead from now, I argue that it would be absurd to expect lawyers and courts to carry on operating as they do now.”
– Richard Susskind, Tomorrow’s Lawyers
“Well, if droids could think, there'd be none of us here, would there?”
– Obi-Wan Kenobi
I, Robot, Esquire - Overview 
Motivation 
Can we use ML and NLP? 
eBrevia Solution – Deep Dive 
Challenges and Lessons Learned 
Future directions 
Corporate Mergers and Due Diligence 
Business due diligence
Legal due diligence
Closing
Legal Due Diligence Process 
Extract → Summarize → Analyze → Advise
Teams of junior attorneys, billed out at $300–$500/hour, pore over hundreds of contracts in virtual data rooms to summarize their content and identify red flags.
Legal Due Diligence Summary 
Here come the spreadsheets. Summarize ALL the contracts:
– leases
– executive employment agreements
– supplier agreements
– loan/credit agreements
Extract key data points.
Also extract any clauses that discuss particular provisions.
The Stone Age 
On-site data room with reams of documents, organized by the seller
Buyer’s agents travel to evaluate the target under constant supervision
State of the Art – Virtual Data Rooms 
Digitized, but not machine readable
Some simple OCR and search capability
Commercial systems like IntraLinks have advanced capabilities, but they focus mostly on security and auditability.
The Future is Here 
Simple keyword search falls short:
– Misses stems, synonyms, and plural forms
– False positives: some common words also have special meanings in context
– Impossible to find dates, parties, dollar amounts, or any other generic quantities
We can do better.
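To make the keyword-search gaps above concrete, here is a minimal Python sketch (illustrative only, not eBrevia's code). It contrasts exact token matching with matching after a deliberately crude suffix stripper, and uses a regular expression to find dollar amounts, which no keyword query can express. The suffix list and the dollar-amount pattern are assumptions chosen for the example; a real system would use a proper stemmer or lemmatizer and a much richer grammar for quantities.

```python
import re

# Deliberately crude, hypothetical suffix list (most specific suffixes first).
SUFFIXES = ("ification", "ified", "ifies", "ify", "s")

def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 4 letters."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 4:
            return w[: -len(suf)]
    return w

def tokens(text: str) -> list[str]:
    return re.findall(r"[A-Za-z]+", text)

def exact_search(text: str, keyword: str) -> bool:
    return keyword.lower() in (t.lower() for t in tokens(text))

def stemmed_search(text: str, keyword: str) -> bool:
    target = crude_stem(keyword)
    return any(crude_stem(t) == target for t in tokens(text))

# Dollar amounts such as "$1,250,000.00" or "$5 million": a pattern, not a keyword.
DOLLAR_RE = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?(?:\s+(?:million|billion))?")

def find_dollar_amounts(text: str) -> list[str]:
    return DOLLAR_RE.findall(text)

clause = "The Seller indemnifies the Buyer up to $1,250,000.00, or $5 million in the aggregate."
print(exact_search(clause, "indemnify"))    # False: only an inflected form appears
print(stemmed_search(clause, "indemnify"))  # True
print(find_dollar_amounts(clause))          # ['$1,250,000.00', '$5 million']
```

Stemming fixes the first bullet only partially (synonyms still slip through), and the regex illustrates why "generic quantities" call for extraction rather than search.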
Can we use ML and NLP? 
Actually, many sub-problems:
– Classify entire document type: discover contracts among a heterogeneous corpus
– Duplicate detection
– Group documents that were based on a common form agreement
– Automatically flag questionable documents for further review
– Automatic provision extraction
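Duplicate detection and grouping by common form agreement can both be approached by comparing documents' word shingles. The sketch below is one common technique (Jaccard similarity of 3-word shingles, merged transitively with union-find), not necessarily eBrevia's; the shingle size and the 0.5 threshold are illustrative assumptions. It compares all pairs, which suits a single data room; at larger scale one would switch to MinHash/LSH.

```python
import re
from itertools import combinations

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles (contiguous word k-grams) of a lowercased document."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_by_form(docs: dict[str, str], threshold: float = 0.5, k: int = 3) -> list[set[str]]:
    """Group documents whose shingle sets have Jaccard >= threshold (transitively)."""
    parent = {name: name for name in docs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    sets = {name: shingles(text, k) for name, text in docs.items()}
    for a, b in combinations(docs, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())

docs = {
    "lease_a": "This Lease is made between Landlord and Tenant. Rent of $2,000 shall be paid monthly in advance on the first day of each month.",
    "lease_b": "This Lease is made between Landlord and Tenant. Rent of $3,500 shall be paid monthly in advance on the first day of each month.",
    "nda": "Each party shall keep the Confidential Information of the other party strictly confidential.",
}
print(round(jaccard(shingles(docs["lease_a"]), shingles(docs["lease_b"])), 2))  # 0.7
print(group_by_form(docs))  # the two leases share a group; the NDA stands alone
```

Exact duplicates score 1.0, while documents filled in from the same form score high but below 1.0, so one similarity measure with two thresholds can serve both sub-problems.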
Why this is Easy 
Precise, formal writing
Extremely structured
Lots of clause reuse
Why this is Hard 
Precise, formal writing
Extremely structured
Lots of clause reuse
Obfuscation
High demands on recall
Deep chains of defined term references
Detecting “Evil” Clauses? 
Lawyers actually prefer to make the calls on exactly what to include and how to advise the client.
Just find the source material, and let the lawyer decide.
Determine relevance; don’t make value judgments.
“Learning to detect spyware using end user license agreements”, Lavesson et al. (2009)
Illustration of Saint Wolfgang and the Devil with the Devil's Contract, by Michael Pacher.
eBrevia’s Approach 
Not all provisions are the same!
– Topic modeling
– Information Extraction (IE)
– Rule-based approach
• Find sentences discussing “change of control” 
• Find restrictions concerning confidential information 
• The contract runs from TIMEX to TIMEX. 
• The monthly rent will start at $X and increase by no more than Y% annually.
• Find every borrower’s FICO score
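The two templated examples above ("runs from TIMEX to TIMEX", rent starting at $X and rising by at most Y%) are classic slot-filling targets. Below is a minimal rule-based sketch for just those two templates; the regular expressions, the `LeaseSlots` field names, and the single supported date format ("Month D, YYYY") are all assumptions for illustration. Real TIMEX-style date normalization handles far more variation. The `max_rent` helper assumes the cap compounds, i.e. each year's rent is at most Y% above the previous year's.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical date pattern: only "Month D, YYYY" (e.g. "January 1, 2015").
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE = r"(?:" + MONTHS + r") \d{1,2}, \d{4}"

TERM_RE = re.compile(r"runs from (?P<start>" + DATE + r") to (?P<end>" + DATE + r")")
RENT_RE = re.compile(
    r"rent will start at \$(?P<rent>\d+(?:,\d{3})*(?:\.\d{2})?),? "
    r"and increase by no more than (?P<pct>\d+(?:\.\d+)?)% annually"
)

@dataclass
class LeaseSlots:
    start: Optional[date] = None
    end: Optional[date] = None
    base_rent: Optional[float] = None
    max_increase_pct: Optional[float] = None

    def max_rent(self, year: int) -> float:
        """Upper bound on rent in lease year `year` (0 = base rent), assuming annual compounding."""
        return self.base_rent * (1 + self.max_increase_pct / 100) ** year

def parse_date(s: str) -> date:
    return datetime.strptime(s, "%B %d, %Y").date()

def fill_slots(text: str) -> LeaseSlots:
    slots = LeaseSlots()
    if m := TERM_RE.search(text):
        slots.start, slots.end = parse_date(m["start"]), parse_date(m["end"])
    if m := RENT_RE.search(text):
        slots.base_rent = float(m["rent"].replace(",", ""))
        slots.max_increase_pct = float(m["pct"])
    return slots

text = ("The contract runs from January 1, 2015 to December 31, 2019. "
        "The monthly rent will start at $4,500, and increase by no more than 3% annually.")
slots = fill_slots(text)
print(slots.start, slots.end)                   # 2015-01-01 2019-12-31
print(slots.base_rent, slots.max_increase_pct)  # 4500.0 3.0
print(round(slots.max_rent(2), 2))              # 4774.05
```

Rules like these are precise but brittle, which is why the deck pairs them with topic classification and learned IE rather than relying on them alone.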
Text analysis pipeline 
– OCR
– Sentence segmentation
– NLP processing (POS, NER, parsing)
– Document structure tagging
– General candidate detection
– Rule-based detection
– Topic classifier
– Candidate detection for IE
– Information extraction and slot filling
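The stage list above reads naturally as a chain of transformations over a shared document record. This hypothetical Python skeleton shows that shape: each stage takes and returns a `Doc`, so stages can be swapped, reordered, or extended. Every stage here is a toy stand-in (a regex sentence splitter, a "Section ..." prefix check for structure, an obligation-word candidate filter); OCR, POS tagging, NER, and parsing are left out rather than faked, and the rule-based, topic-classifier, and IE stages would be further functions of the same type.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Doc:
    text: str
    sentences: list[str] = field(default_factory=list)
    headings: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)

Stage = Callable[[Doc], Doc]

def segment_sentences(doc: Doc) -> Doc:
    # Toy splitter: break after ".", "!" or "?" followed by whitespace.
    doc.sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc.text) if s.strip()]
    return doc

def tag_structure(doc: Doc) -> Doc:
    # Toy structure tagger: sentences starting with "Section " are headings.
    doc.headings = [s for s in doc.sentences if s.startswith("Section ")]
    return doc

def detect_candidates(doc: Doc) -> Doc:
    # Toy general candidate detector: non-heading sentences with an obligation word.
    doc.candidates = [s for s in doc.sentences
                      if s not in doc.headings and re.search(r"\b(?:shall|will|must)\b", s)]
    return doc

def run_pipeline(doc: Doc, stages: list[Stage]) -> Doc:
    for stage in stages:
        doc = stage(doc)
    return doc

doc = run_pipeline(
    Doc("Section 5: Indemnification. The Seller shall indemnify the Buyer. This paragraph is descriptive."),
    [segment_sentences, tag_structure, detect_candidates],
)
print(doc.headings)    # ['Section 5: Indemnification.']
print(doc.candidates)  # ['The Seller shall indemnify the Buyer.']
```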
Classifier Features 
Basic textual analysis features
– words
– n-grams
– positional and morphological features
– named entities
Syntactic features
– parts of speech
– parse tree and heads
Structural features
– first-level classifier pass for determining document structure
– especially important on scanned documents, where these features aren’t readily available
Figure examples: named-entity tags (The/O buyer/O Acme/ORG Inc./ORG); a parse tree headed by “indemnify” over “Client shall indemnify” (tagged N V V); document structure (Section III: Miscellaneous > 1. Lorem ipsum dolor > a. sit amet, consectetur)
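A sketch of turning one sentence into the kind of sparse, named features listed above, as a `Counter` suitable for a linear classifier. Only surface features are computed: words, word bigrams (standing in for n-grams), simple word-shape features, a position bucket, and a structural depth supplied by the caller (for example from a first-pass structure tagger). Named-entity, POS, and parse features would come from an NLP toolkit and are omitted. All feature names here are hypothetical.

```python
import re
from collections import Counter

def sentence_features(sentence: str, position: int, n_sentences: int,
                      section_depth: int = 0) -> Counter:
    """Sparse features for one sentence (a hypothetical feature set)."""
    tokens = re.findall(r"[A-Za-z0-9()]+", sentence)
    lower = [t.lower() for t in tokens]
    feats: Counter = Counter()
    for w in lower:                          # words
        feats[f"w={w}"] += 1
    for a, b in zip(lower, lower[1:]):       # word bigrams
        feats[f"bi={a}_{b}"] += 1
    for t in tokens:                         # morphological / shape features
        if t.isupper() and len(t) > 1:
            feats["shape=ALLCAPS"] += 1
        elif t[0].isupper():
            feats["shape=Capitalized"] += 1
        if any(c.isdigit() for c in t):
            feats["shape=has_digit"] += 1
    # Positional feature: which fifth of the document the sentence falls in.
    feats[f"pos_bucket={min(4, 5 * position // max(1, n_sentences))}"] = 1
    # Structural feature, e.g. nesting depth from a first-pass structure tagger.
    feats[f"section_depth={section_depth}"] = 1
    return feats

f = sentence_features("Client shall indemnify the Buyer under Section 13(d).",
                      position=3, n_sentences=10, section_depth=2)
print(f["shape=Capitalized"], f["pos_bucket=1"], f["bi=client_shall"])  # 3 1 1
```

The tokenizer keeps parentheses so that legal citations such as "13(d)" survive as single tokens, which matters for heuristics like the one discussed under "The Audacity of Keywords".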
Hunting for Training Data 
All your customers’ data is confidential:
– redacted contracts
– mine SEC filings
Lawyer-labeled training data is expensive:
– bootstrapping
– co-training with different feature sets
– active learning
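Of the label-saving techniques listed above, active learning is the easiest to sketch. Below is pool-based uncertainty sampling: train on the few lawyer-labeled sentences, then send the unlabeled sentences the model is least sure about to a lawyer next. The tiny multinomial Naive Bayes is a stand-in classifier chosen so the example needs only the standard library; the toy sentences, labels, and batch size are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9()]+", text.lower())

class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing; labels 1 = relevant, 0 = not."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, y in zip(texts, labels):
            self.counts[y].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_relevant(self, text: str) -> float:
        n_docs = sum(self.docs.values())
        logp = {}
        for y in (0, 1):
            total = sum(self.counts[y].values())
            lp = math.log((self.docs[y] + 1) / (n_docs + 2))
            for tok in tokenize(text):
                lp += math.log((self.counts[y][tok] + 1) / (total + len(self.vocab)))
            logp[y] = lp
        # Normalize the two joint log-probabilities into P(relevant | text).
        m = max(logp.values())
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)

def select_for_review(model, pool, batch_size=2):
    """Uncertainty sampling: indices of pool items with P(relevant) closest to 0.5."""
    return sorted(range(len(pool)),
                  key=lambda i: abs(model.prob_relevant(pool[i]) - 0.5))[:batch_size]

labeled = [
    "Upon a Change of Control, the Lender may terminate this Agreement.",
    "A Change of Control shall mean any person within the meaning of Sections 13(d) and 14(d).",
    "The Tenant shall pay rent monthly.",
    "This Agreement is governed by the laws of New York.",
]
labels = [1, 1, 0, 0]
pool = ["Any change of control requires consent.", "Rent is due monthly.", "Notices shall be in writing."]

model = NaiveBayes().fit(labeled, labels)
print(select_for_review(model, pool, batch_size=1))  # [2]: the model is least sure about the notices clause
```

After the lawyer labels the selected items, they are added to `labeled` and the loop repeats, so labeling effort concentrates where the model is weakest.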
Hacks and Special Cases 
Very useful, but boring
Formatting fixes specific to legal documents:
– ALL CAPS
– handling of amendments
– handwritten signature blocks
Hand-crafted rules are very good for high-precision heuristics: customers expect the software not to miss “easy” provisions.
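As an example of the unglamorous formatting fixes above, here is a hypothetical normalizer for ALL CAPS passages (common in conspicuous disclaimers), so that case-based features such as capitalized defined terms are not swamped by shouted text. The 80% uppercase threshold, the 10-letter minimum, and the sentence-case strategy are assumptions; note that the fix necessarily discards any defined-term capitalization inside the shouted span.

```python
import re

def is_shouting(line: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the line's letters are uppercase."""
    letters = [c for c in line if c.isalpha()]
    if len(letters) < 10:  # too short to judge, e.g. an "ARTICLE I" heading
        return False
    return sum(c.isupper() for c in letters) / len(letters) >= threshold

def to_sentence_case(line: str) -> str:
    """Lowercase the line, then capitalize the first letter of each sentence."""
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(),
                  line.lower())

def normalize_caps(text: str) -> str:
    return "\n".join(to_sentence_case(line) if is_shouting(line) else line
                     for line in text.splitlines())

sample = ("ARTICLE I\n"
          "THE SOFTWARE IS PROVIDED AS IS. SELLER DISCLAIMS ALL WARRANTIES.\n"
          "Buyer shall pay the Purchase Price.")
print(normalize_caps(sample))
```

Short all-caps headings are deliberately left alone, since they are useful structural cues for the document structure tagger.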
The Audacity of Keywords 
Seemingly reliable keywords aren’t
Likelihood that a candidate phrase is relevant vs. irrelevant:
– “Change [of|in] Control”: 48.4% relevant, 51.6% irrelevant
– “13(d) and 14(d)”: 98.7% relevant, 1.3% irrelevant
A simple keyword-based search with an obvious keyword wouldn’t even get us to 50% precision! Conversely, a human would never have discovered this reliable trigram heuristic.
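The figures above are per-phrase precision measured on labeled candidates. One way such heuristics could be mined automatically (a sketch under assumptions, not eBrevia's procedure): count, for every word trigram, how many labeled sentences containing it were relevant, then rank trigrams by add-one-smoothed precision, keeping only those with a minimum support. The toy sentences, labels, and `min_support=2` are made up for illustration.

```python
import re
from collections import defaultdict

def trigrams(sentence: str) -> set[tuple[str, str, str]]:
    toks = re.findall(r"[a-z0-9()]+", sentence.lower())
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}

def rank_trigrams(sentences, labels, min_support=2):
    """Rank word trigrams by smoothed precision (relevant + 1) / (total + 2),
    keeping only trigrams that occur in at least `min_support` labeled sentences."""
    relevant, total = defaultdict(int), defaultdict(int)
    for sentence, y in zip(sentences, labels):
        for g in trigrams(sentence):  # a set, so each trigram counts once per sentence
            total[g] += 1
            relevant[g] += y
    scored = [((relevant[g] + 1) / (total[g] + 2), " ".join(g), total[g])
              for g in total if total[g] >= min_support]
    return sorted(scored, reverse=True)

sentences = [
    "Upon a change of control as defined by Sections 13(d) and 14(d), the Lender may accelerate.",
    "Any group within the meaning of Sections 13(d) and 14(d) acquiring 50% triggers termination.",
    "The heading Change of Control is for convenience only.",
    "A change of control of a subsidiary is not a breach.",
]
labels = [1, 1, 0, 0]  # toy relevance labels
for score, gram, n in rank_trigrams(sentences, labels):
    print(f"{score:.2f}  {gram!r}  (seen in {n} sentences)")
```

Even on this toy data, "13(d) and 14(d)" outranks the "obvious" phrase "change of control", mirroring the pattern in the table.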
The Tyranny of Paper 
Lawyers still have a lot of paper: over 50% of the documents uploaded to our system are scans.
OCR on poor-quality scans works poorly for keyword searching, but decently with ML, given properly constructed features.
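One reason properly constructed features tolerate noisy OCR where keyword search does not: character n-gram features degrade gracefully. In this sketch a single OCR error ("rn" read for "m", a classic confusion) breaks the exact keyword match, yet most character trigrams still overlap. The particular misreading, the trigram size, and the Dice coefficient as the overlap measure are illustrative choices; no imports are needed.

```python
def char_ngrams(word: str, n: int = 3) -> set[str]:
    """Character n-grams of a word, with '#' marking the word boundaries."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two n-gram sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

clean, ocr = "indemnify", "indernnify"  # OCR misread "m" as "rn"
print(clean == ocr)                                          # False: the keyword match is lost
print(sorted(char_ngrams(clean) & char_ngrams(ocr)))         # ['#in', 'fy#', 'ify', 'ind', 'nde', 'nif']
print(round(dice(char_ngrams(clean), char_ngrams(ocr)), 2))  # 0.63
```

A classifier that sees these n-grams as features still receives most of the evidence it would get from a clean scan, whereas an exact keyword lookup receives none.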
Welcoming our Robot Lawyer Overlords 
“[eBrevia’s software] cuts down significantly on time by performing 50-60% of the work up front and then you work from there.”
– NY law firm partner
“Your product is a great fit for our firm’s approach to practicing law.”
– Partner, national law firm
User Interface Notes 
– Highlight in original, formatted document
– Cross-referencing, editing, and corrections
Additional critical features:
– Quick correction
– Level-of-confidence indications (similar to Google Voice transcription)
– Good generic text search features to make human review easy
Current Research and Future Directions 
Coreference resolution, both intra- and inter-document: useful for document references and entity references.
Machine learning for document cross-referencing and definition resolution.
Automatic summarization of longer provisions to provide quick overviews.
Understanding the lineage of a document: where its various pieces came from, and how they were changed.
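Definition resolution, and the "deep chains of defined term references" listed under "Why this is Hard", can be illustrated with a small rule-based baseline of the kind machine learning would aim to generalize: collect definitions of the form `"Term" means ...` and inline defined terms recursively, stopping on cycles and at a depth limit. The single definition pattern, straight double quotes, and bracketed inlining are simplifying assumptions; real contracts also define terms parenthetically and by reference to other agreements.

```python
import re

# Hypothetical single pattern: '"Term" means <body>.' with straight double quotes.
DEF_RE = re.compile(r'"(?P<term>[A-Z][\w ]*)" means (?P<body>[^.]+)\.')

def extract_definitions(text: str) -> dict[str, str]:
    return {m["term"]: m["body"].strip() for m in DEF_RE.finditer(text)}

def expand(term: str, defs: dict[str, str], depth: int = 0,
           max_depth: int = 5, stack: tuple[str, ...] = ()) -> str:
    """Recursively inline defined terms used inside a definition.

    Stops on undefined terms, on cycles (term already on the stack) and at max_depth.
    """
    if term not in defs or term in stack or depth >= max_depth:
        return term
    # Longest terms first, so a longer term wins over any shorter term it contains.
    names = sorted(defs, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")

    def inline(m: re.Match) -> str:
        inner = expand(m.group(1), defs, depth + 1, max_depth, stack + (term,))
        return inner if inner == m.group(1) else f"[{inner}]"

    return pattern.sub(inline, defs[term])

text = ('"Change of Control" means any Person acquiring Voting Stock. '
        '"Person" means any individual or entity. '
        '"Voting Stock" means shares entitled to vote for the Board.')
defs = extract_definitions(text)
print(expand("Change of Control", defs))
# any [any individual or entity] acquiring [shares entitled to vote for the Board]
```

The cycle guard matters in practice: mutually referencing definitions would otherwise recurse forever.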
Feedback Learning from Lawyers 
Some lawyers are just bad.
Noise is NOT random:
– They fall for the same “trap”
– They’re often bad in the same way
So we can’t rely on standard noise-tolerant learning algorithms, which typically assume the noise is random, to deal with this.
Instead: consensus models, and modeling user reputation/ability.
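A consensus model with user reputation can be sketched as an iterative weighted vote: weight each reviewer's label by the log-odds of their estimated accuracy, recompute the consensus, then re-estimate each reviewer's accuracy from agreement with it. This is a simplified, one-parameter-per-reviewer variant of the Dawid-Skene idea, offered as an illustration rather than eBrevia's method; the 0.7 starting accuracy and the (2, 1) pseudo-count prior are assumptions. It can down-weight consistently unreliable reviewers, but a trap that most reviewers fall into will still win the vote.

```python
import math
from collections import defaultdict

def weighted_consensus(votes, n_iter=10, prior=(2, 1)):
    """Iterative reputation-weighted voting over binary labels.

    votes: {item: {reviewer: 0 or 1}}.  prior: pseudo-counts (agree, disagree)
    that keep accuracy estimates strictly between 0 and 1.
    Returns (consensus {item: 0/1}, accuracy {reviewer: float}).
    """
    reviewers = {r for v in votes.values() for r in v}
    acc = {r: 0.7 for r in reviewers}  # hypothetical starting guess
    consensus = {}
    for _ in range(n_iter):
        # 1) Each vote counts with weight log(acc / (1 - acc)), signed by the label.
        for item, v in votes.items():
            score = sum((1 if y == 1 else -1) * math.log(acc[r] / (1 - acc[r]))
                        for r, y in v.items())
            consensus[item] = 1 if score > 0 else 0
        # 2) Re-estimate each reviewer's accuracy as smoothed agreement with the consensus.
        agree, total = defaultdict(int), defaultdict(int)
        for item, v in votes.items():
            for r, y in v.items():
                total[r] += 1
                agree[r] += int(y == consensus[item])
        acc = {r: (agree[r] + prior[0]) / (total[r] + prior[0] + prior[1])
               for r in reviewers}
    return consensus, acc

votes = {
    "s1": {"ann": 1, "bob": 1, "carl": 0},
    "s2": {"ann": 0, "bob": 0, "carl": 1},
    "s3": {"ann": 1, "bob": 1, "carl": 1},
    "s4": {"ann": 1, "bob": 0, "carl": 0},
}
consensus, acc = weighted_consensus(votes)
print(consensus)                                          # {'s1': 1, 's2': 0, 's3': 1, 's4': 0}
print({r: round(a, 2) for r, a in sorted(acc.items())})  # {'ann': 0.71, 'bob': 0.86, 'carl': 0.57}
```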
Current Research and Future Directions 
Other upcoming applications for eBrevia’s technology:
– Contract management
– Document drafting
– Lease abstraction
– Financial/Compliance
– Consumer applications
Thank You – Contact Info 
jmundt@ebrevia.com | (203) 870-3000

 

Jacob Mundt – Chief Technology Officer, eBrevia at MLconf ATL

  • 1. I, Robot, Esquire Information Extraction and Summarization in Legal Documents jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential Jacob Mundt – MLConf ATL
  • 2. Who we are Commercializing machine learning technology developed at Columbia University to make legal document review more efficient, accurate and cost effective. One of four national winners in Startup America DEMO Competition One of CIO.com’s top ten enterprise products at DEMO Fall 2012 Most Promising Software Product of the Year award from Connecticut Technology Council jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential Completed Connecticut Innovations’ TechStart Fund Program 2
  • 3. Management Team Large law firm experience; tech startup experience; sales & business development experience Harvard Law jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential Led R&D team at tech company extracting data in medical industry Columbia Masters; NLP researcher Founder of Ivy Link (20+ staff); Chief of Staff of 350-person real estate private equity firm Harvard Law; law firm & in-house experience Ned Gannon CEO Adam Nguyen COO Jake Mundt CTO 3
  • 4. The Future of Law “In contrast, in looking 25 years ahead from now, I argue that it would be absurd to expect lawyers and courts to carry on operating as they do now.” —Richard Susskind, Tomorrow’s lawyers jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 4 “Well, if droids could think, there'd be none of us here, would there?” — Obi-Wan Kenobi
  • 5. I, Robot, Esquire - Overview Motivation Can we use ML and NLP? eBrevia Solution – Deep Dive Challenges and Lessons Learned Future directions jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 5
  • 6. Corporate Mergers and Due Diligence jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 6 Business due diligence Legal due diligence Closing
  • 7. Corporate mergers and due diligence jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 7
  • 8. Legal Due Diligence Process Extract Summarize Analyze Advise Teams of junior attorneys billed out at $300-$500/hour poring over hundreds of contracts in virtual data rooms to summarize their content and identify red flags. jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 8
  • 9. Legal Due Diligence Summary Here come the spreadsheets – summarize ALL the contracts: – leases – executive employment agreements – supplier agreements – Loan/credit agreements Extract key data points Also extract any clauses that discuss particular provisions jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 9
  • 10. The Stone Age On site data room with reams of documents, organized by seller Buyer’s agents travel to evaluate the target, under constant supervision jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 10
  • 11. State of the Art – Virtual Data Rooms jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 11 Digitized, but not machine readable Some simple OCR and searching capability Commercial systems like IntraLinks have advanced capabilities, but mostly focused on security and auditability.
  • 12. The Future is Here Misses stems, synonyms, plural forms False positives—some common words also have special meanings in context. Impossible to find dates, parties, dollar amounts, or any other generic quantities We can do better jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 12
  • 13. I, Robot, Esquire - Overview Motivation Can we use ML and NLP? eBrevia Solution – Deep Dive Challenges and Lessons Learned Future directions jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 13
  • 14. Can we use ML and NLP? Actually many sub-problems: Classify entire document type— discover contracts amongst heterogeneous corpus Duplicate detection Group documents that were based on a common form agreement Automatically flagging questionable docs for further review Automatic provision extraction jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 14
  • 15. Why this is Easy jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 15 Precise, formal writing Extremely structured Lots of clause reuse
  • 16. Why this is Hard jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 16 Precise, formal writing Extremely structured Lots of clause reuse Obfuscation High demands on recall Deep chains of defined term references
  • 17. Detecting “Evil” Clauses? Lawyers actually prefer to make the calls on exactly what to include, and how to advise the client Just find the source material, and let the lawyer decide. Determine relevance, don’t make value judgments “Learning to detect spyware using end user license agreements”, Lavesson, et al. (2009) jmundt@ebrevia.com | (203) 870-3000 Proprietary & Confidential 17 Illustration of Saint Wolfgang and the Devil with the Devil's Contract, by Michael Pacher.
  • 18. I, Robot, Esquire - Overview: Motivation. Can we use ML and NLP? eBrevia Solution – Deep Dive. Challenges and Lessons Learned. Future directions.
  • 19. eBrevia’s Approach: Not all provisions are the same! Approaches: topic modeling, information extraction (IE), and a rule-based approach. Example tasks: find sentences discussing “change of control”; find restrictions concerning confidential information; fill the slots in “The contract runs from TIMEX to TIMEX.” and “The monthly rent will start at $X, and increase by no more than Y% annually.”; find every borrower’s FICO score.
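The IE examples above are template-like sentences with slots (TIMEX presumably marks a temporal expression, as in TimeML-style tagging, and $X and Y% mark amounts). A hand-written regular-expression version of slot filling for just those two templates might look like the following. It is a simplified stand-in, assuming dates written like "January 1, 2015", and not how eBrevia's extractor works; a production system would use a real temporal tagger and learned candidate detection rather than fixed wording.

```python
import re

# Stand-in for a real temporal tagger: only dates like "January 1, 2015".
MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
DATE = rf"{MONTH} \d{{1,2}}, \d{{4}}"

# Template: "The contract runs from TIMEX to TIMEX."
TERM_RE = re.compile(rf"runs from (?P<start>{DATE}) to (?P<end>{DATE})")
# Template: "The monthly rent will start at $X, and increase by no more than Y% annually."
RENT_RE = re.compile(
    r"monthly rent will start at \$(?P<rent>\d{1,3}(?:,\d{3})*(?:\.\d{2})?),?"
    r" and increase by no more than (?P<pct>\d+(?:\.\d+)?)% annually"
)


def fill_slots(sentence: str) -> dict:
    """Return whichever slots the two hypothetical templates can fill."""
    slots = {}
    if m := TERM_RE.search(sentence):
        slots["start"], slots["end"] = m["start"], m["end"]
    if m := RENT_RE.search(sentence):
        slots["monthly_rent"] = float(m["rent"].replace(",", ""))
        slots["max_annual_increase_pct"] = float(m["pct"])
    return slots


print(fill_slots("The contract runs from January 1, 2015 to December 31, 2019."))
print(fill_slots("The monthly rent will start at $4,500, and increase by no more than 3% annually."))
```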
  • 20. Text analysis pipeline (stages as shown in the diagram): OCR; sentence segmentation; NLP processing (POS, NER, parsing); document structure tagging; general candidate detection; rule-based detection; topic classifier; candidate detection for IE; information extraction and slot filling.
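The stages above can be modeled as functions that each enrich a shared document object. The skeleton below is a hypothetical sketch of that structure only: the `Doc` fields and stage functions are illustrative, OCR and the NLP stages (POS, NER, parsing) are omitted, and the regex-based stages are placeholders for trained components.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical document record that each pipeline stage enriches."""
    text: str
    sentences: list[str] = field(default_factory=list)
    candidates: list[str] = field(default_factory=list)
    extractions: list[dict] = field(default_factory=list)


Stage = Callable[[Doc], Doc]


def segment_sentences(doc: Doc) -> Doc:
    # Naive splitter on ., ! or ? followed by whitespace; legal text
    # (abbreviations, section numbers) needs a much more careful segmenter.
    doc.sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc.text.strip()) if s]
    return doc


def detect_candidates(doc: Doc) -> Doc:
    # Placeholder high-recall filter: keep sentences mentioning money or a percentage.
    doc.candidates = [s for s in doc.sentences if re.search(r"\$\d|\d%", s)]
    return doc


def extract_amounts(doc: Doc) -> Doc:
    # Placeholder slot filler: pull dollar amounts and percentages from each candidate.
    for s in doc.candidates:
        doc.extractions.append({
            "sentence": s,
            "dollars": re.findall(r"\$\d[\d,]*", s),
            "percents": re.findall(r"\d+(?:\.\d+)?%", s),
        })
    return doc


def run_pipeline(text: str, stages: list[Stage]) -> Doc:
    doc = Doc(text)
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline(
    "This Lease is made between Acme and Beta. The rent is $4,500 per month. "
    "Rent increases by 3% each year.",
    [segment_sentences, detect_candidates, extract_amounts],
)
for row in doc.extractions:
    print(row)
```

Keeping every stage behind the same `Doc -> Doc` signature makes it straightforward to swap a rule-based stage for a trained classifier without touching the rest of the pipeline.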
  • 21. Classifier Features: Basic textual analysis features: words; n-grams; positional and morphological features; named entities. Syntactic features: parts of speech; parse tree and heads. Structural features: a first-level classifier pass for determining document structure, which is especially important on scanned documents, where these features aren’t readily available. Slide examples: named-entity tags (The/O buyer/O Acme/ORG Inc./ORG); part-of-speech tags (Client/N shall/V indemnify/V) with “indemnify” as the parse head; document structure (Section III: Miscellaneous / 1. Lorem ipsum dolor / a. sit amet, consectetur).
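To make the feature types concrete, here is a hypothetical extractor that turns one sentence into a sparse feature dictionary covering words, n-grams, positional and orthographic features (including ALL CAPS), and a simple structural feature taken from the enclosing section heading. The feature names are invented for illustration. Named-entity, part-of-speech, and parse features would come from an NLP toolkit and are left out. Dictionaries like this can be vectorized (for example with scikit-learn's `DictVectorizer`) and fed to a linear classifier.

```python
import re


def sentence_features(sentence: str, index_in_section: int, heading: str = "") -> dict:
    """Sparse feature dict for one sentence (hypothetical feature names)."""
    tokens = re.findall(r"[A-Za-z0-9$%]+", sentence)
    lower = [t.lower() for t in tokens]
    feats = {}
    # Basic textual features: words and word bigrams
    for t in lower:
        feats[f"w={t}"] = 1.0
    for a, b in zip(lower, lower[1:]):
        feats[f"bi={a}_{b}"] = 1.0
    # Positional features
    feats["index_in_section"] = float(index_in_section)
    feats["first_in_section"] = float(index_in_section == 0)
    # Morphological / orthographic features
    feats["all_caps_ratio"] = sum(t.isupper() for t in tokens) / len(tokens) if tokens else 0.0
    feats["has_dollar"] = float("$" in sentence)
    feats["has_digit"] = float(any(c.isdigit() for c in sentence))
    # Structural feature: words from the enclosing section heading
    for w in re.findall(r"[a-z]+", heading.lower()):
        feats[f"heading={w}"] = 1.0
    return feats


f = sentence_features("The Buyer shall indemnify the Seller.", 0, "Section 8: Indemnification")
print(sorted(k for k in f if k.startswith("bi=")))
```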
  • 22. Hunting for Training Data: All of your customers’ data is confidential, so use redacted contracts and mine SEC filings. Lawyer-labeled training data is expensive, so use bootstrapping, co-training with different feature sets, and active learning.
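Active learning reduces the number of expensive lawyer labels by asking for labels where the current model is least sure. Below is a minimal sketch of uncertainty sampling, one common active-learning strategy. The tiny Naive Bayes model is a stand-in chosen only to keep the example self-contained (it is not eBrevia's classifier), and the sentences are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Two-class multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def prob_relevant(self, text: str) -> float:
        n_docs = sum(self.doc_counts.values())
        log_post = {}
        for label in (0, 1):
            total = sum(self.word_counts[label].values())
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for w in tokenize(text):
                score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            log_post[label] = score
        # Normalize in log space to avoid underflow on long sentences
        top = max(log_post.values())
        p0, p1 = (math.exp(log_post[k] - top) for k in (0, 1))
        return p1 / (p0 + p1)


def most_uncertain(model, pool, k: int) -> list:
    """Uncertainty sampling: the k pool sentences with P(relevant) closest to 0.5."""
    return sorted(pool, key=lambda s: abs(model.prob_relevant(s) - 0.5))[:k]


labeled = ["Upon a change of control, the Company may terminate this Agreement.",
           "Rent is payable monthly in advance."]
model = TinyNaiveBayes().fit(labeled, [1, 0])
pool = ["Any change of control requires the Lender's consent.",
        "Rent is due on the first day of each month.",
        "This Agreement is governed by the laws of New York."]
print(most_uncertain(model, pool, k=1))  # the sentence to send to a lawyer next
```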
  • 23. Hacks and Special Cases: Very useful, but boring. Formatting fixes specific to legal documents: ALL CAPS text, handling of amendments, handwritten signature blocks. Hand-crafted rules are very good for high-precision heuristics; customers expect the software not to miss “easy” provisions.
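As an example of an ALL CAPS fix, a heuristic might sentence-case lines that are mostly capital letters so that downstream case-sensitive steps (such as named-entity or defined-term detection) see more typical text. The function below is a hypothetical sketch; the 0.8 threshold and the `keep` set of tokens to preserve (such as acronyms) are assumptions, and lowercasing does discard some capitalization cues, for instance that "SELLER" was a defined term.

```python
def is_mostly_caps(line: str, threshold: float = 0.8) -> bool:
    """True if at least `threshold` of the line's letters are uppercase."""
    letters = [c for c in line if c.isalpha()]
    return bool(letters) and sum(c.isupper() for c in letters) / len(letters) >= threshold


def normalize_caps(line: str, keep: frozenset = frozenset()) -> str:
    """Sentence-case a mostly ALL CAPS line, leaving tokens in `keep` untouched."""
    if not is_mostly_caps(line):
        return line
    out = []
    for i, token in enumerate(line.split(" ")):
        if token in keep:
            out.append(token)
        elif i == 0:
            out.append(token.capitalize())
        else:
            out.append(token.lower())
    return " ".join(out)


print(normalize_caps("THE SELLER MAKES NO WARRANTY OF MERCHANTABILITY."))
# → The seller makes no warranty of merchantability.
print(normalize_caps("SEE SECTION 8 OF THE AGREEMENT WITH IBM", keep=frozenset({"IBM"})))
# → See section 8 of the agreement with IBM
```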
  • 24. I, Robot, Esquire - Overview: Motivation. Can we use ML and NLP? eBrevia Solution – Deep Dive. Challenges and Lessons Learned. Future directions.
  • 25. The Audacity of Keywords: Seemingly reliable keywords aren’t. Likelihood that a candidate phrase is relevant vs. irrelevant: “Change [of|in] Control”: 48.4% relevant, 51.6% irrelevant. “13(d) and 14(d)”: 98.7% relevant, 1.3% irrelevant. A simple keyword-based search with an obvious keyword wouldn’t even get us to 50% precision! Conversely, a human would never have discovered this reliable trigram heuristic.
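Precision figures like these come from labeled candidates: for each phrase, count how many matching sentences a reviewer marked relevant. The sketch below shows that measurement on three hand-made sentences (hypothetical data, not the figures above); the second sentence is a false positive where "change in control" refers to control procedures. The 13(d)/14(d) phrase is presumably reliable because change-of-control definitions often cite those Securities Exchange Act sections.

```python
import re

# Candidate-phrase patterns; [of|in] on the slide means "of" or "in".
PATTERNS = {
    "change of/in control": re.compile(r"\bchange (?:of|in) control\b", re.IGNORECASE),
    "13(d) and 14(d)": re.compile(r"13\(d\) and 14\(d\)"),
}


def phrase_precision(labeled, pattern):
    """Share of matching sentences that a reviewer labeled relevant; None if nothing matches."""
    hits = [relevant for sentence, relevant in labeled if pattern.search(sentence)]
    return sum(hits) / len(hits) if hits else None


# Toy, hand-made labels for illustration only (not eBrevia data)
LABELED = [
    ("Upon a Change of Control, the Company may terminate this Agreement.", True),
    ("No change in control procedures of the Borrower are required.", False),
    ("Any person or group (within the meaning of Sections 13(d) and 14(d) "
     "of the Exchange Act) acquires 50% of the voting stock.", True),
]
for name, pattern in PATTERNS.items():
    print(name, phrase_precision(LABELED, pattern))
```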
  • 26. The Tyranny of Paper: Lawyers still have a lot of paper; over 50% of the documents uploaded to our system are scans. OCR output from poor-quality scans works poorly for keyword searching but decently with ML, given properly constructed features.
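One example of a feature that holds up on noisy OCR is character n-grams: an OCR error such as "rn" read for "m" breaks an exact keyword match but leaves many of the word's character trigrams intact. The toy matcher below illustrates that overlap. The function names and the 0.4 threshold are illustrative assumptions; in a classifier, the shared n-grams would simply fire as features rather than pass a hard threshold.

```python
def char_ngrams(token: str, n: int = 3) -> set:
    """Character n-grams of a token, padded with '#' to mark word boundaries."""
    padded = f"#{token.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def fuzzy_contains(text: str, keyword: str, threshold: float = 0.4) -> bool:
    """True if any token's n-gram Jaccard similarity to `keyword` reaches `threshold`."""
    target = char_ngrams(keyword)
    for token in text.split():
        grams = char_ngrams(token.strip(".,;:()\"'"))
        if len(grams & target) / len(grams | target) >= threshold:
            return True
    return False


ocr_text = "The Seller shall indernnify the Buyer."  # "m" misread as "rn"
print("indemnify" in ocr_text.lower())       # exact keyword match fails: False
print(fuzzy_contains(ocr_text, "indemnify"))  # n-gram overlap survives: True
```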
  • 27. Welcoming our Robot Lawyer Overlords: “[eBrevia’s software] cuts down significantly on time by performing 50-60% of the work up front and then you work from there.” (NY law firm partner). “Your product is a great fit for our firm’s approach to practicing law.” (Partner, national law firm).
  • 28. User Interface Notes
  • 29. User Interface Notes: Highlight in the original, formatted document. Cross-referencing, editing, and corrections.
  • 30. User Interface Notes: Additional critical features: quick correction; level-of-confidence indications (similar to Google voice transcription); good generic text search features to make human review easy.
  • 31. I, Robot, Esquire - Overview: Motivation. Can we use ML and NLP? eBrevia Solution – Deep Dive. Challenges and Lessons Learned. Future directions.
  • 32. Current Research and Future Directions: Coreference resolution, both intra- and inter-document, useful for document references and entity references. Machine learning for document cross-referencing and definition resolution. Automatic summarization of longer provisions to provide quick overviews. Understanding the lineage of a document: where its various pieces came from and how they were changed.
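Document lineage is listed as a research direction; one simple building block is a word-level diff between a clause and the form it was presumably derived from. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function name and example clauses are hypothetical, and tracing where pieces came from across many documents would need more than pairwise diffs.

```python
import difflib


def clause_changes(form: str, executed: str) -> list:
    """Word-level differences between a form clause and an executed version.
    Returns (operation, form_words, executed_words) for every non-equal span."""
    a, b = form.split(), executed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


form = "The Tenant shall not assign this Lease."
executed = "The Tenant shall not sublet or assign this Lease."
print(clause_changes(form, executed))  # → [('insert', '', 'sublet or')]
```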
  • 33. Feedback Learning from Lawyers: Some lawyers are just bad, and their noise is NOT random: they fall for the same “trap” and are often bad in the same way. So we can’t use noise-tolerant learning algorithms, which typically assume random label noise, to deal with this. Instead, use consensus models that model user reputation/ability.
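A consensus model that tracks user reliability can be sketched as an iterative weighted vote, a simplified Dawid-Skene-style scheme with one accuracy estimate per reviewer. This is an illustrative assumption, not eBrevia's model: each vote is weighted by the log-odds of the reviewer's estimated accuracy, and accuracies are re-estimated against the consensus. Reviewers who are usually outvoted lose weight, which can overturn a "trap" item where they happen to form the majority. It does not model item-specific correlated errors, which would need a richer model.

```python
import math
from collections import defaultdict


def weighted_consensus(votes, iterations: int = 10, prior_accuracy: float = 0.8):
    """votes: (item, annotator, label) triples with label in {0, 1}.
    Alternates between (1) a log-odds-weighted vote per item and
    (2) re-estimating each annotator's accuracy against that consensus."""
    annotators = {a for _, a, _ in votes}
    items = {i for i, _, _ in votes}
    accuracy = {a: prior_accuracy for a in annotators}
    consensus = {}
    for _ in range(iterations):
        score = defaultdict(float)
        for item, ann, label in votes:
            weight = math.log(accuracy[ann] / (1 - accuracy[ann]))
            score[item] += weight if label == 1 else -weight
        consensus = {i: int(score[i] > 0) for i in items}
        # Laplace-smoothed agreement rate keeps accuracy strictly inside (0, 1)
        agree = defaultdict(lambda: [1, 2])
        for item, ann, label in votes:
            agree[ann][0] += int(label == consensus[item])
            agree[ann][1] += 1
        accuracy = {a: agree[a][0] / agree[a][1] for a in annotators}
    return consensus, accuracy


# Hypothetical votes: g1-g3 are careful reviewers, b1 and b2 are consistently wrong
truth = {"c1": 1, "c2": 0, "c3": 1, "c4": 0}
votes = []
for item, y in truth.items():
    votes += [(item, g, y) for g in ("g1", "g2", "g3")]
    votes += [(item, b, 1 - y) for b in ("b1", "b2")]
# Trap item reviewed only by g1, b1, b2: a plain majority vote would get it wrong
votes += [("trap", "g1", 1), ("trap", "b1", 0), ("trap", "b2", 0)]

consensus, accuracy = weighted_consensus(votes)
print(consensus["trap"], round(accuracy["b1"], 2))  # → 1 0.14
```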
  • 34. Current Research and Future Directions: Other upcoming applications for eBrevia’s technology: contract management; document drafting; lease abstraction; financial/compliance; consumer applications.
  • 35. Thank You – Contact Info: jmundt@ebrevia.com | (203) 870-3000