Near duplicate identification, or ‘NearDup’, is a critically important eDiscovery function that can drastically increase the speed and quality of your review by grouping similar documents, maintaining email threads, retrieving unmarked ‘hot’ documents, and preventing the inadvertent release of critical privileged information. As document collections continue to grow, so does the risk of missing key documents, inconsistently coding productions, and releasing privileged data.
http://www.lexbe.com/resources/ediscovery-webinars/best-practices-neardup/?LEX=slideshare
For a complete listing of our free on-demand presentations, see the Lexbe eDiscovery Library: http://lexbe.com/resources/ediscovery-webinars/?LEX=slideshare
The Lexbe eDiscovery Library is an educational resource of webinars, presentations, MP3 podcasts and other materials covering practical subjects in legal document management and eDiscovery. Access is free with registration for lawyers, litigation paralegals and legal assistants, in-house counsel, litigators, litigation support and eDiscovery IT staff, and related professionals involved in legal document management, electronic discovery, and deposition or trial preparation.
To receive notices of future live and on-demand webinars in the Lexbe eDiscovery Webinar Series, please email us at webinars@lexbe.com or follow us on LinkedIn: https://www.linkedin.com/company/lexbe
Lexbe eDiscovery Webinar - Best Practices: NearDup
1. Best Practices: NearDup
Gene Albert
Principal, Lexbe LC
Using Near Duplicate ID to Detect Key Docs, Protect Privilege & Speed Reviews
July 17, 2014
2. eDiscovery Webinar Series
○ Takes Place Monthly
○ Covers a Variety of Relevant eDiscovery Topics
○ Presentations Available for Download by Registrants
Best Practices: ‘NearDup’ Identification | eDiscovery Webinar Series | July 17, 2014
Info
3. eDiscovery Webinar Series
Lexbe is an Austin, TX-based eDiscovery software and services provider.
○ Lexbe eDiscovery Platform
Lexbe eDiscovery Platform is a hosted eDiscovery processing and review
tool. Users can load a variety of file types, process them for review, OCR
them for search, conduct document reviews and productions, prepare for
depositions and analyze transcripts, run case analytics, prepare dispositive
motions, and provide litigation support during trial.
○ Lexbe eDiscovery Services
Lexbe performs large-volume document culling, processing from native
to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26
and project management consulting, and related eDiscovery services.
About Lexbe
Lexbe Sales
sales@lexbe.com
(800) 401-7809 x22
4. If you have any questions or technical issues, please e-mail them to:
webinars@lexbe.com
Questions will be forwarded to Gene and answered during the webinar or via
e-mail if we run out of time.
eDiscovery Webinar Series
Questions & Technical Issues
5. ○ Principal of Lexbe LC, a provider of cloud-based litigation review and document
management software & eDiscovery services.
○ Prior business experience in software, medical services and internet-based
businesses. Prior legal experience as in-house counsel and in private practice.
○ Frequent speaker and author on eDiscovery and legal technology issues.
○ Education
MBA, University of Texas (2005)
JD, Southern Methodist University (1983)
BA, University of Texas (1979)
○ Contact
Gene Albert
512-686-3460
gene@lexbe.com
eDiscovery Webinar Series
Gene Albert Bio
6. Near Duplicate Detection
○ What is Near Duplicate Identification?
○ When is ‘NearDup’ Needed?
○ Inadvertent Privilege Release Example
○ Using ‘NearDup’ to:
■ Group Similar Documents
■ Find More Key Documents
■ Enable Email Threading
■ Prevent the Inadvertent Release of Privileged Information
○ NearDup Groupings+ service options from Lexbe
Agenda
7. What Is It?
○ NearDup technology automatically recognizes similar
documents within an eDiscovery document collection
○ The algorithm analyzes, evaluates and compares the
actual text content of the documents against one another
Near Duplicate Detection
[Diagram: unstructured documents sorted into NearDup Groupings]
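The slide does not say which algorithm Lexbe uses. As a hedged illustration only, one common textbook approach compares documents by their overlapping word sequences ("shingles") and scores each pair with Jaccard similarity: shared shingles divided by all distinct shingles. The shingle length `k`, the lowercase/punctuation normalization, and the function names below are assumptions for this sketch, not Lexbe's implementation.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    # Lowercase and keep only letters/digits so that reformatting and stray
    # punctuation do not change the comparison.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single (shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of two documents' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)


draft = "The parties agree to settle the claim for ten thousand dollars."
edited = "The parties agree to settle the claim for twelve thousand dollars."
print(similarity(draft, edited, k=3))  # 0.5
```

A one-word edit only changes the shingles that span that word, so the two drafts still share half their 3-word shingles, while an unrelated document scores 0.0.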
8. What Does It Do?
Near Duplicate Detection
NearDup technology groups documents that are similar, even when they
are not exactly the same. Examples include:
○ Separately scanned copies of the same document.
○ Multiple versions of a Word document that are slightly different
due to minor edits, reformatting, etc.
○ An original document and one with handwritten notes on it.
○ Emails and responses that continue a conversational ‘chain’ or
‘thread’.
9. Data Types and Volume Keep Growing
[Chart: Digital Information Created, Captured, Replicated Worldwide, in zettabytes*, 2005 to 2015. Source: IDC Digital Universe Study (2012)]
* 1 zettabyte = 1 trillion gigabytes
2.8 zettabytes of information were created
and replicated during 2012, a 56% increase
from 2011 (IDC)
[Graphic listing data sources: VoIP, email, iPhones, peer-to-peer, online storage, digital cameras, Facebook, LinkedIn, Dropbox, backup devices, elastic storage, SaaS, Google Streets, personal blogs, Skype, world satellite images, personal scanners, customer service recordings, public webcams, Google Goggles, netbooks, cloud instance servers, PaaS]
Need for Near Duplicate Detection
10. Main Applications of NearDup
There are 4 main applications of NearDup analysis:
1) Grouping similar documents:
○ Group highly similar documents together for more efficient coding
and review
2) Finding hidden ‘key’ or ‘hot’ docs:
○ Retrieve and mark unseen documents whose content is highly
related to existing ‘hot’ or ‘key’ documents
3) Preventing the inadvertent release of privileged information:
○ Be alerted automatically to files whose content is similar to
documents already coded as privileged
4) Enabling email threading:
○ Maintain the relationships among messages in an email conversation
Do I Need Near Duplicate Detection?
11. Applying Near Duplicate Detection
Large Groupings Accelerate Review
Feature Description
Report identifies NearDup Groups in
a case based on extracted or OCRed
text
Benefits
○ Accelerate document review by
batch coding larger groups (using
multidoc edit)
○ Increase coding consistency of
batched documents
○ Reduce privilege errors
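To turn pairwise scores into groups that can be batch coded, one simple approach (assumed here, not taken from the slide) links every pair whose similarity meets a threshold and then takes connected components with a union-find structure, so that if A resembles B and B resembles C, all three land in one group. The 0.8 default threshold, the shingle-based Jaccard score, and the `neardup_groups` name are hypothetical. The all-pairs loop is quadratic; large-scale systems typically use MinHash with locality-sensitive hashing to find candidate pairs instead.

```python
import re
from itertools import combinations


def shingles(text, k=5):
    """Set of k-word shingles from lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def neardup_groups(docs, threshold=0.8, k=5):
    """Cluster {doc_id: text} into near-duplicate groups (sorted lists of doc IDs)."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    for a, b in combinations(docs, 2):  # O(n^2): fine for a sketch only
        union = sets[a] | sets[b]
        score = len(sets[a] & sets[b]) / len(union) if union else 1.0
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for doc_id in docs:
        groups.setdefault(find(doc_id), []).append(doc_id)
    return sorted(sorted(g) for g in groups.values())


docs = {
    "DOC-1": "The parties agree to settle the claim for ten thousand dollars.",
    "DOC-2": "The parties agree to settle the claim for twelve thousand dollars.",
    "DOC-3": "Quarterly sales figures attached.",
}
print(neardup_groups(docs, threshold=0.5, k=3))  # [['DOC-1', 'DOC-2'], ['DOC-3']]
```

Raising the threshold yields smaller, tighter groups; lowering it batches more loosely related documents together.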
12. Applying Near Duplicate Detection
Find Similar Versions of Key Documents
Example
Similar versions of a Key
Document are shown in
the Document Viewer
Benefits
○ Follow the trail from one key document to others.
○ Find key documents that would otherwise be missed.
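Following the trail from a key document can be sketched as ranking every other document by its similarity to that key document and keeping those above a cutoff. This sketch assumes shingle-based Jaccard scoring; the `similar_to_key` name and the 0.5 cutoff are illustrative, not the logic behind Lexbe's Document Viewer.

```python
import re


def shingles(text, k=5):
    """Set of k-word shingles from lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similar_to_key(key_text, docs, cutoff=0.5, k=5):
    """Return (doc_id, score) pairs resembling the key document, best match first."""
    key = shingles(key_text, k)
    hits = []
    for doc_id, text in docs.items():
        other = shingles(text, k)
        union = key | other
        score = len(key & other) / len(union) if union else 1.0
        if score >= cutoff:
            hits.append((doc_id, score))
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


key_doc = "The parties agree to settle the claim for ten thousand dollars."
collection = {
    "DOC-2": "The parties agree to settle the claim for twelve thousand dollars.",
    "DOC-3": "Quarterly sales figures attached.",
    "DOC-4": "THE PARTIES AGREE TO SETTLE THE CLAIM FOR TEN THOUSAND DOLLARS",
}
print(similar_to_key(key_doc, collection, k=3))  # [('DOC-4', 1.0), ('DOC-2', 0.5)]
```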
13. Prevent Inadvertent Privilege Release
[Diagram: eDiscovery stages: Setup & Planning, Collection, Culling & Analysis, Processing, Depos & Motions, Review & Production]
Beware of Inadvertent Privilege Release
○ Larger cases have put a strain on accurate privilege review.
○ Finding 9 versions of a privileged document doesn’t help if you
release version 10.
○ Nothing is more costly than compromising or losing a case
because of privilege disclosure.
○ Clawback agreements are a good idea, but no panacea.
“You can’t unring a bell.”
Applying Near Duplicate Analysis
14. Prevent Inadvertent Privilege Release
Applying Near Duplicate Analysis
Example Case:
Thorncreek Apartments III, LLC v. Village of Park Forest (N.D. Ill. 2011)
○ At issue were six documents that Defendants produced to
Plaintiffs and over which they claimed attorney-client privilege
○ The court determined that Defendants were negligent in failing
to check the production database created by a third-party
eDiscovery vendor before it became available to opposing counsel
○ The court found waiver, relying in part on the long delay between
production and the attempt to claw back the documents, and on the
failure to prepare a privilege log in a timely manner.
○ Even if the court had allowed clawback, the sensitive information
would already have been disseminated.
15. Prevent Inadvertent Privilege Release
[Diagram: eDiscovery stages: Setup & Planning, Collection, Culling & Analysis, Processing, Depos & Motions, Review & Production]
Minimizing Risk of Privilege Release
○ Understand the privilege review process in detail.
○ Build a dictionary of privileged sources and issues early in doc review.
○ Check for: untrained or sloppy review; unsearchable documents;
incomplete search indices; poor redaction procedures; search not done
in metadata and full-text; privilege text retained in natives, text files,
load files, text-based PDFs.
○ Use specialized computerized privilege checks for container
(email family) consistency, exact-dup and near-dup
identification.
Applying Near Duplicate Analysis
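Two of the computerized checks listed above, email family (container) consistency and exact-dup consistency, can be sketched in a few lines: documents are grouped either by family or by a hash of their normalized text, and any group whose members carry different privilege codes is flagged for a reviewer to confirm. The record fields (`id`, `family`, `text`, `privileged`) and the helper names are hypothetical; a real platform would read them from its own review database.

```python
import hashlib
from collections import defaultdict


def text_hash(text):
    """SHA-256 of case- and whitespace-normalized text, used as an exact-dup key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def mixed_coding(records, group_key):
    """Return {group: sorted doc IDs} for groups whose members disagree on privilege."""
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record)
    return {
        group: sorted(r["id"] for r in members)
        for group, members in groups.items()
        if len({r["privileged"] for r in members}) > 1
    }


records = [
    {"id": "EM-1", "family": "F1", "text": "Advice from counsel", "privileged": True},
    {"id": "EM-1.1", "family": "F1", "text": "Draft memo", "privileged": False},
    {"id": "EM-2", "family": "F2", "text": "advice  FROM counsel", "privileged": False},
]
# Family consistency: EM-1 and its attachment EM-1.1 are coded differently.
print(mixed_coding(records, lambda r: r["family"]))  # {'F1': ['EM-1', 'EM-1.1']}
# Exact-dup consistency: EM-2 duplicates EM-1 but is not coded privileged.
print(list(mixed_coding(records, lambda r: text_hash(r["text"])).values()))  # [['EM-1', 'EM-2']]
```

A flag is a prompt for review, not an automatic correction: an attachment may legitimately be non-privileged even when its parent email is privileged.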
16. Prevent Inadvertent Privilege Release
Example
○ Privileged documents
found 9 out of 10 times,
but one missed
Benefit
○ Find privileged documents with text similarity
that can easily be missed otherwise
Applying Near Duplicate Detection
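The "9 out of 10" scenario can be sketched as comparing every document not yet coded privileged against those that are, and flagging close matches for a second look. The `missed_privilege` name, the `(text, is_privileged)` input shape, and the 0.5 cutoff with 3-word shingles are assumptions chosen to keep the example small.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles from lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def missed_privilege(docs, cutoff=0.5, k=3):
    """docs: {doc_id: (text, is_privileged)}.

    Return sorted (doc_id, privileged_match_id, score) tuples for documents
    not coded privileged that closely resemble one that is.
    """
    privileged = {d: shingles(t, k) for d, (t, is_priv) in docs.items() if is_priv}
    flags = []
    for doc_id, (text, is_priv) in docs.items():
        if is_priv:
            continue
        candidate = shingles(text, k)
        for priv_id, priv_set in privileged.items():
            union = candidate | priv_set
            score = len(candidate & priv_set) / len(union) if union else 1.0
            if score >= cutoff:
                flags.append((doc_id, priv_id, score))
    return sorted(flags)


docs = {
    "V1": ("Counsel advises that we settle the claim for ten thousand dollars.", True),
    "V10": ("Counsel advises that we settle the claim for twelve thousand dollars.", False),
    "X": ("Quarterly sales figures attached.", False),
}
print(missed_privilege(docs))  # [('V10', 'V1', 0.5)]
```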
17. Applying Near Duplicate Detection
Catch Privilege Inconsistencies
Feature Description
Report identifies inconsistent privilege
and work product codings
Benefits
○ Reduce privilege errors
○ Avoid sole reliance on human
coding consistency
○ Establish safeguards to help
maintain privilege
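A report of inconsistent codings might, assuming each document already carries a NearDup group ID, check each coding field separately and list the groups whose members disagree. The field names `group`, `privilege`, and `work_product`, and the code values, are hypothetical stand-ins for whatever a review platform actually stores.

```python
from collections import defaultdict


def inconsistency_report(docs, fields=("privilege", "work_product")):
    """Return (group, field, {value: [doc IDs]}) rows where a group's members disagree."""
    by_group = defaultdict(list)
    for doc in docs:
        by_group[doc["group"]].append(doc)
    rows = []
    for group in sorted(by_group):
        for field in fields:
            values = defaultdict(list)
            for doc in by_group[group]:
                values[doc[field]].append(doc["id"])
            if len(values) > 1:  # more than one distinct code in this group
                rows.append((group, field, dict(values)))
    return rows


docs = [
    {"id": "D1", "group": "ND-7", "privilege": "Privileged", "work_product": "Yes"},
    {"id": "D2", "group": "ND-7", "privilege": "Not Privileged", "work_product": "Yes"},
    {"id": "D3", "group": "ND-8", "privilege": "Not Privileged", "work_product": "No"},
]
for row in inconsistency_report(docs):
    print(row)
# Prints one row:
# ('ND-7', 'privilege', {'Privileged': ['D1'], 'Not Privileged': ['D2']})
```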
18. Applying Near Duplicate Detection
Email Threading
Feature Description
Group email messages that have
similar text representing a
conversation thread
Benefits
○ View email chains with similar text
in date & time order
○ Avoid conflating emails that are only
tangentially related (<50% text
overlap)
○ Consistently code email chains for
responsiveness, privilege,
attorneys’-eyes-only status, etc.
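Text-based threading can be sketched in the same spirit: link two emails when enough of the shorter message's text appears in the other (as when a reply quotes the earlier message), then list each thread in sent-time order. Measuring overlap against the shorter message, using 0.5 to mirror the slide's 50% figure, and the `(message_id, sent, body)` input shape are all assumptions. Threading tools also commonly use header fields such as Message-ID and In-Reply-To, which this text-only sketch ignores.

```python
import re
from datetime import datetime
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word shingles from lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def thread_emails(emails, min_overlap=0.5, k=3):
    """emails: list of (message_id, sent_datetime, body). Return threads of IDs in time order."""
    parent = {msg_id: msg_id for msg_id, _, _ in emails}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {msg_id: shingles(body, k) for msg_id, _, body in emails}
    for (a, _, _), (b, _, _) in combinations(emails, 2):
        smaller = min(len(sets[a]), len(sets[b]))
        # Share of the shorter message's shingles that also appear in the other.
        overlap = len(sets[a] & sets[b]) / smaller if smaller else 0.0
        if overlap >= min_overlap:
            parent[find(a)] = find(b)

    threads = {}
    for msg_id, _, _ in sorted(emails, key=lambda e: e[1]):  # oldest first
        threads.setdefault(find(msg_id), []).append(msg_id)
    return list(threads.values())


emails = [
    ("M2", datetime(2014, 7, 2, 9, 30), "Agreed, ship on Friday. Bob wrote: Can we ship the order on Friday?"),
    ("M1", datetime(2014, 7, 1, 16, 0), "Can we ship the order on Friday?"),
    ("M3", datetime(2014, 7, 3, 8, 0), "Lunch menu for the company picnic"),
]
print(thread_emails(emails))  # [['M1', 'M2'], ['M3']]
```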
19. Included with Lexbe eDiscovery Platform
Applying Near Duplicate Analysis
○ Near Duplicate Identification is included at no
additional cost in Lexbe eDiscovery Platform.
○ You can automatically apply ‘NearDup’ to documents you self-
upload into the platform to group similar documents and
review for privilege coding consistency.
20. Applying ‘NearDup’ in The Cloud
Lexbe eDiscovery Platform
● Self-administration
● Native (Office, etc.) processing
● Automatic OCR
● Early case analysis
● Dual-index search
● Exact & near-dup ID
● Doc Review & issue tagging
● Blended productions
● Transcript management
● Timelining, depo prep
● Dispositive motions
● Trial document management
Cloud-based litigation document
management software
FEATURES
21. Included in Processing Services
Applying Near Duplicate Analysis
We apply NearDup Groupings+ to the following processing services
at no additional charge:
○ Native Processing+ (TIFF)
Convert Outlook, Microsoft Office, and other native file types for
review in in-house TIFF-based systems
○ Native Processing+ (PDF)
Convert Outlook, Microsoft Office, and other native file types
into searchable PDFs for review
○ Native Extraction+
Prepare case data for native or near-native review
22. Security & Data Ownership
What to look for in litigation cloud service offerings:
○ Encryption
Data encrypted (256-bit or stronger) at rest and in transit.
○ Data Center Certifications
Data centers should be certified, follow industry best practices, etc.
○ Clear Ownership Rights
Service agreements should clearly acknowledge client data ownership.
○ Redundant Back-Ups; Recovery
Service provider should have robust and redundant backup & recovery protocols.
Applying ‘NearDup’ in The Cloud
23. Summary
Use ‘NearDup’ to Improve Doc Reviews
○ Faster Review
Group incoming documents by similarity for faster, more
efficient coding.
○ Find Hot Docs
Find hidden ‘hot’ documents with content similar to files you’ve
already marked as particularly important to a case.
○ Prevent Privilege Release
Identify documents containing privileged information that
haven’t been consistently tagged before producing them to
opposing counsel.
○ Better Email Review
Easily and coherently review email conversation threads
drawn from different custodians.
24. Thank You
Contact Info
Gene Albert:
Lexbe Principal
gene@lexbe.com
(512) 686-3382
Stu Van Dusen:
Marketing Manager
svandusen@lexbe.com
(512) 843-7672
Lexbe Sales: sales@lexbe.com
(800) 401-7809 x22
Webinar Questions: webinars@lexbe.com